Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nimh 7 take 2 #25

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Nimh 7 take 2 #25

wants to merge 5 commits into from

Conversation

leej3
Copy link
Collaborator

@leej3 leej3 commented Jun 24, 2024

supercedes #20.

Rebased version of previous PR

Loom video

https://www.loom.com/share/2b2dfbd8ef334f8280ed676dbde66b4b?sid=f5e951bb-2362-434c-baae-b02fdee8bd17

How to test

  • Pull the branch
  • Run pip install -e .
  • Open a terminal tab
  • Run the image docker run --rm -p 8070:8070 elifesciences/sciencebeam-parser and keep it running for the rest of the testing period
  • Run osm pdf-xml-json "example_pdf_inputs/test_sample.pdf" {the output file}

Loom Video https://www.loom.com/share/d4d3819fce9c439d9799eee4c4a9cc22?sid=f8fc9515-dbd9-4ebb-89d1-b2e0ba1abffd

How to build the Docker image and run the Docker container

Navigate to the project's root directory and run docker-compose up --build

When the image is built and the containers are running, open another terminal and start the osm container in interactive mode using the command docker-compose run osm bash

You can do file conversions in the container using this command osm pdf-xml-json "path_to_file_name.pdf" output_file_path

Or use the command docker-compose run --rm osm osm pdf-xml-json "path_to_file_name.pdf" output_file_path to convert files in non-interactive mode

README.md Outdated Show resolved Hide resolved
osm_cli/cli/main.py Outdated Show resolved Hide resolved
osm_cli/cli/main.py Outdated Show resolved Hide resolved
osm_cli/cli/main.py Outdated Show resolved Hide resolved
osm_cli/cli/main.py Outdated Show resolved Hide resolved
osm_cli/converters/pdf_converter.py Outdated Show resolved Hide resolved
tests/test_file_processing.py Outdated Show resolved Hide resolved
@@ -16,7 +17,11 @@ def sample_pdf(tmp_path):

def test_pdf_converter(caplog, sample_pdf):
caplog.set_level(logging.INFO)
file_id = "test_file"
output_file = f'docs/examples/sciencebeam_xml_outputs/{file_id}.xml'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see other note on output file. Tests/running the code should not write to the source tree, except perhaps the current directory.

@@ -16,7 +17,11 @@ def sample_pdf(tmp_path):

def test_pdf_converter(caplog, sample_pdf):
caplog.set_level(logging.INFO)
file_id = "test_file"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored this test. Please use pytest and its fixtures instead of unit-test. And avoid classes for testing unless absolutely necessary. Setup/teardown should ideally be through fixture.
Please don't remove tests. Add new tests freely but removing tests should be done with careful consideration... it generally indicates that API compatibility has been broken. This is early in the codebase development so the tests are toytests for now. Even still, you removed some useful validation rather than just adding additional validation... I combined the asserts made in both test versions that you wrote.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! We'll start using pytest and its fixtures from now on and avoid using classes for testing unless really needed.

We understand the importance of keeping existing tests for API compatibility. Our goal wasn't to remove valuable tests but to improve validation. We'll bring back the useful validations and merge them with the new assertions to ensure thorough test coverage.

runner = CliRunner()
result = runner.invoke(pdf_xml, [str(sample_pdf), "test_file"])
result = runner.invoke(pdf_xml, [str(sample_pdf), file_id])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have kept the runner invocation the same.

If you want to test the root command (I named that "osm", then add a separate test for that). As mentioned elsewhere the cli/pdf conversion functionality should ideally be tested independently to ensure separation of concerns.

@gitstart-nimhdsst gitstart-nimhdsst changed the base branch from main to NIMH-7 July 1, 2024 16:52
@gitstart-nimhdsst gitstart-nimhdsst changed the base branch from NIMH-7 to main July 1, 2024 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants