Debugging TOL-10M dataset reproduction process #16

Open
wants to merge 16 commits into base: main

thompsonmj
Contributor

No description provided.

@egrace479
Member

egrace479 commented Jun 27, 2024

@thompsonmj please check the version I updated. My copy is still running on OSC, but it does seem to be working now. I also updated the directions in #14 to match these changes.

@egrace479 linked an issue Jun 27, 2024 that may be closed by this pull request
@egrace479 added the "bug" (Something isn't working) label Jun 27, 2024
Member

@egrace479 left a comment

Set tag in folder name to match the version (release) of the data.

slurm/make-dataset-wds_reproduce.sh (3 review threads, outdated and resolved)
@hlapp
Member

hlapp commented Jun 28, 2024

BTW, is there a reason we're invoking the Python interpreter through a semi-hardcoded path into the conda environment, as opposed to simply activating the conda environment first (which presumably needs to be done anyway)?

As an aside, conda environments can take up quite a lot of storage, so it's generally not a good idea to place them in one's home directory on a shared HPC system. (Home directories on shared HPC systems typically have a quite limited storage quota; group or project directories are normally where much more storage is available, or at least where it can be added on demand.)
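
To illustrate both points, here is a minimal sketch of what activating first could look like in a batch script. It is not taken from this repository: `PROJECT_DIR`, `ENV_PREFIX`, the env name `dataset-repro`, and the helper `activate_env` are hypothetical placeholders, and it assumes `conda` is already on `PATH` (for example after loading a cluster module).

```shell
#!/usr/bin/env bash
# Sketch only: PROJECT_DIR, ENV_PREFIX, and the env name are hypothetical
# placeholders, not paths taken from this repository.
PROJECT_DIR="${PROJECT_DIR:-/path/to/project}"
ENV_PREFIX="${ENV_PREFIX:-$PROJECT_DIR/conda-envs/dataset-repro}"

# One-time setup (commented out): create the env under the project directory
# rather than under $HOME, so it does not count against the home quota.
# conda create --prefix "$ENV_PREFIX" python

activate_env() {
    # Fail clearly if conda is not available (e.g. the module is not loaded).
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    # Define the `conda` shell function so `conda activate` works in a
    # non-interactive batch script.
    eval "$(conda shell.bash hook)"
    conda activate "$ENV_PREFIX"
}

# After activation, plain `python` resolves to the env's interpreter,
# so no interpreter path needs to be hardcoded.
if activate_env; then
    python --version
else
    echo "environment not activated; skipping" >&2
fi
```

With this layout the only site-specific value is the env prefix, which can be overridden per user by exporting `ENV_PREFIX` before submitting the job.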

Co-authored-by: Elizabeth Campolongo
slurm/download_eol.slurm (review thread, outdated and resolved)
@thompsonmj changed the title from "Debugging TOL-10M dataset reproduction process - unfinished" to "Debugging TOL-10M dataset reproduction process" Jul 18, 2024
@thompsonmj marked this pull request as ready for review July 18, 2024 15:18
@egrace479
Member

@samuelstevens, this should work as described in the updated directions in PR #14. If you encounter any issues or something seems off, please make suggestions there too.

Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Dataset cannot be created
3 participants