Separate the mechanisms for (1) scripts, (2) dev environment, and (3) notebooks #15

Open
mfhepp opened this issue Jan 28, 2024 · 1 comment
Labels: enhancement (New feature or request)

mfhepp commented Jan 28, 2024

The project can be used for multiple purposes, such as

  • script deployment
  • script development (without rebuilding the image for each change in the code)
  • a proper Python development environment including testing, code linting, type checking, etc.
  • running Jupyter notebooks in a more isolated way.

The usages differ in many aspects, namely

  • which environment files they require and from where to take those
  • whether the Docker image should be shared across multiple usages

Currently, the images created are partially overlapping, which may cause problems in the long run.

Hence, it seems better to separate the three usage scenarios:

  1. Script deployment takes a script's code and environment file for building and running an isolated Docker container with minimal privileges on the host (see the sketch after this list). This can also be used for building isolated CLI versions of popular Python applications or packages, like copier or Nikola.
  2. Development environment. This would bundle everything needed for typical Python development workflows, including code formatting, linters, etc. The editor would run on the host, and the working directory would be mapped into the container. As one is likely to develop multiple projects, each project should have its own, pinnable environment file (basically a version with both the dev dependencies and the runtime dependencies). The dev dependencies could be the same for the entire user, while the runtime ones will of course differ. Each project will typically have its own Docker image and its own run script or alias and be run from its own directory. For some very simple projects, it may be handy to use a standard image with popular dependencies so that experiments and quick tests do not require a 1 GB image.
  3. Notebooks. There are actually two use cases:
    • One or multiple standard notebook environments (and respective kernels) to be run from anywhere on the machine for quick experiments and demos (like nbh <envname>). The multiple environments can either be built inside the same image or, likely better, be independent images.
    • A project-specific notebook environment with its own environment specification, e.g. for specific tasks in research projects. Here, the environment file and the startup script will live in that project folder.
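
To illustrate scenario 1, here is a minimal sketch of running such a container with minimal privileges; the image name (my_script) is a placeholder, not an actual py4docker default, and the exact set of flags is an assumption:

# Hypothetical example: run a deployed script in an isolated container with
# a read-only filesystem, no network access, and all capabilities dropped.
docker run --rm \
    --read-only \
    --network none \
    --cap-drop all \
    --security-opt no-new-privileges \
    my_script:latest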

One critical issue is that the proper image is identified solely by its tag on that machine, so we must take care not to accidentally start the wrong image.

For script development, we can either use the fully fledged dev environment or keep the current feature of mounting the src directory into the container.
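
For illustration, a minimal sketch of the mount-based variant, assuming the code lives in ./src and the image is again called my_script (both are placeholders):

# Hypothetical example: bind-mount the local src directory into the container
# so that code changes take effect without rebuilding the image.
docker run --rm -it \
    --mount type=bind,source="$(pwd)/src",target=/usr/app/src,readonly \
    my_script:latest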

So basically we would have the following commands:

# Build the script / project in the current folder using the environment file
# found therein, with no development dependencies
# TODO: Pin versions or build from a pinned version
build

# Build a dev environment from the standard dev packages and the dependencies
# found in the current directory
# If there are no additional dependencies, build just the standard image (or use that one?)
build dev

# Build the standard notebook image from the standard notebook packages
# plus the dependencies found in the current directory
# If there are no additional dependencies, build just the standard image (or use that one?)
build notebook
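
A minimal sketch of how build.sh could dispatch on these modes; the Dockerfile names and tag suffixes are assumptions, not a final design:

#!/usr/bin/env bash
# Hypothetical build dispatcher for the three modes sketched above.
set -euo pipefail

MODE="${1:-script}"
PROJECT="$(basename "$PWD")"

case "$MODE" in
    script)
        docker build -t "$PROJECT" -f Dockerfile .
        ;;
    dev)
        # Assumption: Dockerfile.dev layers the standard dev packages
        # on top of the project's own environment file.
        docker build -t "${PROJECT}_dev" -f Dockerfile.dev .
        ;;
    notebook)
        docker build -t "${PROJECT}_notebook" -f Dockerfile.notebook .
        ;;
    *)
        echo "Usage: build.sh [script|dev|notebook]" >&2
        exit 1
        ;;
esac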

Now, one key issue is to determine the tag of the image at build and run-time.

Several ideas:

  1. Take the basename of the $PWD. But how do we spot collisions (like src in multiple projects)?
  2. Get the image name from a file or script inside the PWD (text, YAML, or simply the filename, like IMAGENAME.py4docker).
  3. Use a local script / alias per project if the global defaults are to be superseded.
    Both build.sh and run.sh have to check this; otherwise, they might start completely different images depending on from where they are being invoked.
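
A minimal sketch combining ideas 1 and 2; the *.py4docker marker file convention is just one of the options above, not a decision:

# Hypothetical tag resolution: prefer a marker file IMAGENAME.py4docker
# in the current directory, otherwise fall back to the basename of $PWD.
IMAGE_TAG=$(basename "$PWD")
for marker in ./*.py4docker; do
    if [ -e "$marker" ]; then
        IMAGE_TAG=$(basename "$marker" .py4docker)
        break
    fi
done
echo "Using image tag: $IMAGE_TAG"
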
@mfhepp mfhepp added the enhancement New feature or request label Jan 28, 2024
@mfhepp mfhepp self-assigned this Mar 13, 2024

mfhepp commented Apr 16, 2024

It's likely that I will split the project into

  • nbh for notebooks (with a set of system-wide environments and the option to use a local environment file) and
  • py4docker for a generic development environment that uses the environment file from the current working directory (and determines the name of the image by either hashing its content or hashing the realpath of the env.yaml in the PWD, or similar), plus a "deployment mode" for the resulting project (see the sketch below).

The overlap between the two is relatively small, and this split will make the project much less complex.
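
A minimal sketch of the hashing idea for py4docker; the env.yaml file name, the sha256sum/realpath tools (as found on Linux), and the 12-character prefix are all assumptions:

# Hypothetical: derive a per-project image name from the environment file,
# either from its content or from its resolved path.
ENV_FILE="env.yaml"
CONTENT_HASH=$(sha256sum "$ENV_FILE" | cut -c1-12)
PATH_HASH=$(realpath "$ENV_FILE" | sha256sum | cut -c1-12)
echo "Image name (content-based): py4docker_${CONTENT_HASH}"
echo "Image name (path-based):    py4docker_${PATH_HASH}"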
