Separate the mechanisms for (1) scripts, (2) dev environment, and (3) notebooks #15

Open
mfhepp opened this issue Jan 28, 2024 · 1 comment
Labels: enhancement (New feature or request)

mfhepp commented Jan 28, 2024

The project can be used for multiple purposes, such as

  • script deployment
  • script development (without rebuilding the image for each change in the code)
  • a proper Python development environment including testing, code linting, type checking, etc.
  • running Jupyter notebooks in a more isolated way.

The usages differ in many aspects, namely

  • which environment files they require and from where to take those
  • whether the Docker image should be shared across multiple usages

Currently, the images created are partially overlapping, which may cause problems in the long run.

Hence, it seems better to separate the three usage scenarios:

  1. Script deployment takes a script's code and environment file for building and running an isolated Docker container with minimal privileges on the host (see the sketch after this list). This can also be used for building isolated CLI versions of popular Python applications or packages, like copier or Nikola.
  2. Development environment. This would bundle everything needed for typical Python development workflows, including code formatting, linters, etc. The editor would run on the host, and the working directory would be mapped into the container. As one is likely to develop multiple projects, each project should have its own, pinnable environment file (basically a version with both the dev dependencies and the runtime dependencies). The dev dependencies could be the same for the entire user, while the runtime ones will of course differ. Each project will typically have its own Docker image and its own run script or alias and be run from its own directory. For some very simple projects, it may be handy to use a standard image with popular dependencies so that experiments and quick tests do not require a 1 GB image.
  3. Notebooks. There are actually two use cases:
    • One or multiple standard notebook environments (and respective kernels) to be run from anywhere on the machine for quick experiments and demos (like nbh <envname>). The multiple environments can either be built inside the same image or, likely better, be independent images.
    • A project-specific notebook environment with its own environment specification, e.g. for specific tasks in research projects. Here, the environment file and the startup script will live in that project folder.
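
To illustrate scenario 1, here is a minimal sketch of running such a container with minimal privileges; the image name (my_script) is a placeholder, not an actual py4docker default, and the exact set of flags is an assumption:

# Hypothetical example: run a deployed script in an isolated container with
# a read-only filesystem, no network access, and all capabilities dropped.
docker run --rm \
    --read-only \
    --network none \
    --cap-drop all \
    --security-opt no-new-privileges \
    my_script:latest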

One critical issue is that the proper image is identified solely by its tag on that machine, so we must take care not to accidentally start the wrong image.

For script development, we can either use the fully fledged dev environment or keep the current feature of mounting the src directory into the container.
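
For illustration, a minimal sketch of the mount-based variant, assuming the code lives in ./src and the image is again called my_script (both are placeholders):

# Hypothetical example: bind-mount the local src directory into the container
# so that code changes take effect without rebuilding the image.
docker run --rm -it \
    --mount type=bind,source="$(pwd)/src",target=/usr/app/src,readonly \
    my_script:latest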

So basically we would have the following commands:

# Build the script / project in the current folder using the environment file
# found therein, with no development dependencies
# TODO: Pin versions or build from a pinned version
build

# Build a dev environment from the standard dev packages and the dependencies
# found in the current directory
# If there are no additional dependencies, build just the standard image (or use that one?)
build dev

# Build the standard notebook image from the standard notebook packages
# plus the dependencies found in the current directory
# If there are no additional dependencies, build just the standard image (or use that one?)
build notebook
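
A minimal sketch of how build.sh could dispatch on these modes; the Dockerfile names and tag suffixes are assumptions, not a final design:

#!/usr/bin/env bash
# Hypothetical build dispatcher for the three modes sketched above.
set -euo pipefail

MODE="${1:-script}"
PROJECT="$(basename "$PWD")"

case "$MODE" in
    script)
        docker build -t "$PROJECT" -f Dockerfile .
        ;;
    dev)
        # Assumption: Dockerfile.dev layers the standard dev packages
        # on top of the project's own environment file.
        docker build -t "${PROJECT}_dev" -f Dockerfile.dev .
        ;;
    notebook)
        docker build -t "${PROJECT}_notebook" -f Dockerfile.notebook .
        ;;
    *)
        echo "Usage: build.sh [script|dev|notebook]" >&2
        exit 1
        ;;
esac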

Now, one key issue is to determine the tag of the image at build and run-time.

Several ideas:

  1. Take the basename of the $PWD. But how do we spot collisions (like src in multiple projects)?
  2. Get the image name from a file or script inside the PWD (text, YAML, or simply the filename, like IMAGENAME.py4docker).
  3. Use a local script / alias per project if the global defaults are to be superseded.
    Both build.sh and run.sh have to check this; otherwise, they might start completely different images depending on from where they are being invoked.
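
A minimal sketch combining ideas 1 and 2; the *.py4docker marker file convention is just one of the options above, not a decision:

# Hypothetical tag resolution: prefer a marker file IMAGENAME.py4docker
# in the current directory, otherwise fall back to the basename of $PWD.
IMAGE_TAG=$(basename "$PWD")
for marker in ./*.py4docker; do
    if [ -e "$marker" ]; then
        IMAGE_TAG=$(basename "$marker" .py4docker)
        break
    fi
done
echo "Using image tag: $IMAGE_TAG"
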
@mfhepp mfhepp added the enhancement New feature or request label Jan 28, 2024
@mfhepp mfhepp self-assigned this Mar 13, 2024

mfhepp commented Apr 16, 2024

It's likely that I will split the project into

  • nbh for notebooks (with a set of system-wide environments and the option to use a local environment file) and
  • py4docker for a generic development environment that uses the environment file from the current working directory (and determines the name of the image by either hashing its content or hashing the realpath of the env.yaml in the PWD, or similar), plus a "deployment mode" for the resulting project (see the sketch below).

The overlap between the two is relatively small, and this split will make the project much less complex.
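
A minimal sketch of the hashing idea for py4docker; the env.yaml file name, the sha256sum/realpath tools (as found on Linux), and the 12-character prefix are all assumptions:

# Hypothetical: derive a per-project image name from the environment file,
# either from its content or from its resolved path.
ENV_FILE="env.yaml"
CONTENT_HASH=$(sha256sum "$ENV_FILE" | cut -c1-12)
PATH_HASH=$(realpath "$ENV_FILE" | sha256sum | cut -c1-12)
echo "Image name (content-based): py4docker_${CONTENT_HASH}"
echo "Image name (path-based):    py4docker_${PATH_HASH}"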
