
Hydra exploration #703

Open · wants to merge 25 commits into master
Conversation

@ernestum (Collaborator) commented Apr 26, 2023

Right now we use Sacred to run experiments/algorithms.
This PR is about exploring whether Hydra would be a good option for running experiments and constructing/configuring the CLI interface of imitation.
I implemented a CLI for AIRL using the Hydra framework to test the waters and get to know Hydra better.

Summary

Overall, Hydra seems to be very powerful but also very complex.

At first, I was hesitant due to the complexity, but after spending some time with the framework, I am convinced that
it would not just be a good replacement for Sacred; it would also allow us to reduce the complexity of the core library.

Hydra grew organically.
At first there was OmegaConf and later Hydra was added as another layer of abstraction.
There is also hydra-zen, which adds even more abstraction layers on
top. I decided not to explore it right now since that seemed like too many layers of abstraction and complexity at a time.

What was hard

I needed a couple of iterations to arrive at the current implementation of the AIRL command line interface because
there are so many degrees of freedom, and it was unclear what best practices were.
Implementing it "the obvious way" after just skimming the Hydra documentation certainly failed.

There are a couple of concepts that need to be understood, and I often had trouble not mixing them up:

  • There are config groups and config packages which sound similar, but they are completely different concepts (I used both).
  • There are default lists that seem to apply to config groups (used in imitation_cli.airl).
  • There is config interpolation, which only applies to configs but not to config groups (I heavily use this in imitation_cli.airl).
  • There is the ability to override packages of config groups to apply them to different regions of the config (there is an example for that in config/airl_run.yaml).
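To make these concepts concrete, here is an invented sketch of a primary config that combines a defaults list, config groups, and a package override (the file name, group names, and keys are made up for illustration; they are not the actual ones from this PR):

```yaml
# Illustrative sketch of a primary config, not the actual config/airl_run.yaml.
defaults:
  - environment: gym_env          # config group: pick the "gym_env" option
  - rl_algorithm: ppo             # another config group
  # Package override: place a second copy of the environment config
  # at a different location in the config tree.
  - environment@airl.gen_algo.environment: gym_env
  - _self_

total_timesteps: 100000
```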

The Benefits

I think the benefits that Hydra could bring to us are

  • a clear separation of concerns
  • the ability to easily launch grid sweeps and HP optimization runs
  • simplification of the core library utils (no more registries, no more parallel.py)
  • shell completion of algorithm configs!

Separation of Concerns

Hydra allows us to cleanly separate:

  1. the algorithm implementations (the code in imitation.algorithms)
  2. constructing algorithm instances and utility objects from configuration (the code in imitation_cli.utils and imitation_cli.algorithm_configurations)
  3. specifying how the various parts of the configuration come together to form the overall configuration of a script (the upper half of imitation_cli.airl)
  4. the code to execute the entire script (the lower half of imitation_cli.airl)

For example, in the train_adversarial Sacred script, a lot of LOC are used for setting up the trainer algorithm:
In the main run function, the reward network, trajectories, RL algo and environment are constructed,
config flags are checked to decide where to get the RL algo from,
and the configuration is modified before being passed down to constructors.
This makes it twice the length of the equivalent script written with Hydra and distracts from the actual execution of the algorithm.

An example of separation of (2) and (3) is the following:
There are configurations for various kinds of reward networks (basic, shaped, ensemble) and for various kinds of RL algorithms (PPO, SAC etc.).
They all need some kind of gym environment to derive the observation and action spaces from.
When used together in e.g. AIRL, we want to make sure they use the same environment so all the pieces fit together.
With Hydra, I could specify the reward networks in one file and the RL algorithms in another, without them knowing about each other.
Ensuring that they both use the same environment in the context of executing AIRL only takes two lines.
It neither bleeds into the actual execution code nor into the code that defines how to configure and construct reward networks and RL algorithms.
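The two lines in question might look roughly like this, using OmegaConf interpolation to point both sub-configs at the top-level environment (an invented sketch; the actual keys in the PR may differ):

```yaml
# Illustrative sketch: both components resolve to the same top-level
# environment node via interpolation.
airl:
  reward_network:
    environment: ${environment}
  gen_algo:
    environment: ${environment}
```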

Furthermore, everything we do with the registries can be solved more elegantly with the Hydra config store.
Right now this concerns the creation of reward networks and loading of policies from various sources.
This would make the core library a lot less bloated with utility code.

Launching Grid Sweeps and HP Optimization

Hydra would allow us to easily launch grid searches and Optuna sweeps based on the existing configuration.
See airl_optuna.yaml and airl_sweep_env_and_rewardnet.yaml in the config directory for examples.
It allows for sequential execution but also parallel execution with a set of plugins
(see, for example, the Ray launcher plugin and the Optuna sweeper plugin).
Our own hand-crafted parallel.py script will not be needed anymore!
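For illustration, an Optuna sweep config might look roughly like this (a hedged sketch with invented parameter names and values; see airl_optuna.yaml in the PR for the real one):

```yaml
# Illustrative sketch of a Hydra Optuna sweep config (parameter names invented).
defaults:
  - override hydra/sweeper: optuna

hydra:
  sweeper:
    direction: maximize
    n_trials: 20
    params:
      airl.demo_batch_size: choice(64, 128, 256)
      rl_algorithm.learning_rate: interval(1e-5, 1e-3)
```

Such a sweep is then launched with the `-m`/`--multirun` flag.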

Possible Objections

  • A second major dependency on projects from the same org (Meta).
  • Hydra is not simple (but config management is not simple either).
  • I liked the analogies that Sacred used (experiment, ingredient, etc.) better. To me, they were more "physical".
  • Change (but we wanted to refactor the scripts anyway).

Implementation Details

The code to construct algorithm instances and utility objects is found in imitation_cli.utils and imitation_cli.algorithm_configurations.
The CLI script for AIRL is in imitation_cli.airl.

Due to the lack of best practices, I made up my own:

Everything that needs to be created goes into its own utils package.
This is similar to the concept of ingredients in Sacred.
Examples of things to be created are environments, optimizers, policies, reward networks, RL algorithms, schedules, ...
A policy needs an environment and an RL algo needs an optimizer.
This introduces dependencies between the utils packages.
The configuration for our algorithm objects goes into the algorithm_configurations package to distinguish them from the utils.

Each of those packages defines (1) the configuration and (2) how to construct an object from a configuration.
Configurations are defined using Structured Configs
which are basically dataclasses that specify the configuration values and their types.
This is similar to config() functions in Sacred, though I like that it does not rely on inspecting a function's locals.
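As a rough illustration of the pattern, structured configs are plain dataclasses like the following (all class and field names here are invented for this sketch, not taken from the PR):

```python
import dataclasses


@dataclasses.dataclass
class EnvironmentConfig:
    """Hypothetical structured config for a gym environment."""
    env_name: str = "CartPole-v0"
    n_envs: int = 1


@dataclasses.dataclass
class RewardNetConfig:
    """Hypothetical structured config for a basic reward network."""
    environment: EnvironmentConfig = dataclasses.field(
        default_factory=EnvironmentConfig
    )
    use_action: bool = True
    hid_sizes: tuple = (32, 32)


cfg = RewardNetConfig()
```

Once such a dataclass is registered in the ConfigStore, Hydra/OmegaConf can validate command-line overrides against the type annotations.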

Each of the configuration dataclasses has a static make() method which creates a new object given the configuration.
Here the Hydra instantiate function is really useful.
It allows instantiating a configuration object by calling said make function.
When there are nested configuration objects, it instantiates them recursively for us!
This means in the main script we just call instantiate(cfg.airl) to create the AIRL algorithm object with all the
needed ingredients.
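To illustrate the mechanics, here is a hand-rolled toy version of that recursive instantiation (this is not Hydra's implementation; the target names and factory functions are invented stand-ins):

```python
def make_env(env_name):
    # Stand-in factory for a real environment constructor.
    return f"Env({env_name})"


def make_reward_net(environment, use_action):
    # Stand-in factory for a real reward network constructor.
    return f"RewardNet(env={environment}, use_action={use_action})"


TARGETS = {"make_env": make_env, "make_reward_net": make_reward_net}


def mini_instantiate(cfg):
    """Toy analogue of hydra.utils.instantiate: resolve nested
    ``_target_`` entries depth-first, then call the target."""
    if not isinstance(cfg, dict):
        return cfg
    kwargs = {k: mini_instantiate(v) for k, v in cfg.items() if k != "_target_"}
    return TARGETS[cfg["_target_"]](**kwargs) if "_target_" in cfg else kwargs


cfg = {
    "_target_": "make_reward_net",
    "environment": {"_target_": "make_env", "env_name": "CartPole-v0"},
    "use_action": True,
}
net = mini_instantiate(cfg)
```

The nested environment config is instantiated first and the result is passed as a keyword argument to the outer factory, which is the behavior the main script relies on.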

This keeps the configuration close to the code that instantiates objects from said configuration, without luring us to mix
instantiation with execution.
Separation of concerns, yay!

Each of those packages contains a Config class from which configuration variants inherit.
For example, to configure reward networks, there is the imitation_cli.utils.reward_network.Config which only defines an environment
(all reward networks need an environment).
For BasicRewardNet, BasicShapedRewardNet and RewardEnsemble we have the following class hierarchy
inside the imitation_cli.utils.reward_network package:

flowchart
    subgraph imitation_cli.utils.reward_network
        direction BT
        BasicRewardNet --> Config
        BasicShapedRewardNet --> BasicRewardNet
        RewardEnsemble --> Config
        RewardEnsemble -.-> BasicRewardNet
    end

(The reward ensemble uses composition to create multiple basic reward networks.)

In the AIRL configuration dataclass we have a reward_network field type-annotated with reward_network.Config
which allows us to use any kind of reward network with AIRL.

Some of the util packages have a register_configs() function.
Structured configurations must be explicitly stored in Hydra's ConfigStore.
They are stored within a group under a certain name.
For example, the environment configurations for CartPole, Pendulum, etc. are stored under the "environment" group.
Some configurations should be stored multiple times in various subgroups, such as the schedule configuration,
which we want to configure separately for the clip_range and the learning_rate of PPO.
To do this in a uniform way some packages have a register_configs(group) function that stores the structured
configurations under the given group.
Some of those also take a default_environment or default_rng.
This is used to inject "interpolations" that ensure the same environment is used everywhere by default
(though this can be changed if, for some reason, e.g. the expert's environment should differ from the imitator's environment).

In some cases, the configuration value type is a class and not an object.
For example, the ActorCriticPolicy wants an optimizer class and a dictionary of optimizer kwargs.
This is a bit unfortunate because it does not allow us to type-check the optimizer arguments.
In those cases I introduced an enum that maps names to classes and wrapped it in a simple configuration that extracts the class from the enum value.
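A minimal sketch of that enum pattern (with stand-in classes instead of e.g. torch optimizer classes; all names here are invented):

```python
import dataclasses
import enum


class FakeAdam:
    """Stand-in for e.g. torch.optim.Adam."""


class FakeSGD:
    """Stand-in for e.g. torch.optim.SGD."""


class OptimizerClass(enum.Enum):
    """Maps config-friendly names to the actual classes."""
    ADAM = FakeAdam
    SGD = FakeSGD


@dataclasses.dataclass
class OptimizerClassConfig:
    """Stores only an enum member; make() extracts the wrapped class."""
    optimizer_class: OptimizerClass = OptimizerClass.ADAM

    @staticmethod
    def make(optimizer_class: OptimizerClass) -> type:
        return optimizer_class.value


cfg = OptimizerClassConfig(OptimizerClass.SGD)
optimizer_cls = OptimizerClassConfig.make(cfg.optimizer_class)
```

The config itself only ever contains the enum name, so it stays serializable, while make() recovers the class at instantiation time.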

Some commands to try

Print all configuration groups and the config:

python src/imitation_cli/airl.py -h

Run with a specific environment:

python src/imitation_cli/airl.py environment.env_name="MountainCar-v0"

Inject a different environment into the AIRL gen_algo.
All parts of the script run with MountainCar-v0, only the gen_algo environment is replaced by a CartPole-v0 configuration.
Note: this will obviously crash since the action/observation spaces don't match. This is only to demonstrate the flexibility of the framework.

python src/imitation_cli/airl.py environment.env_name="MountainCar-v0" environment@airl.gen_algo.environment=gym_env airl.gen_algo.environment.env_name="CartPole-v0"

Note: the [email protected]_algo.environment=gym_env is needed to ensure that a new environment config is placed at airl.gen_algo.environment in the configuration tree. After that we can the the env_name of this new config. Without [email protected]_algo.environment=gym_env we would have changed the env_name globally, since by default airl.gen_algo.environment is linked to the top-level environment configuration of the RunConfig

Do a quick run with fast evaluation, no checkpoints, and a limited number of total timesteps:

python src/imitation_cli/airl.py total_timesteps=20000 checkpoint_interval=-1 evaluation=fast_evaluation

TODO

  • Make sure to update the CLI documentation for hydra

@codecov (bot) commented May 10, 2023

Codecov Report

Merging #703 (6760104) into master (bf99117) will decrease coverage by 0.01%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##           master     #703      +/-   ##
==========================================
- Coverage   96.42%   96.41%   -0.01%     
==========================================
  Files          93       93              
  Lines        8781     8785       +4     
==========================================
+ Hits         8467     8470       +3     
- Misses        314      315       +1     
Impacted Files Coverage Δ
src/imitation/algorithms/adversarial/common.py 96.44% <75.00%> (-0.39%) ⬇️


@ernestum added this to the Release v2.x milestone May 24, 2023
…most of the interpolation magic into the main script and rename some util modules to make clear what they do.
…n.yaml and assemble the airl run config when registering in the config store.
…ve the global demonstrations field and rename the global venv field to environment.
…pendency was pulled out of the utils and made explicit in airl.py
@ernestum marked this pull request as ready for review June 1, 2023 13:21
@ernestum mentioned this pull request Jul 14, 2023
@qxcv (Member) left a comment

Added some low-level feedback. One last piece of low-level feedback: imitation_cli.utils is a non-intuitive place to put configs (shouldn't it be imitation_cli.configs or something?).

Higher-level feedback:

  • I'm a fan of the make() stuff! Seems like creating new objects from configs is super easy.
  • Looking at the PR description, it seems like the two biggest advantages of Hydra over Sacred seem to be that Hydra has in-built functionality that is equivalent to the current functionality of registries and parallel sweeps, meaning that the custom code for those can be deleted. Is this a reasonable characterization?
  • If so, I'm not sure how worthwhile it is to refactor all the remaining Sacred code—it depends on how long it takes, and how many users the CLI has. Judging by this PR, it seems like it takes quite a while to refactor components, although I expect you've already paid much of the upfront cost of learning Hydra and coming up with conventions. My suspicion is that we don't have many CLI users, but admittedly this is based on my own experience with ML library CLIs (I don't use them) and not talking to actual users.

Thoughts on the above?

@@ -209,6 +209,7 @@ def get_local_version(version: "ScmVersion", time_format="%Y%m%d") -> str:
"tensorboard>=1.14",
"huggingface_sb3>=2.2.1",
"datasets>=2.8.0",
"hydra-core>=1.3.2",
Member:

Add explicit dep on omegaconf (which is used directly in subsequent files).

Collaborator Author:

I did not add it because Hydra is heavily based on Omegaconf (it grew out of that project, see more details here), but it can't hurt to add it explicitly just in case Hydra decides to change the backend.



@dataclasses.dataclass
class Config:
Member:

nit: this is a bit confusing—won't users of this class always have to import it as AIRLConfig or something that disambiguates between the other configs?

Collaborator Author:

We have the convention to not directly import classes, so in a different file we would write:

from imitation_cli.algorithm_configurations import airl
...
my_conf = airl.Config()

"""Base class for activation function configs."""

activation_function_class: ActivationFunctionClass
_target_: str = "imitation_cli.utils.activation_function_class.Config.make"
Member:

Is there a way to simplify this sort of thing so that it doesn't have to be repeated in every class?
(e.g. inheriting from a base class with an appropriate target property)

Collaborator Author:

There probably is a super smart™ solution that is entirely incomprehensible to the casual user. This is one of the cases where I decided to accept some level of repetition for the sake of simpler understanding.

def register_configs(group: str):
    cs = ConfigStore.instance()
    for cls in ActivationFunctionClass:
        cs.store(group=group, name=cls.name.lower(), node=Config(cls))
Member:

Programmatically generating names like this will make it fractionally harder to grep for where things are defined (e.g. if I do a case sensitive search for "tanh" then this file won't come up).

Collaborator Author:

That is a really good point! In general, configuring classes seems to require a lot of boilerplate code the way I do it here, so I will try to find a smarter solution. Until I find one, what do you think about just removing .lower()?

@ernestum (Collaborator Author) commented Aug 9, 2023

Thanks for the feedback @qxcv!

One last piece of low-level feedback: imitation_cli.utils is a non-intuitive place to put configs (shouldn't it be imitation_cli.configs or something?).

You are right, utils seems overly generic here. configs is closer to the truth and more specific, but it clashes with the config folder right next to it. Here I really liked the analogies to real experiments that Sacred established, with terms like experiment and ingredient. What do you think about the name ingredient_configs? Or would this be confusing for past users of Sacred?

  • Looking at the PR description, it seems like the two biggest advantages of Hydra over Sacred seem to be that Hydra has in-built functionality that is equivalent to the current functionality of registries and parallel sweeps, meaning that the custom code for those can be deleted. Is this a reasonable characterization?

One major advantage of Hydra is (to me) that the utils/ingredients can be duplicated in various places of the configuration tree. E.g. consider the case where you want to figure out how well an agent transfers to a new environment. Then you want to configure a different environment for the training and the evaluation part of the code. In Hydra you can do this right from the command line without changing the script (see "Inject a different environment to the AIRL gen_algo" in the PR description). In Sacred you would have to copy-paste the code for the "environment ingredient" to an "evaluation environment ingredient". The same thing applies to an "activation function" ingredient or an "optimizer" ingredient.

  • If so, I'm not sure how worthwhile it is to refactor all the remaining Sacred code—it depends on how long it takes, and how many users the CLI has. Judging by this PR, it seems like it takes quite a while to refactor components, although I expect you've already paid much of the upfront cost of learning Hydra and coming up with conventions. My suspicion is that we don't have many CLI users, but admittedly this is based on my own experience with ML library CLIs (I don't use them) and not talking to actual users.

We ourselves are heavy users of it with the benchmarks, and the way we implemented those benchmarks proves that our CLI encourages bad practices. Just to illustrate the way we execute benchmark code:

  1. There is a commands.py script to generate a number of bash commands
  2. Each of those bash commands starts one instance of our parallel.py script with different parameters
  3. Each of those parallel.py processes starts a number of the experiment scripts (the actual CLI) in parallel using Ray

So for some reason we decided to write two layers of abstraction around the CLI to execute benchmarks. At that point of complexity we might as well have written a custom python script that uses the core API in combination with Ray.

Now there is the promise that Hydra will solve this for us (because it has built-in parallelization using Ray). But maybe if we let go of the CLI we can use those resources to better maintain the core library instead. We already decided to exclude the CLI from semantic versioning anyway. This is a major design decision, so I would like to hear @AdamGleave's opinion as well, and we should probably have a meeting about this.
