Train / evaluate multiple TF models in parallel #17
Comments
Since I'm not a TensorFlow expert, there's probably something wrong with my TensorFlow code. If we can't fix that, we might be able to use Keras, which hides all of the details about graphs and sessions. Alternatively, we might be able to specify a different parallel backend.
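A minimal sketch of the Keras route, assuming TF 2.x with bundled Keras; `make_mlp` and its layer sizes are hypothetical placeholders, not the project's actual model:

```python
from tensorflow import keras  # assumes TF 2.x with bundled Keras

def make_mlp(n_in):
    # Keras hides all Graph/Session bookkeeping; each model object is
    # self-contained, so several can coexist in one process.
    model = keras.Sequential([
        keras.Input(shape=(n_in,)),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```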
As it turns out, the default parallel backend was already using process-based parallelism. So perhaps the additional processes fail because they can't allocate GPU memory, since TensorFlow allocates the entire GPU by default. Maybe we should instead use a multi-threading backend and have all threads share the same context but use different graphs? I don't know; further investigation required.
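Forcing a threading backend is straightforward with joblib (which is also what sklearn uses under the hood); here's a minimal sketch where `evaluate` is a hypothetical stand-in for training/evaluating one model:

```python
from joblib import Parallel, delayed, parallel_backend

def evaluate(seed):
    # Hypothetical stand-in for training and scoring one model.
    return seed * seed

# joblib's default backend ("loky") is process-based; the "threading"
# backend keeps every worker inside the parent process, so all models
# would share one CUDA context instead of each worker process trying
# to grab the whole GPU for itself.
with parallel_backend("threading", n_jobs=2):
    results = Parallel()(delayed(evaluate)(s) for s in range(4))
print(results)  # [0, 1, 4, 9]
```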
The 3-layer MLP that we use generally does not saturate the GPU, which means that we might be able to run multiple models on the same GPU in parallel and get some speedup. I'm not sure whether this is feasible with TensorFlow's Graphs / Sessions, but I'm guessing that each MLP instance would probably need its own TF Graph and TF Session.
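A sketch of that per-model isolation, using the TF 1.x graph/session API (reachable in TF 2.x via the compat layer); the layer shapes and `build_mlp` helper are illustrative assumptions, not the project's real network:

```python
import tensorflow.compat.v1 as tf  # TF 1.x graph/session API assumed

tf.disable_eager_execution()

def build_mlp(n_in, n_hidden=8):
    # Each MLP gets a private Graph plus a Session bound to it, so
    # multiple instances never touch the shared default graph.
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, [None, n_in])
        w1 = tf.get_variable("w1", [n_in, n_hidden])
        b1 = tf.get_variable("b1", [n_hidden], initializer=tf.zeros_initializer())
        h = tf.nn.relu(tf.matmul(x, w1) + b1)
        w2 = tf.get_variable("w2", [n_hidden, 1])
        out = tf.matmul(h, w2)
        init = tf.global_variables_initializer()
    # allow_growth stops TF from claiming the whole GPU up front,
    # which is what would let several sessions share one device.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(graph=graph, config=config)
    sess.run(init)
    return sess, x, out
```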
Assuming that all works, in phase 1 we can add parallelism easily with the `n_jobs` parameter of `cross_val_score()`, and for phase 2 we'd probably have to do it ourselves with `multiprocess`.
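The phase 1 route would look roughly like this; `LogisticRegression` on the iris data is just a stand-in here for whatever sklearn-compatible wrapper the TF MLP ends up with:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# n_jobs=-1 farms the CV folds out to all available cores;
# each fold is fit and scored in its own worker.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, n_jobs=-1)
print(scores.shape)  # (5,)
```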