
processing in batches is not very efficient when objective function execution time is not constant #10

Open
jschall opened this issue Jan 7, 2018 · 3 comments

jschall commented Jan 7, 2018

My objective function's execution time is quite variable. This means nodes in the cluster (once there is a cluster) will be idle a lot of the time.

Is there a way to improve this?
It is obviously trivial to fix it for the first pass, but not for the second pass...
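To put a number on the idle time, here is a toy calculation (the durations are made up for illustration): a batch lasts as long as its slowest evaluation, so every other worker in the batch sits idle for the difference.

```python
# Hypothetical durations (seconds) for one batch of 4 parallel evaluations.
durations = [1.0, 2.0, 3.0, 10.0]

batch_time = max(durations)             # the batch ends when the slowest task ends
busy_time = sum(durations)              # worker-seconds of useful work performed
capacity = batch_time * len(durations)  # worker-seconds the batch occupies

print(f"idle fraction: {1 - busy_time / capacity:.0%}")  # -> idle fraction: 60%
```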

paulknysh (Owner) commented

This is an interesting question. Unfortunately, at the moment the code relies heavily on the concept of a parallel map, which means all evaluations within a batch have to start simultaneously (and the batch has to wait until its longest evaluation is done). What you're asking for is actually not trivial, I believe. But maybe using the code as is will work out well for you eventually.
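To illustrate the difference, here is a minimal sketch using Python's standard `concurrent.futures` (not the library's actual internals): map-style batching imposes a synchronization barrier after every batch, while an as-completed work queue keeps workers busy, at the cost of having to generate new points from partial results.

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

def objective(x):
    """Stand-in objective with variable runtime (illustration only)."""
    time.sleep(random.uniform(0.1, 1.0))
    return sum(xi ** 2 for xi in x)

def next_points(history, size):
    """Placeholder for the point-generation step (here: random points)."""
    return [[random.uniform(-10.0, 10.0)] for _ in range(size)]

if __name__ == "__main__":
    history = []
    with ProcessPoolExecutor(max_workers=4) as pool:
        # Batched (parallel-map) style: a synchronization barrier after
        # every batch. Workers that finish early sit idle until the
        # slowest evaluation is done, because the next batch of points
        # cannot be generated before all current results are in.
        for _ in range(3):
            batch = next_points(history, 4)
            results = list(pool.map(objective, batch))  # blocks on the slowest
            history.extend(zip(batch, results))

        # Asynchronous alternative: resubmit a new point the moment any
        # worker frees up, which removes the idle time. The catch is that
        # next_points must now handle partial, out-of-order results,
        # which is exactly the non-trivial part mentioned above.
        futures = {pool.submit(objective, p): p for p in next_points(history, 4)}
        budget = 8
        while futures:
            done = next(as_completed(futures))
            history.append((futures.pop(done), done.result()))
            if budget > 0:
                budget -= 1
                p = next_points(history, 1)[0]
                futures[pool.submit(objective, p)] = p
```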

jschall commented Jan 9, 2018

Do you have a recommendation for initial samples, subsequent samples, and batch size?

paulknysh (Owner) commented

The total number of evaluations depends on the dimensionality of your problem. I remember using the code for a 4D case; the total number of evaluations was on the order of 100-200. For higher dimensions you'll need more evaluations, but I can't give you exact numbers. Generally, the code is designed to find an approximation to the optimum with whatever number of evaluations it is given. Once you find a good candidate solution, you can always refine it afterwards, for example by running another search over a more appropriate (smaller) box around it, as in the sketch below.
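For example, the refinement could look like this. This is only a sketch: the `bb.search` call follows the signature in the project's README (treat the exact argument names as an assumption if your version differs), and the objective and boxes are made up.

```python
import blackbox as bb

def fun(par):
    # Toy 4D objective; stands in for the real, expensive function.
    return (par[0] - 1.0) ** 2 + par[1] ** 2 + par[2] ** 2 + par[3] ** 2

if __name__ == '__main__':
    # Coarse pass: wide box, 160 evaluations total for this 4D case.
    bb.search(f=fun,
              box=[[-10., 10.]] * 4,
              n=80, m=80, batch=4,
              resfile='coarse.csv')

    # After inspecting coarse.csv, shrink the box around the best
    # candidate (suppose it is near [1, 0, 0, 0]) and search again.
    bb.search(f=fun,
              box=[[0., 2.], [-1., 1.], [-1., 1.], [-1., 1.]],
              n=40, m=40, batch=4,
              resfile='refined.csv')
```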

As for initial/subsequent evaluations, I'd say splitting them equally (n equal to m) should work well in most cases. The batch size should be as large as possible for more efficient use of parallelism: if, say, you have 20 cores available (meaning you can run 20 evaluations in parallel), feel free to set batch=20, but not more than 20, of course. See the sketch below.
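A sketch of that parameter choice (again with the README's signature; the toy objective and the multiplier are arbitrary, and I'm assuming each stage is split into whole batches, so n and m are chosen as multiples of batch):

```python
import multiprocessing
import blackbox as bb

def fun(par):
    # Toy objective (stand-in for the real function).
    return sum(p ** 2 for p in par)

if __name__ == '__main__':
    cores = multiprocessing.cpu_count()  # e.g. 20 cores -> batch=20

    # Equal split between initial and subsequent stages (n = m), both
    # chosen as multiples of batch so every batch fills all cores.
    bb.search(f=fun,
              box=[[-10., 10.]] * 4,
              n=5 * cores, m=5 * cores, batch=cores,
              resfile='output.csv')
```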

Let me know if you have more questions. Btw, how many parameters do you have?
