
New NMT option - choose ClearML queue #151

Open

johnml1135 opened this issue Dec 11, 2023 · 11 comments

Comments
@johnml1135
Collaborator

So we can run non-urgent jobs on the production instance of Serval. This would be a standard option, possibly clearml_queue.
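A hypothetical build request illustrating the idea. The option name clearml_queue is only the suggestion from this comment, and the queue name and payload shape are assumptions, not the actual Serval API:

```python
import json

# Sketch of a build request body where the ClearML queue is passed as a
# standard option. "clearml_queue" is the name suggested in this issue;
# the queue value "production" is made up for illustration.
build_request = {
    "options": {
        "clearml_queue": "production",
    }
}

payload = json.dumps(build_request)
print(payload)
```

The point of putting it inside the options object is that it rides along with the existing build-options mechanism rather than requiring a new top-level API parameter.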

@johnml1135 johnml1135 added this to the Serval API 1.2 milestone Dec 11, 2023
@ddaspit
Contributor

ddaspit commented Dec 11, 2023

This would also allow us to run jobs on A100 GPUs on QA.

@johnml1135 johnml1135 assigned johnml1135 and unassigned Enkidu93 Dec 12, 2023
@johnml1135 johnml1135 removed this from the Serval API 1.2 milestone Jan 3, 2024
@johnml1135
Collaborator Author

@Enkidu93 - we may need to update how getting the queue works. One option is to add another parameter to the endpoint to specify the queue to preserve backwards compatibility. What do you think?

@johnml1135
Collaborator Author

johnml1135 commented Jan 17, 2024

@pmachapman - is SF using the /translation/engines/queues endpoint right now? If not, then we can change it without needing to deprecate it first.

@pmachapman
Collaborator

pmachapman commented Jan 17, 2024

> @pmachapman - is SF using the /translation/engines/queues endpoint right now? If not, then we can change it without needing to deprecate it first.

@johnml1135 No, we are not using that endpoint. We get the queueDepth from the current-build endpoint.

@johnml1135 johnml1135 assigned Enkidu93 and unassigned johnml1135 Jan 31, 2024
@Enkidu93
Collaborator

Enkidu93 commented Feb 8, 2024

I initially thought this would be very straightforward, but it'll take a bit more effort if we want to keep the queueDepth functionality for non-default queues.

@Enkidu93
Collaborator

Just wanted to record a couple of ideas:

1. This option will be part of buildOptions and not a new parameter in Serval, since Serval is (and ought to be) agnostic about which queues jobs are on.
2. The simplest way to make queue information available seems to be extending the queue endpoint to return a data structure outlining the allowed queues and their depths.
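A sketch of what the extended queue endpoint's response in idea (2) might look like. The field names and queue names here are assumptions for illustration, not the actual Serval schema:

```python
import json

# Hypothetical response from an extended queues endpoint that lists the
# allowed queues and their current depths (field names are illustrative).
response_body = json.dumps([
    {"name": "default", "size": 12},
    {"name": "production.A100", "size": 3},
])

# A client (e.g. SF, which currently reads queueDepth from the
# current-build endpoint) could then look up the depth of any queue:
queues = json.loads(response_body)
depth_by_name = {q["name"]: q["size"] for q in queues}
print(depth_by_name["default"])  # 12
```

Returning a list of queues rather than a single depth is what preserves the queueDepth functionality for non-default queues mentioned above.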

@ddaspit
Contributor

ddaspit commented Feb 14, 2024

If the requirement is to be able to specify the priority for the build, then that is probably what we should do. We could add a priority property to the build, and let the engine interpret that however it wishes. That way clients can't abuse an engine's queueing system.
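One way to read this suggestion: the client sends an abstract priority, and the engine maps it to whatever queue it chooses, so queue names never leak to clients. A minimal sketch, with the priority values and queue names entirely made up:

```python
# Hypothetical engine-side mapping from an abstract build priority to a
# concrete ClearML queue. Clients only ever see the priority values;
# the engine is free to reinterpret them at any time.
PRIORITY_TO_QUEUE = {
    "high": "production",    # assumed queue names, for illustration
    "normal": "default",
    "low": "background",
}

def queue_for_priority(priority: str) -> str:
    """Resolve a build priority to a queue, falling back to the default."""
    return PRIORITY_TO_QUEUE.get(priority, "default")

print(queue_for_priority("low"))     # background
print(queue_for_priority("urgent"))  # default (unknown values fall back)
```

Because the mapping lives inside the engine, clients cannot target a specific queue directly, which is the abuse-prevention property mentioned above.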

@Enkidu93
Collaborator

Enkidu93 commented Feb 15, 2024

@johnml1135 In this morning's stand-up, we were discussing this issue and thought it would be beneficial if you could record here what the purpose is of being able to configure the queue (since you're the one who opened it). Where is this requirement coming from?

@johnml1135
Collaborator Author

The requirement is unrealized right now, but it would apply if idx or another customer wanted dedicated GPUs for their jobs. Another requirement would be choosing a different setup, either for a quick-turnaround test or for a dual-GPU queue. Since all of these needs are for the future and not urgent, I am fine pushing this off until it is needed. Also, I do not believe that prioritizing Serval jobs is a requirement.

@Enkidu93
Collaborator

So @johnml1135 @ddaspit, should I 'finish' this since I already have the logic, and just add it as a build option, or should I commit what I have to a branch and revisit it when we have a firmer requirement?

@ddaspit
Contributor

ddaspit commented Feb 15, 2024

Before proceeding further, I would like to clarify the requirements for this issue. We should meet to discuss.
