
Can not predict with multithread? #6464

Open
Jason0401 opened this issue May 23, 2024 · 5 comments
@Jason0401

If the tree_learner parameter in my model.txt is serial, can each tree in this model be predicted using multiple threads?
When I tested it, I found only one thread at 100% CPU usage; all the other threads had zero CPU usage.

@jameslamb
Collaborator

Thanks for using LightGBM.

The tree_learner setting only affects training, not prediction.

You can pass num_threads through parameters for prediction. That tells LightGBM to parallelize prediction over the rows of the input, so it is used to improve throughput.

If you are predicting on only one row at a time, using multithreading won't improve the prediction speed and you'll only ever see one CPU core active.
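
For reference, here is a minimal sketch of how num_threads can be passed at prediction time through the C API. This is an illustration, not code from this thread: the model file name, the row/column counts, and the minimal error handling are all placeholder assumptions.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <LightGBM/c_api.h>

int main(void) {
  BoosterHandle booster;
  int num_iterations;
  if (LGBM_BoosterCreateFromModelfile("model.txt", &num_iterations, &booster) != 0) {
    fprintf(stderr, "%s\n", LGBM_GetLastError());
    return 1;
  }

  /* Many rows in one call: LightGBM can split these rows across threads. */
  const int32_t nrow = 10000, ncol = 10;
  double* data = calloc((size_t)nrow * ncol, sizeof(double));
  double* out_result = malloc((size_t)nrow * sizeof(double));
  int64_t out_len;

  /* num_threads goes into the prediction parameter string. */
  LGBM_BoosterPredictForMat(booster, data, C_API_DTYPE_FLOAT64,
                            nrow, ncol, /*is_row_major=*/1,
                            C_API_PREDICT_NORMAL,
                            /*start_iteration=*/0, /*num_iteration=*/-1,
                            "num_threads=4", &out_len, out_result);

  free(data);
  free(out_result);
  LGBM_BoosterFree(booster);
  return 0;
}
```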

@Jason0401
Author

Jason0401 commented Jun 17, 2024

Thanks for your answer.
Only one row is predicted at a time in my case. I use LGBM_BoosterPredictForMatSingleRowFastInit and LGBM_BoosterPredictForMatSingleRowFast to predict, and I guess that is the fastest method offered.

There is a parameter in LGBM_BoosterPredictForMatSingleRowFastInit:
const int data_type

I found that no matter whether you choose C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64, the internal processing is based on double.
I'm curious why float-based processing isn't supported; I think it would be faster.
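
For context, here is a minimal sketch of the Init/Fast call pair under discussion, assuming the v4.x c_api.h signatures; the helper names and the omitted error-code checks are my own, not part of the library.

```c
#include <stdint.h>
#include <LightGBM/c_api.h>

/* Init is called once up front; the per-row call then reuses the config,
 * which avoids re-parsing parameters on every prediction. */
static FastConfigHandle init_fast_predict(BoosterHandle booster, int32_t ncol) {
  FastConfigHandle fast_config;
  /* data_type fixes the layout of the input buffer passed later:
   * C_API_DTYPE_FLOAT32 -> const float*, C_API_DTYPE_FLOAT64 -> const double* */
  LGBM_BoosterPredictForMatSingleRowFastInit(
      booster, C_API_PREDICT_NORMAL,
      /*start_iteration=*/0, /*num_iteration=*/-1,
      C_API_DTYPE_FLOAT32, ncol, /*parameter=*/"", &fast_config);
  return fast_config;
}

static double predict_one_row(FastConfigHandle fast_config, const float* row) {
  int64_t out_len;
  double out_result;
  LGBM_BoosterPredictForMatSingleRowFast(fast_config, row, &out_len, &out_result);
  return out_result;
}

/* When done: LGBM_FastConfigFree(fast_config); */
```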

@jameslamb
Collaborator

> I found that no matter whether you choose C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64, the internal processing is based on double

Can you share some links or other evidence that makes you think this?

@Jason0401
Author

During prediction, moving from the root node to a leaf node requires scattered access to a double-typed feature array, which can cause cache misses.
Since each element of a float array takes up less space, the CPU cache could be used more efficiently when the features are stored and accessed contiguously in memory.
I don't have evidence from actual testing at the moment; I'll give it a try when I have time.
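
As a starting point, a hypothetical micro-benchmark like the sketch below could test this claim by timing single-row prediction with float32 versus float64 input buffers. Nothing here is taken from LightGBM's internals; model.txt, NCOL, and N_CALLS are placeholder assumptions, and error checks are kept minimal.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <LightGBM/c_api.h>

#define NCOL 100
#define N_CALLS 1000000

/* Time N_CALLS single-row predictions for a given input data_type. */
static double bench(BoosterHandle booster, int data_type, const void* row) {
  FastConfigHandle cfg;
  int64_t out_len;
  double out;
  LGBM_BoosterPredictForMatSingleRowFastInit(
      booster, C_API_PREDICT_NORMAL, /*start_iteration=*/0,
      /*num_iteration=*/-1, data_type, NCOL, /*parameter=*/"", &cfg);
  clock_t t0 = clock();
  for (long i = 0; i < N_CALLS; ++i) {
    LGBM_BoosterPredictForMatSingleRowFast(cfg, row, &out_len, &out);
  }
  double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
  LGBM_FastConfigFree(cfg);
  return secs;
}

int main(void) {
  BoosterHandle booster;
  int num_iterations;
  if (LGBM_BoosterCreateFromModelfile("model.txt", &num_iterations, &booster) != 0) {
    fprintf(stderr, "%s\n", LGBM_GetLastError());
    return 1;
  }
  float row32[NCOL] = {0.0f};
  double row64[NCOL] = {0.0};
  printf("float32 input: %.3f s\n", bench(booster, C_API_DTYPE_FLOAT32, row32));
  printf("float64 input: %.3f s\n", bench(booster, C_API_DTYPE_FLOAT64, row64));
  LGBM_BoosterFree(booster);
  return 0;
}
```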

@jameslamb
Collaborator

OK, we'd really appreciate specific evidence for the claim you're making (like links to the relevant parts of LightGBM's code). Otherwise, you're asking someone else to repeat investigation that you've already done.
