Bin collapse #321

Open
AffDk opened this issue Jun 14, 2024 · 3 comments

Comments

@AffDk
AffDk commented Jun 14, 2024

I have a feature that is heavily skewed toward missing values (about 90%). When I run BinningProcess on this feature with my binary target, it collapses the entire non-missing range of the feature into a single bin, next to the missing bin. I tried changing the OptimalBinning parameters (passed via binning_fit_params), such as min_n_bins, max_pvalue, max_n_prebins, max_bin_size and gamma, as well as different divergence metrics, but nothing changes this behavior. I understand this may mean that binning gains no information value for this feature, but I thought that tuning the parameters would make the algorithm pick up even slight changes and behave differently. To examine its behavior, I removed the missing values, and the binning then worked as I expected. Any suggestions?

@guillermo-navas-palencia
Owner

Hi @AffDk. Did you try the prebinning parameters, for instance min_prebin_size?
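A rough sketch of why min_prebin_size could matter with this much missing data. It assumes the minimum prebin size is counted as a fraction of all records, missing included; that is an assumption about the library's internals and is not confirmed in this thread. The helper `max_prebins` and the record counts are hypothetical and not part of optbinning.

```python
import math


def max_prebins(n_records, n_missing, min_prebin_size):
    """Upper bound on the number of prebins the non-missing records can form.

    Assumption (not confirmed in this thread): each prebin must hold at least
    ceil(min_prebin_size * n_records) records, where n_records includes the
    missing values.
    """
    n_clean = n_records - n_missing
    min_records_per_prebin = math.ceil(min_prebin_size * n_records)
    return n_clean // min_records_per_prebin


# Hypothetical data set: 100,000 records, 90% of them missing.
n_records, n_missing = 100_000, 90_000
for size in (0.05, 0.01, 0.005):
    print(size, max_prebins(n_records, n_missing, size))
```

Under that assumption, a value of 0.05 leaves room for at most 2 prebins among the non-missing records, while 0.01 allows up to 10 and 0.005 up to 20. More prebins also means more work for the optimizer.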

@AffDk
Author

AffDk commented Jun 14, 2024

Thanks. Yes, that helps, but it increases the computation time, which I'd say is expected. For categorical variables, I thought it would use the original categories, but to my surprise bin collapsing happens there too. Should I use the same trick, or can I force it to use the original categories? Thanks again.

@guillermo-navas-palencia
Owner

You can use the same parameter.
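A minimal sketch of passing min_prebin_size per variable through binning_fit_params, for numerical and categorical features alike. The variable names ("income", "region"), the value 0.005, and both helper functions are hypothetical, chosen only for illustration; a suitable value depends on the data.

```python
def build_fit_params(sparse_variables, min_prebin_size=0.005):
    """Build a binning_fit_params dict: one OptimalBinning override per variable."""
    return {name: {"min_prebin_size": min_prebin_size} for name in sparse_variables}


def make_binning_process(variable_names, categorical_variables, fit_params):
    """Create a BinningProcess that applies the per-variable overrides."""
    # Imported here so the dict-building helper also runs without optbinning.
    from optbinning import BinningProcess

    return BinningProcess(
        variable_names=variable_names,
        categorical_variables=categorical_variables,
        binning_fit_params=fit_params,
    )


# Hypothetical variables: "income" is numerical, "region" is categorical.
fit_params = build_fit_params(["income", "region"], min_prebin_size=0.005)
print(fit_params)
```

With optbinning installed, `make_binning_process(["income", "region"], ["region"], fit_params)` returns a process that can then be fitted on the feature matrix and binary target as usual.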
