In MLJ interface, classifier makes unordered class predictions for ordered training target #267

Closed
ablaom opened this issue Mar 17, 2024 · 3 comments · Fixed by #268

Comments

@ablaom
Contributor

ablaom commented Mar 17, 2024

using Pkg
Pkg.activate(temp=true)
Pkg.add(["MLJBase", "EvoTrees", "MLJModels", "StatisticalMeasures"])
using MLJBase, EvoTrees, MLJModels, StatisticalMeasures

# define some data with `OrderedFactor` target (ordered `CategoricalVector`):

X = (; x=rand(10))
y = coerce(rand("ab", 10), OrderedFactor)
scitype(y)
# AbstractVector{OrderedFactor{2}}


# but in predictions `OrderedFactor` has become `Multiclass` (unordered factor):

model = (@load EvoTreeClassifier)()
mach = machine(model, X, y) |> fit!
yhat = predict(mach, X)
yhat[1]  # inspect a single predicted distribution
# UnivariateFinite{Multiclass{2}}(a=>1.0, b=>7.27e-6)


# This leads to warnings like this one:

roc_curve(yhat, y)
# ┌ Warning: Levels not explicitly ordered. Using the order ['a', 'b']. The "positive" level is b.       
# └ @ StatisticalMeasures ~/.julia/packages/StatisticalMeasures/hPDX2/src/roc.jl:28
# ([0.0, 0.25, 1.0], [0.0, 1.0, 1.0], Float32[0.5, 7.2653975f-6])
@jeremiedb
Member

Do you have an example of a library that implements support for OrderedFactor predictions?
At first glance, I think it would require storing the original target's level types in the info section of the model. It should not be too cumbersome to add, but I won't have time to look at this immediately.

@ablaom
Contributor Author

ablaom commented Mar 22, 2024

Something like this:

import CategoricalDistributions as CD
using CategoricalArrays

y = categorical(collect("abbba"), ordered=true)

# store the following as part of learned parameters

L = CD.classes(y)
# 2-element CategoricalArray{Char,1,UInt32}:
#  'a'
#  'b'

isordered(L)
# true

# at prediction time:

probs = rand(5)
yhat = CD.UnivariateFinite(L, probs, augment=true)
# 5-element UnivariateFiniteVector{OrderedFactor{2}, Char, UInt32, Float64}:
#  UnivariateFinite{OrderedFactor{2}}(a=>0.758, b=>0.242)
#  UnivariateFinite{OrderedFactor{2}}(a=>0.661, b=>0.339)
#  UnivariateFinite{OrderedFactor{2}}(a=>0.993, b=>0.00658)
#  UnivariateFinite{OrderedFactor{2}}(a=>0.748, b=>0.252)
#  UnivariateFinite{OrderedFactor{2}}(a=>0.182, b=>0.818)
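
The pattern above (capture the classes at fit time, reuse them at prediction time) can be sketched end to end as follows. This is only an illustration under stated assumptions: `ToyFitResult`, `toy_fit` and `toy_predict` are hypothetical names, the "model" is a constant probability, and none of this reflects how EvoTrees stores its learned parameters.

```julia
import CategoricalDistributions as CD
using CategoricalArrays

# Hypothetical stand-in for a fitted classifier (not EvoTrees' actual fitresult).
struct ToyFitResult{L}
    classes::L          # classes captured from the training target, ordering included
    p_second::Float64   # toy "model": a constant probability for the second class
end

# "Fit": record the classes of the training target alongside the learned parameter.
function toy_fit(y::CategoricalVector)
    L = CD.classes(y)
    p_second = count(==(L[2]), y) / length(y)
    return ToyFitResult(L, p_second)
end

# "Predict": build the distributions from the stored classes rather than from raw
# labels, so that ordering information is carried over to the predictions.
function toy_predict(fr::ToyFitResult, n::Integer)
    probs = fill(fr.p_second, n)  # probabilities for the second class
    return CD.UnivariateFinite(fr.classes, probs, augment=true)
end

y = categorical(collect("abbba"), ordered=true)
fr = toy_fit(y)
yhat = toy_predict(fr, 3)
```

Because `fr.classes` comes from an ordered target, the elements of `yhat` should have `OrderedFactor` scitype, as in the output shown above.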

jeremiedb added three commits that referenced this issue Mar 31, 2024
jeremiedb mentioned this issue Mar 31, 2024
@ablaom
Contributor Author

ablaom commented Apr 2, 2024

Thank you.
