ENH: externalize MapieClassifier methods into several non-conformity scores #476

Open · wants to merge 16 commits into base: master
Conversation

@thibaultcordier thibaultcordier commented Jul 1, 2024

Description

Externalize the MapieClassifier methods (LAC, APS, RAPS, TopK) by extending the non-conformity score class to the classification task.

Fixes #467
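As context for the refactoring, the core LAC logic being externalized can be sketched as standalone functions with numpy (a toy sketch with hypothetical names; this is not the actual MAPIE class API):

```python
import numpy as np

def lac_conformity_scores(y_proba, y_true):
    # LAC non-conformity score: one minus the probability
    # assigned to the true label.
    return 1.0 - y_proba[np.arange(len(y_true)), y_true]

def lac_prediction_sets(y_proba, quantile):
    # Keep every label whose probability is at least 1 - quantile.
    return y_proba >= (1.0 - quantile)

# Toy calibration example.
y_proba = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1]])
y_true = np.array([0, 1])
scores = lac_conformity_scores(y_proba, y_true)  # [0.3, 0.2]
```

The PR moves logic of this shape out of MapieClassifier and into dedicated conformity-score classes.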

Type of change

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Test that existing code continues to work
  • Test that the proposed solution can enable new non-conformity scores

Checklist

  • I have read the contributing guidelines
  • I have updated the HISTORY.rst and AUTHORS.rst files
  • Linting passes successfully: make lint
  • Typing passes successfully: make type-check
  • Unit tests pass successfully: make tests
  • Coverage is 100%: make coverage
  • Documentation builds successfully: make doc

@thibaultcordier thibaultcordier added enhancement New feature or request developers Proposed by developers. labels Jul 1, 2024
@thibaultcordier thibaultcordier self-assigned this Jul 1, 2024
@thibaultcordier thibaultcordier linked an issue Jul 1, 2024 that may be closed by this pull request
@thibaultcordier thibaultcordier changed the title UPD: externalize MapieClassifier methods into several non-conformity scores ENH: externalize MapieClassifier methods into several non-conformity scores Jul 1, 2024
@thibaultcordier thibaultcordier marked this pull request as ready for review July 3, 2024 16:20
@vincentblot28 (Collaborator) commented:
What a tour de force, @thibaultcordier! A great improvement to the classification API!
General comment about the example: since in the future we prefer to use the conformity scores directly (and the "method" argument might be removed), I think the examples should follow the new way of using MapieClassifier.

from sklearn.utils import check_random_state

from mapie.conformity_scores.sets.naive import Naive
from mapie.conformity_scores.sets.utils import get_true_label_cumsum_proba
Collaborator:
Can we make get_true_label_cumsum_proba a class method? It is only used for APS and RAPS, so if it is a class method of APS, RAPS (which inherits from APS) will also have it.

)

# get the V parameter from Romano+(2020) or Angelopoulos+(2020)
vs = self._compute_vs_parameter(
Collaborator:
There is no need for vs in the naive method.

Collaborator Author:

vs is needed when include_last_label == "randomized". Do we have a condition that prohibits combining method='naive' with include_last_label='randomized'?

Collaborator:

Yes, we should. Actually, the naive method should always have include_last_label=True.

Collaborator Author:

This condition is not currently checked; is that correct? Should it therefore be addressed in another PR?

Collaborator:

I think we can add it in this PR; it will make the code easier to read.
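A guard like the one discussed could look roughly like this (a minimal sketch with assumed names; the real check would live in MAPIE's parameter-validation code):

```python
def check_include_last_label(include_last_label, method):
    # Hypothetical guard: the naive method always includes the last
    # label, so any other option makes no sense with it.
    if method == "naive" and include_last_label is not True:
        raise ValueError(
            "method='naive' requires include_last_label=True, "
            f"got {include_last_label!r}."
        )

check_include_last_label(True, "naive")  # passes silently
```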

Collaborator:

@LacombeLouis do you agree ?

Collaborator:

To what extent is it linked to this issue #456 ?

Collaborator:

it's not linked

conformity_scores: NDArray,
alpha_np: NDArray,
estimator: EnsembleClassifier,
agg_scores: Optional[str] = "mean",
Collaborator:
Can you move agg_scores and include_last_label into the kwargs, since neither of them is used by the naive conformity score?
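The suggestion amounts to letting the options the naive score does not use fall into **kwargs, roughly like this (class and method names here are assumed for illustration, not the actual MAPIE signatures):

```python
class NaiveSketch:
    """Toy stand-in for the naive conformity score (names assumed)."""

    def get_sets(self, conformity_scores, alpha_np, estimator=None, **kwargs):
        # Options such as agg_scores or include_last_label, if passed,
        # land in kwargs and are simply ignored by the naive score.
        unused = {k: kwargs[k]
                  for k in ("agg_scores", "include_last_label")
                  if k in kwargs}
        return {"n_scores": len(conformity_scores), "ignored": sorted(unused)}

sets = NaiveSketch().get_sets(
    [0.1, 0.2], [0.05], agg_scores="mean", include_last_label=True
)
```

This keeps the naive signature minimal while staying call-compatible with the other scores.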

@LacombeLouis (Collaborator) left a comment:

Hey @thibaultcordier,
Thank you for this PR. Extremely well thought out and very detailed. A few small questions here and there. Looking forward to having this PR merged!

mapie/_machine_precision.py (resolved)
mapie/conformity_scores/bounds/residuals.py (resolved)
mapie/conformity_scores/bounds/residuals.py (resolved)
The conformity score is symmetrical.
This is appropriate when the confidence interval is symmetrical and
its range is approximately the same over the range of predicted values.
Collaborator:

Could we point to a source for this method?

Collaborator Author:

This is not the point of this PR.

Collaborator:

Could we create an issue then?

mapie/conformity_scores/bounds/gamma.py (resolved)
y_pred, y_enc.reshape(-1, 1), axis=1
)
random_state = check_random_state(self.random_state)
random_state = cast(np.random.RandomState, random_state)
Collaborator:

Doesn't check_random_state already do this?

Collaborator Author:

It also handles the default parameters (if None or an integer).
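For reference, sklearn.utils.check_random_state normalizes None, an int, or an existing RandomState into a RandomState instance; the extra cast in the diff is presumably only there to satisfy static typing:

```python
import numpy as np
from sklearn.utils import check_random_state

# None -> the global RandomState; int -> a fresh seeded RandomState;
# an existing RandomState passes through unchanged.
assert isinstance(check_random_state(None), np.random.RandomState)
assert isinstance(check_random_state(42), np.random.RandomState)
rs = np.random.RandomState(0)
assert check_random_state(rs) is rs

# Seeded generators are reproducible.
u1 = check_random_state(42).uniform(size=3)
u2 = check_random_state(42).uniform(size=3)
assert np.allclose(u1, u2)
```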

random_state = check_random_state(self.random_state)
random_state = cast(np.random.RandomState, random_state)
u = random_state.uniform(size=len(y_pred)).reshape(-1, 1)
conformity_scores -= u * y_proba_true
Collaborator:

I would rather we use numpy for these multiplications.

Collaborator Author:

It was like that before; I didn't touch it during the refactoring.
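For context, the snippet under discussion implements the randomized tie-breaking used by APS-style methods: a uniform draw u removes a random fraction of the true label's probability from the cumulative score. A standalone numpy sketch (toy data, variable names assumed to match the diff):

```python
import numpy as np

rng = np.random.RandomState(0)

# Cumulative probability mass up to and including the true label,
# and the probability of the true label itself (toy values).
conformity_scores = np.array([[0.90], [0.55]])
y_proba_true = np.array([[0.60], [0.30]])

# Randomized correction: subtract u * P(true label), u ~ Uniform(0, 1).
u = rng.uniform(size=len(conformity_scores)).reshape(-1, 1)
randomized_scores = conformity_scores - u * y_proba_true

# The score stays between "exclude the true label entirely"
# and "include it fully".
assert np.all(randomized_scores <= conformity_scores)
assert np.all(randomized_scores >= conformity_scores - y_proba_true)
```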

if estimator.cv == "prefit" or agg_scores in ["mean"]:
quantiles_ = compute_quantiles(conformity_scores, alpha_np)
else:
quantiles_ = (n + 1) * (1 - alpha_np)
Collaborator:

Why don't we use compute_quantiles here?

Collaborator Author:

It was like that before; I didn't touch it during the refactoring.
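For reference, the finite-sample conformal quantile that a helper like compute_quantiles typically computes can be sketched with numpy (a generic sketch of the standard formula, not MAPIE's exact implementation):

```python
import numpy as np

def conformal_quantile(scores, alpha):
    # Finite-sample conformal quantile: the ceil((n+1)(1-alpha))-th
    # smallest calibration score, with the level capped at 1.
    n = len(scores)
    level = min((n + 1) * (1 - alpha) / n, 1.0)
    return np.quantile(scores, level, method="higher")

scores = np.arange(1, 101) / 100  # 100 calibration scores: 0.01 .. 1.00
q = conformal_quantile(scores, alpha=0.1)  # 91st smallest score
```

The `(n + 1) * (1 - alpha_np)` expression in the diff is the uncapped rank form of the same quantity, used when aggregating cross-validated scores.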

@@ -1606,28 +1612,16 @@ def test_method_error_in_fit(monkeypatch: Any, method: str) -> None:
mapie_clf.fit(X_toy, y_toy)


@pytest.mark.parametrize("method", WRONG_METHODS)
Collaborator:

Why do we no longer perform this test?

Collaborator Author:

This test checked that an error was raised when the user changed the method value between fit and predict.

But now we manage non-conformity scores, and with the logic of the refactoring this test no longer makes sense, because the non-conformity score can no longer be modified in that way. Also:

  1. We do not perform this test for MapieRegressor.
  2. We do not perform this test when the estimator changes between fit and predict.



cs_list = [None, LAC(), APS(), TopK()]
method_list = [None, 'naive', 'aps', 'raps', 'lac', 'top_k']
Collaborator:

Should we add wrong methods here? (and a test to go with it)
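A check over invalid values could look like this (a sketch with assumed names and invalid examples; the real test would use pytest.mark.parametrize like the rest of the test file):

```python
VALID_METHODS = [None, "naive", "aps", "raps", "lac", "top_k"]
WRONG_METHODS_SKETCH = ["cumulated", "", 1, "weighted"]  # hypothetical invalid values

def check_method(method):
    # Hypothetical validator: reject anything outside the known methods.
    if method not in VALID_METHODS:
        raise ValueError(f"Invalid method: {method!r}")

for wrong in WRONG_METHODS_SKETCH:
    try:
        check_method(wrong)
        raise AssertionError(f"{wrong!r} should have been rejected")
    except ValueError:
        pass
```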

Labels: developers (Proposed by developers.), enhancement (New feature or request)
Successfully merging this pull request may close these issues: Extending the NCS class to the classification task
3 participants