
Upgrade Pydantic to v2 #31

Open · kabirkhan wants to merge 51 commits into main

Conversation

@kabirkhan commented Jun 12, 2023

Overview

Adds support for using Pydantic v1 or v2.

All relevant Pydantic functions are extracted into utils that support both the Pydantic v1 and v2 APIs.
CI tests run with both Pydantic v1 and v2, but type checking targets Pydantic v2.

Besides the dependency upgrade, there is one significant functionality change, which fixes a bug where Pydantic models currently cannot be resolved from a registered function. More details in this comment: https://github.com/explosion/confection/pull/31/files#r1248253486

result_dict[k] = v
# Pass through objects resolved from registered functions; dump other BaseModel values.
if isinstance(v, BaseModel) and k not in resolved_object_keys:
    result_dict[k] = v.model_dump()
validation.update(result_dict)
@kabirkhan (Author) commented Jun 30, 2023
This change might seem a bit odd, but it came up because Pydantic v2 treats Pydantic dataclasses and BaseModels the same way during serialization.

In Pydantic v1, calling model.dict() on a model containing a dataclass instance would not serialize that dataclass; it would just return the dataclass instance as-is.

That is what allowed the Optimizer example in our README to work and resolve that dataclass instance.

E.g. with Pydantic v1 this still worked:

import dataclasses
from typing import Union, Iterable
import catalogue
from confection import registry, Config

# Create a new registry.
registry.optimizers = catalogue.create("confection", "optimizers", entry_points=False)


# Define a dummy optimizer class.
@dataclasses.dataclass
class MyCoolOptimizer:
    learn_rate: float
    gamma: float


@registry.optimizers.register("my_cool_optimizer.v1")
def make_my_optimizer(learn_rate: Union[float, Iterable[float]], gamma: float):
    return MyCoolOptimizer(learn_rate, gamma)


# Load the config file from disk, resolve it and fetch the instantiated optimizer object.
config = Config().from_disk("./config.cfg")
resolved = registry.resolve(config)
optimizer = resolved["optimizer"]  # MyCoolOptimizer(learn_rate=0.001, gamma=1e-08)
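For reference, the example assumes a config.cfg on disk roughly like the following (reconstructed from the values in the comment above; not part of this diff):

[optimizer]
@optimizers = "my_cool_optimizer.v1"
learn_rate = 0.001
gamma = 1e-08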

However, when swapped to Pydantic v2, the final line of that example was resolving the optimizer to a dict:

...
optimizer = resolved["optimizer"]  # {"learn_rate": 0.001, "gamma": 1e-08}

This has actually always been a bug: previously, we could not make MyCoolOptimizer a Pydantic model; it had to be a dataclass (or any other class that wasn't a Pydantic model).

With the change above, we can make MyCoolOptimizer a Pydantic model and get a Pydantic model instance back.
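As a rough sketch of what that enables (a hypothetical variant, not part of the diff; the class name and the registered name my_cool_optimizer.v2 are made up, and it reuses registry.optimizers from the example above):

from pydantic import BaseModel


class MyCoolPydanticOptimizer(BaseModel):
    learn_rate: float
    gamma: float


@registry.optimizers.register("my_cool_optimizer.v2")
def make_my_pydantic_optimizer(learn_rate: float, gamma: float):
    return MyCoolPydanticOptimizer(learn_rate=learn_rate, gamma=gamma)


# With resolved objects passed through, resolving a config that uses
# my_cool_optimizer.v2 yields the model instance itself, e.g.
# MyCoolPydanticOptimizer(learn_rate=0.001, gamma=1e-08), rather than a dict.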

@@ -580,32 +580,32 @@ def test_schedule():

     cfg = {"@schedules": "test_schedule.v2"}
     result = my_registry.resolve({"test": cfg})["test"]
-    assert isinstance(result, GeneratorType)
+    assert isinstance(result, Iterator)
@kabirkhan (Author)
The other somewhat controversial change is the removal of checks for the Generator type. Pydantic supports validating Python generators via the Iterator and Iterable types, but not via GeneratorType. That is also how generators are supposed to be typed in Python, so I think this only needs a documentation update, but I wanted to call it out.
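For illustration only, a hypothetical sketch (the registered names are made up, and it assumes the test registry defines schedules and optimizers namespaces): generator-returning functions and the parameters that consume them should be typed with Iterator or Iterable, which Pydantic can validate, rather than GeneratorType.

from typing import Iterator


@my_registry.schedules.register("constant_schedule.v1")
def constant_schedule(value: float = 1.0) -> Iterator[float]:
    while True:
        yield value


@my_registry.optimizers.register("scheduled_optimizer.v1")
def make_scheduled_optimizer(learn_rate: Iterator[float]):
    # Pydantic validates learn_rate against Iterator, so a generator passes.
    return {"learn_rate": next(learn_rate)}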

@kabirkhan marked this pull request as ready for review on Jun 30, 2023
@kabirkhan changed the title from "[wip] Upgrade Pydantic to v2" to "Upgrade Pydantic to v2" on Jun 30, 2023
@@ -946,6 +1024,8 @@ def _update_from_parsed(
         # Check numpy first, just in case. Use stringified type so that numpy dependency can be ditched.
         elif str(type(value)) == "<class 'numpy.ndarray'>":
             final[key] = value
+        elif isinstance(value, BaseModel) and isinstance(final[key], BaseModel):
@kabirkhan (Author)

This case can occur as a result of the change above that passes resolved Pydantic models through.

@adrianeboyd (Contributor) left a comment

Thanks for working on this; we have many users asking about it!

Just to start out with some comments about the CI and mypy...

Comment on lines 18 to 48

- os: windows-2019
  python_version: "3.6"
- os: ubuntu-20.04
  python_version: "3.6"
@adrianeboyd (Contributor)

(And please don't just remove python 3.6 from the CI like this!)

@svlandeg (Member) left a comment

Nice work Kabir, thanks!

@adrianeboyd (Contributor)

I had this installed in the background for testing and have run into a few problems with pydantic v1.10 (since you can't use v2 with thinc/spacy right now):

$ spacy init config -p morphologizer /tmp/morphologizer.cfg
ℹ Generated config template specific for your use case
- Language: en
- Pipeline: morphologizer
- Optimize for: efficiency
- Hardware: CPU
- Transformer: None
✘ Config validation error
nlp -> tokenizer	Promise(registry='tokenizers', name='spacy.Tokenizer.v1', args=[], kwargs={}) is not callable
{'lang': 'en', 'pipeline': ['tok2vec', 'morphologizer'], 'batch_size': 1000, 'disabled': [], 'before_creation': None, 'after_creation': None, 'after_pipeline_creation': None, 'tokenizer': {'@tokenizers': 'spacy.Tokenizer.v1'}}

Or with an existing config including a morphologizer with an internal tok2vec:

[components.morphologizer.model.tok2vec]
@architectures = "spacy.Tok2Vec.v2"

[components.morphologizer.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.morphologizer.model.tok2vec.encode.width}
attrs = ["ORTH", "SHAPE"]
rows = [5000, 2500]
include_static_vectors = true
Config validation error
cfg.model.tok2vec -> embed	instance of Model expected
cfg.model.tok2vec -> encode	instance of Model expected
{'@architectures': 'spacy.Tok2Vec.v2', 'embed': {'@architectures': 'spacy.MultiHashEmbed.v2', 'width': 256, 'attrs': ['ORTH', 'SHAPE'], 'rows': [5000, 2500], 'include_static_vectors': True}, 'encode': {'@architectures': 'spacy.MaxoutWindowEncoder.v2', 'width': 256, 'depth': 8, 'window_size': 1, 'maxout_pieces': 3}}

@adrianeboyd (Contributor)

I don't currently understand what the differences are between the unit tests with catalogue (which pass) and the usage with thinc/spacy and pydantic v1.10 that fails above.

Let me temporarily add a CI test to make sure it's not something on my end.

adrianeboyd and others added 5 commits August 3, 2023 17:01
* use old implementation of modelfield copy

* ignore type error

* Update __init__.py

---------

Co-authored-by: Adriane Boyd <[email protected]>
@adrianeboyd (Contributor)

I'm trying to think if there's a better way to define/manage the PYDANTIC_V2 setting so that it doesn't lead to so many typing errors...
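For context, one common pattern is a small compat shim that defines the flag and centralizes the version-dependent imports in one place (a sketch only, not necessarily what this PR does; the module layout and imported names are assumptions):

import pydantic

# Detect the installed major version once; everything else branches on this flag.
PYDANTIC_V2 = pydantic.VERSION.startswith("2.")

if PYDANTIC_V2:
    # Pydantic v2 ships the legacy v1 API under pydantic.v1.
    from pydantic.v1 import BaseModel, ValidationError, create_model
else:
    from pydantic import BaseModel, ValidationError, create_model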

Comment on lines 121 to 125

      - name: Test with spacy
        run: |
          python -m pip install spacy
          python -m spacy init config -p tagger tagger.cfg
@adrianeboyd (Contributor)

Just a comment that this is only intended to be a temporary addition to the CI. confection needs a unit test that catches this issue.
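A hypothetical sketch of the kind of unit test meant here (all class names, namespaces, and registered functions below are made up, and it assumes confection accepts arbitrary class annotations for promise arguments, as the spaCy configs above rely on): resolve a nested config in which an inner promise must validate against a custom class annotation, mirroring the tokenizer failure.

import catalogue
from confection import registry

registry.tokenizers = catalogue.create("confection_test", "tokenizers", entry_points=False)
registry.languages = catalogue.create("confection_test", "languages", entry_points=False)


class DummyTokenizer:
    pass


@registry.tokenizers.register("dummy_tokenizer.v1")
def make_dummy_tokenizer() -> DummyTokenizer:
    return DummyTokenizer()


@registry.languages.register("dummy_language.v1")
def make_dummy_language(tokenizer: DummyTokenizer) -> DummyTokenizer:
    # The nested promise must already be resolved to an instance here.
    assert isinstance(tokenizer, DummyTokenizer)
    return tokenizer


def test_nested_promise_resolves_to_instance():
    cfg = {
        "lang": {
            "@languages": "dummy_language.v1",
            "tokenizer": {"@tokenizers": "dummy_tokenizer.v1"},
        }
    }
    resolved = registry.resolve(cfg)
    assert isinstance(resolved["lang"], DummyTokenizer)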

@svlandeg added the enhancement (New feature or request) label on Jan 25, 2024