Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR - content hashing caching strategy for business rules. #650

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

stuaxo
Copy link
Contributor

@stuaxo stuaxo commented Aug 11, 2022

When a workbasket is published it may clash with any of the unpublished workbaskets - however there are a class of business rules whose results should still be valid if the content of the model they reference hasn't changed.

This ADR proposes a content checksum mechanism for models, that checksum can be saved and used to check if a model has changed.

The content checksum only covers user editable fields.

@stuaxo stuaxo force-pushed the adr-trackedmodel-content-checksums-for-business-rules branch from 7fc822e to 9a5d21b Compare August 11, 2022 11:10

After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future.

This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm uncertain whether there are fields that can be mutated but are not user editable. If there is a difference, and editable fields are a subset of mutable fields, then would mutable be a preferable term here?
Tiny typo "... check if to versions".

Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.


The hash generated in this ADR will not currently be attached to TrackedModel - thought it may be desirable in future.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attached == cached?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saved to it's own field I guess.

The fields to be hashed are contained in `copyable_fields` but since this doesn't communicate the intent used here, this will be proxied as `mutable_fields`, which is explicitly about providing which fields can be hashed to check equivilence.


TrackedModel will gain an API named get_content_hash() that returns a sha256 hash.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it only matters that it's a sufficiently unique hash.


This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change.

When a workbasket is published, the validity of all other unpublished workbaskets must be verified - with this system, most of the business rule checks would remain valid and the only the checksums need to be verified.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... only the checksums need to be verified.

May be clearer to say "... only the content hash needs to be verified."?

Context
-------

After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about what can invalidate a check after it has completed. We know that after a check has been run, then newly published tariff data can cause some types of check to become invalid - an ME32 check against a Measure, for instance. So hashing doesn't help avoid the need to run a check again in those circumstances.
It would be worth pointing out that caveat here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is really important - I've updated the PR description to cover this, and will update the ADR to reflect this (date validity checkers will be their own adr)

@stuaxo stuaxo changed the title ADR on adding content checksums to TrackedModels. ADR - content hashing caching strategy for business rules. Aug 12, 2022
Copy link
Contributor

@gabelton gabelton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool stuff! I would love an example involving a current business rule, if at all possible, just so that I can visualise it / walk it through in my head, but looks great apart from that. Please excuse the pedantic spelling comments...


TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point.

Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to give an example where two different types produce the same hash? It's probably trivial, but I have Monday brain

The fields to be hashed are contained in `copyable_fields` but since this doesn't communicate the intent used here, this will be proxied as `mutable_fields`, which is explicitly about providing which fields can be hashed to check equivilence.


TrackedModel will gain an API named get_content_hash() that returns a sha256 hash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to be more specfic than "API"? Would it be a class_method? As I read it currently, it sounds a bit like the hash will be attached to the tracked model, which contradicts what's written on L28. I know that's not the intent, but just be good to clarify

Copy link
Contributor Author

@stuaxo stuaxo Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I guess I meant a method - e.g. .content_hash(), vs something we save forever in a field.

Consequences
------------

This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will part of this work involve creating a list of rules to which this kind of caching will be applicable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though it doesn't need to be exhaustive - as it can be extended if needed later.


After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future.

This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> mechanism
--> equivalent


This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent.

In the simplest sense this means hashing fields that exclude the PK.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's meant by "fields that exclude the PK" ?


TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point.

Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> necessary


TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point.

Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth adding a link to docs on python's default hashing?

Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.


The hash generated in this ADR will not currently be attached to TrackedModel - thought it may be desirable in future.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> though

Consequences
------------

This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> becoming

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants