Asset Versioning with partitioned assets does not detect out-of-sync assets after incrementing code_version #22704
Comments
I can confirm that everything behaves as expected in a non-partitioned job. I don't see this limitation mentioned anywhere in the docs, however.
Could be related: #22553
Out-of-sync partitions have been a huge issue for me. Not showing downstream partitions as being out of sync has been deeply impactful (in a bad way). This happens not just when incrementing the version, but also when updating parent partitions.
Is it just the UI that's broken, or is there a fundamental problem in the backend?
This query returns the correct information, indicating that Dagster does know which assets are stale for each partition:

```graphql
query AssetsByGroup($groupName: String!) {
  assetNodes(group: {
    groupName: $groupName,
    repositoryName: "__repository__",
    repositoryLocationName: "your_pkg.defs"
  }) {
    id
    assetKey {
      path
    }
    staleStatusByPartition(partitions: [
      "first",
      "second"
    ])
  }
}
```

What I am unsure about is how to launch a materialization for many partitions and have each run include only the un-synced assets for that partition.
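For reference, here is a minimal sketch of running that query from Python against the webserver's GraphQL HTTP endpoint. The URL, group name, and repository location name are assumptions for illustration, not values from this issue:

```python
# Sketch: query staleStatusByPartition over Dagster's GraphQL HTTP endpoint.
# The webserver URL and the repository/location names below are placeholders
# for your own deployment.
import json
from urllib.request import Request, urlopen

STALE_STATUS_QUERY = """
query AssetsByGroup($groupName: String!) {
  assetNodes(group: {
    groupName: $groupName,
    repositoryName: "__repository__",
    repositoryLocationName: "your_pkg.defs"
  }) {
    id
    assetKey { path }
    staleStatusByPartition(partitions: ["first", "second"])
  }
}
"""

def build_payload(group_name: str) -> dict:
    """Build the JSON body for a GraphQL POST request."""
    return {"query": STALE_STATUS_QUERY, "variables": {"groupName": group_name}}

def fetch_stale_status(webserver_url: str, group_name: str) -> dict:
    """POST the query to the webserver's /graphql endpoint and parse the reply."""
    req = Request(
        webserver_url.rstrip("/") + "/graphql",
        data=json.dumps(build_payload(group_name)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)
```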
Looks like the GraphQL schema is built with a strong coupling to time-based partitions (which does not align with my system):

```graphql
input PartitionsByAssetSelector {
  assetKey: AssetKeyInput!
  partitions: PartitionsSelector
}

input PartitionsSelector {
  range: PartitionRangeSelector!
}
```

When launching a backfill, you can't specify a list of partitions per asset; you can only specify a range. I am seeing this over-fitting to time-based partitions a lot in Dagster's design.
Oh actually, it looks like I can. Here's a prototype that materializes stale partitions of assets in a group: https://gist.github.com/sam-goodwin/d8dd76ad58a241cdb14deba9cb53c2bf
Note: it makes the assumption that the partitioning scheme of each asset in a group is the same (this may not be true for you).
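The core of such a prototype can be sketched as a pure-Python step that inverts per-asset stale statuses into a per-partition asset selection, so each launched run covers only the stale assets for that partition. The input shape below is hypothetical, mirroring the GraphQL response above; the linked gist may differ:

```python
# Sketch: given staleStatusByPartition results per asset, build one asset
# selection per partition so each run materializes only the stale assets.
# The input shape (asset key -> {partition key -> status}) is hypothetical.

def stale_assets_by_partition(
    statuses: dict[str, dict[str, str]],
) -> dict[str, list[str]]:
    """Map partition key -> list of asset keys whose status is STALE."""
    selection: dict[str, list[str]] = {}
    for asset_key, by_partition in statuses.items():
        for partition, status in by_partition.items():
            if status == "STALE":
                selection.setdefault(partition, []).append(asset_key)
    return selection
```

Each entry of the returned dict can then drive one run request per partition key.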
Just discovered that the following GraphQL query is extremely slow and can't be executed in parallel, because it will crash Dagster's SQL database:

```graphql
query AssetStaleStatus(
  $groupName: String!,
  $assetKey: AssetKeyInput!,
  $partitionKeys: [String!]!,
  $repositoryLocation: String!
) {
  assetNodes(group: {
    groupName: $groupName,
    repositoryName: "__repository__",
    repositoryLocationName: $repositoryLocation
  }, assetKeys: [$assetKey]) {
    id
    staleStatusByPartition(partitions: $partitionKeys)
  }
}
```
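As a workaround, issuing these queries sequentially in small partition-key chunks, rather than in parallel, keeps the load on the database bounded. A minimal chunking helper, assuming a `fetch` callable like the one sketched earlier (names here are illustrative, not a Dagster API):

```python
# Sketch: throttle staleStatusByPartition lookups by querying sequentially
# in small partition-key chunks instead of firing all queries in parallel.
import time
from typing import Callable, Iterator

def chunked(keys: list[str], size: int) -> Iterator[list[str]]:
    """Yield successive chunks of at most `size` partition keys."""
    for i in range(0, len(keys), size):
        yield keys[i : i + size]

def fetch_all(
    fetch: Callable[[list[str]], dict],
    partition_keys: list[str],
    chunk_size: int = 25,
    pause_s: float = 0.0,
) -> dict:
    """Run `fetch` once per chunk, sequentially, merging the results."""
    merged: dict = {}
    for chunk in chunked(partition_keys, chunk_size):
        merged.update(fetch(chunk))
        if pause_s:
            time.sleep(pause_s)  # optional backoff between queries
    return merged
```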
Dagster version
dagster, version 1.7.10
What's the issue?
Asset versioning doesn't properly detect when a partitioned asset is out of sync after incrementing its version or an upstream asset's version. It always says "everything is up to date".
If I remove the `partitions_def`, everything works as expected. The problem seems to be isolated to partitioned assets.
What did you expect to happen?
Every time I increment `code_version` in the `second` asset, I expect it to be displayed as un-synced (it is not). I expected clicking "Materialize un-synced" to provide me a list with the `second` asset and not the `first` asset, but it tells me all assets are up to date.
How to reproduce?
This is the simple repro I am using to test partitions + asset versioning.
Deployment type
None
Deployment details
I am just running `dagster dev` with the following config:
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.