Not recognizing project as a dataset project if add datapackage after initial project create #1196

Open
rufuspollock opened this issue Jun 20, 2024 · 2 comments

@rufuspollock
Member

rufuspollock commented Jun 20, 2024

Bug description

Warning

🚩 2024-06-20 I've updated the repo names for the broken and working projects so the working project is at the nice URL. The broken one is now at https://datahub.io/@rufuspollock/jaan-tallinn-donations-broken. I've updated the text below.

I've just published https://datahub.io/@rufuspollock/jaan-tallinn-donations-broken from https://github.com/rufuspollock/jaan-tallinn-donations. It should show up as a dataset project, given I have a datapackage.yaml, but that is not happening. Here's the result.

[screenshot]

I've now created the site again from scratch and it works 🎉 https://datahub.io/@rufuspollock/jaan-tallinn-donations

[screenshot]

Debugging

OK, so I think the source of the problem is the steps by which I created the repo. They were roughly:

  • Create README with frontmatter - rufuspollock/jaan-tallinn-donations@d78ccfe
  • Create site on DataHub Cloud ❌ did not work
    • ❓ Why not? Shouldn't having frontmatter "just work" for marking a project as a dataset?
  • Move frontmatter out to datapackage.yaml
  • Try publishing again (auto-publishing in fact) ❌ still not working

So my guess is that when it was first published with just a README and frontmatter, it wasn't "seen" as a dataset project. Then, even after the datapackage.yaml was added, the project's "type" in the database didn't change (which is a 🐛).

Thoughts

  • We can add a nice new unit test for this in our ingest/processing code.
  • It would be great to have a simple, explicit way to designate the type of a project, e.g. a type frontmatter field in the main README. We could still keep the magic inference too.
    • type may be a bit too generic (it gets used for everything); maybe projectType or datahubType. That said, type is simple and memorable!
  • ❓ Completely remove the "magic" and require explicit type setting?
    • The issue with this is that it would not be compatible (including backwards compatible) with simple Frictionless datasets.
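A minimal sketch of how an explicit field could coexist with the existing inference, assuming the type is recomputed from the repo's current files on every sync rather than fixed when the project is first created. The field name projectType, the fallback value "default", and the function resolve_project_type are all hypothetical placeholders (the thread leaves the field name open); this is not DataHub Cloud's actual code.

```python
from typing import Any, Optional

# Hypothetical: file names that mark a Frictionless-style dataset project.
DATAPACKAGE_NAMES = ("datapackage.json", "datapackage.yaml", "datapackage.yml")


def resolve_project_type(frontmatter: dict[str, Any], repo_files: set[str]) -> str:
    """Decide a project's type from its current state (hypothetical sketch).

    1. An explicit projectType field in the README frontmatter wins.
    2. Otherwise fall back to inference: a root-level datapackage file
       means a dataset project, so plain Frictionless datasets keep working.
    """
    explicit: Optional[str] = frontmatter.get("projectType")
    if explicit:
        return explicit
    if any(name in repo_files for name in DATAPACKAGE_NAMES):
        return "dataset"
    # "default" is a placeholder for whatever non-dataset type the platform uses.
    return "default"


if __name__ == "__main__":
    # First publish: README only, so not a dataset.
    print(resolve_project_type({}, {"README.md"}))  # default
    # datapackage.yaml added later: recomputing gives the new type.
    print(resolve_project_type({}, {"README.md", "datapackage.yaml"}))  # dataset
```

Because the result depends only on the current frontmatter and files, the README-first, datapackage-later sequence from the debugging steps becomes a straightforward unit test case.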
@Daniellappv Daniellappv changed the title Not recognizing project as a dataset project if add datapackage after initial project create 🐛 Not recognizing project as a dataset project if add datapackage after initial project create Jun 24, 2024
@olayway
Member

olayway commented Jun 28, 2024

This is happening because no changes were made to the corresponding README.md, so the sync workflow skipped the project and its datapackage file wasn't processed. The same thing happens if you change a datapackage that was previously parsed correctly without also changing README.md (i.e. the datapackage changes won't go live). It's a bug, of course.
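The mechanism above can be illustrated with a minimal sketch of change detection that treats datapackage files as triggers alongside README.md. It assumes the sync step receives the list of file paths changed in a push and that a project is identified by the directory holding its README.md or datapackage file; the function name projects_to_resync and the trigger-file set are hypothetical, not the actual sync workflow's code.

```python
from pathlib import PurePosixPath

# Hypothetical set of files whose change should trigger reprocessing.
# The behaviour described above effectively only reacted to README.md.
TRIGGER_FILES = {
    "readme.md",
    "datapackage.json",
    "datapackage.yaml",
    "datapackage.yml",
}


def projects_to_resync(changed_paths: list[str]) -> set[str]:
    """Return the project directories that need reprocessing.

    "" denotes the repository root.
    """
    dirs: set[str] = set()
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.name.lower() in TRIGGER_FILES:
            parent = str(path.parent)
            # PurePosixPath("x").parent is ".", which we map to the root "".
            dirs.add("" if parent == "." else parent)
    return dirs


if __name__ == "__main__":
    # Only datapackage.yaml changed: the root project must still be re-synced.
    print(projects_to_resync(["datapackage.yaml"]))  # {''}
```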

@rufuspollock
Member Author

rufuspollock commented Jun 29, 2024

@olayway great analysis.

IMO it's low priority as it's quite a rare bug, so I'm changing the status.

@olayway olayway changed the title 🐛 Not recognizing project as a dataset project if add datapackage after initial project create Not recognizing project as a dataset project if add datapackage after initial project create Jul 2, 2024
Projects
Status: 💤 Someday
Development

No branches or pull requests

2 participants