Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-str values in a Rodeo #32

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

CraftSpider
Copy link

Adds a new public trait, Internable, which is by default implemented for str, OsStr, Path, and [u8]. This is a noticeably breaking change, due to the addition of a V generic on most things, as well as regressions for a couple type-inference locations.

I can add in #26 now if desired, as this seems a good time to make that change, if breaking inference anyways.

fixes #23

impl<'a, K, V: ?Sized> iter::FusedIterator for Strings<'a, K, V> {}

/// Trait for types that can be interned within a `Rodeo`
pub trait Internable: Hash + Eq + AsRef<Self> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally public, though should maybe be an unsafe trait, as implementations are expected to round-trip correctly.

}

fn as_bytes(&self) -> &[u8] {
self.to_str()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust intentionally doesn't allow you to get an OsStr as bytes cross-platform

S: BuildHasher + Clone + Default,
{
#[cfg_attr(feature = "inline-more", inline)]
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
let deser_map: HashMap<String, K> = HashMap::deserialize(deserializer)?;
let deser_map: HashMap<K, Vec<u8>> = HashMap::deserialize(deserializer)?;
Copy link
Author

@CraftSpider CraftSpider May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapped because JSON requires hashmap keys be strings, and that would require escaping/unescaping complexity

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather let the deserializer manage things themselves, there's a lot more formats than json

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, though also, at least with this ordering it's less likely to break a lot of use cases. The tests use JSON, so I figured keeping it working would be considered more important than simply swapping the position here. Would you rather I swapped it back anyways?

@Kixiron
Copy link
Owner

Kixiron commented May 25, 2022

I did some work on this in my internable branch and I've got a few issues:

  • We can't assume that all internable types play nicely as byte slices. I attempted to implement a windows OsStr Internable implementation but was quickly beaten down by the lack of an allocation-less api for recovering an OsStr from anything, OsString::from_wide() requires allocating which totally defeats the purpose of anything we do. OsStr::encode_wide() is less than ideal but it's totally doable, we just have to align our byte slices to 2 instead of 1, but it's pointless when any conversion back from our stored wide strings to an OsStr requires allocation. rust/#95290 could hypothetically help us, but I honestly don't have a lot of hope it's getting implemented any time soon. I'm pretty sure that the issue with OsStr on windows was the reason I initially gave up on generalizing interned types and the situation unfortunately hasn't changed from then.
  • Serialization & deserialization with arbitrary types is really complicated once you give up the assumption that every internable type must be a u8 slice
  • Placing the value type as the second type parameter is hugely breaking and breaks in really confusing ways since currently the second type parameter is the hasher. It's fairly unavoidable that this PR breaks all previous usages, but I don't like how incredibly confusing it is when that happens

In my branch I pretty much just give up on the first two problems, I use OsStr::to_str().unwrap() and only implement Serialize/Deserialize for Rodeo<K, str, S> but I'm not a fan of either "solution"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request - Intern Byte Arrays
2 participants