Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Datahike implements clojure.lang.Counted in linear time #534

Open
phronmophobic opened this issue May 7, 2022 · 3 comments
Open
Labels
bug Something isn't working triage

Comments

@phronmophobic
Copy link
Contributor

phronmophobic commented May 7, 2022

What version of Datahike are you using?

0.5.1504

What version of Java are you using?

openjdk version "17" 2021-09-14 OpenJDK Runtime Environment Temurin-17+35 (build 17+35) OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode)

What operating system are you using?

MacOS 12.3.1

What database EDN configuration are you using?

{:store {:backend :mem}}
and
{:store {:backend :file}}

Describe the bug

The Datahike Db implements clojure.lang.Counted in linear time. However, "A class that implements Counted promises that it is a collection that implements a constant-time count".

I use a generic data inspector that relies on counted? to display data summaries in constant time and the db's implementation of counted makes the inspector unresponsive.

> (counted? mydb)
true
> (time (count mydb))
"Elapsed time: 2927.155375 msecs"
439010

What is the expected behaviour?

I would expect the datahike db to either implement clojure.lang.Counted in constant time or not implement that interface. The count function will still work even if clojure.lang.Counted is unimplemented. count will fall back to calling seq and counting the collection in linear time.

How can the behaviour be reproduced?

It's noticeable with any decently sized db using the following code.

> (counted? mydb)
true
> (time (count mydb))
"Elapsed time: 2927.155375 msecs"
439010
@phronmophobic phronmophobic added bug Something isn't working triage labels May 7, 2022
@jsmassa
Copy link
Collaborator

jsmassa commented May 19, 2022

You are correct, this requirement isn't fulfilled anymore in all cases. Sorry that it messes up your program now. I guess, we really should remove this implementation as long as we are allowing index structure without a Counted implementation.

The count function of a Datahike database directly translates to the count function of the index structure being used. Originally that used to be the persistent sorted set which implements Counted in constant time. I guess, that is why the interface is implemented in the first place. However, for the new index structure, the hitchhiker-tree, an implementation in constant time isn't possible due to its nature of having hanging operations that are only realized when the respective parts are touched again - for example when iterating over its elements. The peculiarities of this structure has caused us some pain as well, so in time we will do an evaluation on if it makes sense to keep it at all and think about different structures we can use.

For now, we are heavily working on enabling the file-backend for use with the persistent-set index and on migration tooling, so that ideally the switch will be possible without much effort.

For in-memory databases you can already use the persistent-set index by adding
{:index :datahike.index/persistent-set} to the database configuration. At this stage, I cannot recommend using it though as it hadn't been touched in a long time in the current release and most of the current tests were not run for this index. As long as you are not using history functions, it should be fine to use, but there are no guarantees. Better wait until the new durability changes are in as this will also include proper testing.

@phronmophobic
Copy link
Contributor Author

Sounds good. I found a workaround for my particular issue.

@jsmassa
Copy link
Collaborator

jsmassa commented May 30, 2022

Wonderful :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
Status: No status
Development

No branches or pull requests

2 participants