Add cost func #3

Draft
wants to merge 2 commits into
base: main


@Gobd Gobd commented Apr 19, 2024

Sorry for all the formatting changes :( Maybe this would make the benchmarks fairer: it allows a user-supplied cost func, so this cache can be configured more like the other caches in the benchmark.

Don't merge this, since I changed the go.mod; it's just an example.

@Gobd Gobd marked this pull request as draft April 19, 2024 18:10
@kolinfluence
Contributor

kolinfluence commented Apr 19, 2024

@Gobd I've explained the "cost" factor on Reddit; I think it adds unnecessary overhead to this LRU. If you can do a TTL version as the "cost", that would be great, and I will accept that as a pull request.
https://www.reddit.com/r/golang/comments/1c6rv7y/fastest_zero_allocation_golang_lru_package/

Can you check the reddit thread and maybe explain why this cost factor is necessary?

Having said that, I can do a branch that supports a cost factor, or separate it out for special use cases.

@Gobd
Author

Gobd commented Apr 19, 2024

I think your Reddit account and all the posts you made have been removed :( I don't think this adds any overhead, and it starts an options pattern that could be used for other things too, like the number of shards or whatever else.
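For readers unfamiliar with the options pattern mentioned here, a minimal sketch follows. It does not reproduce the PR's code: `Cache`, `NewCache`, and `WithShards` are hypothetical names, and the default cost (key length plus value length in bytes) is an assumption made for illustration. Only the name `WithCostFunc` and the `func(key, value []byte) int64` signature come from this thread.

```go
package main

import "fmt"

// Cache is a hypothetical stand-in for the real cache type.
type Cache struct {
	capacity int64
	shards   int
	costFunc func(key, value []byte) int64
}

// Option adjusts a Cache while it is being constructed.
type Option func(*Cache)

// WithCostFunc overrides how much of the capacity one entry consumes.
func WithCostFunc(f func(key, value []byte) int64) Option {
	return func(c *Cache) { c.costFunc = f }
}

// WithShards is a hypothetical second option, showing that the same
// mechanism extends to other settings without changing the constructor.
func WithShards(n int) Option {
	return func(c *Cache) { c.shards = n }
}

// defaultCost charges an entry by its size in bytes (an assumption here).
func defaultCost(key, value []byte) int64 {
	return int64(len(key) + len(value))
}

// NewCache applies defaults first, then each option in order.
func NewCache(capacity int64, opts ...Option) *Cache {
	c := &Cache{capacity: capacity, shards: 1, costFunc: defaultCost}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	byBytes := NewCache(1000)
	byItems := NewCache(1000,
		WithCostFunc(func(key, value []byte) int64 { return 1 }),
		WithShards(4),
	)
	k, v := []byte("key"), []byte("value")
	fmt.Println(byBytes.costFunc(k, v), byItems.costFunc(k, v), byItems.shards) // 8 1 4
}
```

Callers that pass no options get the defaults, so existing call sites keep working when a new option is added.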

@kolinfluence
Contributor

kolinfluence commented Apr 19, 2024

@Gobd
here's the Reddit post, filtered.

I have no idea why they deleted the post; it was my first time posting Go stuff on Reddit.


I'll need to check out what this cost factor is about.
To me, an LRU cache means all items are important. You never know in advance which item should have a higher cost than the rest; the "cost/weight" depends on how frequently an item is used, so the more frequently it is used, the higher it should rank.

The "cost" factor that I will implement in future will be based on TTL. So the greater the TTL, the more important the item? Anyway, TTL will be added in future.

I still can't wrap my head around a "cost" factor on a TTL LRU for now. I think the additional processing for a "cost" adds overhead to the LRU.

This is what may make Accelru stand out and keep the LRU comparison fair:
comparing CPU use to CPU use and memory use to memory use.

When you limit by item count, you are not constraining the cache to the device's available resources. If you do constrain by memory, that's where this LRU will seem "better"; after all, what matters is the memory limit and how that memory is used as cache.

P.S.: for each category, let's take the lowest memory allocation used by the other caches (other than phuslu/lru, which doesn't have a memory allocation) and use it as the total cache size when testing this LRU. How about this for comparison? Let's see what happens.

One more thing: the lower bound is the ~1.3 MB of memory allocated by otter with 10,000 items, and accelru's keys and values should fit within it, so I can take this as the memory used. Just put NewLRUCache(1300000,1); I'm only curious about the hit ratio. The total memory used by accelru will then be confined to 1.3 MB for the 10k items. (Honestly, I think that's a bit too much; I would use just 150k of memory for a 10k-item LRU cache of integers, but that's just me.) I can even estimate the hit ratio without actual runs; I guess I'm quite a believer in the Pareto principle.

I'll take the lowest "memory allocation used" among the caches compared (only the ones you are comparing today) as the cache capacity for each of the sections you have done.

I can live with this (just the lowest memory allocation used in each category will do).


I won't speculate on why the post was deleted, but if you can, please help post your findings on Reddit etc.
You seem very into LRU caches. When you have the findings, do mention them; your benchmark work is great.

Instead of deleting the thread, they deleted the whole post and filtered content, preventing any mention of the repo.
I thought they would at least delete only the thread.

@Gobd
Author

Gobd commented Apr 19, 2024

Here's what two of the hit ratio benchmarks from there look like with the cost of an item fixed at 1, using my branch. You can compare against the results at https://github.com/Gobd/benchmarks for these two traces to see the difference when this cache is treated more like the other caches.

[Charts for the oltp and p8 traces]

@kolinfluence
Contributor

kolinfluence commented Apr 19, 2024

@Gobd what were the NewLRUCache settings used?

Can you show the chart for this? I only modified this: capacity * 40 bytes.


func (c *Cloudxaas) Init(capacity int) {
        client := lru.NewLRUCache(int64(capacity*40), 1)
        c.client = client
}

func (c *CloudxaasString) Init(capacity int) {
        client := lru.NewLRUCache(int64(capacity*40), 1)
        c.client = client
}

I have an issue with the simulator's chart generation. What software do I need to install to see the chart? See the error message at the bottom.

./bench.sh
2024/04/19 18:44:35 Simulation for cache phuslu at capacity 1000000 completed with hit ratio 72.92%
2024/04/19 18:44:35 Simulation for cache otter at capacity 2000000 completed with hit ratio 89.70%
2024/04/19 18:44:35 Simulation for cache lru at capacity 2000000 completed with hit ratio 89.70%
2024/04/19 18:44:36 Simulation for cache cloudxaas at capacity 1000000 completed with hit ratio 89.70%
2024/04/19 18:44:36 Simulation for cache lru at capacity 1000000 completed with hit ratio 72.95%
2024/04/19 18:44:37 Simulation for cache otter at capacity 1000000 completed with hit ratio 74.79%
2024/04/19 18:44:38 Simulation for cache arc at capacity 2000000 completed with hit ratio 89.70%
2024/04/19 18:44:41 Simulation for cache arc at capacity 1000000 completed with hit ratio 76.49%
2024/04/19 18:44:42 Simulation for cache theine at capacity 2000000 completed with hit ratio 89.70%
2024/04/19 18:44:44 Simulation for cache theine at capacity 1000000 completed with hit ratio 77.31%
2024/04/19 18:44:50 Simulation for cache phuslu at capacity 2000000 completed with hit ratio 89.70%
2024/04/19 18:44:50 Simulation for cache otter at capacity 3000000 completed with hit ratio 89.70%
2024/04/19 18:44:51 Simulation for cache cloudxaas at capacity 2000000 completed with hit ratio 89.70%
2024/04/19 18:44:52 Simulation for cache lru at capacity 3000000 completed with hit ratio 89.70%
2024/04/19 18:44:55 Simulation for cache arc at capacity 3000000 completed with hit ratio 89.70%
2024/04/19 18:44:55 Simulation for cache phuslu at capacity 3000000 completed with hit ratio 89.70%
2024/04/19 18:44:57 Simulation for cache theine at capacity 3000000 completed with hit ratio 89.70%
2024/04/19 18:44:57 Simulation for cache cloudxaas at capacity 3000000 completed with hit ratio 89.70%
2024/04/19 18:44:57 Simulation for cache ristretto at capacity 1000000 completed with hit ratio 2.15%
2024/04/19 18:44:58 Simulation for cache otter at capacity 4000000 completed with hit ratio 89.70%
2024/04/19 18:45:02 Simulation for cache ristretto at capacity 2000000 completed with hit ratio 4.13%
2024/04/19 18:45:05 Simulation for cache lru at capacity 4000000 completed with hit ratio 89.70%
2024/04/19 18:45:08 Simulation for cache arc at capacity 4000000 completed with hit ratio 89.70%
2024/04/19 18:45:08 Simulation for cache phuslu at capacity 4000000 completed with hit ratio 89.70%
2024/04/19 18:45:09 Simulation for cache cloudxaas at capacity 4000000 completed with hit ratio 89.70%
2024/04/19 18:45:10 Simulation for cache theine at capacity 4000000 completed with hit ratio 89.70%
2024/04/19 18:45:11 Simulation for cache otter at capacity 5000000 completed with hit ratio 89.70%
2024/04/19 18:45:12 Simulation for cache lru at capacity 5000000 completed with hit ratio 89.70%
2024/04/19 18:45:17 Simulation for cache theine at capacity 5000000 completed with hit ratio 89.70%
2024/04/19 18:45:18 Simulation for cache arc at capacity 5000000 completed with hit ratio 89.70%
2024/04/19 18:45:19 Simulation for cache phuslu at capacity 5000000 completed with hit ratio 89.70%
2024/04/19 18:45:20 Simulation for cache ristretto at capacity 3000000 completed with hit ratio 5.66%
2024/04/19 18:45:22 Simulation for cache otter at capacity 6000000 completed with hit ratio 89.70%
2024/04/19 18:45:23 Simulation for cache cloudxaas at capacity 5000000 completed with hit ratio 89.70%
2024/04/19 18:45:24 Simulation for cache lru at capacity 6000000 completed with hit ratio 89.70%
2024/04/19 18:45:28 Simulation for cache arc at capacity 6000000 completed with hit ratio 89.70%
2024/04/19 18:45:29 Simulation for cache theine at capacity 6000000 completed with hit ratio 89.70%
2024/04/19 18:45:31 Simulation for cache phuslu at capacity 6000000 completed with hit ratio 89.70%
2024/04/19 18:45:32 Simulation for cache otter at capacity 7000000 completed with hit ratio 89.70%
2024/04/19 18:45:32 Simulation for cache cloudxaas at capacity 6000000 completed with hit ratio 89.70%
2024/04/19 18:45:36 Simulation for cache lru at capacity 7000000 completed with hit ratio 89.70%
2024/04/19 18:45:36 Simulation for cache ristretto at capacity 4000000 completed with hit ratio 7.63%
2024/04/19 18:45:38 Simulation for cache theine at capacity 7000000 completed with hit ratio 89.70%
2024/04/19 18:45:40 Simulation for cache arc at capacity 7000000 completed with hit ratio 89.70%
2024/04/19 18:45:42 Simulation for cache phuslu at capacity 7000000 completed with hit ratio 89.70%
2024/04/19 18:45:43 Simulation for cache cloudxaas at capacity 7000000 completed with hit ratio 89.70%
2024/04/19 18:45:43 Simulation for cache otter at capacity 8000000 completed with hit ratio 89.70%
2024/04/19 18:45:43 Simulation for cache ristretto at capacity 5000000 completed with hit ratio 9.46%
2024/04/19 18:45:47 Simulation for cache ristretto at capacity 6000000 completed with hit ratio 10.80%
2024/04/19 18:45:48 Simulation for cache lru at capacity 8000000 completed with hit ratio 89.70%
2024/04/19 18:45:49 Simulation for cache theine at capacity 8000000 completed with hit ratio 89.70%
2024/04/19 18:45:50 Simulation for cache arc at capacity 8000000 completed with hit ratio 89.70%
2024/04/19 18:45:50 Simulation for cache ristretto at capacity 7000000 completed with hit ratio 12.84%
2024/04/19 18:45:51 Simulation for cache phuslu at capacity 8000000 completed with hit ratio 89.70%
2024/04/19 18:45:51 Simulation for cache cloudxaas at capacity 8000000 completed with hit ratio 89.70%
2024/04/19 18:45:53 Simulation for cache ristretto at capacity 8000000 completed with hit ratio 15.50%
2024/04/19 18:45:53 All simulations are complete
|   CACHE   | 1,000,000 | 2,000,000 | 3,000,000 | 4,000,000 | 5,000,000 | 6,000,000 | 7,000,000 | 8,000,000 |
|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| otter     |     74.79 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |
| theine    |     77.31 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |
| ristretto |      2.15 |      4.13 |      5.66 |      7.63 |      9.46 |     10.80 |     12.84 |     15.50 |
| lru       |     72.95 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |
| arc       |     76.49 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |
| phuslu    |     72.92 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |
| cloudxaas |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |     89.70 |
2024/04/19 18:45:53 simulate trace: create report: save chart: page load error net::ERR_FILE_NOT_FOUND
exit status 1

@Gobd
Author

Gobd commented Apr 19, 2024

lru.NewLRUCache(int64(capacity), 1, lru.WithCostFunc(func(key, value []byte) int64 {
	return 1
}))

So the same as the others: the limit is based on item count, not memory.

@kolinfluence
Contributor

@Gobd what about a memory-to-memory comparison?
I'll think about how to optimize it in the way you mentioned.
It's not a current priority, but I will work on that in future.

Curious: is it possible to see how the hit ratio compares when this cache uses the same amount of memory as the others,
without additional modification to the original code?

@kolinfluence
Contributor

kolinfluence commented Apr 19, 2024

@Gobd I thought about it, and your initial capacity is still capped at memory used, e.g. 10000 = 10 KB of memory, so the lower hit ratio is expected. You have to increase the initial capacity to a memory size too.

@kolinfluence
Contributor

kolinfluence commented Apr 20, 2024

@Gobd if you really want to test the capacity limit, then the way to do it should be:
total bytes of each item's key and value * item capacity = capacity as memory size
This will be a closer approximation of the number of items that will fit into the memory.

  1. If every key-value pair is equal in size, the result will be very accurate.

  2. If sizes are very uneven, with high deviation, measuring by the total memory used by the program is a much better gauge of hit ratio.

Please check my remarks and publish the findings; total memory used is the best way to justify the effectiveness of a cache for Go use.
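The conversion described above can be sketched as a small calculation. The helper names and the 8-byte key and value sizes are assumptions chosen for illustration, and per-entry bookkeeping overhead inside the cache is ignored.

```go
package main

import "fmt"

// byteCapacity converts an item-count capacity into a byte budget,
// assuming every entry has the same key and value size.
func byteCapacity(items, keyBytes, valueBytes int64) int64 {
	return items * (keyBytes + valueBytes)
}

// itemsThatFit is the inverse: how many equal-sized entries fit in a
// given byte budget.
func itemsThatFit(budget, keyBytes, valueBytes int64) int64 {
	return budget / (keyBytes + valueBytes)
}

func main() {
	// A benchmark capacity of 10,000 items with 8-byte keys and values
	// corresponds to a 160,000-byte budget.
	fmt.Println(byteCapacity(10000, 8, 8)) // 160000

	// Passing 10,000 directly as a byte budget would hold far fewer entries.
	fmt.Println(itemsThatFit(10000, 8, 8)) // 625
}
```

This matches case 1 in the list; when entry sizes vary widely, as in case 2, a fixed per-entry size no longer gives a reliable estimate.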

@kolinfluence
Contributor

@Gobd I've published a faster, more memory-efficient cache for your test; you can define your own hashing algorithm in this one.

https://github.com/cloudxaas/gocache/tree/main/lrux/bytes
