This repository has been archived by the owner on May 3, 2022. It is now read-only.

Cluster strategy

Alexandre Stein edited this page Aug 3, 2018 · 4 revisions

Goal

The goal of clustering is to keep data available for both reads and writes.

I would like to manage the cluster in a way similar to the Ceph CRUSH map, where the client itself knows where to read or write data.

In this design, all clients and servers must have access to, and knowledge of, the cluster layout for it to work properly. Tools such as Raft, as mentioned by @gedw99, could manage this cluster map.
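Here is a minimal sketch of the shared cluster map idea. It assumes the map is just a version number plus an ordered server list. `ClusterMap`, `locate`, the addresses and `byte_sum_placement` are hypothetical names for illustration, and keeping the map consistent across nodes (for example through Raft) is not shown. Every node evaluates the same placement function over the same map, so a client can find the right server without asking a coordinator. This follows the spirit of CRUSH; it is not CRUSH itself.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical placement signature: maps a record ID to a server index in [0, n).
Placement = Callable[[bytes, int], int]


@dataclass(frozen=True)
class ClusterMap:
    """Hypothetical snapshot of the cluster layout held by every client and server."""

    version: int              # bumped on each membership change (e.g. committed via Raft)
    servers: Tuple[str, ...]  # server addresses, in the same order on every node

    def locate(self, record_id: bytes, place: Placement) -> str:
        # Same map + same function on every node => every node picks the same server.
        return self.servers[place(record_id, len(self.servers))]


def byte_sum_placement(record_id: bytes, n: int) -> int:
    """Toy placement for illustration only: sum of the ID's bytes modulo n."""
    return sum(record_id) % n


cluster = ClusterMap(version=1, servers=("srv-a:7000", "srv-b:7000", "srv-c:7000"))
target = cluster.locate(b"ID-1", byte_sum_placement)
print(target)  # srv-b:7000
```

The placement function is passed in so that the hash and modulo strategies below could be plugged into the same map.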

Clustering Strategy

Using a hash function

Hashing a record's ID returns a value that determines which server the data is read from or written to.

The benefit is that every server should have roughly the same probability of receiving any given record.
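Here is a minimal sketch of hash placement, assuming a hash-then-modulo rule as in the examples that follow. HighwayHash is not in the Python standard library, so this sketch swaps in BLAKE2b truncated to 64 bits (`hashlib.blake2b(..., digest_size=8)`). The resulting integers therefore differ from the HighwayHash values listed on this page. `hash64` and `server_for` are hypothetical names.

```python
import hashlib


def hash64(record_id: bytes) -> int:
    """64-bit hash of a record ID.

    Stand-in for HighwayHash: BLAKE2b with an 8-byte digest,
    read as a big-endian unsigned integer.
    """
    digest = hashlib.blake2b(record_id, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def server_for(record_id: bytes, n_servers: int) -> int:
    """Hash placement: the server index is the hash modulo the server count."""
    return hash64(record_id) % n_servers


for rid in (b"ID-1", b"ID-2", b"ID-3"):
    print(rid.decode(), hash64(rid), server_for(rid, 3))
```

Any node that knows the server count computes the same index for the same ID, which is the property a CRUSH-like design relies on.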

Hashing the IDs with HighwayHash (64-bit output) for a cluster of 3 servers:

  • ID-1 gives the integer 5383765017661843673 which has a modulo of 3 equal to 2
  • ID-2 gives the integer 12859092828615809664 which has a modulo of 3 equal to 0
  • ID-3 gives the integer 11354722109782349948 which has a modulo of 3 equal to 2
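The modulo step can be checked directly on the 64-bit hash values listed above. They all fit in an unsigned 64-bit integer. This only verifies the arithmetic; it does not recompute the hashes.

```python
from collections import Counter

# HighwayHash 64-bit outputs for the three IDs, as listed above.
hashes = {
    "ID-1": 5383765017661843673,
    "ID-2": 12859092828615809664,
    "ID-3": 11354722109782349948,
}
N_SERVERS = 3

placement = {rid: h % N_SERVERS for rid, h in hashes.items()}
load = Counter(placement.values())  # records per server index

print(placement)  # {'ID-1': 2, 'ID-2': 0, 'ID-3': 2}
print([load[s] for s in range(N_SERVERS)])  # [1, 0, 2]
```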

In this example, server 2 gets two records (ID-1 and ID-3), server 0 gets one (ID-2), and server 1 gets none. Three records are too few to be representative, and the spread here is uneven. With many records, however, the distribution should become about as even as with the modulo strategy.
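This small simulation shows the spread evening out as the number of records grows. It assumes IDs of the form `ID-1` to `ID-n` and again swaps in 64-bit BLAKE2b for HighwayHash. `spread` is a hypothetical helper. Any well-mixed 64-bit hash should give a similar picture, but treat this as a sketch, not a benchmark.

```python
import hashlib
from collections import Counter


def server_for(record_id: bytes, n_servers: int) -> int:
    # Stand-in hash: 64-bit BLAKE2b instead of HighwayHash, then modulo.
    digest = hashlib.blake2b(record_id, digest_size=8).digest()
    return int.from_bytes(digest, "big") % n_servers


def spread(n_records: int, n_servers: int) -> Counter:
    """Count how many of the IDs ID-1..ID-n land on each server."""
    return Counter(server_for(f"ID-{i}".encode(), n_servers) for i in range(1, n_records + 1))


for n in (3, 30, 30_000):
    counts = spread(n, 3)
    print(n, [counts[s] for s in range(3)])
```

With a handful of records the counts can be lopsided, as above; with tens of thousands, each server ends up close to one third.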

Using modulo

Another option is to interpret the IDs as big integers and take their modulo.

For example, a cluster with 3 servers would use a modulo of 3.

Reading the UTF-8 bytes of the IDs as big-endian integers:

  • ID-1 gives the integer 1229204785 which has a modulo of 3 equal to 1
  • ID-2 gives the integer 1229204786 which has a modulo of 3 equal to 2
  • ID-3 gives the integer 1229204787 which has a modulo of 3 equal to 0
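Here is a minimal sketch of the modulo strategy, assuming the ID's UTF-8 bytes are read as one big-endian unsigned integer. This reproduces the integers listed above. Python integers have arbitrary precision, so long IDs need no special big-integer type. `id_to_int` and `server_for` are hypothetical names.

```python
def id_to_int(record_id: str) -> int:
    """Interpret the UTF-8 bytes of an ID as one big-endian unsigned integer."""
    return int.from_bytes(record_id.encode("utf-8"), "big")


def server_for(record_id: str, n_servers: int) -> int:
    """Modulo placement: the ID's integer value modulo the number of servers."""
    return id_to_int(record_id) % n_servers


for rid in ("ID-1", "ID-2", "ID-3"):
    print(rid, id_to_int(rid), server_for(rid, 3))
# ID-1 1229204785 1
# ID-2 1229204786 2
# ID-3 1229204787 0
```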

In this example the IDs are sequential, so the spread is optimal: each successive ID goes to the next server. If you have 12 records to retrieve, each of the 3 servers reads 4 sequential records from its own storage.
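The 12-record case can be checked with the same modulo rule. This is again a sketch with hypothetical names. With 3 servers the round robin survives the jump from `ID-9` to `ID-10`. The reason is that 256 ≡ 1 (mod 3), so the integer's residue equals the residue of its byte sum. With other server counts, a change in ID length can break the pattern. With 4 servers, for instance, `ID-9` and `ID-10` land on servers 1 and 0.

```python
from collections import defaultdict


def server_for(record_id: str, n_servers: int) -> int:
    # Modulo placement: UTF-8 bytes read as a big-endian integer, modulo n.
    return int.from_bytes(record_id.encode("utf-8"), "big") % n_servers


by_server = defaultdict(list)
for i in range(1, 13):
    rid = f"ID-{i}"
    by_server[server_for(rid, 3)].append(rid)

for server in sorted(by_server):
    print(server, by_server[server])
# 0 ['ID-3', 'ID-6', 'ID-9', 'ID-12']
# 1 ['ID-1', 'ID-4', 'ID-7', 'ID-10']
# 2 ['ID-2', 'ID-5', 'ID-8', 'ID-11']
```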

Conclusion

Some tests are needed to decide which strategy to use. For the hash strategy, we also need to consider compatibility with other languages. The algorithm used in this example (HighwayHash) is not available in many languages, but it is very fast, which is why I chose it.
