This repository has been archived by the owner on May 3, 2022. It is now read-only.

Data safety strategies

Alexandre Stein edited this page Aug 3, 2018 · 1 revision

Strategies

Replication

This is the preferred method for databases. It is simple, and the records can be restored from any single node (for the keys it holds). It can also provide a higher read throughput.

The problem is, if you have a big data set, you need the space of your data set as many times as you replicate it. For a 3GB data set replicated 3 times you will need 9GB globally.
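As a minimal sketch, the replication cost above can be computed like this (the helper name is hypothetical, not from any library):

```python
# Hypothetical helper: plain replication stores the full data set
# once per replica, so cost grows linearly with the replica count.
def replication_storage(data_set_gb: float, replicas: int) -> float:
    return data_set_gb * replicas

# 3GB data set replicated 3 times -> 9GB globally.
print(replication_storage(3, 3))  # 9
```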

Reed-solomon Codes

Reed-Solomon codes are a mathematical method that provides safety without copying the entire content. The content is encoded into chunks, and you decide how many chunks you want and how many you are willing to lose before the data becomes unrecoverable.

This gives the possibility of keeping data safe without actually copying it.

If you have a data set of 3GB, with a configuration of 3 chunks of which only one chunk can be lost, the total volume to store is:

  • size * (nb_chunk / (nb_chunk - nb_can_be_lost))
  • 3*(3/(3-1))
  • 3*(3/2)
  • 3*1.5
  • 4.5GB
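The calculation above can be sketched as a small function (the name and the guard are assumptions, not from any specific library):

```python
# Hypothetical helper implementing the formula above:
# size * (nb_chunk / (nb_chunk - nb_can_be_lost)).
def reed_solomon_storage(size_gb: float, nb_chunk: int, nb_can_be_lost: int) -> float:
    if nb_can_be_lost >= nb_chunk:
        raise ValueError("cannot tolerate losing every chunk")
    return size_gb * (nb_chunk / (nb_chunk - nb_can_be_lost))

# 3GB, 3 chunks, 1 chunk can be lost -> 4.5GB instead of 9GB for 3x replication.
print(reed_solomon_storage(3, 3, 1))  # 4.5
```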

I found two possible libraries for this:

Conclusion

Probably the best way to manage the issue is to use both. For big binary records, Reed-Solomon seems like a very good choice.

But it definitely does not make sense to save or back up 30-byte records with Reed-Solomon. Replication would be much simpler and more efficient.
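A hybrid approach could dispatch on record size; this sketch assumes a 1 MiB cutoff, which is a made-up threshold for illustration, not a value from this wiki:

```python
# Hypothetical threshold: records at or above this size use Reed-Solomon,
# smaller records use plain replication. The value is an assumption.
RS_THRESHOLD_BYTES = 1 << 20  # 1 MiB

def pick_strategy(record_size_bytes: int) -> str:
    return "reed-solomon" if record_size_bytes >= RS_THRESHOLD_BYTES else "replication"

print(pick_strategy(30))       # replication (tiny record)
print(pick_strategy(3 << 30))  # reed-solomon (3GB blob)
```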
