Even though the probability of a coordinated downtime that results in an unavailable file is negligible, there is still a chance that a significant number of peers in a pool go offline simultaneously (e.g., during a blackout, a local disaster, or a fire).
To ensure the highest standard of durability (99.999999999%) and availability (99.95%), Cubbit employs two procedures:
- Heuristics on pool selection: you can check the 5 criteria used for peer selection in the Redundancy section.
- Lazy recovery strategy: we'll dive deeper below.
Lazy recovery strategy
If, after a series of cumulative disconnections, the redundancy of a file (i.e., the number of online shards in excess of the n shards needed for reconstruction) falls below a chosen security threshold (being ), the coordinator triggers a recovery procedure (called "Lazy recovery strategy"):
- The coordinator identifies the () alternative members of the pool that will replace the offline ones.
- The coordinator instructs a node to retrieve n shards from the damaged pool.
- The chosen node retrieves the shards and inverts the Reed-Solomon encoding to obtain an encrypted chunk (note that the node does not need the AES key used to encrypt the file in order to invert the redundancy process).
- The node redistributes the recovered shards to the () new members of the original pool.
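
The coordinator's side of the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cubbit's implementation: the `Pool` class, the `plan_recovery` function, the `threshold` argument, and the rule of picking one replacement per offline peer are all hypothetical, and the Reed-Solomon decoding performed by the chosen node is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical view of the peers that hold the shards of one chunk."""
    n: int             # shards needed to rebuild the encrypted chunk
    online: set[str]   # peers whose shard is currently reachable
    offline: set[str]  # peers whose shard is currently unreachable


def redundancy(pool: Pool) -> int:
    """Number of online shards in excess of the n required for reconstruction."""
    return len(pool.online) - pool.n


def plan_recovery(pool: Pool, threshold: int, candidates: list[str]) -> dict | None:
    """Return a recovery plan once redundancy drops below the threshold.

    `threshold` is an assumed parameter; the actual value is not given here.
    """
    if redundancy(pool) >= threshold:
        return None  # lazy: tolerate disconnections until the threshold is crossed
    if len(pool.online) < pool.n:
        raise RuntimeError("fewer than n shards online: the chunk cannot be rebuilt")

    members = pool.online | pool.offline
    # Assumption: one new member is chosen for each offline member.
    new_members = [c for c in candidates if c not in members][: len(pool.offline)]
    return {
        # The chosen node reads n shards from peers that are still online.
        "read_from": sorted(pool.online)[: pool.n],
        # Each offline peer is mapped to the new peer that receives a recovered shard.
        "replace": dict(zip(sorted(pool.offline), new_members)),
    }
```

For example, with `n=2`, three peers online and two offline, the redundancy is 1. A threshold of 2 produces a plan, while a threshold of 1 returns `None` and the recovery is deferred, which is what makes the strategy lazy.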
If you want to dive deeper into how Reed-Solomon error-correcting code works, check out the Redundancy documentation.