Today is a good day: the 15-minute load average of the main Indefero virtual machine is now around 0.35, down from 2.1 yesterday at the same time, and your repositories are safer than before thanks to a real-time backup.
"Real time" is maybe a bit misleading, so here are the details of what happens to your repository when you commit with Subversion or push with Git:
- As usual when you push/commit, the rights are checked and if you are authorized, the repository is updated.
- Just after the update, the Git post-update or the Subversion post-commit hooks are fired.
- The fired hook informs the warm standby, through a REST API powered by node.js, that your repository has been updated.
- The node.js coordinator schedules the retrieval of the latest version of your repository.
- When it is your turn, your repository is retrieved on the warm standby. At the moment, the backup is completed within 10 seconds of your update.
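The steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual node.js coordinator: the endpoint URL, the JSON payload, the backup path and the names `build_notification`, `Coordinator` and `fetch_command` are all assumptions made for the example.

```python
import json
import urllib.request
from collections import OrderedDict

# Hypothetical endpoint: the real URL and payload format are not published.
STANDBY_URL = "http://standby.example.com/updated"


def build_notification(repo, scm):
    """Build the POST request a post-update/post-commit hook could send."""
    body = json.dumps({"repo": repo, "scm": scm}).encode("utf-8")
    return urllib.request.Request(
        STANDBY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


class Coordinator:
    """FIFO queue of repositories waiting to be retrieved by the standby."""

    def __init__(self):
        self._pending = OrderedDict()  # repo name -> scm, in arrival order

    def notify(self, repo, scm):
        # A repository that is already waiting keeps its place: a single
        # retrieval picks up every update made since it was queued.
        if repo not in self._pending:
            self._pending[repo] = scm

    def next_job(self):
        """Return the oldest pending (repo, scm) pair, or None when idle."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)


def fetch_command(repo, scm):
    """Command the standby could run to retrieve the latest version."""
    if scm == "git":
        # Assumed to run inside the standby's mirror of the repository.
        return ["git", "fetch", "--prune", "origin"]
    if scm == "svn":
        # Hypothetical location of the svnsync mirror on the standby.
        return ["svnsync", "sync", "file:///backup/svn/" + repo]
    raise ValueError("unknown SCM: " + scm)
```

In this sketch the hook would send the request with `urllib.request.urlopen(build_notification("myproject", "git"))`, and a worker on the standby would loop over `next_job()` and run the matching `fetch_command`. Coalescing repeated notifications for the same repository is a design choice of the sketch, not something the real coordinator is known to do.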
Why this approach and not a simple DRBD setup to get a direct copy of the data? Because of corruption. When you sync the data using git or svnsync, the SCM verifies checksums, which DRBD does not. Block-level replication means that if your main storage gets corrupted, you replicate the corruption and end up with a total loss of your data. It happened to me once, and a friend of mine got a corrupted RAID 5 array last month. RAID, at the server level or over the network, is not a backup.
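To make the checksum argument concrete, here is a small sketch of how content addressing catches a flipped byte. It uses Git's classic SHA-1 naming of a blob (the hash of a `blob <size>` header, a NUL byte, then the content). The helper names `git_blob_id` and `is_intact` are made up for the example, and this is not the code Git itself runs during a fetch.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id Git (SHA-1 format) gives to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(content: bytes, expected_id: str) -> bool:
    """True when the content still hashes to the id it was stored under."""
    return git_blob_id(content) == expected_id


original = b"hello\n"
blob_id = git_blob_id(original)

# One changed byte, as a failing disk might produce.
corrupted = b"hellp\n"

print(is_intact(original, blob_id))   # True
print(is_intact(corrupted, blob_id))  # False
```

A block-level copy would ship the corrupted bytes to the replica without complaint, while a check like this one fails as soon as the content no longer matches its id.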
What is next?
- the same approach to back up your issue attachments and uploaded files (done, see update below);
- calculations of your repository size on the warm standby;
- an additional warm standby, to get 2 warm standby servers for each master;
- more testing before this approach can be considered production ready. At the moment, the daily rsync backup is still running and pushing the data to another place.
Friday Update: The attachments and uploaded files are now in the loop. At the moment, the backup is performed the instant the job arrives (for the repositories too). So the time to get the backup is basically just the data transfer time, and on a 1 Gbps link this means a nearly instantaneous backup.
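As a rough back-of-the-envelope check of "nearly instantaneous", the raw transfer time can be estimated like this. The 50 MB size is a made-up example, and the calculation ignores protocol overhead, disk speed and anything else sharing the link.

```python
def transfer_seconds(size_bytes, link_bits_per_second=1_000_000_000):
    """Ideal time to move size_bytes over a link of the given speed."""
    return size_bytes * 8 / link_bits_per_second


# Hypothetical 50 MB of new data on a 1 Gbps link.
print(transfer_seconds(50 * 10**6))  # 0.4
```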