Reliability

VeloDB Cloud protects against three kinds of failure: availability zone loss, data loss or corruption, and region loss. Each is handled by a different feature, and the protection each feature provides is measured by two metrics: RPO and RTO.

RPO and RTO

RPO (Recovery Point Objective) is the maximum amount of data, measured as a window of time, that you can lose after a failure. An RPO of one hour means that up to one hour of writes can be lost. The RPO you can achieve depends on which protection you rely on, as described below.

RTO (Recovery Time Objective) is the maximum time to restore service after a failure, including the time to detect the failure and complete recovery. An RTO of one hour means that service resumes within one hour. When recovery involves restoring or importing data, the RTO grows with the data volume.
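As a rough illustration of how RTO grows with data volume, the sketch below adds a detection delay to a restore time derived from the data size. The linear throughput model, the function name `estimate_rto`, and every figure in it are assumptions made for this example, not VeloDB Cloud measurements; measure your own numbers in a recovery drill.

```python
from datetime import timedelta


def estimate_rto(
    detection: timedelta,
    data_gb: float,
    restore_gb_per_hour: float,
) -> timedelta:
    """Estimate RTO as detection time plus restore time.

    Assumes (for illustration only) that restore time scales linearly
    with data volume at a constant throughput.
    """
    if restore_gb_per_hour <= 0:
        raise ValueError("restore_gb_per_hour must be positive")
    restore = timedelta(hours=data_gb / restore_gb_per_hour)
    return detection + restore


# Hypothetical figures: 5 minutes to detect, 500 GB restored at 1000 GB/hour.
rto = estimate_rto(timedelta(minutes=5), data_gb=500, restore_gb_per_hour=1000)
print(rto)
```

Doubling the data volume in this model doubles the restore portion of the RTO while the detection portion stays fixed.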

How each failure is handled

High Availability protects against the loss of a single availability zone. VeloDB Cloud deploys your warehouses across multiple availability zones, so committed data survives a zone failure and the RPO is zero. For automatic recovery, pair two clusters in different zones as a primary-standby virtual cluster. VeloDB keeps the standby synchronized in real time and fails over to it automatically when the primary zone fails. The switch is quick but not instant, because the failure must first be detected. A single cluster has no standby, so you recover it manually by creating a replacement cluster in another zone. Its RTO therefore depends on when you act and on how long the new cluster takes to start and warm its cache. See High Availability.

Recycle Bin recovers data that you removed by mistake. Run RECOVER to restore a database, table, or partition that you dropped or truncated, as long as it is still in the recycle bin. This recovery is best-effort rather than guaranteed, because VeloDB Cloud removes recycle-bin entries when it needs the resources. Use backups when you need a guaranteed recovery point.

Backups protect against data corruption and bad writes. VeloDB Cloud copies your databases to object storage in the same region on the schedule that you set, and you restore from any retained backup. The backup RPO is the backup cycle you configure plus the time each backup takes to complete. How long a backup or restore takes depends on several factors, including your data volume, your table schemas, and the number of tables. See Backups.

Disaster Recovery protects against the loss of an entire region. Backups and multi-zone deployment both keep your data within a single region, so neither survives a regional outage. You export your data to another region so that a usable copy exists elsewhere, and you recover by importing it into a warehouse in that region. The export RPO is the export cycle you set plus the time each export takes to complete. See Disaster Recovery.
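The RPO rule stated for backups and exports (cycle plus completion time) can be worked through in a short sketch. The reasoning: if a failure happens just before the next copy finishes, the newest usable copy is the previous one, which reflects the data as it was one full cycle plus one copy duration earlier. The function names and the sample figures below are hypothetical and chosen only for illustration.

```python
from datetime import timedelta


def worst_case_rpo(cycle: timedelta, copy_duration: timedelta) -> timedelta:
    """Worst-case data-loss window for a periodic backup or export.

    Follows the rule in the text: the cycle you configure plus the time
    each copy takes to complete.
    """
    return cycle + copy_duration


def meets_target(cycle: timedelta, copy_duration: timedelta, target: timedelta) -> bool:
    """Return True if the worst-case RPO is within the target RPO."""
    return worst_case_rpo(cycle, copy_duration) <= target


# Hypothetical schedule: a daily backup that takes 2 hours to complete.
daily = worst_case_rpo(timedelta(hours=24), timedelta(hours=2))
print(daily)

# The same schedule checked against a hypothetical 24-hour RPO target.
print(meets_target(timedelta(hours=24), timedelta(hours=2), timedelta(hours=24)))
```

Under these assumed figures a daily schedule does not meet a 24-hour target, because the completion time counts toward the RPO. Shortening the cycle or the copy duration is what brings the worst case down.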

Recommendations

Periodically test the approach you adopt by running recovery drills with production-scale data. Drills show your real RPO and RTO, which depend on your data volume, cluster sizing, and application. When you need the lowest possible RPO and RTO across regions, run a warehouse in each of two regions and write to both. A regional outage then leaves a current copy that can serve traffic immediately, at the cost of operating and loading two warehouses.
