Cast Disaster Recovery SLA for RPO and RTO

Our application is hosted on Google Cloud Platform and uses Cloud SQL, App Engine, and Cloud Functions, each of which is designed for high availability and resilience:

  1. Recovery Point Objective (RPO):
    • Cloud SQL: Automated backups and point-in-time recovery are enabled. Automated backups are kept according to our configured retention setting (7 by default, configurable up to 365). Point-in-time recovery replays transaction logs on top of a backup, so data can be restored to a specific moment within the transaction log retention window. Our RPO is therefore minimal: lost data can be restored almost up to the moment before a disruptive event, bounded by the last transaction log written.
  2. Recovery Time Objective (RTO):
    • Cloud SQL: Our instance is configured for high availability, with automatic failover to a standby in another zone. If the primary zone fails, service is typically restored within minutes, so the RTO for our database is generally a few minutes after the incident.
    • App Engine and Cloud Functions: Both services are highly available and handle infrastructure failures automatically, minimizing downtime. Their RTOs follow their respective SLAs, with service typically restored shortly after a failure is detected.
  3. Service Level Agreement (SLA):
    • Cloud SQL: 99.95% uptime for instances configured for high availability.
    • App Engine: 99.95% uptime for the standard environment and 99.9% for the flexible environment.
    • Cloud Functions: 99.5% uptime.
    • SendGrid: Our external email service provider has its own SLA, which covers the availability of our email communications.
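
To make the uptime percentages above more concrete, the following minimal sketch converts each one into a monthly downtime budget and estimates a combined figure for a request that depends on all three services. The 30-day month, the function names, and the assumption that the services fail independently and are all required are illustrative simplifications, not terms of any provider SLA.

```python
# Sketch: convert SLA uptime percentages into monthly downtime budgets.
# The percentages are the ones listed above; the 30-day month and the
# serial-dependency model are simplifying assumptions for illustration.

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month

SLA_PERCENT = {
    "Cloud SQL (HA)": 99.95,
    "App Engine (standard)": 99.95,
    "Cloud Functions": 99.5,
}


def downtime_budget_minutes(uptime_percent, period_minutes=MINUTES_PER_MONTH):
    """Maximum downtime per period that still meets the uptime percentage."""
    return period_minutes * (1 - uptime_percent / 100)


def composite_uptime_percent(percentages):
    """Uptime of a chain in which every component must be up.

    Assumes failures are independent, which is a simplification.
    """
    product = 1.0
    for pct in percentages:
        product *= pct / 100
    return product * 100


if __name__ == "__main__":
    for name, pct in SLA_PERCENT.items():
        print(f"{name}: {downtime_budget_minutes(pct):.1f} min/month")
    combined = composite_uptime_percent(SLA_PERCENT.values())
    print(f"Serial composite: {combined:.2f}%")
```

Under these assumptions, 99.95% allows about 21.6 minutes of downtime in a 30-day month and 99.5% allows about 216 minutes. The composite figure is lower than any single SLA because every service in the chain must be available at the same time.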

Our architecture is designed so that both data integrity and service availability are backed by the provider SLAs listed above. We continuously monitor and update our systems to adapt to emerging threats and keep our disaster recovery practices effective.
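
As one way to check these objectives during a recovery drill, the sketch below computes the achieved RPO and RTO from three timestamps and compares them with targets. The class name, field names, and the 5-minute and 15-minute targets are hypothetical values chosen for illustration; they are not commitments made in this document.

```python
# Sketch: measure achieved RPO and RTO from drill timestamps.
# All names and target values here are hypothetical.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    incident_at: datetime          # when the disruptive event occurred
    last_recoverable_at: datetime  # latest point the data could be restored to
    restored_at: datetime          # when the service was available again

    @property
    def achieved_rpo(self) -> timedelta:
        """Window of data lost: incident time minus last recoverable point."""
        return self.incident_at - self.last_recoverable_at

    @property
    def achieved_rto(self) -> timedelta:
        """Outage duration: restoration time minus incident time."""
        return self.restored_at - self.incident_at


def meets_targets(result: DrillResult, rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True only if both the achieved RPO and RTO are within their targets."""
    return result.achieved_rpo <= rpo_target and result.achieved_rto <= rto_target


# Example drill with made-up timestamps.
drill = DrillResult(
    incident_at=datetime(2024, 1, 10, 12, 0, 0),
    last_recoverable_at=datetime(2024, 1, 10, 11, 59, 30),
    restored_at=datetime(2024, 1, 10, 12, 4, 0),
)
ok = meets_targets(drill, rpo_target=timedelta(minutes=5), rto_target=timedelta(minutes=15))
```

In this example the drill loses 30 seconds of data and takes 4 minutes to restore service, so it falls within both hypothetical targets.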