7 85 Failover routing strategy

Failover routing in Route 53 gives you disaster recovery at the DNS layer. You declare a primary endpoint in one Region and a secondary endpoint in another. As long as the primary's health check passes, Route 53 answers every query with the primary's IP. Once the primary is flagged unhealthy, Route 53 automatically answers with the secondary's IP instead. The failover happens in DNS, so clients need no changes, although resolvers and clients that cached the primary's answer keep using it until the TTL expires.
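The selection rule can be modelled in a few lines. This is a toy sketch, not Route 53's implementation: the function name `failover_answer` and both addresses (taken from the documentation-reserved IP ranges) are illustrative assumptions. The last branch reflects Route 53's documented behaviour of answering with the primary record when both records are unhealthy.

```python
# Toy model of active-passive failover selection (not Route 53's actual code).
PRIMARY_IP = "192.0.2.10"       # hypothetical Ireland endpoint
SECONDARY_IP = "198.51.100.20"  # hypothetical Tokyo endpoint


def failover_answer(primary_healthy: bool, secondary_healthy: bool) -> str:
    """Return the IP a failover record pair would answer with."""
    if primary_healthy:
        return PRIMARY_IP
    if secondary_healthy:
        return SECONDARY_IP
    # Both unhealthy: fall back to the primary rather than answer nothing.
    return PRIMARY_IP


for state in [(True, True), (False, True), (False, False)]:
    print(state, "->", failover_answer(*state))
```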

In the console you create a record, pick Failover routing, TTL 60 seconds, type A. You set the primary endpoint (Ireland instance IP), the failover type (Primary), and attach the existing health check for Ireland. Then you add the secondary endpoint (Tokyo instance IP), failover type (Secondary), and attach the Tokyo health check. If both endpoints fail you can also wire up an alert.
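The same console setup can be expressed as a change batch for the Route 53 ChangeResourceRecordSets API. The sketch below only builds and prints the request body. The record name `failover.example.com`, the IP addresses, the `SetIdentifier` values and the health check IDs are placeholders, not values from the lesson.

```python
import json


def failover_record(name, ip, role, health_check_id, ttl=60):
    """Build one failover A record change; role is "PRIMARY" or "SECONDARY"."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-endpoint",  # must differ per record
            "Failover": role,
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
            "HealthCheckId": health_check_id,  # placeholder ID
        },
    }


change_batch = {
    "Comment": "Failover pair: Ireland primary, Tokyo secondary",
    "Changes": [
        failover_record("failover.example.com", "192.0.2.10",
                        "PRIMARY", "ireland-health-check-id"),
        failover_record("failover.example.com", "198.51.100.20",
                        "SECONDARY", "tokyo-health-check-id"),
    ],
}

print(json.dumps(change_batch, indent=2))
```

With boto3 installed and credentials configured, a dictionary of this shape would be passed as the `ChangeBatch` argument of `boto3.client("route53").change_resource_record_sets(...)`, together with the `HostedZoneId` of your own hosted zone.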

Live failover test

  • Hit the failover URL — the response comes from the Ireland (primary) instance.
  • Stop the Ireland EC2 instance from the console.
  • Watch the health check status — within a few intervals it flips from "1" (healthy) to "0" (unhealthy).
  • Refresh the URL — Route 53 now serves the Tokyo (secondary) endpoint.
  • CloudWatch graphs visualise the moment the check transitioned and the failover triggered.
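To pin down the moment of failover from the graph data rather than by eye, you can scan the health check status series for the first drop from 1 to 0. The sketch below assumes the datapoints have already been exported as (timestamp, value) pairs. The helper name and the sample values are made up for illustration and are not measurements from the lesson.

```python
from typing import Optional, Sequence, Tuple

Datapoint = Tuple[int, float]  # (seconds since test start, status value)


def first_unhealthy_transition(datapoints: Sequence[Datapoint]) -> Optional[int]:
    """Return the timestamp of the first 1 -> 0 transition, or None."""
    ordered = sorted(datapoints)  # sort by timestamp
    for (_, prev), (ts, curr) in zip(ordered, ordered[1:]):
        if prev >= 1 and curr < 1:
            return ts
    return None


# Hypothetical series: the instance is stopped shortly after t=120.
samples = [(0, 1), (60, 1), (120, 1), (180, 0), (240, 0)]
print(first_unhealthy_transition(samples))  # 180
```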

This pattern is the simplest way to give an application a hot standby in another Region without any code change on the application side: just two records, two health checks and you have a DNS-level failover policy ready for production incidents.

Summary

This lesson covers Route 53's failover routing policy, a disaster recovery mechanism that automatically directs traffic to a secondary resource when the primary becomes unavailable. It configures a failover policy with primary and secondary instances in two AWS Regions (Ireland and Tokyo), then validates it with a live test that simulates a primary instance failure. Route 53 health checks continuously monitor the endpoints, so failover needs no manual intervention; users may still reach the failed primary until cached DNS answers expire.

Key points

  • Route 53 failover routing automatically switches traffic from a failed primary to a healthy secondary resource based on continuous health check monitoring
  • Failover policy setup needs a primary and a secondary record, each with a health check attached as in the lesson; placing the two endpoints in different Regions is what makes it true disaster recovery
  • Health check status directly triggers failover behavior: when the primary is reported unhealthy (status 0), Route 53 answers DNS queries with the secondary IP address
  • Health check settings and DNS TTL (Time To Live) together determine failover speed: detection takes roughly the request interval multiplied by the failure threshold, and clients can keep a cached answer for up to one more TTL (60 seconds in the lesson)
  • CloudWatch integration provides real-time visibility into health check status changes and failover events for monitoring and alerting
  • Failover works transparently to end users—traffic automatically redirects when primary fails without requiring application reconfiguration or manual intervention

FAQ

How does Route 53 detect when to switch to the secondary resource?

Route 53 health checks probe the primary resource at a fixed interval. Once the number of consecutive failed checks reaches the health check's failure threshold, the primary is marked unhealthy and Route 53 starts answering queries with the secondary resource's IP address, which shifts new traffic to the secondary.
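The consecutive-failure rule is easy to see in a small simulation. This is a simplified single-checker model written for illustration: the real service aggregates results from many health checkers. The function name and the probe sequence are assumptions.

```python
def health_states(probe_results, failure_threshold=3):
    """Return the reported state after each probe.

    probe_results: iterable of booleans (True = probe succeeded).
    The state only flips after `failure_threshold` consecutive probes
    that disagree with the current state.
    """
    healthy = True
    streak = 0  # consecutive probes contradicting the current state
    states = []
    for ok in probe_results:
        if ok == healthy:
            streak = 0
        else:
            streak += 1
            if streak >= failure_threshold:
                healthy = ok
                streak = 0
        states.append(healthy)
    return states


# One isolated failure is ignored; three failures in a row flip the state.
probes = [True, False, True, False, False, False, True]
print(health_states(probes))
# [True, True, True, True, True, False, False]
```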

What is the recommended geographical strategy for primary and secondary resources?

Best practice is to deploy the primary and secondary resources in geographically distant Regions so that a regional outage cannot take down both. The lesson uses Ireland (eu-west-1) as primary and Tokyo (ap-northeast-1) as secondary.

How quickly does failover occur after primary failure?

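Failover speed depends on the health check's request interval and failure threshold and on the DNS TTL. Route 53 first needs enough consecutive failed checks to mark the primary unhealthy; after that, resolvers and clients may keep the cached primary address for up to one TTL, which is 60 seconds in the lesson. Total failover time is therefore the detection time plus up to 60 seconds, not 60 seconds overall.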
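A rough upper bound can be worked out with simple arithmetic. The sketch below assumes a 30-second request interval and a failure threshold of 3, which are Route 53's defaults for a standard health check; the lesson does not state which settings it used. The estimate ignores network delays and any resolver that disregards the TTL.

```python
def worst_case_failover_seconds(request_interval, failure_threshold, ttl):
    """Estimate the time from primary failure until clients see the secondary."""
    detection = request_interval * failure_threshold  # time to be marked unhealthy
    return detection + ttl  # plus one full TTL of cached answers


# Assumed standard settings with the lesson's 60-second TTL.
print(worst_case_failover_seconds(30, 3, 60))  # 150

# Assumed fast (10-second) interval with the same threshold and TTL.
print(worst_case_failover_seconds(10, 3, 60))  # 90
```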