Friday, October 14, 2022

DNS in multicloud disaster recovery architectures

Multicloud adoption is increasing rapidly because it lets enterprises take advantage of the best services from each cloud. At the same time, it adds complexity and new challenges when designing disaster recovery architectures.

A disaster recovery region is a common business requirement to guarantee business continuity in the event of a major disaster. Traditional disaster recovery requires deploying, operating, and maintaining infrastructure in a different geographical region, which is highly complex and expensive. The public cloud’s shared management and pay-as-you-go model is a cost-effective way to implement disaster recovery.

In this blog, we cover the key considerations of domain name system (DNS) design in multicloud environments. We also go over the options to trigger a disaster recovery and provide an overview of the DNS services currently available in Oracle Cloud Infrastructure (OCI).

Multicloud DNS architecture


When defining the multicloud DNS architecture, it is critical to assess the impact of DNS resolution time on overall application performance and user experience. Unless the DNS resolver has already resolved the name and cached the answer, every DNS request adds resolution time.
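The effect of caching on resolution time can be illustrated with a minimal sketch. It is not a real resolver: the `CachingResolver` class, the `upstream` callable, and the TTL value are all assumptions made for illustration. The point is only that a cached answer avoids the upstream round trip until its time to live (TTL) expires.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Toy resolver that caches answers until their TTL expires (illustrative only)."""

    def __init__(
        self,
        upstream: Callable[[str], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Hypothetical upstream lookup: takes a name, returns (ip, ttl_seconds).
        self._upstream = upstream
        self._clock = clock
        # name -> (ip, absolute expiry time)
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]  # cache hit: no upstream round trip
        # Cache miss or expired entry: pay the full resolution cost.
        self.upstream_queries += 1
        ip, ttl = self._upstream(name)
        self._cache[name] = (ip, now + ttl)
        return ip
```

A shorter TTL makes a DNS change visible to clients sooner, at the cost of more upstream queries, so the TTL is a trade-off between steady-state latency and how quickly a change propagates.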

Transactional workloads are particularly sensitive, and in some cases milliseconds of latency can be noticeable. So, setting up mechanisms that provide the lowest latency is important. These mechanisms are conditioned by the underlying DNS architecture, which can be different for public and private DNS zones.

Public zones

The internet DNS architecture is a global, decentralized system that works by delegation: from the internet root zone, to the top-level domain, and finally to the authoritative servers where the public zones are hosted. It is built on an anycast network, which helps provide lower latency during the delegation process, a basic level of load balancing, and resilience. On top of the delegation optimization built into the public DNS architecture, OCI offers traffic management steering policies, which contain rules that help serve intelligent responses to DNS queries.

You can use traffic management steering policies to steer traffic to OCI regions, but also to any publicly exposed resources, including other cloud providers and on-premises data centers. Disaster recovery architecture is one of the use cases that steering policies can cover.

Triggering a disaster recovery must be a decision taken by top management, so unlike failover in high-availability architectures, it isn’t expected to be automatic. Regardless, using traffic steering policies simplifies this process because a simple manual failover changes the DNS response of a public domain from an IP in the primary region to an IP in the disaster recovery region.
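The manual failover described above can be sketched as a priority-ordered list of answers. This is a simplified model, not the OCI API: the `Answer` class, the `select_answer` and `manual_failover` functions, and the IP addresses (taken from documentation ranges) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    name: str      # hypothetical label for the region
    rdata: str     # IP address returned to the client
    priority: int  # lower number wins
    enabled: bool = True


def select_answer(answers: List[Answer]) -> Answer:
    """Return the enabled answer with the lowest priority number."""
    candidates = [a for a in answers if a.enabled]
    if not candidates:
        raise LookupError("no enabled answers")
    return min(candidates, key=lambda a: a.priority)


def manual_failover(answers: List[Answer], failed_name: str) -> None:
    """Disable the named answer so the next priority is served instead."""
    for answer in answers:
        if answer.name == failed_name:
            answer.enabled = False


# Assumed example data: primary region first, disaster recovery second.
policy = [
    Answer("primary", "192.0.2.10", priority=1),
    Answer("disaster-recovery", "198.51.100.10", priority=2),
]
```

In this model, calling `manual_failover(policy, "primary")` is the single operator action: subsequent calls to `select_answer(policy)` return the disaster recovery address.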

In addition to the failover steering policy, the following options are available in OCI:

◉ Load balancer: Distributes traffic over several servers to optimize performance
◉ Geolocation steering: Dynamically routes traffic requests based on originating geographic conditions
◉ ASN steering: Dynamically routes traffic requests based on the originating autonomous system number (ASN)
◉ IP prefix steering: Dynamically routes traffic requests based on originating IP prefix
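As an illustration of the last option, IP prefix steering can be modeled as a lookup from the client's source address to an answer. The sketch below uses Python's standard `ipaddress` module. The `steer_by_prefix` function, the rule format, and the choice of longest-prefix matching are assumptions made for the example, not a description of how OCI evaluates its rules.

```python
import ipaddress
from typing import List, Optional, Tuple

# (source prefix, answer IP); the format is hypothetical.
Rule = Tuple[str, str]


def steer_by_prefix(
    client_ip: str, rules: List[Rule], default: Optional[str] = None
) -> Optional[str]:
    """Return the answer of the most specific rule containing client_ip."""
    addr = ipaddress.ip_address(client_ip)
    best_len = -1
    best = default
    for prefix, answer in rules:
        net = ipaddress.ip_network(prefix)
        # Skip rules for the other IP version, then keep the longest match.
        if addr.version == net.version and addr in net and net.prefixlen > best_len:
            best_len, best = net.prefixlen, answer
    return best


# Assumed example rules using documentation address ranges.
rules = [
    ("10.0.0.0/8", "192.0.2.10"),
    ("10.20.0.0/16", "198.51.100.10"),
]
```

With these rules, a client at 10.20.1.5 matches both prefixes and receives the answer of the more specific /16 rule, while a client outside both prefixes receives the default.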

Private zones

A private DNS zone contains data only accessible internally and isn’t exposed to the public internet. Private zones aren’t integrated with the public internet root zone.

Large global enterprises deploy DNS architectures based on a private root zone with domain delegation for their private zones. Enterprises get all the benefits of a decentralized system for their internal services. To achieve this goal, they deploy a private root zone and modify the DNS local resolvers to use this new private root zone, instead of the default internet root zone.

In a multicloud scenario, this approach is a challenge because the DNS services offered by cloud providers allow hosting internal zones but don’t allow changing the root zone. As a result, you can’t define a multicloud DNS architecture based on a private root zone using only the native DNS services of public clouds. So, the region or cloud where the private zone is hosted is critical to achieving the low latency and performance that modern services require. Ideally, the private zone exists in the same cloud as the resources and as close as possible to the requester. In a multicloud deployment, each cloud commonly has its own private zones.

The only option available for resolving private zones hosted in different clouds is to configure DNS forwarding rules. These static rules must be set up on the DNS servers of each cloud. Another consideration when defining a disaster recovery architecture is to have separate private DNS zones for the primary and disaster recovery regions. This structure permits DNS resolution between the primary and disaster recovery regions and allows end-to-end connectivity tests without impacting production.
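Conceptually, a forwarding rule maps a domain suffix to the DNS endpoint of the cloud that hosts that zone. The following sketch shows one way such a rule table could be evaluated. The `pick_forwarder` function, the most-specific-suffix matching, and the endpoint addresses are assumptions for illustration and do not reflect any provider's configuration syntax.

```python
from typing import Dict, Optional


def pick_forwarder(qname: str, rules: Dict[str, str]) -> Optional[str]:
    """Return the forwarder for the most specific matching domain suffix.

    Keys of `rules` are lowercase domain names without a trailing dot.
    Returns None when no rule matches, meaning the query is resolved locally.
    """
    labels = qname.rstrip(".").lower().split(".")
    # Try the full name first, then progressively shorter suffixes.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in rules:
            return rules[suffix]
    return None


# Hypothetical resolver endpoints, one per region hosting a private zone.
forwarding_rules = {
    "region-a.oci.corp": "10.0.0.53",
    "region-b.oci.corp": "10.1.0.53",
}
```

Because the table is static, every new private zone in one cloud requires a matching rule on the DNS servers of each cloud that needs to resolve it.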

The following example shows private zones in a multicloud disaster recovery architecture:

◉ Database region A: region-a.oci.corp
◉ Example type A record: test.region-a.oci.corp > 10.0.0.1
◉ Database region B: region-b.oci.corp
◉ Example type A record: test.region-b.oci.corp > 20.0.0.1
◉ Frontend region A: oci-dns.corp
◉ Example CNAME record (Production): test.oci-dns.corp > test.region-a.oci.corp
◉ Frontend region B: oci-dns.corp
◉ Example CNAME record (Disaster recovery): test.oci-dns.corp > test.region-b.oci.corp

The objective of defining a production private zone (oci-dns.corp) in a disaster recovery architecture is to keep the applications agnostic of the active region: they always use the same name, and only the CNAME record behind it changes.
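The records in the example can be modeled as a small lookup table to show why the application is unaffected by the switch. This is a toy model rather than a DNS implementation: the `resolve` function and the dictionary layout are assumptions, and the names and addresses are those from the example above.

```python
from typing import Dict, Tuple

# name -> (record type, value); a toy stand-in for the private zones.
Record = Tuple[str, str]


def resolve(name: str, records: Dict[str, Record], max_hops: int = 8) -> str:
    """Follow CNAME records until an A record is reached."""
    for _ in range(max_hops):
        rtype, value = records[name]
        if rtype == "A":
            return value
        name = value  # CNAME: continue with the target name
    raise RuntimeError("CNAME chain too long")


records: Dict[str, Record] = {
    "test.region-a.oci.corp": ("A", "10.0.0.1"),
    "test.region-b.oci.corp": ("A", "20.0.0.1"),
    # Production: the application-facing name points to region A.
    "test.oci-dns.corp": ("CNAME", "test.region-a.oci.corp"),
}

production_ip = resolve("test.oci-dns.corp", records)

# Disaster recovery: repoint the CNAME; the application keeps the same name.
records["test.oci-dns.corp"] = ("CNAME", "test.region-b.oci.corp")
disaster_recovery_ip = resolve("test.oci-dns.corp", records)
```

The application only ever queries test.oci-dns.corp. Before the switch the name resolves to 10.0.0.1, and after the CNAME is repointed it resolves to 20.0.0.1.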

Example multicloud disaster recovery scenario


A common multicloud split-stack scenario is where Amazon Web Services (AWS) hosts the frontend and OCI hosts the database layer. For this scenario, we have three main configurations.

Disaster recovery regions for the frontend and database layers

From a technical point of view, this configuration is ideal. Each frontend cloud region only has network connectivity and private DNS resolution with its paired OCI region: the AWS production frontend with the OCI production database, and the AWS disaster recovery frontend with the OCI disaster recovery database. Each AWS frontend region has a production private zone with CNAME records pointing to the paired region.

To trigger the disaster recovery from a DNS standpoint, you only need to steer the traffic to the relevant frontend layer. In frontend public zones, you can use a manual failover in the DNS traffic steering policy.

Disaster recovery region for the frontend layer

This disaster recovery architecture isn’t ideal but can happen for different technical and commercial reasons.

The diagram depicts the configuration with two regions for the frontend layer and a single region for the database layer. This configuration has important latency concerns because frontend region B can be a long distance from the database cloud region. For this design, you need network connectivity and DNS resolution from both frontend cloud regions to the database region.

To trigger the disaster recovery from a DNS standpoint, you only need to steer the traffic to the relevant frontend region. In frontend public zones, you can do a manual failover in the DNS traffic steering policy.

Disaster recovery region for the database layer

As with the previous configuration, this disaster recovery architecture isn’t ideal.

The diagram depicts the configuration with one frontend cloud region and a primary and disaster recovery region for the database layer. This configuration has important latency concerns because the frontend region can be a long distance from database cloud region B. For this design, you need network connectivity and DNS resolution from the frontend region to both database regions.

To trigger the disaster recovery, you must change the CNAME records defined in the frontend production private zone so that they resolve to the records in database region B.

Source: oracle.com
