
December 2, 2020

AWS VPC Endpoint: The most forgotten AWS architecture component

Introduction:

The idea for this article comes from my daily work with many customers. I have seen many AWS architecture diagrams that do not show VPC Endpoints at all, even though they illustrate connections between a VPC and supported AWS services such as S3 or DynamoDB.

When I ask the designer of such an architecture whether they considered integrating VPC Endpoints, I get the answer that motivated me to write this article.

Should we use VPC Endpoints when illustrating or creating connections between VPCs and supported AWS services or VPC endpoint services powered by AWS PrivateLink?

Before explaining the hidden role of AWS VPC Endpoints with real use cases, let’s begin with some definitions 🙂 

What is a VPC endpoint?

A VPC endpoint enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by AWS PrivateLink without requiring an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. Instances in your VPC do not require public IP addresses to communicate with resources in the service. Traffic between your VPC and the other service does not leave the Amazon network.

The definition of the VPC endpoint already gives a preview of some benefits. Not needing an internet gateway, NAT device, NAT gateway, VPN connection, or AWS Direct Connect connection means lower data transfer and data processing costs.

For instance, you can refer to my last article to see how AWS NAT Gateway costs can climb easily.

Use cases for AWS VPC Endpoints

For these use cases, here are some key concepts for VPC endpoints:

  • VPC endpoint: The entry point in your VPC that enables you to connect privately to a service. The following are the different types of VPC endpoints.
    • Gateway endpoint: to connect VPC resources to S3 or DynamoDB.
    • Interface endpoint: to connect to services powered by AWS PrivateLink: AWS services, services hosted by other AWS customers and partners in their own VPCs (referred to as endpoint services), and supported AWS Marketplace Partner services.
  • Endpoint service: Your own application or service in your VPC. Other AWS principals can create an endpoint from their VPC to your endpoint service.
  • AWS PrivateLink: A technology that provides private connectivity between VPCs and services.

You can find more details in the official AWS VPC Endpoint documentation here.

First Use Case (Standard)

Let’s begin with a standard architecture that shows communication between EC2 instances and an S3 bucket.

Let’s assume we have 2 EC2 instances in the same VPC, one in a private subnet and the other in a public subnet. Each instance sends 100 files of 1 GB every day to an S3 bucket in the same region.

If there is no VPC gateway endpoint for S3, we need to add a NAT gateway (or a NAT instance) to have access to the S3 bucket from the private subnet.

As a result, we get the following architecture:

As the diagram shows, the traffic between the 2 EC2 instances and the S3 bucket crosses the internet network.

Unfortunately, this architecture is not aligned with some pillars of the AWS Well-Architected Framework:

  • Cost optimization: Using a NAT Gateway to connect the EC2 instance in the private subnet generates additional costs: $186 / month (you can find the formula here). Besides, Data Transfer Out to the internet charges may apply.
  • Security: The data crosses the internet network. (Shout out to Security teams)
  • Performance Efficiency: Crossing the internet network adds latency to the requests from EC2 to our famous S3 bucket.
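
As a rough check of the $186 figure above, here is a minimal sketch of the NAT Gateway arithmetic. The rates ($0.05 per hour and $0.05 per GB processed) and the 30-day month are assumptions, chosen because they reproduce the quoted amount; actual rates vary by region. Only the instance in the private subnet goes through the NAT Gateway, so only its 100 GB per day is counted.

```python
from decimal import Decimal

# Assumed rates (not stated in the article): NAT Gateway pricing varies by region.
NAT_HOURLY_RATE = Decimal("0.05")   # USD per NAT Gateway hour
NAT_PER_GB_RATE = Decimal("0.05")   # USD per GB processed


def nat_gateway_monthly_cost(gb_per_day, days=30):
    """Return the monthly NAT Gateway cost in USD (hourly charge + data processing)."""
    hourly_cost = NAT_HOURLY_RATE * 24 * days
    processing_cost = NAT_PER_GB_RATE * gb_per_day * days
    return hourly_cost + processing_cost


# One private instance sending 100 files of 1 GB per day.
monthly_cost = nat_gateway_monthly_cost(gb_per_day=100)
print(f"NAT Gateway: ${monthly_cost:.2f} / month")
```

Under these assumptions, most of the bill comes from data processing, which is exactly the part a gateway endpoint removes for S3 traffic.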

This is where the VPC gateway endpoint for S3 comes in 🙂

Using a VPC endpoint in this architecture avoids these misalignments with the AWS Well-Architected Framework.

This can be done by attaching a VPC gateway endpoint for S3 to our VPC and updating our route tables with a route that has the S3 prefix list as the Destination and the VPC endpoint ID as the Target.
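
To see why this route changes the path of S3 traffic only, here is a small sketch of how a route table picks the most specific matching route. Everything in it is hypothetical: the target IDs are placeholders, and the CIDR blocks are documentation ranges standing in for the entries of the S3 prefix list.

```python
import ipaddress

# Hypothetical values for illustration only: real prefix lists and IDs differ.
S3_PREFIX_LIST = ["198.51.100.0/24", "203.0.113.0/24"]  # stand-ins for the S3 prefix list

ROUTES = [
    ("10.0.0.0/16", "local"),         # traffic inside the VPC
    ("0.0.0.0/0", "nat-0example"),    # default route of the private subnet
] + [(cidr, "vpce-0example") for cidr in S3_PREFIX_LIST]


def pick_target(destination_ip, routes=ROUTES):
    """Return the target of the most specific route matching destination_ip."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix (most specific route) wins.
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


for ip in ("198.51.100.7", "10.0.1.5", "93.184.216.34"):
    print(ip, "->", pick_target(ip))
```

Because the prefix list entries are more specific than the default route, S3 traffic goes to the endpoint while everything else still follows the existing routes.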

To add an additional layer of security, you can modify the endpoint policy that controls the use of the endpoint to access Amazon S3 resources (the default policy allows access by any user or service within the VPC), and also the S3 bucket policy so that the bucket can only be accessed through the VPC endpoint. You can find more details here.
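
The two policies described above could look like the sketch below. The bucket name and endpoint ID are placeholders, and the list of allowed actions is an assumption to adapt to your workload. The bucket policy relies on the `aws:SourceVpce` condition key to reject requests that do not arrive through the endpoint.

```python
import json


def s3_endpoint_policy(bucket_name):
    """Endpoint policy that only allows object reads and writes on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:PutObject"],  # assumed set of actions
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


def s3_bucket_policy(bucket_name, vpce_id):
    """Bucket policy that denies any request not coming through vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder names, not taken from the article.
print(json.dumps(s3_endpoint_policy("my-example-bucket"), indent=2))
print(json.dumps(s3_bucket_policy("my-example-bucket", "vpce-0123456789abcdef0"), indent=2))
```

Keep in mind that a Deny statement like this one also blocks requests made from outside the endpoint, including from the console, so it is worth testing on a non-critical bucket first.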

Finally, you can find below the right architecture, which follows the best practices above for our standard use case:

Note: When you create the VPC endpoint, a new route is added to the route tables of the associated subnets. The new route contains the regional prefix list of the S3 service as the Destination and the VPC endpoint ID as the Target. You can filter the Amazon IP ranges JSON file by region and by service "S3" to find the CIDR blocks behind the entries of the prefix list.
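
The filtering mentioned in the note can be sketched as follows. The sample data is a tiny made-up excerpt that only mimics the shape of the published file (a `prefixes` list whose entries carry `ip_prefix`, `region` and `service`); the CIDR blocks and the region used here are placeholders.

```python
import json

# Made-up excerpt in the shape of ip-ranges.json; the CIDRs are placeholders.
SAMPLE_IP_RANGES = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "198.51.100.0/24", "region": "eu-west-3", "service": "S3"},
    {"ip_prefix": "203.0.113.0/24", "region": "eu-west-3", "service": "EC2"},
    {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "S3"}
  ]
}
""")


def s3_cidrs(ip_ranges, region):
    """Return the IPv4 CIDR blocks listed for the S3 service in the given region."""
    return sorted(
        entry["ip_prefix"]
        for entry in ip_ranges["prefixes"]
        if entry["service"] == "S3" and entry["region"] == region
    )


print(s3_cidrs(SAMPLE_IP_RANGES, "eu-west-3"))
```

To run it against real data, download the file from https://ip-ranges.amazonaws.com/ip-ranges.json and pass the result of `json.load` to `s3_cidrs` instead of the sample.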

Second Use Case (Serverless Architecture)

Serverless Architecture does not mean there are no networking aspects 🙂 

For this use case, I will show how networking, and VPC Endpoints in particular, apply to an example of Serverless Architecture.

Let’s assume we have a typical serverless data architecture, with AWS Lambda ingesting data from 3 external APIs, triggered by Amazon EventBridge every hour. The ingested data is cleaned and prepared by Lambda and pushed into an S3 bucket. Then, AWS Glue extracts the data from the S3 bucket, transforms it following specific business rules, and loads it into Aurora Serverless. (By the way, while I was writing this article, AWS launched Amazon Aurora Serverless v2, which provides the ability to scale database workloads to hundreds of thousands of transactions in a fraction of a second.)

As we can see, this serverless data architecture describes the data flow between the data sources and the target database well. However, it does not show the networking aspects behind the scenes.

If you create these resources without specifying a VPC, you will not have any control over your data in transit!

As I said at the beginning of this article, a VPC endpoint is the entry point in your VPC that enables you to connect privately to a service. So, to secure communication between our VPC and the serverless services, we need to attach a VPC endpoint for each service.

As a result, if we assume that we are managing sensitive data, we need to launch our Lambda function in a private subnet of our VPC and create the following VPC endpoints:

  • Gateway endpoint for S3
  • Interface endpoint for AWS Glue
  • Interface endpoint for Aurora Serverless database
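
Following the key concepts given earlier, the endpoint type for a service can be derived mechanically. The sketch below is only illustrative: the region is a placeholder, the `com.amazonaws.<region>.<service>` naming is the usual pattern for AWS-managed endpoint services, and the database endpoint is left out because the article does not name its service.

```python
# Per the definitions earlier in the article: S3 and DynamoDB use gateway
# endpoints, other supported services use interface endpoints (AWS PrivateLink).
GATEWAY_SERVICES = {"s3", "dynamodb"}


def endpoint_spec(service, region):
    """Return the endpoint type and the endpoint service name for an AWS service."""
    endpoint_type = "Gateway" if service in GATEWAY_SERVICES else "Interface"
    return {
        "type": endpoint_type,
        "service_name": f"com.amazonaws.{region}.{service}",
    }


REGION = "eu-west-3"  # placeholder region, not taken from the article
for service in ("s3", "glue"):
    print(endpoint_spec(service, REGION))
```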

The diagram below shows the networking view of our data Serverless Architecture: 

As a reminder, an interface endpoint is an elastic network interface with a private IP address from the IP address range of your subnet. It serves as an entry point for traffic destined to a supported AWS service or a VPC endpoint service.

Third Use Case (service powered by AWS PrivateLink)

For this use case, and to illustrate how to connect to a third-party service through AWS PrivateLink, I chose Datadog.

Datadog is an American monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform.

I will not go deep into Datadog features. So let’s assume we need to push logs from a fleet of EC2 instances to Datadog via the Datadog Agent and forward all logs in CloudWatch to Datadog.

To reach the Datadog URL endpoint, our logs need to cross the internet network. So we need to route the log traffic of instances in a private subnet to a NAT Gateway and the log traffic of instances in a public subnet to an Internet Gateway.

And to forward logs in CloudWatch to Datadog, we need to use the Lambda Forwarder feature (documentation here).

The diagram below shows the implementation of these 2 scenarios in the real world:

By using Datadog AWS PrivateLink, as illustrated in the diagram below, you can connect your VPC to the Datadog URL endpoint via a VPC endpoint. You can check the official Datadog documentation for instructions here. This means our log traffic will not cross the internet network, and the NAT Gateway processing charges will drop dramatically.

Unfortunately, Datadog AWS PrivateLink exists only in the us-east-1 region for the moment.

Conclusion:

Through these 3 different use cases, I tried to explain the role and the advantages of using VPC Endpoints, especially for Serverless architectures.

AWS recommends following the 5 pillars of the Well-Architected Framework when designing architectures:

  • Operational Excellence
  • Security
  • Reliability
  • Performance Efficiency
  • Cost Optimization

Following the 3 examples in this article, you can deduce that using VPC Endpoints is one of the best practices and that it is 100% aligned with all 5 pillars.

Posted in AWS, Networking