Low latency global data ingestion with AWS Lambda

DevOps
by Michal Niec

Recently I encountered an interesting problem. As part of one of our projects we needed to create a low-latency global data ingestion API. The project is developed on the AWS platform, so we decided to use the AWS-native tool for serverless computing – AWS Lambda triggered by AWS API Gateway. Our data ingestion app is expected to do two things – validate the received JSON and put it into AWS Kinesis Firehose for further processing. One of the requirements is to respond to the end user with a 200 HTTP code as fast as possible, no matter whether the payload is valid or not. So our initial setup looks like this:


[Diagram: initial setup – API Gateway invoking the injection lambda, which triggers the validation lambda]

We’ve used an edge-optimized API Gateway endpoint, which means API requests are routed to the nearest CloudFront Point of Presence. We’ve created two lambdas – the first one, the injection lambda, just triggers the other lambda, where validation and further processing happen, and immediately returns 200 to the end user. This is what the injection lambda Python code looks like:


import json
import logging

import boto3

logger = logging.getLogger()

def click(event, context):
   lambda_client = boto3.client('lambda')
   try:
      invoke_lambda(event, lambda_client, 'validation')
   finally:
      # Always return 200, even if the asynchronous invocation failed.
      return response()

def invoke_lambda(event, lambda_client, lambda_name):
   try:
      lambda_client.invoke(
         FunctionName=lambda_name,
         InvocationType='Event',
         Payload=json.dumps(event)
      )
   except Exception as e:
      logger.exception("Problems with invoking validation lambda: {0}".format(e))

def response():
   return {
      'statusCode': 200,
      'headers': {
         'Content-Type': 'application/json',
         'Access-Control-Allow-Origin': '*'
      }
   }
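For completeness, the validation lambda is not shown in the post, but it might look roughly like this – a sketch assuming a minimal "is it a JSON object" check and a hypothetical Firehose delivery stream name (`ingestion-stream` is my placeholder, not from the original setup):

```python
import json
import logging

logger = logging.getLogger()

# Hypothetical delivery stream name -- substitute your own.
STREAM_NAME = 'ingestion-stream'

def is_valid(raw_payload):
   """Return the parsed payload if it is a JSON object, else None."""
   try:
      payload = json.loads(raw_payload)
   except (TypeError, ValueError):
      return None
   return payload if isinstance(payload, dict) else None

def validate(event, context):
   payload = is_valid(event.get('body'))
   if payload is None:
      logger.warning("Dropping invalid payload")
      return
   # boto3 is available in the Lambda runtime; it is imported lazily here
   # so the pure validation logic above can be exercised without AWS.
   import boto3
   firehose = boto3.client('firehose')
   firehose.put_record(
      DeliveryStreamName=STREAM_NAME,
      Record={'Data': json.dumps(payload) + '\n'}
   )
```

Since the injection lambda already returned 200 to the user, this function can take its time – its duration does not affect the latency the end user sees.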


We’ve deployed our infrastructure and waited for requests to start coming.

First results

What we observed was that the latency was not so good… The API Gateway integration latency, which is the time between API Gateway receiving the request and sending the response, was around 74 milliseconds. To get the latency the end user observes on their side you need to add the user’s latency to the nearest edge location, which in my case is about 20 ms. So we had 94 ms of latency, which was not as good as we expected.

I dug a little deeper and noticed that the injection lambda execution time is around 54 ms, so API Gateway adds 20 ms of latency. Why? API Gateway uses SSL, and you can assume that those 20 ms are the time needed for the SSL handshake. Furthermore, API Gateway forces all backend components to use SSL, so, for example, if you want to connect to an RDS or EC2 instance, another handshake is needed.

Lambda@Edge

So it was time to try a different approach – Lambda@Edge. Lambda@Edge lets you respond to the end user faster – it runs at the edge location. My initial setup was just an empty lambda triggered by a CloudFront call. Of course we want to keep SSL, so we will not bypass the 20 ms latency of the CloudFront handshake. But Lambda@Edge responds quickly, in under 1 ms. This gives 21 ms integration latency.


Now it is time to call our validation lambda from Lambda@Edge. And the result is not as good as you might expect: the injection lambda duration grows to 80 ms. As you remember, in our initial setup the lambda duration was around 54 ms. Why did the duration increase? Lambda@Edge is designed to respond to the end user more quickly than a traditional lambda. But take a look at the diagram below. As you can see, the injection lambda runs at the edge location, but the validation lambda runs in an AWS datacenter, which is physically separate from the edge location. That means AWS needs to open an SSL connection between the edge location and the datacenter, which, as you know, takes time. You must also keep in mind that Lambda@Edge supports only the Node.js runtime for now, which can affect duration as well.


[Diagram: Lambda@Edge setup – injection lambda at the edge location calling the validation lambda in an AWS datacenter]

While watching the CloudWatch logs I noticed one more interesting thing: the injection lambda runs almost twice as long as the validation lambda. And the difference between the injection and validation lambdas, besides some processing, is the service that is called.


[Screenshot: CloudWatch logs comparing injection and validation lambda durations]

So my next step was to put the data directly into AWS Kinesis Firehose from the injection Lambda@Edge. The duration drops to around 30 ms, which is a lot better than the 54 ms in the initial setup. But still, within those 30 ms there is ~20 ms of added latency, because you need to connect from the edge location to the datacenter to put the data into Firehose.

Final setup

Maybe Lambda@Edge is not the solution in our case? Let’s try one more thing. This time, instead of running the lambda at the edge location, let’s return to our initial setup with API Gateway, but instead of chaining lambda functions, put the data directly into Firehose. That way our injection lambda duration drops to 11 ms, which gives around 31 ms API Gateway integration latency.
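The final injection lambda could be sketched like this – the Firehose `put_record` call replaces the Lambda-to-Lambda invocation from the initial setup (the stream name is a hypothetical placeholder):

```python
import json
import logging

logger = logging.getLogger()

STREAM_NAME = 'ingestion-stream'  # hypothetical delivery stream name

def response():
   return {
      'statusCode': 200,
      'headers': {
         'Content-Type': 'application/json',
         'Access-Control-Allow-Origin': '*'
      }
   }

def click(event, context):
   try:
      # boto3 ships with the Lambda runtime; imported lazily so the
      # handler degrades gracefully outside AWS.
      import boto3
      firehose = boto3.client('firehose')
      firehose.put_record(
         DeliveryStreamName=STREAM_NAME,
         Record={'Data': json.dumps(event) + '\n'}
      )
   except Exception:
      logger.exception("Failed to put record into Firehose")
   finally:
      # As before, always return 200 to the end user.
      return response()
```

One service call instead of a Lambda invocation, and no second function on the latency-critical path.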

The validation lambda can now be run on batches of records collected in AWS Kinesis Firehose.
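One way to do this is a Kinesis Firehose data-transformation lambda, which Firehose invokes with a batch of base64-encoded records and which marks each record `Ok` or `Dropped` – a minimal sketch of the idea, assuming the same "is it a JSON object" validation rule as before:

```python
import base64
import json

def is_valid(raw):
   """True if the decoded bytes parse as a JSON object."""
   try:
      return isinstance(json.loads(raw), dict)
   except (TypeError, ValueError):
      return False

def transform(event, context):
   """Firehose data-transformation handler: keep valid records, drop the rest."""
   output = []
   for record in event['records']:
      data = base64.b64decode(record['data'])
      output.append({
         'recordId': record['recordId'],
         'result': 'Ok' if is_valid(data) else 'Dropped',
         'data': record['data']  # pass the original payload through unchanged
      })
   return {'records': output}
```

Because Firehose hands the function a whole buffered batch, the per-record validation cost disappears from the ingestion path entirely.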

There is one more thing that can be done to reduce the injection lambda duration. So far we had been using Python. We decided to move to a compiled language – our choice was Go.


[Diagram: final setup – API Gateway invoking the injection lambda, which puts data directly into Kinesis Firehose]

You can see the results in the graph below. Guess when we deployed our final setup 🙂


[Graph: injection lambda latency over time, before and after deploying the final setup]

