Data Wrangling in the Cloud — Part II — Configuring Serverless Lambda Functions to Write Data to DynamoDB

Kelly “Scott” Sims
6 min read · Dec 15, 2020

tl;dr In this section, we are going to create and configure an AWS Lambda function to automatically write processed data to DynamoDB for us. A standalone script in Python (or any other language) can be substituted for this step if you don’t want to use Lambda. See PART V if you wish to create a custom ETL script.

Ultimately we are going to use AWS EMR Spark to preprocess the data. But after that, we need some way to get it to the DB. Unfortunately, there is no easy method for writing from EMR directly to DynamoDB. So one of the two methods we are going to use is a Lambda function. First, we need to create a table in DynamoDB. Navigate to that resource from your console. Once there, you should be greeted by this (if you’ve never created a table before).

Configure DynamoDB

Click “Create table”. I’m going to name my table “articles” and set the primary key to be a feature called “year”. I’m also going to add a sort key called “uid”. It’s out of scope for this series to go into full detail on designing a NoSQL DB, but there are great RESOURCES that more than cover the topic for DynamoDB. I’ll cover why I chose “year” and “uid” as the keys at a later time.
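If you would rather script this than click through the console, the same table can be described for boto3's `create_table` call. This is only a sketch: the attribute types (`year` as a number, `uid` as a string) are my assumption, since they depend on your data, and the capacity values are left to the caller.

```python
def articles_table_definition(read_capacity, write_capacity):
    """Build the keyword arguments for DynamoDB's create_table call."""
    return {
        "TableName": "articles",
        "KeySchema": [
            {"AttributeName": "year", "KeyType": "HASH"},   # partition (primary) key
            {"AttributeName": "uid", "KeyType": "RANGE"},   # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "year", "AttributeType": "N"},  # assumed numeric
            {"AttributeName": "uid", "AttributeType": "S"},   # assumed string
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_capacity,
            "WriteCapacityUnits": write_capacity,
        },
    }


def create_articles_table(client, read_capacity, write_capacity):
    """Create the table using a boto3 DynamoDB client passed in by the caller."""
    return client.create_table(
        **articles_table_definition(read_capacity, write_capacity)
    )


# Usage (requires AWS credentials; the capacity values here are placeholders):
# import boto3
# create_articles_table(boto3.client("dynamodb"), read_capacity=5, write_capacity=5)
```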

Initially, we will set the Write capacity units to 1000. This is needed so the Lambda functions don’t time out waiting for the capability to write to the table. Don’t freak out when you see the cost for this. You can dial it back down to 1 after we are done writing the data.
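Dialing the capacity up and back down can also be done from code. Here is a minimal sketch using boto3's `update_table`; the helper names are hypothetical, and the read capacity of 5 in the usage lines is a placeholder I picked, because the call needs both read and write values.

```python
def throughput_update(table_name, write_units, read_units):
    """Build the keyword arguments for DynamoDB's update_table call."""
    return {
        "TableName": table_name,
        # update_table expects both values, even if only one changes.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


def set_write_capacity(client, table_name, write_units, read_units):
    """Apply new provisioned throughput through a boto3 DynamoDB client."""
    return client.update_table(
        **throughput_update(table_name, write_units, read_units)
    )


# Usage (requires AWS credentials):
# import boto3
# client = boto3.client("dynamodb")
# set_write_capacity(client, "articles", write_units=1000, read_units=5)  # before loading
# set_write_capacity(client, "articles", write_units=1, read_units=5)     # after loading
```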

Configure IAM User Role

Next, we need to create a role in order to give our Lambda function access to S3 and DynamoDB. Once data is processed in the Spark cluster, we will write it back to S3. This is why Lambda needs access.

Log into the AWS Console, go to Identity and Access Management (IAM), and click “Roles” on the left-hand side.

Click “create role”. In the next screen, leave the default AWS Service highlighted, and select Lambda, then click “Next: Permissions” at the bottom right

We want to give our Lambda function full access to S3, DynamoDB, and CloudWatch. Search the policies for AmazonS3FullAccess, AmazonDynamoDBFullAccess, and AWSOpsWorksCloudWatchLogs.

Click next until you get to the screen where you give your role a name. Name it whatever you want, then click create.
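For reference, the console steps above correspond roughly to the following boto3 calls. Treat it as a sketch under assumptions: the policy ARNs are built from the managed policy names used above, the role name is whatever you choose, and `create_lambda_role` is a hypothetical helper name.

```python
import json

# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies matching the three selected in the console.
POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
    "arn:aws:iam::aws:policy/AWSOpsWorksCloudWatchLogs",
]


def create_lambda_role(iam, role_name):
    """Create the role, attach the managed policies, and return the role ARN."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    for policy_arn in POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]


# Usage (requires AWS credentials; the role name is a placeholder):
# import boto3
# create_lambda_role(boto3.client("iam"), "my-lambda-role")
```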

Configuring Lambda

After the role is created, search for and navigate to Lambda from the console. Then click “Create function”.

Give your function a name, select Python as the runtime, then under execution role, select the role we just created.

Once your function is created you should see the following

We need our Lambda function to trigger when data lands in our S3 bucket. So select “Add trigger” and add the S3 trigger. Select the bucket the data is going to be written to. In my case, it is largedata bucket. We want the event type to be PUT. I’m also planning on creating a prefix in the bucket called clean/, so enter that as the prefix. All the files will be csv, so we can add the suffix .csv. Then click “Add” at the bottom right.
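The trigger settings map onto an S3 bucket notification configuration. A minimal sketch of that structure is below; the function ARN and bucket name in the usage lines are placeholders. Note that when you set this up outside the console, S3 also needs permission to invoke the function, which the console adds for you.

```python
def s3_trigger_configuration(lambda_arn, prefix="clean/", suffix=".csv"):
    """Build a notification configuration that fires on PUTs of matching keys."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:Put"],  # the PUT event type
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


# Usage (requires AWS credentials; both values below are placeholders):
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="your-bucket-name",
#     NotificationConfiguration=s3_trigger_configuration("your-lambda-function-arn"),
# )
```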

Configuring Lambda Tests

We need a way to test that our Lambda trigger is working while developing it. I’ve created a test.csv file that has just a few rows of data. I’ve saved it in my S3 bucket at the location largedatabuckt/clean/test.csv.

In the upper right hand corner, you should see something similar to the image below

Click the drop down and search for “Amazon S3 Put”

It should then show you the trigger JSON that is sent to Lambda. Under the “s3” key, change the bucket “name” to the bucket your data is going to be written to. Under the “object” key, change “key” to the name of the test file you want parsed during testing. As we can see below, I’ve set mine to the full path of my test file: “clean/test.csv”.
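The console template contains many more fields, but the two you edit sit in the part of the event sketched below. This is a trimmed illustration rather than the full template, and the bucket name in the usage line is a placeholder.

```python
import json


def s3_put_test_event(bucket, key):
    """Build a trimmed S3 Put event holding only the fields a handler usually reads."""
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket},  # bucket the file was written to
                    "object": {"key": key},      # full path of the file in the bucket
                },
            }
        ]
    }


# Usage: print the event as JSON, ready to paste into the test event editor.
# print(json.dumps(s3_put_test_event("your-bucket-name", "clean/test.csv"), indent=2))
```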

The Code

Below is the actual Python code that, once triggered, will get the csv file, parse it, and write it to the proper DynamoDB table. The comments in the code are hopefully self-explanatory. Shoot me a message if you would like further clarification. There’s also a link to a YouTube video by NKT Studios that I referred to when trying this out the first time.
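If you don't have the embedded code at hand, a handler along these lines is one way to do it. It is a minimal sketch under assumptions, not a copy of the original code: it assumes the csv has a header row containing “year” and “uid” columns, that “year” is stored as a number, and that the table is named “articles”. The optional `s3` and `table` parameters are my addition so the function can be exercised without AWS.

```python
import csv
import io
from urllib.parse import unquote_plus


def rows_from_csv(text):
    """Parse csv text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def to_item(row):
    """Turn a csv row into a DynamoDB item; 'year' is assumed to be numeric."""
    item = dict(row)
    item["year"] = int(row["year"])
    return item


def lambda_handler(event, context, s3=None, table=None):
    # Create the AWS clients lazily so tests can inject stand-ins.
    if s3 is None or table is None:
        import boto3
        s3 = s3 or boto3.client("s3")
        table = table or boto3.resource("dynamodb").Table("articles")

    written = 0
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events.
        key = unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # batch_writer groups the puts into batched write requests.
        with table.batch_writer() as batch:
            for row in rows_from_csv(body):
                batch.put_item(Item=to_item(row))
                written += 1

    return {"statusCode": 200, "body": f"wrote {written} items"}
```

Batching the writes keeps the number of requests down, which matters when the function has a limited time to run.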

Once you have saved your code, you can click the “test” button in the upper right corner. If everything is successful, you should see:

If we navigate over to our DynamoDB table, we should now see data written to it.

Possible “Gotcha”
