Build and deploy a custom AWS layer using Docker and the Serverless Framework
Though popular frameworks are usually available as public AWS Lambda layers, the time may come when you wish to build your own layer instead. There could be many reasons for this: you may need a less popular library, you may wish to optimise an existing one, or you may simply need to ship some custom static data. Building AWS layers requires some skill, and you should expect to jump through a few hoops in the process, but nowadays the procedure is not as difficult as one might expect. Let's consider, as an example, how to build and deploy the NLTK library together with the WordNet dictionary dataset.
Basically, to create an AWS layer you need to zip your files into the appropriate directory (it could be nodejs, or python, or an arbitrary subdirectory for data). As the Serverless Framework comes with the ability to build custom layers, this step can be automated. The only remaining problem, packaging the correct binaries while developing under a different OS, is easily solved with Docker. In this post I use the lambci/lambda Docker images to both build and test an NLTK layer.
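Before automating anything, it helps to see the layout a layer archive is expected to have. The sketch below mirrors what the Serverless Framework will package for us later (the layer/ parent directory name is an arbitrary choice of mine):

```shell
# sketch of the directory layout a layer archive must follow;
# the contents of layer/ become /opt inside the Lambda environment
mkdir -p layer/python      # Python packages end up under /opt/python
mkdir -p layer/nltk_data   # custom data ends up under /opt/nltk_data
# zipping the contents of layer/ would give a deployable archive, e.g.:
# (cd layer && zip -r ../nltk-layer.zip .)
```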
So the first step is to build the layer in a Docker container. For this, one may wish to install the NLTK package into a separate Python virtual environment, to use it later for downloading the data and for generating a requirements.txt file with the correct package dependencies. The requirements.txt file is dumped with the pip freeze > requirements.txt command. Using the pip install -r requirements.txt -t /opt/python command (where the -t option stands for target), all the required packages can then be installed into the /opt/python directory (where they reside in the AWS Lambda function environment).
The WordNet dataset can be downloaded to the /opt/nltk_data subdirectory using the nltk.downloader -d /opt/nltk_data wordnet command. The whole procedure is quite straightforward and can be tried out in an interactive bash session in Docker, where the commands are entered manually. The only caveat is that in a Dockerfile, when working with virtual environments, we cannot use the standard activate command that is normally used at the command prompt to adjust the PATH variable. In Docker, commands are executed one by one, outside any bash session, so the environment should be adjusted explicitly (using ENV PATH /opt/env/bin:/var/task/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin). Our Dockerfile will finally look like this:
# Dockerfile to create nltk layer with wordnet dataset
FROM lambci/lambda:build-python3.7
RUN rm -rf /opt/* && mkdir -p /opt/nltk_data
RUN python3 -m venv /opt/env
ENV PATH /opt/env/bin:/var/task/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
RUN python3 -m pip install --upgrade pip && python3 -m pip install nltk
RUN python3 -m nltk.downloader -d /opt/nltk_data wordnet
RUN pip freeze > requirements.txt
RUN pip install -r requirements.txt -t /opt/python
RUN rm -rf /opt/env
After the image is built, the docker cp command can be used to copy the layer to the working directory. The whole process can be automated with a build.sh file (the container is started with a dummy false command, only so that files can be copied out of its filesystem):
docker build -t nltk-lambda-layer -f Dockerfile .
docker run -d --name nltk-layer nltk-lambda-layer false
docker cp nltk-layer:/opt ./layer
docker cp nltk-layer:/var/task/requirements.txt .
docker rm nltk-layer
Finally, create a simple serverless.yml configuration file, in which the location of the layer is specified:
# serverless.yml
service: python-nltk-layer
provider:
  name: aws
layers:
  nltk:
    path: layer
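A function in another service can then attach the deployed layer by referencing the CloudFormation output that the Serverless Framework exports for it. The sketch below is illustrative: the output name NltkLambdaLayerQualifiedArn follows the framework's TitleCased-layer-name convention, and the service/stage names are assumptions.

```yaml
# serverless.yml of a consuming service (illustrative sketch)
service: synonyms-api
provider:
  name: aws
  runtime: python3.7
functions:
  synonyms:
    handler: handler.synonyms
    environment:
      NLTK_DATA: /opt/nltk_data
    layers:
      - ${cf:python-nltk-layer-dev.NltkLambdaLayerQualifiedArn}
    events:
      - http:
          path: synonyms/{word}
          method: get
```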
After deploying the layer to AWS, we may start using it immediately. To test that it is working, let's create a simple HTTP AWS endpoint that returns word synonyms. To make the WordNet dataset available, its location (/opt/nltk_data) should be specified in the NLTK_DATA environment variable. The example code can be deployed to AWS and tested using curl; alternatively, the handler can be invoked locally in Docker:
# to test deployed AWS endpoint
curl https://XXXXX.execute-api.us-east-1.amazonaws.com/dev/synonyms/active
["participating", "active_voice", "alive", "active_agent", "combat-ready", "fighting", "active", "dynamic"]

# alternatively, invoke it locally
docker run --rm -e NLTK_DATA=/opt/nltk_data \
  -v "$PWD":/var/task -v "$PWD"/../layer:/opt \
  lambci/lambda:python3.7 \
  handler.synonyms '{"pathParameters":{"word":"active"}}'
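The synonyms handler itself is not listed in this post; a minimal sketch of it could look like the following (the collect_synonyms helper is my own addition for illustration, and wordnet is imported lazily so the module can be loaded even where NLTK is not installed):

```python
# handler.py - hypothetical sketch of the synonyms endpoint
import json


def collect_synonyms(synsets):
    # gather the unique lemma names across all synsets of a word
    return sorted({lemma.name() for synset in synsets for lemma in synset.lemmas()})


def synonyms(event, context):
    # wordnet is provided by the layer and located via NLTK_DATA
    from nltk.corpus import wordnet
    word = event["pathParameters"]["word"]
    return {
        "statusCode": 200,
        "body": json.dumps(collect_synonyms(wordnet.synsets(word))),
    }
```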
See the source code on GitHub for details.