Build and deploy a custom AWS layer using Docker and the Serverless Framework
Though popular frameworks are usually available as public AWS Lambda layers, the time may come when you wish to build your own layer instead. There could be many reasons for this: you may need a less popular library, you may wish to optimise an existing one, or you may simply need to ship some custom static data. Building AWS layers requires some skill, and you should expect to jump through a few hoops in the process, but nowadays the procedure is not as difficult as one might expect. Let's consider, as an example, how to build and deploy the NLTK library together with the WordNet dictionary dataset.
Basically, to create an AWS layer you need to zip your files into the appropriate directory (it could be nodejs, or python, or an arbitrary subdirectory for data). As the Serverless Framework comes with the ability to build custom layers, this step can be automated. The only remaining problem, packaging the correct binaries while developing under a different OS, is easily solved with Docker. In this post I use the lambci/lambda Docker images to both build and test an NLTK layer.
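Before automating anything, it helps to see the layout a layer archive is expected to have. The sketch below mirrors what the Serverless Framework will package for us later (the layer/ parent directory name is an arbitrary choice of mine):

```shell
# sketch of the directory layout a layer archive must follow;
# the contents of layer/ become /opt inside the Lambda environment
mkdir -p layer/python      # Python packages end up under /opt/python
mkdir -p layer/nltk_data   # custom data ends up under /opt/nltk_data
# zipping the contents of layer/ would give a deployable archive, e.g.:
# (cd layer && zip -r ../nltk-layer.zip .)
```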
So the first step is to build the layer in a Docker container. For this, one may wish to install the NLTK package into a separate Python virtual environment, to use it later for downloading the data and for generating a requirements.txt file with the correct package dependencies. The requirements.txt file is dumped with the pip freeze > requirements.txt command. Using the pip install -r requirements.txt -t /opt/python command (where the -t option stands for target), all the required packages can then be installed into the /opt/python directory (where they reside in the AWS Lambda function environment).
The WordNet dataset can be downloaded to the /opt/nltk_data subdirectory using the nltk.downloader -d /opt/nltk_data wordnet command. The whole procedure is quite straightforward and can be tried out in an interactive bash session in Docker, where the commands are entered manually. The only caveat is that in a Dockerfile, when working with virtual environments, we cannot use the standard activate command that is normally used at the command prompt to adjust the PATH variable. In Docker, commands are executed one by one, outside any bash session, so the environment should be adjusted explicitly (using ENV PATH /opt/env/bin:/var/task/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin). Our Dockerfile will finally look like this:
# Dockerfile to create nltk layer with wordnet dataset
FROM lambci/lambda:build-python3.7
RUN rm -rf /opt/* && mkdir -p /opt/nltk_data
RUN python3 -m venv /opt/env
ENV PATH /opt/env/bin:/var/task/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
RUN python3 -m pip install --upgrade pip && python3 -m pip install nltk
RUN python3 -m nltk.downloader -d /opt/nltk_data wordnet
RUN pip freeze > requirements.txt
RUN pip install -r requirements.txt -t /opt/python
RUN rm -rf /opt/env
After the image is built, the docker cp command can be used to copy the layer to the working directory. The whole process can be automated with a build.sh file (the container is started with a dummy false command, only so that files can be copied out of its filesystem):
docker build -t nltk-lambda-layer -f Dockerfile .
docker run -d --name nltk-layer nltk-lambda-layer false
docker cp nltk-layer:/opt ./layer
docker cp nltk-layer:/var/task/requirements.txt .
docker rm nltk-layer
Finally, create a simple serverless.yml configuration file, in which the location of the layer is specified:
# serverless.yml
service: python-nltk-layer
provider:
  name: aws
layers:
  nltk:
    path: layer
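A function in another service can then attach the deployed layer by referencing the CloudFormation output that the Serverless Framework exports for it. The sketch below is illustrative: the output name NltkLambdaLayerQualifiedArn follows the framework's TitleCased-layer-name convention, and the service/stage names are assumptions.

```yaml
# serverless.yml of a consuming service (illustrative sketch)
service: synonyms-api
provider:
  name: aws
  runtime: python3.7
functions:
  synonyms:
    handler: handler.synonyms
    environment:
      NLTK_DATA: /opt/nltk_data
    layers:
      - ${cf:python-nltk-layer-dev.NltkLambdaLayerQualifiedArn}
    events:
      - http:
          path: synonyms/{word}
          method: get
```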
After deploying the layer to AWS, we may start using it immediately. To test that it is working, let's create a simple HTTP AWS endpoint that returns word synonyms. To make the WordNet dataset available, its location (/opt/nltk_data) should be specified in the NLTK_DATA environment variable. The example code can be deployed to AWS and tested using curl; alternatively, the handler can be invoked locally in Docker:
# to test deployed AWS endpoint
curl https://XXXXX.execute-api.us-east-1.amazonaws.com/dev/synonyms/active
["participating", "active_voice", "alive", "active_agent", "combat-ready", "fighting", "active", "dynamic"]

# alternatively, invoke it locally
docker run --rm -e NLTK_DATA=/opt/nltk_data \
  -v "$PWD":/var/task -v "$PWD"/../layer:/opt \
  lambci/lambda:python3.7 \
  handler.synonyms '{"pathParameters":{"word":"active"}}'
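The synonyms handler itself is not listed in this post; a minimal sketch of it could look like the following (the collect_synonyms helper is my own addition for illustration, and wordnet is imported lazily so the module can be loaded even where NLTK is not installed):

```python
# handler.py - hypothetical sketch of the synonyms endpoint
import json


def collect_synonyms(synsets):
    # gather the unique lemma names across all synsets of a word
    return sorted({lemma.name() for synset in synsets for lemma in synset.lemmas()})


def synonyms(event, context):
    # wordnet is provided by the layer and located via NLTK_DATA
    from nltk.corpus import wordnet
    word = event["pathParameters"]["word"]
    return {
        "statusCode": 200,
        "body": json.dumps(collect_synonyms(wordnet.synsets(word))),
    }
```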
See the source code on GitHub for details.