Optimized Theano & Keras on AWS Lambda

I read this amazing blog post about running Keras + Theano models on AWS Lambda. However, these models have very high latency on Lambda because Theano relies on GCC to compile optimized code for a model, and GCC is not available in Lambda. This post explains how to make GCC available in Lambda and run fully optimized models. You can expect roughly an 8x boost in performance.
I expect readers to have basic familiarity with AWS Lambda, EC2, Keras, and Theano.
Setup Keras & Theano on AWS Lambda
At the end of this section we will have a toy model running. The basic idea is to set up all dependencies on an EC2 instance running the AWS Lambda AMI and then create a Lambda deployment package from those dependencies. Building on the Lambda AMI keeps the deployment consistent with the Lambda function execution environment.
Start by launching an EC2 instance with the appropriate AMI. Get the AMI details from this page. I used a c4.4xlarge instance. Log in to the instance and do the following:
Create a basic handler for testing.
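A minimal sketch of what such a test handler could look like. The toy two-layer model, the `rows` event field, and the names `build_predictor` and `handler` are illustrative assumptions, not a fixed API. The model is built once and cached so that warm invocations only pay for the prediction itself.

```python
import time

_PREDICTOR = None  # cached so warm invocations skip model construction


def build_predictor():
    """Build a toy Keras model and return a function mapping rows to predictions."""
    # Imported lazily so the module still loads where Keras is not installed.
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Assumption: a tiny dense network with 4 input features, purely for testing.
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer="sgd", loss="binary_crossentropy")

    def predict(rows):
        return model.predict(np.asarray(rows, dtype="float32")).tolist()

    return predict


def handler(event, context, predictor_factory=build_predictor):
    """Lambda entry point: run one prediction and report how long it took."""
    global _PREDICTOR
    if _PREDICTOR is None:
        _PREDICTOR = predictor_factory()
    rows = event.get("rows", [[0.0, 0.0, 0.0, 0.0]])
    start = time.time()
    predictions = _PREDICTOR(rows)
    latency_ms = (time.time() - start) * 1000.0
    return {"predictions": predictions, "latency_ms": latency_ms}
```

The `predictor_factory` argument exists only so the handler can be exercised locally with a stub; Lambda itself calls `handler(event, context)`.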
Copy this handler to the deployment folder from the previous step and get the Lambda deployment payload ready.
# Add the following environment variables to Lambda function setup.
THEANO_FLAGS=base_compiledir=/tmp/.theano
KERAS_BACKEND=theano
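Both variables are read when `theano` and `keras` are imported, and `/tmp` is the only writable location inside a Lambda container, which is why the compile directory points there. If you want a safety net in code as well, a sketch like the following applies the same defaults when the Lambda configuration leaves them out. `configure_env` is a made-up helper name, and it would have to run before the first Keras or Theano import.

```python
import os


def configure_env(environ=os.environ):
    """Fill in the Theano/Keras settings only if they are not already set."""
    # setdefault keeps any value already supplied by the Lambda configuration.
    environ.setdefault("THEANO_FLAGS", "base_compiledir=/tmp/.theano")
    environ.setdefault("KERAS_BACKEND", "theano")
    return environ["THEANO_FLAGS"], environ["KERAS_BACKEND"]
```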
If the Lambda function is set up correctly, the handler code will work. The latency will be around 16 ms.
Setup GCC
Next, let us make GCC available.
Change the Lambda function handler to download the GCC payload and bootstrap it.
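One way to sketch the download-and-bootstrap step, assuming the payload is a gzipped tarball stored in S3 whose top-level directory is `gcc/`, so that unpacking it into `/tmp` yields `/tmp/gcc`. The function names, the marker file, and the bucket and key are placeholders of mine. The marker file makes the step a no-op on warm invocations.

```python
import os
import tarfile


def fetch_from_s3(bucket, key, local_path):
    """Download one object from S3 (boto3 is available in the Lambda runtime)."""
    import boto3
    boto3.client("s3").download_file(bucket, key, local_path)


def bootstrap_payload(bucket, key, dest_dir, marker, fetch=fetch_from_s3):
    """Download and unpack a tarball once per container; return True if work was done."""
    marker_path = os.path.join(dest_dir, marker)
    if os.path.exists(marker_path):
        return False  # already unpacked by an earlier invocation
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    archive = os.path.join(dest_dir, os.path.basename(key))
    fetch(bucket, key, archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)
    os.remove(archive)  # free space in /tmp, which is limited
    open(marker_path, "w").close()
    return True
```

In the handler module this could be called at import time, for example `bootstrap_payload("my-bucket", "gcc.tar.gz", "/tmp", ".gcc_ready")` with your own bucket and key, before Theano is imported.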
# Add the following environment variables to the Lambda function
GCC_EXEC_PREFIX=/tmp/gcc/lib/gcc/
THEANO_FLAGS=base_compiledir=/tmp/.theano,cxx=/tmp/gcc/bin/g++
At this point, the Lambda function invocation should fail, with GCC complaining that header files or static/dynamic libraries cannot be found.
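To tell this expected failure apart from a bootstrap that never ran, it can help to check that the compiler named in `THEANO_FLAGS` is actually on disk. The following is a small diagnostic sketch; `parse_theano_flags` and `compiler_status` are hypothetical helpers of mine, not part of Theano.

```python
import os


def parse_theano_flags(flags):
    """Split a THEANO_FLAGS string such as 'a=1,b=2' into a dict."""
    result = {}
    for item in flags.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            result[key.strip()] = value.strip()
    return result


def compiler_status(environ=os.environ):
    """Report whether the cxx path configured in THEANO_FLAGS exists and is executable."""
    cxx = parse_theano_flags(environ.get("THEANO_FLAGS", "")).get("cxx")
    return {
        "cxx": cxx,
        "present": bool(cxx) and os.path.isfile(cxx),
        "executable": bool(cxx) and os.access(cxx, os.X_OK),
    }
```

Logging `compiler_status()` from the handler shows at a glance whether `/tmp/gcc/bin/g++` was unpacked; if it is present and executable, the remaining errors come from the missing headers and libraries addressed next.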
Setup Header Files & Static/Dynamic Libraries
To fix the GCC errors from the previous section, do the following on the EC2 instance.
Change the Lambda function handler to download this new payload and bootstrap it.
# Add the following environment variables to Lambda setup
LIBRARY_PATH=$LIBRARY_PATH:/tmp/lib
CPATH=$CPATH:/tmp/lib:/tmp/lib/python2.7
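As far as I know, Lambda stores environment variable values as literal strings, so a `$LIBRARY_PATH` reference is not expanded the way it would be in a shell. If you would rather append to whatever the container already has, the same effect can be sketched in the handler itself. `append_search_dirs` is an illustrative helper name, and it needs to run before Theano first invokes g++, since the compiler process inherits the handler's environment.

```python
import os


def append_search_dirs(name, dirs, environ=os.environ):
    """Append dirs to a colon-separated variable, keeping any existing entries."""
    current = [p for p in environ.get(name, "").split(":") if p]
    for d in dirs:
        if d not in current:  # avoid duplicates on warm invocations
            current.append(d)
    environ[name] = ":".join(current)
    return environ[name]


def configure_gcc_paths(environ=os.environ):
    """Point GCC at the headers and libraries unpacked under /tmp/lib."""
    append_search_dirs("LIBRARY_PATH", ["/tmp/lib"], environ)
    append_search_dirs("CPATH", ["/tmp/lib", "/tmp/lib/python2.7"], environ)
```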
At this point, everything should work. The first Lambda invocation will be slow because it downloads the GCC and library artifacts. After that, the latency will be around 2 ms!
Conclusion
I extended the original work and added GCC support. This gives a roughly 8x boost in model performance, from around 16 ms to around 2 ms per invocation. I hope this post is helpful :).
Disclaimer
I work for Amazon.com. This post is part of my personal exploration and learning. It does not represent Amazon in any way.