Snakes And Lambdas

  • what is lambda
  • why is lambda (for data science)
  • how is lambda (to actually use)
  • when is lambda (the right choice)


What is lambda?

  • serverless

    • it runs on servers, you just don't deal with that
  • scalable

    • it only costs when running, you just pay more as it does more
  • micro-service

    • it only does one small thing, you just have lots of different ones


What is lambda?

  1. Start a new lambda
  2. Set up lambda with user code - "cold start"
  3. Accept event 1 and process
  4. Wait
    1. Accept event 2 and process - "warmed up"
    2. Timeout and kill lambda

There can be multiple concurrent instances too, and each new one goes through its own cold start
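A minimal sketch of what this lifecycle means for Python code. It assumes the usual Lambda behaviour: module-level code runs once per instance (the cold start), and the handler runs once per event. `load_expensive_thing` and the returned keys are hypothetical names for illustration.

```python
import time

# Module-level code runs once per instance, i.e. during the "cold start".
_STARTED_AT = time.time()
_INVOCATIONS = 0


def load_expensive_thing():
    # hypothetical stand-in for slow setup, e.g. loading a model or opening a connection
    return {"coefficients": [1.6, 0.5]}


_MODEL = load_expensive_thing()


def lambda_handler(event, context):
    # The handler runs once per event; a "warmed up" instance reuses _MODEL.
    global _INVOCATIONS
    _INVOCATIONS += 1
    return {
        "cold_start": _INVOCATIONS == 1,
        "invocations_in_this_instance": _INVOCATIONS,
        "model_loaded": _MODEL is not None,
        "instance_age_s": round(time.time() - _STARTED_AT, 3),
    }
```

Calling `lambda_handler({}, None)` twice in one process reports a cold start the first time and a warm call the second, mimicking events 1 and 2 above. Concurrent instances each keep their own copy of these globals.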


What is lambda?

  • Pay only for what you use
  • Manage only what you have to
  • Deal with extra/new/bursty traffic seamlessly


What is lambda?

  • Invocation payload (request and response)
    • 6 MB (synchronous)
    • 256 KB (asynchronous)
  • Deployment package size
    • 50 MB (zipped, for direct upload)
    • 250 MB (unzipped, including layers)
    • 3 MB (console editor)
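The invocation limits apply to the serialized payload, so it can help to check a JSON payload's size before sending it. This is a rough sketch with some assumptions: it uses the limits quoted above (they change over time, so check the current AWS quotas), treats 1 MB as 1024 × 1024 bytes, and measures UTF-8 encoded JSON. `fits_invocation_limit` is a hypothetical helper.

```python
import json

# Limits from the slide above, in bytes (assumption: 1 MB = 1024 * 1024 bytes).
SYNC_LIMIT_BYTES = 6 * 1024 * 1024
ASYNC_LIMIT_BYTES = 256 * 1024


def payload_size_bytes(payload):
    """Size of the payload once serialized to UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def fits_invocation_limit(payload, asynchronous=False):
    """True if the serialized payload is within the (assumed) invocation limit."""
    limit = ASYNC_LIMIT_BYTES if asynchronous else SYNC_LIMIT_BYTES
    return payload_size_bytes(payload) <= limit
```

A payload that fails the async check but passes the sync one hints that you should invoke synchronously. Alternatively, pass a reference to the data (for example an S3 key) rather than the data itself.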


Why is lambda?

(for data science)

  • data-scientists != dev-ops professionals
    • but our work needs to be 'released'
  • all data projects != ensemble xg-boost Keras TPU shenanigans
    • "No ML is easier to manage than no ML" © @julsimon
  • data-projects != single-goal monolithic systems
    • separate concerns, code bases and complication


How is lambda?

(to actually use)

  1. write your python
  2. lambda your python
  3. ???
  4. profit

1. Write your python

import numpy as np
from scipy import stats

# toy data: y is roughly 1.6 * x plus uniform noise
x = np.random.random(10)
y = 1.6*x + np.random.random(10)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

2. Lambda your python

  • event driven
    • an event is passed to a handler function
  • json formatted
    • events are json
    • handler functions return json
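One practical snag with "handler functions return json": `json.dumps` rejects NumPy arrays and NumPy integer types. (`np.float64` happens to work because it subclasses Python's `float`.) Below is a small sketch of a `default=` hook that converts them. `to_jsonable` and the `values` event key are hypothetical names.

```python
import json

import numpy as np


def to_jsonable(obj):
    """json.dumps fallback for NumPy types it cannot serialize natively."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        # NumPy scalars (np.int64, np.float32, np.bool_, ...) -> plain Python values
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def lambda_handler(event, context):
    # the event arrives already parsed from JSON into Python dicts and lists
    values = np.asarray(event.get("values", []), dtype=float)
    body = {
        "n": np.int64(values.size),
        "mean": values.mean() if values.size else None,
        "values": values,
    }
    return {"statusCode": 200, "body": json.dumps(body, default=to_jsonable)}
```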


2. Lambda your python

import json
from scipy import stats
import numpy as np

def lambda_handler(event, context):
    # toy data: y is roughly 1.6 * x plus uniform noise
    x = np.random.random(10)
    y = 1.6*x + np.random.random(10)
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    return_body = {
        "m": slope, "c": intercept, "r2": r_value ** 2,
        "p": p_value, "se": std_err
    }
    # API Gateway proxy integrations expect a statusCode and a string body
    return {"statusCode": 200, "body": json.dumps(return_body)}


2. Lambda your python

[Screenshot: full console]



3. ???????

(1. Layers)

  • json is built in by default
    • so is boto3


  • lambda doesn't pip install your requirements for you
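A quick way to see what a runtime already provides is to ask Python whether a module can be found, without importing it. This sketch uses the standard library's `importlib.util.find_spec`. The module list in the handler is only an example, and what is actually preinstalled varies by runtime version.

```python
import importlib.util


def available(module_names):
    """Map each top-level module name to whether it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


def lambda_handler(event, context):
    # example list: json ships with Python and boto3 is bundled in the AWS Python
    # runtimes, while scipy and pandas need a layer or your own deployment package
    names = event.get("modules", ["json", "boto3", "scipy", "pandas"])
    return available(names)
```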



3.1 Layers



3. ???????

(2. Custom Layers)


  • New requirement needs pandas


  • Create custom layer
    • pre-compiled code under a specific path (python/ for Python), deployed as a .zip
    • for 'any package' * using some shell and docker
      • * YMMV


3.2 Custom Layers




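# Python layers are unpacked under /opt and /opt/python is on sys.path,
# so packages must sit in a top-level "python" directory.
export PKG_DIR="python"

rm -rf "${PKG_DIR}" && mkdir -p "${PKG_DIR}"

# Install inside the lambci build image so compiled code matches Lambda's Amazon Linux
# (pick the image matching your function's runtime). --no-deps skips dependencies
# such as numpy, so those must come from another layer.
docker run --rm -v "$(pwd)":/foo -w /foo lambci/lambda:build-python3.6 \
    pip install -r requirements.txt --no-deps -t "${PKG_DIR}"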


3.2 Custom Layers

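chmod +x build_layer.sh  # hypothetical file name for the script above
./build_layer.sh
zip -r layer.zip . -i "python/*"  # layer.zip is likewise a hypothetical name

Then upload and create the layer with the aws-cli, or manually in the console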
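If you'd rather script the zip-and-upload step in Python, here is a sketch in two parts:

- `build_layer_zip` writes everything under `python/` into the archive and keeps that prefix, so packages land in `/opt/python` when Lambda unpacks the layer.
- `publish_layer` calls boto3's `publish_layer_version`. The layer name, file names and runtime are placeholders, and it needs AWS credentials, so treat that part as an untested sketch.

```python
import os
import zipfile


def build_layer_zip(src_root, zip_path, pkg_dir="python"):
    """Zip src_root/pkg_dir/** into zip_path, keeping the 'python/' prefix in each entry."""
    base = os.path.join(src_root, pkg_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(base):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                # archive name is relative to src_root, e.g. "python/pandas/__init__.py"
                zf.write(full, os.path.relpath(full, src_root))
    return zip_path


def publish_layer(zip_path, layer_name, runtime="python3.7"):
    """Upload the zip as a new layer version (requires boto3 and AWS credentials)."""
    import boto3  # imported here so build_layer_zip works without boto3 installed

    with open(zip_path, "rb") as f:
        response = boto3.client("lambda").publish_layer_version(
            LayerName=layer_name,
            Content={"ZipFile": f.read()},
            CompatibleRuntimes=[runtime],
        )
    return response["LayerVersionArn"]
```

Direct uploads like this are subject to the 50 MB zipped limit above. Bigger layers have to go via S3.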


3.2 Custom Layers

'any package' *

  • pandas
  • pymysql
    • the lambda needs to be inside the same VPC as the database
  • statsmodels

3. ???????

(3. API Gateway)


  • How does your team use your work?



3.3 API Gateway

[Screenshot: api gateway console]


3.3 API Gateway

[Screenshot: api gateway test]


3.3 API Gateway

Get help from an adult (dev-ops professional)

[Image: ryker help]


3.3 API Gateway

If you can't find an adult

  • be careful about exposing the API
    • it's not obvious how and where it can be accessed
      • use resource policies to control who can call it
  • swagger (OpenAPI) is an API templating syntax
    • API definitions can be deployed with CloudFormation
  • click the 'Deploy API' button after every change
    • use multiple stages (e.g. dev and prod)
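API Gateway resource policies are IAM-style JSON documents, so they make "who can call this?" explicit. Below is a sketch of the common "only allow these IP ranges" pattern, built as a Python dict. Two parts are assumptions to adapt to your API:

- the CIDR ranges are placeholders (RFC 5737 documentation addresses);
- the `execute-api:/*` resource shorthand follows AWS's policy examples.

```python
import json


def ip_allowlist_policy(allowed_cidrs, resource="execute-api:/*"):
    """API Gateway resource policy: allow invoke, but deny any caller outside allowed_cidrs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Principal": "*",
             "Action": "execute-api:Invoke", "Resource": resource},
            {"Effect": "Deny", "Principal": "*",
             "Action": "execute-api:Invoke", "Resource": resource,
             # an explicit Deny wins over the Allow above for callers outside the ranges
             "Condition": {"NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}}},
        ],
    }


if __name__ == "__main__":
    # placeholder documentation ranges, not real addresses
    print(json.dumps(ip_allowlist_policy(["192.0.2.0/24", "198.51.100.0/24"]), indent=2))
```

Paste the printed JSON into the API's resource policy, then deploy the API again. Changes to a resource policy only take effect after a deployment.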

3. ???????

(4. local dev - cloud deploy)


  • copy-pasta code into console is bad


  • use AWS SAM cli
    • local development + testing with docker
    • 'cloudy' deployment with cloudformation cli


3.4 local dev - cloud deploy

[Image: sam squirrel]


3.4 local dev - cloud deploy

Usage: sam [OPTIONS] COMMAND [ARGS]...

  local     Run your Serverless application locally 
            for quick development & testing.
  logs      Fetch logs for a function
  deploy    Deploy an AWS SAM application. This is an alias 
            for 'aws cloudformation deploy'.
  build     Build your Lambda function code
  publish   Publish a packaged AWS SAM template to the AWS 
            Serverless Application Repository.
  init      Initialize a serverless application.
  validate  Validate an AWS SAM template.
  package   Package an AWS SAM application. This is an alias 
            for 'aws cloudformation package'.


3.4 local dev - cloud deploy


  • sam init
  • sam local generate-event apigateway aws-proxy > event.json
  • sam build
  • sam local invoke -e event.json
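`sam local invoke` runs the function in Docker. For faster feedback you can also call the handler directly from Python with a hand-written event. This sketch assumes the API Gateway proxy event shape that `generate-event apigateway aws-proxy` produces, where query parameters arrive as strings in `queryStringParameters` (which is `None` when there are none). The handler, its `n` parameter and `fake_proxy_event` are hypothetical.

```python
import json


def lambda_handler(event, context):
    # queryStringParameters is None (not {}) when the request has no query string
    params = event.get("queryStringParameters") or {}
    try:
        n = int(params.get("n", "10"))  # query parameter values arrive as strings
    except ValueError:
        return {"statusCode": 400, "body": json.dumps({"error": "n must be an integer"})}
    return {"statusCode": 200, "body": json.dumps({"n": n})}


def fake_proxy_event(query=None, path="/regression", method="GET"):
    """A trimmed-down API Gateway proxy event with only the fields this handler reads."""
    return {"httpMethod": method, "path": path, "queryStringParameters": query, "body": None}


if __name__ == "__main__":
    print(lambda_handler(fake_proxy_event({"n": "25"}), None))
```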

3.4 local dev - cloud deploy

alias playitsam='sam build && sam local invoke -e event.json'
alias playitagainsam='sam build && sam local invoke -e'


3.4 local dev - cloud deploy

  • sam validate
  • sam package
  • sam deploy


3.4 local dev - cloud deploy

Transform: 'AWS::Serverless-2016-10-31'
Resources:
  # This resource creates a Lambda function.
  # RegressionFunction is a hypothetical logical ID.
  RegressionFunction:
    Type: 'AWS::Serverless::Function'
    Properties:
      # This function uses the python 3.7 runtime.
      Runtime: python3.7
      # This is the Lambda function's handler.
      Handler: app.lambda_handler
      # The location of the Lambda function code.
      CodeUri: ./regression
      # Event sources to attach to this function. In this case, we are attaching
      # one API Gateway endpoint to the Lambda function. The function is
      # called when an HTTP request is made to the API Gateway endpoint.
      Events:
        # Define an API Gateway endpoint that responds to HTTP GET at /regression
        # RegressionApi is a hypothetical event name.
        RegressionApi:
          Type: Api
          Properties:
            Path: /regression
            Method: GET

This enables CI/CD, which is a Good Thing ™


3.4 local dev - cloud deploy

[Image: ryker faint]

Get help from an adult (dev-ops professional)

but if you can't, list 'em and flip 'em

aws lambda list-functions | cfn-flip
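The same inventory is available from Python through boto3's `list_functions` paginator. This sketch flattens each entry in the API's `Functions` list into four fields: name, runtime, memory and timeout. The choice of fields is ours, and the paginated call needs AWS credentials.

```python
def summarise(pages):
    """Flatten list_functions pages into (name, runtime, memory MB, timeout s) rows."""
    rows = []
    for page in pages:
        for fn in page.get("Functions", []):
            rows.append((fn["FunctionName"], fn.get("Runtime"),
                         fn.get("MemorySize"), fn.get("Timeout")))
    return sorted(rows)


def list_lambda_functions():
    """Fetch every page of list_functions (requires boto3 and AWS credentials)."""
    import boto3

    paginator = boto3.client("lambda").get_paginator("list_functions")
    return summarise(paginator.paginate())
```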

4. Profit


Surple have 3 lambda data services


linear regression

'Degree Days vs Energy = Efficiency'

  • user triggered event
  • queries specific data based on user selection
  • user facing visualisation
  • vpc cold starts

[Chart: etl duration]


time-series analysis

'Smart Targets'

  • scheduled for all meters as ETL to DB
  • highlight 'out of character' energy use
  • user facing visualisation and notifications

[Chart: etl duration]


anomaly detection

'Smart Alarms'

  • scheduled for all meters as ETL to DB
  • highlight 'extreme' energy use
  • user email and notifications
  • this was extra fun/complicated
    • Ask me how

Practical notes


When is lambda?

(the right choice)

Good case

  • 'traditional' models
    • regression, timeseries, hopefully more...
  • per 'reasonable' data set
    • for each
  • 'now in a minute'
    • not actually a minute, more like seconds
  • 'bursty'
    • some, or lots of people need it then no one does


When is lambda?

(the right choice)

Bad case

  • 'fancy' models
    • RAM limits, CPU limits
  • whole scale
    • across all
  • immediate response
    • can't afford a cold start: 'lambda your lambda'
  • 24:7 flat load
    • need 100% load 100% of the time
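One reading of "lambda your lambda" is a scheduled rule that pings the function to keep an instance warm. This only keeps as many instances warm as the pings reach, so it doesn't help with bursts. Below is a sketch of a handler that returns early for such pings. It assumes they come from a scheduled EventBridge (CloudWatch Events) rule, whose events carry `"source": "aws.events"` and `"detail-type": "Scheduled Event"`. `do_real_work` is a hypothetical placeholder.

```python
def is_warmup_ping(event):
    """True for events from a scheduled EventBridge (CloudWatch Events) rule."""
    return (isinstance(event, dict)
            and event.get("source") == "aws.events"
            and event.get("detail-type") == "Scheduled Event")


def do_real_work(event):
    # hypothetical placeholder for the actual model code
    return {"statusCode": 200, "body": "result"}


def lambda_handler(event, context):
    if is_warmup_ping(event):
        # skip the expensive work: the ping only exists to keep this instance initialised
        return {"warm": True}
    return do_real_work(event)
```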


Thank you

Slides available: Twitter: @DaveParr

'Smart Alarms' actually runs in

[Image: troll face]


Runtime Layers


  • The thing I want to use isn't in Python
    • or Go, NodeJS, C#, Java


  • use a custom runtime, shipped as a layer