Snakes And Lambdas

  • what is lambda
  • why is lambda (for data science)
  • how is lambda (to actually use)
  • when is lambda (the right choice)

---~

What is lambda?

  • serverless

    • it runs on servers, you just don't deal with that
  • scalable

    • it only costs when running, you just pay more as it does more
  • micro-service

    • it only does one small thing, you just have lots of different ones

---~

What is lambda?

  1. Start a new lambda
  2. Set up lambda with user code - "cold start"
  3. Accept event 1 and process
  4. Wait
    1. Accept event 2 and process - "warmed up"
    2. Timeout and kill lambda

There can be multiple concurrent machines too

---~

What is lambda?

  • Pay only for what you use
  • Manage only what you have to
  • Deal with extra/new/bursty traffic seamlessly

---~

What is lambda?

  • Invocation payload (request and response)
    • 6 MB (synchronous)
    • 256 KB (asynchronous)
  • Deployment package size
    • 50 MB (zipped, for direct upload)
    • 250 MB (unzipped, including layers)
    • 3 MB (console editor)
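A rough way to check a payload against the invocation limits above before sending it. This is a sketch: it assumes 1 MB = 1024 * 1024 bytes and measures the UTF-8 encoded JSON, and `payload_fits` is a hypothetical helper, not part of any AWS library.

```python
import json

# Invocation payload limits quoted on the slide above, in bytes
# (assuming binary units).
SYNC_LIMIT_BYTES = 6 * 1024 * 1024
ASYNC_LIMIT_BYTES = 256 * 1024


def payload_fits(payload, asynchronous=False):
    """Return True if the JSON-encoded payload is within the invocation limit."""
    size = len(json.dumps(payload).encode("utf-8"))
    limit = ASYNC_LIMIT_BYTES if asynchronous else SYNC_LIMIT_BYTES
    return size <= limit
```

If a data set does not fit, a common pattern is to pass a reference to the data (for example a database key or a storage path) in the event rather than the data itself.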

---~

Why is lambda?

(for data science)

  • data-scientists != dev-ops professionals
    • but our work needs to be 'released'
  • all data projects != ensemble xg-boost Keras TPU shenanigans
    • "No ML is easier to manage than no ML" © @julsimon
  • data-projects != single-goal monolithic systems
    • separate concerns, code bases and complication

---~

How is lambda?

(to actually use)

  1. write your python
  2. lambda your python
  3. ???
  4. profit

1. Write your python

import numpy as np
from scipy import stats

np.random.seed(12345678)  # fixed seed so the example is reproducible
x = np.random.random(10)
y = 1.6*x + np.random.random(10)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

2. Lambda your python

  • event driven
    • an event is passed to a handler function
  • json formatted
    • events are json
    • handler functions return json
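The smallest version of that contract might look like the sketch below. The `"name"` field is a hypothetical event key chosen for illustration; real events have whatever shape the triggering service sends.

```python
import json


def lambda_handler(event, context):
    """Echo back a greeting built from the incoming event."""
    # `event` arrives already parsed from JSON into Python objects.
    name = event.get("name", "world")  # "name" is a hypothetical field
    # Whatever is returned must be JSON-serialisable.
    return {"body": json.dumps({"message": f"hello {name}"})}
```

Because the handler is an ordinary function, it can be called directly, e.g. `lambda_handler({"name": "snake"}, None)`, without any AWS tooling.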

---~

2. Lambda your python

import json
from scipy import stats
import numpy as np

def lambda_handler(event, context):
    np.random.seed(12345678)
    x = np.random.random(10)
    y = 1.6*x + np.random.random(10)
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    return_body = {
        "m": slope, "c": intercept, "r2": r_value ** 2,
        "p": p_value, "se": std_err
    }
    return {"body": json.dumps(return_body)}

---~

2. Lambda your python

full console

---~

2. Lambda your python

editor



3. ???????

(1. Layers)

  • json is built in by default
    • so is boto3

PROBLEM

  • lambda doesn't pip install ....

SOLUTION

---~

3.1 Layers

layers

---~

3.1 Layers

editor


3. ???????

(2 Custom Layers)

PROBLEM

  • New requirement needs pandas

SOLUTION

  • Create custom layer
    • pre-compiled code on a specific path deployed as a .zip
    • for 'any package' * using some shell and docker
      • * YMMV
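The "specific path" matters: for Python, the packages are expected under a top-level `python/` folder inside the .zip. A minimal sketch of that packaging step using only the standard library, where `zip_layer` is a hypothetical helper and not an AWS tool:

```python
import os
import zipfile


def zip_layer(package_dir, zip_path):
    """Zip package_dir so every entry sits under a top-level 'python/' folder."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                # Path of the file relative to the package directory.
                relative = os.path.relpath(full_path, package_dir)
                # Store it in the archive as python/<relative path>.
                archive.write(full_path, os.path.join("python", relative))
    return zip_path
```

This only covers the zipping. Packages with compiled extensions still need to be built for the Lambda environment, which is where the docker step comes in.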

---~

3.2 Custom Layers

requirements.txt

pandas==0.23.4
pytz==2018.7

get_layer_packages.sh

#!/bin/bash

export PKG_DIR="python"

rm -rf "${PKG_DIR}" && mkdir -p "${PKG_DIR}"

docker run --rm -v "$(pwd)":/foo -w /foo lambci/lambda:build-python3.6 \
    pip install -r requirements.txt --no-deps -t "${PKG_DIR}"

---~

3.2 Custom Layers

execute.sh

chmod +x get_layer_packages.sh
./get_layer_packages.sh
zip -r pandas.zip . -i "python/*"

Then upload + create as layer with aws-cli or manually with console

---~  

3.2 Custom Layers

'any package' *

  • pandas
  • pymysql
    • lambda needs to be inside the same VPC
  • statsmodels

3. ???????

(3 api gateway)

PROBLEM

  • How does your team use your work?

SOLUTION

---~

3.3 Api gateway

api gateway console

---~

3.3 Api gateway

api gateway test

---~

3.3 Api gateway

Get help from an adult (dev-ops professional)

ryker help

---~

3.3 Api gateway

If you can't find an adult

  • be careful about exposing the api
    • not obvious how and where it can be accessed
      • resource policies
  • swagger is an api templating syntax
    • cloud formation
  • click the 'deploy api' button after every change
    • use multiple stages

3. ???????

(4 local dev - cloud deploy)

PROBLEM

  • copy-pasta code into console is bad

SOLUTION

  • use AWS SAM cli
    • local development + testing with docker
    • 'cloudy' deployment with cloudformation cli

---~

3.4 local dev - cloud deploy

sam squirrel

---~

3.4 local dev - cloud deploy

Usage: sam [OPTIONS] COMMAND [ARGS]...

Commands:
  local     Run your Serverless application locally 
            for quick development & testing.
  logs      Fetch logs for a function
  deploy    Deploy an AWS SAM application. This is an alias 
            for 'aws cloudformation deploy'.
  build     Build your Lambda function code
  publish   Publish a packaged AWS SAM template to the AWS 
            Serverless Application Repository.
  init      Initialize a serverless application.
  validate  Validate an AWS SAM template.
  package   Package an AWS SAM application. This is an alias 
            for 'aws cloudformation package'.

---~

3.4 local dev - cloud deploy

Workflow

  • sam init
  • sam local generate-event apigateway aws-proxy
    • sam build
    • sam local invoke -e event.json

---~

3.4 local dev - cloud deploy

alias playitsam='sam build && sam local invoke -e event.json'
alias playitagainsam='sam build && sam local invoke -e'
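For an even quicker loop, the same event file can be fed to the handler as a plain Python call. This is a sketch and not a replacement for `sam local invoke`, since it skips the docker environment entirely. `invoke_locally` and `example_handler` are hypothetical names; in a real project the handler would be imported from the function's module.

```python
import json


def invoke_locally(handler, event_path):
    """Load a JSON event file and pass it to a handler, as a quick smoke test."""
    with open(event_path) as event_file:
        event = json.load(event_file)
    # No real context object exists outside Lambda, so None stands in for it.
    return handler(event, None)


def example_handler(event, context):
    """Hypothetical stand-in for the real lambda handler."""
    return {"statusCode": 200, "body": json.dumps({"path": event.get("path")})}
```

Usage would be along the lines of `invoke_locally(example_handler, "event.json")`, which also makes the handler easy to cover with ordinary unit tests.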

---~

3.4 local dev - cloud deploy

  • sam validate
  • sam package
  • sam deploy

---~

3.4 local dev - cloud deploy

Transform: 'AWS::Serverless-2016-10-31'
Resources:
  RegressionFunction:
    # This resource creates a Lambda function.
    Type: 'AWS::Serverless::Function'
    Properties:
      # This function uses the python 3.7 runtime.
      Runtime: python3.7
      # This is the Lambda function's handler.
      Handler: app.lambda_handler
      # The location of the Lambda function code.
      CodeUri: ./regression
      # Event sources to attach to this function. In this case, we are attaching
      # one API Gateway endpoint to the Lambda function. The function is
      # called when a HTTP request is made to the API Gateway endpoint.
      Events:
        RegressionApi:
            # Define an API Gateway endpoint that responds to HTTP GET at /regression
            Type: Api
            Properties:
                Path: /regression
                Method: GET

This enables CI/CD, which is a Good Thing ™

---~

3.4 local dev - cloud deploy

ryker faint

Get help from an adult (dev-ops professional)

but if you can't, list 'em and flip 'em

aws lambda list-functions | cfn-flip

4. Profit

surple

Surple have 3 lambda data services

---~

linear regression

'Degree Days vs Energy = Efficiency'

  • user triggered event
  • queries specific data based on user selection
  • user facing visualisation
  • vpc cold starts

etl duration

---~

time-series analysis

'Smart Targets'

  • scheduled for all meters as ETL to DB
  • highlight 'out of character' energy use
  • user facing visualisation and notifications

etl duration

---~

anomaly detection

'Smart Alarms'

  • scheduled for all meters as ETL to DB
  • highlight 'extreme' energy use
  • user email and notifications
  • this was extra fun/complicated
    • Ask me how

Practical notes

---~

When is lambda?

(the right choice)

Good case

  • 'traditional' models
    • regression, timeseries, hopefully more...
  • per 'reasonable' data set
    • for each
  • 'now in a minute'
    • (not actually a minute, more like seconds)
  • 'bursty'
    • some, or lots of people need it then no one does

---~

When is lambda?

(the right choice)

Bad case

  • 'fancy' models
    • RAM limits, CPU limits
  • whole scale
    • across all
  • immediate response
    • can't afford a cold start: 'lambda your lambda'
  • 24/7 flat load
    • need 100% load 100% of the time
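The flat load point comes down to arithmetic: lambda is billed per request plus per GB-second of compute, so a constantly busy function can cost more than an always-on machine. A sketch of that sum is below. Every price is passed in as an argument because the numbers are placeholders; check the current AWS pricing page before relying on them.

```python
def lambda_monthly_cost(invocations, seconds_each, memory_gb,
                        price_per_gb_second, price_per_million_requests):
    """Estimate a monthly bill from request count, duration and memory size."""
    # Compute charge: total GB-seconds used across all invocations.
    compute = invocations * seconds_each * memory_gb * price_per_gb_second
    # Request charge: a flat fee per million invocations.
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests
```

Comparing the result against the fixed monthly price of a server, for the bursty and the flat traffic pattern in turn, gives a first estimate of which side of the good case / bad case line a project falls on.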

---~

Thank you

Slides available: https://github.com/DaveParr/snakes_and_lambdas

Twitter: @DaveParr


'Smart Alarms' actually runs in

Troll face

---~

Runtime Layers

PROBLEM

  • The thing I want to use isn't in Python
    • or Go, NodeJS, C#, Java

SOLUTION

  • use Runtime Layers