Serverless Computing, "Interactive Analytics", and Online Experiments


W. Ross Morrow, Research Computing Specialist and Developer, DARC

The DARC team sees a variety of online experimental work – Qualtrics surveys and other instruments – that requires “interactive analytics.” For example, a research team wants to run a Qualtrics survey in which participants first answer one block of questions, and then get a second block of questions depending on the answers to the first. But determining that second block of questions depends on running a pre-trained python machine learning model which can’t (or can’t easily) be embedded as javascript in Qualtrics.

In this post we discuss how this can be done with “serverless computing” technologies. “Serverless” computing is an interesting feature of modern cloud platforms that aims to remove the need for you to run and maintain the “servers” behind webservices. That is, the literal computers that are set up to listen for network traffic and respond to it.

The basic principle is simple: you specify the code and the cloud provider runs it in a highly-available, fault-tolerant fashion. DARC has used two versions of this technology: AWS’s Lambda functions and Google Cloud Functions. We’ll occasionally refer to either with the generic term “cloud function” (lowercase), and to a Google Cloud Function specifically as a “Cloud Function” (capitalized). Neither service is perfect: Lambda functions are a bit more responsive during development and testing, but Cloud Functions are quite a bit easier to set up and actually work with.

So here’s roughly how the example above would work, as we cover in detail below: Presume you have your survey and (trained) machine learning model (say, in python). You build a cloud function that executes your model given the data from the first block of questions, and returns data defining what the second block should be. Defining this function with your cloud provider (AWS or Google) will give you a URL you can call. Then you can use custom javascript in Qualtrics to (i) call the URL, and thus execute the function, with response-specific data and (ii) interpret the response and use it to define the second block of the experiment.

In providing details we’ll actually switch this around and discuss the Qualtrics side first. That is the “simpler” part, but also likely the most application-specific; we’ll try to keep it general.

Qualtrics

We recommend storing your call URL as an embedded data field, and referencing that field whenever you make calls to your cloud function. This way, if your URL changes, you only have to change one thing – the value of the embedded data – instead of possibly multiple locations in the custom javascript.

Cloud Function Implementation

Obviously you need a Google Cloud account to work with Cloud Functions. But once you do, you can log into the console and access the Cloud Functions page.

Requirements

To define your Cloud Function you need three things:

  • a main.py file, with a “handler” function (ours being “prediction”) for the Cloud Function to execute
  • a folder, model-obj, of all the relevant pickle files for any model objects
  • and a requirements.txt file specifying dependencies
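
For example, with two hypothetical model objects the deployment package might be laid out like this (the model file names are placeholders for illustration):

main.py
requirements.txt
model-obj/
    model_a.pkl
    model_b.pkl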

The main.py file has two functions: prediction, the Cloud Function “handler”, and process, which actually does the processing:

def prediction( request ) :
    # allow cross-origin requests (e.g., calls from Qualtrics pages)
    headers = { 'Access-Control-Allow-Origin': '*' }
    if request.method == 'OPTIONS':
        # respond to the CORS preflight request with the allowed methods/headers
        headers['Access-Control-Allow-Methods'] = 'GET'
        headers['Access-Control-Allow-Headers'] = 'Content-Type'
        headers['Access-Control-Max-Age'] = '3600'
        return ( '' , 204 , headers )
    return ( "prediction=" + process( request.args ) , 200 , headers )

This code makes the “handler” ready to handle cross-origin (CORS) requests, which will occur if you call this Cloud Function from Qualtrics. Most of this is taken directly from Google’s documentation.

Google Cloud Functions written in python use Flask to handle HTTP requests. You can read about how Flask handles requests in python here to learn more. For our purposes, note that by passing request.args to process we assume the inputs arrive as query parameters through a call like

https://us-central1-<account>.cloudfunctions.net/prediction?i1=1&i2=2&...
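
For a call like that, Flask’s request.args behaves (inside prediction) like a string-valued dictionary, so, for example:

request.args.get( 'i1' )     # '1' (a string; process converts it with float)
request.args.get( 'i999' )   # None, since that parameter wasn't supplied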

The process function, which is supposed to get this data and use it to evaluate the model(s), looks like:

def process( args ) :
    # parse the query parameters i1, i2, ... into a single input vector
    inputs , output = np.zeros( num_inputs ).reshape(1,-1) , []
    for i in range( num_inputs ) :
        if args.get( 'i%s' % (i+1) ) is not None : 
            inputs[0][i] = float( args.get( 'i%s' % (i+1) ) )
    # load each pickled model and evaluate its prediction on that vector
    for model in models:
        mdlobj = pickle.load( open( 'model-obj/' + str(model) + '.pkl' , 'rb' ) )
        pred = mdlobj.predict( inputs )[0]
        output.append( [ model , pred ] )
    return str(output)

This code simply parses inputs expected in the call into a vector, loads each model object, and then passes that vector to each model’s predict function. We’re presuming that the list models is stored as a global, say at the top of the file, and provides all the model object file names to use in creating the prediction. We expect these model objects to be in the folder model-obj, as suggested above. Your setup could be different, of course.
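
For concreteness, the top of main.py might contain something like the following; the value of num_inputs and the model names are purely illustrative:

import pickle

# globals used by process(); the values here are illustrative placeholders
num_inputs = 10                      # expects query parameters i1 through i10
models = [ 'model_a' , 'model_b' ]   # expects model-obj/model_a.pkl and model-obj/model_b.pkl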

These operations depend on numpy and scikit-learn. We can include these in main.py with the usual

import numpy as np
import sklearn

statements. However, will these modules exist in the environment used by the Cloud Function? Not unless we tell it to install them via our requirements.txt file, whose contents are simply:

numpy
scikit-learn

This is a list of packages that are required to run the function, and is standard in python. Google will install these packages, as usual, with pip.

Setup

To run these as a Cloud Function, archive main.py, model-obj/*.pkl, and requirements.txt together in a zip archive cloudfunc.zip. In a MacOSX/Unix terminal, you can do this with

$ zip -r cloudfunc.zip main.py requirements.txt model-obj/*.pkl

Make sure that when you make your archive you don't create an archive of the folder these files are in. The archive has to unpack _exactly_ as the files are laid out when you are in the directory, as this is what Google will attempt to run from.

Then set up a Cloud Function in the console as follows:

You need to change the “Stage bucket” to one you own. Pay attention to the memory setting, too, as model objects can be surprisingly large. In our example, they are about 500MB, so we choose 1GB; you can (and should) use less memory if yours are smaller. You pay for the memory you use, so the smaller this is, the cheaper your invocations will be. The “function to execute” setting is also important: it tells Google which function in main.py to call when the Cloud Function is invoked (here, prediction).

Google’s instructions for web console setup are here.

Note that Google is already telling you the address at which you can reach this function:

https://us-central1-<account>.cloudfunctions.net/prediction

as suggested above. A Cloud Function URL always has the form

https://<region>-<account>.cloudfunctions.net/<function>

so you don’t really need Google to tell you what it is. You can write it out yourself given the region you choose, your account name, and the function’s name.

Once you’ve done that, you can hit “Create”. After the zip archive uploads, your browser should return to the main Cloud Function console. There you’ll see your function “deploying”:

After a while you’ll get the “green light” in the console

and that’s it!

Presuming there isn’t an error, of course. Google is actually pretty good about catching obvious errors in your code. Let’s put one in to see what happens. At the start of main.py, add the line

	print( "error" )

making sure you use a tab, not spaces, to indent this line; a stray indentation at the top of the file is invalid python. This time the deploy process will end with a warning icon:

In the prediction function’s General information tab, you should see an error message like:

If you prefer terminals to web GUIs, you can install the gcloud tool and use the following command to deploy this function right from your machine:

$ gcloud functions deploy prediction \
	--runtime python37 \
	--trigger-http \
	--region us-central1 \
	--memory 1024MB \
	--entry-point prediction \
	--stage-bucket my-bucket
Deploying function (may take a while - up to 2 minutes)...

Note, though, that here we don’t use the zip archive. This tool will package everything in the current folder (unless you specify a different folder with the --source argument) for deployment. You should see the deployment working in your web console, if you have that open too.

When that completes, you get a nice print of the Cloud Function’s details:

availableMemoryMb: 1024
entryPoint: prediction
httpsTrigger:
  url: https://us-central1-<account>.cloudfunctions.net/prediction
labels:
  deployment-tool: cli-gcloud
name: projects/<account>/locations/us-central1/functions/prediction
runtime: python37
serviceAccountEmail: <account>@appspot.gserviceaccount.com
sourceArchiveUrl: gs://my-bucket/us-central1-projects/<account>/locations/us-central1/functions/prediction-cydzkbqbomri.zip
status: ACTIVE
timeout: 60s
updateTime: '2019-05-26T14:01:41Z'
versionId: '6'

Testing

We recommend testing your Cloud Function a couple ways.

If you can test a Cloud Function without a lot of dependencies, particularly the large model objects we've mentioned, that's a good thing to try first. Upload a small zip archive with just the code and requirements, and then you can edit that code right in the Google Cloud Functions console to iron out obvious bugs. This is faster and easier than working back and forth with the full package, if that package is large.
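
For example, a stub version of process like the following lets you exercise the request handling and CORS logic without numpy, sklearn, or the pickled model objects (this stub is just for illustration):

def process( args ) :
    # stub: echo the parsed query parameters back instead of running any models
    return str( { key : args.get( key ) for key in args } )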

First, use the Testing tab in the Cloud Function’s info page to run a spot check:

This should, of course, run but not really give you anything useful. When it doesn’t run, you can get valuable fast feedback about runtime errors in your code.

Second, use a browser to actually invoke the code, gauge the response, and review any logs in Stackdriver.
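
For example, you can invoke the function from python with the requests library (the inputs below are made up), or just paste the same URL into a browser:

import requests

# hypothetical spot check; replace <account> and the inputs with your own values
url = 'https://us-central1-<account>.cloudfunctions.net/prediction'
response = requests.get( url , params={ 'i1' : 1.0 , 'i2' : 2.0 } )
print( response.status_code , response.text )   # expect something like "prediction=[...]"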

Performance

Our cloud function for this specific example returns results in 14-17 seconds, which is actually pretty slow. We suspect this time is dominated by (i) installing numpy and sklearn and (ii) loading the model objects with every “invocation” (as we believe is required for a cloud function). If we start this code (modified a bit) up as a dedicated server, instead of a cloud function, these installs and loads can happen once and responses come back more quickly (in a second or so). But, as we discuss below, maintaining and scaling a server is nontrivial.
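
If per-invocation loading really is the bottleneck, one thing worth trying (we haven't benchmarked it here) is loading the pickled models once at module scope, so that “warm” instances can reuse them across invocations. A minimal sketch, using the same illustrative model names as above:

import pickle

# load each pickled model once, at module import, rather than inside process()
# on every request; "warm" instances keep these objects in memory
models = [ 'model_a' , 'model_b' ]   # illustrative, as above
loaded_models = {
    model : pickle.load( open( 'model-obj/' + str(model) + '.pkl' , 'rb' ) )
    for model in models
}

process would then call loaded_models[model].predict( inputs ) rather than pickle.load on every request.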

Costs

We have to mention costs, which are a bit complicated. In the attached “costs.xlsx” we work up what we think the costs would be for 20,000 invocations of this function – assuming the function is called once per experiment and you want 20,000 responses. If you change the “invocations” number in the grey box, everything else should propagate. This is derived from Google’s Cloud Function Pricing page, where you can read a lot more. The bottom line is that we expect this scale of work to be cheap, something like $5. Running a dedicated server or container deployment would be more performant, if done right, but probably not any cheaper. But we (the DARC team) haven’t taken this path for an experiment before, so caveat emptor.

Lambda Implementation

Obviously you need an AWS account to work with Lambda functions. But once you do, you can log into the console and access the Lambda page.

Layers

Performance

Running Your Own Server

The simplicity of the serverless approach isn’t apparent without the context of what is required to do the same thing with your own server.

Applications with “State”

Our main example above is “stateless” in that any particular call to the code in our cloud function is independent of any other call. The field of applications that can be accommodated using “serverless” cloud functions gets much larger when we allow our code to depend on some kind of global “state”, thereby making calls interdependent.

Dynamic Sampling

Suppose your “experiment” involves collecting responses to a survey defined by a fixed set of input data, where there are a finite number of inputs and you want the same number of responses per input. For example, you have 100 items and your survey aims to gauge responses to specific descriptions of those items. If you want 5 observations per item, and 20 observations per respondent, you might expect to gather data from 500/20 = 25 respondents. You could assign items 1 through 20 to respondents 1 through 5, items 21 through 40 to respondents 6 through 10, and so on. In principle, you would have 20 observations per respondent, and 5 observations per item.

In reality, data collection is never perfect: respondents leave experiments; networks fail; data gets lost or corrupted. If anyone drops out, or any data is lost, the setup breaks and you don’t have 5/20 observations per item/respondent. But the (good) data you do collect is likely data you want to be able to use, and any sub-selection of observations or respondents introduces sampling bias.

You might avoid this completely by analyzing only those experiments for which respondents completed all tasks.

You could, however, dynamically sample to at least try to approach the desired 5 observations per item as closely as possible. Let’s suppose you don’t pre-specify which respondents get which items. Each time a respondent is presented with a task, the survey asks another server – perhaps a cloud function – for data about what to show. This server might only need to randomly sample an item, with two caveats: (i) it must sample an item from the pool of items that that respondent hasn’t yet seen, and (ii) it should sample from items that have been seen fewer than 5 times. We distinguish between “must” and “should” because both cannot be guaranteed to be satisfied. This type of sampling is possible, in fact rather easy, if we can update a “global state” with each request that keeps track of how many times each item has been seen and which items each respondent has seen.

On their own, cloud functions can’t have a “state” like this. But you can set one up using other services.
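
As an illustration of the sampling rule only, here is a sketch with the state held in ordinary python dictionaries; in a real deployment those dictionaries would live in an external store (a database the cloud function reads and updates), since cloud function instances don’t share or persist memory:

import random

NUM_ITEMS , TARGET_OBS = 100 , 5                          # illustrative: 100 items, 5 observations each
counts = { i : 0 for i in range( 1 , NUM_ITEMS + 1 ) }    # times each item has been shown
seen = {}                                                 # respondent id -> set of items already shown

def sample_item( respondent_id ) :
    history = seen.setdefault( respondent_id , set() )
    # (i) "must": never show an item this respondent has already seen
    available = [ i for i in counts if i not in history ]
    # (ii) "should": prefer items shown fewer than TARGET_OBS times, when any remain
    preferred = [ i for i in available if counts[i] < TARGET_OBS ]
    item = random.choice( preferred if preferred else available )
    counts[item] += 1
    history.add( item )
    return item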

Active Learning

Let’s say your machine learning actually happens while you are collecting responses. That is, each response, or more likely some “batch” of responses, can change your model defining what to show future respondents.

The basic idea is probably

cloud function -> database -> trigger -> learning -> cloud function update

Summary