How to deploy a machine learning model
In this blog post I will show you how to first create a simple Random Forest classifier and then build an API around it with Flask. Let's start by building the model.
import numpy as np
import pandas as pd
from pathlib import Path
import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# load the Titanic training data
p = Path('/notebooks/storage/data/Titanic')
df = pd.read_csv(f'{p}/train/train.csv', index_col=0)
df.tail(5)
What we want to predict is whether a passenger survived or not. We want a simple model, so let's keep only the following variables:
to_keep = ['Survived', 'Pclass', 'Sex', 'Age', 'Parch']
final_df = df.reset_index()[to_keep]
final_df.dtypes
# encode Sex numerically: 0 for male, 1 for female
final_df['Sex'] = np.where(final_df['Sex']=='male', 0, 1)
final_df.head()
Now that we've created our final dataframe, let's train the model.
First, we define the accuracy metric to see how our model is doing:
# accuracy: fraction of predictions that match the labels after thresholding
def accu(pred, y, threshold=0.5): return (np.round(pred - threshold + 0.5) == y).mean()
def m_accu(m, xs, y): return np.round(accu(m.predict(xs), y), 3)
We then fill missing values with 0 and standardize our data, in this case only the Age variable:
final_df = final_df.fillna(0)
scaler = StandardScaler()
final_df['Age'] = scaler.fit_transform(final_df['Age'].values.reshape(-1, 1))
Then we define the model:
def rf(xs, y, n_estimators=40, max_samples=500,
       max_features=0.5, min_samples_leaf=5, **kwargs):
    return RandomForestClassifier(n_jobs=-1, n_estimators=n_estimators,
                                  max_samples=max_samples, max_features=max_features,
                                  min_samples_leaf=min_samples_leaf, oob_score=True).fit(xs, y)
Then we split the data into training and validation sets.
col_names = ['Pclass', 'Sex', 'Age', 'Parch']
dep_var = 'Survived'
xs, xs_valid, y, y_valid = train_test_split(final_df[col_names], final_df[dep_var], test_size=0.33, random_state=42)
And finally let's train the model:
m = rf(xs, y)
m_accu(m, xs, y), m_accu(m, xs_valid, y_valid)
from sklearn.metrics import confusion_matrix
# predict() already returns class labels, so no rounding is needed
confusion_matrix(y_valid, m.predict(xs_valid))
That looks reasonable, so let's save the final model together with the standardizer.
def save_obj(obj, name):
    with open(f'{name}.pkl', 'wb') as f:
        pickle.dump(obj, f)

def load_obj(name):
    with open(f'{name}.pkl', 'rb') as f:
        return pickle.load(f)
filename = 'Flask_App/app/final_model.pkl'
pickle.dump(m, open(filename, 'wb'))
save_obj(scaler, 'Flask_App/app/standardizer')
Now that we have our model, we want to build an API around it. We will use Flask for this. This is what the folder structure should look like:
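Here is a minimal layout consistent with the paths used above; the placement of test.py and requirements.txt is a suggestion, only the app folder contents are fixed by the code:

Flask_App/
├── venv/                # virtual environment, created below
├── requirements.txt
├── test.py
└── app/
    ├── main.py
    ├── utils.py
    ├── final_model.pkl
    └── standardizer.pkl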
For our Flask app to run, we need the terminal. Here are the bash commands to set everything up:
python3 -m venv venv
. venv/bin/activate
pip install Flask
pip install pandas numpy scikit-learn requests
pip freeze > requirements.txt
cd app
export FLASK_APP=main.py
export FLASK_ENV=development
flask run
In our folder we create a virtual environment, which I called venv. I then activate it and install Flask and the other packages we need to make our RF model work. I then save the requirements into a txt file, go into the app folder, set the Flask app to main.py and the environment to development. Finally, I start Flask. The output should look like this:
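Depending on your Flask version, the development server prints something roughly like the following (the exact wording varies between versions):

 * Serving Flask app "main.py"
 * Environment: development
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)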
Next, we need to define a main.py and a utils.py. Let's start with utils.py.
With our API we want to send a GET request with values for our variables:
Pclass = 1
Age = 22.0
Sex = 0
Parch = 1
import pickle
import pandas as pd

# helper to load pickled objects (mirrors load_obj from the notebook)
def load_obj(name):
    with open(f'{name}.pkl', 'rb') as f:
        return pickle.load(f)

# load trained model and standardizer
PATH = "final_model.pkl"
loaded_model = pickle.load(open(PATH, 'rb'))
standardizer = load_obj('standardizer')
# create dataframe from input data (same column order as in training)
df = pd.DataFrame({'Pclass': [Pclass], 'Sex': [Sex], 'Age': [Age], 'Parch': [Parch]})
# define transform function
def transform_data(raw_data):
    raw_data = raw_data.fillna(0)
    raw_data['Age'] = standardizer.transform(raw_data['Age'].values.reshape(-1, 1))
    return raw_data
# define prediction function
def get_prediction(transformed_data):
    pred = loaded_model.predict_proba(transformed_data)
    return pred
Does it work?
get_prediction(transform_data(df))
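If everything is wired up correctly, predict_proba returns a 1×2 array of class probabilities, something like the following (the numbers here are hypothetical; yours will depend on your trained model):

array([[0.28, 0.72]])  # [P(not survived), P(survived)]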
Awesome! Our utils.py works, so let's build main.py.
from flask import Flask, request, jsonify
import pandas as pd
from utils import transform_data, get_prediction

app = Flask(__name__)

@app.route('/predict', methods=["GET"])
def predict():
    if request.method == 'GET':
        # query-string parameters arrive as strings, so cast them to numbers
        Pclass = float(request.args.get('Pclass'))
        Age = float(request.args.get('Age'))
        Sex = float(request.args.get('Sex'))
        Parch = float(request.args.get('Parch'))
        # same column order as in training
        df = pd.DataFrame({'Pclass': [Pclass], 'Sex': [Sex], 'Age': [Age], 'Parch': [Parch]})
        transf_data = transform_data(df)
        prediction = get_prediction(transf_data)
        # predict_proba returns [[P(not survived), P(survived)]];
        # we keep the second value, the probability of surviving
        prediction = prediction[0][1].item()
        data = {'prediction': prediction}
        return jsonify(data)
    else:
        return jsonify({'error': 'Only GET requests possible'})
What this does is the following: it reads the variable values from the GET request, uses transform_data and get_prediction from utils.py, and returns the probability of surviving.
Finally, we want to test our app. We build a test.py file that sends data to our "server", which returns a result. This is what the test.py file looks like:
import requests

# https://your-heroku-app-name.herokuapp.com/predict
# http://localhost:5000/predict
data = {'Pclass': 1, 'Age': 22.0, 'Sex': 0, 'Parch': 1}
r = requests.get("http://localhost:5000/predict", params=data)
print(r.json())
Let's use our terminal to see whether our API is working.
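With the Flask server running in one terminal, run test.py in another; you can also hit the endpoint directly with curl. The probability shown here is hypothetical:

python test.py
# {'prediction': 0.72}

curl "http://localhost:5000/predict?Pclass=1&Age=22.0&Sex=0&Parch=1"
# {"prediction": 0.72}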
Awesome! We built a Flask app to deploy our machine learning model and made it available as an API.

Stay tuned for the next blog post!

Lasse