Machine Learning as a Service (MLaaS) with sklearn and Algorithmia

My most recent post about analyzing my university’s gym crowdedness over the last year using machine learning generated a lot of great responses — including:

First, I'm sorry you feel that way about the gym, and I sincerely hope the oncoming wave of New Year's Resolutioners doesn't completely crush your dreams of working out forever.

Second, yes. I'm in the process of creating more predictive models for the other ten campus locations Packd tracks. Machine learning requires a lot of data, so it may take some time before the models are ready for training.


I've since realized this requires a lot more explanation, and it's the subject of this post. My process may not be the best, but hopefully by illustrating it, I can learn from my mistakes and give you a starting point. The rest of this post assumes you have basic knowledge of machine learning as a concept, you know Python, and you're familiar with APIs. We'll use scikit-learn for this tutorial.

Developing a Model Locally

The first step is to create a machine learning model locally (on your computer) to test things out. The data for this is located here for downloading. You'll need to install:
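At a minimum, you'll want the three libraries the code below relies on (assuming you're installing with pip; any recent versions should work):

pip install numpy pandas scikit-learn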

We'll need to create a prediction model using the data. The following is taken from my kernel over at Kaggle:

import numpy as np # linear algebra
import pandas as pd # dataframes and CSV loading
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data.csv") # or wherever your data is located

# Extract the training and test data
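Concretely, here is a minimal sketch of that extraction and training step (simplified: the target column is number_people, and the choice of model and n_estimators is explained below):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Separate the features from the target label (number_people)
X = df.drop("number_people", axis=1).values
y = df["number_people"].values

# Hold out a quarter of the historical data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# Put every feature on a comparable scale
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create the model and fit it to the training data
model = RandomForestRegressor(n_estimators=140)
model.fit(X_train, y_train)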

Let's pause here for a second. I've breezed over some important details in the code above. First of all, I didn't have to modify any of my data (do things like convert column values of "yes"/"no" to 1/0) because I already did that part behind the scenes. Second, choosing the correct model is not a trivial thing, and I seem to have pulled the Random Forest Regressor out of thin air. (I didn't really; check my first post for details.) Last, why did I set n_estimators to 140? Normally we'd want to do a Grid Search over our hyperparameters to find the optimal n_estimators, but I did the hard work behind the scenes already (it takes a while).
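If you want to run that search yourself, here's a minimal sketch using scikit-learn's GridSearchCV (the candidate values below are illustrative, not the exact grid I searched):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Try a handful of forest sizes and keep the one with the best cross-validated score
param_grid = {"n_estimators": [20, 50, 80, 110, 140, 170]}
search = GridSearchCV(RandomForestRegressor(), param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

print(search.best_params_) # the n_estimators value that scored best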

The last line is the most important: it actually performs the learning part of the algorithm. It changes the state of the variable model. From here on out, we can throw new data at the model to make predictions:

test_output = model.predict(X_test) # predicted values
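To get a feel for how good those predictions are, you can compare them against the held-out answers. Here's a quick sanity check (mean squared error is just one convenient metric):

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, test_output)
print(mse) # average squared difference between predicted and actual number_people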

Great! This isn't really predicting the future yet: the test data was actually a quarter of the original historical data (more on the test/train data split). If we want to really predict the future, we need to do a lot more work.

In order to predict how crowded the gym will be in the future, we need a new array of column values where everything except the target label, number_people, is filled in. The details of how to fill these in will probably take another post, but the general idea is to create several datetimes in the future and, for each one, add every feature value (the timestamp, the day of the week, whether it will be a weekend, a holiday, the start of the semester, and so on). For the weather feature values, I used the Dark Sky API to make forecasts.
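As a rough illustration, a helper along these lines could build those rows (a hypothetical sketch: the column names are placeholders and have to match the features your model was trained on, and the hard-coded holiday and temperature values would really come from a calendar and the Dark Sky forecast):

import pandas as pd

def generate_prediction_data(start=None, periods=24, freq="1H"):
    # Build one feature row per future datetime (placeholder feature names)
    if start is None:
        start = pd.Timestamp.now()
    rows = []
    for ts in pd.date_range(start=start, periods=periods, freq=freq):
        rows.append({
            "timestamp": ts.hour * 3600 + ts.minute * 60, # seconds since midnight
            "day_of_week": ts.dayofweek,
            "is_weekend": int(ts.dayofweek >= 5),
            "is_holiday": 0, # fill in from a holiday calendar
            "is_start_of_semester": 0, # fill in from the academic calendar
            "temperature": 50.0, # fill in from a weather forecast
        })
    return pd.DataFrame(rows).values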

Let's say you now have an array of times in the future for which you want to predict how crowded the gym will be. Here's how you do it:

X_predict = generate_prediction_data() # your special call here
predictions = model.predict(X_predict)

Congratulations! The predictions variable now holds the predicted number of people for each of your datetimes in the future. Now, if you wanted, you could generate predictions for the rest of the year, or the century, or whenever, and call those your predicted values. However, this seems like it would take a huge amount of memory and probably wouldn't be very accurate the further into the future you go. It would be much better to store the model we created and only bring it out when we want to make predictions. Here's how in sklearn:

from sklearn.externals import joblib

# How to save the model:

joblib.dump(model, "prediction_model.pkl")
# You can quit your Python process here, even shut off your computer.

# Later, if you want to load the saved model back into memory:
model = joblib.load("prediction_model.pkl")

It's easy to store already-trained models in scikit-learn. Of course, your users can't call you on the phone every time they want predictions so you can generate them on your computer. They'll need to make an API call of some kind to a service you provide that does the predicting.

Storing Your Model in the Cloud
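The most direct approach is to wrap the saved model in a tiny web service of your own. Here's a bare-bones sketch of that idea (hypothetical: I'm using Flask purely for illustration, and the endpoint name and request format are made up):

from flask import Flask, jsonify, request
from sklearn.externals import joblib

app = Flask(__name__)
model = joblib.load("prediction_model.pkl") # keeps the whole model in your server's memory

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON list of feature rows, in the same column order as training
    features = request.get_json()
    predictions = model.predict(features)
    return jsonify(predictions.tolist())

if __name__ == "__main__":
    app.run()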

You could store the prediction_model.pkl file on your server and load it whenever someone makes a request for predictions, but it turns out loading this file uses up a lot of memory. In my case, it was more than I was allotted by my hosting company. It's better to host your machine learning model on an entirely different server. You have a few options here from big companies, including Google, Microsoft, and Amazon.
