Zero to Singularity: Create, Tune, Deploy and Scale a Deep Neural Network in 90 Minutes
This notebook is part of a masterclass held at IBM Think on February 13, 2019 in San Francisco.
In this exercise you will train a Keras deep learning model running on top of TensorFlow.
Note: To keep the training runtime down we’ve done two things:
1) Used a softmax regression model instead of a Convolutional Neural Network
2) Trained for only one epoch instead of 20
This costs us approximately 5% in accuracy.
Authors
Romeo Kienzler - Chief Data Scientist, IBM Watson IoT
Krishnamurthy Arthanarisamy - Architect, Watson Machine Learning Software Lab, Bangalore
Prerequisites
Please make sure the currently installed versions of Keras and TensorFlow match the requirements below. If they don’t, please run the two pip commands below to re-install, restart the kernel before proceeding, and then re-check that the versions match.
import keras
print(f'Current: {keras.__version__}\nExpected: 2.2.5')
Using TensorFlow backend.
Current: 2.2.4
Expected: 2.2.5
import tensorflow as tf
print(f'Current: {tf.__version__}\nExpected: 1.15.0')
Current: 1.13.1
Expected: 1.15.0
IMPORTANT !!!
If you ran the two lines below, please restart your kernel (Kernel->Restart & Clear Output).
!pip install keras==2.2.5
!pip install tensorflow==1.15.0
1.0 Train an MNIST digits recognition model
We start with some global parameters and imports; a sketch of the data-loading step follows the download output below.
# some learners constantly reported 502 errors in Watson Studio.
import keras
batch_size = 128
epochs = 1  # train for a single epoch to keep the runtime down (see the note above)
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
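The parameter and import cells above are truncated. Here is a minimal sketch of the data loading and preprocessing they imply; the flattening to 784-dimensional vectors and the 0-1 pixel scaling are assumptions consistent with the softmax model trained below:

import keras
from keras.datasets import mnist

num_classes = 10

# load the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# flatten the 28x28 images into 784-dimensional vectors and scale pixels to [0, 1]
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

# one-hot encode the labels
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)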
Training a simple model
First we’ll train a simple softmax regressor and check what accuracy we get; a fuller sketch of this cell appears after the training output below.
model = Sequential()
WARNING:tensorflow:From /opt/conda/envs/Python36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/conda/envs/Python36/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 14s 230us/step - loss: 0.3857 - acc: 0.8886 - val_loss: 0.3232 - val_acc: 0.9110
Accuracy: 0.911
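For reference, the truncated model cell above amounts to something like the following sketch. The single Dense softmax layer follows from the “softmax regressor” description; the rmsprop optimizer is an assumption:

from keras.models import Sequential
from keras.layers import Dense

# softmax regression: one dense layer mapping 784 pixels to 10 class probabilities
model = Sequential()
model.add(Dense(num_classes, activation='softmax', input_shape=(784,)))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
          verbose=1, validation_data=(x_test, y_test))

score = model.evaluate(x_test, y_test, verbose=0)
print('Accuracy:', score[1])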
# some cleanup from the previous run
You should see an accuracy of approximately 90%. Now let’s define a hyper-parameter grid including different activation functions and gradient-descent optimizers. We optimize over the grid using grid search (nested for loops) and store each model variant in a file; a sketch of this loop follows the training output below. We then pick the best one to deploy to IBM Watson Machine Learning.
# define parameter grid
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 16s 269us/step - loss: 0.4272 - acc: 0.8839 - val_loss: 0.2807 - val_acc: 0.9172
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 14s 233us/step - loss: 0.4151 - acc: 0.8871 - val_loss: 0.2874 - val_acc: 0.9162
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 16s 261us/step - loss: 0.5419 - acc: 0.8611 - val_loss: 0.3225 - val_acc: 0.9081
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 14s 238us/step - loss: 0.3358 - acc: 0.9009 - val_loss: 0.2107 - val_acc: 0.9391
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 14s 239us/step - loss: 0.3241 - acc: 0.9076 - val_loss: 0.2284 - val_acc: 0.9358
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 18s 308us/step - loss: 0.3640 - acc: 0.8943 - val_loss: 0.2737 - val_acc: 0.9186
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 22s 374us/step - loss: 0.2554 - acc: 0.9270 - val_loss: 0.1215 - val_acc: 0.9625
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 22s 367us/step - loss: 0.2377 - acc: 0.9315 - val_loss: 0.1378 - val_acc: 0.9588
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 24s 402us/step - loss: 0.2918 - acc: 0.9167 - val_loss: 0.1622 - val_acc: 0.9528
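The grid-search cell above is truncated; here is a minimal sketch of what it does. The activations, optimizers, and file-naming pattern come from the directory listing below, while the 512-unit hidden layer is an assumption (the file names also suggest the original used the Keras functional API; a Sequential version is shown for brevity):

from keras.models import Sequential
from keras.layers import Dense

# define parameter grid
activations = ['sigmoid', 'tanh', 'relu']
optimizers = ['rmsprop', 'adagrad', 'adadelta']

# grid search via nested for loops, saving every trained variant to a file
for activation in activations:
    for optimizer in optimizers:
        model = Sequential()
        model.add(Dense(512, activation=activation, input_shape=(784,)))
        model.add(Dense(num_classes, activation='softmax'))
        model.compile(loss='categorical_crossentropy', optimizer=optimizer,
                      metrics=['accuracy'])
        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                  verbose=1, validation_data=(x_test, y_test))
        score = model.evaluate(x_test, y_test, verbose=0)
        # encode the configuration and validation accuracy in the file name
        model.save('ker_func_mnist_model_2.{}.{}.{}.h5'.format(
            activation, optimizer, round(score[1], 4)))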
Model evaluation
Let’s have a look at all the models and see which hyper-parameter configuration was the best one. You should see that relu and rmsprop give you > 95% accuracy on the validation set.
ls -ltr *
-rw-r----- 1 dsxuser dsxuser 2289 Apr 15 22:31 rklib.py
-rw-r----- 1 dsxuser dsxuser 3276560 Apr 15 22:33 ker_func_mnist_model_2.sigmoid.rmsprop.0.9172.h5
-rw-r----- 1 dsxuser dsxuser 3276560 Apr 15 22:33 ker_func_mnist_model_2.sigmoid.adagrad.0.9162.h5
-rw-r----- 1 dsxuser dsxuser 4905120 Apr 15 22:34 ker_func_mnist_model_2.sigmoid.adadelta.0.9081.h5
-rw-r----- 1 dsxuser dsxuser 3276560 Apr 15 22:34 ker_func_mnist_model_2.tanh.rmsprop.0.9391.h5
-rw-r----- 1 dsxuser dsxuser 3276568 Apr 15 22:34 ker_func_mnist_model_2.tanh.adagrad.0.9358.h5
-rw-r----- 1 dsxuser dsxuser 4905128 Apr 15 22:35 ker_func_mnist_model_2.tanh.adadelta.0.9186.h5
-rw-r----- 1 dsxuser dsxuser 3276568 Apr 15 22:35 ker_func_mnist_model_2.relu.rmsprop.0.9625.h5
-rw-r----- 1 dsxuser dsxuser 3276568 Apr 15 22:35 ker_func_mnist_model_2.relu.adagrad.0.9588.h5
-rw-r----- 1 dsxuser dsxuser 4905128 Apr 15 22:36 ker_func_mnist_model_2.relu.adadelta.0.9528.h5
-rw-r----- 1 dsxuser dsxuser 2887392 Apr 15 22:38 my_best_model.tgz
systemml:
total 0
__pycache__:
total 4
-rw-r----- 1 dsxuser dsxuser 1584 Apr 15 22:26 rklib.cpython-36.pyc
scratch_space:
total 0
mkdir :
total 4
drwxr-x--- 3 dsxuser dsxuser 4096 Apr 15 22:30
Now it’s time to create a tarball out of your favorite model. Please replace “please-put-me-here” with the file name of your favorite model’s H5 file.
!tar -zcvf my_best_model.tgz ker_func_mnist_model_2.relu.rmsprop.0.9625.h5
ker_func_mnist_model_2.relu.rmsprop.0.9625.h5
2.0 Save the trained model to WML Repository
We will use the watson_machine_learning_client Python library to save the trained model to the WML repository, to deploy the saved model, and to make predictions using the deployed model. In case you are running outside Watson Studio, watson_machine_learning_client can be installed using the following pip command:
!pip install watson-machine-learning-client --upgrade
from watson_machine_learning_client import WatsonMachineLearningAPIClient
2020-04-15 22:57:03,361 - watson_machine_learning_client.metanames - WARNING - 'AUTHOR_EMAIL' meta prop is deprecated. It will be ignored.
Please go to https://cloud.ibm.com/, log in, and click the “Create Resource” button. From the “AI” category, choose “Machine Learning”. Wait for the “Create” button to activate and click “Create”. Click “Service Credentials”, then “New Credential”, then “Add”. From the new entry in the table, under “ACTIONS”, click “View Credentials” and copy the whole JSON object to your clipboard. Now just paste the JSON object below so that you can use your personal instance of Watson Machine Learning.
wml_credentials = {  # paste your Watson Machine Learning service credentials JSON here
}
client = WatsonMachineLearningAPIClient(wml_credentials)
model_props = {client.repository.ModelMetaNames.AUTHOR_NAME: "RCZHANG",
               client.repository.ModelMetaNames.NAME: "k1_keras_mnist_clt1"}
published_model = client.repository.store_model(model="my_best_model.tgz", meta_props=model_props)
published_model_uid = client.repository.get_model_uid(published_model)
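Note that WML also needs framework metadata to deploy a Keras model. A fuller version of the model_props defined above might look like the following sketch; it assumes the ModelMetaNames fields of the v1 watson_machine_learning_client, with the version values inferred from the tensorflow-1.15 framework shown in the deployment listing below:

model_props = {
    client.repository.ModelMetaNames.NAME: "k1_keras_mnist_clt1",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "tensorflow",
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: "1.15",
    client.repository.ModelMetaNames.FRAMEWORK_LIBRARIES: [{"name": "keras", "version": "2.2.5"}]
}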
3.0 Deploy the Keras model
client.deployments.list()
------------------------------------ ------------------- ------ -------------- ------------------------ --------------- -------------
GUID NAME TYPE STATE CREATED FRAMEWORK ARTIFACT TYPE
e69bb10d-67ea-4fa9-9788-4bbf80edac85 k1_keras_mnist_clt1 online DEPLOY_SUCCESS 2020-04-15T22:59:44.929Z tensorflow-1.15 model
------------------------------------ ------------------- ------ -------------- ------------------------ --------------- -------------
To keep your environment clean, just delete all deployments from previous runs
# client.deployments.delete("PASTE_YOUR_GUID_HERE_IF_APPLICABLE")
created_deployment = client.deployments.create(published_model_uid, name="k1_keras_mnist_clt1")
#######################################################################################
Synchronous deployment creation for uid: 'cc79b7b9-ae8b-4cf1-b7a3-165b104867ab' started
#######################################################################################
INITIALIZING
DEPLOY_IN_PROGRESS...
DEPLOY_SUCCESS
------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='e69bb10d-67ea-4fa9-9788-4bbf80edac85'
------------------------------------------------------------------------------------------------
Test the model
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)
print(scoring_endpoint)
https://us-south.ml.cloud.ibm.com/v3/wml_instances/07660d0f-85d9-48fe-bc53-fe800754f9f8/deployments/e69bb10d-67ea-4fa9-9788-4bbf80edac85/online
x_score_1 = x_test[23].tolist()
The answer should be: 5
predictions = client.deployments.score(scoring_endpoint, scoring_payload)
And the answer is!... 5
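For reference, the scoring cells above boil down to something like the following sketch. The {'values': [...]} payload shape follows the WML v3 online-scoring format, and the argmax post-processing is an assumption about the response layout:

import numpy as np

# pick one test image and turn it into a JSON-serializable payload
x_score_1 = x_test[23].tolist()
scoring_payload = {'values': [x_score_1]}

predictions = client.deployments.score(scoring_endpoint, scoring_payload)
# the first row of 'values' should hold the softmax output for our image;
# inspect `predictions` if your response layout differs
print('And the answer is!...', np.argmax(predictions['values'][0]))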