One of the many facets of deep learning is the selection of appropriate model hyperparameters. How do we decide, for example, the number of hidden units in each layer? Or even the number of layers in our model? Or, arguably most important of all, the learning rate? These architectural choices are governed by hyperparameter optimization techniques such as grid search, random search, and Bayesian optimization. In this post, I'll cover how to do random search using Hyperas.

Hyperas is a wrapper of Hyperopt for Keras. It’s constantly evolving, so there are relatively few examples online right now aside from those provided in the readme. What happens if you want to build a more complicated model?

In my example below, the task is multiclass classification of epidemic curves. For example, each row of the training data looks like this:

[4, 5, 10, 15, 6, 2, 0, …, 0]

where the last column is the label of the epidemic curve (the example above belongs to class 0) and the epidemic runs for T days, where T is the number of elements in the vector minus 1 (for the class label). Each value is the count of the number of infectious individuals on each day. Day 1 has 4 infectious individuals; day 2 has 5, etc. The goal is to classify the epidemic curve into one of a set of pre-defined categories based on the shape of the curve. In this example, there are 8 categories.
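To make the layout concrete, here's a sketch (with a made-up, shortened row) of how one such row splits into the count vector and the class label:

```python
import numpy as np

# Hypothetical shortened row: T daily counts, then the class label in the last slot
row = np.array([4, 5, 10, 15, 6, 2, 0, 0, 0])
counts = row[:-1]      # infectious counts for days 1..T
label = int(row[-1])   # class label (0 here)
T = len(counts)        # T = 8 in this toy row
```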

I’ve provided an example using an LSTM. Keras is incredibly versatile and relatively user-friendly. If you’d rather use a GRU for example, all you have to do is exchange ‘LSTM’ for ‘GRU’ in this code.

To run the code below, you will need to have installed numpy, scipy, sklearn, keras, and hyperas. You will also need a separate file in the same directory:

The file globalvars.py contains one line: `globalVar = 0`

More on the global variable later.

```python
from __future__ import print_function
from hyperopt import Trials, STATUS_OK, tpe, space_eval
import keras.optimizers, keras.initializers
from keras.regularizers import l2
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import LSTM
from keras.layers.core import Dense, Dropout, Activation
from keras.models import Sequential
from keras.utils import np_utils
from sklearn.preprocessing import MinMaxScaler
from hyperas import optim
from hyperas.distributions import choice, uniform, conditional
import numpy, json
import globalvars


def data():
    train_file = 'train_pl.csv'
    trainset = numpy.loadtxt(train_file, delimiter=",")
    X = trainset[:, 0:(trainset.shape[1] - 2)]
    Y = (trainset[:, trainset.shape[1] - 1]).astype(int)  # last column is the label
    scaler = MinMaxScaler(feature_range=(0, 1))
    X_scaled = scaler.fit_transform(X)
    y_binary = np_utils.to_categorical(Y)
    # Hold out 20% of each class as the validation set
    num_per_class = int(float(X.shape[0]) / float(y_binary.shape[1]))
    to_take = numpy.random.choice(num_per_class, int(num_per_class * 0.2), replace=False)
    class_split = numpy.array_split(X_scaled, y_binary.shape[1])
    val_list = [x[to_take] for x in class_split]
    big_list = [item for sublist in val_list for item in sublist]
    val_X = numpy.asarray(big_list)
    label_set = numpy.arange(0, y_binary.shape[1])
    val_Y = numpy.repeat(label_set, int(num_per_class * 0.2))
    val_Y = np_utils.to_categorical(val_Y)
    setdiffval = set(range(num_per_class)) - set(to_take)
    setdiffval = list(setdiffval)
    X_train_vals = [x[setdiffval] for x in class_split]
    X_train = [item for sublist in X_train_vals for item in sublist]
    X_train = numpy.asarray(X_train)
    Y_train = numpy.repeat(label_set, int(num_per_class * 0.8))
    Y_train = np_utils.to_categorical(Y_train)
    x_train = X_train
    x_test = val_X
    y_train = Y_train
    y_test = val_Y
    # Reshape the inputs to 3D (samples, timesteps, features) for the Keras LSTM
    x_train = numpy.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
    x_test = numpy.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
    return (x_train, y_train, x_test, y_test)


def model(x_train, y_train, x_test, y_test):
    model = Sequential()
    model.add(LSTM({{choice([10, 20, 50, 100])}},
                   input_shape=(x_train.shape[1], x_train.shape[2]),
                   return_sequences=False))
    model.add(Activation('relu'))
    if conditional({{choice(['one', 'two'])}}) == 'two':
        model.add(Dense({{choice([10, 20, 30, 100])}}))
        model.add(Activation('relu'))
    model.add(Dense(8))
    model.add(Activation('softmax'))
    adam = keras.optimizers.Adam(lr={{choice([10**-6, 10**-5, 10**-4, 10**-3, 10**-2, 10**-1])}}, clipnorm=1.)
    rmsprop = keras.optimizers.RMSprop(lr={{choice([10**-6, 10**-5, 10**-4, 10**-3, 10**-2, 10**-1])}}, clipnorm=1.)
    sgd = keras.optimizers.SGD(lr={{choice([10**-6, 10**-5, 10**-4, 10**-3, 10**-2, 10**-1])}}, clipnorm=1.)
    choiceval = {{choice(['adam', 'sgd', 'rmsprop'])}}
    if choiceval == 'adam':
        optim = adam
    elif choiceval == 'rmsprop':
        optim = rmsprop
    else:
        optim = sgd
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer=optim)
    globalvars.globalVar += 1
    filepath = "weights_mlp_hyperas" + str(globalvars.globalVar) + ".hdf5"
    earlyStopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=20,
                                  verbose=0, mode='auto')
    checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                                 save_best_only=True, mode='max')
    callbacks_list = [earlyStopping, checkpoint]
    hist = model.fit(x_train, y_train,
                     batch_size={{choice([64, 128])}},
                     epochs=100, verbose=2,
                     validation_data=(x_test, y_test),
                     callbacks=callbacks_list)
    h1 = hist.history
    acc_ = numpy.asarray(h1['acc'])
    loss_ = numpy.asarray(h1['loss'])
    val_loss_ = numpy.asarray(h1['val_loss'])
    val_acc_ = numpy.asarray(h1['val_acc'])
    acc_and_loss = numpy.column_stack((acc_, loss_, val_acc_, val_loss_))
    save_file_mlp = 'mlp_run_' + '_' + str(globalvars.globalVar) + '.txt'
    with open(save_file_mlp, 'w') as f:
        numpy.savetxt(f, acc_and_loss, delimiter=" ")
    score, acc = model.evaluate(x_test, y_test, verbose=0)
    print('Final validation accuracy:', acc)
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}


if __name__ == '__main__':
    trials = Trials()
    best_run, best_model = optim.minimize(model=model, data=data,
                                          algo=tpe.suggest, max_evals=500,
                                          trials=trials)
    X_train, Y_train, X_test, Y_test = data()
    print("Evaluation of best performing model:")
    print(best_model.evaluate(X_test, Y_test))
    print("Best performing model chosen hyper-parameters:")
    print(best_run)
    json.dump(best_run, open("best_run.txt", 'w'))
```

Let’s do a quick walkthrough.


**data()**

In the data() function, I’ve read in my training data, which had 6400 rows.

Since the daily observed counts of infectious individuals are integer-valued, and deep learning models tend to train better on inputs in a common, small range, I rescale each input feature to [0, 1] using MinMaxScaler.
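For intuition, here's what min-max scaling computes, column by column, written out in plain numpy (a stand-in for sklearn's MinMaxScaler with feature_range=(0, 1), not its actual implementation):

```python
import numpy as np

# Each column (day) is mapped linearly so its minimum becomes 0 and its maximum 1
X = np.array([[4., 5., 10.],
              [2., 0., 20.]])
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)  # [[1. 1. 0.]
                 #  [0. 0. 1.]]
```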

Then, I format my outputs (y_train and y_test) each as a one-hot matrix: for classes 0, 1, …, 7, each row contains a 1 in the column corresponding to its class and 0s elsewhere, giving a matrix with the same number of rows as the data and 8 columns (to_categorical does this).
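In numpy terms, `to_categorical` behaves roughly like this minimal one-hot encoder (a sketch, not Keras's actual implementation):

```python
import numpy as np

def to_one_hot(labels, num_classes):
    # one row per label, with a single 1 in the column of that label's class
    out = np.zeros((len(labels), num_classes), dtype=int)
    out[np.arange(len(labels)), labels] = 1
    return out

y = to_one_hot([0, 2, 7], 8)
print(y.shape)  # (3, 8)
```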

Also, I wanted to use a portion of my training data for my validation set. There are a few ways to accomplish this, but I’ve chosen to do the splitting here in the data() function.

Since this is a time series model, I have to reshape my data to be compatible with a Keras time series model – the last two lines of the data() function ensure this compatibility.
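The reshape in question looks like this (the dimensions 6400 curves × 62 days are illustrative assumptions): Keras recurrent layers expect 3D input of shape (samples, timesteps, features), and each day contributes a single feature, the count.

```python
import numpy as np

x_train = np.random.rand(6400, 62)  # 2D: (samples, days)
# add a trailing feature axis of size 1 for the LSTM
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print(x_train.shape)  # (6400, 62, 1)
```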

**model()**

Now for the fun part!

The model definition is fairly straightforward in Keras. It's all very modular, and modules can be swapped as necessary to achieve your desired model. Hyperas is also very intuitive: for each hyperparameter you want to tune, you can define a set of candidate values to search over.

For example, `{{choice([10, 20])}}` means select either 10 or 20 with equal probability.

You can also allow the parameter value to be drawn from a distribution, such as `{{uniform(0, 1)}}`.
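As a rough mental model (Hyperas actually compiles these templates into hyperopt search-space expressions), each evaluation draws one value per template, something like:

```python
import random

def choice(options):
    return random.choice(options)   # one option, uniformly at random

def uniform(lo, hi):
    return random.uniform(lo, hi)   # a float drawn uniformly from [lo, hi]

# One sampled hyperparameter set, mirroring the script's search space,
# plus an illustrative uniform draw (not part of the script above)
params = {
    'lstm_units': choice([10, 20, 50, 100]),
    'lr': choice([10**-6, 10**-5, 10**-4, 10**-3, 10**-2, 10**-1]),
    'optimizer': choice(['adam', 'sgd', 'rmsprop']),
    'batch_size': choice([64, 128]),
    'dropout': uniform(0, 1),
}
print(params)
```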

Now, about the `globalVar` part: this keeps track of which evaluation we're on. For example, I define `max_evals=500` in the `main` function, which means I'll try 500 different hyperparameter sets, so `globalVar` counts from 1 to 500 (it's incremented at the start of each call to model()).
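Why the separate module works: every file that imports globalvars receives the same module object, so increments persist across Hyperas's repeated calls to model(). A self-contained sketch that fakes the one-line globalvars.py in-process:

```python
import sys, types

# Stand-in for the real one-line file globalvars.py (globalVar=0)
globalvars = types.ModuleType("globalvars")
globalvars.globalVar = 0
sys.modules["globalvars"] = globalvars

import globalvars  # any importer now receives this same object

def one_evaluation():
    # mirrors the top of model(): bump the counter, build a unique filename
    globalvars.globalVar += 1
    return "weights_mlp_hyperas" + str(globalvars.globalVar) + ".hdf5"

print(one_evaluation())  # weights_mlp_hyperas1.hdf5
print(one_evaluation())  # weights_mlp_hyperas2.hdf5
```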

Helpfully, Keras can be used with either the TensorFlow or the Theano backend. If you’re using the TensorFlow backend, you can probably send your results to TensorBoard. I’m using the Theano backend because I’ve read it’s faster (especially for time series models), so I have a separate way of visualizing my results, which I’ll go through below.

To run the code below, you will need to have installed matplotlib and h5py in addition to the modules above. This file is called after_hyperas.py (it's invoked from a shell script I'll show later).

```python
#!/usr/bin/env python
from __future__ import print_function
import json
import numpy as np
import keras
from keras.models import Sequential
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import Dense, LSTM, Activation
from keras.utils.np_utils import to_categorical
from sklearn.preprocessing import MinMaxScaler
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches


def load_data(train_file, test_file):
    trainset = np.loadtxt(train_file, delimiter=",")
    # split into input (X) and output (Y) variables; the last column is the label
    X = trainset[:, 0:(trainset.shape[1] - 2)]
    Y = (trainset[:, trainset.shape[1] - 1]).astype(int)
    scaler = MinMaxScaler(feature_range=(0, 1))
    X_scaled = scaler.fit_transform(X)
    y_binary = to_categorical(Y)
    testset = np.loadtxt(test_file, delimiter=",")
    X_test = testset[:, 0:(testset.shape[1] - 2)]
    X_test_scaled = scaler.fit_transform(X_test)
    Y_test = (testset[:, testset.shape[1] - 1]).astype(int)
    ytest_binary = to_categorical(Y_test)
    return (X_scaled, y_binary, X_test_scaled, ytest_binary)


def plot_acc_loss(filename, savefigname):
    # columns were saved as: acc, loss, val_acc, val_loss
    acc_loss_file = np.loadtxt(filename, delimiter=" ", unpack=True)
    red_line = mpatches.Patch(color='red', label='training')
    blue_line = mpatches.Patch(color='blue', label='validation')

    acc_fig = plt.figure()
    plt.plot(acc_loss_file[0, :], 'r', acc_loss_file[2, :], 'b')
    plt.title('Training and validation accuracy')
    plt.xlabel('Epoch')
    plt.legend(handles=[red_line, blue_line], loc='lower right')
    acc_fig.savefig(savefigname + '_acc.png')

    loss_fig = plt.figure()
    plt.plot(acc_loss_file[1, :], 'r', acc_loss_file[3, :], 'b')
    plt.title('Training and validation loss')
    plt.xlabel('Epoch')
    plt.legend(handles=[red_line, blue_line], loc='lower right')
    loss_fig.savefig(savefigname + '_loss.png')
    return 0


def get_confusion_matrix_one_hot(model_results, truth):
    '''model_results and truth should be in one-hot format, i.e. have >= 2
    columns, where truth is 0/1 and the argmax along each row of
    model_results is the model's prediction.'''
    assert model_results.shape == truth.shape
    num_outputs = truth.shape[1]
    confusion_matrix = np.zeros((num_outputs, num_outputs), dtype=np.int32)
    predictions = np.argmax(model_results, axis=1)
    assert len(predictions) == truth.shape[0]
    for actual_class in range(num_outputs):
        idx_examples_this_class = truth[:, actual_class] == 1
        prediction_for_this_class = predictions[idx_examples_this_class]
        for predicted_class in range(num_outputs):
            count = np.sum(prediction_for_this_class == predicted_class)
            confusion_matrix[actual_class, predicted_class] = count
    assert np.sum(confusion_matrix) == len(truth)
    assert np.sum(confusion_matrix) == np.sum(truth)
    return confusion_matrix


def lstm_repeat(X, Y, Xtest, Ytest, params_to_use, num_reps=10):
    # Take the best parameters found by Hyperas and retrain on the full
    # training set, num_reps times. best_run stores indices into the choice
    # lists, so map them back to actual values here.
    possible_lr = [10**-6, 10**-5, 10**-4, 10**-3, 10**-2, 10**-1]
    possible_lstm = [10, 20, 50, 100]    # must match the LSTM choice list
    possible_dense = [10, 20, 30, 100]   # must match the Dense choice list
    batch_s = 64 if params_to_use['batch_size'] == 0 else 128
    condval = params_to_use['conditional'] == 1
    hid_val = possible_lstm[params_to_use['LSTM']]
    hid_2_val = possible_dense[params_to_use['Dense']]
    choiceval = params_to_use['choiceval']
    if choiceval == 0:
        optim = keras.optimizers.Adam(lr=possible_lr[params_to_use['lr']], clipnorm=1.)
    elif choiceval == 1:
        optim = keras.optimizers.RMSprop(lr=possible_lr[params_to_use['lr_1']], clipnorm=1.)
    else:
        optim = keras.optimizers.SGD(lr=possible_lr[params_to_use['lr_2']], clipnorm=1.)

    # Reshape to 3D (samples, timesteps, features) for the LSTM
    X = np.reshape(X, (X.shape[0], X.shape[1], 1))
    Xtest = np.reshape(Xtest, (Xtest.shape[0], Xtest.shape[1], 1))

    def lstm_model(hid_dim, hid_2_dim, optimizer, condval, input_shape):
        model = Sequential()
        model.add(LSTM(hid_dim, input_shape=input_shape, return_sequences=False))
        model.add(Activation('relu'))
        if condval:
            model.add(Dense(hid_2_dim))
            model.add(Activation('relu'))
        model.add(Dense(8))
        model.add(Activation('softmax'))
        model.compile(loss='categorical_crossentropy', metrics=['accuracy'],
                      optimizer=optimizer)
        return model

    class_error = []  # per-class and average classification error, per rep
    # X has 6400 examples, 800 per class (8 classes). Take 20% of each class
    # as the validation set.
    to_take = np.random.choice(800, int(800 * 0.2), replace=False)
    class_split = np.array_split(X, 8)  # 8 equal slices of 800 rows each
    val_list = [x[to_take] for x in class_split]
    val_X = np.asarray([item for sublist in val_list for item in sublist])
    label_set = np.arange(0, 8)  # classes 0 to 7
    val_Y = to_categorical(np.repeat(label_set, int(800 * 0.2)))
    setdiffval = list(set(range(800)) - set(to_take))
    X_train_vals = [x[setdiffval] for x in class_split]
    X_train = np.asarray([item for sublist in X_train_vals for item in sublist])
    Y_train = to_categorical(np.repeat(label_set, int(800 * 0.8)))

    for j in range(num_reps):
        print('on rep #', j)
        filepath = "weights_lstm_2" + str(j) + ".hdf5"  # for saving the best weights
        model = lstm_model(hid_val, hid_2_val, optim, condval,
                           (X_train.shape[1], X_train.shape[2]))
        earlyStopping = EarlyStopping(monitor='val_loss', min_delta=0,
                                      patience=100, verbose=0, mode='auto')
        checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                                     save_best_only=True, mode='max')
        callbacks_list = [earlyStopping, checkpoint]
        hist = model.fit(X_train, Y_train, batch_size=batch_s, epochs=500,
                         validation_data=(val_X, val_Y), callbacks=callbacks_list)
        model.load_weights(filepath)  # everything subsequent uses the best weights
        scores = model.evaluate(X, Y)  # accuracy on the full training set
        print("%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
        h1 = hist.history
        acc_and_loss = np.column_stack((np.asarray(h1['acc']),
                                        np.asarray(h1['loss']),
                                        np.asarray(h1['val_acc']),
                                        np.asarray(h1['val_loss'])))
        save_file_lstm = 'lstm_' + str(j) + '.txt'
        np.savetxt(save_file_lstm, acc_and_loss, delimiter=" ")
        print('saved file', save_file_lstm)
        if j == 1:
            # plot the training and validation accuracy and loss for this rep
            plot_acc_loss(save_file_lstm, 'testing')
        # Run the trained model on the test set and build a confusion matrix
        test_scores = model.evaluate(Xtest, Ytest)
        print('eval_scores', test_scores[1] * 100)
        predict = model.predict(Xtest)  # class probabilities
        con_mat = get_confusion_matrix_one_hot(predict, Ytest)
        print(con_mat)
        np.savetxt('confusionmatrix_lstm' + '_' + str(j) + '.txt',
                   con_mat, fmt='%i', delimiter=",")
        # Per-class classification error, plus the average error for this rep
        class_error_j = []
        for i in range(con_mat.shape[1]):
            class_i_correct = float(con_mat[i][i]) / float(sum(con_mat[i]))
            class_error_j.append(1. - class_i_correct)
        print(class_error_j)
        class_error_j.append(sum(class_error_j) / con_mat.shape[1])
        class_error.append(class_error_j)
        print(class_error)
    np.savetxt('class_error.txt', np.asarray(class_error), fmt='%1.3f', delimiter=' ')
    return 0


if __name__ == '__main__':
    best_params = json.load(open('best_run.txt'))
    X, Y, Xtest, Ytest = load_data('train_pl.csv', 'test_pl.csv')
    lstm_repeat(X, Y, Xtest, Ytest, best_params)
```

The matplotlib functions let me plot the training and validation curves; the script also saves a confusion matrix for each run. For an explanation of confusion matrices, see the highest-voted answer here.
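For a toy illustration of what get_confusion_matrix_one_hot computes (rows are true classes, columns are predicted classes), with two classes and three examples:

```python
import numpy as np

truth = np.array([[1, 0], [1, 0], [0, 1]])              # true labels: 0, 0, 1
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])  # model probabilities
pred = np.argmax(probs, axis=1)                         # predicted: 0, 1, 1
actual = np.argmax(truth, axis=1)

cm = np.zeros((2, 2), dtype=int)
for a, p in zip(actual, pred):
    cm[a, p] += 1  # count each (true, predicted) pair
print(cm)  # [[1 1]
           #  [0 1]]
```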

To run the entire pipeline, I have a separate script called hyperas_and_after.sh:

```bash
#!/bin/bash
python hyperas_test_mlp.py && python after_hyperas.py
```

My full process for running all code is:

– `screen`

– create a virtual environment, e.g. `conda create -n name_of_my_env` (or `python -m venv name_of_my_env`)

– activate it: `source activate name_of_my_env` for conda (or `source name_of_my_env/bin/activate` for venv)

– install the necessary packages (may need to preload libgfortran)

– `./hyperas_and_after.sh`

A disclaimer is important at this point: a lot of the code seen here comes from various sources, compiled from answers on Stack Exchange and project readmes. This post mostly serves as a personal reminder of how to use Hyperas. Some of the places I got code from:

– get_confusion_matrix_one_hot: https://github.com/davidslac/mlearntut/blob/master/ex02_keras_train.py

– LSTMs in Keras: http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/

– Hyperas: https://github.com/maxpumperla/hyperas/blob/master/README.md