Gaussian Process Regression using GPyTorch
Building a scalable and flexible GP model using GPyTorch
Gaussian Process, or GP for short, is an underappreciated yet powerful algorithm for machine learning tasks. It is a non-parametric, Bayesian approach to machine learning that can be applied to supervised learning problems like regression and classification. Compared to other supervised learning algorithms, GP has several practical advantages: it works well on small datasets and can provide uncertainty estimates for its predictions.
In this tutorial, I am going to demonstrate how to perform GP regression using GPyTorch. GPyTorch is a Gaussian process library implemented using PyTorch that is designed for creating scalable and flexible GP models. You can learn more about GPyTorch on their official website.
Note: This tutorial is not necessarily intended to teach the mathematical background of GP, but rather how to build one using GPyTorch. I highly recommend reading Chapter 2 of Gaussian Processes for Machine Learning for a very thorough introduction to GP regression.
Setup
Before we start, we first need to install the gpytorch library. You can do this with either pip or conda using the following commands:
# Install using pip
pip install gpytorch
# Install using conda
conda install gpytorch -c gpytorch
You can also check the requirements and installation instructions on their website here.
Note: If you want to follow along with this tutorial, you can find the notebook of this tutorial here.
Generating the data
Next, we need to generate training data for our model. We will be modeling the following function:
We will evaluate the above function on 15 equally-spaced points from [0, 1]. The generated training data is depicted in the following plot:
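As a sketch of the data generation, assuming the target function is the sine wave y = sin(2πx) used in GPyTorch's spectral mixture regression example:

```python
import math
import torch

# 15 equally-spaced training points in [0, 1]
train_x = torch.linspace(0, 1, 15)
# Assumed target function: y = sin(2*pi*x)
train_y = torch.sin(train_x * (2 * math.pi))
```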
Building the model
Now that we have generated our training data, we can start building our GP model. GPyTorch offers a flexible way to build GP models by letting us construct the model's components ourselves. This is analogous to building neural networks in the standard PyTorch library. For most GP regression models, you will need to construct the following components:
- A GP model: for exact (i.e. non-variational) GP models we will use gpytorch.models.ExactGP.
- A likelihood function: for GP regression, we commonly use gpytorch.likelihoods.GaussianLikelihood.
- A mean function: the prior mean of the GP. If you don't know which mean function to use, gpytorch.means.ConstantMean() is usually a good place to start.
- A kernel function: the prior covariance of the GP. We'll use the Spectral Mixture (SM) kernel (gpytorch.kernels.SpectralMixtureKernel()) for this tutorial.
- A multivariate normal distribution: the distribution returned by the GP model (gpytorch.distributions.MultivariateNormal).
We can build our GP model by constructing the above components as follows:
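A minimal sketch of such a model, following the standard GPyTorch ExactGP pattern (the choice of 4 mixture components is an assumption):

```python
import torch
import gpytorch


class SpectralMixtureGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        # Prior mean of the GP
        self.mean_module = gpytorch.means.ConstantMean()
        # Prior covariance: Spectral Mixture kernel with 4 mixture components (assumed)
        self.covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4)
        # Initialize the SM kernel hyperparameters from the training data
        self.covar_module.initialize_from_data(train_x, train_y)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        # Multivariate normal with the prior mean and covariance evaluated at x
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


# Initialize the Gaussian likelihood and the GP model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SpectralMixtureGPModel(train_x, train_y, likelihood)
```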
Let me break down the above code line by line:
- The above GP model has two main components: the __init__ and forward methods.
- The __init__ method takes the training data and a likelihood as inputs and constructs whatever objects are necessary for the model's forward method. This will most commonly include objects like a mean function and a kernel function.
- The forward method takes in the data x and returns a multivariate normal distribution with the prior mean and covariance evaluated at x.
- Finally, we initialize the likelihood function for the GP model. Here, we use the Gaussian likelihood, which is the simplest likelihood function and assumes a homoskedastic noise model (i.e. all inputs have the same noise).
Training the model
Now that we have built the model, we can train the model to find the optimal hyperparameters. Training a GP model in GPyTorch is also analogous to training a neural network in the standard PyTorch library. The training loop mainly consists of the following steps:
- Setting all the parameter gradients to zero
- Calling the model and computing the loss
- Calling backward on the loss to fill in gradients
- Taking a step on the optimizer
Note: By defining our custom training loop, we can have greater flexibility in training our model. For example, it is easy to save the parameters at each step of training or use different learning rates for different parameters.
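For instance, here is a small sketch (using the module names from the model above; the specific learning rates are assumptions) of giving the kernel/mean and the likelihood different learning rates via Adam parameter groups:

```python
# Hypothetical sketch: separate learning rates for the kernel/mean and the likelihood noise
optimizer = torch.optim.Adam([
    {'params': model.covar_module.parameters(), 'lr': 0.1},
    {'params': model.mean_module.parameters(), 'lr': 0.1},
    {'params': likelihood.parameters(), 'lr': 0.01},
])
```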
The code for the training loop is given below:
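A sketch of the training loop, following GPyTorch's standard exact-GP training recipe (the learning rate of 0.1 is an assumption; the 50 iterations match the text below):

```python
# Put the model and likelihood into training mode
model.train()
likelihood.train()

# Adam optimizer over all model parameters (this includes the GaussianLikelihood parameters)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# The "loss" for a GP: the negative (exact) marginal log-likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iter = 50
for i in range(training_iter):
    optimizer.zero_grad()          # zero gradients from the previous iteration
    output = model(train_x)        # output from the model (a MultivariateNormal)
    loss = -mll(output, train_y)   # negative marginal log-likelihood
    loss.backward()                # fill in the gradients
    print(f'Iter {i + 1}/{training_iter} - Loss: {loss.item():.3f}')
    optimizer.step()               # take an optimizer step
```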
In the above code, we first put our model into training mode by calling model.train() and likelihood.train(). Then, we define the loss function and optimizer that we want to use in the training process. Here, we use the negative marginal log-likelihood as the loss function and Adam as the optimizer. We also need to set the number of iterations for the training loop, say 50 iterations.
Making predictions with the model
Finally, we can make predictions using the trained model. The basic routine of evaluating the model and making predictions is given in the following code:
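A sketch of the evaluation and plotting code, assuming the variables defined in the previous snippets and matplotlib for the plot:

```python
import matplotlib.pyplot as plt

# Test points: 50 equally-spaced points in [0, 5]
test_x = torch.linspace(0, 5, 50)

# Put the model and likelihood into evaluation (posterior predictive) mode
model.eval()
likelihood.eval()

# LOVE (fast_pred_var) gives faster predictive variances
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(test_x))

# Plot the predictive mean and 2-standard-deviation confidence region
with torch.no_grad():
    f, ax = plt.subplots(1, 1, figsize=(8, 4))
    lower, upper = observed_pred.confidence_region()
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')                           # observed data
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), 'b')                  # predictive mean
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)  # confidence band
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
    plt.show()
```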
There are a few things going on in the above code:
- We first generate the test data using 50 equally-spaced points from [0, 5].
- We put the model into evaluation mode by calling model.eval() and likelihood.eval().
- We use gpytorch.settings.fast_pred_var() to get faster predictive distributions using LOVE.
- When put into eval mode, the trained GP model returns a MultivariateNormal containing the posterior mean and covariance. Thus, we can obtain the predictive mean and covariance matrix from this multivariate normal distribution.
- Finally, we plot the mean and confidence region of the fitted GP model. The confidence_region() method is a helper method that returns 2 standard deviations above and below the mean.
The resulting plot is depicted below:
The black stars in the above plot represent the training (observed) data, while the blue line and the shaded area represent the mean and the confidence bounds respectively. Notice how the uncertainty is reduced close to the observed points. If more data points were added, we would see the mean function adjust itself to pass through these points and the uncertainty would reduce close to the observations.
Takeaways
In this tutorial, we have learned how to build a scalable and flexible GP model using GPyTorch. As with any other library, there are still a lot of cool things you can do with GPyTorch that I didn't cover in this tutorial. For example, you can utilize a GPU to accelerate model training, or implement state-of-the-art algorithms such as Deep GPs or stochastic variational deep kernel learning.
If you would like me to make more tutorials on GPyTorch or if you have any other suggestions, please let me know in the comments!
Thanks for Reading!