Gaussian Process Regression using GPyTorch
Building a scalable and flexible GP model using GPyTorch
Gaussian Process, or GP for short, is an underappreciated yet powerful algorithm for machine learning tasks. It is a non-parametric, Bayesian approach to machine learning that can be applied to supervised learning problems like regression and classification. Compared to other supervised learning algorithms, GP has several practical advantages: it works well on small datasets and can provide uncertainty estimates for its predictions.
In this tutorial, I am going to demonstrate how to perform GP regression using GPyTorch. GPyTorch is a Gaussian process library implemented using PyTorch that is designed for creating scalable and flexible GP models. You can learn more about GPyTorch on their official website.
Note: This tutorial is not necessarily intended to teach the mathematical background of GP, but rather how to build one using GPyTorch. I highly recommend reading Chapter 2 of Gaussian Processes for Machine Learning for a very thorough introduction to GP regression.
Setup
Before we start, we first need to install the gpytorch library. You can do this with either pip or conda using the following commands:
# Install using pip
pip install gpytorch
# Install using conda
conda install gpytorch -c gpytorch
You can also check the requirements and installation instructions on their website here.
Note: If you want to follow along with this tutorial, you can find the notebook of this tutorial here.
Generating the data
Next, we need to generate training data for our model. We will be modeling the following function:
We will evaluate the above function on 15 equally-spaced points from [0, 1]. The generated training data is depicted in the following plot:
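As a sketch of the data generation, assuming the target function is the sine wave y = sin(2πx) used in GPyTorch's spectral mixture regression example:

```python
import math
import torch

# 15 equally-spaced training points in [0, 1]
train_x = torch.linspace(0, 1, 15)
# Assumed target function: y = sin(2*pi*x)
train_y = torch.sin(train_x * (2 * math.pi))
```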
Building the model
Now that we have generated our training data, we can start building our GP model. GPyTorch offers a flexible way to build GP models by letting us construct the model's components ourselves. This is analogous to building neural networks in the standard PyTorch library. For most GP regression models, you will need to construct the following components:
- A GP model: for exact (i.e. non-variational) GP models we will use gpytorch.models.ExactGP.
- A likelihood function: for GP regression, we commonly use gpytorch.likelihoods.GaussianLikelihood.
- A mean function: the prior mean of the GP. If you don't know which mean function to use, gpytorch.means.ConstantMean() is usually a good place to start.
- A kernel function: the prior covariance of the GP. We'll use the Spectral Mixture (SM) kernel (gpytorch.kernels.SpectralMixtureKernel()) for this tutorial.
- A multivariate normal distribution: the distribution returned by the GP model (gpytorch.distributions.MultivariateNormal).
We can build our GP model by constructing the above components as follows:
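A minimal sketch of such a model, following the standard GPyTorch ExactGP pattern (the choice of 4 mixture components is an assumption):

```python
import torch
import gpytorch


class SpectralMixtureGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        # Prior mean of the GP
        self.mean_module = gpytorch.means.ConstantMean()
        # Prior covariance: Spectral Mixture kernel with 4 mixture components (assumed)
        self.covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4)
        # Initialize the SM kernel hyperparameters from the training data
        self.covar_module.initialize_from_data(train_x, train_y)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        # Multivariate normal with the prior mean and covariance evaluated at x
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


# Initialize the Gaussian likelihood and the GP model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SpectralMixtureGPModel(train_x, train_y, likelihood)
```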
Let me break down the above code line by line:
- The above GP model has two main components: the __init__ and forward methods.
- The __init__ method takes the training data and a likelihood as inputs and constructs whatever objects are necessary for the model's forward method. This will most commonly include objects like a mean function and a kernel function.
- The forward method takes in the data x and returns a multivariate normal distribution with the prior mean and covariance evaluated at x.
- Finally, we initialize the likelihood function for the GP model. Here, we use the Gaussian likelihood, which is the simplest likelihood function and assumes a homoskedastic noise model (i.e. all inputs have the same noise).
Training the model
Now that we have built the model, we can train the model to find the optimal hyperparameters. Training a GP model in GPyTorch is also analogous to training a neural network in the standard PyTorch library. The training loop mainly consists of the following steps:
- Setting all the parameter gradients to zero
- Calling the model and computing the loss
- Calling backward on the loss to fill in gradients
- Taking a step on the optimizer
Note: By defining our custom training loop, we can have greater flexibility in training our model. For example, it is easy to save the parameters at each step of training or use different learning rates for different parameters.
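For instance, here is a small sketch (using the module names from the model above; the specific learning rates are assumptions) of giving the kernel/mean and the likelihood different learning rates via Adam parameter groups:

```python
# Hypothetical sketch: separate learning rates for the kernel/mean and the likelihood noise
optimizer = torch.optim.Adam([
    {'params': model.covar_module.parameters(), 'lr': 0.1},
    {'params': model.mean_module.parameters(), 'lr': 0.1},
    {'params': likelihood.parameters(), 'lr': 0.01},
])
```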
The code for the training loop is given below:
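A sketch of the training loop, following GPyTorch's standard exact-GP training recipe (the learning rate of 0.1 is an assumption; the 50 iterations match the text below):

```python
# Put the model and likelihood into training mode
model.train()
likelihood.train()

# Adam optimizer over all model parameters (this includes the GaussianLikelihood parameters)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# The "loss" for a GP: the negative (exact) marginal log-likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iter = 50
for i in range(training_iter):
    optimizer.zero_grad()          # zero gradients from the previous iteration
    output = model(train_x)        # output from the model (a MultivariateNormal)
    loss = -mll(output, train_y)   # negative marginal log-likelihood
    loss.backward()                # fill in the gradients
    print(f'Iter {i + 1}/{training_iter} - Loss: {loss.item():.3f}')
    optimizer.step()               # take an optimizer step
```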
In the above code, we first put our model into training mode by calling model.train() and likelihood.train(). Then, we define the loss function and optimizer that we want to use in the training process. Here, we use the negative marginal log-likelihood as the loss function and Adam as the optimizer. We also need to set the number of iterations for the training loop, say 50 iterations.
Making predictions with the model
Finally, we can make predictions using the trained model. The basic routine of evaluating the model and making predictions is given in the following code:
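A sketch of the evaluation and plotting code, assuming the variables defined in the previous snippets and matplotlib for the plot:

```python
import matplotlib.pyplot as plt

# Test points: 50 equally-spaced points in [0, 5]
test_x = torch.linspace(0, 5, 50)

# Put the model and likelihood into evaluation (posterior predictive) mode
model.eval()
likelihood.eval()

# LOVE (fast_pred_var) gives faster predictive variances
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(test_x))

# Plot the predictive mean and 2-standard-deviation confidence region
with torch.no_grad():
    f, ax = plt.subplots(1, 1, figsize=(8, 4))
    lower, upper = observed_pred.confidence_region()
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')                           # observed data
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), 'b')                  # predictive mean
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)  # confidence band
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
    plt.show()
```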
There are a few things going on in the above code:
- We first generate the test data using 50 equally-spaced points from [0, 5].
- We put the model into evaluation mode by calling model.eval() and likelihood.eval().
- We use gpytorch.settings.fast_pred_var() to get faster predictive distributions using LOVE.
- When put into eval mode, the trained GP model returns a MultivariateNormal containing the posterior mean and covariance. Thus, we can obtain the predictive mean and covariance matrix from this multivariate normal distribution.
- Finally, we plot the mean and confidence region of the fitted GP model. The confidence_region() method is a helper method that returns 2 standard deviations above and below the mean.
The resulting plot is depicted below:
The black stars in the above plot represent the training (observed) data, while the blue line and the shaded area represent the mean and the confidence bounds respectively. Notice how the uncertainty is reduced close to the observed points. If more data points were added, we would see the mean function adjust itself to pass through these points and the uncertainty would reduce close to the observations.
Takeaways
In this tutorial, we have learned how to build a scalable and flexible GP model using GPyTorch. As with any other library, there are still a lot of cool things you can do with GPyTorch that I didn't cover in this tutorial. For example, you can utilize a GPU to accelerate model training, or implement state-of-the-art algorithms such as Deep GPs or stochastic variational deep kernel learning.
If you would like me to make more tutorials on GPyTorch or if you have any other suggestions, please let me know in the comments!
Thanks for Reading!