User Guide

There are three main steps to train and use the different models: initialization, training and prediction.

Initialization

Possible models

There are currently 8 possible Gaussian Process models:

  • GP corresponds to the original GP regression model; it necessarily uses a Gaussian likelihood.
    GP(X_train, y_train, kernel; kwargs...)
  • VGP is a variational GP model: a multivariate Gaussian approximates the true posterior. No inducing-point augmentation is involved, so it is well suited to small datasets (~10^3 samples).
    VGP(X_train, y_train, kernel, likelihood, inference; kwargs...)
  • SVGP is a variational GP model augmented with inducing points. The optimization is done on those points, allowing for stochastic updates and large scalability. The trade-off can be slightly lower accuracy and the need to select the number and location of the inducing points (a problem currently being worked on).
    SVGP(X_train, y_train, kernel, likelihood, inference, n_inducingpoints; kwargs...)
  • MCGP is a GP model where the posterior is represented via a collection of samples.
    MCGP(X_train, y_train, kernel, likelihood, inference; kwargs...)
  • OnlineSVGP is an online variational GP model. It is based on the streaming method of Bui 17' and supports all likelihoods, even with multiple latent GPs.
    OnlineSVGP(kernel, likelihood, inference, ind_point_algorithm; kwargs...)
  • MOVGP is a multi-output variational GP model based on the principle f_output[i] = sum(A[i, j] * f_latent[j] for j in 1:n_latent). The number of latent GPs is free:
    MOVGP(X_train, ys_train, kernel, likelihood/s, inference, n_latent; kwargs...)
  • MOSVGP is the same as MOVGP but with inducing points: a multi-output sparse variational GP model based on Moreno-Muñoz 18'.
    MOSVGP(X_train, ys_train, kernel, likelihood/s, inference, n_latent, n_inducing_points; kwargs...)
  • VStP is a variational Student-T model where the prior is a multivariate Student-T distribution with scale K, mean μ₀ and degrees of freedom ν. The inference is done automatically by augmenting the prior as a Gaussian scale mixture with an inverse-gamma mixing distribution.
    VStP(X_train, y_train, kernel, likelihood, inference, ν; kwargs...)
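
The MOVGP mixing rule f_output[i] = sum(A[i, j] * f_latent[j] for j in 1:n_latent) can be checked in plain Julia. This is a minimal sketch, assuming random vectors as stand-ins for the latent GP values at a few inputs (in the model they are GP-distributed); the sizes are arbitrary. It shows that the rule amounts to multiplying the mixing matrix A with the stacked latent values.

```julia
using Random

Random.seed!(42)
# Arbitrary illustrative sizes: 3 outputs mixed from 2 latent GPs at 4 inputs
n_output, n_latent, n_points = 3, 2, 4

A = randn(n_output, n_latent)                      # mixing weights A[i, j]
f_latent = [randn(n_points) for _ in 1:n_latent]   # stand-in values of each latent GP

# Direct translation of the mixing rule, one output at a time
f_output = [sum(A[i, j] * f_latent[j] for j in 1:n_latent) for i in 1:n_output]

# The same computation as a single matrix product on the stacked latents
F_latent = permutedims(reduce(hcat, f_latent))     # n_latent × n_points
F_output = A * F_latent                            # n_output × n_points
```

Each row of F_output equals the corresponding f_output[i], which is why the number of latent GPs can be chosen freely: it only sets the inner dimension of the product.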

Likelihood

GP can only have a Gaussian likelihood, while the others have more choices. Here are the ones currently implemented:

Regression

For regression, four likelihoods are available:

  • GaussianLikelihood
  • StudentTLikelihood
  • LaplaceLikelihood
  • HeteroscedasticLikelihood

Classification

For classification, one can select among:

  • LogisticLikelihood
  • BayesianSVM
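
For binary classification with LogisticLikelihood, the standard textbook form is p(y | f) = σ(y f) with labels y ∈ {-1, +1}. The sketch below assumes that label encoding (the package may convert labels differently internally) and uses a hypothetical function name; it shows why the two class probabilities always sum to one, since σ(-z) = 1 - σ(z). BayesianSVM is not sketched here.

```julia
# Logistic (sigmoid) function σ(z) = 1 / (1 + exp(-z))
logistic(z::Real) = 1 / (1 + exp(-z))

# Assumed Bernoulli-logistic likelihood with labels y ∈ {-1, +1}: p(y | f) = σ(y * f)
function logistic_likelihood(y::Integer, f::Real)
    y in (-1, 1) || throw(ArgumentError("labels must be -1 or +1"))
    return logistic(y * f)
end
```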

Event Likelihoods

For likelihoods such as Poisson or Negative Binomial, we approximate a parameter by σ(f). Two likelihoods are implemented:

  • Poisson
  • NegBinomialLikelihood
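
As an illustration of parameterizing a count distribution through σ(f), here is a minimal sketch assuming σ is the logistic function and that the Poisson rate is λσ(f) for a positive scale λ. Both λ and the function names are assumptions for this example, not the package's API; the point is that the rate stays in (0, λ) whatever the value of the latent f.

```julia
logistic(z::Real) = 1 / (1 + exp(-z))

# Assumption: Poisson rate = λ * σ(f), with λ > 0 a hypothetical scale parameter
function poisson_logpmf(y::Integer, f::Real, λ::Real)
    rate = λ * logistic(f)
    # log(y!) computed as a sum of logs to stay within Base Julia
    return y * log(rate) - rate - sum(log, 1:y; init = 0.0)
end
```

For example, poisson_logpmf(0, 0.0, 2.0) gives rate 1 and log-probability -1.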

Multi-class classification

There are two available likelihoods for multi-class classification:

  • LogisticSoftMaxLikelihood
  • SoftMaxLikelihood
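
The two multi-class likelihoods differ in how they turn one latent value per class into class probabilities. Using the standard definitions from the literature (softmax: exp(f_k) / Σ_j exp(f_j); logistic-softmax: σ(f_k) / Σ_j σ(f_j)), a sketch with hypothetical function names:

```julia
logistic(z::Real) = 1 / (1 + exp(-z))

# Softmax: exp(f_k) / Σ_j exp(f_j), shifted by the maximum for numerical stability
function softmax_probs(f::AbstractVector{<:Real})
    e = exp.(f .- maximum(f))
    return e ./ sum(e)
end

# Logistic-softmax: σ(f_k) / Σ_j σ(f_j)
function logistic_softmax_probs(f::AbstractVector{<:Real})
    s = logistic.(f)
    return s ./ sum(s)
end
```

For the latent vector [2.0, 0.0, -1.0], both rank class 1 first, but the logistic-softmax probabilities are noticeably flatter (about 0.53 versus 0.84 for the top class), because σ saturates at 1 instead of growing exponentially.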

More options

There are plans to make distributions from Distributions.jl work directly as likelihoods.

Inference

Inference can be done in various ways.

  • AnalyticVI: Variational Inference with closed-form updates. For non-Gaussian likelihoods, this relies on augmented versions of the likelihoods. To use Stochastic Variational Inference, use AnalyticSVI with the mini-batch size as an argument.
  • GibbsSampling: Gibbs sampling of the true posterior. This also relies on augmented versions of the likelihoods and is currently only valid for the VGP model.

The next two methods rely on a numerical approximation of an integral. I therefore recommend using the classical Descent optimiser, since natural gradient updates are used anyway. ADAM seems to give erratic results.

  • QuadratureVI : Variational Inference with gradients computed by estimating the expected log-likelihood via quadrature.
  • MCIntegrationVI: Variational Inference with gradients computed by estimating the expected log-likelihood via Monte Carlo integration.
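
Both QuadratureVI and MCIntegrationVI revolve around the expected log-likelihood E_q[log p(y | f)] under a Gaussian q(f) = N(μ, σ²). The sketch below shows the two textbook estimators side by side: Gauss-Hermite quadrature (nodes computed with the Golub-Welsch eigenvalue method) and plain Monte Carlo. It is not the package's implementation; the function names, the node count and the logistic log-likelihood used as integrand are assumptions for illustration.

```julia
using LinearAlgebra, Random, Statistics

# Gauss-Hermite nodes and weights (weight function exp(-x^2)) via Golub-Welsch:
# eigen-decomposition of the symmetric tridiagonal Jacobi matrix.
function gauss_hermite(n::Int)
    J = SymTridiagonal(zeros(n), [sqrt(k / 2) for k in 1:n-1])
    nodes, vecs = eigen(J)
    weights = sqrt(pi) .* vecs[1, :] .^ 2
    return nodes, weights
end

# E[g(f)] for f ~ N(μ, σ²), substituting f = μ + √2 σ x:
# E[g(f)] ≈ (1/√π) Σ w_i g(μ + √2 σ x_i)
function expectation_quadrature(g, μ, σ; n = 20)
    x, w = gauss_hermite(n)
    return sum(w .* g.(μ .+ sqrt(2) * σ .* x)) / sqrt(pi)
end

# Same expectation by plain Monte Carlo with S samples
function expectation_montecarlo(g, μ, σ; S = 100_000, rng = Random.default_rng())
    return mean(g.(μ .+ σ .* randn(rng, S)))
end

# Example integrand: logistic log-likelihood for label y = +1, log σ(f) = -log(1 + exp(-f))
loglik(f) = -log1p(exp(-f))
```

Quadrature is exact for polynomials of degree up to 2n - 1 and very accurate for smooth one-dimensional integrands, while the Monte Carlo error shrinks only like 1/√S; this is one reason the resulting gradients are noisy and a plain Descent step can be easier to tune.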

[WIP]: AdvancedHMC.jl will be integrated at some point, although Gibbs sampling is generally preferable when available.

Compatibility table

Not all inference methods are implemented or valid for all likelihoods; here is the compatibility table between them.

Likelihood/Inference | AnalyticVI | GibbsSampling | QuadratureVI | MCIntegrationVI
GaussianLikelihood ✔ (Analytic)
StudentTLikelihood
LaplaceLikelihood
HeteroscedasticLikelihood (dev) (dev)
LogisticLikelihood
BayesianSVM (dev)
LogisticSoftMaxLikelihood (dev)
SoftMaxLikelihood
Poisson
NegBinomialLikelihood

(dev) means that the feature is possible and may be developed and tested, but is not available yet. All contributions or requests are very welcome!

Model/Inference | AnalyticVI | GibbsSampling | QuadratureVI | MCIntegrationVI
VGP
SVGP
MCGP
OnlineSVGP
MO(S)VGP
VStP

Note that for MO(S)VGP you can use a mix of different likelihoods.

Additional Parameters

Hyperparameter optimization

One can optimize the kernel hyperparameters as well as the inducing point locations by maximizing the ELBO. All derivations are already hand-coded (no AD needed). One can select the optimization scheme via:

  • The optimiser keyword: it can be nothing or false for no optimization, or an optimiser from the Flux.jl library (see the list of Optimisers in the Flux.jl documentation).
  • The Zoptimiser keyword: similar to optimiser, it is used to optimize the inducing point locations; it is set to nothing (no optimization) by default.

PriorMean

The mean keyword allows you to add different types of prior means:

Training

Training is straightforward once the model is initialized; simply run:

train!(model; iterations=100, callback=callbackfunction)

The callback option runs a function at every iteration. callbackfunction should be defined as:

function callbackfunction(model, iter)
    # do things here...
end
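
Since the callback receives only (model, iter), values that should persist across iterations are conveniently kept in a closure. The helper below is hypothetical (not part of the package) and only records iteration numbers; what to extract from model depends on the package API and is deliberately left out.

```julia
# Hypothetical helper: returns a callback with the (model, iter) signature
# described above, together with the vector it fills during training.
function make_logging_callback(every::Integer = 10)
    logged_iters = Int[]
    function callback(model, iter)
        if iter % every == 0
            push!(logged_iters, iter)   # remember which iterations were logged
            println("Iteration $iter")
        end
        return nothing
    end
    return callback, logged_iters
end
```

Usage would then look like cb, logged = make_logging_callback(25) followed by train!(model; iterations=100, callback=cb), after which logged holds the iterations at which the callback fired.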

Prediction

Once the model has been trained, it is finally possible to compute predictions. There are always three possibilities:

  • predict_f(model, X_test; covf=true, fullcov=false): Compute the parameters (mean and covariance) of the latent normal distribution for each test point. If covf=false, only the mean is returned; if fullcov=true, a full covariance matrix is returned instead of only the diagonal.
  • predict_y(model, X_test): Compute the point estimate of the predictive likelihood for regression, or the label of the most likely class for classification.
  • proba_y(model, X_test): Return the mean and the variance of each point for regression, or the predictive likelihood of the class y=1 for classification.
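
To see how a latent mean μ and variance σ² (the kind of output predict_f gives) turn into a class-1 probability under a logistic link, one needs ∫ σ(f) N(f; μ, σ²) df. The sketch below uses MacKay's classic probit approximation σ(μ / √(1 + πσ²/8)); it is an illustration only and not necessarily how proba_y computes its output. The function name is hypothetical.

```julia
logistic(z::Real) = 1 / (1 + exp(-z))

# MacKay's probit approximation of ∫ σ(f) N(f; μ, σ²) df
approx_class1_probability(μ::Real, σ²::Real) = logistic(μ / sqrt(1 + π * σ² / 8))
```

With zero variance this reduces to σ(μ); a larger latent variance pulls the probability toward 0.5, so uncertain predictions are reported as less confident. It broadcasts over vectors of means and variances, e.g. approx_class1_probability.(μs, σ²s).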

Miscellaneous

🚧 In construction – Should be developed in the near future 🚧

Saving/Loading models

Once a model has been trained, it is possible to save its state to a file using save_trained_model(filename, model); a partial version of the model is saved in filename.

It is then possible to reload this file using load_trained_model(filename). However, note that the reloaded model cannot be trained further! This function is only meant for making further predictions.

🚧 Pre-made callback functions 🚧

There is (for now) one premade function that returns an MVHistory object and a callback function for training binary classification problems. The callback stores the ELBO and the variational parameters at every iteration included in iterpoints. If X_test and y_test are provided, it also stores the test accuracy and the mean and median test log-likelihood.