# User Guide

There are three main steps needed to train and use the different models: initialization, training, and prediction.

## Initialization

### Possible models

There are currently 8 possible Gaussian Process models:

`GP`

corresponds to the original GP regression model; it necessarily uses a Gaussian likelihood.

`GP(X_train, y_train, kernel; kwargs...)`

`VGP`

is a variational GP model: a multivariate Gaussian approximates the true posterior. No inducing-point augmentation is involved, which makes it well suited for small datasets (~10^3 samples).

`VGP(X_train, y_train, kernel, likelihood, inference; kwargs...)`

`SVGP`

is a variational GP model augmented with inducing points. The optimization is done on those points, allowing for stochastic updates and large scalability. The trade-off can be a slightly lower accuracy and the need to select the number and location of the inducing points (however, this problem is actively being worked on).

`SVGP(X_train, y_train, kernel, likelihood, inference, n_inducingpoints; kwargs...)`

`MCGP`

is a GP model where the posterior is represented via a collection of samples.

`MCGP(X_train, y_train, kernel, likelihood, inference; kwargs...)`

`OnlineSVGP`

is an online variational GP model. It is based on the streaming method of Bui et al. (2017) and supports all likelihoods, even with multiple latents.

`OnlineSVGP(kernel, likelihood, inference, ind_point_algorithm; kwargs...)`

`MOVGP`

is a multi-output variational GP model based on the principle `f_output[i] = sum(A[i, j] * f_latent[j] for j in 1:n_latent)`. The number of latent GPs is free:

`MOVGP(X_train, ys_train, kernel, likelihood/s, inference, n_latent; kwargs...)`

`MOSVGP`

is the same as `MOVGP` but with inducing points: a multi-output sparse variational GP model, based on Moreno-Muñoz et al. (2018).

`MOSVGP(X_train, ys_train, kernel, likelihood/s, inference, n_latent, n_inducing_points; kwargs...)`

`VStP`

is a variational Student-T model where the prior is a multivariate Student-T distribution with scale `K`, mean `μ₀` and degrees of freedom `ν`. The inference is done automatically by augmenting the prior as a scale mixture of inverse Gamma distributions.

`VStP(X_train, y_train, kernel, likelihood, inference, ν; kwargs...)`
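As a quick illustration, here is a minimal sketch of constructing one of these models following the signatures above. The toy data and the `SqExponentialKernel` (from KernelFunctions.jl, which may need to be loaded separately) are assumptions for the example, not part of this guide:

```julia
using AugmentedGaussianProcesses

# Toy data: 200 points with 2 features and binary labels (assumptions)
X_train = rand(200, 2)
y_train = rand([-1, 1], 200)

# A sparse variational GP for classification with 20 inducing points
model = SVGP(X_train, y_train, SqExponentialKernel(),
             LogisticLikelihood(), AnalyticVI(), 20)
```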

### Likelihood

`GP` can only have a Gaussian likelihood, while the others have more choices. Here are the ones currently implemented:

#### Regression

For **regression**, four likelihoods are available:

- The classical `GaussianLikelihood`, for **Gaussian noise**.
- The `StudentTLikelihood`, assuming noise from a **Student-T** distribution (more robust to outliers).
- The `LaplaceLikelihood`, with noise from a **Laplace** distribution.
- The `HeteroscedasticLikelihood` (in development), where the noise is a function of the input: $\mathrm{Var}(X) = \lambda\sigma^{-1}(g(X))$, where `g(X)` is an additional Gaussian process and `σ` is the logistic function.

#### Classification

For **classification** one can select among:

- The `LogisticLikelihood`: a Bernoulli likelihood with a **logistic link**.
- The `BayesianSVM` likelihood, based on the **frequentist SVM** and equivalent to using a hinge loss.

#### Event Likelihoods

For likelihoods such as Poisson or Negative Binomial, a parameter is approximated by `σ(f)`. Two likelihoods are implemented:

- The `PoissonLikelihood`: a discrete Poisson process (one parameter per point) with the rate parameter defined as `λσ(f)`.
- The `NegBinomialLikelihood`: the Negative Binomial likelihood where `r` is fixed and the success probability `p` is defined as `σ(f)`.

#### Multi-class classification

There are two available likelihoods for multi-class classification:

- The `SoftMaxLikelihood`, the most common approach. However, no analytical solving is possible.
- The `LogisticSoftMaxLikelihood`, a modified softmax where the exponential function is replaced by the logistic function. It allows one to get a fully conjugate model, see the **corresponding paper**. A short instantiation sketch follows below.
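For illustration, here is a sketch of how some of these likelihoods might be instantiated. The constructor arguments shown (`ν = 3.0`, `r = 10`) are assumptions, not taken from this guide; check the API reference for the exact signatures:

```julia
# Illustrative constructor calls (argument values are assumptions)
reg_lik   = StudentTLikelihood(3.0)     # assumed: ν = 3 degrees of freedom
class_lik = LogisticLikelihood()        # Bernoulli with a logistic link
count_lik = NegBinomialLikelihood(10)   # assumed: fixed parameter r = 10
```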

### More options

There is an ongoing project to make distributions from `Distributions.jl` work directly as likelihoods.

### Inference

Inference can be done in various ways.

`AnalyticVI`: variational inference with closed-form updates. For non-Gaussian likelihoods, this relies on augmented versions of the likelihoods. For stochastic variational inference, one can use `AnalyticSVI` with the size of the mini-batch as an argument.

`GibbsSampling`: Gibbs sampling of the true posterior. This also relies on an augmented version of the likelihoods and is only valid for the `MCGP` model at the moment.

The next two methods rely on a numerical approximation of an integral. I therefore recommend using the classical `Descent` approach, as it will use the natural gradient updates anyway; `ADAM` seems to give erratic results.

`QuadratureVI`: variational inference with gradients computed by estimating the expected log-likelihood via quadrature.

`MCIntegrationVI`: variational inference with gradients computed by estimating the expected log-likelihood via Monte Carlo integration.

[WIP]: AdvancedHMC.jl will be integrated at some point, although Gibbs sampling is generally preferable when available.
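For concreteness, here is a short sketch instantiating the inference objects named above. The zero-argument constructors and the mini-batch size of 64 are assumptions about the defaults:

```julia
inf_avi   = AnalyticVI()        # closed-form variational updates
inf_svi   = AnalyticSVI(64)     # stochastic version, mini-batches of 64 (assumed size)
inf_quad  = QuadratureVI()      # gradients via quadrature
inf_mc    = MCIntegrationVI()   # gradients via Monte Carlo integration
inf_gibbs = GibbsSampling()     # sampling of the (augmented) true posterior
```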

### Compatibility table

Not all inference methods are implemented/valid for all likelihoods; here is the compatibility table between them.

Likelihood/Inference | AnalyticVI | GibbsSampling | QuadratureVI | MCIntegrationVI |
---|---|---|---|---|
GaussianLikelihood | ✔ (Analytic) | ✖ | ✖ | ✖ |
StudentTLikelihood | ✔ | ✔ | ✔ | ✖ |
LaplaceLikelihood | ✔ | ✔ | ✔ | ✖ |
HeteroscedasticLikelihood | ✔ | (dev) | (dev) | ✖ |
LogisticLikelihood | ✔ | ✔ | ✔ | ✖ |
BayesianSVM | ✔ | (dev) | ✖ | ✖ |
LogisticSoftMaxLikelihood | ✔ | ✔ | ✖ | (dev) |
SoftMaxLikelihood | ✖ | ✖ | ✖ | ✔ |
PoissonLikelihood | ✔ | ✔ | ✖ | ✖ |
NegBinomialLikelihood | ✔ | ✔ | ✖ | ✖ |

(dev) means that the feature is possible and may be developed and tested, but is not available yet. All contributions or requests are very welcome!

Model/Inference | AnalyticVI | GibbsSampling | QuadratureVI | MCIntegrationVI |
---|---|---|---|---|
VGP | ✔ | ✖ | ✔ | ✔ |
SVGP | ✔ | ✖ | ✔ | ✔ |
MCGP | ✖ | ✔ | ✖ | ✖ |
OnlineSVGP | ✔ | ✖ | ✖ | ✖ |
MO(S)VGP | ✔ | ✖ | ✔ | ✔ |
VStP | ✔ | ✖ | ✔ | ✔ |

Note that for MO(S)VGP you can use a mix of different likelihoods.

### Additional Parameters

#### Hyperparameter optimization

One can optimize the kernel hyperparameters as well as the inducing point locations by maximizing the ELBO. All derivations are already hand-coded (no AD needed). One can select the optimization scheme via (see the sketch after this list):

- The `optimiser` keyword: it can be `nothing` or `false` for no optimization, or an optimiser from the Flux.jl library (see the list of Optimisers).
- The `Zoptimiser` keyword: similar to `optimiser`, it is used for optimizing the inducing point locations. It is set to `nothing` by default (no optimization).
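A hedged sketch of these keywords, reusing the hypothetical data and kernel from the construction example above; the specific Flux optimisers and step sizes are assumptions (optimiser names follow older Flux versions such as `ADAM`):

```julia
using AugmentedGaussianProcesses, Flux

# Assumed: X_train, y_train and SqExponentialKernel as in the earlier sketch
model = SVGP(X_train, y_train, SqExponentialKernel(),
             LogisticLikelihood(), AnalyticVI(), 20;
             optimiser=Flux.ADAM(0.01),      # optimize kernel hyperparameters
             Zoptimiser=Flux.Descent(1e-3))  # optimize inducing point locations
```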

#### PriorMean

The `mean` keyword allows you to add different types of prior means:

- `ZeroMean`, a constant zero mean that cannot be optimized
- `ConstantMean`, a constant mean that can be optimized
- `EmpiricalMean`, a vector mean with a different value for each point
- `AffineMean`, where `μ₀` is given by `X * w + b`
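For instance, a sketch of passing a prior mean; the `ConstantMean(1.0)` constructor argument is an illustrative assumption:

```julia
# Assumed: a constant, optimizable prior mean initialized at 1.0
model = VGP(X_train, y_train, SqExponentialKernel(),
            LogisticLikelihood(), AnalyticVI();
            mean=ConstantMean(1.0))
```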

## Training

Training is straightforward after initializing the `model`, by running:

`train!(model; iterations=100, callback=callbackfunction)`

where the `callback` option is for running a function at every iteration. `callbackfunction` should be defined as:

```julia
function callbackfunction(model, iter)
    # do things here...
end
```
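For example, here is a hypothetical callback that logs progress every 10 iterations. The `ELBO(model)` call is an assumption about the exported API, not taken from this guide:

```julia
function callbackfunction(model, iter)
    if iter % 10 == 0
        # `ELBO` is assumed to be exported by the package (check the API reference)
        @info "Training progress" iteration = iter elbo = ELBO(model)
    end
end

train!(model; iterations=100, callback=callbackfunction)
```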

## Prediction

Once the model has been trained, it is possible to compute predictions. There are always three possibilities:

- `predict_f(model, X_test; covf=true, fullcov=false)`: computes the parameters (mean and covariance) of the latent normal distribution at each test point. If `covf=false`, only the mean is returned; if `fullcov=true`, a full covariance matrix is returned instead of only the diagonal.
- `predict_y(model, X_test)`: computes the point estimate of the predictive likelihood for regression, or the label of the most likely class for classification.
- `proba_y(model, X_test)`: returns the mean and the variance of each point for regression, or the predictive likelihood of obtaining the class `y=1` for classification.
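The three calls in action, on hypothetical test inputs matching the toy data from the earlier sketch:

```julia
X_test = rand(50, 2)   # hypothetical test inputs (2 features, as above)

μ, σ² = predict_f(model, X_test)   # latent means and (diagonal) variances
ŷ = predict_y(model, X_test)       # point estimates or most likely labels
p = proba_y(model, X_test)         # predictive probabilities / moments
```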

## Miscellaneous

🚧 **In construction – Should be developed in the near future** 🚧

### Saving/Loading models

Once a model has been trained, it is possible to save its state in a file by using `save_trained_model(filename, model)`; a partial version of the model will be saved in `filename`.

It is then possible to reload this file by using `load_trained_model(filename)`. **Note, however, that it will not be possible to train the model further!** This function is only meant for making further predictions.
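A sketch of the round trip described above; the filename is a placeholder:

```julia
save_trained_model("my_model.jld", model)    # placeholder filename
loaded = load_trained_model("my_model.jld")
ŷ = predict_y(loaded, X_test)                # predictions work; further training does not
```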

### 🚧 Pre-made callback functions 🚧

There is one premade function (for now) that returns an MVHistory object and a callback function for the training of binary classification problems. The callback will store the ELBO and the variational parameters at every iteration included in `iter_points`. If `X_test` and `y_test` are provided, it will also store the test accuracy and the mean and median test log-likelihood.