{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "view-in-github" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "id": "RDiNwEKvJHkG" }, "source": [ "# Gaussian Process Regression Models" ] }, { "cell_type": "markdown", "metadata": { "id": "2SnOux9yI52L" }, "source": [ "## Defining the Mueller-Brown Potential Energy Surface\n", "\n", "For the definition of Mueller-Brown potential, see [here](https://www.wolframcloud.com/objects/demonstrations/TrajectoriesOnTheMullerBrownPotentialEnergySurface-source.nb).\n", "\n", "In this tutorial, we will learn how to use a Gaussian Process Regression model to predict the energy and gradient of points on the Mueller-Brown potential energy surface.\n", "\n", "We will start by defining the function for the Mueller-Brown potential energy surface:\n", "\n", "$v(x,y) = \\sum_{k=0}^3 A_k \\mathrm{exp}\\left[ a_k (x - x_k^0)^2 + b_k (x - x_k^0) (y - y_k^0) + c_k (y - y_k^0)^2 \\right]$\n", "\n", "We will also define the derivatives with respect to $x$ and $y$ for the Muller-Brown potential energy surface:\n", "\n", "$\\frac{dv(x,y)}{dx} = \\sum_{k=0}^3 A_k \\mathrm{exp}\\left[ a_k (x - x_k^0)^2 + b_k (x - x_k^0) (y - y_k^0) + c_k (y - y_k^0)^2 \\right]\\left [2a_k (x - x_k^0)+b_k(y - y_k^0) \\right]$\n", "\n", "$\\frac{dv(x,y)}{dy} = \\sum_{k=0}^3 A_k \\mathrm{exp}\\left[ a_k (x - x_k^0)^2 + b_k (x - x_k^0) (y - y_k^0) + c_k (y - y_k^0)^2 \\right]\\left [b_k(x - x_k^0)+ 2 c_k(y - y_k^0) \\right]$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gV1hsd1YJkZw" }, "outputs": [], "source": [ "from math import exp, pow\n", "\n", "def mueller_brown_potential_with_gradient(x, y):\n", " A = [-200, -100, -170, 15]\n", " a = [-1, -1, -6.5, 0.7]\n", " b = [0, 0, 11, 0.6]\n", " c = [-10, -10, -6.5, 0.7]\n", " x0 = [1, 0, -0.5, -1.0]\n", " y0 = [0, 0.5, 1.5, 1]\n", " z = 0\n", " dx = 0\n", " dy = 0\n", " for k in range(4):\n", " # Scale the function by 0.1 to make plotting easier\n", " z += 0.1 * A[k] * exp(a[k] * pow(x-x0[k], 2) + b[k] * (x-x0[k]) * (y-y0[k]) + c[k] * pow(y-y0[k], 2))\n", " dx += 0.1 * A[k] * exp(a[k] * pow(x-x0[k], 2) + b[k] * (x-x0[k]) * (y-y0[k]) + c[k] * pow(y-y0[k], 2)) * (a[k] * 2 *(x-x0[k]) + b[k] * (y-y0[k]))\n", " dy += 0.1 * A[k] * exp(a[k] * pow(x-x0[k], 2) + b[k] * (x-x0[k]) * (y-y0[k]) + c[k] * pow(y-y0[k], 2)) * (b[k] * (x-x0[k])+ c[k] * 2 * (y-y0[k]))\n", " return z, dx, dy" ] }, { "cell_type": "markdown", "metadata": { "id": "6JvQVSM_J8Rp" }, "source": [ "### Generating Training Data\n", "\n", "First, we need to generate data to train the neural network, as done previously in Lesson 1. Displayed below are the max/min values of the surface and the size of the training and testing sets." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# generate x and y on a grid\n", "x_range = np.arange(-1.8, 1.4, 0.1, dtype=np.float32)\n", "y_range = np.arange(-0.4, 2.4, 0.1, dtype=np.float32)\n", "X, Y = np.meshgrid(x_range, y_range)\n", "\n", "# compute the potential energy at each point on the grid\n", "mueller_brown_potential_with_vectorized = np.vectorize(mueller_brown_potential_with_gradient, otypes=[np.float32, np.float32, np.float32])\n", "Z, dX, dY = mueller_brown_potential_with_vectorized(X, Y)\n", "\n", "# keep only low-energy points for training\n", "train_mask = Z < 10\n", "X_train, Y_train, Z_train, dX_train, dY_train = X[train_mask], Y[train_mask], Z[train_mask], dX[train_mask], dY[train_mask]\n", "\n", "print(f\"Z_min: {np.min(Z)}, Z_max: {np.max(Z)}\")\n", "print(f\"Size of (future) training set: {len(Z_train)}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "lGNPoXwCKm61" }, "source": [ "### Visualizing Training Data: 3-D Projection Surface\n", "\n", "We will now create a 3-D plot of our training data. To make the plot more readable, we will replace the points that have extremely high energy with nan (not a number). This will keep our $Z$ array the same shape and help us ignore the high energy region that we are not interested in." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 517 }, "id": "4bERnPqOJty7", "outputId": "7fc04eac-8f34-4c0b-a9ce-656ef772801d" }, "outputs": [], "source": [ "import plotly.graph_objects as go\n", "\n", "fig = go.Figure(data=[go.Surface(z=Z, x=X, y=Y, colorscale='rainbow', cmin=-15, cmax=9)])\n", "fig.update_traces(contours_z=dict(show=True, project_z=True))\n", "fig.update_layout(title='Mueller-Brown Potential', width=500, height=500,\n", " scene = dict(\n", " zaxis = dict(dtick=3, range=[-15, 15]),\n", " camera_eye = dict(x=-1.2, y=-1.2, z=1.2)))\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "F4cPQP6kMU4k" }, "source": [ "### Visualizing Training Data: Contour Surface\n", "\n", "To allow for an easier visualization of the potential energy surface, we can generate a 2-D contour surface." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 377 }, "id": "RF0uRqMQKz8g", "outputId": "b388be76-f305-4bfe-8116-d78520efd7f1" }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "def plot_contour_map(X, Y, Z, ax, title, colorscale='rainbow', levels=None):\n", " if levels is None:\n", " levels = [-12, -8, -4, 0, 4, 8, 10]\n", " ct = ax.contour(X, Y, Z, levels, colors='k')\n", " ax.clabel(ct, inline=True, fmt='%3.0f', fontsize=8)\n", " ct = ax.contourf(X, Y, Z, levels, cmap=colorscale, extend='both', vmin=levels[0], vmax=levels[-1])\n", " ax.set_xlabel(\"x\", labelpad=-0.75)\n", " ax.set_ylabel(\"y\", labelpad=2.5)\n", " ax.tick_params(axis='both', pad=2, labelsize=8)\n", " cbar = plt.colorbar(ct)\n", " cbar.ax.tick_params(labelsize=8)\n", " ax.set_title(title, fontsize=8)\n", "\n", "\n", "fig, ax = plt.subplots(figsize=(3, 2.5), dpi=150)\n", "plot_contour_map(X, Y, Z, ax=ax, title='Mueller-Brown Potential')\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": { "id": "KqK8LofANoFa" }, "source": [ "## Loading PyTorch and GPyTorch for GPR Learning\n", "\n", "Now we will install GPyTorch." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yp-61GIZKSBX" }, "outputs": [], "source": [ "%pip install gpytorch --quiet\n", "\n", "import torch\n", "import gpytorch" ] }, { "cell_type": "markdown", "metadata": { "id": "ADzZywraN8uk" }, "source": [ "## Defining a GPR Model for Energy Prediction\n", "\n", "First, we will learn how to use a GPR model to predict energies only.\n", "\n", "Let us define the variables in our function using a vector of input features with $D$ observables as $\\textbf{x}=[x_1,...,x_D]$. A set of $n$ configurations can be assembled into a training set $\\textbf{X}=[\\textbf{x}_1, ...,\\textbf{x}_n]$ with a corresponding set of observations $\\textbf{y}=[y_1,...,y_n]^T$.\n", "\n", "For noisy samples, we can assume that an observation $y$ is seperate from the underlying function $f(\\textbf{x})$ according to $y(\\textbf{x})=f(\\textbf{x})+\\mathit{ε}$, where the noise, $\\mathit{ε}$, follows a Gaussian distribution $\\mathit{ε}\\sim\\mathcal{N}(0,σ^2_n)$, where $\\sigma^2_n$ is the noise parameter. The prior distribution of underlying functions follows a Gaussian distribution $\\textbf{f}(\\textbf{X})\\sim\\mathcal{N}(\\textbf{0}, \\textbf{K}(\\textbf{X},\\textbf{X}))$, where $\\textbf{0}$ is the mean function and $\\textbf{K}$ is the covariance kernel matrix. The covariance kernel matrix is assembled based on a kernel function, $k$, that measures the simularities between input vectors:\n", "\n", "$\\textbf{K(X,X)}=\n", "\\begin{bmatrix}\n", "k(\\textbf{x}_1,\\textbf{x}_1) & \\ldots & k(\\textbf{x}_1,\\textbf{x}_n)\\\\\n", "\\vdots & \\ddots & \\vdots\\\\\n", "k(\\textbf{x}_n,\\textbf{x}_1) & \\ldots & k(\\textbf{x}_n,\\textbf{x}_n)\\\\\n", "\\end{bmatrix}$\n", "\n", "Here we used the radial basis function:\n", "\n", "${k}(\\textbf{x}_a,\\textbf{x}_b)=\\sigma^2_f\\mathrm{exp}(-\\frac{||\\textbf{x}_a-\\textbf{x}_b||^2}{2l^2})$\n", "\n", "where $\\sigma^2_f$ is the vertical variation parameter, and $l$ is the length parameter." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ijW4xiTAKm4T" }, "outputs": [], "source": [ "# Setup GPR Model: Taken directly From gpytorch tutorial with minor changes\n", "class ExactGPModel(gpytorch.models.ExactGP):\n", " def __init__(self, train_x, train_y, likelihood):\n", " super().__init__(train_x, train_y, likelihood)\n", " self.mean_module = gpytorch.means.ZeroMean()\n", " self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())\n", "\n", " def forward(self, x):\n", " mean_x = self.mean_module(x)\n", " covar_x = self.covar_module(x)\n", " return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)\n", "\n", "\n", "# turn NumPy arrays into PyTorch tensors\n", "X_gpr = torch.from_numpy(np.column_stack((X_train, Y_train)))\n", "Z_gpr = torch.from_numpy(Z_train)\n", "#Z_gpr is an array of output values (observations) from the Mueller-Brown potential energy function\n", "\n", "# Initialize Likelihood and Model\n", "likelihood = gpytorch.likelihoods.GaussianLikelihood()\n", "model = ExactGPModel(X_gpr, Z_gpr, likelihood)" ] }, { "cell_type": "markdown", "metadata": { "id": "tfVlsUp2PNpx" }, "source": [ "### GPR Hyperparameters\n", "\n", "With the noise hyperparameter $\\sigma^2_n$, vertical variation parameter $\\sigma^2_f$, and the length scale parameter $l$, we can define\n", "\n", "$\\tilde{\\textbf{K}}=\\textbf{K}(\\textbf{X},\\textbf{X})+σ^2_n\\textbf{I}$\n", "\n", "with $\\textbf{I}$ being the identity matrix. 
"The set of hyperparameters $\\Theta = \\{σ^2_f, l, \\sigma^2_n\\}$ is optimized by maximizing the log marginal likelihood function:\n",
"\n",
"$\\mathrm{log}\\:p(\\textbf{y}|\\textbf{X},\\textbf{Θ})=-\\frac{1}{2}\\textbf{y}^\\mathrm{T}\\tilde{\\textbf{K}}^{-1}\\textbf{y}-\\frac{1}{2}\\mathrm{log}\\:|\\tilde{\\textbf{K}}|-\\frac{n}{2}\\mathrm{log}\\:2\\pi$\n",
"\n",
"We will create a plot to demonstrate that the negative log marginal likelihood ($-\\mathrm{log}\\:p$) is a smooth function of the hyperparameters by holding the noise hyperparameter, $\\sigma^2_n$, constant and varying the length scale, $l$, and the vertical variation parameter, $σ^2_f$.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 452 }, "id": "jZJuALPTiS48", "outputId": "47e1a1f0-6201-4440-8513-48014ef1bd1e" }, "outputs": [], "source": [ "noise_value = 1.0\n",
"# Create a list of (length scale, output scale) pairs ranging from (0.1, 0.1) to (4.9, 4.9)\n",
"scale_and_length = [[i*.1, j*.1] for i in range(1, 50) for j in range(1, 50)]\n",
"\n",
"x_plt = []\n",
"y_plt = []\n",
"z_plt = []\n",
"\n",
"for pair in scale_and_length:\n",
"    # Initialize the 3 hyperparameters\n",
"    hypers = {\n",
"        'likelihood.noise_covar.noise': torch.tensor(noise_value),\n",
"        'covar_module.base_kernel.lengthscale': torch.tensor(pair[0]),\n",
"        'covar_module.outputscale': torch.tensor(pair[1]),\n",
"    }\n",
"    model.initialize(**hypers)\n",
"    # Initialize the exact marginal log likelihood objective\n",
"    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)\n",
"    output = model(X_gpr)\n",
"    # Evaluate the negative log marginal likelihood\n",
"    loss = -mll(output, Z_gpr)\n",
"    x_plt.append(pair[0])  # length scale\n",
"    y_plt.append(pair[1])  # vertical variance (output scale)\n",
"    z_plt.append(loss.item())  # -log p\n",
"\n",
"fig = plt.figure(figsize=(3,3), dpi=150)\n",
"\n",
"plt.subplot(1, 1, 1)\n",
"ct = plt.tricontour(x_plt, y_plt, z_plt, colors='k')\n",
"plt.clabel(ct, inline=True, fmt='%3.0f', fontsize=8)\n",
"ct = plt.tricontourf(x_plt, y_plt, z_plt, cmap=plt.cm.rainbow, extend='both')\n",
"plt.xlabel(\"length scale ($l$)\")\n",
"plt.ylabel(\"output scale ($σ^2_f$)\")\n",
"plt.colorbar().set_label(\"-Log Marginal Likelihood\", rotation=270, labelpad=12)\n",
"plt.tight_layout()\n",
"plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "JKs4ZTRjQKqt" }, "source": [ "## Training the Model\n",
"\n",
"Using the previously built class, we can now train the model. We will start from initial values for the hyperparameters, optimize them for a fixed number of iterations, and then print the optimized hyperparameters."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QJQaCzzrK0P0", "outputId": "7346a651-d6ef-43ef-b536-839aefb7d5cd" }, "outputs": [], "source": [ "model.train()\n", "likelihood.train()\n", "\n", "def train_model(model, likelihood, x_train, y_train, print_hp=False):\n", " hypers = {\n", " 'likelihood.noise_covar.noise': torch.tensor(1.0),\n", " 'covar_module.base_kernel.lengthscale': torch.tensor(1.0),\n", " 'covar_module.outputscale': torch.tensor(1.0),\n", " }\n", " model.initialize(**hypers)\n", " if print_hp:\n", " # Print untrained hyperparameters\n", " for param_name, param in model.named_parameters():\n", " print(f'Parameter name: {param_name:42} value = {param.item():9.5f}')\n", "\n", " training_iter = 100 # using 100 iterations\n", " # Find optimal model hyperparameters using the SGD optimizer with a learning rate of 0.1\n", " optimizer = torch.optim.SGD(model.parameters(), lr=0.1)\n", "\n", " # \"Loss\" for GPs = -(marginal log likelihood)\n", " mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)\n", "\n", " for i in range(training_iter):\n", " # Zero out the gradients from previous iteration\n", " optimizer.zero_grad()\n", " # Output from model\n", " output = model(x_train)\n", " # Calculate loss and backpropagation gradients\n", " loss = -mll(output, y_train)\n", " loss.backward() # updating the gradients\n", " if (i+1) % 10 == 0:\n", " if print_hp:\n", " print('Iter %3d/%d - Loss: %.3f output scale: %.3f length scale: %.3f noise: %.3f' % (\n", " i + 1, training_iter, loss.item(),\n", " model.covar_module.outputscale.item(),\n", " model.covar_module.base_kernel.lengthscale.item(),\n", " model.likelihood.noise.item()\n", " ))\n", " optimizer.step()\n", " if print_hp:\n", " # Print Trained hyperparameters\n", " for param_name, param in model.named_parameters():\n", " print(f'Parameter name: {param_name:42} value = {param.item():9.5f}')\n", "\n", "train_model(model, likelihood, X_gpr, Z_gpr, print_hp=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "rXx1tlE_RE_S" }, "source": [ "### Plotting Predicted, Reference, Difference, and Variance Surfaces\n", "\n", "The optimized model can be used to make predictions of the function (energy) at a new input space (configuration), $\\,\\textbf{x}^*$. Predictions will follow a Gaussian distribution:\n", "\n", "$\\mathcal{N}\\sim(μ^*,\\textbf{Σ}^*)$,\n", "\n", "where $μ^*=\\textbf{K}^{*\\mathrm{T}}\\tilde{\\textbf{K}}^{-1}\\textbf{y}$ and $\\textbf{Σ}^*=\\textbf{K}^{**}-\\textbf{K}^\\mathrm{*T}\\tilde{\\textbf{K}}^{-1}\\textbf{K}^*$.\n", "\n", "In the above equaiton,\n", "\n", "$\\textbf{K}^*=\\textbf{K}(\\textbf{X},\\textbf{x}^*)$ and $\\textbf{K}^{**}=\\textbf{K}(\\textbf{x}^*,\\textbf{x}^*)$\n", "\n", "We will use this to plot the GPR predicted surface (Predicted), the analytical surface (Reference), the difference between the predicted and analytical surfaces (Difference), and the variance of the predicted points (Variance)." 
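,
"\n",
"\n",
"To connect these formulas to the GPyTorch call in the next cell, the following cell is an illustrative sketch (not part of the original lesson) that evaluates $μ^*$ and $\\textbf{Σ}^*$ directly with NumPy on a tiny made-up data set; the helper rbf, the points, and the hyperparameter values are assumptions chosen only for demonstration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimal NumPy sketch of the prediction formulas above (illustrative only).\n",
"# X_small, y_small, x_star and the hyperparameter values below are made up;\n",
"# the trained GPyTorch model in the next cell does the equivalent computation.\n",
"def rbf(a, b, outputscale=1.0, lengthscale=1.0):\n",
"    # RBF kernel matrix between two sets of points\n",
"    d2 = np.sum((a[:, None, :] - b[None, :, :])**2, axis=-1)\n",
"    return outputscale * np.exp(-0.5 * d2 / lengthscale**2)\n",
"\n",
"X_small = np.array([[0.0, 0.5], [-0.5, 1.5], [0.6, 0.0]])\n",
"y_small = np.array([-12.0, -13.0, -10.0])\n",
"x_star = np.array([[0.0, 1.0]])\n",
"\n",
"noise = 1e-2\n",
"K_tilde = rbf(X_small, X_small) + noise * np.eye(len(X_small))  # K(X,X) + noise * I\n",
"K_star = rbf(X_small, x_star)                                   # K(X, x*)\n",
"K_ss = rbf(x_star, x_star)                                      # K(x*, x*)\n",
"\n",
"mu_star = K_star.T @ np.linalg.solve(K_tilde, y_small)           # predicted mean\n",
"Sigma_star = K_ss - K_star.T @ np.linalg.solve(K_tilde, K_star)  # predictive covariance\n",
"print(mu_star, Sigma_star)"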
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.eval()\n", "pred = model(torch.from_numpy(np.column_stack((X.flatten(), Y.flatten()))))\n", "Z_pred = pred.mean.detach().numpy().reshape(Z.shape)\n", "Z_var = pred.variance.detach().numpy().reshape(Z.shape)\n", "Z_diff = Z_pred - Z\n", "\n", "fig, ax = plt.subplots(2, 2, figsize=(6, 5.2), dpi=150)\n", "plot_contour_map(X, Y, Z_pred, ax=ax[0, 0], title='Z Predicted')\n", "plot_contour_map(X, Y, Z, ax=ax[0, 1], title='Z Reference')\n", "plot_contour_map(X, Y, Z_diff, ax=ax[1, 0], title='Difference', levels=[-4, -2, 0, 2, 4])\n", "plot_contour_map(X, Y, Z_var, ax=ax[1, 1], title='Difference', levels=[0, 1, 2, 3, 4])\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": { "id": "gxjmxmbcRhfC" }, "source": [ "## Using a GPR model to Predict Energies and Gradients: Defining the Model\n", "\n", "Recall the covariance kernel matrix defined in Lesson 2.3:\n", "\n", "$\\textbf{K(X,X)}=\n", "\\begin{bmatrix}\n", "k(\\textbf{x}_1,\\textbf{x}_1) & \\ldots & k(\\textbf{x}_1,\\textbf{x}_n)\\\\\n", "\\vdots & \\ddots & \\vdots\\\\\n", "k(\\textbf{x}_n,\\textbf{x}_1) & \\ldots & k(\\textbf{x}_n,\\textbf{x}_n)\\\\\n", "\\end{bmatrix}$\n", "\n", "where the radial basis functions $k(\\textbf{x}_n,\\textbf{x}_n)$ are functions of the input feature vectors $\\textbf{x}=[x_1,...,x_D]$\n", "\n", "Our input feature vectors are composed into a training set $\\textbf{X}=[\\textbf{x}_1, ...,\\textbf{x}_n]$ with observations $\\textbf{y}=[y_1,...,y_n]^T$\n", "\n", "Each observation $y(\\textbf{x})$ is dependent upon an underlying function, $f(\\textbf{x})$, as well as the noise, $\\mathit{ε}$.\n", "\n", "Derivatives of Gaussian processes, $\\frac{\\partial f(\\textbf{x})}{\\partial x}$, are themselves Gaussian processes; consequently, they can be incorporated into the training targets and used to make explicit predictions of the gradient: $ \\mathrm{\\textbf{y}}_\\mathrm{ext}= \\left[y_1,...,y_n,\\frac{\\partial f(\\textbf{x}_1)}{\\partial x},...,\\frac{\\partial f(\\textbf{x}_n)}{\\partial x},\\frac{\\partial f(\\textbf{x}_1)}{\\partial y},...,\\frac{\\partial f(\\textbf{x}_n)}{\\partial y}\\right]^\\mathrm{T}$\n", "\n", "To account for the additional observations, the extended kernel is: $\\textbf{K}_\\mathrm{ext}(\\textbf{X,X}')=\n", "\\begin{bmatrix}\n", "\\textbf{K}(\\textbf{X,X}') & \\displaystyle\\frac{\\partial \\textbf{K}(\\textbf{X,X}')}{\\partial \\textbf{X}'} \\\\\n", "\\displaystyle\\frac{\\partial \\textbf{K}(\\textbf{X,X}')}{\\partial \\textbf{X}} & \\displaystyle\\frac{\\partial^2 \\textbf{K}(\\textbf{X,X}')}{\\partial \\textbf{X} \\partial \\textbf{X}'}\n", "\\end{bmatrix}$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "8wu_8vcxR_nb", "outputId": "7515c938-a62f-4c4a-b02c-46d4fdd0227f" }, "outputs": [], "source": [ "class GPModelWithDerivatives(gpytorch.models.ExactGP):\n", " def __init__(self, train_x, train_y, likelihood):\n", " super(GPModelWithDerivatives, self).__init__(train_x, train_y, likelihood)\n", " self.mean_module = gpytorch.means.ConstantMeanGrad()\n", " self.base_kernel = gpytorch.kernels.RBFKernelGrad(ard_num_dims=2)\n", " self.covar_module = gpytorch.kernels.ScaleKernel(self.base_kernel)\n", "\n", " def forward(self, x):\n", " mean_x = self.mean_module(x)\n", " covar_x = self.covar_module(x)\n", " return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)\n", "\n", "\n", 
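"# Training inputs are (x, y) pairs; training targets now stack the energy and both gradient components\n",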
"X_gpr = torch.from_numpy(np.column_stack((X_train, Y_train)))\n", "Z_gpr = torch.from_numpy(np.column_stack((Z_train, dX_train, dY_train)))\n", "\n", "X_test = torch.from_numpy(np.column_stack((X.flatten(), Y.flatten())))\n", "Z_test = torch.from_numpy(np.column_stack((Z.flatten(), dX.flatten(), dY.flatten()))) # now including gradient observations in our training data\n", "\n", "likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=3) # Value + x-derivative + y-derivative\n", "model = GPModelWithDerivatives(X_gpr, Z_gpr, likelihood) # now our model uses gradients and energies for training\n", "\n", "use_gpu = torch.cuda.is_available()\n", "if use_gpu:\n", " X_gpr = X_gpr.cuda()\n", " Z_gpr = Z_gpr.cuda()\n", " X_test = X_test.cuda()\n", " Z_test = Z_test.cuda()\n", " likelihood = likelihood.cuda()\n", " model = model.cuda()" ] }, { "cell_type": "markdown", "metadata": { "id": "EnuwPHF1SCr_" }, "source": [ "## Training the Model\n", "\n", "Let $\\textbf{K}_\\textrm{ext}=\\textbf{K}_\\textrm{ext}(\\textbf{X},\\textbf{X}')+σ^2_n\\textbf{I}$\n", "\n", "As in Lesson 2.3.1, the hyperparameters $\\Theta = \\{σ^2_f, l, \\sigma^2_n\\}$ are optimized by maximizing the marginal likelihood log function, which now contains our extended kernel and observation set:\n", "\n", "$\\mathrm{log}\\:p(\\textbf{y}_\\textrm{ext}|\\textbf{X},\\textbf{Θ})=-\\frac{1}{2}\\textbf{y}_\\textrm{ext}^\\mathrm{T}\\tilde{\\textbf{K}}_\\textrm{ext}^{-1}\\textbf{y}_\\textrm{ext}-\\frac{1}{2}\\mathrm{log}\\:|\\tilde{\\textbf{K}}_\\textrm{ext}|-\\frac{n(m+1)}{2}\\mathrm{log}\\:2\\pi $\n", "\n", "where $m$ is the number of gradient observations" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ydMU4SCofdur", "outputId": "a53718a2-0542-4dfe-bbe8-ef5d4e26dd9a" }, "outputs": [], "source": [ "model.train()\n", "likelihood.train()\n", "\n", "def train_model(model, likelihood, x_train, y_train, print_hp=True):\n", " training_iter = 100 # using 100 iterations\n", " # Find optimal model hyperparameters using the Adam optimizer with a learning rate of 0.1\n", " optimizer = torch.optim.Adam(model.parameters(), lr=0.1) # Includes GaussianLikelihood parameters\n", "\n", " # \"Loss\" for GPs - the marginal log likelihood\n", " mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)\n", " for i in range(training_iter):\n", " optimizer.zero_grad()\n", " output = model(x_train)\n", " loss = -mll(output, y_train)\n", " loss.backward()\n", " if (i+1) % 10 == 0:\n", " if print_hp:\n", " print(f'Iter {i+1:>5}/{training_iter:>} - Loss: {loss.item():6.3f} '\n", " f'outputscale: {model.covar_module.outputscale:6.3f} lengthscales: {model.covar_module.base_kernel.lengthscale.squeeze()[0]:6.3f}'\n", " f'{model.covar_module.base_kernel.lengthscale.squeeze()[1]:6.3f} noise: {model.likelihood.noise.item():6.3f}')\n", "\n", " optimizer.step()\n", "\n", "#Y_gpr is an array of output values and gradients from the Mueller-Brown potential energy function\n", "train_model(model, likelihood, X_gpr, Z_gpr)" ] }, { "cell_type": "markdown", "metadata": { "id": "G1f1cQ6kSm_x" }, "source": [ "### Using the Model to Predict Energies and Gradients\n", "\n", "The optimized model can be used to make predictions of both the function and its gradients at a new input space, $\\,\\textbf{x}^*$. 
The expected value (E) of the function at $\\,\\textbf{x}^*$ is given by:\n", "\n", "$\\mathrm{E}\\left[f(\\textbf{x}^*)|\\textbf{y}_\\mathrm{ext},\\textbf{x}^*,\\textbf{X},\\Theta\\right] = \\textbf{K}^*_\\mathrm{ext}(\\textbf{K}_\\mathrm{ext}+\\sigma^2_\\mathrm{n}\\textbf{I})^{-1}\\textbf{y}_\\mathrm{ext}$\n", "\n", "where $\\textbf{K}^\\ast_\\mathrm{ext}=\\textbf{K}_\\mathrm{ext}(\\textbf{x}^\\ast,\\textbf{X})$.\n", "\n", "The associated predictive variance (Var) is given by:\n", "\n", "$\\mathrm{Var}\\left[f(\\textbf{x}^*)|\\textbf{y}_\\mathrm{ext},\\mathrm{\\textbf{x}^\\ast},\\mathrm{\\textbf{X}},\\Theta\\right] = k(\\mathrm{\\textbf{x}^\\ast},\\mathrm{\\textbf{x}^\\ast})-\\textbf{K}^\\ast_\\mathrm{ext}(\\textbf{K}_\\mathrm{ext}+\\sigma^2_\\mathrm{n}\\textbf{I})^{-1}\\textbf{K}^{\\ast\\mathrm{T}}_\\mathrm{ext}$\n", "\n", "The predictions of the expected gradients are given by:\n", "\n", "$\\mathrm{E}\\left[\\frac{\\partial f(\\mathrm{\\textbf{x}^\\ast})}{\\partial q}\\bigg|\\mathrm{\\textbf{y}}_\\mathrm{ext},\\mathrm{\\textbf{x}^\\ast},\\mathrm{\\textbf{X}},\\Theta\\right] = \\frac{\\partial \\textbf{K}^\\ast_\\mathrm{ext}}{\\partial q}(\\textbf{K}_\\mathrm{ext}+\\sigma^2_\\mathrm{n}\\textbf{I})^{-1}\\textbf{y}_\\mathrm{ext}$\n", "\n", "where $q$ corresponds to the input ($x$ or $y$). The associated variances are given by:\n", "\n", "$\\mathrm{Var}\\left[\\frac{\\partial f(\\mathrm{\\textbf{x}^\\ast})}{\\partial q}\\bigg|\\mathrm{\\textbf{y}}_\\mathrm{ext},\\mathrm{\\textbf{x}^\\ast},\\mathrm{\\textbf{X}},\\Theta\\right] =\\frac{\\partial^2 k(\\mathrm{\\textbf{x}^\\ast},\\mathrm{\\textbf{x}^\\ast})}{\\partial q^2}-\\frac{\\partial \\textbf{K}^\\ast_\\mathrm{ext}}{\\partial q}(\\textbf{K}_\\mathrm{ext}+\\sigma^2_\\mathrm{n}\\textbf{I})^{-1}\\frac{\\partial \\textbf{K}^{\\ast\\mathrm{T}}_\\mathrm{ext}}{\\partial q}$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "VoeICKjcvxsv" }, "outputs": [], "source": [ "model.eval()\n", "likelihood.eval()\n", "\n", "with torch.no_grad(), gpytorch.settings.fast_computations(log_prob=False, covar_root_decomposition=False):\n", " predictions = likelihood(model(X_test))\n", " mean = predictions.mean" ] }, { "cell_type": "markdown", "metadata": { "id": "xr80komLTvGg" }, "source": [ "### Plotting Predicted and Reference Surfaces and Gradients\n", "\n", "We can now plot the reference and GPR predicted energies and gradients." 
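,
"\n",
"\n",
"Before plotting, the next cell (a short illustrative addition, not part of the original lesson) pulls the per-task predictive variances out of the prediction object from the previous cell; up to the likelihood noise, these correspond to the variance expressions above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative addition: per-point predictive variances for the three tasks\n",
"# [energy, dE/dx, dE/dy]; `predictions` comes from the previous cell.\n",
"with torch.no_grad():\n",
"    var = predictions.variance\n",
"if use_gpu:\n",
"    var = var.cpu()\n",
"print(var.shape)              # (number of grid points, 3)\n",
"print(var.max(dim=0).values)  # largest predictive variance for each task"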
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "IiRcMiynv09f", "outputId": "339414e8-3fa3-456c-c424-b1bfb23c3805" }, "outputs": [], "source": [ "if use_gpu:\n", " mean = mean.cpu()\n", "\n", "Z_pred = mean[:, 0].detach().numpy().reshape(Z.shape)\n", "dX_pred = mean[:, 1].detach().numpy().reshape(Z.shape)\n", "dY_pred = mean[:, 2].detach().numpy().reshape(Z.shape)\n", "\n", "fig, ax = plt.subplots(3, 2, figsize=(6, 7.6), dpi=150)\n", "plot_contour_map(X, Y, Z_pred, ax=ax[0, 0], title='Z Predicted')\n", "plot_contour_map(X, Y, Z, ax=ax[0, 1], title='Z Reference')\n", "plot_contour_map(X, Y, dX_pred, ax=ax[1, 0], title='dX Predicted')\n", "plot_contour_map(X, Y, dX, ax=ax[1, 1], title='dX Reference')\n", "plot_contour_map(X, Y, dY_pred, ax=ax[2, 0], title='dY Predicted')\n", "plot_contour_map(X, Y, dY, ax=ax[2, 1], title='dY Reference')\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": { "id": "naJFDS4MUU5w" }, "source": [ "## Model Performance and Training Set Size\n", "\n", "Now that our model works, we can look at how the size of our training set changes the accuracy of our model. Below you can see that as the size of the training set increases, the root-mean square error (RMSE) decreases and the $R^2$ increases." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fyVN9wjdVK6G", "outputId": "481b9d48-cb82-4e5f-9bd5-01248fdd9f5b" }, "outputs": [], "source": [ "import tabulate\n", "\n", "X_gpr = torch.from_numpy(np.column_stack((X_train, Y_train)))\n", "Z_gpr = torch.from_numpy(Z_train)\n", "\n", "X_test = X_gpr.detach()\n", "Z_test = Z_gpr.detach()\n", "\n", "def evaluate_model(train_x, train_z, test_x, test_z, model):\n", " model.eval()\n", " preds_train = model(train_x).mean\n", " preds_test = model(test_x).mean\n", " rmse_train = torch.sqrt((torch.mean(train_z - preds_train)**2))\n", " rmse_test = torch.sqrt((torch.mean(test_z - preds_test)**2))\n", " r2 = 1 - torch.sum((train_z-preds_train)**2)/torch.sum((train_z-torch.mean(train_z))**2)\n", " q2 = 1 - torch.sum((train_z-preds_train)**2)/torch.sum((train_z-torch.mean(train_z))**2)\n", " return rmse_train, r2, rmse_test, q2\n", "\n", "def reduce_training_set(train_x, train_z, new_size):\n", " arr_index = np.arange(train_z.shape[0])\n", " np.random.shuffle(arr_index)\n", " return train_x[arr_index[:new_size],:], train_z[arr_index[:new_size]]\n", "\n", "def new_model(train_x, train_z):\n", " likelihood = gpytorch.likelihoods.GaussianLikelihood()\n", " model = ExactGPModel(train_x, train_z, likelihood)\n", " model.train()\n", " likelihood.train()\n", " train_model(model, likelihood, X_gpr, Z_gpr, print_hp=False)\n", " return model\n", "\n", "size_list = []\n", "rmse_train_list = []\n", "r2_list = []\n", "rmse_test_list = []\n", "q2_list = []\n", "\n", "training_set_sizes = [696, 600, 500, 400, 300, 200, 100]\n", "for set_size in training_set_sizes:\n", " X_gpr, Z_gpr = reduce_training_set(X_gpr, Z_gpr, set_size)\n", " model = new_model(X_gpr, Z_gpr)\n", " rmse_train, r2, rmse_test, q2 = evaluate_model(X_gpr, Z_gpr, X_test, Z_test, model)\n", " size_list.append(set_size)\n", " rmse_train_list.append(rmse_train)\n", " r2_list.append(r2)\n", " rmse_test_list.append(rmse_test)\n", " q2_list.append(q2)\n", "\n", "training_set_dict = {\n", " 'Training Set Size': size_list,\n", " 'Training Set RMSE': rmse_train_list,\n", " 'R^2': r2_list,\n", " 'Testing Set 
RMSE': rmse_test_list,\n", " 'Q^2': q2_list\n", "}\n", "\n", "print(tabulate.tabulate(training_set_dict, headers = 'keys', floatfmt=\"9.4f\"))" ] }, { "cell_type": "markdown", "metadata": { "id": "R0u-usaNVC9Y" }, "source": [ "### Comparing Kernels\n", "\n", "Now we can compare GPR results using a different kernel. Before we used the RBF kernel, but now we can compare the performance of the RBF kernel with the Matern and Rational Quadratic (RQ) kernels." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "tqXYsniVKeoe", "outputId": "19bce22e-7a93-4b12-a979-12a1ebbce6a8" }, "outputs": [], "source": [ "# Setup GPR Model: Taken directly From gpytorch tutorial with minor changes\n", "class GPModel_kernel(gpytorch.models.ExactGP):\n", " def __init__(self, train_x, train_Z, likelihood, kernel):\n", " super().__init__(train_x, train_Z, likelihood)\n", " self.mean_module = gpytorch.means.ConstantMean()\n", " self.covar_module = kernel\n", "\n", " def forward(self, x):\n", " mean_x = self.mean_module(x)\n", " covar_x = self.covar_module(x)\n", " return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)\n", "\n", "X_gpr = torch.from_numpy(np.column_stack((X_train, Y_train)))\n", "Z_gpr = torch.from_numpy(Z_train)\n", "\n", "kernels = [\n", " gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),\n", " gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel()),\n", " gpytorch.kernels.ScaleKernel(gpytorch.kernels.RQKernel())\n", "]\n", "\n", "def train_model_k(model, likelihood):\n", " training_iter = 100\n", " # Use the SGD optimizer\n", " optimizer = torch.optim.SGD(model.parameters(), lr=0.1)\n", " # \"Loss\" for GPs - the marginal log likelihood\n", " mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)\n", " for i in range(training_iter):\n", " # Zero gradients from previous iteration\n", " optimizer.zero_grad()\n", " # Output from model\n", " output = model(X_gpr)\n", " # Calc loss and backprop gradients\n", " loss = -mll(output, Z_gpr)\n", " loss.backward()\n", " optimizer.step()\n", "\n", "for kernel in kernels:\n", " # Initialize Likelihood and Model\n", " likelihood = gpytorch.likelihoods.GaussianLikelihood()\n", " model = GPModel_kernel(X_gpr, Z_gpr, likelihood, kernel)\n", " model.train()\n", " likelihood.train()\n", " train_model_k(model, likelihood)\n", " rmse_train, r2, rmse_test, q2 = evaluate_model(X_gpr, Z_gpr, X_test, Z_test, model)\n", " kernel_name = str(kernel.base_kernel).split('(')\n", " #Print the kernel we are running above the rmse_train output.\n", " print(f'Kernel: {kernel_name[0]}')\n", " print(f'RMSE: {rmse_train.item():>6.4f}')" ] } ], "metadata": { "colab": { "include_colab_link": true, "provenance": [] }, "kernelspec": { "display_name": "Python 3.8.8 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" }, "vscode": { "interpreter": { "hash": "cd78fef2128015050713e82ca51c6520b11aee7c9ee8df750520bbbc7384cbaa" } } }, "nbformat": 4, "nbformat_minor": 0 }