{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "view-in-github" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "id": "FMj6evQUH4rQ" }, "source": [ "# Feedforward Neural Network Models\n", "\n", "In this tutorial, we will learn how to use a neural network model to predict the energy of points on the Müller-Brown potential energy surface. \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "G5gEWLszUpQV" }, "source": [ "## Defining the Müller-Brown Potential Energy Function\n", "\n", "For the definition of Müller-Brown potential, see [here](https://www.wolframcloud.com/objects/demonstrations/TrajectoriesOnTheMullerBrownPotentialEnergySurface-source.nb).\n", "\n", "$v(x,y) = \\sum_{k=0}^3 A_k \\mathrm{exp}\\left[ a_k (x - x_k^0)^2 + b_k (x - x_k^0) (y - y_k^0) + c_k (y - y_k^0)^2 \\right] $" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "N0JzzAZ1UZdx" }, "outputs": [], "source": [ "from math import exp, pow\n", "\n", "def mueller_brown_potential(x, y):\n", " A = [-200, -100, -170, 15]\n", " a = [-1, -1, -6.5, 0.7]\n", " b = [0, 0, 11, 0.6]\n", " c = [-10, -10, -6.5, 0.7]\n", " x0 = [1, 0, -0.5, -1.0]\n", " y0 = [0, 0.5, 1.5, 1]\n", " z = 0\n", " for k in range(4):\n", " # Scale the function by 0.1 to make plotting easier\n", " z += 0.1 * A[k] * exp(a[k] * pow(x-x0[k], 2) + b[k] * (x-x0[k]) * (y-y0[k]) + c[k] * pow(y-y0[k], 2))\n", " return z" ] }, { "cell_type": "markdown", "metadata": { "id": "4GfIwnIKU9E-" }, "source": [ "### Generate Training Data\n", "\n", "First, we need to generate data to train the neural network.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# generate x and y on a grid\n", "x_range = np.arange(-1.8, 1.4, 0.1, dtype=np.float32)\n", "y_range = np.arange(-0.4, 2.4, 0.1, dtype=np.float32)\n", "X, Y = np.meshgrid(x_range, y_range)\n", "\n", "# compute the potential energy at each point on the grid\n", "mueller_brown_potential_vectorized = np.vectorize(mueller_brown_potential, otypes=[np.float32])\n", "Z = mueller_brown_potential_vectorized(X, Y)\n", "\n", "# keep only low-energy points for training\n", "train_mask = Z < 10\n", "X_train, Y_train, Z_train = X[train_mask], Y[train_mask], Z[train_mask]\n", "\n", "print(f\"Z_min: {np.min(Z)}, Z_max: {np.max(Z)}\")\n", "print(f\"Size of (future) training set: {len(Z_train)}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "GHmBKzxwVRGf" }, "source": [ "### Visualizing Training Data: 3-D Projection Surface\n", "\n", "We will now create a 3-D plot of our training data. We are going to use the `plotly` library to create an interactive plot. We only plot the part of the potential energy surface that is below 15 (arbitrary unit). For the actual training, only the part that is below 10 (arbitrary unit) will be used." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 517 }, "id": "fj9-VeCjWRPR", "outputId": "86069b2e-543d-4d4e-c8a4-3fefd89275ea" }, "outputs": [], "source": [ "import plotly.graph_objects as go\n", "\n", "fig = go.Figure(data=[go.Surface(z=Z, x=X, y=Y, colorscale='rainbow', cmin=-15, cmax=9)])\n", "fig.update_traces(contours_z=dict(show=True, project_z=True))\n", "fig.update_layout(title='Mueller-Brown Potential', width=500, height=500,\n", " scene = dict(\n", " zaxis = dict(dtick=3, range=[-15, 15]),\n", " camera_eye = dict(x=-1.2, y=-1.2, z=1.2)))" ] }, { "cell_type": "markdown", "metadata": { "id": "S_BieFeHXuD7" }, "source": [ "### Visualizing Training Data: Contour Surface\n", "\n", "Now we will create a contour surface plot of our training data. Since we will plot similar contour surface plots later, we will define a function to do this." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "def plot_contour_map(X, Y, Z, ax, title, colorscale='rainbow', levels=None):\n", " if levels is None:\n", " levels = [-12, -8, -4, 0, 4, 8, 10]\n", " ct = ax.contour(X, Y, Z, levels, colors='k')\n", " ax.clabel(ct, inline=True, fmt='%3.0f', fontsize=8)\n", " ct = ax.contourf(X, Y, Z, levels, cmap=colorscale, extend='both', vmin=levels[0], vmax=levels[-1])\n", " ax.set_xlabel(\"x\", labelpad=-0.75)\n", " ax.set_ylabel(\"y\", labelpad=2.5)\n", " ax.tick_params(axis='both', pad=2, labelsize=8)\n", " cbar = plt.colorbar(ct)\n", " cbar.ax.tick_params(labelsize=8)\n", " ax.set_title(title, fontsize=8)\n", "\n", "\n", "fig, ax = plt.subplots(figsize=(3, 2.5), dpi=150)\n", "plot_contour_map(X, Y, Z, ax=ax, title='Mueller-Brown Potential')\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": { "id": "LQUmBoB2cT4s" }, "source": [ "## Defining the Neural Network Class" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Basic Neural Network \n", "\n", "Below is a schematic of a neural network. Inputs ($x$, $y$) are given weights ($w$). For each neuron in the hidden layer a bias ($b$) is added to the weighted inputs to generate a value ($v$) for the activation function (tanh). The activation function decides the activity ($v'$) of each neuron in the hidden layer. Weights ($w'$), biases ($b'$), and the activation function on the neuron of the output layer produces the prediction ($V^{pred}$). " ] }, { "cell_type": "markdown", "metadata": { "id": "VXw5fMG99XMv" }, "source": [ "![NN_diagram.png]()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we define our neural network as a Python class. A function ($\\it{train\\_loop}$) is used to loop through our training data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "z3Si6tnBFj93" }, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "\n", "# Set default data type to float32\n", "torch.set_default_dtype(torch.float32)\n", "\n", "\n", "class NeuralNetwork(nn.Module):\n", " def __init__(self, n_hidden=20):\n", " \"\"\"\n", " Args:\n", " n_hidden (int): Number of neurons in the hidden layer.\n", " \"\"\"\n", " super().__init__()\n", " self.linear = nn.Sequential(\n", " nn.Linear(2, n_hidden), # Linear function taking 2 inputs and outputs data for n_hidden neurons.\n", " nn.Tanh(),\n", " nn.Linear(n_hidden, 1) # Linear function taking data from n_hidden neurons and producing one output value. \n", " )\n", "\n", " def forward(self, x): \n", " return self.linear(x)\n", "\n", "\n", "def train_loop(dataloader, model, loss_fn, optimizer):\n", " model.train() # Set model to training mode.\n", " num_batches = len(dataloader)\n", " train_loss = 0\n", " for X, y in dataloader:\n", " # Compute prediction and loss\n", " pred = model(X)\n", " loss = loss_fn(pred.squeeze(), y)\n", "\n", " # Backpropagation - using the loss function gradients to update the weights and biases.\n", " optimizer.zero_grad() # Zero out the gradients from the previous iteration to replace them. \n", " loss.backward() # Update the gradients of the loss function.\n", " optimizer.step() # Uses step() to optimize the weights and biases with the updated gradients. \n", "\n", " with torch.no_grad():\n", " train_loss += loss.item()\n", "\n", " train_loss /= num_batches\n", " return train_loss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading PyTorch and Training Data\n", "\n", "After installing and importing pytorch, we will save our training data as a tensor data set." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from torch.utils.data import TensorDataset, DataLoader\n", "\n", "# turn NumPy arrays into PyTorch tensors\n", "X_tensor = torch.from_numpy(np.column_stack((X_train, Y_train)))\n", "Y_tensor = torch.from_numpy(Z_train)\n", "\n", "dataset = TensorDataset(X_tensor, Y_tensor)\n", "print(f\"Size of training set: {len(dataset)}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "TXpLmuWBcbpg" }, "source": [ "## Training the Model\n", "\n", "Now we can train the neural network model. We will finish our training when the desired number of epochs has been reached. We will also define some terms used for training below.\n", "\n", "Epochs - Number of forward/backward passes through the entire neural network. \n", "\n", "Learning Rate - Determines the step size as we try to minimize the loss function. A faster learning rate would have a larger step size.\n", "\n", "Stochastic Gradient Descent (SGD) - The algorithm used for minimizing the loss function.\n", "\n", "Note: In this example, there are 696 training points broken into 87 batches of batch size 8." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "PGry3oNqGjof", "outputId": "6d220c99-5d29-4278-c11f-3be60e6e1ca7" }, "outputs": [], "source": [ "# Set random seed for reproducibility\n", "torch.manual_seed(314)\n", "\n", "# Hyperparameters\n", "batch_size = 32\n", "epochs = 1000\n", "learning_rate = 1e-2\n", "n_hidden = 20\n", "\n", "train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)\n", "model = NeuralNetwork(n_hidden)\n", "loss_fn = nn.MSELoss()\n", "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)\n", "\n", "print(\"Training Errors:\")\n", "for epoch in range(epochs):\n", " train_loss = train_loop(train_loader, model, loss_fn, optimizer)\n", " if (epoch + 1) % 100 == 0:\n", " print(f\"epoch: {epoch:>3d} loss: {train_loss:>6.3f}\")\n", "print(\"Done with Training!\")" ] }, { "cell_type": "markdown", "metadata": { "id": "aNfktUgEc17B" }, "source": [ "## Plotting Reference, Predicted, and Difference Surfaces\n", "\n", "Finally, we will plot the Müller-Brown potential energy surface using the analytical function (reference), using the neural network (predicted), and we will show the difference between the predicted and reference surfaces." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.eval() # Set model to evaluation mode\n", "Z_pred = model(torch.from_numpy(np.column_stack((X.flatten(), Y.flatten())))).detach().numpy().reshape(Z.shape)\n", "Z_diff = Z_pred - Z\n", "\n", "fig, ax = plt.subplots(3, figsize=(3, 7.5), dpi=150)\n", "plot_contour_map(X, Y, Z, ax=ax[0], title='(a) Reference')\n", "plot_contour_map(X, Y, Z_pred, ax=ax[1], title='(b) Predicted')\n", "plot_contour_map(X, Y, Z_diff, ax=ax[2], title='(c) Difference', levels=[-4, -2, 0, 2, 4])\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": { "id": "Jb4-6Rp_dDjn" }, "source": [ "## Taking a Look at the NN parameters\n", "\n", "In order to take a closer look at the neural network parameters, we will plug them into the linear functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fZiTIfxDN9el", "outputId": "66e2d9d5-7620-4a23-9679-9e78944f3af5" }, "outputs": [], "source": [ "print(\"model:\", model)\n", "\n", "for name, param in model.named_parameters():\n", " if name == 'linear.0.weight': weights0 = param.data.detach().numpy()\n", " elif name == 'linear.0.bias': bias0 = param.data.detach().numpy()\n", " elif name == 'linear.2.weight': weights2 = param.data.detach().numpy()\n", " elif name == 'linear.2.bias': bias2 = param.data.detach().numpy()\n", "\n", "xy0 = torch.tensor([-0.5, 1.5], dtype=torch.float32)\n", "z0 = model(xy0)\n", "\n", "#first linear function\n", "v1 = np.zeros(n_hidden)\n", "for i in range(n_hidden):\n", " v1[i] += weights0[i, 0] * xy0[0] + weights0[i, 1] * xy0[1] + bias0[i]\n", "\n", "#activation function\n", "v2 = np.zeros(n_hidden)\n", "for i in range(n_hidden):\n", " v2[i] = np.tanh(v1[i])\n", "\n", "#second linear function\n", "z_pred = 0.0\n", "for i in range(n_hidden):\n", " z_pred += weights2[0,i] * v2[i]\n", "z_pred += bias2[0]\n", "\n", "print(\"z0:\", z0, \"z_pred:\", z_pred)" ] }, { "cell_type": "markdown", "metadata": { "id": "bM_h4sSEccJV" }, "source": [ "## A More Automated/Refined Implementation\n", "\n", "### A PyTorch-Lightning Implementation\n", "\n", "Below is a more professional implementation of the neural network that saves epoch infromation in the logs_csv/ directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Ks_KCDRQ3tDk" }, "outputs": [], "source": [ "%pip install lightning --quiet\n", "import lightning.pytorch as pl\n", "from lightning.pytorch.loggers import CSVLogger" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 272 }, "id": "Ki02WaRgHFar", "outputId": "74783b71-fe19-4282-f2ea-1a2db3ff942d" }, "outputs": [], "source": [ "class NeuralNetworkLightning(pl.LightningModule):\n", " def __init__(self, model, learning_rate):\n", " super().__init__()\n", " self.model = model\n", " self.learning_rate = learning_rate\n", "\n", " def forward(self, x):\n", " return self.model(x)\n", "\n", " def training_step(self, batch, batch_idx):\n", " x, y = batch\n", " y_pred = self.model(x)\n", " loss = F.mse_loss(y_pred.squeeze(), y)\n", " self.log('train_loss', loss)\n", " return loss\n", "\n", " def configure_optimizers(self):\n", " # Using the Adam optimzation algorithm instead of SGD. \n", " optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)\n", " return [optimizer]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Set random seed for reproducibility\n", "pl.seed_everything(314, workers=True);\n", "\n", "# Hyperparameters\n", "batch_size = 32\n", "epochs = 1000\n", "learning_rate = 1e-2\n", "n_hidden = 20\n", "\n", "train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)\n", "csv_logger = CSVLogger('logs_csv', version=0)\n", "trainer = pl.Trainer(max_epochs=epochs, logger=csv_logger, deterministic=True)\n", "model = NeuralNetworkLightning(NeuralNetwork(n_hidden), learning_rate)\n", "trainer.fit(model, train_loader)" ] }, { "cell_type": "markdown", "metadata": { "id": "-lIsI61egS0Q" }, "source": [ "### Plotting Training Error\n", "\n", "Now we can plot the training error as the number of epochs increases.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 466 }, "id": "phUovRNxYrHE", "outputId": "2c2c4688-bc77-49c9-c824-06952372ab35" }, "outputs": [], "source": [ "import pandas as pd\n", "loss = pd.read_csv(\"logs_csv/lightning_logs/version_0/metrics.csv\")\n", "\n", "fig, ax = plt.subplots()\n", "ax.semilogy(loss[\"epoch\"], loss[\"train_loss\"])\n", "ax.set_xlabel(\"Epoch\")\n", "ax.set_ylabel(\"Training Errors\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "MexXVUU3gfOB" }, "source": [ "### Plotting Reference, Predicted, and Difference Surfaces\n", "\n", "Again, we plot the reference, predicted, and difference surfaces using the more refined neural network implementation.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "OZFqz5UVGIrt", "outputId": "753d0404-7cf1-4588-d435-17e5fd4d7070" }, "outputs": [], "source": [ "model.eval() # Set model to evaluation mode\n", "Z_pred = model(torch.from_numpy(np.column_stack((X.flatten(), Y.flatten())))).detach().numpy().reshape(Z.shape)\n", "Z_diff = Z_pred - Z\n", "\n", "fig, ax = plt.subplots(3, figsize=(3, 7.5), dpi=150)\n", "plot_contour_map(X, Y, Z, ax=ax[0], title='(a) Reference')\n", "plot_contour_map(X, Y, Z_pred, ax=ax[1], title='(b) Predicted')\n", "plot_contour_map(X, Y, Z_diff, ax=ax[2], title='(c) Difference', levels=[-4, -2, 0, 2, 4])\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train with Energy and Gradient" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mueller_brown_potential_gradient(x, y):\n", " A = [-200, -100, -170, 15]\n", " a = [-1, -1, -6.5, 0.7]\n", " b = [0, 0, 11, 0.6]\n", " c = [-10, -10, -6.5, 0.7]\n", " x0 = [1, 0, -0.5, -1.0]\n", " y0 = [0, 0.5, 1.5, 1]\n", " dx = 0\n", " dy = 0\n", " for k in range(4):\n", " dx += 0.1 * A[k] * exp(a[k] * pow(x-x0[k], 2) + b[k] * (x-x0[k]) * (y-y0[k]) + c[k] * pow(y-y0[k], 2)) * (a[k] * 2 *(x-x0[k]) + b[k] * (y-y0[k]))\n", " dy += 0.1 * A[k] * exp(a[k] * pow(x-x0[k], 2) + b[k] * (x-x0[k]) * (y-y0[k]) + c[k] * pow(y-y0[k], 2)) * (b[k] * (x-x0[k])+ c[k] * 2 * (y-y0[k]))\n", " return dx, dy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# compute the potential energy gradient at each point on the grid\n", "mueller_brown_potential_gradient_vectorized = np.vectorize(mueller_brown_potential_gradient, otypes=[np.float32, np.float32])\n", "dX, dY = mueller_brown_potential_gradient_vectorized(X, Y)\n", "\n", "# keep only low-energy points for training\n", "dX_train, dY_train = dX[train_mask], dY[train_mask]\n", "\n", "dX_tensor = torch.from_numpy(np.column_stack((dX_train, dY_train)))\n", "dataset = TensorDataset(X_tensor, Y_tensor, dX_tensor)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_loop_with_gradient(dataloader, model, loss_fn, optimizer):\n", " model.train() # Set model to training mode.\n", " num_batches = len(dataloader)\n", " train_loss = 0\n", " train_loss_energy = 0\n", " train_loss_gradient = 0\n", " for X, y, dX in dataloader:\n", " # Compute prediction and loss\n", " y_pred = model(X)\n", " dX_pred = torch.autograd.grad(y_pred, X, grad_outputs=torch.ones_like(y_pred), create_graph=True)[0]\n", " loss_energy = loss_fn(y_pred.squeeze(), y)\n", " loss_gradient = loss_fn(dX_pred.squeeze(), dX)\n", " loss = loss_energy + loss_gradient\n", "\n", " # Backpropagation - using the loss function gradients to update the weights and biases.\n", " optimizer.zero_grad() # Zero out the gradients from the previous iteration to replace them. \n", " loss.backward() # Update the gradients of the loss function.\n", " optimizer.step() # Uses step() to optimize the weights and biases with the updated gradients. \n", "\n", " with torch.no_grad():\n", " train_loss += loss.item()\n", " train_loss_energy += loss_energy.item()\n", " train_loss_gradient += loss_gradient.item()\n", "\n", " train_loss /= num_batches\n", " train_loss_energy /= num_batches\n", " train_loss_gradient /= num_batches\n", " return train_loss, train_loss_energy, train_loss_gradient" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set random seed for reproducibility\n", "torch.manual_seed(314)\n", "\n", "# Hyperparameters\n", "batch_size = 32\n", "epochs = 1000\n", "learning_rate = 1e-3\n", "n_hidden = 20\n", "\n", "train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)\n", "model = NeuralNetwork(n_hidden)\n", "loss_fn = nn.MSELoss()\n", "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)\n", "\n", "print(\"Training errors:\")\n", "for epoch in range(epochs):\n", " X_tensor.requires_grad_(True)\n", " train_loss, train_loss_energy, train_loss_gradient = train_loop_with_gradient(train_loader, model, loss_fn, optimizer)\n", " if (epoch + 1) % 100 == 0:\n", " print(f\"epoch: {epoch:>3d} loss: {train_loss:>6.3f} energy loss: {train_loss_energy:>6.3f} gradient loss: {train_loss_gradient:>6.3f}\")\n", "print(\"Done with Training!\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.eval() # Set model to evaluation mode\n", "Z_pred = model(torch.from_numpy(np.column_stack((X.flatten(), Y.flatten())))).detach().numpy().reshape(Z.shape)\n", "Z_diff = Z_pred - Z\n", "\n", "fig, ax = plt.subplots(3, figsize=(3, 7.5), dpi=150)\n", "plot_contour_map(X, Y, Z, ax=ax[0], title='(a) Reference')\n", "plot_contour_map(X, Y, Z_pred, ax=ax[1], title='(b) Predicted')\n", "plot_contour_map(X, Y, Z_diff, ax=ax[2], title='(c) Difference', levels=[-4, -2, 0, 2, 4])\n", "fig.tight_layout()" ] } ], "metadata": { "colab": { "include_colab_link": true, "provenance": [] }, "kernelspec": { "display_name": "Python 3.8.8 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" }, "vscode": { "interpreter": { "hash": "cd78fef2128015050713e82ca51c6520b11aee7c9ee8df750520bbbc7384cbaa" } } }, "nbformat": 4, "nbformat_minor": 0 }