{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Лабораторная работа 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1) Полносвязная нейронная сеть ( Fully-Connected Neural Network)\n", "\n", "2) Нормализация по мини-батчам (Batch normalization)\n", "\n", "3) Dropout\n", "\n", "4) Сверточные нейронные сети (Convolutional Networks)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Лабораторные работы можно выполнять с использованием сервиса Google Colaboratory (https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d) или на локальном компьютере. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Полносвязная нейронная сеть" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "В данной лабораторной работе необходимо будет реализовать полносвязную нейронную сеть, используя модульный подход. Для каждого слоя реализации прямого и обратного проходов алгоритма обратного распространения ошибки будут иметь следующий вид:\n", "\n", "```python\n", "def layer_forward(x, w):\n", " \"\"\" Receive inputs x and weights w \"\"\"\n", " # Do some computations ...\n", " z = # ... some intermediate value\n", " # Do some more computations ...\n", " out = # the output\n", " \n", " cache = (x, w, z, out) # Values we need to compute gradients\n", " \n", " return out, cache\n", "```\n", "\n", "\n", "\n", "```python\n", "def layer_backward(dout, cache):\n", " \"\"\"\n", " Receive dout (derivative of loss with respect to outputs) and cache,\n", " and compute derivative with respect to inputs.\n", " \"\"\"\n", " # Unpack cache values\n", " x, w, z, out = cache\n", " \n", " # Use values in cache to compute derivatives\n", " dx = # Derivative of loss with respect to x\n", " dw = # Derivative of loss with respect to w\n", " \n", " return dx, dw\n", "```\n", "\n", " " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=========== You can safely ignore the message below if you are NOT working on ConvolutionalNetworks.ipynb ===========\n", "\tYou will need to compile a Cython extension for a portion of this assignment.\n", "\tThe instructions to do this will be given in a section of the notebook below.\n", "\tThere will be an option for Colab users and another for Jupyter (local) users.\n" ] } ], "source": [ "from __future__ import print_function\n", "import time\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scripts.classifiers.fc_net import *\n", "\n", "from scripts.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array\n", "from scripts.solver import Solver\n", "from scripts.classifiers.cnn import *\n", "from scripts.layers import *\n", "from scripts.fast_layers import *\n", "\n", "\n", "%matplotlib inline\n", "plt.rcParams['figure.figsize'] = (10.0, 8.0) \n", "plt.rcParams['image.interpolation'] = 'nearest'\n", "plt.rcParams['image.cmap'] = 'gray'\n", "\n", "# for auto-reloading external modules\n", "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "def rel_error(x, y):\n", " \"\"\" returns relative error \"\"\"\n", " return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))\n", "def print_mean_std(x,axis=0):\n", " print(' means: ', x.mean(axis=axis))\n", " print(' stds: ', x.std(axis=axis))\n", " 
print() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Загрузите данные из предыдущей лабораторной работы. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Для полносвязного слоя реализуйте прямой проход (метод affine_forward в scripts/layers.py). Протестируйте свою реализацию. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num_inputs = 2\n", "input_shape = (4, 5, 6)\n", "output_dim = 3\n", "\n", "input_size = num_inputs * np.prod(input_shape)\n", "weight_size = output_dim * np.prod(input_shape)\n", "\n", "x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)\n", "w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)\n", "b = np.linspace(-0.3, 0.1, num=output_dim)\n", "\n", "out, _ = affine_forward(x, w, b)\n", "correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],\n", " [ 3.25553199, 3.5141327, 3.77273342]])\n", "\n", "\n", "print('Testing affine_forward function:')\n", "print('difference: ', rel_error(out, correct_out))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Для полносвязного слоя реализуйте обратный проход (метод affine_backward в scripts/layers.py). Протестируйте свою реализацию. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "x = np.random.randn(10, 2, 3)\n", "w = np.random.randn(6, 5)\n", "b = np.random.randn(5)\n", "dout = np.random.randn(10, 5)\n", "\n", "dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)\n", "dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)\n", "db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)\n", "\n", "_, cache = affine_forward(x, w, b)\n", "dx, dw, db = affine_backward(dout, cache)\n", "\n", "print('Testing affine_backward function:')\n", "print('dx error: ', rel_error(dx_num, dx))\n", "print('dw error: ', rel_error(dw_num, dw))\n", "print('db error: ', rel_error(db_num, db))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте прямой проход для слоя активации ReLU (relu_forward) и протестируйте его." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)\n", "\n", "out, _ = relu_forward(x)\n", "correct_out = np.array([[ 0., 0., 0., 0., ],\n", " [ 0., 0., 0.04545455, 0.13636364,],\n", " [ 0.22727273, 0.31818182, 0.40909091, 0.5, ]])\n", "\n", "# Compare your output with ours. The error should be on the order of e-8\n", "print('Testing relu_forward function:')\n", "print('difference: ', rel_error(out, correct_out))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте обратный проход для слоя активации ReLU (relu_backward ) и протестируйте его." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "x = np.random.randn(10, 10)\n", "dout = np.random.randn(*x.shape)\n", "\n", "dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)\n", "\n", "_, cache = relu_forward(x)\n", "dx = relu_backward(dout, cache)\n", "\n", "# The error should be on the order of e-12\n", "print('Testing relu_backward function:')\n", "print('dx error: ', rel_error(dx_num, dx))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "В скрипте /layer_utils.py приведены реализации прямого и обратного проходов для часто используемых комбинаций слоев. Например, за полносвязным слоем часто следует слой активации. Ознакомьтесь с функциями affine_relu_forward и affine_relu_backward, запустите код ниже и убедитесь, что ошибка порядка e-10 или ниже. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scripts.layer_utils import affine_relu_forward, affine_relu_backward\n", "np.random.seed(231)\n", "x = np.random.randn(2, 3, 4)\n", "w = np.random.randn(12, 10)\n", "b = np.random.randn(10)\n", "dout = np.random.randn(2, 10)\n", "\n", "out, cache = affine_relu_forward(x, w, b)\n", "dx, dw, db = affine_relu_backward(dout, cache)\n", "\n", "dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)\n", "dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)\n", "db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)\n", "\n", "# Relative error should be around e-10 or less\n", "print('Testing affine_relu_forward and affine_relu_backward:')\n", "print('dx error: ', rel_error(dx_num, dx))\n", "print('dw error: ', rel_error(dw_num, dw))\n", "print('db error: ', rel_error(db_num, db))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте двухслойную полносвязную сеть - класс TwoLayerNet в scripts/classifiers/fc_net.py . Проверьте свою реализацию, запустив код ниже. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "N, D, H, C = 3, 5, 50, 7\n", "X = np.random.randn(N, D)\n", "y = np.random.randint(C, size=N)\n", "\n", "std = 1e-3\n", "model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)\n", "\n", "print('Testing initialization ... ')\n", "W1_std = abs(model.params['W1'].std() - std)\n", "b1 = model.params['b1']\n", "W2_std = abs(model.params['W2'].std() - std)\n", "b2 = model.params['b2']\n", "assert W1_std < std / 10, 'First layer weights do not seem right'\n", "assert np.all(b1 == 0), 'First layer biases do not seem right'\n", "assert W2_std < std / 10, 'Second layer weights do not seem right'\n", "assert np.all(b2 == 0), 'Second layer biases do not seem right'\n", "\n", "print('Testing test-time forward pass ... 
')\n", "model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)\n", "model.params['b1'] = np.linspace(-0.1, 0.9, num=H)\n", "model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)\n", "model.params['b2'] = np.linspace(-0.9, 0.1, num=C)\n", "X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T\n", "scores = model.loss(X)\n", "correct_scores = np.asarray(\n", " [[11.53165108, 12.2917344, 13.05181771, 13.81190102, 14.57198434, 15.33206765, 16.09215096],\n", " [12.05769098, 12.74614105, 13.43459113, 14.1230412, 14.81149128, 15.49994135, 16.18839143],\n", " [12.58373087, 13.20054771, 13.81736455, 14.43418138, 15.05099822, 15.66781506, 16.2846319 ]])\n", "scores_diff = np.abs(scores - correct_scores).sum()\n", "assert scores_diff < 1e-6, 'Problem with test-time forward pass'\n", "\n", "print('Testing training loss (no regularization)')\n", "y = np.asarray([0, 5, 1])\n", "loss, grads = model.loss(X, y)\n", "correct_loss = 3.4702243556\n", "assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'\n", "\n", "model.reg = 1.0\n", "loss, grads = model.loss(X, y)\n", "correct_loss = 26.5948426952\n", "assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'\n", "\n", "# Errors should be around e-7 or less\n", "for reg in [0.0, 0.7]:\n", " print('Running numeric gradient check with reg = ', reg)\n", " model.reg = reg\n", " loss, grads = model.loss(X, y)\n", "\n", " for name in sorted(grads):\n", " f = lambda _: model.loss(X, y)[0]\n", " grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)\n", " print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ознакомьтесь с API для обучения и тестирования моделей в scripts/solver.py . Используйте экземпляр класса Solver для обучения двухслойной полносвязной сети. Необходимо достичь минимум 50% верно классифицированных объектов на валидационном наборе. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = TwoLayerNet()\n", "solver = None\n", "\n", "##############################################################################\n", "# TODO: Use a Solver instance to train a TwoLayerNet that achieves at least #\n", "# 50% accuracy on the validation set. #\n", "##############################################################################\n", "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n", "\n", "pass\n", "\n", "# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n", "##############################################################################\n", "# END OF YOUR CODE #\n", "##############################################################################" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.subplot(2, 1, 1)\n", "plt.title('Training loss')\n", "plt.plot(solver.loss_history, 'o')\n", "plt.xlabel('Iteration')\n", "\n", "plt.subplot(2, 1, 2)\n", "plt.title('Accuracy')\n", "plt.plot(solver.train_acc_history, '-o', label='train')\n", "plt.plot(solver.val_acc_history, '-o', label='val')\n", "plt.plot([0.5] * len(solver.val_acc_history), 'k--')\n", "plt.xlabel('Epoch')\n", "plt.legend(loc='lower right')\n", "plt.gcf().set_size_inches(15, 12)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Теперь реализуйте полносвязную сеть с произвольным числом скрытых слоев. Ознакомьтесь с классом FullyConnectedNet в scripts/classifiers/fc_net.py . 
Реализуйте инициализацию, прямой и обратный проходы." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "N, D, H1, H2, C = 2, 15, 20, 30, 10\n", "X = np.random.randn(N, D)\n", "y = np.random.randint(C, size=(N,))\n", "\n", "for reg in [0, 3.14]:\n", " print('Running check with reg = ', reg)\n", " model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n", " reg=reg, weight_scale=5e-2, dtype=np.float64)\n", "\n", " loss, grads = model.loss(X, y)\n", " print('Initial loss: ', loss)\n", " \n", " # Most of the errors should be on the order of e-7 or smaller. \n", " # NOTE: It is fine however to see an error for W2 on the order of e-5\n", " # for the check when reg = 0.0\n", " for name in sorted(grads):\n", " f = lambda _: model.loss(X, y)[0]\n", " grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n", " print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Попробуйте добиться эффекта переобучения на небольшом наборе изображений (например, 50). Используйте трехслойную сеть со 100 нейронами на каждом скрытом слое. Попробуйте переобучить сеть, достигнув 100 % accuracy за 20 эпох. Для этого поэкспериментируйте с параметрами weight_scale и learning_rate. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: Use a three-layer Net to overfit 50 training examples by \n", "# tweaking just the learning rate and initialization scale.\n", "\n", "num_train = 50\n", "small_data = {\n", " 'X_train': data['X_train'][:num_train],\n", " 'y_train': data['y_train'][:num_train],\n", " 'X_val': data['X_val'],\n", " 'y_val': data['y_val'],\n", "}\n", "\n", "weight_scale = 1e-2 # Experiment with this!\n", "learning_rate = 1e-4 # Experiment with this!\n", "model = FullyConnectedNet([100, 100],\n", " weight_scale=weight_scale, dtype=np.float64)\n", "solver = Solver(model, small_data,\n", " print_every=10, num_epochs=20, batch_size=25,\n", " update_rule='sgd',\n", " optim_config={\n", " 'learning_rate': learning_rate,\n", " }\n", " )\n", "solver.train()\n", "\n", "plt.plot(solver.loss_history, 'o')\n", "plt.title('Training loss history')\n", "plt.xlabel('Iteration')\n", "plt.ylabel('Training loss')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Повторите эксперимент, описанный выше, для пятислойной сети." 
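, "\n", "\n", "Deeper networks are noticeably more sensitive to weight_scale and learning_rate. If useful, here is an optional sketch of a small grid search over these two parameters; it assumes small_data is built as in the cell above (the candidate values are only examples):\n", "\n", "```python\n", "# Try several (weight_scale, learning_rate) pairs and record the best train accuracy\n", "results = {}\n", "for ws in [1e-2, 5e-2, 1e-1]:\n", "    for lr in [1e-3, 1e-2, 2e-2]:\n", "        model = FullyConnectedNet([100, 100, 100, 100],\n", "                                  weight_scale=ws, dtype=np.float64)\n", "        solver = Solver(model, small_data,\n", "                        num_epochs=20, batch_size=25,\n", "                        update_rule='sgd',\n", "                        optim_config={'learning_rate': lr},\n", "                        verbose=False)\n", "        solver.train()\n", "        results[(ws, lr)] = max(solver.train_acc_history)\n", "for (ws, lr), acc in sorted(results.items()):\n", "    print('ws=%g, lr=%g: best train acc %.2f' % (ws, lr, acc))\n", "```"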
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: Use a five-layer Net to overfit 50 training examples by \n", "# tweaking just the learning rate and initialization scale.\n", "\n", "num_train = 50\n", "small_data = {\n", " 'X_train': data['X_train'][:num_train],\n", " 'y_train': data['y_train'][:num_train],\n", " 'X_val': data['X_val'],\n", " 'y_val': data['y_val'],\n", "}\n", "\n", "learning_rate = 2e-3 # Experiment with this!\n", "weight_scale = 1e-5 # Experiment with this!\n", "model = FullyConnectedNet([100, 100, 100, 100],\n", " weight_scale=weight_scale, dtype=np.float64)\n", "solver = Solver(model, small_data,\n", " print_every=10, num_epochs=20, batch_size=25,\n", " update_rule='sgd',\n", " optim_config={\n", " 'learning_rate': learning_rate,\n", " }\n", " )\n", "solver.train()\n", "\n", "plt.plot(solver.loss_history, 'o')\n", "plt.title('Training loss history')\n", "plt.xlabel('Iteration')\n", "plt.ylabel('Training loss')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Сделайте выводы по проведенному эксперименту. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ранее обновление весов проходило по правилу SGD. Теперь попробуйте реализовать стохастический градиентный спуск с импульсом (SGD+momentum). http://cs231n.github.io/neural-networks-3/#sgd Реализуйте sgd_momentum в scripts/optim.py и запустите проверку. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scripts.optim import sgd_momentum\n", "\n", "N, D = 4, 5\n", "w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n", "dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n", "v = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n", "\n", "config = {'learning_rate': 1e-3, 'velocity': v}\n", "next_w, _ = sgd_momentum(w, dw, config=config)\n", "\n", "expected_next_w = np.asarray([\n", " [ 0.1406, 0.20738947, 0.27417895, 0.34096842, 0.40775789],\n", " [ 0.47454737, 0.54133684, 0.60812632, 0.67491579, 0.74170526],\n", " [ 0.80849474, 0.87528421, 0.94207368, 1.00886316, 1.07565263],\n", " [ 1.14244211, 1.20923158, 1.27602105, 1.34281053, 1.4096 ]])\n", "expected_velocity = np.asarray([\n", " [ 0.5406, 0.55475789, 0.56891579, 0.58307368, 0.59723158],\n", " [ 0.61138947, 0.62554737, 0.63970526, 0.65386316, 0.66802105],\n", " [ 0.68217895, 0.69633684, 0.71049474, 0.72465263, 0.73881053],\n", " [ 0.75296842, 0.76712632, 0.78128421, 0.79544211, 0.8096 ]])\n", "\n", "# Should see relative errors around e-8 or less\n", "print('next_w error: ', rel_error(next_w, expected_next_w))\n", "print('velocity error: ', rel_error(expected_velocity, config['velocity']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Сравните результаты обучения шестислойной сети, обученной классическим градиентным спуском и адаптивным алгоритмом с импульсом. Какой алгоритм сходится быстрее." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num_train = 4000\n", "small_data = {\n", " 'X_train': data['X_train'][:num_train],\n", " 'y_train': data['y_train'][:num_train],\n", " 'X_val': data['X_val'],\n", " 'y_val': data['y_val'],\n", "}\n", "\n", "solvers = {}\n", "\n", "for update_rule in ['sgd', 'sgd_momentum']:\n", " print('running with ', update_rule)\n", " model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)\n", "\n", " solver = Solver(model, small_data,\n", " num_epochs=5, batch_size=100,\n", " update_rule=update_rule,\n", " optim_config={\n", " 'learning_rate': 5e-3,\n", " },\n", " verbose=True)\n", " solvers[update_rule] = solver\n", " solver.train()\n", " print()\n", "\n", "plt.subplot(3, 1, 1)\n", "plt.title('Training loss')\n", "plt.xlabel('Iteration')\n", "\n", "plt.subplot(3, 1, 2)\n", "plt.title('Training accuracy')\n", "plt.xlabel('Epoch')\n", "\n", "plt.subplot(3, 1, 3)\n", "plt.title('Validation accuracy')\n", "plt.xlabel('Epoch')\n", "\n", "for update_rule, solver in solvers.items():\n", " plt.subplot(3, 1, 1)\n", " plt.plot(solver.loss_history, 'o', label=\"loss_%s\" % update_rule)\n", " \n", " plt.subplot(3, 1, 2)\n", " plt.plot(solver.train_acc_history, '-o', label=\"train_acc_%s\" % update_rule)\n", "\n", " plt.subplot(3, 1, 3)\n", " plt.plot(solver.val_acc_history, '-o', label=\"val_acc_%s\" % update_rule)\n", " \n", "for i in [1, 2, 3]:\n", " plt.subplot(3, 1, i)\n", " plt.legend(loc='upper center', ncol=4)\n", "plt.gcf().set_size_inches(15, 15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте алгоритмы RMSProp [1] and Adam [2] с коррекцией смещения - методы rmsprop и adam . \n", "\n", "\n", "[1] Tijmen Tieleman and Geoffrey Hinton. \"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude.\" COURSERA: Neural Networks for Machine Learning 4 (2012).\n", "\n", "[2] Diederik Kingma and Jimmy Ba, \"Adam: A Method for Stochastic Optimization\", ICLR 2015." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Test RMSProp implementation\n", "from scripts.optim import rmsprop\n", "\n", "N, D = 4, 5\n", "w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n", "dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n", "cache = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n", "\n", "config = {'learning_rate': 1e-2, 'cache': cache}\n", "next_w, _ = rmsprop(w, dw, config=config)\n", "\n", "expected_next_w = np.asarray([\n", " [-0.39223849, -0.34037513, -0.28849239, -0.23659121, -0.18467247],\n", " [-0.132737, -0.08078555, -0.02881884, 0.02316247, 0.07515774],\n", " [ 0.12716641, 0.17918792, 0.23122175, 0.28326742, 0.33532447],\n", " [ 0.38739248, 0.43947102, 0.49155973, 0.54365823, 0.59576619]])\n", "expected_cache = np.asarray([\n", " [ 0.5976, 0.6126277, 0.6277108, 0.64284931, 0.65804321],\n", " [ 0.67329252, 0.68859723, 0.70395734, 0.71937285, 0.73484377],\n", " [ 0.75037008, 0.7659518, 0.78158892, 0.79728144, 0.81302936],\n", " [ 0.82883269, 0.84469141, 0.86060554, 0.87657507, 0.8926 ]])\n", "\n", "# You should see relative errors around e-7 or less\n", "print('next_w error: ', rel_error(expected_next_w, next_w))\n", "print('cache error: ', rel_error(expected_cache, config['cache']))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Test Adam implementation\n", "from scripts.optim import adam\n", "\n", "N, D = 4, 5\n", "w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n", "dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n", "m = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n", "v = np.linspace(0.7, 0.5, num=N*D).reshape(N, D)\n", "\n", "config = {'learning_rate': 1e-2, 'm': m, 'v': v, 't': 5}\n", "next_w, _ = adam(w, dw, config=config)\n", "\n", "expected_next_w = np.asarray([\n", " [-0.40094747, -0.34836187, -0.29577703, -0.24319299, -0.19060977],\n", " [-0.1380274, -0.08544591, -0.03286534, 0.01971428, 0.0722929],\n", " [ 0.1248705, 0.17744702, 0.23002243, 0.28259667, 0.33516969],\n", " [ 0.38774145, 0.44031188, 0.49288093, 0.54544852, 0.59801459]])\n", "expected_v = np.asarray([\n", " [ 0.69966, 0.68908382, 0.67851319, 0.66794809, 0.65738853,],\n", " [ 0.64683452, 0.63628604, 0.6257431, 0.61520571, 0.60467385,],\n", " [ 0.59414753, 0.58362676, 0.57311152, 0.56260183, 0.55209767,],\n", " [ 0.54159906, 0.53110598, 0.52061845, 0.51013645, 0.49966, ]])\n", "expected_m = np.asarray([\n", " [ 0.48, 0.49947368, 0.51894737, 0.53842105, 0.55789474],\n", " [ 0.57736842, 0.59684211, 0.61631579, 0.63578947, 0.65526316],\n", " [ 0.67473684, 0.69421053, 0.71368421, 0.73315789, 0.75263158],\n", " [ 0.77210526, 0.79157895, 0.81105263, 0.83052632, 0.85 ]])\n", "\n", "# You should see relative errors around e-7 or less\n", "print('next_w error: ', rel_error(expected_next_w, next_w))\n", "print('v error: ', rel_error(expected_v, config['v']))\n", "print('m error: ', rel_error(expected_m, config['m']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Обучите пару глубоких сетей с испольованием RMSProp и Adam алгоритмов обновления весов и сравните результаты обучения." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Получите лучшую полносвязную сеть для классификации вашего набора данных. На наборе CIFAR-10 необходимо получить accuracy не ниже 50 % на валидационном наборе." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "best_model = None\n", "################################################################################\n", "# TODO: Train the best FullyConnectedNet that you can on CIFAR-10. You might #\n", "# find batch/layer normalization and dropout useful. Store your best model in #\n", "# the best_model variable. #\n", "################################################################################\n", "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n", "\n", "pass\n", "\n", "# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n", "################################################################################\n", "# END OF YOUR CODE #\n", "################################################################################" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Получите оценку accuracy для валидационной и тестовой выборок. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)\n", "y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)\n", "print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())\n", "print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Нормализация по мини-батчам\n", "\n", "Идея нормализации по мини-батчам предложена в работе [1]\n", "\n", "[1] Sergey Ioffe and Christian Szegedy, \"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift\", ICML 2015." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте прямой проход для слоя батч-нормализации - функция batchnorm_forward в scripts/layers.py . 
Проверьте свою реализацию, запустив следующий код:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check the training-time forward pass by checking means and variances\n", "# of features both before and after batch normalization \n", "\n", "# Simulate the forward pass for a two-layer network\n", "np.random.seed(231)\n", "N, D1, D2, D3 = 200, 50, 60, 3\n", "X = np.random.randn(N, D1)\n", "W1 = np.random.randn(D1, D2)\n", "W2 = np.random.randn(D2, D3)\n", "a = np.maximum(0, X.dot(W1)).dot(W2)\n", "\n", "print('Before batch normalization:')\n", "print_mean_std(a,axis=0)\n", "\n", "gamma = np.ones((D3,))\n", "beta = np.zeros((D3,))\n", "# Means should be close to zero and stds close to one\n", "print('After batch normalization (gamma=1, beta=0)')\n", "a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n", "print_mean_std(a_norm,axis=0)\n", "\n", "gamma = np.asarray([1.0, 2.0, 3.0])\n", "beta = np.asarray([11.0, 12.0, 13.0])\n", "# Now means should be close to beta and stds close to gamma\n", "print('After batch normalization (gamma=', gamma, ', beta=', beta, ')')\n", "a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n", "print_mean_std(a_norm,axis=0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check the test-time forward pass by running the training-time\n", "# forward pass many times to warm up the running averages, and then\n", "# checking the means and variances of activations after a test-time\n", "# forward pass.\n", "\n", "np.random.seed(231)\n", "N, D1, D2, D3 = 200, 50, 60, 3\n", "W1 = np.random.randn(D1, D2)\n", "W2 = np.random.randn(D2, D3)\n", "\n", "bn_param = {'mode': 'train'}\n", "gamma = np.ones(D3)\n", "beta = np.zeros(D3)\n", "\n", "for t in range(50):\n", " X = np.random.randn(N, D1)\n", " a = np.maximum(0, X.dot(W1)).dot(W2)\n", " batchnorm_forward(a, gamma, beta, bn_param)\n", "\n", "bn_param['mode'] = 'test'\n", "X = np.random.randn(N, D1)\n", "a = np.maximum(0, X.dot(W1)).dot(W2)\n", "a_norm, _ = batchnorm_forward(a, gamma, beta, bn_param)\n", "\n", "# Means should be close to zero and stds close to one, but will be\n", "# noisier than training-time forward passes.\n", "print('After batch normalization (test-time):')\n", "print_mean_std(a_norm,axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте обратный проход в функции batchnorm_backward." 
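, "\n", "\n", "A possible sketch is shown below. The cache format depends on your batchnorm_forward; here cache = (x_hat, gamma, std) is assumed, where x_hat is the normalized input and std = sqrt(var + eps):\n", "\n", "```python\n", "def batchnorm_backward(dout, cache):\n", "    \"\"\"Batch normalization backward pass (folded form of the gradient).\"\"\"\n", "    x_hat, gamma, std = cache\n", "\n", "    dbeta = dout.sum(axis=0)\n", "    dgamma = (dout * x_hat).sum(axis=0)\n", "\n", "    dxhat = dout * gamma\n", "    # averaging over the batch (axis 0) accounts for the mean/variance terms\n", "    dx = (dxhat - dxhat.mean(axis=0)\n", "          - x_hat * (dxhat * x_hat).mean(axis=0)) / std\n", "    return dx, dgamma, dbeta\n", "```"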
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Gradient check batchnorm backward pass\n", "np.random.seed(231)\n", "N, D = 4, 5\n", "x = 5 * np.random.randn(N, D) + 12\n", "gamma = np.random.randn(D)\n", "beta = np.random.randn(D)\n", "dout = np.random.randn(N, D)\n", "\n", "bn_param = {'mode': 'train'}\n", "fx = lambda x: batchnorm_forward(x, gamma, beta, bn_param)[0]\n", "fg = lambda a: batchnorm_forward(x, a, beta, bn_param)[0]\n", "fb = lambda b: batchnorm_forward(x, gamma, b, bn_param)[0]\n", "\n", "dx_num = eval_numerical_gradient_array(fx, x, dout)\n", "da_num = eval_numerical_gradient_array(fg, gamma.copy(), dout)\n", "db_num = eval_numerical_gradient_array(fb, beta.copy(), dout)\n", "\n", "_, cache = batchnorm_forward(x, gamma, beta, bn_param)\n", "dx, dgamma, dbeta = batchnorm_backward(dout, cache)\n", "#You should expect to see relative errors between 1e-13 and 1e-8\n", "print('dx error: ', rel_error(dx_num, dx))\n", "print('dgamma error: ', rel_error(da_num, dgamma))\n", "print('dbeta error: ', rel_error(db_num, dbeta))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Измените реализацию класса FullyConnectedNet, добавив батч-нормализацию. \n", "Если флаг normalization == \"batchnorm\", то вам необходимо вставить слой батч-нормализации перед каждым слоем активации ReLU, кроме выхода сети. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "N, D, H1, H2, C = 2, 15, 20, 30, 10\n", "X = np.random.randn(N, D)\n", "y = np.random.randint(C, size=(N,))\n", "\n", "# You should expect losses between 1e-4~1e-10 for W, \n", "# losses between 1e-08~1e-10 for b,\n", "# and losses between 1e-08~1e-09 for beta and gammas.\n", "for reg in [0, 3.14]:\n", " print('Running check with reg = ', reg)\n", " model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n", " reg=reg, weight_scale=5e-2, dtype=np.float64,\n", " normalization='batchnorm')\n", "\n", " loss, grads = model.loss(X, y)\n", " print('Initial loss: ', loss)\n", "\n", " for name in sorted(grads):\n", " f = lambda _: model.loss(X, y)[0]\n", " grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n", " print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n", " if reg == 0: print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Обучите 6-ти слойную сеть на наборе из 1000 изображений с батч-нормализацией и без нее" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "# Try training a very deep net with batchnorm\n", "hidden_dims = [100, 100, 100, 100, 100]\n", "\n", "num_train = 1000\n", "small_data = {\n", " 'X_train': data['X_train'][:num_train],\n", " 'y_train': data['y_train'][:num_train],\n", " 'X_val': data['X_val'],\n", " 'y_val': data['y_val'],\n", "}\n", "\n", "weight_scale = 2e-2\n", "bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization='batchnorm')\n", "model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n", "\n", "print('Solver with batch norm:')\n", "bn_solver = Solver(bn_model, small_data,\n", " num_epochs=10, batch_size=50,\n", " update_rule='adam',\n", " optim_config={\n", " 'learning_rate': 1e-3,\n", " },\n", " verbose=True,print_every=20)\n", "bn_solver.train()\n", "\n", "print('\\nSolver without batch norm:')\n", "solver = Solver(model, small_data,\n", " num_epochs=10, 
batch_size=50,\n", " update_rule='adam',\n", " optim_config={\n", " 'learning_rate': 1e-3,\n", " },\n", " verbose=True, print_every=20)\n", "solver.train()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Визуализируйте процесс обучения для двух сетей. Увеличилась ли скорость сходимости в случае с батч-нормализацией? Сделайте выводы. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_training_history(title, label, baseline, bn_solvers, plot_fn, bl_marker='.', bn_marker='.', labels=None):\n", " \"\"\"utility function for plotting training history\"\"\"\n", " plt.title(title)\n", " plt.xlabel(label)\n", " bn_plots = [plot_fn(bn_solver) for bn_solver in bn_solvers]\n", " bl_plot = plot_fn(baseline)\n", " num_bn = len(bn_plots)\n", " for i in range(num_bn):\n", " label='with_norm'\n", " if labels is not None:\n", " label += str(labels[i])\n", " plt.plot(bn_plots[i], bn_marker, label=label)\n", " label='baseline'\n", " if labels is not None:\n", " label += str(labels[0])\n", " plt.plot(bl_plot, bl_marker, label=label)\n", " plt.legend(loc='lower center', ncol=num_bn+1) \n", "\n", " \n", "plt.subplot(3, 1, 1)\n", "plot_training_history('Training loss','Iteration', solver, [bn_solver], \\\n", " lambda x: x.loss_history, bl_marker='o', bn_marker='o')\n", "plt.subplot(3, 1, 2)\n", "plot_training_history('Training accuracy','Epoch', solver, [bn_solver], \\\n", " lambda x: x.train_acc_history, bl_marker='-o', bn_marker='-o')\n", "plt.subplot(3, 1, 3)\n", "plot_training_history('Validation accuracy','Epoch', solver, [bn_solver], \\\n", " lambda x: x.val_acc_history, bl_marker='-o', bn_marker='-o')\n", "\n", "plt.gcf().set_size_inches(15, 15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Обучите 6-тислойную сеть с батч-нормализацией и без нее, используя разные размеры батча. Визуализируйте графики обучения. Сделайте выводы по результатам эксперимента. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def run_batchsize_experiments(normalization_mode):\n", " np.random.seed(231)\n", " # Try training a very deep net with batchnorm\n", " hidden_dims = [100, 100, 100, 100, 100]\n", " num_train = 1000\n", " small_data = {\n", " 'X_train': data['X_train'][:num_train],\n", " 'y_train': data['y_train'][:num_train],\n", " 'X_val': data['X_val'],\n", " 'y_val': data['y_val'],\n", " }\n", " n_epochs=10\n", " weight_scale = 2e-2\n", " batch_sizes = [5,10,50]\n", " lr = 10**(-3.5)\n", " solver_bsize = batch_sizes[0]\n", "\n", " print('No normalization: batch size = ',solver_bsize)\n", " model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n", " solver = Solver(model, small_data,\n", " num_epochs=n_epochs, batch_size=solver_bsize,\n", " update_rule='adam',\n", " optim_config={\n", " 'learning_rate': lr,\n", " },\n", " verbose=False)\n", " solver.train()\n", " \n", " bn_solvers = []\n", " for i in range(len(batch_sizes)):\n", " b_size=batch_sizes[i]\n", " print('Normalization: batch size = ',b_size)\n", " bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=normalization_mode)\n", " bn_solver = Solver(bn_model, small_data,\n", " num_epochs=n_epochs, batch_size=b_size,\n", " update_rule='adam',\n", " optim_config={\n", " 'learning_rate': lr,\n", " },\n", " verbose=False)\n", " bn_solver.train()\n", " bn_solvers.append(bn_solver)\n", " \n", " return bn_solvers, solver, batch_sizes\n", "\n", "batch_sizes = [5,10,50]\n", "bn_solvers_bsize, solver_bsize, batch_sizes = run_batchsize_experiments('batchnorm')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.subplot(2, 1, 1)\n", "plot_training_history('Training accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n", " lambda x: x.train_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n", "plt.subplot(2, 1, 2)\n", "plot_training_history('Validation accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n", " lambda x: x.val_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n", "\n", "plt.gcf().set_size_inches(15, 10)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dropout" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте прямой проход для dropout-слоя в scripts/layers.py\n", "\n", "http://cs231n.github.io/neural-networks-2/#reg" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "x = np.random.randn(500, 500) + 10\n", "\n", "for p in [0.25, 0.4, 0.7]:\n", " out, _ = dropout_forward(x, {'mode': 'train', 'p': p})\n", " out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})\n", "\n", " print('Running tests with p = ', p)\n", " print('Mean of input: ', x.mean())\n", " print('Mean of train-time output: ', out.mean())\n", " print('Mean of test-time output: ', out_test.mean())\n", " print('Fraction of train-time output set to zero: ', (out == 0).mean())\n", " print('Fraction of test-time output set to zero: ', (out_test == 0).mean())\n", " print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте обратный проход для dropout-слоя" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "x = np.random.randn(10, 10) + 10\n", "dout = np.random.randn(*x.shape)\n", "\n", 
"dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}\n", "out, cache = dropout_forward(x, dropout_param)\n", "dx = dropout_backward(dout, cache)\n", "dx_num = eval_numerical_gradient_array(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)\n", "\n", "# Error should be around e-10 or less\n", "print('dx relative error: ', rel_error(dx, dx_num))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Добавьте в реализацию класса FullyConnectedNet поддержку dropout. Если параметр dropout != 1, то добавьте в модель dropout-слой после каждого слоя активации. Проверьте свою реализацию" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "N, D, H1, H2, C = 2, 15, 20, 30, 10\n", "X = np.random.randn(N, D)\n", "y = np.random.randint(C, size=(N,))\n", "\n", "for dropout in [1, 0.75, 0.5]:\n", " print('Running check with dropout = ', dropout)\n", " model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n", " weight_scale=5e-2, dtype=np.float64,\n", " dropout=dropout, seed=123)\n", "\n", " loss, grads = model.loss(X, y)\n", " print('Initial loss: ', loss)\n", " \n", " # Relative errors should be around e-6 or less; Note that it's fine\n", " # if for dropout=1 you have W2 error be on the order of e-5.\n", " for name in sorted(grads):\n", " f = lambda _: model.loss(X, y)[0]\n", " grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n", " print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n", " print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Обучите две двухслойные сети с dropout-слоем (вероятность отсева 0,25) и без на наборе из 500 изображений. Визуализируйте графики обучения. Сделайте выводы по результатам эксперимента" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Train two identical nets, one with dropout and one without\n", "np.random.seed(231)\n", "num_train = 500\n", "small_data = {\n", " 'X_train': data['X_train'][:num_train],\n", " 'y_train': data['y_train'][:num_train],\n", " 'X_val': data['X_val'],\n", " 'y_val': data['y_val'],\n", "}\n", "\n", "solvers = {}\n", "dropout_choices = [1, 0.25]\n", "for dropout in dropout_choices:\n", " model = FullyConnectedNet([500], dropout=dropout)\n", " print(dropout)\n", "\n", " solver = Solver(model, small_data,\n", " num_epochs=25, batch_size=100,\n", " update_rule='adam',\n", " optim_config={\n", " 'learning_rate': 5e-4,\n", " },\n", " verbose=True, print_every=100)\n", " solver.train()\n", " solvers[dropout] = solver\n", " print()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot train and validation accuracies of the two models\n", "\n", "train_accs = []\n", "val_accs = []\n", "for dropout in dropout_choices:\n", " solver = solvers[dropout]\n", " train_accs.append(solver.train_acc_history[-1])\n", " val_accs.append(solver.val_acc_history[-1])\n", "\n", "plt.subplot(3, 1, 1)\n", "for dropout in dropout_choices:\n", " plt.plot(solvers[dropout].train_acc_history, 'o', label='%.2f dropout' % dropout)\n", "plt.title('Train accuracy')\n", "plt.xlabel('Epoch')\n", "plt.ylabel('Accuracy')\n", "plt.legend(ncol=2, loc='lower right')\n", " \n", "plt.subplot(3, 1, 2)\n", "for dropout in dropout_choices:\n", " plt.plot(solvers[dropout].val_acc_history, 'o', label='%.2f dropout' % dropout)\n", "plt.title('Val accuracy')\n", "plt.xlabel('Epoch')\n", "plt.ylabel('Accuracy')\n", 
"plt.legend(ncol=2, loc='lower right')\n", "\n", "plt.gcf().set_size_inches(15, 15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Сверточные нейронные сети (CNN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте прямой проход для сверточного слоя - функция conv_forward_naive в scripts/layers.py юПроверьте свою реализацию, запустив код ниже " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_shape = (2, 3, 4, 4)\n", "w_shape = (3, 3, 4, 4)\n", "x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)\n", "w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)\n", "b = np.linspace(-0.1, 0.2, num=3)\n", "\n", "conv_param = {'stride': 2, 'pad': 1}\n", "out, _ = conv_forward_naive(x, w, b, conv_param)\n", "correct_out = np.array([[[[-0.08759809, -0.10987781],\n", " [-0.18387192, -0.2109216 ]],\n", " [[ 0.21027089, 0.21661097],\n", " [ 0.22847626, 0.23004637]],\n", " [[ 0.50813986, 0.54309974],\n", " [ 0.64082444, 0.67101435]]],\n", " [[[-0.98053589, -1.03143541],\n", " [-1.19128892, -1.24695841]],\n", " [[ 0.69108355, 0.66880383],\n", " [ 0.59480972, 0.56776003]],\n", " [[ 2.36270298, 2.36904306],\n", " [ 2.38090835, 2.38247847]]]])\n", "\n", "# Compare your output to ours; difference should be around e-8\n", "print('Testing conv_forward_naive')\n", "print('difference: ', rel_error(out, correct_out))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте обратный проход - функция conv_backward_naive в scripts/layers.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "x = np.random.randn(4, 3, 5, 5)\n", "w = np.random.randn(2, 3, 3, 3)\n", "b = np.random.randn(2,)\n", "dout = np.random.randn(4, 2, 5, 5)\n", "conv_param = {'stride': 1, 'pad': 1}\n", "\n", "dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)\n", "dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)\n", "db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)\n", "\n", "out, cache = conv_forward_naive(x, w, b, conv_param)\n", "dx, dw, db = conv_backward_naive(dout, cache)\n", "\n", "# Your errors should be around e-8 or less.\n", "print('Testing conv_backward_naive function')\n", "print('dx error: ', rel_error(dx, dx_num))\n", "print('dw error: ', rel_error(dw, dw_num))\n", "print('db error: ', rel_error(db, db_num))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте прямой проход для max-pooling слоя -функция max_pool_forward_naive в scripts/layers.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_shape = (2, 3, 4, 4)\n", "x = np.linspace(-0.3, 0.4, num=np.prod(x_shape)).reshape(x_shape)\n", "pool_param = {'pool_width': 2, 'pool_height': 2, 'stride': 2}\n", "\n", "out, _ = max_pool_forward_naive(x, pool_param)\n", "\n", "correct_out = np.array([[[[-0.26315789, -0.24842105],\n", " [-0.20421053, -0.18947368]],\n", " [[-0.14526316, -0.13052632],\n", " [-0.08631579, -0.07157895]],\n", " [[-0.02736842, -0.01263158],\n", " [ 0.03157895, 0.04631579]]],\n", " [[[ 0.09052632, 0.10526316],\n", " [ 0.14947368, 0.16421053]],\n", " [[ 0.20842105, 0.22315789],\n", " [ 0.26736842, 0.28210526]],\n", " [[ 0.32631579, 0.34105263],\n", " [ 0.38526316, 0.4 ]]]])\n", "\n", "# Compare your output with ours. 
Difference should be on the order of e-8.\n", "print('Testing max_pool_forward_naive function:')\n", "print('difference: ', rel_error(out, correct_out))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Реализуйте обратный проход для max-pooling слоя в max_pool_backward_naive . " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "x = np.random.randn(3, 2, 8, 8)\n", "dout = np.random.randn(3, 2, 4, 4)\n", "pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n", "\n", "dx_num = eval_numerical_gradient_array(lambda x: max_pool_forward_naive(x, pool_param)[0], x, dout)\n", "\n", "out, cache = max_pool_forward_naive(x, pool_param)\n", "dx = max_pool_backward_naive(dout, cache)\n", "\n", "# Your error should be on the order of e-12\n", "print('Testing max_pool_backward_naive function:')\n", "print('dx error: ', rel_error(dx, dx_num))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "В скрипте scripts/fast_layers.py представлены быстрые реализации слоев свертки и пуллинга, написанных с использованием Cython. \n", "\n", "Для компиляции выполните следующую команду в директории scripts\n", "\n", "```bash\n", "python setup.py build_ext --inplace\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Сравните ваши реализации слоев свертки и пуллинга с быстрыми реализациями." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Rel errors should be around e-9 or less\n", "from scripts.fast_layers import conv_forward_fast, conv_backward_fast\n", "from time import time\n", "np.random.seed(231)\n", "x = np.random.randn(100, 3, 31, 31)\n", "w = np.random.randn(25, 3, 3, 3)\n", "b = np.random.randn(25,)\n", "dout = np.random.randn(100, 25, 16, 16)\n", "conv_param = {'stride': 2, 'pad': 1}\n", "\n", "t0 = time()\n", "out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)\n", "t1 = time()\n", "out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)\n", "t2 = time()\n", "\n", "print('Testing conv_forward_fast:')\n", "print('Naive: %fs' % (t1 - t0))\n", "print('Fast: %fs' % (t2 - t1))\n", "print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n", "print('Difference: ', rel_error(out_naive, out_fast))\n", "\n", "t0 = time()\n", "dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)\n", "t1 = time()\n", "dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)\n", "t2 = time()\n", "\n", "print('\\nTesting conv_backward_fast:')\n", "print('Naive: %fs' % (t1 - t0))\n", "print('Fast: %fs' % (t2 - t1))\n", "print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n", "print('dx difference: ', rel_error(dx_naive, dx_fast))\n", "print('dw difference: ', rel_error(dw_naive, dw_fast))\n", "print('db difference: ', rel_error(db_naive, db_fast))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Relative errors should be close to 0.0\n", "from scripts.fast_layers import max_pool_forward_fast, max_pool_backward_fast\n", "np.random.seed(231)\n", "x = np.random.randn(100, 3, 32, 32)\n", "dout = np.random.randn(100, 3, 16, 16)\n", "pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n", "\n", "t0 = time()\n", "out_naive, cache_naive = max_pool_forward_naive(x, pool_param)\n", "t1 = time()\n", "out_fast, cache_fast = max_pool_forward_fast(x, pool_param)\n", "t2 = time()\n", "\n", "print('Testing pool_forward_fast:')\n", "print('Naive: %fs' % (t1 - t0))\n", "print('fast: %fs' % 
(t2 - t1))\n", "print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n", "print('difference: ', rel_error(out_naive, out_fast))\n", "\n", "t0 = time()\n", "dx_naive = max_pool_backward_naive(dout, cache_naive)\n", "t1 = time()\n", "dx_fast = max_pool_backward_fast(dout, cache_fast)\n", "t2 = time()\n", "\n", "print('\\nTesting pool_backward_fast:')\n", "print('Naive: %fs' % (t1 - t0))\n", "print('fast: %fs' % (t2 - t1))\n", "print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n", "print('dx difference: ', rel_error(dx_naive, dx_fast))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "В layer_utils.py вы можете найти часто используемые комбинации слоев, используемых в сверточных сетях. Ознакомьтесь с ними и запустите код ниже для проверки их работы" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scripts.layer_utils import conv_relu_pool_forward, conv_relu_pool_backward\n", "np.random.seed(231)\n", "x = np.random.randn(2, 3, 16, 16)\n", "w = np.random.randn(3, 3, 3, 3)\n", "b = np.random.randn(3,)\n", "dout = np.random.randn(2, 3, 8, 8)\n", "conv_param = {'stride': 1, 'pad': 1}\n", "pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n", "\n", "out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)\n", "dx, dw, db = conv_relu_pool_backward(dout, cache)\n", "\n", "dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)\n", "dw_num = eval_numerical_gradient_array(lambda w: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], w, dout)\n", "db_num = eval_numerical_gradient_array(lambda b: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], b, dout)\n", "\n", "# Relative errors should be around e-8 or less\n", "print('Testing conv_relu_pool')\n", "print('dx error: ', rel_error(dx_num, dx))\n", "print('dw error: ', rel_error(dw_num, dw))\n", "print('db error: ', rel_error(db_num, db))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scripts.layer_utils import conv_relu_forward, conv_relu_backward\n", "np.random.seed(231)\n", "x = np.random.randn(2, 3, 8, 8)\n", "w = np.random.randn(3, 3, 3, 3)\n", "b = np.random.randn(3,)\n", "dout = np.random.randn(2, 3, 8, 8)\n", "conv_param = {'stride': 1, 'pad': 1}\n", "\n", "out, cache = conv_relu_forward(x, w, b, conv_param)\n", "dx, dw, db = conv_relu_backward(dout, cache)\n", "\n", "dx_num = eval_numerical_gradient_array(lambda x: conv_relu_forward(x, w, b, conv_param)[0], x, dout)\n", "dw_num = eval_numerical_gradient_array(lambda w: conv_relu_forward(x, w, b, conv_param)[0], w, dout)\n", "db_num = eval_numerical_gradient_array(lambda b: conv_relu_forward(x, w, b, conv_param)[0], b, dout)\n", "\n", "# Relative errors should be around e-8 or less\n", "print('Testing conv_relu:')\n", "print('dx error: ', rel_error(dx_num, dx))\n", "print('dw error: ', rel_error(dw_num, dw))\n", "print('db error: ', rel_error(db_num, db))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Напишите реализацию класса ThreeLayerConvNet в scripts/classifiers/cnn.py . Вы можете использовать готовые реализации слоев и их комбинаций." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Проверьте вашу реализацию. Ожидается, что значение функции потерь softmax будет порядка `log(C)` для `C` классов для случая без регуляризации. В случае регуляризации значение функции потерь должно немного возрасти. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = ThreeLayerConvNet()\n", "\n", "N = 50\n", "X = np.random.randn(N, 3, 32, 32)\n", "y = np.random.randint(10, size=N)\n", "\n", "loss, grads = model.loss(X, y)\n", "print('Initial loss (no regularization): ', loss)\n", "\n", "model.reg = 0.5\n", "loss, grads = model.loss(X, y)\n", "print('Initial loss (with regularization): ', loss)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Проверьте реализацию обратного прохода" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num_inputs = 2\n", "input_dim = (3, 16, 16)\n", "reg = 0.0\n", "num_classes = 10\n", "np.random.seed(231)\n", "X = np.random.randn(num_inputs, *input_dim)\n", "y = np.random.randint(num_classes, size=num_inputs)\n", "\n", "model = ThreeLayerConvNet(num_filters=3, filter_size=3,\n", " input_dim=input_dim, hidden_dim=7,\n", " dtype=np.float64)\n", "loss, grads = model.loss(X, y)\n", "# Errors should be small, but correct implementations may have\n", "# relative errors up to the order of e-2\n", "for param_name in sorted(grads):\n", " f = lambda _: model.loss(X, y)[0]\n", " param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)\n", " e = rel_error(param_grad_num, grads[param_name])\n", " print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Попробуйте добиться эффекта переобучения. Обучите модель на небольшом наборе данных.Сравните значения accuracy на обучающих данных и на валидационных. Визуализируйте графики обучения " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(231)\n", "\n", "num_train = 100\n", "small_data = {\n", " 'X_train': data['X_train'][:num_train],\n", " 'y_train': data['y_train'][:num_train],\n", " 'X_val': data['X_val'],\n", " 'y_val': data['y_val'],\n", "}\n", "\n", "model = ThreeLayerConvNet(weight_scale=1e-2)\n", "\n", "solver = Solver(model, small_data,\n", " num_epochs=15, batch_size=50,\n", " update_rule='adam',\n", " optim_config={\n", " 'learning_rate': 1e-3,\n", " },\n", " verbose=True, print_every=1)\n", "solver.train()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print final training accuracy\n", "print(\n", " \"Small data training accuracy:\",\n", " solver.check_accuracy(small_data['X_train'], small_data['y_train'])\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print final validation accuracy\n", "print(\n", " \"Small data validation accuracy:\",\n", " solver.check_accuracy(small_data['X_val'], small_data['y_val'])\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.subplot(2, 1, 1)\n", "plt.plot(solver.loss_history, 'o')\n", "plt.xlabel('iteration')\n", "plt.ylabel('loss')\n", "\n", "plt.subplot(2, 1, 2)\n", "plt.plot(solver.train_acc_history, '-o')\n", "plt.plot(solver.val_acc_history, '-o')\n", "plt.legend(['train', 'val'], loc='upper left')\n", "plt.xlabel('epoch')\n", "plt.ylabel('accuracy')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Обучите сеть на полном наборе данных. 
Выведите accuracy на обучающей и валидационной выборках" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)\n", "\n", "solver = Solver(model, data,\n", " num_epochs=1, batch_size=50,\n", " update_rule='adam',\n", " optim_config={\n", " 'learning_rate': 1e-3,\n", " },\n", " verbose=True, print_every=20)\n", "solver.train()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print final training accuracy\n", "print(\n", " \"Full data training accuracy:\",\n", " solver.check_accuracy(small_data['X_train'], small_data['y_train'])\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print final validation accuracy\n", "print(\n", " \"Full data validation accuracy:\",\n", " solver.check_accuracy(data['X_val'], data['y_val'])\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Визуализируйте фильтры на первом слое обученной сети" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scripts.vis_utils import visualize_grid\n", "\n", "grid = visualize_grid(model.params['W1'].transpose(0, 2, 3, 1))\n", "plt.imshow(grid.astype('uint8'))\n", "plt.axis('off')\n", "plt.gcf().set_size_inches(5, 5)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }