{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Лабораторная работа 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1) Полносвязная нейронная сеть ( Fully-Connected Neural Network)\n",
"\n",
"2) Нормализация по мини-батчам (Batch normalization)\n",
"\n",
"3) Dropout\n",
"\n",
"4) Сверточные нейронные сети (Convolutional Networks)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Лабораторные работы можно выполнять с использованием сервиса Google Colaboratory (https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d) или на локальном компьютере. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Полносвязная нейронная сеть"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В данной лабораторной работе необходимо будет реализовать полносвязную нейронную сеть, используя модульный подход. Для каждого слоя реализации прямого и обратного проходов алгоритма обратного распространения ошибки будут иметь следующий вид:\n",
"\n",
"```python\n",
"def layer_forward(x, w):\n",
" \"\"\" Receive inputs x and weights w \"\"\"\n",
" # Do some computations ...\n",
" z = # ... some intermediate value\n",
" # Do some more computations ...\n",
" out = # the output\n",
" \n",
" cache = (x, w, z, out) # Values we need to compute gradients\n",
" \n",
" return out, cache\n",
"```\n",
"\n",
"\n",
"\n",
"```python\n",
"def layer_backward(dout, cache):\n",
" \"\"\"\n",
" Receive dout (derivative of loss with respect to outputs) and cache,\n",
" and compute derivative with respect to inputs.\n",
" \"\"\"\n",
" # Unpack cache values\n",
" x, w, z, out = cache\n",
" \n",
" # Use values in cache to compute derivatives\n",
" dx = # Derivative of loss with respect to x\n",
" dw = # Derivative of loss with respect to w\n",
" \n",
" return dx, dw\n",
"```\n",
"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=========== You can safely ignore the message below if you are NOT working on ConvolutionalNetworks.ipynb ===========\n",
"\tYou will need to compile a Cython extension for a portion of this assignment.\n",
"\tThe instructions to do this will be given in a section of the notebook below.\n",
"\tThere will be an option for Colab users and another for Jupyter (local) users.\n"
]
}
],
"source": [
"from __future__ import print_function\n",
"import time\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from scripts.classifiers.fc_net import *\n",
"\n",
"from scripts.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array\n",
"from scripts.solver import Solver\n",
"from scripts.classifiers.cnn import *\n",
"from scripts.layers import *\n",
"from scripts.fast_layers import *\n",
"\n",
"\n",
"%matplotlib inline\n",
"plt.rcParams['figure.figsize'] = (10.0, 8.0) \n",
"plt.rcParams['image.interpolation'] = 'nearest'\n",
"plt.rcParams['image.cmap'] = 'gray'\n",
"\n",
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"def rel_error(x, y):\n",
" \"\"\" returns relative error \"\"\"\n",
" return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))\n",
"def print_mean_std(x,axis=0):\n",
" print(' means: ', x.mean(axis=axis))\n",
" print(' stds: ', x.std(axis=axis))\n",
" print() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Загрузите данные из предыдущей лабораторной работы. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Для полносвязного слоя реализуйте прямой проход (метод affine_forward в scripts/layers.py). Протестируйте свою реализацию. "
]
},
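{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here is a minimal sketch of one possible affine_forward implementation (an illustration, not the reference solution), assuming the input of shape (N, d_1, ..., d_k) is flattened to (N, D) before the matrix product:\n",
"\n",
"```python\n",
"def affine_forward(x, w, b):\n",
"    # Flatten each example into a row of length D = d_1 * ... * d_k\n",
"    N = x.shape[0]\n",
"    out = x.reshape(N, -1).dot(w) + b\n",
"    cache = (x, w, b)\n",
"    return out, cache\n",
"```"
]
},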
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_inputs = 2\n",
"input_shape = (4, 5, 6)\n",
"output_dim = 3\n",
"\n",
"input_size = num_inputs * np.prod(input_shape)\n",
"weight_size = output_dim * np.prod(input_shape)\n",
"\n",
"x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)\n",
"w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)\n",
"b = np.linspace(-0.3, 0.1, num=output_dim)\n",
"\n",
"out, _ = affine_forward(x, w, b)\n",
"correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],\n",
" [ 3.25553199, 3.5141327, 3.77273342]])\n",
"\n",
"\n",
"print('Testing affine_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Для полносвязного слоя реализуйте обратный проход (метод affine_backward в scripts/layers.py). Протестируйте свою реализацию. "
]
},
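{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible affine_backward sketch, consistent with the cache layout (x, w, b) used in the forward sketch above:\n",
"\n",
"```python\n",
"def affine_backward(dout, cache):\n",
"    # dout has shape (N, M); restore gradients to the original shapes\n",
"    x, w, b = cache\n",
"    N = x.shape[0]\n",
"    dx = dout.dot(w.T).reshape(x.shape)\n",
"    dw = x.reshape(N, -1).T.dot(dout)\n",
"    db = dout.sum(axis=0)\n",
"    return dx, dw, db\n",
"```"
]
},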
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 2, 3)\n",
"w = np.random.randn(6, 5)\n",
"b = np.random.randn(5)\n",
"dout = np.random.randn(10, 5)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)\n",
"\n",
"_, cache = affine_forward(x, w, b)\n",
"dx, dw, db = affine_backward(dout, cache)\n",
"\n",
"print('Testing affine_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для слоя активации ReLU (relu_forward) и протестируйте его."
]
},
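{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to write relu_forward is a single elementwise maximum; the cache only needs the input:\n",
"\n",
"```python\n",
"def relu_forward(x):\n",
"    out = np.maximum(0, x)\n",
"    cache = x\n",
"    return out, cache\n",
"```"
]
},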
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)\n",
"\n",
"out, _ = relu_forward(x)\n",
"correct_out = np.array([[ 0., 0., 0., 0., ],\n",
" [ 0., 0., 0.04545455, 0.13636364,],\n",
" [ 0.22727273, 0.31818182, 0.40909091, 0.5, ]])\n",
"\n",
"# Compare your output with ours. The error should be on the order of e-8\n",
"print('Testing relu_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход для слоя активации ReLU (relu_backward ) и протестируйте его."
]
},
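{
"cell_type": "markdown",
"metadata": {},
"source": [
"A matching relu_backward sketch (assuming the cache stores the input x): the gradient passes only through the positions where the forward input was positive.\n",
"\n",
"```python\n",
"def relu_backward(dout, cache):\n",
"    x = cache\n",
"    # Zero the gradient wherever the forward input was <= 0\n",
"    dx = dout * (x > 0)\n",
"    return dx\n",
"```"
]
},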
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 10)\n",
"dout = np.random.randn(*x.shape)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)\n",
"\n",
"_, cache = relu_forward(x)\n",
"dx = relu_backward(dout, cache)\n",
"\n",
"# The error should be on the order of e-12\n",
"print('Testing relu_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В скрипте /layer_utils.py приведены реализации прямого и обратного проходов для часто используемых комбинаций слоев. Например, за полносвязным слоем часто следует слой активации. Ознакомьтесь с функциями affine_relu_forward и affine_relu_backward, запустите код ниже и убедитесь, что ошибка порядка e-10 или ниже. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.layer_utils import affine_relu_forward, affine_relu_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 4)\n",
"w = np.random.randn(12, 10)\n",
"b = np.random.randn(10)\n",
"dout = np.random.randn(2, 10)\n",
"\n",
"out, cache = affine_relu_forward(x, w, b)\n",
"dx, dw, db = affine_relu_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)\n",
"\n",
"# Relative error should be around e-10 or less\n",
"print('Testing affine_relu_forward and affine_relu_backward:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте двухслойную полносвязную сеть - класс TwoLayerNet в scripts/classifiers/fc_net.py . Проверьте свою реализацию, запустив код ниже. "
]
},
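{
"cell_type": "markdown",
"metadata": {},
"source": [
"A compact sketch of the affine - relu - affine - softmax architecture that TwoLayerNet implements. It assumes the affine/ReLU helpers above and the softmax_loss function from scripts/layers.py; treat it as an outline rather than the reference solution:\n",
"\n",
"```python\n",
"class TwoLayerNet(object):\n",
"    def __init__(self, input_dim=3*32*32, hidden_dim=100, num_classes=10,\n",
"                 weight_scale=1e-3, reg=0.0):\n",
"        self.reg = reg\n",
"        self.params = {\n",
"            'W1': weight_scale * np.random.randn(input_dim, hidden_dim),\n",
"            'b1': np.zeros(hidden_dim),\n",
"            'W2': weight_scale * np.random.randn(hidden_dim, num_classes),\n",
"            'b2': np.zeros(num_classes),\n",
"        }\n",
"\n",
"    def loss(self, X, y=None):\n",
"        # Forward pass: affine - relu - affine\n",
"        h, cache1 = affine_relu_forward(X, self.params['W1'], self.params['b1'])\n",
"        scores, cache2 = affine_forward(h, self.params['W2'], self.params['b2'])\n",
"        if y is None:\n",
"            return scores\n",
"        # Softmax data loss plus L2 regularization on the weight matrices\n",
"        loss, dscores = softmax_loss(scores, y)\n",
"        loss += 0.5 * self.reg * (np.sum(self.params['W1'] ** 2) +\n",
"                                  np.sum(self.params['W2'] ** 2))\n",
"        dh, dW2, db2 = affine_backward(dscores, cache2)\n",
"        _, dW1, db1 = affine_relu_backward(dh, cache1)\n",
"        grads = {'W1': dW1 + self.reg * self.params['W1'], 'b1': db1,\n",
"                 'W2': dW2 + self.reg * self.params['W2'], 'b2': db2}\n",
"        return loss, grads\n",
"```"
]
},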
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H, C = 3, 5, 50, 7\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=N)\n",
"\n",
"std = 1e-3\n",
"model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)\n",
"\n",
"print('Testing initialization ... ')\n",
"W1_std = abs(model.params['W1'].std() - std)\n",
"b1 = model.params['b1']\n",
"W2_std = abs(model.params['W2'].std() - std)\n",
"b2 = model.params['b2']\n",
"assert W1_std < std / 10, 'First layer weights do not seem right'\n",
"assert np.all(b1 == 0), 'First layer biases do not seem right'\n",
"assert W2_std < std / 10, 'Second layer weights do not seem right'\n",
"assert np.all(b2 == 0), 'Second layer biases do not seem right'\n",
"\n",
"print('Testing test-time forward pass ... ')\n",
"model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)\n",
"model.params['b1'] = np.linspace(-0.1, 0.9, num=H)\n",
"model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)\n",
"model.params['b2'] = np.linspace(-0.9, 0.1, num=C)\n",
"X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T\n",
"scores = model.loss(X)\n",
"correct_scores = np.asarray(\n",
" [[11.53165108, 12.2917344, 13.05181771, 13.81190102, 14.57198434, 15.33206765, 16.09215096],\n",
" [12.05769098, 12.74614105, 13.43459113, 14.1230412, 14.81149128, 15.49994135, 16.18839143],\n",
" [12.58373087, 13.20054771, 13.81736455, 14.43418138, 15.05099822, 15.66781506, 16.2846319 ]])\n",
"scores_diff = np.abs(scores - correct_scores).sum()\n",
"assert scores_diff < 1e-6, 'Problem with test-time forward pass'\n",
"\n",
"print('Testing training loss (no regularization)')\n",
"y = np.asarray([0, 5, 1])\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 3.4702243556\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'\n",
"\n",
"model.reg = 1.0\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 26.5948426952\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'\n",
"\n",
"# Errors should be around e-7 or less\n",
"for reg in [0.0, 0.7]:\n",
" print('Running numeric gradient check with reg = ', reg)\n",
" model.reg = reg\n",
" loss, grads = model.loss(X, y)\n",
"\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ознакомьтесь с API для обучения и тестирования моделей в scripts/solver.py . Используйте экземпляр класса Solver для обучения двухслойной полносвязной сети. Необходимо достичь минимум 50% верно классифицированных объектов на валидационном наборе. "
]
},
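{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of a typical Solver configuration (the hyperparameter values here are illustrative starting points, not tuned answers):\n",
"\n",
"```python\n",
"model = TwoLayerNet(hidden_dim=100)\n",
"solver = Solver(model, data,\n",
"                update_rule='sgd',\n",
"                optim_config={'learning_rate': 1e-3},\n",
"                lr_decay=0.95,\n",
"                num_epochs=10, batch_size=100,\n",
"                print_every=100)\n",
"solver.train()\n",
"```"
]
},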
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = TwoLayerNet()\n",
"solver = None\n",
"\n",
"##############################################################################\n",
"# TODO: Use a Solver instance to train a TwoLayerNet that achieves at least #\n",
"# 50% accuracy on the validation set. #\n",
"##############################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"##############################################################################\n",
"# END OF YOUR CODE #\n",
"##############################################################################"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(2, 1, 2)\n",
"plt.title('Accuracy')\n",
"plt.plot(solver.train_acc_history, '-o', label='train')\n",
"plt.plot(solver.val_acc_history, '-o', label='val')\n",
"plt.plot([0.5] * len(solver.val_acc_history), 'k--')\n",
"plt.xlabel('Epoch')\n",
"plt.legend(loc='lower right')\n",
"plt.gcf().set_size_inches(15, 12)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Теперь реализуйте полносвязную сеть с произвольным числом скрытых слоев. Ознакомьтесь с классом FullyConnectedNet в scripts/classifiers/fc_net.py . Реализуйте инициализацию, прямой и обратный проходы."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"for reg in [0, 3.14]:\n",
" print('Running check with reg = ', reg)\n",
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
" reg=reg, weight_scale=5e-2, dtype=np.float64)\n",
"\n",
" loss, grads = model.loss(X, y)\n",
" print('Initial loss: ', loss)\n",
" \n",
" # Most of the errors should be on the order of e-7 or smaller. \n",
" # NOTE: It is fine however to see an error for W2 on the order of e-5\n",
" # for the check when reg = 0.0\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Попробуйте добиться эффекта переобучения на небольшом наборе изображений (например, 50). Используйте трехслойную сеть со 100 нейронами на каждом скрытом слое. Попробуйте переобучить сеть, достигнув 100 % accuracy за 20 эпох. Для этого поэкспериментируйте с параметрами weight_scale и learning_rate. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a three-layer Net to overfit 50 training examples by \n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 1e-2 # Experiment with this!\n",
"learning_rate = 1e-4 # Experiment with this!\n",
"model = FullyConnectedNet([100, 100],\n",
" weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
" print_every=10, num_epochs=20, batch_size=25,\n",
" update_rule='sgd',\n",
" optim_config={\n",
" 'learning_rate': learning_rate,\n",
" }\n",
" )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Повторите эксперимент, описанный выше, для пятислойной сети."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a five-layer Net to overfit 50 training examples by \n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"learning_rate = 2e-3 # Experiment with this!\n",
"weight_scale = 1e-5 # Experiment with this!\n",
"model = FullyConnectedNet([100, 100, 100, 100],\n",
" weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
" print_every=10, num_epochs=20, batch_size=25,\n",
" update_rule='sgd',\n",
" optim_config={\n",
" 'learning_rate': learning_rate,\n",
" }\n",
" )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Сделайте выводы по проведенному эксперименту. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ранее обновление весов проходило по правилу SGD. Теперь попробуйте реализовать стохастический градиентный спуск с импульсом (SGD+momentum). http://cs231n.github.io/neural-networks-3/#sgd Реализуйте sgd_momentum в scripts/optim.py и запустите проверку. "
]
},
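{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the classical momentum update. The defaults mirror the check below, which supplies the velocity and learning rate and relies on a default momentum of 0.9:\n",
"\n",
"```python\n",
"def sgd_momentum(w, dw, config=None):\n",
"    if config is None:\n",
"        config = {}\n",
"    config.setdefault('learning_rate', 1e-2)\n",
"    config.setdefault('momentum', 0.9)\n",
"    v = config.get('velocity', np.zeros_like(w))\n",
"\n",
"    # Accumulate an exponentially decaying velocity, then step along it\n",
"    v = config['momentum'] * v - config['learning_rate'] * dw\n",
"    next_w = w + v\n",
"\n",
"    config['velocity'] = v\n",
"    return next_w, config\n",
"```"
]
},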
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.optim import sgd_momentum\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-3, 'velocity': v}\n",
"next_w, _ = sgd_momentum(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
" [ 0.1406, 0.20738947, 0.27417895, 0.34096842, 0.40775789],\n",
" [ 0.47454737, 0.54133684, 0.60812632, 0.67491579, 0.74170526],\n",
" [ 0.80849474, 0.87528421, 0.94207368, 1.00886316, 1.07565263],\n",
" [ 1.14244211, 1.20923158, 1.27602105, 1.34281053, 1.4096 ]])\n",
"expected_velocity = np.asarray([\n",
" [ 0.5406, 0.55475789, 0.56891579, 0.58307368, 0.59723158],\n",
" [ 0.61138947, 0.62554737, 0.63970526, 0.65386316, 0.66802105],\n",
" [ 0.68217895, 0.69633684, 0.71049474, 0.72465263, 0.73881053],\n",
" [ 0.75296842, 0.76712632, 0.78128421, 0.79544211, 0.8096 ]])\n",
"\n",
"# Should see relative errors around e-8 or less\n",
"print('next_w error: ', rel_error(next_w, expected_next_w))\n",
"print('velocity error: ', rel_error(expected_velocity, config['velocity']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Сравните результаты обучения шестислойной сети, обученной классическим градиентным спуском и адаптивным алгоритмом с импульсом. Какой алгоритм сходится быстрее."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_train = 4000\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"solvers = {}\n",
"\n",
"for update_rule in ['sgd', 'sgd_momentum']:\n",
" print('running with ', update_rule)\n",
" model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)\n",
"\n",
" solver = Solver(model, small_data,\n",
" num_epochs=5, batch_size=100,\n",
" update_rule=update_rule,\n",
" optim_config={\n",
" 'learning_rate': 5e-3,\n",
" },\n",
" verbose=True)\n",
" solvers[update_rule] = solver\n",
" solver.train()\n",
" print()\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(3, 1, 2)\n",
"plt.title('Training accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"plt.subplot(3, 1, 3)\n",
"plt.title('Validation accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"for update_rule, solver in solvers.items():\n",
" plt.subplot(3, 1, 1)\n",
" plt.plot(solver.loss_history, 'o', label=\"loss_%s\" % update_rule)\n",
" \n",
" plt.subplot(3, 1, 2)\n",
" plt.plot(solver.train_acc_history, '-o', label=\"train_acc_%s\" % update_rule)\n",
"\n",
" plt.subplot(3, 1, 3)\n",
" plt.plot(solver.val_acc_history, '-o', label=\"val_acc_%s\" % update_rule)\n",
" \n",
"for i in [1, 2, 3]:\n",
" plt.subplot(3, 1, i)\n",
" plt.legend(loc='upper center', ncol=4)\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте алгоритмы RMSProp [1] and Adam [2] с коррекцией смещения - методы rmsprop и adam . \n",
"\n",
"\n",
"[1] Tijmen Tieleman and Geoffrey Hinton. \"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude.\" COURSERA: Neural Networks for Machine Learning 4 (2012).\n",
"\n",
"[2] Diederik Kingma and Jimmy Ba, \"Adam: A Method for Stochastic Optimization\", ICLR 2015."
]
},
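{
"cell_type": "markdown",
"metadata": {},
"source": [
"Possible sketches of both update rules. The default hyperparameters follow common practice and the checks below; note that Adam increments the timestep t before applying the bias correction:\n",
"\n",
"```python\n",
"def rmsprop(w, dw, config=None):\n",
"    if config is None:\n",
"        config = {}\n",
"    config.setdefault('learning_rate', 1e-2)\n",
"    config.setdefault('decay_rate', 0.99)\n",
"    config.setdefault('epsilon', 1e-8)\n",
"    config.setdefault('cache', np.zeros_like(w))\n",
"\n",
"    # Moving average of squared gradients normalizes the step size\n",
"    config['cache'] = (config['decay_rate'] * config['cache'] +\n",
"                       (1 - config['decay_rate']) * dw ** 2)\n",
"    next_w = w - (config['learning_rate'] * dw /\n",
"                  (np.sqrt(config['cache']) + config['epsilon']))\n",
"    return next_w, config\n",
"\n",
"def adam(w, dw, config=None):\n",
"    if config is None:\n",
"        config = {}\n",
"    config.setdefault('learning_rate', 1e-3)\n",
"    config.setdefault('beta1', 0.9)\n",
"    config.setdefault('beta2', 0.999)\n",
"    config.setdefault('epsilon', 1e-8)\n",
"    config.setdefault('m', np.zeros_like(w))\n",
"    config.setdefault('v', np.zeros_like(w))\n",
"    config.setdefault('t', 0)\n",
"\n",
"    # First and second moment estimates with bias correction\n",
"    config['t'] += 1\n",
"    config['m'] = config['beta1'] * config['m'] + (1 - config['beta1']) * dw\n",
"    config['v'] = config['beta2'] * config['v'] + (1 - config['beta2']) * dw ** 2\n",
"    mt = config['m'] / (1 - config['beta1'] ** config['t'])\n",
"    vt = config['v'] / (1 - config['beta2'] ** config['t'])\n",
"    next_w = w - config['learning_rate'] * mt / (np.sqrt(vt) + config['epsilon'])\n",
"    return next_w, config\n",
"```"
]
},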
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test RMSProp implementation\n",
"from scripts.optim import rmsprop\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"cache = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'cache': cache}\n",
"next_w, _ = rmsprop(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
" [-0.39223849, -0.34037513, -0.28849239, -0.23659121, -0.18467247],\n",
" [-0.132737, -0.08078555, -0.02881884, 0.02316247, 0.07515774],\n",
" [ 0.12716641, 0.17918792, 0.23122175, 0.28326742, 0.33532447],\n",
" [ 0.38739248, 0.43947102, 0.49155973, 0.54365823, 0.59576619]])\n",
"expected_cache = np.asarray([\n",
" [ 0.5976, 0.6126277, 0.6277108, 0.64284931, 0.65804321],\n",
" [ 0.67329252, 0.68859723, 0.70395734, 0.71937285, 0.73484377],\n",
" [ 0.75037008, 0.7659518, 0.78158892, 0.79728144, 0.81302936],\n",
" [ 0.82883269, 0.84469141, 0.86060554, 0.87657507, 0.8926 ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('cache error: ', rel_error(expected_cache, config['cache']))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test Adam implementation\n",
"from scripts.optim import adam\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"m = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.7, 0.5, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'm': m, 'v': v, 't': 5}\n",
"next_w, _ = adam(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
" [-0.40094747, -0.34836187, -0.29577703, -0.24319299, -0.19060977],\n",
" [-0.1380274, -0.08544591, -0.03286534, 0.01971428, 0.0722929],\n",
" [ 0.1248705, 0.17744702, 0.23002243, 0.28259667, 0.33516969],\n",
" [ 0.38774145, 0.44031188, 0.49288093, 0.54544852, 0.59801459]])\n",
"expected_v = np.asarray([\n",
" [ 0.69966, 0.68908382, 0.67851319, 0.66794809, 0.65738853,],\n",
" [ 0.64683452, 0.63628604, 0.6257431, 0.61520571, 0.60467385,],\n",
" [ 0.59414753, 0.58362676, 0.57311152, 0.56260183, 0.55209767,],\n",
" [ 0.54159906, 0.53110598, 0.52061845, 0.51013645, 0.49966, ]])\n",
"expected_m = np.asarray([\n",
" [ 0.48, 0.49947368, 0.51894737, 0.53842105, 0.55789474],\n",
" [ 0.57736842, 0.59684211, 0.61631579, 0.63578947, 0.65526316],\n",
" [ 0.67473684, 0.69421053, 0.71368421, 0.73315789, 0.75263158],\n",
" [ 0.77210526, 0.79157895, 0.81105263, 0.83052632, 0.85 ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('v error: ', rel_error(expected_v, config['v']))\n",
"print('m error: ', rel_error(expected_m, config['m']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите пару глубоких сетей с испольованием RMSProp и Adam алгоритмов обновления весов и сравните результаты обучения."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Получите лучшую полносвязную сеть для классификации вашего набора данных. На наборе CIFAR-10 необходимо получить accuracy не ниже 50 % на валидационном наборе."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"best_model = None\n",
"################################################################################\n",
"# TODO: Train the best FullyConnectedNet that you can on CIFAR-10. You might #\n",
"# find batch/layer normalization and dropout useful. Store your best model in #\n",
"# the best_model variable. #\n",
"################################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"################################################################################\n",
"# END OF YOUR CODE #\n",
"################################################################################"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Получите оценку accuracy для валидационной и тестовой выборок. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)\n",
"y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)\n",
"print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())\n",
"print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Нормализация по мини-батчам\n",
"\n",
"Идея нормализации по мини-батчам предложена в работе [1]\n",
"\n",
"[1] Sergey Ioffe and Christian Szegedy, \"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift\", ICML 2015."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для слоя батч-нормализации - функция batchnorm_forward в scripts/layers.py . Проверьте свою реализацию, запустив следующий код:"
]
},
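{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of one possible batchnorm_forward. The cache layout and running-average bookkeeping are one particular choice, not the only valid one:\n",
"\n",
"```python\n",
"def batchnorm_forward(x, gamma, beta, bn_param):\n",
"    mode = bn_param['mode']\n",
"    eps = bn_param.get('eps', 1e-5)\n",
"    momentum = bn_param.get('momentum', 0.9)\n",
"\n",
"    N, D = x.shape\n",
"    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))\n",
"    running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype))\n",
"\n",
"    if mode == 'train':\n",
"        # Normalize with the batch statistics, then scale and shift\n",
"        mu = x.mean(axis=0)\n",
"        var = x.var(axis=0)\n",
"        x_hat = (x - mu) / np.sqrt(var + eps)\n",
"        out = gamma * x_hat + beta\n",
"        cache = (x_hat, gamma, var, eps)\n",
"        # Exponential moving averages used at test time\n",
"        running_mean = momentum * running_mean + (1 - momentum) * mu\n",
"        running_var = momentum * running_var + (1 - momentum) * var\n",
"    elif mode == 'test':\n",
"        x_hat = (x - running_mean) / np.sqrt(running_var + eps)\n",
"        out = gamma * x_hat + beta\n",
"        cache = None\n",
"    else:\n",
"        raise ValueError('Invalid forward batchnorm mode: %s' % mode)\n",
"\n",
"    bn_param['running_mean'] = running_mean\n",
"    bn_param['running_var'] = running_var\n",
"    return out, cache\n",
"```"
]
},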
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check the training-time forward pass by checking means and variances\n",
"# of features both before and after batch normalization \n",
"\n",
"# Simulate the forward pass for a two-layer network\n",
"np.random.seed(231)\n",
"N, D1, D2, D3 = 200, 50, 60, 3\n",
"X = np.random.randn(N, D1)\n",
"W1 = np.random.randn(D1, D2)\n",
"W2 = np.random.randn(D2, D3)\n",
"a = np.maximum(0, X.dot(W1)).dot(W2)\n",
"\n",
"print('Before batch normalization:')\n",
"print_mean_std(a,axis=0)\n",
"\n",
"gamma = np.ones((D3,))\n",
"beta = np.zeros((D3,))\n",
"# Means should be close to zero and stds close to one\n",
"print('After batch normalization (gamma=1, beta=0)')\n",
"a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n",
"print_mean_std(a_norm,axis=0)\n",
"\n",
"gamma = np.asarray([1.0, 2.0, 3.0])\n",
"beta = np.asarray([11.0, 12.0, 13.0])\n",
"# Now means should be close to beta and stds close to gamma\n",
"print('After batch normalization (gamma=', gamma, ', beta=', beta, ')')\n",
"a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n",
"print_mean_std(a_norm,axis=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check the test-time forward pass by running the training-time\n",
"# forward pass many times to warm up the running averages, and then\n",
"# checking the means and variances of activations after a test-time\n",
"# forward pass.\n",
"\n",
"np.random.seed(231)\n",
"N, D1, D2, D3 = 200, 50, 60, 3\n",
"W1 = np.random.randn(D1, D2)\n",
"W2 = np.random.randn(D2, D3)\n",
"\n",
"bn_param = {'mode': 'train'}\n",
"gamma = np.ones(D3)\n",
"beta = np.zeros(D3)\n",
"\n",
"for t in range(50):\n",
" X = np.random.randn(N, D1)\n",
" a = np.maximum(0, X.dot(W1)).dot(W2)\n",
" batchnorm_forward(a, gamma, beta, bn_param)\n",
"\n",
"bn_param['mode'] = 'test'\n",
"X = np.random.randn(N, D1)\n",
"a = np.maximum(0, X.dot(W1)).dot(W2)\n",
"a_norm, _ = batchnorm_forward(a, gamma, beta, bn_param)\n",
"\n",
"# Means should be close to zero and stds close to one, but will be\n",
"# noisier than training-time forward passes.\n",
"print('After batch normalization (test-time):')\n",
"print_mean_std(a_norm,axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход в функции batchnorm_backward."
]
},
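{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the backward pass in collapsed form, assuming the forward cache stores (x_hat, gamma, var, eps) as in the sketch above:\n",
"\n",
"```python\n",
"def batchnorm_backward(dout, cache):\n",
"    x_hat, gamma, var, eps = cache\n",
"\n",
"    dbeta = dout.sum(axis=0)\n",
"    dgamma = (dout * x_hat).sum(axis=0)\n",
"\n",
"    # Backprop through the normalization, collapsed into one expression\n",
"    dx_hat = dout * gamma\n",
"    dx = (dx_hat - dx_hat.mean(axis=0) -\n",
"          x_hat * (dx_hat * x_hat).mean(axis=0)) / np.sqrt(var + eps)\n",
"    return dx, dgamma, dbeta\n",
"```"
]
},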
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Gradient check batchnorm backward pass\n",
"np.random.seed(231)\n",
"N, D = 4, 5\n",
"x = 5 * np.random.randn(N, D) + 12\n",
"gamma = np.random.randn(D)\n",
"beta = np.random.randn(D)\n",
"dout = np.random.randn(N, D)\n",
"\n",
"bn_param = {'mode': 'train'}\n",
"fx = lambda x: batchnorm_forward(x, gamma, beta, bn_param)[0]\n",
"fg = lambda a: batchnorm_forward(x, a, beta, bn_param)[0]\n",
"fb = lambda b: batchnorm_forward(x, gamma, b, bn_param)[0]\n",
"\n",
"dx_num = eval_numerical_gradient_array(fx, x, dout)\n",
"da_num = eval_numerical_gradient_array(fg, gamma.copy(), dout)\n",
"db_num = eval_numerical_gradient_array(fb, beta.copy(), dout)\n",
"\n",
"_, cache = batchnorm_forward(x, gamma, beta, bn_param)\n",
"dx, dgamma, dbeta = batchnorm_backward(dout, cache)\n",
"#You should expect to see relative errors between 1e-13 and 1e-8\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dgamma error: ', rel_error(da_num, dgamma))\n",
"print('dbeta error: ', rel_error(db_num, dbeta))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Измените реализацию класса FullyConnectedNet, добавив батч-нормализацию. \n",
"Если флаг normalization == \"batchnorm\", то вам необходимо вставить слой батч-нормализации перед каждым слоем активации ReLU, кроме выхода сети. "
]
},
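{
"cell_type": "markdown",
"metadata": {},
"source": [
"One convenient way to structure this change is a fused helper pair in the spirit of scripts/layer_utils.py. The names affine_bn_relu_forward and affine_bn_relu_backward below are illustrative, not part of the handout:\n",
"\n",
"```python\n",
"def affine_bn_relu_forward(x, w, b, gamma, beta, bn_param):\n",
"    a, fc_cache = affine_forward(x, w, b)\n",
"    a_bn, bn_cache = batchnorm_forward(a, gamma, beta, bn_param)\n",
"    out, relu_cache = relu_forward(a_bn)\n",
"    return out, (fc_cache, bn_cache, relu_cache)\n",
"\n",
"def affine_bn_relu_backward(dout, cache):\n",
"    fc_cache, bn_cache, relu_cache = cache\n",
"    da_bn = relu_backward(dout, relu_cache)\n",
"    da, dgamma, dbeta = batchnorm_backward(da_bn, bn_cache)\n",
"    dx, dw, db = affine_backward(da, fc_cache)\n",
"    return dx, dw, db, dgamma, dbeta\n",
"```"
]
},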
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"# You should expect losses between 1e-4~1e-10 for W, \n",
"# losses between 1e-08~1e-10 for b,\n",
"# and losses between 1e-08~1e-09 for beta and gammas.\n",
"for reg in [0, 3.14]:\n",
" print('Running check with reg = ', reg)\n",
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
" reg=reg, weight_scale=5e-2, dtype=np.float64,\n",
" normalization='batchnorm')\n",
"\n",
" loss, grads = model.loss(X, y)\n",
" print('Initial loss: ', loss)\n",
"\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n",
" if reg == 0: print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите 6-ти слойную сеть на наборе из 1000 изображений с батч-нормализацией и без нее"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"# Try training a very deep net with batchnorm\n",
"hidden_dims = [100, 100, 100, 100, 100]\n",
"\n",
"num_train = 1000\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 2e-2\n",
"bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization='batchnorm')\n",
"model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n",
"\n",
"print('Solver with batch norm:')\n",
"bn_solver = Solver(bn_model, small_data,\n",
" num_epochs=10, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True,print_every=20)\n",
"bn_solver.train()\n",
"\n",
"print('\\nSolver without batch norm:')\n",
"solver = Solver(model, small_data,\n",
" num_epochs=10, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=20)\n",
"solver.train()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Визуализируйте процесс обучения для двух сетей. Увеличилась ли скорость сходимости в случае с батч-нормализацией? Сделайте выводы. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def plot_training_history(title, label, baseline, bn_solvers, plot_fn, bl_marker='.', bn_marker='.', labels=None):\n",
" \"\"\"utility function for plotting training history\"\"\"\n",
" plt.title(title)\n",
" plt.xlabel(label)\n",
" bn_plots = [plot_fn(bn_solver) for bn_solver in bn_solvers]\n",
" bl_plot = plot_fn(baseline)\n",
" num_bn = len(bn_plots)\n",
" for i in range(num_bn):\n",
" label='with_norm'\n",
" if labels is not None:\n",
" label += str(labels[i])\n",
" plt.plot(bn_plots[i], bn_marker, label=label)\n",
" label='baseline'\n",
" if labels is not None:\n",
" label += str(labels[0])\n",
" plt.plot(bl_plot, bl_marker, label=label)\n",
" plt.legend(loc='lower center', ncol=num_bn+1) \n",
"\n",
" \n",
"plt.subplot(3, 1, 1)\n",
"plot_training_history('Training loss','Iteration', solver, [bn_solver], \\\n",
" lambda x: x.loss_history, bl_marker='o', bn_marker='o')\n",
"plt.subplot(3, 1, 2)\n",
"plot_training_history('Training accuracy','Epoch', solver, [bn_solver], \\\n",
" lambda x: x.train_acc_history, bl_marker='-o', bn_marker='-o')\n",
"plt.subplot(3, 1, 3)\n",
"plot_training_history('Validation accuracy','Epoch', solver, [bn_solver], \\\n",
" lambda x: x.val_acc_history, bl_marker='-o', bn_marker='-o')\n",
"\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите 6-тислойную сеть с батч-нормализацией и без нее, используя разные размеры батча. Визуализируйте графики обучения. Сделайте выводы по результатам эксперимента. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def run_batchsize_experiments(normalization_mode):\n",
" np.random.seed(231)\n",
" # Try training a very deep net with batchnorm\n",
" hidden_dims = [100, 100, 100, 100, 100]\n",
" num_train = 1000\n",
" small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
" }\n",
" n_epochs=10\n",
" weight_scale = 2e-2\n",
" batch_sizes = [5,10,50]\n",
" lr = 10**(-3.5)\n",
" solver_bsize = batch_sizes[0]\n",
"\n",
" print('No normalization: batch size = ',solver_bsize)\n",
" model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n",
" solver = Solver(model, small_data,\n",
" num_epochs=n_epochs, batch_size=solver_bsize,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': lr,\n",
" },\n",
" verbose=False)\n",
" solver.train()\n",
" \n",
" bn_solvers = []\n",
" for i in range(len(batch_sizes)):\n",
" b_size=batch_sizes[i]\n",
" print('Normalization: batch size = ',b_size)\n",
" bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=normalization_mode)\n",
" bn_solver = Solver(bn_model, small_data,\n",
" num_epochs=n_epochs, batch_size=b_size,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': lr,\n",
" },\n",
" verbose=False)\n",
" bn_solver.train()\n",
" bn_solvers.append(bn_solver)\n",
" \n",
" return bn_solvers, solver, batch_sizes\n",
"\n",
"batch_sizes = [5,10,50]\n",
"bn_solvers_bsize, solver_bsize, batch_sizes = run_batchsize_experiments('batchnorm')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plot_training_history('Training accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n",
" lambda x: x.train_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n",
"plt.subplot(2, 1, 2)\n",
"plot_training_history('Validation accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n",
" lambda x: x.val_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n",
"\n",
"plt.gcf().set_size_inches(15, 10)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dropout"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для dropout-слоя в scripts/layers.py\n",
"\n",
"http://cs231n.github.io/neural-networks-2/#reg"
]
},
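{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of inverted dropout, assuming p in dropout_param is interpreted as the keep probability (so p = 1 disables dropout). Activations are scaled by 1/p at train time so that the test-time pass is an identity:\n",
"\n",
"```python\n",
"def dropout_forward(x, dropout_param):\n",
"    p, mode = dropout_param['p'], dropout_param['mode']\n",
"    if 'seed' in dropout_param:\n",
"        np.random.seed(dropout_param['seed'])\n",
"\n",
"    if mode == 'train':\n",
"        # Inverted dropout: scale here so no scaling is needed at test time\n",
"        mask = (np.random.rand(*x.shape) < p) / p\n",
"        out = x * mask\n",
"    else:\n",
"        mask = None\n",
"        out = x\n",
"    cache = (dropout_param, mask)\n",
"    return out, cache\n",
"```"
]
},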
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(500, 500) + 10\n",
"\n",
"for p in [0.25, 0.4, 0.7]:\n",
" out, _ = dropout_forward(x, {'mode': 'train', 'p': p})\n",
" out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})\n",
"\n",
" print('Running tests with p = ', p)\n",
" print('Mean of input: ', x.mean())\n",
" print('Mean of train-time output: ', out.mean())\n",
" print('Mean of test-time output: ', out_test.mean())\n",
" print('Fraction of train-time output set to zero: ', (out == 0).mean())\n",
" print('Fraction of test-time output set to zero: ', (out_test == 0).mean())\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход для dropout-слоя"
]
},
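{
"cell_type": "markdown",
"metadata": {},
"source": [
"The matching backward pass simply reuses the stored mask (assuming the cache layout from the sketch above):\n",
"\n",
"```python\n",
"def dropout_backward(dout, cache):\n",
"    dropout_param, mask = cache\n",
"    if dropout_param['mode'] == 'train':\n",
"        return dout * mask\n",
"    return dout\n",
"```"
]
},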
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 10) + 10\n",
"dout = np.random.randn(*x.shape)\n",
"\n",
"dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}\n",
"out, cache = dropout_forward(x, dropout_param)\n",
"dx = dropout_backward(dout, cache)\n",
"dx_num = eval_numerical_gradient_array(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)\n",
"\n",
"# Error should be around e-10 or less\n",
"print('dx relative error: ', rel_error(dx, dx_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Добавьте в реализацию класса FullyConnectedNet поддержку dropout. Если параметр dropout != 1, то добавьте в модель dropout-слой после каждого слоя активации. Проверьте свою реализацию"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"for dropout in [1, 0.75, 0.5]:\n",
" print('Running check with dropout = ', dropout)\n",
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
" weight_scale=5e-2, dtype=np.float64,\n",
" dropout=dropout, seed=123)\n",
"\n",
" loss, grads = model.loss(X, y)\n",
" print('Initial loss: ', loss)\n",
" \n",
" # Relative errors should be around e-6 or less; Note that it's fine\n",
" # if for dropout=1 you have W2 error be on the order of e-5.\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите две двухслойные сети с dropout-слоем (вероятность отсева 0,25) и без на наборе из 500 изображений. Визуализируйте графики обучения. Сделайте выводы по результатам эксперимента"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train two identical nets, one with dropout and one without\n",
"np.random.seed(231)\n",
"num_train = 500\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"solvers = {}\n",
"dropout_choices = [1, 0.25]\n",
"for dropout in dropout_choices:\n",
" model = FullyConnectedNet([500], dropout=dropout)\n",
" print(dropout)\n",
"\n",
" solver = Solver(model, small_data,\n",
" num_epochs=25, batch_size=100,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 5e-4,\n",
" },\n",
" verbose=True, print_every=100)\n",
" solver.train()\n",
" solvers[dropout] = solver\n",
" print()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plot train and validation accuracies of the two models\n",
"\n",
"train_accs = []\n",
"val_accs = []\n",
"for dropout in dropout_choices:\n",
" solver = solvers[dropout]\n",
" train_accs.append(solver.train_acc_history[-1])\n",
" val_accs.append(solver.val_acc_history[-1])\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"for dropout in dropout_choices:\n",
" plt.plot(solvers[dropout].train_acc_history, 'o', label='%.2f dropout' % dropout)\n",
"plt.title('Train accuracy')\n",
"plt.xlabel('Epoch')\n",
"plt.ylabel('Accuracy')\n",
"plt.legend(ncol=2, loc='lower right')\n",
" \n",
"plt.subplot(3, 1, 2)\n",
"for dropout in dropout_choices:\n",
" plt.plot(solvers[dropout].val_acc_history, 'o', label='%.2f dropout' % dropout)\n",
"plt.title('Val accuracy')\n",
"plt.xlabel('Epoch')\n",
"plt.ylabel('Accuracy')\n",
"plt.legend(ncol=2, loc='lower right')\n",
"\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Сверточные нейронные сети (CNN)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для сверточного слоя - функция conv_forward_naive в scripts/layers.py юПроверьте свою реализацию, запустив код ниже "
]
},
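{
"cell_type": "markdown",
"metadata": {},
"source": [
"A naive, loop-based sketch of the forward convolution: zero-pad the input, then slide each filter over it with the given stride. Only correctness matters here, not speed:\n",
"\n",
"```python\n",
"def conv_forward_naive(x, w, b, conv_param):\n",
"    stride, pad = conv_param['stride'], conv_param['pad']\n",
"    N, C, H, W = x.shape\n",
"    F, _, HH, WW = w.shape\n",
"    H_out = 1 + (H + 2 * pad - HH) // stride\n",
"    W_out = 1 + (W + 2 * pad - WW) // stride\n",
"\n",
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')\n",
"    out = np.zeros((N, F, H_out, W_out))\n",
"    for n in range(N):\n",
"        for f in range(F):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    window = x_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW]\n",
"                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]\n",
"    cache = (x, w, b, conv_param)\n",
"    return out, cache\n",
"```"
]
},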
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_shape = (2, 3, 4, 4)\n",
"w_shape = (3, 3, 4, 4)\n",
"x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)\n",
"w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)\n",
"b = np.linspace(-0.1, 0.2, num=3)\n",
"\n",
"conv_param = {'stride': 2, 'pad': 1}\n",
"out, _ = conv_forward_naive(x, w, b, conv_param)\n",
"correct_out = np.array([[[[-0.08759809, -0.10987781],\n",
" [-0.18387192, -0.2109216 ]],\n",
" [[ 0.21027089, 0.21661097],\n",
" [ 0.22847626, 0.23004637]],\n",
" [[ 0.50813986, 0.54309974],\n",
" [ 0.64082444, 0.67101435]]],\n",
" [[[-0.98053589, -1.03143541],\n",
" [-1.19128892, -1.24695841]],\n",
" [[ 0.69108355, 0.66880383],\n",
" [ 0.59480972, 0.56776003]],\n",
" [[ 2.36270298, 2.36904306],\n",
" [ 2.38090835, 2.38247847]]]])\n",
"\n",
"# Compare your output to ours; difference should be around e-8\n",
"print('Testing conv_forward_naive')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход - функция conv_backward_naive в scripts/layers.py"
]
},
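{
"cell_type": "markdown",
"metadata": {},
"source": [
"A naive backward sketch that mirrors the forward loops, assuming cache = (x, w, b, conv_param) as in the forward sketch: each output gradient is scattered back into dw and into the padded dx.\n",
"\n",
"```python\n",
"def conv_backward_naive(dout, cache):\n",
"    x, w, b, conv_param = cache\n",
"    stride, pad = conv_param['stride'], conv_param['pad']\n",
"    N, C, H, W = x.shape\n",
"    F, _, HH, WW = w.shape\n",
"    _, _, H_out, W_out = dout.shape\n",
"\n",
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')\n",
"    dx_pad = np.zeros_like(x_pad)\n",
"    dw = np.zeros_like(w)\n",
"    db = dout.sum(axis=(0, 2, 3))\n",
"\n",
"    for n in range(N):\n",
"        for f in range(F):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    window = x_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW]\n",
"                    dw[f] += window * dout[n, f, i, j]\n",
"                    dx_pad[n, :, i*stride:i*stride+HH,\n",
"                           j*stride:j*stride+WW] += w[f] * dout[n, f, i, j]\n",
"\n",
"    # Strip the padding to recover dx in the original shape\n",
"    dx = dx_pad[:, :, pad:pad+H, pad:pad+W]\n",
"    return dx, dw, db\n",
"```"
]
},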
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(4, 3, 5, 5)\n",
"w = np.random.randn(2, 3, 3, 3)\n",
"b = np.random.randn(2,)\n",
"dout = np.random.randn(4, 2, 5, 5)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)\n",
"\n",
"out, cache = conv_forward_naive(x, w, b, conv_param)\n",
"dx, dw, db = conv_backward_naive(dout, cache)\n",
"\n",
"# Your errors should be around e-8 or less.\n",
"print('Testing conv_backward_naive function')\n",
"print('dx error: ', rel_error(dx, dx_num))\n",
"print('dw error: ', rel_error(dw, dw_num))\n",
"print('db error: ', rel_error(db, db_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для max-pooling слоя -функция max_pool_forward_naive в scripts/layers.py"
]
},
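{
"cell_type": "markdown",
"metadata": {},
"source": [
"A naive sketch of the forward max-pooling pass, taking the maximum over each window:\n",
"\n",
"```python\n",
"def max_pool_forward_naive(x, pool_param):\n",
"    N, C, H, W = x.shape\n",
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n",
"    stride = pool_param['stride']\n",
"    H_out = 1 + (H - ph) // stride\n",
"    W_out = 1 + (W - pw) // stride\n",
"\n",
"    out = np.zeros((N, C, H_out, W_out))\n",
"    for i in range(H_out):\n",
"        for j in range(W_out):\n",
"            window = x[:, :, i*stride:i*stride+ph, j*stride:j*stride+pw]\n",
"            out[:, :, i, j] = window.max(axis=(2, 3))\n",
"    cache = (x, pool_param)\n",
"    return out, cache\n",
"```"
]
},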
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_shape = (2, 3, 4, 4)\n",
"x = np.linspace(-0.3, 0.4, num=np.prod(x_shape)).reshape(x_shape)\n",
"pool_param = {'pool_width': 2, 'pool_height': 2, 'stride': 2}\n",
"\n",
"out, _ = max_pool_forward_naive(x, pool_param)\n",
"\n",
"correct_out = np.array([[[[-0.26315789, -0.24842105],\n",
" [-0.20421053, -0.18947368]],\n",
" [[-0.14526316, -0.13052632],\n",
" [-0.08631579, -0.07157895]],\n",
" [[-0.02736842, -0.01263158],\n",
" [ 0.03157895, 0.04631579]]],\n",
" [[[ 0.09052632, 0.10526316],\n",
" [ 0.14947368, 0.16421053]],\n",
" [[ 0.20842105, 0.22315789],\n",
" [ 0.26736842, 0.28210526]],\n",
" [[ 0.32631579, 0.34105263],\n",
" [ 0.38526316, 0.4 ]]]])\n",
"\n",
"# Compare your output with ours. Difference should be on the order of e-8.\n",
"print('Testing max_pool_forward_naive function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход для max-pooling слоя в max_pool_backward_naive . "
]
},
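{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the backward pass, assuming cache = (x, pool_param) as above. The gradient is routed only to the element that achieved the maximum in each window (in this simple version, tied maxima all receive the gradient):\n",
"\n",
"```python\n",
"def max_pool_backward_naive(dout, cache):\n",
"    x, pool_param = cache\n",
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n",
"    stride = pool_param['stride']\n",
"    _, _, H_out, W_out = dout.shape\n",
"\n",
"    dx = np.zeros_like(x)\n",
"    for i in range(H_out):\n",
"        for j in range(W_out):\n",
"            window = x[:, :, i*stride:i*stride+ph, j*stride:j*stride+pw]\n",
"            # Mask of positions holding the window maximum\n",
"            mask = (window == window.max(axis=(2, 3), keepdims=True))\n",
"            grad = mask * dout[:, :, i, j][:, :, None, None]\n",
"            dx[:, :, i*stride:i*stride+ph, j*stride:j*stride+pw] += grad\n",
"    return dx\n",
"```"
]
},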
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(3, 2, 8, 8)\n",
"dout = np.random.randn(3, 2, 4, 4)\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: max_pool_forward_naive(x, pool_param)[0], x, dout)\n",
"\n",
"out, cache = max_pool_forward_naive(x, pool_param)\n",
"dx = max_pool_backward_naive(dout, cache)\n",
"\n",
"# Your error should be on the order of e-12\n",
"print('Testing max_pool_backward_naive function:')\n",
"print('dx error: ', rel_error(dx, dx_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В скрипте scripts/fast_layers.py представлены быстрые реализации слоев свертки и пуллинга, написанных с использованием Cython. \n",
"\n",
"Для компиляции выполните следующую команду в директории scripts\n",
"\n",
"```bash\n",
"python setup.py build_ext --inplace\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Сравните ваши реализации слоев свертки и пуллинга с быстрыми реализациями."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rel errors should be around e-9 or less\n",
"from scripts.fast_layers import conv_forward_fast, conv_backward_fast\n",
"from time import time\n",
"np.random.seed(231)\n",
"x = np.random.randn(100, 3, 31, 31)\n",
"w = np.random.randn(25, 3, 3, 3)\n",
"b = np.random.randn(25,)\n",
"dout = np.random.randn(100, 25, 16, 16)\n",
"conv_param = {'stride': 2, 'pad': 1}\n",
"\n",
"t0 = time()\n",
"out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)\n",
"t1 = time()\n",
"out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)\n",
"t2 = time()\n",
"\n",
"print('Testing conv_forward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('Fast: %fs' % (t2 - t1))\n",
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('Difference: ', rel_error(out_naive, out_fast))\n",
"\n",
"t0 = time()\n",
"dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)\n",
"t1 = time()\n",
"dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)\n",
"t2 = time()\n",
"\n",
"print('\\nTesting conv_backward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('Fast: %fs' % (t2 - t1))\n",
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('dx difference: ', rel_error(dx_naive, dx_fast))\n",
"print('dw difference: ', rel_error(dw_naive, dw_fast))\n",
"print('db difference: ', rel_error(db_naive, db_fast))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Relative errors should be close to 0.0\n",
"from scripts.fast_layers import max_pool_forward_fast, max_pool_backward_fast\n",
"np.random.seed(231)\n",
"x = np.random.randn(100, 3, 32, 32)\n",
"dout = np.random.randn(100, 3, 16, 16)\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"t0 = time()\n",
"out_naive, cache_naive = max_pool_forward_naive(x, pool_param)\n",
"t1 = time()\n",
"out_fast, cache_fast = max_pool_forward_fast(x, pool_param)\n",
"t2 = time()\n",
"\n",
"print('Testing pool_forward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('fast: %fs' % (t2 - t1))\n",
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('difference: ', rel_error(out_naive, out_fast))\n",
"\n",
"t0 = time()\n",
"dx_naive = max_pool_backward_naive(dout, cache_naive)\n",
"t1 = time()\n",
"dx_fast = max_pool_backward_fast(dout, cache_fast)\n",
"t2 = time()\n",
"\n",
"print('\\nTesting pool_backward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('fast: %fs' % (t2 - t1))\n",
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('dx difference: ', rel_error(dx_naive, dx_fast))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В layer_utils.py вы можете найти часто используемые комбинации слоев, используемых в сверточных сетях. Ознакомьтесь с ними и запустите код ниже для проверки их работы"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.layer_utils import conv_relu_pool_forward, conv_relu_pool_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 16, 16)\n",
"w = np.random.randn(3, 3, 3, 3)\n",
"b = np.random.randn(3,)\n",
"dout = np.random.randn(2, 3, 8, 8)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)\n",
"dx, dw, db = conv_relu_pool_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], b, dout)\n",
"\n",
"# Relative errors should be around e-8 or less\n",
"print('Testing conv_relu_pool')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.layer_utils import conv_relu_forward, conv_relu_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 8, 8)\n",
"w = np.random.randn(3, 3, 3, 3)\n",
"b = np.random.randn(3,)\n",
"dout = np.random.randn(2, 3, 8, 8)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"\n",
"out, cache = conv_relu_forward(x, w, b, conv_param)\n",
"dx, dw, db = conv_relu_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_forward(x, w, b, conv_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_forward(x, w, b, conv_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_forward(x, w, b, conv_param)[0], b, dout)\n",
"\n",
"# Relative errors should be around e-8 or less\n",
"print('Testing conv_relu:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Напишите реализацию класса ThreeLayerConvNet в scripts/classifiers/cnn.py . Вы можете использовать готовые реализации слоев и их комбинаций."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Проверьте вашу реализацию. Ожидается, что значение функции потерь softmax будет порядка `log(C)` для `C` классов для случая без регуляризации. В случае регуляризации значение функции потерь должно немного возрасти. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = ThreeLayerConvNet()\n",
"\n",
"N = 50\n",
"X = np.random.randn(N, 3, 32, 32)\n",
"y = np.random.randint(10, size=N)\n",
"\n",
"loss, grads = model.loss(X, y)\n",
"print('Initial loss (no regularization): ', loss)\n",
"\n",
"model.reg = 0.5\n",
"loss, grads = model.loss(X, y)\n",
"print('Initial loss (with regularization): ', loss)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Проверьте реализацию обратного прохода"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_inputs = 2\n",
"input_dim = (3, 16, 16)\n",
"reg = 0.0\n",
"num_classes = 10\n",
"np.random.seed(231)\n",
"X = np.random.randn(num_inputs, *input_dim)\n",
"y = np.random.randint(num_classes, size=num_inputs)\n",
"\n",
"model = ThreeLayerConvNet(num_filters=3, filter_size=3,\n",
" input_dim=input_dim, hidden_dim=7,\n",
" dtype=np.float64)\n",
"loss, grads = model.loss(X, y)\n",
"# Errors should be small, but correct implementations may have\n",
"# relative errors up to the order of e-2\n",
"for param_name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)\n",
" e = rel_error(param_grad_num, grads[param_name])\n",
" print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Попробуйте добиться эффекта переобучения. Обучите модель на небольшом наборе данных.Сравните значения accuracy на обучающих данных и на валидационных. Визуализируйте графики обучения "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"\n",
"num_train = 100\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"model = ThreeLayerConvNet(weight_scale=1e-2)\n",
"\n",
"solver = Solver(model, small_data,\n",
" num_epochs=15, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=1)\n",
"solver.train()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final training accuracy\n",
"print(\n",
" \"Small data training accuracy:\",\n",
" solver.check_accuracy(small_data['X_train'], small_data['y_train'])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final validation accuracy\n",
"print(\n",
" \"Small data validation accuracy:\",\n",
" solver.check_accuracy(small_data['X_val'], small_data['y_val'])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.xlabel('iteration')\n",
"plt.ylabel('loss')\n",
"\n",
"plt.subplot(2, 1, 2)\n",
"plt.plot(solver.train_acc_history, '-o')\n",
"plt.plot(solver.val_acc_history, '-o')\n",
"plt.legend(['train', 'val'], loc='upper left')\n",
"plt.xlabel('epoch')\n",
"plt.ylabel('accuracy')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите сеть на полном наборе данных. Выведите accuracy на обучающей и валидационной выборках"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)\n",
"\n",
"solver = Solver(model, data,\n",
" num_epochs=1, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=20)\n",
"solver.train()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final training accuracy\n",
"print(\n",
" \"Full data training accuracy:\",\n",
" solver.check_accuracy(small_data['X_train'], small_data['y_train'])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final validation accuracy\n",
"print(\n",
" \"Full data validation accuracy:\",\n",
" solver.check_accuracy(data['X_val'], data['y_val'])\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Визуализируйте фильтры на первом слое обученной сети"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.vis_utils import visualize_grid\n",
"\n",
"grid = visualize_grid(model.params['W1'].transpose(0, 2, 3, 1))\n",
"plt.imshow(grid.astype('uint8'))\n",
"plt.axis('off')\n",
"plt.gcf().set_size_inches(5, 5)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}