{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Лабораторная работа 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1) Полносвязная нейронная сеть ( Fully-Connected Neural Network)\n",
"\n",
"2) Нормализация по мини-батчам (Batch normalization)\n",
"\n",
"3) Dropout\n",
"\n",
"4) Сверточные нейронные сети (Convolutional Networks)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Лабораторные работы можно выполнять с использованием сервиса Google Colaboratory (https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d) или на локальном компьютере. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Полносвязная нейронная сеть"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В данной лабораторной работе необходимо будет реализовать полносвязную нейронную сеть, используя модульный подход. Для каждого слоя реализации прямого и обратного проходов алгоритма обратного распространения ошибки будут иметь следующий вид:\n",
"\n",
"```python\n",
"def layer_forward(x, w):\n",
" \"\"\" Receive inputs x and weights w \"\"\"\n",
" # Do some computations ...\n",
" z = # ... some intermediate value\n",
" # Do some more computations ...\n",
" out = # the output\n",
" \n",
" cache = (x, w, z, out) # Values we need to compute gradients\n",
" \n",
" return out, cache\n",
"```\n",
"\n",
"\n",
"\n",
"```python\n",
"def layer_backward(dout, cache):\n",
" \"\"\"\n",
" Receive dout (derivative of loss with respect to outputs) and cache,\n",
" and compute derivative with respect to inputs.\n",
" \"\"\"\n",
" # Unpack cache values\n",
" x, w, z, out = cache\n",
" \n",
" # Use values in cache to compute derivatives\n",
" dx = # Derivative of loss with respect to x\n",
" dw = # Derivative of loss with respect to w\n",
" \n",
" return dx, dw\n",
"```\n",
"\n",
" "
]
},
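{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a concrete illustration of this contract, here is a minimal, runnable sketch of a hypothetical elementwise scaling layer. It is not one of the graded layers, just the same API applied to a trivial operation:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def scale_forward(x, w):\n",
"    # out = x * w elementwise; cache the inputs for the backward pass\n",
"    out = x * w\n",
"    cache = (x, w)\n",
"    return out, cache\n",
"\n",
"def scale_backward(dout, cache):\n",
"    # Chain rule for out = x * w\n",
"    x, w = cache\n",
"    dx = dout * w\n",
"    dw = dout * x\n",
"    return dx, dw\n",
"\n",
"x, w = np.random.randn(3, 4), np.random.randn(3, 4)\n",
"out, cache = scale_forward(x, w)\n",
"dx, dw = scale_backward(np.ones_like(out), cache)\n",
"print(np.allclose(dx, w), np.allclose(dw, x))  # True True\n",
"```"
]
},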
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=========== You can safely ignore the message below if you are NOT working on ConvolutionalNetworks.ipynb ===========\n",
"\tYou will need to compile a Cython extension for a portion of this assignment.\n",
"\tThe instructions to do this will be given in a section of the notebook below.\n",
"\tThere will be an option for Colab users and another for Jupyter (local) users.\n"
]
}
],
"source": [
"from __future__ import print_function\n",
"import time\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from scripts.classifiers.fc_net import *\n",
"\n",
"from scripts.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array\n",
"from scripts.solver import Solver\n",
"from scripts.classifiers.cnn import *\n",
"from scripts.layers import *\n",
"from scripts.fast_layers import *\n",
"\n",
"\n",
"%matplotlib inline\n",
"plt.rcParams['figure.figsize'] = (10.0, 8.0) \n",
"plt.rcParams['image.interpolation'] = 'nearest'\n",
"plt.rcParams['image.cmap'] = 'gray'\n",
"\n",
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"def rel_error(x, y):\n",
" \"\"\" returns relative error \"\"\"\n",
" return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))\n",
"def print_mean_std(x,axis=0):\n",
" print(' means: ', x.mean(axis=axis))\n",
" print(' stds: ', x.std(axis=axis))\n",
" print() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Загрузите данные из предыдущей лабораторной работы. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Для полносвязного слоя реализуйте прямой проход (метод affine_forward в scripts/layers.py). Протестируйте свою реализацию. "
]
},
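{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one possible shape-handling strategy (an illustration assuming inputs of shape (N, d_1, ..., d_k) are flattened per example; not necessarily the graded solution):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def affine_forward_sketch(x, w, b):\n",
"    # Flatten each example to a row of length D = prod(d_i), then x @ w + b\n",
"    N = x.shape[0]\n",
"    out = x.reshape(N, -1).dot(w) + b  # shape (N, M)\n",
"    cache = (x, w, b)\n",
"    return out, cache\n",
"```"
]
},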
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_inputs = 2\n",
"input_shape = (4, 5, 6)\n",
"output_dim = 3\n",
"\n",
"input_size = num_inputs * np.prod(input_shape)\n",
"weight_size = output_dim * np.prod(input_shape)\n",
"\n",
"x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)\n",
"w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)\n",
"b = np.linspace(-0.3, 0.1, num=output_dim)\n",
"\n",
"out, _ = affine_forward(x, w, b)\n",
"correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],\n",
" [ 3.25553199, 3.5141327, 3.77273342]])\n",
"\n",
"\n",
"print('Testing affine_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Для полносвязного слоя реализуйте обратный проход (метод affine_backward в scripts/layers.py). Протестируйте свою реализацию. "
]
},
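{
"cell_type": "markdown",
"metadata": {},
"source": [
"A matching backward sketch, assuming the cache layout (x, w, b) from the forward sketch above:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def affine_backward_sketch(dout, cache):\n",
"    # dout has shape (N, M); undo the flattening for dx\n",
"    x, w, b = cache\n",
"    N = x.shape[0]\n",
"    dx = dout.dot(w.T).reshape(x.shape)  # back to the input shape\n",
"    dw = x.reshape(N, -1).T.dot(dout)    # shape (D, M)\n",
"    db = dout.sum(axis=0)                # shape (M,)\n",
"    return dx, dw, db\n",
"```"
]
},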
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 2, 3)\n",
"w = np.random.randn(6, 5)\n",
"b = np.random.randn(5)\n",
"dout = np.random.randn(10, 5)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)\n",
"\n",
"_, cache = affine_forward(x, w, b)\n",
"dx, dw, db = affine_backward(dout, cache)\n",
"\n",
"print('Testing affine_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для слоя активации ReLU (relu_forward) и протестируйте его."
]
},
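{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the ReLU forward pass is essentially a one-liner; a sketch:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def relu_forward_sketch(x):\n",
"    # Keep positive values, zero out the rest; cache x for the backward pass\n",
"    out = np.maximum(0, x)\n",
"    cache = x\n",
"    return out, cache\n",
"```"
]
},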
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)\n",
"\n",
"out, _ = relu_forward(x)\n",
"correct_out = np.array([[ 0., 0., 0., 0., ],\n",
" [ 0., 0., 0.04545455, 0.13636364,],\n",
" [ 0.22727273, 0.31818182, 0.40909091, 0.5, ]])\n",
"\n",
"# Compare your output with ours. The error should be on the order of e-8\n",
"print('Testing relu_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход для слоя активации ReLU (relu_backward ) и протестируйте его."
]
},
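{
"cell_type": "markdown",
"metadata": {},
"source": [
"A matching backward sketch: the gradient passes only where the input was positive.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def relu_backward_sketch(dout, cache):\n",
"    x = cache\n",
"    dx = dout * (x > 0)  # mask of the units that were active\n",
"    return dx\n",
"```"
]
},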
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 10)\n",
"dout = np.random.randn(*x.shape)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)\n",
"\n",
"_, cache = relu_forward(x)\n",
"dx = relu_backward(dout, cache)\n",
"\n",
"# The error should be on the order of e-12\n",
"print('Testing relu_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В скрипте /layer_utils.py приведены реализации прямого и обратного проходов для часто используемых комбинаций слоев. Например, за полносвязным слоем часто следует слой активации. Ознакомьтесь с функциями affine_relu_forward и affine_relu_backward, запустите код ниже и убедитесь, что ошибка порядка e-10 или ниже. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.layer_utils import affine_relu_forward, affine_relu_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 4)\n",
"w = np.random.randn(12, 10)\n",
"b = np.random.randn(10)\n",
"dout = np.random.randn(2, 10)\n",
"\n",
"out, cache = affine_relu_forward(x, w, b)\n",
"dx, dw, db = affine_relu_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)\n",
"\n",
"# Relative error should be around e-10 or less\n",
"print('Testing affine_relu_forward and affine_relu_backward:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте двухслойную полносвязную сеть - класс TwoLayerNet в scripts/classifiers/fc_net.py . Проверьте свою реализацию, запустив код ниже. "
]
},
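{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of how the pieces might compose inside TwoLayerNet.loss, with the softmax loss written inline to keep it self-contained. The structure, the helper name, and the 0.5 factor on the L2 regularization term are assumptions for illustration, not the required solution:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scripts.layers import affine_forward, affine_backward\n",
"from scripts.layer_utils import affine_relu_forward, affine_relu_backward\n",
"\n",
"def two_layer_loss_sketch(params, X, y=None, reg=0.0):\n",
"    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']\n",
"    h, cache1 = affine_relu_forward(X, W1, b1)  # affine -> ReLU\n",
"    scores, cache2 = affine_forward(h, W2, b2)  # affine -> class scores\n",
"    if y is None:\n",
"        return scores  # test-time mode\n",
"    # Inline softmax loss and its gradient with respect to the scores\n",
"    shifted = scores - scores.max(axis=1, keepdims=True)\n",
"    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)\n",
"    N = X.shape[0]\n",
"    loss = -np.log(probs[np.arange(N), y]).mean()\n",
"    loss += 0.5 * reg * ((W1 ** 2).sum() + (W2 ** 2).sum())\n",
"    dscores = probs.copy()\n",
"    dscores[np.arange(N), y] -= 1\n",
"    dscores /= N\n",
"    dh, dW2, db2 = affine_backward(dscores, cache2)\n",
"    _, dW1, db1 = affine_relu_backward(dh, cache1)\n",
"    grads = {'W1': dW1 + reg * W1, 'b1': db1,\n",
"             'W2': dW2 + reg * W2, 'b2': db2}\n",
"    return loss, grads\n",
"```"
]
},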
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H, C = 3, 5, 50, 7\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=N)\n",
"\n",
"std = 1e-3\n",
"model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)\n",
"\n",
"print('Testing initialization ... ')\n",
"W1_std = abs(model.params['W1'].std() - std)\n",
"b1 = model.params['b1']\n",
"W2_std = abs(model.params['W2'].std() - std)\n",
"b2 = model.params['b2']\n",
"assert W1_std < std / 10, 'First layer weights do not seem right'\n",
"assert np.all(b1 == 0), 'First layer biases do not seem right'\n",
"assert W2_std < std / 10, 'Second layer weights do not seem right'\n",
"assert np.all(b2 == 0), 'Second layer biases do not seem right'\n",
"\n",
"print('Testing test-time forward pass ... ')\n",
"model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)\n",
"model.params['b1'] = np.linspace(-0.1, 0.9, num=H)\n",
"model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)\n",
"model.params['b2'] = np.linspace(-0.9, 0.1, num=C)\n",
"X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T\n",
"scores = model.loss(X)\n",
"correct_scores = np.asarray(\n",
" [[11.53165108, 12.2917344, 13.05181771, 13.81190102, 14.57198434, 15.33206765, 16.09215096],\n",
" [12.05769098, 12.74614105, 13.43459113, 14.1230412, 14.81149128, 15.49994135, 16.18839143],\n",
" [12.58373087, 13.20054771, 13.81736455, 14.43418138, 15.05099822, 15.66781506, 16.2846319 ]])\n",
"scores_diff = np.abs(scores - correct_scores).sum()\n",
"assert scores_diff < 1e-6, 'Problem with test-time forward pass'\n",
"\n",
"print('Testing training loss (no regularization)')\n",
"y = np.asarray([0, 5, 1])\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 3.4702243556\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'\n",
"\n",
"model.reg = 1.0\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 26.5948426952\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'\n",
"\n",
"# Errors should be around e-7 or less\n",
"for reg in [0.0, 0.7]:\n",
" print('Running numeric gradient check with reg = ', reg)\n",
" model.reg = reg\n",
" loss, grads = model.loss(X, y)\n",
"\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ознакомьтесь с API для обучения и тестирования моделей в scripts/solver.py . Используйте экземпляр класса Solver для обучения двухслойной полносвязной сети. Необходимо достичь минимум 50% верно классифицированных объектов на валидационном наборе. "
]
},
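{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how a Solver might be configured, using only constructor arguments that appear elsewhere in this notebook; the hyperparameter values are guesses you will need to tune:\n",
"\n",
"```python\n",
"model = TwoLayerNet(hidden_dim=100)\n",
"solver = Solver(model, data,\n",
"                update_rule='sgd',\n",
"                optim_config={'learning_rate': 1e-3},\n",
"                num_epochs=10, batch_size=100,\n",
"                print_every=100, verbose=True)\n",
"solver.train()\n",
"print('Best validation accuracy:', max(solver.val_acc_history))\n",
"```"
]
},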
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = TwoLayerNet()\n",
"solver = None\n",
"\n",
"##############################################################################\n",
"# TODO: Use a Solver instance to train a TwoLayerNet that achieves at least #\n",
"# 50% accuracy on the validation set. #\n",
"##############################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"##############################################################################\n",
"# END OF YOUR CODE #\n",
"##############################################################################"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(2, 1, 2)\n",
"plt.title('Accuracy')\n",
"plt.plot(solver.train_acc_history, '-o', label='train')\n",
"plt.plot(solver.val_acc_history, '-o', label='val')\n",
"plt.plot([0.5] * len(solver.val_acc_history), 'k--')\n",
"plt.xlabel('Epoch')\n",
"plt.legend(loc='lower right')\n",
"plt.gcf().set_size_inches(15, 12)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Теперь реализуйте полносвязную сеть с произвольным числом скрытых слоев. Ознакомьтесь с классом FullyConnectedNet в scripts/classifiers/fc_net.py . Реализуйте инициализацию, прямой и обратный проходы."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"for reg in [0, 3.14]:\n",
" print('Running check with reg = ', reg)\n",
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
" reg=reg, weight_scale=5e-2, dtype=np.float64)\n",
"\n",
" loss, grads = model.loss(X, y)\n",
" print('Initial loss: ', loss)\n",
" \n",
" # Most of the errors should be on the order of e-7 or smaller. \n",
" # NOTE: It is fine however to see an error for W2 on the order of e-5\n",
" # for the check when reg = 0.0\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Попробуйте добиться эффекта переобучения на небольшом наборе изображений (например, 50). Используйте трехслойную сеть со 100 нейронами на каждом скрытом слое. Попробуйте переобучить сеть, достигнув 100 % accuracy за 20 эпох. Для этого поэкспериментируйте с параметрами weight_scale и learning_rate. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a three-layer Net to overfit 50 training examples by \n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 1e-2 # Experiment with this!\n",
"learning_rate = 1e-4 # Experiment with this!\n",
"model = FullyConnectedNet([100, 100],\n",
" weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
" print_every=10, num_epochs=20, batch_size=25,\n",
" update_rule='sgd',\n",
" optim_config={\n",
" 'learning_rate': learning_rate,\n",
" }\n",
" )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Повторите эксперимент, описанный выше, для пятислойной сети."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a five-layer Net to overfit 50 training examples by \n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"learning_rate = 2e-3 # Experiment with this!\n",
"weight_scale = 1e-5 # Experiment with this!\n",
"model = FullyConnectedNet([100, 100, 100, 100],\n",
" weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
" print_every=10, num_epochs=20, batch_size=25,\n",
" update_rule='sgd',\n",
" optim_config={\n",
" 'learning_rate': learning_rate,\n",
" }\n",
" )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Сделайте выводы по проведенному эксперименту. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ранее обновление весов проходило по правилу SGD. Теперь попробуйте реализовать стохастический градиентный спуск с импульсом (SGD+momentum). http://cs231n.github.io/neural-networks-3/#sgd Реализуйте sgd_momentum в scripts/optim.py и запустите проверку. "
]
},
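{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the classic momentum update (the default momentum of 0.9 is the common convention the check below assumes):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sgd_momentum_sketch(w, dw, config=None):\n",
"    # Keep a velocity that accumulates the negative gradient\n",
"    if config is None:\n",
"        config = {}\n",
"    config.setdefault('learning_rate', 1e-2)\n",
"    config.setdefault('momentum', 0.9)\n",
"    v = config.get('velocity', np.zeros_like(w))\n",
"    v = config['momentum'] * v - config['learning_rate'] * dw\n",
"    next_w = w + v\n",
"    config['velocity'] = v\n",
"    return next_w, config\n",
"```"
]
},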
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.optim import sgd_momentum\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-3, 'velocity': v}\n",
"next_w, _ = sgd_momentum(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
" [ 0.1406, 0.20738947, 0.27417895, 0.34096842, 0.40775789],\n",
" [ 0.47454737, 0.54133684, 0.60812632, 0.67491579, 0.74170526],\n",
" [ 0.80849474, 0.87528421, 0.94207368, 1.00886316, 1.07565263],\n",
" [ 1.14244211, 1.20923158, 1.27602105, 1.34281053, 1.4096 ]])\n",
"expected_velocity = np.asarray([\n",
" [ 0.5406, 0.55475789, 0.56891579, 0.58307368, 0.59723158],\n",
" [ 0.61138947, 0.62554737, 0.63970526, 0.65386316, 0.66802105],\n",
" [ 0.68217895, 0.69633684, 0.71049474, 0.72465263, 0.73881053],\n",
" [ 0.75296842, 0.76712632, 0.78128421, 0.79544211, 0.8096 ]])\n",
"\n",
"# Should see relative errors around e-8 or less\n",
"print('next_w error: ', rel_error(next_w, expected_next_w))\n",
"print('velocity error: ', rel_error(expected_velocity, config['velocity']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Сравните результаты обучения шестислойной сети, обученной классическим градиентным спуском и адаптивным алгоритмом с импульсом. Какой алгоритм сходится быстрее."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_train = 4000\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"solvers = {}\n",
"\n",
"for update_rule in ['sgd', 'sgd_momentum']:\n",
" print('running with ', update_rule)\n",
" model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)\n",
"\n",
" solver = Solver(model, small_data,\n",
" num_epochs=5, batch_size=100,\n",
" update_rule=update_rule,\n",
" optim_config={\n",
" 'learning_rate': 5e-3,\n",
" },\n",
" verbose=True)\n",
" solvers[update_rule] = solver\n",
" solver.train()\n",
" print()\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(3, 1, 2)\n",
"plt.title('Training accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"plt.subplot(3, 1, 3)\n",
"plt.title('Validation accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"for update_rule, solver in solvers.items():\n",
" plt.subplot(3, 1, 1)\n",
" plt.plot(solver.loss_history, 'o', label=\"loss_%s\" % update_rule)\n",
" \n",
" plt.subplot(3, 1, 2)\n",
" plt.plot(solver.train_acc_history, '-o', label=\"train_acc_%s\" % update_rule)\n",
"\n",
" plt.subplot(3, 1, 3)\n",
" plt.plot(solver.val_acc_history, '-o', label=\"val_acc_%s\" % update_rule)\n",
" \n",
"for i in [1, 2, 3]:\n",
" plt.subplot(3, 1, i)\n",
" plt.legend(loc='upper center', ncol=4)\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте алгоритмы RMSProp [1] and Adam [2] с коррекцией смещения - методы rmsprop и adam . \n",
"\n",
"\n",
"[1] Tijmen Tieleman and Geoffrey Hinton. \"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude.\" COURSERA: Neural Networks for Machine Learning 4 (2012).\n",
"\n",
"[2] Diederik Kingma and Jimmy Ba, \"Adam: A Method for Stochastic Optimization\", ICLR 2015."
]
},
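{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sketches of both update rules. The default hyperparameters (decay_rate=0.99, beta1=0.9, beta2=0.999, epsilon=1e-8) are the usual conventions and are assumed by the checks below:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rmsprop_sketch(w, dw, config):\n",
"    # A running average of squared gradients scales the step per parameter\n",
"    config.setdefault('learning_rate', 1e-2)\n",
"    config.setdefault('decay_rate', 0.99)\n",
"    config.setdefault('epsilon', 1e-8)\n",
"    config.setdefault('cache', np.zeros_like(w))\n",
"    config['cache'] = (config['decay_rate'] * config['cache']\n",
"                       + (1 - config['decay_rate']) * dw ** 2)\n",
"    next_w = w - config['learning_rate'] * dw / (np.sqrt(config['cache']) + config['epsilon'])\n",
"    return next_w, config\n",
"\n",
"def adam_sketch(w, dw, config):\n",
"    # First and second moment estimates with bias correction\n",
"    config.setdefault('learning_rate', 1e-3)\n",
"    config.setdefault('beta1', 0.9)\n",
"    config.setdefault('beta2', 0.999)\n",
"    config.setdefault('epsilon', 1e-8)\n",
"    config.setdefault('m', np.zeros_like(w))\n",
"    config.setdefault('v', np.zeros_like(w))\n",
"    config.setdefault('t', 0)\n",
"    config['t'] += 1\n",
"    config['m'] = config['beta1'] * config['m'] + (1 - config['beta1']) * dw\n",
"    config['v'] = config['beta2'] * config['v'] + (1 - config['beta2']) * dw ** 2\n",
"    mt = config['m'] / (1 - config['beta1'] ** config['t'])  # bias-corrected\n",
"    vt = config['v'] / (1 - config['beta2'] ** config['t'])\n",
"    next_w = w - config['learning_rate'] * mt / (np.sqrt(vt) + config['epsilon'])\n",
"    return next_w, config\n",
"```"
]
},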
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test RMSProp implementation\n",
"from scripts.optim import rmsprop\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"cache = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'cache': cache}\n",
"next_w, _ = rmsprop(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
" [-0.39223849, -0.34037513, -0.28849239, -0.23659121, -0.18467247],\n",
" [-0.132737, -0.08078555, -0.02881884, 0.02316247, 0.07515774],\n",
" [ 0.12716641, 0.17918792, 0.23122175, 0.28326742, 0.33532447],\n",
" [ 0.38739248, 0.43947102, 0.49155973, 0.54365823, 0.59576619]])\n",
"expected_cache = np.asarray([\n",
" [ 0.5976, 0.6126277, 0.6277108, 0.64284931, 0.65804321],\n",
" [ 0.67329252, 0.68859723, 0.70395734, 0.71937285, 0.73484377],\n",
" [ 0.75037008, 0.7659518, 0.78158892, 0.79728144, 0.81302936],\n",
" [ 0.82883269, 0.84469141, 0.86060554, 0.87657507, 0.8926 ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('cache error: ', rel_error(expected_cache, config['cache']))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test Adam implementation\n",
"from scripts.optim import adam\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"m = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.7, 0.5, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'm': m, 'v': v, 't': 5}\n",
"next_w, _ = adam(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
" [-0.40094747, -0.34836187, -0.29577703, -0.24319299, -0.19060977],\n",
" [-0.1380274, -0.08544591, -0.03286534, 0.01971428, 0.0722929],\n",
" [ 0.1248705, 0.17744702, 0.23002243, 0.28259667, 0.33516969],\n",
" [ 0.38774145, 0.44031188, 0.49288093, 0.54544852, 0.59801459]])\n",
"expected_v = np.asarray([\n",
" [ 0.69966, 0.68908382, 0.67851319, 0.66794809, 0.65738853,],\n",
" [ 0.64683452, 0.63628604, 0.6257431, 0.61520571, 0.60467385,],\n",
" [ 0.59414753, 0.58362676, 0.57311152, 0.56260183, 0.55209767,],\n",
" [ 0.54159906, 0.53110598, 0.52061845, 0.51013645, 0.49966, ]])\n",
"expected_m = np.asarray([\n",
" [ 0.48, 0.49947368, 0.51894737, 0.53842105, 0.55789474],\n",
" [ 0.57736842, 0.59684211, 0.61631579, 0.63578947, 0.65526316],\n",
" [ 0.67473684, 0.69421053, 0.71368421, 0.73315789, 0.75263158],\n",
" [ 0.77210526, 0.79157895, 0.81105263, 0.83052632, 0.85 ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('v error: ', rel_error(expected_v, config['v']))\n",
"print('m error: ', rel_error(expected_m, config['m']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите пару глубоких сетей с испольованием RMSProp и Adam алгоритмов обновления весов и сравните результаты обучения."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Получите лучшую полносвязную сеть для классификации вашего набора данных. На наборе CIFAR-10 необходимо получить accuracy не ниже 50 % на валидационном наборе."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"best_model = None\n",
"################################################################################\n",
"# TODO: Train the best FullyConnectedNet that you can on CIFAR-10. You might #\n",
"# find batch/layer normalization and dropout useful. Store your best model in #\n",
"# the best_model variable. #\n",
"################################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"################################################################################\n",
"# END OF YOUR CODE #\n",
"################################################################################"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Получите оценку accuracy для валидационной и тестовой выборок. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)\n",
"y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)\n",
"print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())\n",
"print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Нормализация по мини-батчам\n",
"\n",
"Идея нормализации по мини-батчам предложена в работе [1]\n",
"\n",
"[1] Sergey Ioffe and Christian Szegedy, \"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift\", ICML 2015."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для слоя батч-нормализации - функция batchnorm_forward в scripts/layers.py . Проверьте свою реализацию, запустив следующий код:"
]
},
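{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the train-time pass: normalize each feature over the batch, then scale and shift with the learnable gamma and beta, while maintaining running statistics for test time. The cache layout here is an assumption, reused by the backward sketch further below:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def batchnorm_forward_train_sketch(x, gamma, beta, bn_param):\n",
"    eps = bn_param.get('eps', 1e-5)\n",
"    momentum = bn_param.get('momentum', 0.9)\n",
"    mu = x.mean(axis=0)\n",
"    var = x.var(axis=0)\n",
"    x_hat = (x - mu) / np.sqrt(var + eps)\n",
"    out = gamma * x_hat + beta\n",
"    # Exponential running averages, used instead of batch stats at test time\n",
"    bn_param['running_mean'] = (momentum * bn_param.get('running_mean', np.zeros_like(mu))\n",
"                                + (1 - momentum) * mu)\n",
"    bn_param['running_var'] = (momentum * bn_param.get('running_var', np.zeros_like(var))\n",
"                               + (1 - momentum) * var)\n",
"    cache = (x_hat, gamma, np.sqrt(var + eps))\n",
"    return out, cache\n",
"```"
]
},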
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check the training-time forward pass by checking means and variances\n",
"# of features both before and after batch normalization \n",
"\n",
"# Simulate the forward pass for a two-layer network\n",
"np.random.seed(231)\n",
"N, D1, D2, D3 = 200, 50, 60, 3\n",
"X = np.random.randn(N, D1)\n",
"W1 = np.random.randn(D1, D2)\n",
"W2 = np.random.randn(D2, D3)\n",
"a = np.maximum(0, X.dot(W1)).dot(W2)\n",
"\n",
"print('Before batch normalization:')\n",
"print_mean_std(a,axis=0)\n",
"\n",
"gamma = np.ones((D3,))\n",
"beta = np.zeros((D3,))\n",
"# Means should be close to zero and stds close to one\n",
"print('After batch normalization (gamma=1, beta=0)')\n",
"a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n",
"print_mean_std(a_norm,axis=0)\n",
"\n",
"gamma = np.asarray([1.0, 2.0, 3.0])\n",
"beta = np.asarray([11.0, 12.0, 13.0])\n",
"# Now means should be close to beta and stds close to gamma\n",
"print('After batch normalization (gamma=', gamma, ', beta=', beta, ')')\n",
"a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n",
"print_mean_std(a_norm,axis=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check the test-time forward pass by running the training-time\n",
"# forward pass many times to warm up the running averages, and then\n",
"# checking the means and variances of activations after a test-time\n",
"# forward pass.\n",
"\n",
"np.random.seed(231)\n",
"N, D1, D2, D3 = 200, 50, 60, 3\n",
"W1 = np.random.randn(D1, D2)\n",
"W2 = np.random.randn(D2, D3)\n",
"\n",
"bn_param = {'mode': 'train'}\n",
"gamma = np.ones(D3)\n",
"beta = np.zeros(D3)\n",
"\n",
"for t in range(50):\n",
" X = np.random.randn(N, D1)\n",
" a = np.maximum(0, X.dot(W1)).dot(W2)\n",
" batchnorm_forward(a, gamma, beta, bn_param)\n",
"\n",
"bn_param['mode'] = 'test'\n",
"X = np.random.randn(N, D1)\n",
"a = np.maximum(0, X.dot(W1)).dot(W2)\n",
"a_norm, _ = batchnorm_forward(a, gamma, beta, bn_param)\n",
"\n",
"# Means should be close to zero and stds close to one, but will be\n",
"# noisier than training-time forward passes.\n",
"print('After batch normalization (test-time):')\n",
"print_mean_std(a_norm,axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход в функции batchnorm_backward."
]
},
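{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the compact form of the batchnorm gradient, assuming the cache layout (x_hat, gamma, std) from the forward sketch above; it follows from pushing the chain rule through the batch mean and variance:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def batchnorm_backward_sketch(dout, cache):\n",
"    x_hat, gamma, std = cache\n",
"    N = dout.shape[0]\n",
"    dgamma = (dout * x_hat).sum(axis=0)\n",
"    dbeta = dout.sum(axis=0)\n",
"    # dx collapses the mean and variance terms into one expression\n",
"    dx = (gamma / (N * std)) * (N * dout - dbeta - x_hat * dgamma)\n",
"    return dx, dgamma, dbeta\n",
"```"
]
},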
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Gradient check batchnorm backward pass\n",
"np.random.seed(231)\n",
"N, D = 4, 5\n",
"x = 5 * np.random.randn(N, D) + 12\n",
"gamma = np.random.randn(D)\n",
"beta = np.random.randn(D)\n",
"dout = np.random.randn(N, D)\n",
"\n",
"bn_param = {'mode': 'train'}\n",
"fx = lambda x: batchnorm_forward(x, gamma, beta, bn_param)[0]\n",
"fg = lambda a: batchnorm_forward(x, a, beta, bn_param)[0]\n",
"fb = lambda b: batchnorm_forward(x, gamma, b, bn_param)[0]\n",
"\n",
"dx_num = eval_numerical_gradient_array(fx, x, dout)\n",
"da_num = eval_numerical_gradient_array(fg, gamma.copy(), dout)\n",
"db_num = eval_numerical_gradient_array(fb, beta.copy(), dout)\n",
"\n",
"_, cache = batchnorm_forward(x, gamma, beta, bn_param)\n",
"dx, dgamma, dbeta = batchnorm_backward(dout, cache)\n",
"#You should expect to see relative errors between 1e-13 and 1e-8\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dgamma error: ', rel_error(da_num, dgamma))\n",
"print('dbeta error: ', rel_error(db_num, dbeta))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Измените реализацию класса FullyConnectedNet, добавив батч-нормализацию. \n",
"Если флаг normalization == \"batchnorm\", то вам необходимо вставить слой батч-нормализации перед каждым слоем активации ReLU, кроме выхода сети. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"# You should expect losses between 1e-4~1e-10 for W, \n",
"# losses between 1e-08~1e-10 for b,\n",
"# and losses between 1e-08~1e-09 for beta and gammas.\n",
"for reg in [0, 3.14]:\n",
" print('Running check with reg = ', reg)\n",
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
" reg=reg, weight_scale=5e-2, dtype=np.float64,\n",
" normalization='batchnorm')\n",
"\n",
" loss, grads = model.loss(X, y)\n",
" print('Initial loss: ', loss)\n",
"\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n",
" if reg == 0: print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите 6-ти слойную сеть на наборе из 1000 изображений с батч-нормализацией и без нее"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"# Try training a very deep net with batchnorm\n",
"hidden_dims = [100, 100, 100, 100, 100]\n",
"\n",
"num_train = 1000\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 2e-2\n",
"bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization='batchnorm')\n",
"model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n",
"\n",
"print('Solver with batch norm:')\n",
"bn_solver = Solver(bn_model, small_data,\n",
" num_epochs=10, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True,print_every=20)\n",
"bn_solver.train()\n",
"\n",
"print('\\nSolver without batch norm:')\n",
"solver = Solver(model, small_data,\n",
" num_epochs=10, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=20)\n",
"solver.train()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Визуализируйте процесс обучения для двух сетей. Увеличилась ли скорость сходимости в случае с батч-нормализацией? Сделайте выводы. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def plot_training_history(title, label, baseline, bn_solvers, plot_fn, bl_marker='.', bn_marker='.', labels=None):\n",
" \"\"\"utility function for plotting training history\"\"\"\n",
" plt.title(title)\n",
" plt.xlabel(label)\n",
" bn_plots = [plot_fn(bn_solver) for bn_solver in bn_solvers]\n",
" bl_plot = plot_fn(baseline)\n",
" num_bn = len(bn_plots)\n",
" for i in range(num_bn):\n",
" label='with_norm'\n",
" if labels is not None:\n",
" label += str(labels[i])\n",
" plt.plot(bn_plots[i], bn_marker, label=label)\n",
" label='baseline'\n",
" if labels is not None:\n",
" label += str(labels[0])\n",
" plt.plot(bl_plot, bl_marker, label=label)\n",
" plt.legend(loc='lower center', ncol=num_bn+1) \n",
"\n",
" \n",
"plt.subplot(3, 1, 1)\n",
"plot_training_history('Training loss','Iteration', solver, [bn_solver], \\\n",
" lambda x: x.loss_history, bl_marker='o', bn_marker='o')\n",
"plt.subplot(3, 1, 2)\n",
"plot_training_history('Training accuracy','Epoch', solver, [bn_solver], \\\n",
" lambda x: x.train_acc_history, bl_marker='-o', bn_marker='-o')\n",
"plt.subplot(3, 1, 3)\n",
"plot_training_history('Validation accuracy','Epoch', solver, [bn_solver], \\\n",
" lambda x: x.val_acc_history, bl_marker='-o', bn_marker='-o')\n",
"\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите 6-тислойную сеть с батч-нормализацией и без нее, используя разные размеры батча. Визуализируйте графики обучения. Сделайте выводы по результатам эксперимента. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def run_batchsize_experiments(normalization_mode):\n",
" np.random.seed(231)\n",
" # Try training a very deep net with batchnorm\n",
" hidden_dims = [100, 100, 100, 100, 100]\n",
" num_train = 1000\n",
" small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
" }\n",
" n_epochs=10\n",
" weight_scale = 2e-2\n",
" batch_sizes = [5,10,50]\n",
" lr = 10**(-3.5)\n",
" solver_bsize = batch_sizes[0]\n",
"\n",
" print('No normalization: batch size = ',solver_bsize)\n",
" model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n",
" solver = Solver(model, small_data,\n",
" num_epochs=n_epochs, batch_size=solver_bsize,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': lr,\n",
" },\n",
" verbose=False)\n",
" solver.train()\n",
" \n",
" bn_solvers = []\n",
" for i in range(len(batch_sizes)):\n",
" b_size=batch_sizes[i]\n",
" print('Normalization: batch size = ',b_size)\n",
" bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=normalization_mode)\n",
" bn_solver = Solver(bn_model, small_data,\n",
" num_epochs=n_epochs, batch_size=b_size,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': lr,\n",
" },\n",
" verbose=False)\n",
" bn_solver.train()\n",
" bn_solvers.append(bn_solver)\n",
" \n",
" return bn_solvers, solver, batch_sizes\n",
"\n",
"batch_sizes = [5,10,50]\n",
"bn_solvers_bsize, solver_bsize, batch_sizes = run_batchsize_experiments('batchnorm')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plot_training_history('Training accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n",
" lambda x: x.train_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n",
"plt.subplot(2, 1, 2)\n",
"plot_training_history('Validation accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n",
" lambda x: x.val_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n",
"\n",
"plt.gcf().set_size_inches(15, 10)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dropout"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для dropout-слоя в scripts/layers.py\n",
"\n",
"http://cs231n.github.io/neural-networks-2/#reg"
]
},
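{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of inverted dropout (the forward pass together with the matching backward pass asked for below). Here p is assumed to be the keep probability, consistent with dropout=1 meaning no dropout later in this notebook:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def dropout_forward_sketch(x, dropout_param):\n",
"    p, mode = dropout_param['p'], dropout_param['mode']\n",
"    if 'seed' in dropout_param:\n",
"        np.random.seed(dropout_param['seed'])\n",
"    if mode == 'train':\n",
"        mask = (np.random.rand(*x.shape) < p) / p  # scale at train time\n",
"        out = x * mask\n",
"    else:\n",
"        mask = None\n",
"        out = x  # test time is the identity thanks to the train-time scaling\n",
"    return out, (dropout_param, mask)\n",
"\n",
"def dropout_backward_sketch(dout, cache):\n",
"    dropout_param, mask = cache\n",
"    return dout * mask if dropout_param['mode'] == 'train' else dout\n",
"```"
]
},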
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(500, 500) + 10\n",
"\n",
"for p in [0.25, 0.4, 0.7]:\n",
" out, _ = dropout_forward(x, {'mode': 'train', 'p': p})\n",
" out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})\n",
"\n",
" print('Running tests with p = ', p)\n",
" print('Mean of input: ', x.mean())\n",
" print('Mean of train-time output: ', out.mean())\n",
" print('Mean of test-time output: ', out_test.mean())\n",
" print('Fraction of train-time output set to zero: ', (out == 0).mean())\n",
" print('Fraction of test-time output set to zero: ', (out_test == 0).mean())\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход для dropout-слоя"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 10) + 10\n",
"dout = np.random.randn(*x.shape)\n",
"\n",
"dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}\n",
"out, cache = dropout_forward(x, dropout_param)\n",
"dx = dropout_backward(dout, cache)\n",
"dx_num = eval_numerical_gradient_array(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)\n",
"\n",
"# Error should be around e-10 or less\n",
"print('dx relative error: ', rel_error(dx, dx_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Добавьте в реализацию класса FullyConnectedNet поддержку dropout. Если параметр dropout != 1, то добавьте в модель dropout-слой после каждого слоя активации. Проверьте свою реализацию"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"for dropout in [1, 0.75, 0.5]:\n",
" print('Running check with dropout = ', dropout)\n",
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
" weight_scale=5e-2, dtype=np.float64,\n",
" dropout=dropout, seed=123)\n",
"\n",
" loss, grads = model.loss(X, y)\n",
" print('Initial loss: ', loss)\n",
" \n",
" # Relative errors should be around e-6 or less; Note that it's fine\n",
" # if for dropout=1 you have W2 error be on the order of e-5.\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите две двухслойные сети с dropout-слоем (вероятность отсева 0,25) и без на наборе из 500 изображений. Визуализируйте графики обучения. Сделайте выводы по результатам эксперимента"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train two identical nets, one with dropout and one without\n",
"np.random.seed(231)\n",
"num_train = 500\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"solvers = {}\n",
"dropout_choices = [1, 0.25]\n",
"for dropout in dropout_choices:\n",
" model = FullyConnectedNet([500], dropout=dropout)\n",
" print(dropout)\n",
"\n",
" solver = Solver(model, small_data,\n",
" num_epochs=25, batch_size=100,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 5e-4,\n",
" },\n",
" verbose=True, print_every=100)\n",
" solver.train()\n",
" solvers[dropout] = solver\n",
" print()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plot train and validation accuracies of the two models\n",
"\n",
"train_accs = []\n",
"val_accs = []\n",
"for dropout in dropout_choices:\n",
" solver = solvers[dropout]\n",
" train_accs.append(solver.train_acc_history[-1])\n",
" val_accs.append(solver.val_acc_history[-1])\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"for dropout in dropout_choices:\n",
" plt.plot(solvers[dropout].train_acc_history, 'o', label='%.2f dropout' % dropout)\n",
"plt.title('Train accuracy')\n",
"plt.xlabel('Epoch')\n",
"plt.ylabel('Accuracy')\n",
"plt.legend(ncol=2, loc='lower right')\n",
" \n",
"plt.subplot(3, 1, 2)\n",
"for dropout in dropout_choices:\n",
" plt.plot(solvers[dropout].val_acc_history, 'o', label='%.2f dropout' % dropout)\n",
"plt.title('Val accuracy')\n",
"plt.xlabel('Epoch')\n",
"plt.ylabel('Accuracy')\n",
"plt.legend(ncol=2, loc='lower right')\n",
"\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Сверточные нейронные сети (CNN)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для сверточного слоя - функция conv_forward_naive в scripts/layers.py юПроверьте свою реализацию, запустив код ниже "
]
},
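{
"cell_type": "markdown",
"metadata": {},
"source": [
"A loop-based sketch that favors clarity over speed (the cache layout is an assumption, reused by the backward sketch below):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def conv_forward_naive_sketch(x, w, b, conv_param):\n",
"    N, C, H, W = x.shape\n",
"    F, _, HH, WW = w.shape\n",
"    stride, pad = conv_param['stride'], conv_param['pad']\n",
"    H_out = 1 + (H + 2 * pad - HH) // stride\n",
"    W_out = 1 + (W + 2 * pad - WW) // stride\n",
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))  # zero padding\n",
"    out = np.zeros((N, F, H_out, W_out))\n",
"    for n in range(N):\n",
"        for f in range(F):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    h0, w0 = i * stride, j * stride\n",
"                    window = x_pad[n, :, h0:h0 + HH, w0:w0 + WW]\n",
"                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]\n",
"    cache = (x, w, b, conv_param)\n",
"    return out, cache\n",
"```"
]
},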
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_shape = (2, 3, 4, 4)\n",
"w_shape = (3, 3, 4, 4)\n",
"x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)\n",
"w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)\n",
"b = np.linspace(-0.1, 0.2, num=3)\n",
"\n",
"conv_param = {'stride': 2, 'pad': 1}\n",
"out, _ = conv_forward_naive(x, w, b, conv_param)\n",
"correct_out = np.array([[[[-0.08759809, -0.10987781],\n",
" [-0.18387192, -0.2109216 ]],\n",
" [[ 0.21027089, 0.21661097],\n",
" [ 0.22847626, 0.23004637]],\n",
" [[ 0.50813986, 0.54309974],\n",
" [ 0.64082444, 0.67101435]]],\n",
" [[[-0.98053589, -1.03143541],\n",
" [-1.19128892, -1.24695841]],\n",
" [[ 0.69108355, 0.66880383],\n",
" [ 0.59480972, 0.56776003]],\n",
" [[ 2.36270298, 2.36904306],\n",
" [ 2.38090835, 2.38247847]]]])\n",
"\n",
"# Compare your output to ours; difference should be around e-8\n",
"print('Testing conv_forward_naive')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход - функция conv_backward_naive в scripts/layers.py"
]
},
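{
"cell_type": "markdown",
"metadata": {},
"source": [
"A backward sketch that replays every window of the forward pass and accumulates gradients, assuming the cache layout (x, w, b, conv_param) from the forward sketch above:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def conv_backward_naive_sketch(dout, cache):\n",
"    x, w, b, conv_param = cache\n",
"    stride, pad = conv_param['stride'], conv_param['pad']\n",
"    N, C, H, W = x.shape\n",
"    F, _, HH, WW = w.shape\n",
"    _, _, H_out, W_out = dout.shape\n",
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))\n",
"    dx_pad = np.zeros_like(x_pad)\n",
"    dw = np.zeros_like(w)\n",
"    db = dout.sum(axis=(0, 2, 3))  # each filter's bias touches every output\n",
"    for n in range(N):\n",
"        for f in range(F):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    h0, w0 = i * stride, j * stride\n",
"                    window = x_pad[n, :, h0:h0 + HH, w0:w0 + WW]\n",
"                    dw[f] += window * dout[n, f, i, j]\n",
"                    dx_pad[n, :, h0:h0 + HH, w0:w0 + WW] += w[f] * dout[n, f, i, j]\n",
"    dx = dx_pad[:, :, pad:pad + H, pad:pad + W]  # strip the padding\n",
"    return dx, dw, db\n",
"```"
]
},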
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(4, 3, 5, 5)\n",
"w = np.random.randn(2, 3, 3, 3)\n",
"b = np.random.randn(2,)\n",
"dout = np.random.randn(4, 2, 5, 5)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)\n",
"\n",
"out, cache = conv_forward_naive(x, w, b, conv_param)\n",
"dx, dw, db = conv_backward_naive(dout, cache)\n",
"\n",
"# Your errors should be around e-8 or less.\n",
"print('Testing conv_backward_naive function')\n",
"print('dx error: ', rel_error(dx, dx_num))\n",
"print('dw error: ', rel_error(dw, dw_num))\n",
"print('db error: ', rel_error(db, db_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для max-pooling слоя -функция max_pool_forward_naive в scripts/layers.py"
]
},
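{
"cell_type": "markdown",
"metadata": {},
"source": [
"A loop-based sketch: each output element is the maximum of its pooling window.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def max_pool_forward_naive_sketch(x, pool_param):\n",
"    N, C, H, W = x.shape\n",
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n",
"    stride = pool_param['stride']\n",
"    H_out = 1 + (H - ph) // stride\n",
"    W_out = 1 + (W - pw) // stride\n",
"    out = np.zeros((N, C, H_out, W_out))\n",
"    for i in range(H_out):\n",
"        for j in range(W_out):\n",
"            h0, w0 = i * stride, j * stride\n",
"            out[:, :, i, j] = x[:, :, h0:h0 + ph, w0:w0 + pw].max(axis=(2, 3))\n",
"    cache = (x, pool_param)\n",
"    return out, cache\n",
"```"
]
},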
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_shape = (2, 3, 4, 4)\n",
"x = np.linspace(-0.3, 0.4, num=np.prod(x_shape)).reshape(x_shape)\n",
"pool_param = {'pool_width': 2, 'pool_height': 2, 'stride': 2}\n",
"\n",
"out, _ = max_pool_forward_naive(x, pool_param)\n",
"\n",
"correct_out = np.array([[[[-0.26315789, -0.24842105],\n",
" [-0.20421053, -0.18947368]],\n",
" [[-0.14526316, -0.13052632],\n",
" [-0.08631579, -0.07157895]],\n",
" [[-0.02736842, -0.01263158],\n",
" [ 0.03157895, 0.04631579]]],\n",
" [[[ 0.09052632, 0.10526316],\n",
" [ 0.14947368, 0.16421053]],\n",
" [[ 0.20842105, 0.22315789],\n",
" [ 0.26736842, 0.28210526]],\n",
" [[ 0.32631579, 0.34105263],\n",
" [ 0.38526316, 0.4 ]]]])\n",
"\n",
"# Compare your output with ours. Difference should be on the order of e-8.\n",
"print('Testing max_pool_forward_naive function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход для max-pooling слоя в max_pool_backward_naive . "
]
},
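{
"cell_type": "markdown",
"metadata": {},
"source": [
"A backward sketch, assuming the cache layout (x, pool_param) from the forward sketch above: each upstream gradient is routed to the argmax of its pooling window.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def max_pool_backward_naive_sketch(dout, cache):\n",
"    x, pool_param = cache\n",
"    N, C, H, W = x.shape\n",
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n",
"    stride = pool_param['stride']\n",
"    _, _, H_out, W_out = dout.shape\n",
"    dx = np.zeros_like(x)\n",
"    for n in range(N):\n",
"        for c in range(C):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    h0, w0 = i * stride, j * stride\n",
"                    window = x[n, c, h0:h0 + ph, w0:w0 + pw]\n",
"                    a, b = np.unravel_index(window.argmax(), window.shape)\n",
"                    dx[n, c, h0 + a, w0 + b] += dout[n, c, i, j]\n",
"    return dx\n",
"```"
]
},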
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(3, 2, 8, 8)\n",
"dout = np.random.randn(3, 2, 4, 4)\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: max_pool_forward_naive(x, pool_param)[0], x, dout)\n",
"\n",
"out, cache = max_pool_forward_naive(x, pool_param)\n",
"dx = max_pool_backward_naive(dout, cache)\n",
"\n",
"# Your error should be on the order of e-12\n",
"print('Testing max_pool_backward_naive function:')\n",
"print('dx error: ', rel_error(dx, dx_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В скрипте scripts/fast_layers.py представлены быстрые реализации слоев свертки и пуллинга, написанных с использованием Cython. \n",
"\n",
"Для компиляции выполните следующую команду в директории scripts\n",
"\n",
"```bash\n",
"python setup.py build_ext --inplace\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Сравните ваши реализации слоев свертки и пуллинга с быстрыми реализациями."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rel errors should be around e-9 or less\n",
"from scripts.fast_layers import conv_forward_fast, conv_backward_fast\n",
"from time import time\n",
"np.random.seed(231)\n",
"x = np.random.randn(100, 3, 31, 31)\n",
"w = np.random.randn(25, 3, 3, 3)\n",
"b = np.random.randn(25,)\n",
"dout = np.random.randn(100, 25, 16, 16)\n",
"conv_param = {'stride': 2, 'pad': 1}\n",
"\n",
"t0 = time()\n",
"out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)\n",
"t1 = time()\n",
"out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)\n",
"t2 = time()\n",
"\n",
"print('Testing conv_forward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('Fast: %fs' % (t2 - t1))\n",
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('Difference: ', rel_error(out_naive, out_fast))\n",
"\n",
"t0 = time()\n",
"dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)\n",
"t1 = time()\n",
"dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)\n",
"t2 = time()\n",
"\n",
"print('\\nTesting conv_backward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('Fast: %fs' % (t2 - t1))\n",
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('dx difference: ', rel_error(dx_naive, dx_fast))\n",
"print('dw difference: ', rel_error(dw_naive, dw_fast))\n",
"print('db difference: ', rel_error(db_naive, db_fast))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Relative errors should be close to 0.0\n",
"from scripts.fast_layers import max_pool_forward_fast, max_pool_backward_fast\n",
"np.random.seed(231)\n",
"x = np.random.randn(100, 3, 32, 32)\n",
"dout = np.random.randn(100, 3, 16, 16)\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"t0 = time()\n",
"out_naive, cache_naive = max_pool_forward_naive(x, pool_param)\n",
"t1 = time()\n",
"out_fast, cache_fast = max_pool_forward_fast(x, pool_param)\n",
"t2 = time()\n",
"\n",
"print('Testing pool_forward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('fast: %fs' % (t2 - t1))\n",
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('difference: ', rel_error(out_naive, out_fast))\n",
"\n",
"t0 = time()\n",
"dx_naive = max_pool_backward_naive(dout, cache_naive)\n",
"t1 = time()\n",
"dx_fast = max_pool_backward_fast(dout, cache_fast)\n",
"t2 = time()\n",
"\n",
"print('\\nTesting pool_backward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('fast: %fs' % (t2 - t1))\n",
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('dx difference: ', rel_error(dx_naive, dx_fast))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В layer_utils.py вы можете найти часто используемые комбинации слоев, используемых в сверточных сетях. Ознакомьтесь с ними и запустите код ниже для проверки их работы"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.layer_utils import conv_relu_pool_forward, conv_relu_pool_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 16, 16)\n",
"w = np.random.randn(3, 3, 3, 3)\n",
"b = np.random.randn(3,)\n",
"dout = np.random.randn(2, 3, 8, 8)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)\n",
"dx, dw, db = conv_relu_pool_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], b, dout)\n",
"\n",
"# Relative errors should be around e-8 or less\n",
"print('Testing conv_relu_pool')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.layer_utils import conv_relu_forward, conv_relu_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 8, 8)\n",
"w = np.random.randn(3, 3, 3, 3)\n",
"b = np.random.randn(3,)\n",
"dout = np.random.randn(2, 3, 8, 8)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"\n",
"out, cache = conv_relu_forward(x, w, b, conv_param)\n",
"dx, dw, db = conv_relu_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_forward(x, w, b, conv_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_forward(x, w, b, conv_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_forward(x, w, b, conv_param)[0], b, dout)\n",
"\n",
"# Relative errors should be around e-8 or less\n",
"print('Testing conv_relu:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Напишите реализацию класса ThreeLayerConvNet в scripts/classifiers/cnn.py . Вы можете использовать готовые реализации слоев и их комбинаций."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Проверьте вашу реализацию. Ожидается, что значение функции потерь softmax будет порядка `log(C)` для `C` классов для случая без регуляризации. В случае регуляризации значение функции потерь должно немного возрасти. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = ThreeLayerConvNet()\n",
"\n",
"N = 50\n",
"X = np.random.randn(N, 3, 32, 32)\n",
"y = np.random.randint(10, size=N)\n",
"\n",
"loss, grads = model.loss(X, y)\n",
"print('Initial loss (no regularization): ', loss)\n",
"\n",
"model.reg = 0.5\n",
"loss, grads = model.loss(X, y)\n",
"print('Initial loss (with regularization): ', loss)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Проверьте реализацию обратного прохода"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_inputs = 2\n",
"input_dim = (3, 16, 16)\n",
"reg = 0.0\n",
"num_classes = 10\n",
"np.random.seed(231)\n",
"X = np.random.randn(num_inputs, *input_dim)\n",
"y = np.random.randint(num_classes, size=num_inputs)\n",
"\n",
"model = ThreeLayerConvNet(num_filters=3, filter_size=3,\n",
" input_dim=input_dim, hidden_dim=7,\n",
" dtype=np.float64)\n",
"loss, grads = model.loss(X, y)\n",
"# Errors should be small, but correct implementations may have\n",
"# relative errors up to the order of e-2\n",
"for param_name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)\n",
" e = rel_error(param_grad_num, grads[param_name])\n",
" print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Попробуйте добиться эффекта переобучения. Обучите модель на небольшом наборе данных.Сравните значения accuracy на обучающих данных и на валидационных. Визуализируйте графики обучения "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"\n",
"num_train = 100\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"model = ThreeLayerConvNet(weight_scale=1e-2)\n",
"\n",
"solver = Solver(model, small_data,\n",
" num_epochs=15, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=1)\n",
"solver.train()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final training accuracy\n",
"print(\n",
" \"Small data training accuracy:\",\n",
" solver.check_accuracy(small_data['X_train'], small_data['y_train'])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final validation accuracy\n",
"print(\n",
" \"Small data validation accuracy:\",\n",
" solver.check_accuracy(small_data['X_val'], small_data['y_val'])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.xlabel('iteration')\n",
"plt.ylabel('loss')\n",
"\n",
"plt.subplot(2, 1, 2)\n",
"plt.plot(solver.train_acc_history, '-o')\n",
"plt.plot(solver.val_acc_history, '-o')\n",
"plt.legend(['train', 'val'], loc='upper left')\n",
"plt.xlabel('epoch')\n",
"plt.ylabel('accuracy')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите сеть на полном наборе данных. Выведите accuracy на обучающей и валидационной выборках"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)\n",
"\n",
"solver = Solver(model, data,\n",
" num_epochs=1, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=20)\n",
"solver.train()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final training accuracy\n",
"print(\n",
" \"Full data training accuracy:\",\n",
" solver.check_accuracy(small_data['X_train'], small_data['y_train'])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final validation accuracy\n",
"print(\n",
" \"Full data validation accuracy:\",\n",
" solver.check_accuracy(data['X_val'], data['y_val'])\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Визуализируйте фильтры на первом слое обученной сети"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.vis_utils import visualize_grid\n",
"\n",
"grid = visualize_grid(model.params['W1'].transpose(0, 2, 3, 1))\n",
"plt.imshow(grid.astype('uint8'))\n",
"plt.axis('off')\n",
"plt.gcf().set_size_inches(5, 5)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}