{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Лабораторная работа 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1) Полносвязная нейронная сеть ( Fully-Connected Neural Network)\n",
"\n",
"2) Нормализация по мини-батчам (Batch normalization)\n",
"\n",
"3) Dropout\n",
"\n",
"4) Сверточные нейронные сети (Convolutional Networks)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Лабораторные работы можно выполнять с использованием сервиса Google Colaboratory (https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d) или на локальном компьютере. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Полносвязная нейронная сеть"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В данной лабораторной работе необходимо будет реализовать полносвязную нейронную сеть, используя модульный подход. Для каждого слоя реализации прямого и обратного проходов алгоритма обратного распространения ошибки будут иметь следующий вид:\n",
"\n",
"```python\n",
"def layer_forward(x, w):\n",
" \"\"\" Receive inputs x and weights w \"\"\"\n",
" # Do some computations ...\n",
" z = # ... some intermediate value\n",
" # Do some more computations ...\n",
" out = # the output\n",
" \n",
" cache = (x, w, z, out) # Values we need to compute gradients\n",
" \n",
" return out, cache\n",
"```\n",
"\n",
"\n",
"\n",
"```python\n",
"def layer_backward(dout, cache):\n",
" \"\"\"\n",
" Receive dout (derivative of loss with respect to outputs) and cache,\n",
" and compute derivative with respect to inputs.\n",
" \"\"\"\n",
" # Unpack cache values\n",
" x, w, z, out = cache\n",
" \n",
" # Use values in cache to compute derivatives\n",
" dx = # Derivative of loss with respect to x\n",
" dw = # Derivative of loss with respect to w\n",
" \n",
" return dx, dw\n",
"```\n",
"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=========== You can safely ignore the message below if you are NOT working on ConvolutionalNetworks.ipynb ===========\n",
"\tYou will need to compile a Cython extension for a portion of this assignment.\n",
"\tThe instructions to do this will be given in a section of the notebook below.\n",
"\tThere will be an option for Colab users and another for Jupyter (local) users.\n"
]
}
],
"source": [
"from __future__ import print_function\n",
"import time\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from scripts.classifiers.fc_net import *\n",
"\n",
"from scripts.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array\n",
"from scripts.solver import Solver\n",
"from scripts.classifiers.cnn import *\n",
"from scripts.layers import *\n",
"from scripts.fast_layers import *\n",
"\n",
"\n",
"%matplotlib inline\n",
"plt.rcParams['figure.figsize'] = (10.0, 8.0) \n",
"plt.rcParams['image.interpolation'] = 'nearest'\n",
"plt.rcParams['image.cmap'] = 'gray'\n",
"\n",
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"def rel_error(x, y):\n",
" \"\"\" returns relative error \"\"\"\n",
" return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))\n",
"def print_mean_std(x,axis=0):\n",
" print(' means: ', x.mean(axis=axis))\n",
" print(' stds: ', x.std(axis=axis))\n",
" print() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Загрузите данные из предыдущей лабораторной работы. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Для полносвязного слоя реализуйте прямой проход (метод affine_forward в scripts/layers.py). Протестируйте свою реализацию. "
]
},
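{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here is a minimal sketch of one possible affine_forward implementation (an illustration, not the reference solution), assuming the input of shape (N, d_1, ..., d_k) is flattened to (N, D) before the matrix product:\n",
"\n",
"```python\n",
"def affine_forward(x, w, b):\n",
"    # Flatten each example into a row of length D = d_1 * ... * d_k\n",
"    N = x.shape[0]\n",
"    out = x.reshape(N, -1).dot(w) + b\n",
"    cache = (x, w, b)\n",
"    return out, cache\n",
"```"
]
},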
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_inputs = 2\n",
"input_shape = (4, 5, 6)\n",
"output_dim = 3\n",
"\n",
"input_size = num_inputs * np.prod(input_shape)\n",
"weight_size = output_dim * np.prod(input_shape)\n",
"\n",
"x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)\n",
"w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)\n",
"b = np.linspace(-0.3, 0.1, num=output_dim)\n",
"\n",
"out, _ = affine_forward(x, w, b)\n",
"correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],\n",
" [ 3.25553199, 3.5141327, 3.77273342]])\n",
"\n",
"\n",
"print('Testing affine_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Для полносвязного слоя реализуйте обратный проход (метод affine_backward в scripts/layers.py). Протестируйте свою реализацию. "
]
},
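{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible affine_backward sketch, consistent with the cache layout (x, w, b) used in the forward sketch above:\n",
"\n",
"```python\n",
"def affine_backward(dout, cache):\n",
"    # dout has shape (N, M); restore gradients to the original shapes\n",
"    x, w, b = cache\n",
"    N = x.shape[0]\n",
"    dx = dout.dot(w.T).reshape(x.shape)\n",
"    dw = x.reshape(N, -1).T.dot(dout)\n",
"    db = dout.sum(axis=0)\n",
"    return dx, dw, db\n",
"```"
]
},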
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 2, 3)\n",
"w = np.random.randn(6, 5)\n",
"b = np.random.randn(5)\n",
"dout = np.random.randn(10, 5)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)\n",
"\n",
"_, cache = affine_forward(x, w, b)\n",
"dx, dw, db = affine_backward(dout, cache)\n",
"\n",
"print('Testing affine_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для слоя активации ReLU (relu_forward) и протестируйте его."
]
},
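{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to write relu_forward is a single elementwise maximum; the cache only needs the input:\n",
"\n",
"```python\n",
"def relu_forward(x):\n",
"    out = np.maximum(0, x)\n",
"    cache = x\n",
"    return out, cache\n",
"```"
]
},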
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)\n",
"\n",
"out, _ = relu_forward(x)\n",
"correct_out = np.array([[ 0., 0., 0., 0., ],\n",
" [ 0., 0., 0.04545455, 0.13636364,],\n",
" [ 0.22727273, 0.31818182, 0.40909091, 0.5, ]])\n",
"\n",
"# Compare your output with ours. The error should be on the order of e-8\n",
"print('Testing relu_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход для слоя активации ReLU (relu_backward ) и протестируйте его."
]
},
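{
"cell_type": "markdown",
"metadata": {},
"source": [
"A matching relu_backward sketch (assuming the cache stores the input x): the gradient passes only through the positions where the forward input was positive.\n",
"\n",
"```python\n",
"def relu_backward(dout, cache):\n",
"    x = cache\n",
"    # Zero the gradient wherever the forward input was <= 0\n",
"    dx = dout * (x > 0)\n",
"    return dx\n",
"```"
]
},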
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 10)\n",
"dout = np.random.randn(*x.shape)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)\n",
"\n",
"_, cache = relu_forward(x)\n",
"dx = relu_backward(dout, cache)\n",
"\n",
"# The error should be on the order of e-12\n",
"print('Testing relu_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В скрипте /layer_utils.py приведены реализации прямого и обратного проходов для часто используемых комбинаций слоев. Например, за полносвязным слоем часто следует слой активации. Ознакомьтесь с функциями affine_relu_forward и affine_relu_backward, запустите код ниже и убедитесь, что ошибка порядка e-10 или ниже. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.layer_utils import affine_relu_forward, affine_relu_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 4)\n",
"w = np.random.randn(12, 10)\n",
"b = np.random.randn(10)\n",
"dout = np.random.randn(2, 10)\n",
"\n",
"out, cache = affine_relu_forward(x, w, b)\n",
"dx, dw, db = affine_relu_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)\n",
"\n",
"# Relative error should be around e-10 or less\n",
"print('Testing affine_relu_forward and affine_relu_backward:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте двухслойную полносвязную сеть - класс TwoLayerNet в scripts/classifiers/fc_net.py . Проверьте свою реализацию, запустив код ниже. "
]
},
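{
"cell_type": "markdown",
"metadata": {},
"source": [
"A compact sketch of the affine - relu - affine - softmax architecture that TwoLayerNet implements. It assumes the affine/ReLU helpers above and the softmax_loss function from scripts/layers.py; treat it as an outline rather than the reference solution:\n",
"\n",
"```python\n",
"class TwoLayerNet(object):\n",
"    def __init__(self, input_dim=3*32*32, hidden_dim=100, num_classes=10,\n",
"                 weight_scale=1e-3, reg=0.0):\n",
"        self.reg = reg\n",
"        self.params = {\n",
"            'W1': weight_scale * np.random.randn(input_dim, hidden_dim),\n",
"            'b1': np.zeros(hidden_dim),\n",
"            'W2': weight_scale * np.random.randn(hidden_dim, num_classes),\n",
"            'b2': np.zeros(num_classes),\n",
"        }\n",
"\n",
"    def loss(self, X, y=None):\n",
"        # Forward pass: affine - relu - affine\n",
"        h, cache1 = affine_relu_forward(X, self.params['W1'], self.params['b1'])\n",
"        scores, cache2 = affine_forward(h, self.params['W2'], self.params['b2'])\n",
"        if y is None:\n",
"            return scores\n",
"        # Softmax data loss plus L2 regularization on the weight matrices\n",
"        loss, dscores = softmax_loss(scores, y)\n",
"        loss += 0.5 * self.reg * (np.sum(self.params['W1'] ** 2) +\n",
"                                  np.sum(self.params['W2'] ** 2))\n",
"        dh, dW2, db2 = affine_backward(dscores, cache2)\n",
"        _, dW1, db1 = affine_relu_backward(dh, cache1)\n",
"        grads = {'W1': dW1 + self.reg * self.params['W1'], 'b1': db1,\n",
"                 'W2': dW2 + self.reg * self.params['W2'], 'b2': db2}\n",
"        return loss, grads\n",
"```"
]
},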
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H, C = 3, 5, 50, 7\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=N)\n",
"\n",
"std = 1e-3\n",
"model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)\n",
"\n",
"print('Testing initialization ... ')\n",
"W1_std = abs(model.params['W1'].std() - std)\n",
"b1 = model.params['b1']\n",
"W2_std = abs(model.params['W2'].std() - std)\n",
"b2 = model.params['b2']\n",
"assert W1_std < std / 10, 'First layer weights do not seem right'\n",
"assert np.all(b1 == 0), 'First layer biases do not seem right'\n",
"assert W2_std < std / 10, 'Second layer weights do not seem right'\n",
"assert np.all(b2 == 0), 'Second layer biases do not seem right'\n",
"\n",
"print('Testing test-time forward pass ... ')\n",
"model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)\n",
"model.params['b1'] = np.linspace(-0.1, 0.9, num=H)\n",
"model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)\n",
"model.params['b2'] = np.linspace(-0.9, 0.1, num=C)\n",
"X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T\n",
"scores = model.loss(X)\n",
"correct_scores = np.asarray(\n",
" [[11.53165108, 12.2917344, 13.05181771, 13.81190102, 14.57198434, 15.33206765, 16.09215096],\n",
" [12.05769098, 12.74614105, 13.43459113, 14.1230412, 14.81149128, 15.49994135, 16.18839143],\n",
" [12.58373087, 13.20054771, 13.81736455, 14.43418138, 15.05099822, 15.66781506, 16.2846319 ]])\n",
"scores_diff = np.abs(scores - correct_scores).sum()\n",
"assert scores_diff < 1e-6, 'Problem with test-time forward pass'\n",
"\n",
"print('Testing training loss (no regularization)')\n",
"y = np.asarray([0, 5, 1])\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 3.4702243556\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'\n",
"\n",
"model.reg = 1.0\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 26.5948426952\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'\n",
"\n",
"# Errors should be around e-7 or less\n",
"for reg in [0.0, 0.7]:\n",
" print('Running numeric gradient check with reg = ', reg)\n",
" model.reg = reg\n",
" loss, grads = model.loss(X, y)\n",
"\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ознакомьтесь с API для обучения и тестирования моделей в scripts/solver.py . Используйте экземпляр класса Solver для обучения двухслойной полносвязной сети. Необходимо достичь минимум 50% верно классифицированных объектов на валидационном наборе. "
]
},
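{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of a typical Solver configuration (the hyperparameter values here are illustrative starting points, not tuned answers):\n",
"\n",
"```python\n",
"model = TwoLayerNet(hidden_dim=100)\n",
"solver = Solver(model, data,\n",
"                update_rule='sgd',\n",
"                optim_config={'learning_rate': 1e-3},\n",
"                lr_decay=0.95,\n",
"                num_epochs=10, batch_size=100,\n",
"                print_every=100)\n",
"solver.train()\n",
"```"
]
},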
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = TwoLayerNet()\n",
"solver = None\n",
"\n",
"##############################################################################\n",
"# TODO: Use a Solver instance to train a TwoLayerNet that achieves at least #\n",
"# 50% accuracy on the validation set. #\n",
"##############################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"##############################################################################\n",
"# END OF YOUR CODE #\n",
"##############################################################################"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(2, 1, 2)\n",
"plt.title('Accuracy')\n",
"plt.plot(solver.train_acc_history, '-o', label='train')\n",
"plt.plot(solver.val_acc_history, '-o', label='val')\n",
"plt.plot([0.5] * len(solver.val_acc_history), 'k--')\n",
"plt.xlabel('Epoch')\n",
"plt.legend(loc='lower right')\n",
"plt.gcf().set_size_inches(15, 12)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Теперь реализуйте полносвязную сеть с произвольным числом скрытых слоев. Ознакомьтесь с классом FullyConnectedNet в scripts/classifiers/fc_net.py . Реализуйте инициализацию, прямой и обратный проходы."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"for reg in [0, 3.14]:\n",
" print('Running check with reg = ', reg)\n",
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
" reg=reg, weight_scale=5e-2, dtype=np.float64)\n",
"\n",
" loss, grads = model.loss(X, y)\n",
" print('Initial loss: ', loss)\n",
" \n",
" # Most of the errors should be on the order of e-7 or smaller. \n",
" # NOTE: It is fine however to see an error for W2 on the order of e-5\n",
" # for the check when reg = 0.0\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Попробуйте добиться эффекта переобучения на небольшом наборе изображений (например, 50). Используйте трехслойную сеть со 100 нейронами на каждом скрытом слое. Попробуйте переобучить сеть, достигнув 100 % accuracy за 20 эпох. Для этого поэкспериментируйте с параметрами weight_scale и learning_rate. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a three-layer Net to overfit 50 training examples by \n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 1e-2 # Experiment with this!\n",
"learning_rate = 1e-4 # Experiment with this!\n",
"model = FullyConnectedNet([100, 100],\n",
" weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
" print_every=10, num_epochs=20, batch_size=25,\n",
" update_rule='sgd',\n",
" optim_config={\n",
" 'learning_rate': learning_rate,\n",
" }\n",
" )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Повторите эксперимент, описанный выше, для пятислойной сети."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a five-layer Net to overfit 50 training examples by \n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"learning_rate = 2e-3 # Experiment with this!\n",
"weight_scale = 1e-5 # Experiment with this!\n",
"model = FullyConnectedNet([100, 100, 100, 100],\n",
" weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
" print_every=10, num_epochs=20, batch_size=25,\n",
" update_rule='sgd',\n",
" optim_config={\n",
" 'learning_rate': learning_rate,\n",
" }\n",
" )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Сделайте выводы по проведенному эксперименту. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ранее обновление весов проходило по правилу SGD. Теперь попробуйте реализовать стохастический градиентный спуск с импульсом (SGD+momentum). http://cs231n.github.io/neural-networks-3/#sgd Реализуйте sgd_momentum в scripts/optim.py и запустите проверку. "
]
},
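{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the classical momentum update. The defaults mirror the check below, which supplies the velocity and learning rate and relies on a default momentum of 0.9:\n",
"\n",
"```python\n",
"def sgd_momentum(w, dw, config=None):\n",
"    if config is None:\n",
"        config = {}\n",
"    config.setdefault('learning_rate', 1e-2)\n",
"    config.setdefault('momentum', 0.9)\n",
"    v = config.get('velocity', np.zeros_like(w))\n",
"\n",
"    # Accumulate an exponentially decaying velocity, then step along it\n",
"    v = config['momentum'] * v - config['learning_rate'] * dw\n",
"    next_w = w + v\n",
"\n",
"    config['velocity'] = v\n",
"    return next_w, config\n",
"```"
]
},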
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.optim import sgd_momentum\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-3, 'velocity': v}\n",
"next_w, _ = sgd_momentum(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
" [ 0.1406, 0.20738947, 0.27417895, 0.34096842, 0.40775789],\n",
" [ 0.47454737, 0.54133684, 0.60812632, 0.67491579, 0.74170526],\n",
" [ 0.80849474, 0.87528421, 0.94207368, 1.00886316, 1.07565263],\n",
" [ 1.14244211, 1.20923158, 1.27602105, 1.34281053, 1.4096 ]])\n",
"expected_velocity = np.asarray([\n",
" [ 0.5406, 0.55475789, 0.56891579, 0.58307368, 0.59723158],\n",
" [ 0.61138947, 0.62554737, 0.63970526, 0.65386316, 0.66802105],\n",
" [ 0.68217895, 0.69633684, 0.71049474, 0.72465263, 0.73881053],\n",
" [ 0.75296842, 0.76712632, 0.78128421, 0.79544211, 0.8096 ]])\n",
"\n",
"# Should see relative errors around e-8 or less\n",
"print('next_w error: ', rel_error(next_w, expected_next_w))\n",
"print('velocity error: ', rel_error(expected_velocity, config['velocity']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Сравните результаты обучения шестислойной сети, обученной классическим градиентным спуском и адаптивным алгоритмом с импульсом. Какой алгоритм сходится быстрее."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_train = 4000\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"solvers = {}\n",
"\n",
"for update_rule in ['sgd', 'sgd_momentum']:\n",
" print('running with ', update_rule)\n",
" model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)\n",
"\n",
" solver = Solver(model, small_data,\n",
" num_epochs=5, batch_size=100,\n",
" update_rule=update_rule,\n",
" optim_config={\n",
" 'learning_rate': 5e-3,\n",
" },\n",
" verbose=True)\n",
" solvers[update_rule] = solver\n",
" solver.train()\n",
" print()\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(3, 1, 2)\n",
"plt.title('Training accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"plt.subplot(3, 1, 3)\n",
"plt.title('Validation accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"for update_rule, solver in solvers.items():\n",
" plt.subplot(3, 1, 1)\n",
" plt.plot(solver.loss_history, 'o', label=\"loss_%s\" % update_rule)\n",
" \n",
" plt.subplot(3, 1, 2)\n",
" plt.plot(solver.train_acc_history, '-o', label=\"train_acc_%s\" % update_rule)\n",
"\n",
" plt.subplot(3, 1, 3)\n",
" plt.plot(solver.val_acc_history, '-o', label=\"val_acc_%s\" % update_rule)\n",
" \n",
"for i in [1, 2, 3]:\n",
" plt.subplot(3, 1, i)\n",
" plt.legend(loc='upper center', ncol=4)\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте алгоритмы RMSProp [1] and Adam [2] с коррекцией смещения - методы rmsprop и adam . \n",
"\n",
"\n",
"[1] Tijmen Tieleman and Geoffrey Hinton. \"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude.\" COURSERA: Neural Networks for Machine Learning 4 (2012).\n",
"\n",
"[2] Diederik Kingma and Jimmy Ba, \"Adam: A Method for Stochastic Optimization\", ICLR 2015."
]
},
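{
"cell_type": "markdown",
"metadata": {},
"source": [
"Possible sketches of both update rules. The default hyperparameters follow common practice and the checks below; note that Adam increments the timestep t before applying the bias correction:\n",
"\n",
"```python\n",
"def rmsprop(w, dw, config=None):\n",
"    if config is None:\n",
"        config = {}\n",
"    config.setdefault('learning_rate', 1e-2)\n",
"    config.setdefault('decay_rate', 0.99)\n",
"    config.setdefault('epsilon', 1e-8)\n",
"    config.setdefault('cache', np.zeros_like(w))\n",
"\n",
"    # Moving average of squared gradients normalizes the step size\n",
"    config['cache'] = (config['decay_rate'] * config['cache'] +\n",
"                       (1 - config['decay_rate']) * dw ** 2)\n",
"    next_w = w - (config['learning_rate'] * dw /\n",
"                  (np.sqrt(config['cache']) + config['epsilon']))\n",
"    return next_w, config\n",
"\n",
"def adam(w, dw, config=None):\n",
"    if config is None:\n",
"        config = {}\n",
"    config.setdefault('learning_rate', 1e-3)\n",
"    config.setdefault('beta1', 0.9)\n",
"    config.setdefault('beta2', 0.999)\n",
"    config.setdefault('epsilon', 1e-8)\n",
"    config.setdefault('m', np.zeros_like(w))\n",
"    config.setdefault('v', np.zeros_like(w))\n",
"    config.setdefault('t', 0)\n",
"\n",
"    # First and second moment estimates with bias correction\n",
"    config['t'] += 1\n",
"    config['m'] = config['beta1'] * config['m'] + (1 - config['beta1']) * dw\n",
"    config['v'] = config['beta2'] * config['v'] + (1 - config['beta2']) * dw ** 2\n",
"    mt = config['m'] / (1 - config['beta1'] ** config['t'])\n",
"    vt = config['v'] / (1 - config['beta2'] ** config['t'])\n",
"    next_w = w - config['learning_rate'] * mt / (np.sqrt(vt) + config['epsilon'])\n",
"    return next_w, config\n",
"```"
]
},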
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test RMSProp implementation\n",
"from scripts.optim import rmsprop\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"cache = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'cache': cache}\n",
"next_w, _ = rmsprop(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
" [-0.39223849, -0.34037513, -0.28849239, -0.23659121, -0.18467247],\n",
" [-0.132737, -0.08078555, -0.02881884, 0.02316247, 0.07515774],\n",
" [ 0.12716641, 0.17918792, 0.23122175, 0.28326742, 0.33532447],\n",
" [ 0.38739248, 0.43947102, 0.49155973, 0.54365823, 0.59576619]])\n",
"expected_cache = np.asarray([\n",
" [ 0.5976, 0.6126277, 0.6277108, 0.64284931, 0.65804321],\n",
" [ 0.67329252, 0.68859723, 0.70395734, 0.71937285, 0.73484377],\n",
" [ 0.75037008, 0.7659518, 0.78158892, 0.79728144, 0.81302936],\n",
" [ 0.82883269, 0.84469141, 0.86060554, 0.87657507, 0.8926 ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('cache error: ', rel_error(expected_cache, config['cache']))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test Adam implementation\n",
"from scripts.optim import adam\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"m = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.7, 0.5, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'm': m, 'v': v, 't': 5}\n",
"next_w, _ = adam(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
" [-0.40094747, -0.34836187, -0.29577703, -0.24319299, -0.19060977],\n",
" [-0.1380274, -0.08544591, -0.03286534, 0.01971428, 0.0722929],\n",
" [ 0.1248705, 0.17744702, 0.23002243, 0.28259667, 0.33516969],\n",
" [ 0.38774145, 0.44031188, 0.49288093, 0.54544852, 0.59801459]])\n",
"expected_v = np.asarray([\n",
" [ 0.69966, 0.68908382, 0.67851319, 0.66794809, 0.65738853,],\n",
" [ 0.64683452, 0.63628604, 0.6257431, 0.61520571, 0.60467385,],\n",
" [ 0.59414753, 0.58362676, 0.57311152, 0.56260183, 0.55209767,],\n",
" [ 0.54159906, 0.53110598, 0.52061845, 0.51013645, 0.49966, ]])\n",
"expected_m = np.asarray([\n",
" [ 0.48, 0.49947368, 0.51894737, 0.53842105, 0.55789474],\n",
" [ 0.57736842, 0.59684211, 0.61631579, 0.63578947, 0.65526316],\n",
" [ 0.67473684, 0.69421053, 0.71368421, 0.73315789, 0.75263158],\n",
" [ 0.77210526, 0.79157895, 0.81105263, 0.83052632, 0.85 ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('v error: ', rel_error(expected_v, config['v']))\n",
"print('m error: ', rel_error(expected_m, config['m']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите пару глубоких сетей с испольованием RMSProp и Adam алгоритмов обновления весов и сравните результаты обучения."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Получите лучшую полносвязную сеть для классификации вашего набора данных. На наборе CIFAR-10 необходимо получить accuracy не ниже 50 % на валидационном наборе."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"best_model = None\n",
"################################################################################\n",
"# TODO: Train the best FullyConnectedNet that you can on CIFAR-10. You might #\n",
"# find batch/layer normalization and dropout useful. Store your best model in #\n",
"# the best_model variable. #\n",
"################################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"################################################################################\n",
"# END OF YOUR CODE #\n",
"################################################################################"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Получите оценку accuracy для валидационной и тестовой выборок. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)\n",
"y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)\n",
"print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())\n",
"print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Нормализация по мини-батчам\n",
"\n",
"Идея нормализации по мини-батчам предложена в работе [1]\n",
"\n",
"[1] Sergey Ioffe and Christian Szegedy, \"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift\", ICML 2015."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для слоя батч-нормализации - функция batchnorm_forward в scripts/layers.py . Проверьте свою реализацию, запустив следующий код:"
]
},
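{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of one possible batchnorm_forward. The cache layout and running-average bookkeeping are one particular choice, not the only valid one:\n",
"\n",
"```python\n",
"def batchnorm_forward(x, gamma, beta, bn_param):\n",
"    mode = bn_param['mode']\n",
"    eps = bn_param.get('eps', 1e-5)\n",
"    momentum = bn_param.get('momentum', 0.9)\n",
"\n",
"    N, D = x.shape\n",
"    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))\n",
"    running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype))\n",
"\n",
"    if mode == 'train':\n",
"        # Normalize with the batch statistics, then scale and shift\n",
"        mu = x.mean(axis=0)\n",
"        var = x.var(axis=0)\n",
"        x_hat = (x - mu) / np.sqrt(var + eps)\n",
"        out = gamma * x_hat + beta\n",
"        cache = (x_hat, gamma, var, eps)\n",
"        # Exponential moving averages used at test time\n",
"        running_mean = momentum * running_mean + (1 - momentum) * mu\n",
"        running_var = momentum * running_var + (1 - momentum) * var\n",
"    elif mode == 'test':\n",
"        x_hat = (x - running_mean) / np.sqrt(running_var + eps)\n",
"        out = gamma * x_hat + beta\n",
"        cache = None\n",
"    else:\n",
"        raise ValueError('Invalid forward batchnorm mode: %s' % mode)\n",
"\n",
"    bn_param['running_mean'] = running_mean\n",
"    bn_param['running_var'] = running_var\n",
"    return out, cache\n",
"```"
]
},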
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check the training-time forward pass by checking means and variances\n",
"# of features both before and after batch normalization \n",
"\n",
"# Simulate the forward pass for a two-layer network\n",
"np.random.seed(231)\n",
"N, D1, D2, D3 = 200, 50, 60, 3\n",
"X = np.random.randn(N, D1)\n",
"W1 = np.random.randn(D1, D2)\n",
"W2 = np.random.randn(D2, D3)\n",
"a = np.maximum(0, X.dot(W1)).dot(W2)\n",
"\n",
"print('Before batch normalization:')\n",
"print_mean_std(a,axis=0)\n",
"\n",
"gamma = np.ones((D3,))\n",
"beta = np.zeros((D3,))\n",
"# Means should be close to zero and stds close to one\n",
"print('After batch normalization (gamma=1, beta=0)')\n",
"a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n",
"print_mean_std(a_norm,axis=0)\n",
"\n",
"gamma = np.asarray([1.0, 2.0, 3.0])\n",
"beta = np.asarray([11.0, 12.0, 13.0])\n",
"# Now means should be close to beta and stds close to gamma\n",
"print('After batch normalization (gamma=', gamma, ', beta=', beta, ')')\n",
"a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n",
"print_mean_std(a_norm,axis=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check the test-time forward pass by running the training-time\n",
"# forward pass many times to warm up the running averages, and then\n",
"# checking the means and variances of activations after a test-time\n",
"# forward pass.\n",
"\n",
"np.random.seed(231)\n",
"N, D1, D2, D3 = 200, 50, 60, 3\n",
"W1 = np.random.randn(D1, D2)\n",
"W2 = np.random.randn(D2, D3)\n",
"\n",
"bn_param = {'mode': 'train'}\n",
"gamma = np.ones(D3)\n",
"beta = np.zeros(D3)\n",
"\n",
"for t in range(50):\n",
" X = np.random.randn(N, D1)\n",
" a = np.maximum(0, X.dot(W1)).dot(W2)\n",
" batchnorm_forward(a, gamma, beta, bn_param)\n",
"\n",
"bn_param['mode'] = 'test'\n",
"X = np.random.randn(N, D1)\n",
"a = np.maximum(0, X.dot(W1)).dot(W2)\n",
"a_norm, _ = batchnorm_forward(a, gamma, beta, bn_param)\n",
"\n",
"# Means should be close to zero and stds close to one, but will be\n",
"# noisier than training-time forward passes.\n",
"print('After batch normalization (test-time):')\n",
"print_mean_std(a_norm,axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход в функции batchnorm_backward."
]
},
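{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the backward pass in collapsed form, assuming the forward cache stores (x_hat, gamma, var, eps) as in the sketch above:\n",
"\n",
"```python\n",
"def batchnorm_backward(dout, cache):\n",
"    x_hat, gamma, var, eps = cache\n",
"\n",
"    dbeta = dout.sum(axis=0)\n",
"    dgamma = (dout * x_hat).sum(axis=0)\n",
"\n",
"    # Backprop through the normalization, collapsed into one expression\n",
"    dx_hat = dout * gamma\n",
"    dx = (dx_hat - dx_hat.mean(axis=0) -\n",
"          x_hat * (dx_hat * x_hat).mean(axis=0)) / np.sqrt(var + eps)\n",
"    return dx, dgamma, dbeta\n",
"```"
]
},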
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Gradient check batchnorm backward pass\n",
"np.random.seed(231)\n",
"N, D = 4, 5\n",
"x = 5 * np.random.randn(N, D) + 12\n",
"gamma = np.random.randn(D)\n",
"beta = np.random.randn(D)\n",
"dout = np.random.randn(N, D)\n",
"\n",
"bn_param = {'mode': 'train'}\n",
"fx = lambda x: batchnorm_forward(x, gamma, beta, bn_param)[0]\n",
"fg = lambda a: batchnorm_forward(x, a, beta, bn_param)[0]\n",
"fb = lambda b: batchnorm_forward(x, gamma, b, bn_param)[0]\n",
"\n",
"dx_num = eval_numerical_gradient_array(fx, x, dout)\n",
"da_num = eval_numerical_gradient_array(fg, gamma.copy(), dout)\n",
"db_num = eval_numerical_gradient_array(fb, beta.copy(), dout)\n",
"\n",
"_, cache = batchnorm_forward(x, gamma, beta, bn_param)\n",
"dx, dgamma, dbeta = batchnorm_backward(dout, cache)\n",
"#You should expect to see relative errors between 1e-13 and 1e-8\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dgamma error: ', rel_error(da_num, dgamma))\n",
"print('dbeta error: ', rel_error(db_num, dbeta))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Измените реализацию класса FullyConnectedNet, добавив батч-нормализацию. \n",
"Если флаг normalization == \"batchnorm\", то вам необходимо вставить слой батч-нормализации перед каждым слоем активации ReLU, кроме выхода сети. "
]
},
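{
"cell_type": "markdown",
"metadata": {},
"source": [
"One convenient way to structure this change is a fused helper pair in the spirit of scripts/layer_utils.py. The names affine_bn_relu_forward and affine_bn_relu_backward below are illustrative, not part of the handout:\n",
"\n",
"```python\n",
"def affine_bn_relu_forward(x, w, b, gamma, beta, bn_param):\n",
"    a, fc_cache = affine_forward(x, w, b)\n",
"    a_bn, bn_cache = batchnorm_forward(a, gamma, beta, bn_param)\n",
"    out, relu_cache = relu_forward(a_bn)\n",
"    return out, (fc_cache, bn_cache, relu_cache)\n",
"\n",
"def affine_bn_relu_backward(dout, cache):\n",
"    fc_cache, bn_cache, relu_cache = cache\n",
"    da_bn = relu_backward(dout, relu_cache)\n",
"    da, dgamma, dbeta = batchnorm_backward(da_bn, bn_cache)\n",
"    dx, dw, db = affine_backward(da, fc_cache)\n",
"    return dx, dw, db, dgamma, dbeta\n",
"```"
]
},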
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"# You should expect losses between 1e-4~1e-10 for W, \n",
"# losses between 1e-08~1e-10 for b,\n",
"# and losses between 1e-08~1e-09 for beta and gammas.\n",
"for reg in [0, 3.14]:\n",
" print('Running check with reg = ', reg)\n",
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
" reg=reg, weight_scale=5e-2, dtype=np.float64,\n",
" normalization='batchnorm')\n",
"\n",
" loss, grads = model.loss(X, y)\n",
" print('Initial loss: ', loss)\n",
"\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n",
" if reg == 0: print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите 6-ти слойную сеть на наборе из 1000 изображений с батч-нормализацией и без нее"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"# Try training a very deep net with batchnorm\n",
"hidden_dims = [100, 100, 100, 100, 100]\n",
"\n",
"num_train = 1000\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 2e-2\n",
"bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization='batchnorm')\n",
"model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n",
"\n",
"print('Solver with batch norm:')\n",
"bn_solver = Solver(bn_model, small_data,\n",
" num_epochs=10, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True,print_every=20)\n",
"bn_solver.train()\n",
"\n",
"print('\\nSolver without batch norm:')\n",
"solver = Solver(model, small_data,\n",
" num_epochs=10, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=20)\n",
"solver.train()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Визуализируйте процесс обучения для двух сетей. Увеличилась ли скорость сходимости в случае с батч-нормализацией? Сделайте выводы. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def plot_training_history(title, label, baseline, bn_solvers, plot_fn, bl_marker='.', bn_marker='.', labels=None):\n",
" \"\"\"utility function for plotting training history\"\"\"\n",
" plt.title(title)\n",
" plt.xlabel(label)\n",
" bn_plots = [plot_fn(bn_solver) for bn_solver in bn_solvers]\n",
" bl_plot = plot_fn(baseline)\n",
" num_bn = len(bn_plots)\n",
" for i in range(num_bn):\n",
" label='with_norm'\n",
" if labels is not None:\n",
" label += str(labels[i])\n",
" plt.plot(bn_plots[i], bn_marker, label=label)\n",
" label='baseline'\n",
" if labels is not None:\n",
" label += str(labels[0])\n",
" plt.plot(bl_plot, bl_marker, label=label)\n",
" plt.legend(loc='lower center', ncol=num_bn+1) \n",
"\n",
" \n",
"plt.subplot(3, 1, 1)\n",
"plot_training_history('Training loss','Iteration', solver, [bn_solver], \\\n",
" lambda x: x.loss_history, bl_marker='o', bn_marker='o')\n",
"plt.subplot(3, 1, 2)\n",
"plot_training_history('Training accuracy','Epoch', solver, [bn_solver], \\\n",
" lambda x: x.train_acc_history, bl_marker='-o', bn_marker='-o')\n",
"plt.subplot(3, 1, 3)\n",
"plot_training_history('Validation accuracy','Epoch', solver, [bn_solver], \\\n",
" lambda x: x.val_acc_history, bl_marker='-o', bn_marker='-o')\n",
"\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите 6-тислойную сеть с батч-нормализацией и без нее, используя разные размеры батча. Визуализируйте графики обучения. Сделайте выводы по результатам эксперимента. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def run_batchsize_experiments(normalization_mode):\n",
" np.random.seed(231)\n",
" # Try training a very deep net with batchnorm\n",
" hidden_dims = [100, 100, 100, 100, 100]\n",
" num_train = 1000\n",
" small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
" }\n",
" n_epochs=10\n",
" weight_scale = 2e-2\n",
" batch_sizes = [5,10,50]\n",
" lr = 10**(-3.5)\n",
" solver_bsize = batch_sizes[0]\n",
"\n",
" print('No normalization: batch size = ',solver_bsize)\n",
" model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n",
" solver = Solver(model, small_data,\n",
" num_epochs=n_epochs, batch_size=solver_bsize,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': lr,\n",
" },\n",
" verbose=False)\n",
" solver.train()\n",
" \n",
" bn_solvers = []\n",
" for i in range(len(batch_sizes)):\n",
" b_size=batch_sizes[i]\n",
" print('Normalization: batch size = ',b_size)\n",
" bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=normalization_mode)\n",
" bn_solver = Solver(bn_model, small_data,\n",
" num_epochs=n_epochs, batch_size=b_size,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': lr,\n",
" },\n",
" verbose=False)\n",
" bn_solver.train()\n",
" bn_solvers.append(bn_solver)\n",
" \n",
" return bn_solvers, solver, batch_sizes\n",
"\n",
"batch_sizes = [5,10,50]\n",
"bn_solvers_bsize, solver_bsize, batch_sizes = run_batchsize_experiments('batchnorm')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plot_training_history('Training accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n",
" lambda x: x.train_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n",
"plt.subplot(2, 1, 2)\n",
"plot_training_history('Validation accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n",
" lambda x: x.val_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n",
"\n",
"plt.gcf().set_size_inches(15, 10)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dropout"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для dropout-слоя в scripts/layers.py\n",
"\n",
"http://cs231n.github.io/neural-networks-2/#reg"
]
},
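{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of inverted dropout, assuming p in dropout_param is interpreted as the keep probability (so p = 1 disables dropout). Activations are scaled by 1/p at train time so that the test-time pass is an identity:\n",
"\n",
"```python\n",
"def dropout_forward(x, dropout_param):\n",
"    p, mode = dropout_param['p'], dropout_param['mode']\n",
"    if 'seed' in dropout_param:\n",
"        np.random.seed(dropout_param['seed'])\n",
"\n",
"    if mode == 'train':\n",
"        # Inverted dropout: scale here so no scaling is needed at test time\n",
"        mask = (np.random.rand(*x.shape) < p) / p\n",
"        out = x * mask\n",
"    else:\n",
"        mask = None\n",
"        out = x\n",
"    cache = (dropout_param, mask)\n",
"    return out, cache\n",
"```"
]
},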
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(500, 500) + 10\n",
"\n",
"for p in [0.25, 0.4, 0.7]:\n",
" out, _ = dropout_forward(x, {'mode': 'train', 'p': p})\n",
" out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})\n",
"\n",
" print('Running tests with p = ', p)\n",
" print('Mean of input: ', x.mean())\n",
" print('Mean of train-time output: ', out.mean())\n",
" print('Mean of test-time output: ', out_test.mean())\n",
" print('Fraction of train-time output set to zero: ', (out == 0).mean())\n",
" print('Fraction of test-time output set to zero: ', (out_test == 0).mean())\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход для dropout-слоя"
]
},
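{
"cell_type": "markdown",
"metadata": {},
"source": [
"The matching backward pass simply reuses the stored mask (assuming the cache layout from the sketch above):\n",
"\n",
"```python\n",
"def dropout_backward(dout, cache):\n",
"    dropout_param, mask = cache\n",
"    if dropout_param['mode'] == 'train':\n",
"        return dout * mask\n",
"    return dout\n",
"```"
]
},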
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 10) + 10\n",
"dout = np.random.randn(*x.shape)\n",
"\n",
"dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}\n",
"out, cache = dropout_forward(x, dropout_param)\n",
"dx = dropout_backward(dout, cache)\n",
"dx_num = eval_numerical_gradient_array(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)\n",
"\n",
"# Error should be around e-10 or less\n",
"print('dx relative error: ', rel_error(dx, dx_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Добавьте в реализацию класса FullyConnectedNet поддержку dropout. Если параметр dropout != 1, то добавьте в модель dropout-слой после каждого слоя активации. Проверьте свою реализацию"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"for dropout in [1, 0.75, 0.5]:\n",
" print('Running check with dropout = ', dropout)\n",
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
" weight_scale=5e-2, dtype=np.float64,\n",
" dropout=dropout, seed=123)\n",
"\n",
" loss, grads = model.loss(X, y)\n",
" print('Initial loss: ', loss)\n",
" \n",
" # Relative errors should be around e-6 or less; Note that it's fine\n",
" # if for dropout=1 you have W2 error be on the order of e-5.\n",
" for name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите две двухслойные сети с dropout-слоем (вероятность отсева 0,25) и без на наборе из 500 изображений. Визуализируйте графики обучения. Сделайте выводы по результатам эксперимента"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train two identical nets, one with dropout and one without\n",
"np.random.seed(231)\n",
"num_train = 500\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"solvers = {}\n",
"dropout_choices = [1, 0.25]\n",
"for dropout in dropout_choices:\n",
" model = FullyConnectedNet([500], dropout=dropout)\n",
" print(dropout)\n",
"\n",
" solver = Solver(model, small_data,\n",
" num_epochs=25, batch_size=100,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 5e-4,\n",
" },\n",
" verbose=True, print_every=100)\n",
" solver.train()\n",
" solvers[dropout] = solver\n",
" print()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plot train and validation accuracies of the two models\n",
"\n",
"train_accs = []\n",
"val_accs = []\n",
"for dropout in dropout_choices:\n",
" solver = solvers[dropout]\n",
" train_accs.append(solver.train_acc_history[-1])\n",
" val_accs.append(solver.val_acc_history[-1])\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"for dropout in dropout_choices:\n",
" plt.plot(solvers[dropout].train_acc_history, 'o', label='%.2f dropout' % dropout)\n",
"plt.title('Train accuracy')\n",
"plt.xlabel('Epoch')\n",
"plt.ylabel('Accuracy')\n",
"plt.legend(ncol=2, loc='lower right')\n",
" \n",
"plt.subplot(3, 1, 2)\n",
"for dropout in dropout_choices:\n",
" plt.plot(solvers[dropout].val_acc_history, 'o', label='%.2f dropout' % dropout)\n",
"plt.title('Val accuracy')\n",
"plt.xlabel('Epoch')\n",
"plt.ylabel('Accuracy')\n",
"plt.legend(ncol=2, loc='lower right')\n",
"\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Сверточные нейронные сети (CNN)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для сверточного слоя - функция conv_forward_naive в scripts/layers.py юПроверьте свою реализацию, запустив код ниже "
]
},
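{
"cell_type": "markdown",
"metadata": {},
"source": [
"A naive, loop-based sketch of the forward convolution: zero-pad the input, then slide each filter over it with the given stride. Only correctness matters here, not speed:\n",
"\n",
"```python\n",
"def conv_forward_naive(x, w, b, conv_param):\n",
"    stride, pad = conv_param['stride'], conv_param['pad']\n",
"    N, C, H, W = x.shape\n",
"    F, _, HH, WW = w.shape\n",
"    H_out = 1 + (H + 2 * pad - HH) // stride\n",
"    W_out = 1 + (W + 2 * pad - WW) // stride\n",
"\n",
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')\n",
"    out = np.zeros((N, F, H_out, W_out))\n",
"    for n in range(N):\n",
"        for f in range(F):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    window = x_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW]\n",
"                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]\n",
"    cache = (x, w, b, conv_param)\n",
"    return out, cache\n",
"```"
]
},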
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_shape = (2, 3, 4, 4)\n",
"w_shape = (3, 3, 4, 4)\n",
"x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)\n",
"w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)\n",
"b = np.linspace(-0.1, 0.2, num=3)\n",
"\n",
"conv_param = {'stride': 2, 'pad': 1}\n",
"out, _ = conv_forward_naive(x, w, b, conv_param)\n",
"correct_out = np.array([[[[-0.08759809, -0.10987781],\n",
" [-0.18387192, -0.2109216 ]],\n",
" [[ 0.21027089, 0.21661097],\n",
" [ 0.22847626, 0.23004637]],\n",
" [[ 0.50813986, 0.54309974],\n",
" [ 0.64082444, 0.67101435]]],\n",
" [[[-0.98053589, -1.03143541],\n",
" [-1.19128892, -1.24695841]],\n",
" [[ 0.69108355, 0.66880383],\n",
" [ 0.59480972, 0.56776003]],\n",
" [[ 2.36270298, 2.36904306],\n",
" [ 2.38090835, 2.38247847]]]])\n",
"\n",
"# Compare your output to ours; difference should be around e-8\n",
"print('Testing conv_forward_naive')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход - функция conv_backward_naive в scripts/layers.py"
]
},
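{
"cell_type": "markdown",
"metadata": {},
"source": [
"A naive backward sketch that mirrors the forward loops, assuming cache = (x, w, b, conv_param) as in the forward sketch: each output gradient is scattered back into dw and into the padded dx.\n",
"\n",
"```python\n",
"def conv_backward_naive(dout, cache):\n",
"    x, w, b, conv_param = cache\n",
"    stride, pad = conv_param['stride'], conv_param['pad']\n",
"    N, C, H, W = x.shape\n",
"    F, _, HH, WW = w.shape\n",
"    _, _, H_out, W_out = dout.shape\n",
"\n",
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')\n",
"    dx_pad = np.zeros_like(x_pad)\n",
"    dw = np.zeros_like(w)\n",
"    db = dout.sum(axis=(0, 2, 3))\n",
"\n",
"    for n in range(N):\n",
"        for f in range(F):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    window = x_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW]\n",
"                    dw[f] += window * dout[n, f, i, j]\n",
"                    dx_pad[n, :, i*stride:i*stride+HH,\n",
"                           j*stride:j*stride+WW] += w[f] * dout[n, f, i, j]\n",
"\n",
"    # Strip the padding to recover dx in the original shape\n",
"    dx = dx_pad[:, :, pad:pad+H, pad:pad+W]\n",
"    return dx, dw, db\n",
"```"
]
},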
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(4, 3, 5, 5)\n",
"w = np.random.randn(2, 3, 3, 3)\n",
"b = np.random.randn(2,)\n",
"dout = np.random.randn(4, 2, 5, 5)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)\n",
"\n",
"out, cache = conv_forward_naive(x, w, b, conv_param)\n",
"dx, dw, db = conv_backward_naive(dout, cache)\n",
"\n",
"# Your errors should be around e-8 or less.\n",
"print('Testing conv_backward_naive function')\n",
"print('dx error: ', rel_error(dx, dx_num))\n",
"print('dw error: ', rel_error(dw, dw_num))\n",
"print('db error: ', rel_error(db, db_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте прямой проход для max-pooling слоя -функция max_pool_forward_naive в scripts/layers.py"
]
},
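{
"cell_type": "markdown",
"metadata": {},
"source": [
"A naive sketch of the forward max-pooling pass, taking the maximum over each window:\n",
"\n",
"```python\n",
"def max_pool_forward_naive(x, pool_param):\n",
"    N, C, H, W = x.shape\n",
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n",
"    stride = pool_param['stride']\n",
"    H_out = 1 + (H - ph) // stride\n",
"    W_out = 1 + (W - pw) // stride\n",
"\n",
"    out = np.zeros((N, C, H_out, W_out))\n",
"    for i in range(H_out):\n",
"        for j in range(W_out):\n",
"            window = x[:, :, i*stride:i*stride+ph, j*stride:j*stride+pw]\n",
"            out[:, :, i, j] = window.max(axis=(2, 3))\n",
"    cache = (x, pool_param)\n",
"    return out, cache\n",
"```"
]
},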
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_shape = (2, 3, 4, 4)\n",
"x = np.linspace(-0.3, 0.4, num=np.prod(x_shape)).reshape(x_shape)\n",
"pool_param = {'pool_width': 2, 'pool_height': 2, 'stride': 2}\n",
"\n",
"out, _ = max_pool_forward_naive(x, pool_param)\n",
"\n",
"correct_out = np.array([[[[-0.26315789, -0.24842105],\n",
" [-0.20421053, -0.18947368]],\n",
" [[-0.14526316, -0.13052632],\n",
" [-0.08631579, -0.07157895]],\n",
" [[-0.02736842, -0.01263158],\n",
" [ 0.03157895, 0.04631579]]],\n",
" [[[ 0.09052632, 0.10526316],\n",
" [ 0.14947368, 0.16421053]],\n",
" [[ 0.20842105, 0.22315789],\n",
" [ 0.26736842, 0.28210526]],\n",
" [[ 0.32631579, 0.34105263],\n",
" [ 0.38526316, 0.4 ]]]])\n",
"\n",
"# Compare your output with ours. Difference should be on the order of e-8.\n",
"print('Testing max_pool_forward_naive function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Реализуйте обратный проход для max-pooling слоя в max_pool_backward_naive . "
]
},
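{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the backward pass, assuming cache = (x, pool_param) as above. The gradient is routed only to the element that achieved the maximum in each window (in this simple version, tied maxima all receive the gradient):\n",
"\n",
"```python\n",
"def max_pool_backward_naive(dout, cache):\n",
"    x, pool_param = cache\n",
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n",
"    stride = pool_param['stride']\n",
"    _, _, H_out, W_out = dout.shape\n",
"\n",
"    dx = np.zeros_like(x)\n",
"    for i in range(H_out):\n",
"        for j in range(W_out):\n",
"            window = x[:, :, i*stride:i*stride+ph, j*stride:j*stride+pw]\n",
"            # Mask of positions holding the window maximum\n",
"            mask = (window == window.max(axis=(2, 3), keepdims=True))\n",
"            grad = mask * dout[:, :, i, j][:, :, None, None]\n",
"            dx[:, :, i*stride:i*stride+ph, j*stride:j*stride+pw] += grad\n",
"    return dx\n",
"```"
]
},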
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(3, 2, 8, 8)\n",
"dout = np.random.randn(3, 2, 4, 4)\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: max_pool_forward_naive(x, pool_param)[0], x, dout)\n",
"\n",
"out, cache = max_pool_forward_naive(x, pool_param)\n",
"dx = max_pool_backward_naive(dout, cache)\n",
"\n",
"# Your error should be on the order of e-12\n",
"print('Testing max_pool_backward_naive function:')\n",
"print('dx error: ', rel_error(dx, dx_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В скрипте scripts/fast_layers.py представлены быстрые реализации слоев свертки и пуллинга, написанных с использованием Cython. \n",
"\n",
"Для компиляции выполните следующую команду в директории scripts\n",
"\n",
"```bash\n",
"python setup.py build_ext --inplace\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Сравните ваши реализации слоев свертки и пуллинга с быстрыми реализациями."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rel errors should be around e-9 or less\n",
"from scripts.fast_layers import conv_forward_fast, conv_backward_fast\n",
"from time import time\n",
"np.random.seed(231)\n",
"x = np.random.randn(100, 3, 31, 31)\n",
"w = np.random.randn(25, 3, 3, 3)\n",
"b = np.random.randn(25,)\n",
"dout = np.random.randn(100, 25, 16, 16)\n",
"conv_param = {'stride': 2, 'pad': 1}\n",
"\n",
"t0 = time()\n",
"out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)\n",
"t1 = time()\n",
"out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)\n",
"t2 = time()\n",
"\n",
"print('Testing conv_forward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('Fast: %fs' % (t2 - t1))\n",
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('Difference: ', rel_error(out_naive, out_fast))\n",
"\n",
"t0 = time()\n",
"dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)\n",
"t1 = time()\n",
"dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)\n",
"t2 = time()\n",
"\n",
"print('\\nTesting conv_backward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('Fast: %fs' % (t2 - t1))\n",
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('dx difference: ', rel_error(dx_naive, dx_fast))\n",
"print('dw difference: ', rel_error(dw_naive, dw_fast))\n",
"print('db difference: ', rel_error(db_naive, db_fast))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Relative errors should be close to 0.0\n",
"from scripts.fast_layers import max_pool_forward_fast, max_pool_backward_fast\n",
"np.random.seed(231)\n",
"x = np.random.randn(100, 3, 32, 32)\n",
"dout = np.random.randn(100, 3, 16, 16)\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"t0 = time()\n",
"out_naive, cache_naive = max_pool_forward_naive(x, pool_param)\n",
"t1 = time()\n",
"out_fast, cache_fast = max_pool_forward_fast(x, pool_param)\n",
"t2 = time()\n",
"\n",
"print('Testing pool_forward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('fast: %fs' % (t2 - t1))\n",
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('difference: ', rel_error(out_naive, out_fast))\n",
"\n",
"t0 = time()\n",
"dx_naive = max_pool_backward_naive(dout, cache_naive)\n",
"t1 = time()\n",
"dx_fast = max_pool_backward_fast(dout, cache_fast)\n",
"t2 = time()\n",
"\n",
"print('\\nTesting pool_backward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('fast: %fs' % (t2 - t1))\n",
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('dx difference: ', rel_error(dx_naive, dx_fast))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В layer_utils.py вы можете найти часто используемые комбинации слоев, используемых в сверточных сетях. Ознакомьтесь с ними и запустите код ниже для проверки их работы"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.layer_utils import conv_relu_pool_forward, conv_relu_pool_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 16, 16)\n",
"w = np.random.randn(3, 3, 3, 3)\n",
"b = np.random.randn(3,)\n",
"dout = np.random.randn(2, 3, 8, 8)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)\n",
"dx, dw, db = conv_relu_pool_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], b, dout)\n",
"\n",
"# Relative errors should be around e-8 or less\n",
"print('Testing conv_relu_pool')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.layer_utils import conv_relu_forward, conv_relu_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 8, 8)\n",
"w = np.random.randn(3, 3, 3, 3)\n",
"b = np.random.randn(3,)\n",
"dout = np.random.randn(2, 3, 8, 8)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"\n",
"out, cache = conv_relu_forward(x, w, b, conv_param)\n",
"dx, dw, db = conv_relu_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_forward(x, w, b, conv_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_forward(x, w, b, conv_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_forward(x, w, b, conv_param)[0], b, dout)\n",
"\n",
"# Relative errors should be around e-8 or less\n",
"print('Testing conv_relu:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Напишите реализацию класса ThreeLayerConvNet в scripts/classifiers/cnn.py . Вы можете использовать готовые реализации слоев и их комбинаций."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Проверьте вашу реализацию. Ожидается, что значение функции потерь softmax будет порядка `log(C)` для `C` классов для случая без регуляризации. В случае регуляризации значение функции потерь должно немного возрасти. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = ThreeLayerConvNet()\n",
"\n",
"N = 50\n",
"X = np.random.randn(N, 3, 32, 32)\n",
"y = np.random.randint(10, size=N)\n",
"\n",
"loss, grads = model.loss(X, y)\n",
"print('Initial loss (no regularization): ', loss)\n",
"\n",
"model.reg = 0.5\n",
"loss, grads = model.loss(X, y)\n",
"print('Initial loss (with regularization): ', loss)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Проверьте реализацию обратного прохода"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_inputs = 2\n",
"input_dim = (3, 16, 16)\n",
"reg = 0.0\n",
"num_classes = 10\n",
"np.random.seed(231)\n",
"X = np.random.randn(num_inputs, *input_dim)\n",
"y = np.random.randint(num_classes, size=num_inputs)\n",
"\n",
"model = ThreeLayerConvNet(num_filters=3, filter_size=3,\n",
" input_dim=input_dim, hidden_dim=7,\n",
" dtype=np.float64)\n",
"loss, grads = model.loss(X, y)\n",
"# Errors should be small, but correct implementations may have\n",
"# relative errors up to the order of e-2\n",
"for param_name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)\n",
" e = rel_error(param_grad_num, grads[param_name])\n",
" print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Попробуйте добиться эффекта переобучения. Обучите модель на небольшом наборе данных.Сравните значения accuracy на обучающих данных и на валидационных. Визуализируйте графики обучения "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"\n",
"num_train = 100\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"model = ThreeLayerConvNet(weight_scale=1e-2)\n",
"\n",
"solver = Solver(model, small_data,\n",
" num_epochs=15, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=1)\n",
"solver.train()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final training accuracy\n",
"print(\n",
" \"Small data training accuracy:\",\n",
" solver.check_accuracy(small_data['X_train'], small_data['y_train'])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final validation accuracy\n",
"print(\n",
" \"Small data validation accuracy:\",\n",
" solver.check_accuracy(small_data['X_val'], small_data['y_val'])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.xlabel('iteration')\n",
"plt.ylabel('loss')\n",
"\n",
"plt.subplot(2, 1, 2)\n",
"plt.plot(solver.train_acc_history, '-o')\n",
"plt.plot(solver.val_acc_history, '-o')\n",
"plt.legend(['train', 'val'], loc='upper left')\n",
"plt.xlabel('epoch')\n",
"plt.ylabel('accuracy')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучите сеть на полном наборе данных. Выведите accuracy на обучающей и валидационной выборках"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)\n",
"\n",
"solver = Solver(model, data,\n",
" num_epochs=1, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=20)\n",
"solver.train()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final training accuracy\n",
"print(\n",
" \"Full data training accuracy:\",\n",
" solver.check_accuracy(small_data['X_train'], small_data['y_train'])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print final validation accuracy\n",
"print(\n",
" \"Full data validation accuracy:\",\n",
" solver.check_accuracy(data['X_val'], data['y_val'])\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Визуализируйте фильтры на первом слое обученной сети"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scripts.vis_utils import visualize_grid\n",
"\n",
"grid = visualize_grid(model.params['W1'].transpose(0, 2, 3, 1))\n",
"plt.imshow(grid.astype('uint8'))\n",
"plt.axis('off')\n",
"plt.gcf().set_size_inches(5, 5)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}