{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from sklearn import datasets\n", "import numpy as np\n", "import math\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def load_boston(ratio=0.8):\n", "    # NOTE: datasets.load_boston was removed in scikit-learn 1.2, so this needs an older release\n", "    X, Y = datasets.load_boston(return_X_y=True)\n", "    Y = Y.reshape(-1, 1)\n", "\n", "    # normalization: scale features by a constant and targets by their range\n", "    X = X/80\n", "    Y = Y/(np.max(Y) - np.min(Y))\n", "\n", "    num_samples = len(Y)\n", "    num_train = math.ceil(num_samples * ratio)\n", "\n", "    # shuffle the samples randomly before splitting\n", "    idx = np.random.permutation(np.arange(num_samples))\n", "    traindata = X[idx[:num_train]], Y[idx[:num_train]]\n", "    validdata = X[idx[num_train:]], Y[idx[num_train:]]\n", "\n", "    return traindata, validdata" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "(X_train, Y_train), (X_valid, Y_valid) = load_boston()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# MLP Quiz\n", "\n", "## Task\n", "\n", "Fit the Boston dataset with $\\hat{Y} = f_2\\circ\\phi\\circ f_1 (X)$ and gradient descent, i.e. solve the optimization problem $\\min_{W, b} L(\\hat{Y}, Y)$\n", "\n", "where:\n", "\n", "* $Z_0 = X$, where $X \\in \\mathbb{R}^{N\\times n_{in}}$\n", "* $Z_1 = f_1(Z_0) := Z_0W_1^T + b_1$, where $W_1 \\in \\mathbb{R}^{n_{mid}\\times n_{in}}, b_1 \\in \\mathbb{R}^{n_{mid}}$\n", "* $Z_2 = \\phi(Z_1) := \\frac{1}{1+e^{-Z_1}}$, where the exponential is applied element-wise, i.e. $(e^{X})_i := e^{X_i}$\n", "* $Z_3 = f_2(Z_2) := Z_2W_2^T + b_2$, where $W_2 \\in \\mathbb{R}^{n_{out}\\times n_{mid}}, b_2 \\in \\mathbb{R}^{n_{out}}$\n", "* $\\hat{Y} = Z_3$\n", "* $L(\\hat{Y}, Y) := \\frac{1}{2} \\sum_{i=1}^{N} (\\hat{Y}_i - Y_i)^2$\n", "\n", "For the Boston dataset, $n_{in}=13, n_{out}=1$; to keep the computation cheap, set $n_{mid} = 30$.\n", "\n", "## Grading\n", "\n", "1. (4 points) Give the expressions for $\\frac{\\partial L}{\\partial W_1}, \\frac{\\partial L}{\\partial b_1}, \\frac{\\partial L}{\\partial W_2}, \\frac{\\partial L}{\\partial b_2}$, and state the dimensions of every matrix that appears in them (on paper or as a PDF).\n", "2. (4 points) Complete the code below.\n", "3. 
(2 points) Performance: one complete training run takes less than 10 s on an otherwise idle server (the baseline is 3.5 s).\n", "\n", "## Submission\n", "\n", "Upload your submission to `ftp://ftp.lflab.cn/AI_homework/Graduate/quiz/`.\n", "\n", "## Reference\n", "\n", "* Derivative of a matrix with respect to a scalar: $(\\frac{\\partial{Y}}{\\partial{X}})_{ij} := \\frac{\\partial{Y_{ij}}}{\\partial{X}}$, where $Y \\in \\mathbb{R}^{m\\times n}, X \\in \\mathbb{R}$\n", "* Derivative of a vector with respect to a vector: $(\\frac{\\partial{Y}}{\\partial{X}})_{ij} := \\frac{\\partial{Y_i}}{\\partial{X_j}}$, where $Y \\in \\mathbb{R}^{m\\times 1}, X \\in \\mathbb{R}^{n\\times 1}$\n", "* Derivative of a scalar with respect to a matrix: $(\\frac{\\partial{Y}}{\\partial{X}})_{ij} := \\frac{\\partial{Y}}{\\partial{X_{ij}}}$, where $Y \\in \\mathbb{R}, X \\in \\mathbb{R}^{m\\times n}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Implementation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$f_1, f_2$ are called linear layers." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "class Linear():\n", "    def __init__(self, in_features: int, out_features: int):\n", "        raise NotImplementedError(\"Implement this\")\n", "\n", "    def __call__(self, X):\n", "        return self.forward(X)\n", "\n", "    def forward(self, X):\n", "        raise NotImplementedError(\"Implement this\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\phi$ is called the activation function (a nonlinear layer)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "class Sigmoid():\n", "    \"\"\"phi\"\"\"\n", "    def __call__(self, X):\n", "        return self.forward(X)\n", "\n", "    def forward(self, X):\n", "        raise NotImplementedError(\"Implement this\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "class MLP:\n", "    def __init__(self, in_features: int, mid_features: int, out_features: int):\n", "        self.f1 = Linear(in_features, mid_features)\n", "        self.phi = Sigmoid()\n", "        self.f2 = Linear(mid_features, out_features)\n", "\n", "    def __call__(self, X):\n", "        return self.f2(self.phi(self.f1(X)))\n", "\n", "    def forward(self, X):\n", "        Z0 = X\n", "        Z1 = self.f1(X)\n", "        Z2 = self.phi(Z1)\n", "        Z3 = 
self.f2(Z2)\n", "        return [Z0, Z1, Z2, Z3]\n", "\n", "    def grad(self, Y, Z):  # 3 points\n", "        Z0, Z1, Z2, Z3 = Z\n", "\n", "        raise NotImplementedError(\"Implement this\")\n", "\n", "        return dLdW1, dLdb1, dLdW2, dLdb2" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def loss(Y_real, Y_pred):\n", "    return 0.5 * np.sum((Y_real - Y_pred)**2)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "class GradientDescent:\n", "    def __init__(self, step=1e-3):\n", "        self.step = step\n", "\n", "    def update(self, model: MLP, dLdW1, dLdb1, dLdW2, dLdb2):\n", "        \"\"\"Update the model's weights using the gradients.\"\"\"\n", "        raise NotImplementedError(\"Implement this\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter 0: loss 54.8086, valid loss 12.8588\n", "Iter 100: loss 8.5929, valid loss 2.0355\n", "Iter 200: loss 8.5417, valid loss 2.0210\n", "Iter 300: loss 8.5267, valid loss 2.0173\n", "Iter 400: loss 8.5118, valid loss 2.0136\n", "Iter 500: loss 8.4970, valid loss 2.0100\n", "Iter 600: loss 8.4822, valid loss 2.0063\n", "Iter 700: loss 8.4674, valid loss 2.0027\n", "Iter 800: loss 8.4527, valid loss 1.9991\n", "Iter 900: loss 8.4380, valid loss 1.9955\n", "CPU times: user 56.5 s, sys: 1min 21s, total: 2min 17s\n", "Wall time: 3.01 s\n" ] } ], "source": [ "%%time\n", "num_features = X_train.shape[-1]\n", "model = MLP(num_features, 30, 1)\n", "opt = GradientDescent(1e-6)\n", "\n", "valid_losses = []\n", "train_losses = []\n", "for i in range(1000):\n", "    X, Y = X_train, Y_train\n", "\n", "    # 1 point\n", "    # 1. compute the gradients\n", "    # 2. update the weights\n", "    raise NotImplementedError(\"Implement this\")\n", "\n", "    # 3. 
record the intermediate state\n", "    Y_out = None  # FIXME: replace with the model's predictions on X\n", "    cur_valid_loss = loss(Y_valid, model(X_valid))\n", "    cur_train_loss = loss(Y, Y_out)\n", "    valid_losses.append(cur_valid_loss)\n", "    train_losses.append(cur_train_loss)\n", "\n", "    if i%100 == 0:\n", "        print(f\"Iter {i}: loss {cur_train_loss:.4f}, valid loss {cur_valid_loss:.4f}\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(train_losses)\n", "plt.plot(valid_losses)\n", "plt.legend([\"train loss\", \"validation loss\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "AI-Course", "language": "python", "name": "ai-course" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }