向量微积分基础

机器学习里经常需要用到向量微积分。向量微积分其实并不难,但大学数学一般不提,导致在看机器学习的一些推导时常常感觉疑惑。

机器学习里经常用到标量和向量、向量和向量的求导,其实只是把向量对应位置的元素进行求导。但是,这些元素的组织方式有两种,分别是分子布局和分母布局,二者并无本质上的差别,只是结果相差个转置。这两种布局都存在,初学者常常混淆。

例如求$ \frac {\partial \mathbf{y}} {\partial x} $,其中$ \mathbf{y} $是$n$维列向量,$x$是标量。这个求导就是把$\mathbf{y}$里每个元素分别对$x$求导,但求导后是得到列向量还是行向量呢?

对于分子布局:

$$ \frac {\partial \mathbf{y}} {\partial x} = \begin{bmatrix} \frac {\partial y_1} {\partial x} \\ \frac {\partial y_2} {\partial x} \\ \vdots \\ \frac {\partial y_n} {\partial x} \\ \end{bmatrix} $$

对于分母布局:

$$ \frac {\partial \mathbf{y}} {\partial x} = \begin{bmatrix} \frac {\partial y_1} {\partial x} & \frac {\partial y_2} {\partial x} & \dots & \frac {\partial y_n} {\partial x} \\ \end{bmatrix} $$

两种布局容易混淆,建议选择自己习惯的布局即可。这里我们选择分子布局进行后面的说明。

符号约定:小写粗体:值为向量;大写粗体:值为矩阵;小写斜体:值为标量。以a、b、c、d表示和x无关的函数,u=u(x),v=v(x),f、g、h是函数。

$$ \frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \dots & \frac{\partial y}{\partial x_n} \\ \end{bmatrix} $$

$$ \frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}\\ \end{bmatrix} $$

这个矩阵又叫雅可比(Jacobi)矩阵

$$ \frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}}\\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}}\\ \end{bmatrix} $$

虽然看着挺复杂,但不难看出:分子布局的特点是,分子的编号排列和分子相同,分母的编号排列和分母的转置相同。

一些求导公式比较常用,在此列举一下:

$$ \frac {\partial {\mathbf{Ax}}} {\partial \mathbf{x}} = \mathbf{A} $$

$$ \frac {\partial \mathbf{x}^\top \mathbf{X}} {\partial \mathbf{x}} = \mathbf{A}^\top $$

$$ \frac {\partial \mathbf{x}^\top \mathbf{x}} {\partial \mathbf{x}} = 2 \mathbf{x}^\top $$

$$ \frac {\partial \mathbf{x}^\top \mathbf{A} \mathbf{x}} {\partial \mathbf{x}} = \mathbf{x}^\top(\mathbf{A} + \mathbf{A}^\top) $$

若$\mathbf{A}$为对称阵,则对于上式:

$$ \begin{split} \frac {\partial \mathbf{x}^\top \mathbf{A} \mathbf{x}} {\partial \mathbf{x}} &= \mathbf{x}^\top(\mathbf{A} + \mathbf{A}^\top) \\ &= 2 \mathbf{x}^\top\mathbf{A} \end{split} $$

和、积的导数:

$$ \frac {\partial (\mathbf{u} + \mathbf{v})} {\partial \mathbf{x}} = \frac {\partial \mathbf{u}} {\partial \mathbf{x}} + \frac {\partial \mathbf{v}} {\partial \mathbf{x}} $$

$$ {\frac {\partial ({\mathbf {u}}\cdot {\mathbf {v}})}{\partial {\mathbf {x}}}}={\frac {\partial {\mathbf {u}}^{\top }{\mathbf {v}}}{\partial {\mathbf {x}}}}= {\mathbf {u}}^{\top }{\frac {\partial {\mathbf {v}}}{\partial {\mathbf {x}}}}+{\mathbf {v}}^{\top }{\frac {\partial {\mathbf {u}}}{\partial {\mathbf {x}}}} $$

链式求导:

$$ \frac{\partial \mathbf{f(u)}}{\partial \mathbf{x}} = \frac{\partial \mathbf{f(u)}}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{x}} $$

更多详细内容可以参考:Matrix calculus - Wikipedia

AI

上一篇 线性最小二乘法推导
下一篇 利用油猴脚本显示扇贝网真实打卡日记

添加新评论

*
*