#+TITLE: Gradient Descent Based Polynomial Regression
#+AUTHOR: Loic Guegan

#+OPTIONS: toc:nil

#+LATEX_HEADER: \usepackage{fullpage}
#+LATEX_HEADER: \hypersetup{colorlinks=true,linkcolor=blue}

First, choose a polynomial hypothesis $h_w(x)$ whose degree matches the complexity of the data.
In our case, we use:
\begin{equation}
h_w(x) = w_1 + w_2x + w_3x^2
\end{equation}
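As a minimal sketch, the hypothesis above can be written as a plain function (the names =h= and =w= are illustrative, not from the original text):

```python
import numpy as np

def h(w, x):
    """Evaluate the degree-2 polynomial hypothesis h_w(x) = w1 + w2*x + w3*x^2."""
    w1, w2, w3 = w
    return w1 + w2 * x + w3 * x ** 2

# Example: with w = (1, 0, 2), h_w(3) = 1 + 0*3 + 2*9 = 19
print(h(np.array([1.0, 0.0, 2.0]), 3.0))
```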

Then, we define a cost function. A common choice is the *Mean Squared Error*
cost function:
\begin{equation}\label{eq:cost}
    J(w) = \frac{1}{2n} \sum_{i=1}^n (h_w(x^{(i)}) - \hat{y}^{(i)})^2
\end{equation}
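A direct translation of Equation \ref{eq:cost} into code might look as follows (a sketch assuming the data come as NumPy arrays; the name =cost= is illustrative):

```python
import numpy as np

def cost(w, x, y):
    """Mean squared error J(w) = 1/(2n) * sum_i (h_w(x_i) - y_i)^2."""
    n = len(x)
    predictions = w[0] + w[1] * x + w[2] * x ** 2  # h_w(x) for every sample
    return np.sum((predictions - y) ** 2) / (2 * n)

# Example: a perfect fit has zero cost.
x = np.array([0.0, 1.0, 2.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2   # data generated by w = (1, 2, 3)
print(cost(np.array([1.0, 2.0, 3.0]), x, y))  # → 0.0
```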

Note that in Equation \ref{eq:cost} we average over $2n$ rather than $n$. The factor of $2$
cancels when taking the partial derivatives, as shown below. This is purely cosmetic and does
not affect the gradient descent (see [[https://math.stackexchange.com/questions/884887/why-divide-by-2m][here]] for more information). The next step is to compute $\min_w J(w)$
over the weights $w_i$ by performing gradient descent. To this end, we compute each partial derivative:
\begin{align}
    \frac{\partial J(w)}{\partial w_1}&=\frac{\partial J(w)}{\partial h_w(x)}\frac{\partial h_w(x)}{\partial w_1}\nonumber\\
    &= \frac{1}{n} \sum_{i=1}^n (h_w(x^{(i)}) - \hat{y}^{(i)})\\
    \text{similarly:}\nonumber\\
    \frac{\partial J(w)}{\partial w_2}&= \frac{1}{n} \sum_{i=1}^n x^{(i)}(h_w(x^{(i)}) - \hat{y}^{(i)})\\
    \frac{\partial J(w)}{\partial w_3}&= \frac{1}{n} \sum_{i=1}^n (x^{(i)})^2(h_w(x^{(i)}) - \hat{y}^{(i)})
\end{align}
\end{align}
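Putting the pieces together, the update rule $w_j \leftarrow w_j - \alpha\,\partial J(w)/\partial w_j$ can be sketched as a batch gradient descent loop (the learning rate $\alpha$, iteration count, and function name are illustrative assumptions, not values from the text):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=20000):
    """Fit w1 + w2*x + w3*x^2 to (x, y) by batch gradient descent."""
    n = len(x)
    w = np.zeros(3)
    # Feature matrix: columns are 1, x, x^2, matching w1, w2, w3.
    X = np.column_stack([np.ones(n), x, x ** 2])
    for _ in range(iterations):
        errors = X @ w - y           # h_w(x^(i)) - y^(i) for every sample
        grad = (X.T @ errors) / n    # the three partial derivatives above
        w -= alpha * grad            # simultaneous update of w1, w2, w3
    return w

# Example: recover the weights of noise-free quadratic data.
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 - 1.0 * x + 0.5 * x ** 2
print(gradient_descent(x, y))  # close to [2.0, -1.0, 0.5]
```

Note that $\alpha$ must be chosen small enough for the iteration to converge; scaling $x$ to a bounded range (as in the example) makes a fixed learning rate much easier to pick.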