## Week 12: Analyzing Regression

This final week of classes was dedicated to studying the various statistical benchmarks that allow us to draw conclusions about how well a regression model fits two random variables $$X$$ and $$Y$$. To this end, we recall that in the regression model, we are given data $$(x_i,y_i)$$ and we assume that the labels (or 'responses') $$y_i$$ are drawn from random variables that satisfy $Y_i=\beta_0+\beta_1x_i+\epsilon_i$ where $$(\beta_0,\beta_1)\in \mathbb{R}^2$$ is any choice of parameters and $$\epsilon_i\sim N(0,\sigma^2)$$. Viewing $$\beta_0$$ and $$\beta_1$$ as the parameters of a statistical model led us to the computation of the associated MLE $$(\hat{\beta}_0,\hat{\beta}_1)$$ given by $\hat{\beta}_1=\sum_i w_i y_i, \text{ where } w_i=\frac{x_i-\overline{x}}{\sum_j (x_j-\overline{x})^2},\text{ and }\hat{\beta}_0=\overline{y}-\hat{\beta}_1\overline{x}.$ This in turn leads to predictions for the labels $$\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1x_i$$.
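As a sanity check, the closed-form estimates above are easy to compute directly. Here is a minimal sketch on hypothetical toy data (the arrays `x` and `y` are made up for illustration):

```python
import numpy as np

# Hypothetical toy data, made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# w_i = (x_i - x_bar) / sum_j (x_j - x_bar)^2, so that beta1_hat = sum_i w_i y_i
w = (x - x_bar) / np.sum((x - x_bar) ** 2)
beta1_hat = np.sum(w * y)
beta0_hat = y_bar - beta1_hat * x_bar

# Predicted labels y_hat_i = beta0_hat + beta1_hat * x_i
y_hat = beta0_hat + beta1_hat * x
```

The result agrees with a library least-squares fit such as `np.polyfit(x, y, 1)`.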
As a first measure of fit, we can consider the sum of squares of the residuals: $SSE=\sum_i \hat{\epsilon}^2_i=\sum_i (y_i-\hat{y}_i)^2.$ We will also be interested in the variability (up to a constant factor) of the predicted values $$\hat{y}_i$$ and the response values (or labels) $$y_i$$. These are called the regression sum of squares and total sum of squares respectively: $SSR=\sum_i (\hat{y}_i-\overline{\hat{y}})^2,$ $SST = \sum_i (y_i-\overline{y})^2.$ The SSR in particular can be simplified a little, but before we do that, we prove the following lemma:
We have
• $$\sum_i \hat{\epsilon}_i=0$$
• $$\sum_i \hat{y}_i\hat{\epsilon}_i=0$$
• $$\overline{y}=\overline{\hat{y}}$$
For the first claim, we note that $\sum_i \hat{\epsilon}_i = \sum_i (y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)= n(\overline{y}-\hat{\beta}_0-\hat{\beta}_1\overline{x})=0$ by definition of $$\hat{\beta}_0$$. For the second claim, we first show that $$\sum_i x_i\hat{\epsilon}_i=0$$.
This turns out to be equivalent to $$\frac{\partial}{\partial\beta_1} L\bigg((y_1,\ldots, y_n)\vert (\beta_0,\beta_1)\bigg)=0$$, which we know to be true from last week. Concluding that $$\sum_i \hat{y}_i\hat{\epsilon}_i=0$$ is rather straightforward now, since $\sum_i(\hat{\beta}_0+\hat{\beta}_1x_i)\hat{\epsilon}_i=\hat{\beta}_0\sum_i\hat{\epsilon}_i+\hat{\beta}_1\sum_ix_i\hat{\epsilon}_i=0+0=0.$ The last claim follows from the first by dividing by $$n$$.
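The three claims of the lemma are also easy to confirm numerically; a small sketch on the same hypothetical toy data (all values made up for illustration):

```python
import numpy as np

# Hypothetical toy data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares fit.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x
resid = y - y_hat  # the residuals epsilon_hat_i

# The three claims of the lemma (all should be 0 up to rounding):
sum_resid = resid.sum()             # sum of residuals
dot_fitted = np.sum(y_hat * resid)  # residuals orthogonal to fitted values
mean_gap = y.mean() - y_hat.mean()  # means of y and y_hat agree
```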
We have $SSR=\sum_i (\hat{y}_i-\overline{y})^2$
We have $SST=SSE+SSR$
Clearly $SST=\sum_i (y_i-\hat{y}_i+\hat{y}_i-\overline{y})^2=\sum_i (y_i-\hat{y}_i)^2+2\sum_i(y_i-\hat{y}_i)(\hat{y}_i-\overline{y})+\sum_i(\hat{y}_i-\overline{y})^2$ The first term in this sum is SSE and the last is SSR by the lemma. Finally, the middle term is zero, as it is equal to $2\sum_i \hat{\epsilon}_i(\hat{y}_i-\overline{y})=2\sum_i\hat{y}_i\hat{\epsilon}_i-2\overline{y}\sum_i\hat{\epsilon}_i=0-0=0$ by the lemma again.
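The decomposition $SST=SSE+SSR$ can be checked on the same hypothetical toy data (a sketch, with made-up values):

```python
import numpy as np

# Hypothetical toy data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares fit.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

SSE = np.sum((y - y_hat) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)  # uses y_bar = y_hat_bar from the lemma
SST = np.sum((y - y.mean()) ** 2)
# SST equals SSE + SSR up to rounding.
```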
This yields yet another important measure for the efficiency of a linear regression model:
The coefficient of determination is the value $R^2=\frac{SSR}{SST}$
The $$R^2$$-coefficient has the following properties:
• $$0\le R^2\le 1$$
• $$R^2=1-\frac{SSE}{SST}$$
• $$R^2=1$$ iff the predictions coincide with the responses
• $$R^2=0$$ iff $$\hat{\beta_1}=0$$
These are all easily verified using the identity $$SST=SSE+SSR$$.
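The first two properties can also be confirmed numerically on the hypothetical toy data used above (a sketch, with made-up values):

```python
import numpy as np

# Hypothetical toy data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares fit.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

SSE = np.sum((y - y_hat) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SST = np.sum((y - y.mean()) ** 2)

# Coefficient of determination; lies in [0, 1] and equals 1 - SSE/SST.
R2 = SSR / SST
```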
We now wish to investigate the properties of these quantities as estimators. That is, after a choice of parameters $$(\beta_0,\beta_1)$$ we consider the random variables $Y_i=\beta_0+\beta_1 x_i +\epsilon_i, \text{ where } \epsilon_i\sim N(0,\sigma^2),$ and simply replacing the $$y_i$$'s with $$Y_i$$'s now turns SSE, SSR, SST and $$R^2$$ into random variables. Unfortunately, it is not true that $$\frac{1}{n}SSE$$ forms an unbiased estimator for $$\sigma^2$$ (compare with the bias of the sample variance estimator). However, we do have:
Let $$s^2=\frac{1}{n-2}SSE$$. Then $$\mathbb{E}[s^2]=\sigma^2$$, i.e. $$s^2$$ is an unbiased estimator for the variance.
The factor $$\frac{1}{n-2}$$ has an intuitive interpretation (this requires a little B24 knowledge): consider the space of residual vectors $$(\hat{\epsilon}_i)_{i=1}^n=(y_i-\hat{y}_i)_{i=1}^n$$ in $$\mathbb{R}^n$$, ranging over all choices of labels $$y_i \in \mathbb{R}$$. By the lemma above, this subspace satisfies the two linear equations $$\sum_i \hat{\epsilon}_i=0$$ and $$\sum_i x_i \hat{\epsilon}_i=0$$, and is thus $$(n-2)$$-dimensional. In general, to make an estimator unbiased, one should divide by the 'dimension of the subspace it spans', also referred to as the degrees of freedom.