Week 5

In this week of classes, we delved a little more into likelihood inference.
Recall: likelihood inference is the science of making inferences about a statistical model \(\Theta\Rightarrow S\) using the likelihood function \[L(-\vert \Delta)=\prod_{x\in \Delta} f_{(-)}(x)\] associated to a dataset \(\Delta \subset S\).
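To make this concrete, here is a minimal Python sketch (my own illustration, not something we did in class) for a Bernoulli model \(\Theta=[0,1]\), where \(f_\theta(x)=\theta^x(1-\theta)^{1-x}\) for \(x\in\{0,1\}\); the dataset and the parameter grid are made up for the example.

```python
import numpy as np

# Bernoulli model: f_theta(x) = theta^x * (1 - theta)^(1 - x) for x in {0, 1}
def likelihood(theta, data):
    """L(theta | data): the product of f_theta(x) over the dataset."""
    data = np.asarray(data)
    return np.prod(theta ** data * (1 - theta) ** (1 - data))

data = [1, 0, 1, 1, 0]                    # illustrative dataset Delta
thetas = np.linspace(0.01, 0.99, 99)      # grid of candidate parameter values
L = np.array([likelihood(t, data) for t in thetas])
print(thetas[np.argmax(L)])               # numerical MLE, here roughly 3/5
```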
We noted that in general we are not necessarily interested in the actual values of the likelihood function, but rather in the values of certain inferences derived from it. This led us to the idea of equivalent datasets:
Two datasets \(\Delta\) and \(\Delta'\) are equivalent if there exists an increasing bijection \(f:\mathbb{R}\longrightarrow \mathbb{R}\) such that \[L(-\vert \Delta)=f\circ L(-\vert \Delta')\] In this case the MLEs for the two datasets coincide.
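For instance (again my own illustration, not an example from class), take the model \(N(\theta,1)\) with known variance: two datasets of the same size with the same sample mean have likelihoods that differ by multiplication by a positive constant, which is an increasing bijection of \(\mathbb{R}\), so the datasets are equivalent and the MLEs agree.

```python
import numpy as np

def normal_likelihood(theta, data):
    """L(theta | data) for the N(theta, 1) model (variance known and equal to 1)."""
    data = np.asarray(data)
    return np.prod(np.exp(-(data - theta) ** 2 / 2) / np.sqrt(2 * np.pi))

# Two illustrative datasets with the same size and the same mean (2.0),
# but different spread:
delta   = [1.0, 2.0, 3.0]
delta_p = [0.0, 2.0, 4.0]

thetas = np.linspace(-1, 5, 601)
L1 = np.array([normal_likelihood(t, delta) for t in thetas])
L2 = np.array([normal_likelihood(t, delta_p) for t in thetas])

print(np.allclose(L1 / L2, (L1 / L2)[0]))             # ratio is constant: L1 = c * L2
print(thetas[np.argmax(L1)], thetas[np.argmax(L2)])   # same MLE (the common mean 2.0)
```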
In fact, there are many situations in statistics where one is interested not in whether two values are identical, but in whether they are similar in some specified sense. This led us to the definition of an equivalence relation on an arbitrary set \(X\) (a notion used throughout mathematics):
A relation \(\sim \subset X\times X\) on a set \(X\) is an equivalence relation if, for all \(x,y,z\in X\), it is
  1. reflexive: \(x\sim x \)
  2. symmetric: \(x\sim y\iff y\sim x\)
  3. transitive: \(x\sim y,y\sim z\implies x\sim z \)
In this case, the classes \(\overline{x}=\{y\vert\, x\sim y\}\) form a partition of the set \(X\); the set of classes is denoted \(X/\sim\) (I'm not sure I introduced this notation in class).
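As a toy example (mine, not from class): take \(X=\{0,1,\dots,9\}\) and declare \(x\sim y\) when \(x\) and \(y\) leave the same remainder mod 3; the three classes then partition \(X\), and the quotient \(X/\sim\) has three elements.

```python
from collections import defaultdict

# x ~ y iff x % 3 == y % 3; this relation is reflexive, symmetric and transitive.
X = range(10)
classes = defaultdict(set)
for x in X:
    classes[x % 3].add(x)   # the class of x is labelled by its remainder mod 3

print(dict(classes))
# {0: {0, 3, 6, 9}, 1: {1, 4, 7}, 2: {2, 5, 8}}  -- a partition of X;
# the quotient X/~ has three elements, one per class.
```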
This led to the idea of a sufficient statistic:
A statistic is simply a function \(T\) which takes in data (a finite subset \(\Delta \subset S\)) and returns a value in some set \(\Sigma\). A statistic is sufficient if \[ T(\Delta)=T(\Delta')\implies \Delta\sim \Delta' \] (where by \(\sim \) we denote the equivalence of datasets mentioned above).
A few remarks have to be made here:
  1. it is a fact about equivalence relations that \(x\sim y\iff \overline{x}=\overline{y}\) (see the assignment)
  2. the book only uses a more restricted definition 6.1.1, where the function \(f\) is multiplication by a constant. This is not really necessary, and using the more general definition will come in very handy (as it gives us more flexibility)
  3. there is another difference with the book's treatment: instead of taking in a dataset \(\Delta \subset S\), the book sneakily works with the sample space \(S^n\) for some \(n\) and defines a statistic as taking in a value \(s \in S^n\) (so simply replace the notation \(\Delta\) by \(s\) when reading the book)
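Tying the two threads together (still my own illustration, not the book's example): for the \(N(\theta,1)\) model sketched above, the statistic \(T(\Delta)=(\vert\Delta\vert,\sum_{x\in\Delta}x)\) is sufficient in the sense just defined, since datasets with the same value of \(T\) have likelihoods agreeing up to a positive constant factor.

```python
import numpy as np

def normal_likelihood(theta, data):
    """L(theta | data) for the N(theta, 1) model (variance known and equal to 1)."""
    data = np.asarray(data)
    return np.prod(np.exp(-(data - theta) ** 2 / 2) / np.sqrt(2 * np.pi))

def T(data):
    """Candidate sufficient statistic: (sample size, sample sum)."""
    return (len(data), float(np.sum(data)))

delta, delta_p = [1.0, 2.0, 3.0], [0.0, 2.0, 4.0]   # same size, same sum
assert T(delta) == T(delta_p)

thetas = np.linspace(-1, 5, 601)
L1 = np.array([normal_likelihood(t, delta) for t in thetas])
L2 = np.array([normal_likelihood(t, delta_p) for t in thetas])
c = L1[0] / L2[0]
print(np.allclose(L1, c * L2))   # True: L(- | delta) = c * L(- | delta'), with c > 0
```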
To figure out when a statistic is indeed sufficient, we mentioned the factorization theorem 6.1.1.
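For reference, and paraphrasing from memory (so check the book for the precise statement and hypotheses): for the book's constant-multiple notion of equivalence, \(T\) is sufficient if and only if the likelihood can be factored as \[ L(\theta\vert \Delta)=h(\Delta)\, g\big(\theta, T(\Delta)\big) \] for suitable non-negative functions \(h\) and \(g\), i.e. the dependence on \(\theta\) enters only through the value \(T(\Delta)\). In the \(N(\theta,1)\) sketch above this holds with \(T(\Delta)=(n,\sum_{x\in\Delta} x)\), since \(\sum_{x\in\Delta}(x-\theta)^2=\sum x^2-2\theta\sum x+n\theta^2\).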