## Problem Set 3

This is to be completed by November 9th, 2017.

### Exercises

- [Datacamp](https://www.datacamp.com/home
- Complete the lesson “Introduction to Machine Learning”.
- This should have also included “Exploratory Data Analysis”. This has been added to the next week’s assignment.

- MLE for the uniform distribution.
- (Source: Kaelbling/Murphy) Consider a uniform distribution centered on 0 with width 2a. The density function is given by: $$ p(x) = \frac{\chi_{[-a,a]}}{2a}.$${

a. Given a data set $x_1,\cdots, x_n,$ what is the maximum likelihood estimate $a_{MLE}$ of $a$?

b. What probability would the model assign to a new data point $x_{n+1}$ using $a_{MLE}$?

c. Do you see any problem with the above approach? Briefly suggest (in words) a better approach.

- (Source: Kaelbling/Murphy) Consider a uniform distribution centered on 0 with width 2a. The density function is given by: $$ p(x) = \frac{\chi_{[-a,a]}}{2a}.$${
- Calculate the expected value and mode of $\theta$ when $\theta \sim \textrm{Beta}(\alpha, \beta)$,
- Change of variables:
- Let $X\colon S\to T_1$ be a discrete valued random variable with pmf $p_X\colon T_1\to [0,1]$ and let $Y\colon T_1\to T_2$ be a function. Derive the pmf $p_{Y\circ X}\colon T_2\to [0,1]$ in terms of $p_X$ and $Y$.
- Let $X^n\colon S^{\times n}\to \{0,1\}^{\times n}$ be the random variable whose values give $n$-independent samples of a Bernoulli random variable $X$ with parameter $\theta$ (i.e., $p_X(1)=\theta$). Show that $$p_{X^n}(v_1,\cdots, v_n)=\theta^{\sum{v_i}}(1-\theta)^{n-\sum v_i}.$$ Now let $\sigma \colon \{0,1\}^{\times n}\to \{0,…n\}$ be defined by taking the sum of the entries. The composite $\sigma\circ X^{n}$ is called a
*Binomial random variable with parameters $n$ and $\theta.$*Determine $p_{\sigma\circ X^n}(k)$. - Let $X\colon S\to \Bbb R$ be a random variable with piecewise continuous pdf $p_X$ and let $Y\colon \Bbb R\to \Bbb R$ be a differentiable monotonic function. Show that $Y\circ X$ is a random variable and determine $p_{Y\circ X}$.

- Uninformative prior for log-odds ratio:
- (Source: Murphy) Let $$ \phi = \textrm{logit}(\theta) = \log \frac{\theta}{1-\theta}.$$ Show that if $p(\phi)\propto 1,$ then $p(\theta)\propto \textrm{Beta}(0,0)$.

- R Lab:
- Construct and apply a Naive Bayes classifier for a specific text classification problem (e.g., spooky author identification) from scratch. In other words, do not use any modeling libraries. Feel free to use any libraries you like to get the data into an acceptable format.