12 Probability Density Estimation

Q12.1

Construct a histogram estimate of density for a random sample of standard lognormal data using Sturges’ Rule, for sample size $n=100$. Repeat the estimate for the same sample using the correction for skewness proposed by Doane in Equation (12.2). Compare the number of bins and break points using both methods. Compare the density estimates at the deciles of the lognormal distribution with the lognormal density at the same points. Does the suggested correction give better density estimates in this example?
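
One possible starting point is sketched below. It assumes that Doane's correction adds $\log_2(1 + |\sqrt{b_1}| / \sigma(\sqrt{b_1}))$ classes to Sturges' count, with $\sigma(\sqrt{b_1}) = \sqrt{6(n-2)/((n+1)(n+3))}$; check this against Equation (12.2) before relying on it. The seed and the helper names `equal_breaks` and `hist_density_at` are illustrative only.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 100
x <- rlnorm(n)                     # standard lognormal sample

# Sturges' Rule: ceiling(1 + log2(n)) classes
k_sturges <- nclass.Sturges(x)

# Doane's correction (assumed form): extra classes based on sample skewness
m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
b1 <- m3 / m2^1.5                  # sample skewness
se_b1 <- sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
k_doane <- ceiling(1 + log2(n) + log2(1 + abs(b1) / se_b1))

# k equal-width bins spanning the sample range
equal_breaks <- function(x, k) seq(min(x), max(x), length.out = k + 1)
h_sturges <- hist(x, breaks = equal_breaks(x, k_sturges), plot = FALSE)
h_doane   <- hist(x, breaks = equal_breaks(x, k_doane),   plot = FALSE)

# Histogram density at points q (0 outside the range of the breaks)
hist_density_at <- function(h, q) {
  idx <- findInterval(q, h$breaks, rightmost.closed = TRUE)
  c(0, h$density, 0)[idx + 1]
}

q <- qlnorm(seq(0.1, 0.9, by = 0.1))   # deciles of the lognormal
comparison <- data.frame(
  decile  = q,
  true    = dlnorm(q),
  sturges = hist_density_at(h_sturges, q),
  doane   = hist_density_at(h_doane, q)
)
print(c(sturges = k_sturges, doane = k_doane))
print(round(comparison, 4))
```

The `comparison` table puts both estimates next to the true density, so the question about the correction can be answered by inspecting the errors column by column.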

Q12.2

Estimate the IMSE for three histogram density estimates of standard normal data, from a sample of size $n=500$. Use Sturges’ Rule, Scott’s Normal Reference Rule, and the FD Rule.
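
A minimal Monte Carlo sketch follows. It assumes the bin widths $R/(1+\log_2 n)$ for Sturges, $3.49\,\hat\sigma n^{-1/3}$ for Scott, and $2\,\mathrm{IQR}\, n^{-1/3}$ for FD, and it averages the integrated squared error (ISE) over `M` replicates. For a histogram with heights $d_j$ on bins $(a_j, b_j]$ and a standard normal target, the ISE has the closed form $\sum_j d_j^2 (b_j - a_j) - 2 \sum_j d_j [\Phi(b_j) - \Phi(a_j)] + 1/(2\sqrt{\pi})$, which the code uses in place of numerical integration. The number of replicates and the helper names are arbitrary choices.

```r
set.seed(2)                         # arbitrary seed
n <- 500
M <- 200                            # number of Monte Carlo replicates (arbitrary)

# Equal-width breaks starting at min(x) and extending just past max(x)
breaks_from_width <- function(x, h) {
  k <- floor(diff(range(x)) / h) + 1
  min(x) + h * (0:k)
}

# Bin-width rules (assumed forms, see the lead-in)
bin_width <- list(
  sturges = function(x) diff(range(x)) / (1 + log2(length(x))),
  scott   = function(x) 3.49 * sd(x) * length(x)^(-1/3),
  fd      = function(x) 2 * IQR(x) * length(x)^(-1/3)
)

# Exact ISE of a histogram estimate against the N(0,1) density
ise_hist <- function(x, h) {
  hg <- hist(x, breaks = breaks_from_width(x, h), plot = FALSE)
  w <- diff(hg$breaks)
  sum(hg$density^2 * w) -
    2 * sum(hg$density * diff(pnorm(hg$breaks))) +
    1 / (2 * sqrt(pi))
}

# Rows are rules, columns are replicates
ise <- replicate(M, {
  x <- rnorm(n)
  sapply(bin_width, function(rule) ise_hist(x, rule(x)))
})
imse <- rowMeans(ise)
print(round(imse, 5))
```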

Q12.3

Construct a frequency polygon density estimate for the precip dataset in R. Verify that the estimate satisfies $\int \hat{f}(x)\, dx = 1$ by numerical integration of the density estimate.
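
The sketch below is one way to set this up. It assumes the bin width $2.15\,\hat\sigma n^{-1/5}$ as the Normal Reference Rule for a frequency polygon and uses `pretty` to obtain equal-width breaks, so the width actually used may differ somewhat from `h`. The polygon is closed by adding one empty bin at each end.

```r
x <- precip
n <- length(x)
h <- 2.15 * sd(x) * n^(-1/5)        # assumed normal reference rule for a polygon

# Equal-width breaks covering the data (pretty() treats the count as a suggestion)
br <- pretty(x, n = diff(range(x)) / h)
hg <- hist(x, breaks = br, plot = FALSE)
delta <- diff(hg$breaks)[1]          # common bin width actually used

# Polygon vertices: bin midpoints, plus one empty bin on each side
k  <- length(hg$mids)
vx <- c(hg$mids[1] - delta, hg$mids, hg$mids[k] + delta)
vy <- c(0, hg$density, 0)

# Piecewise-linear density estimate, zero outside the vertices
fhat <- approxfun(vx, vy, yleft = 0, yright = 0)

# Numerical integration, plus the trapezoid rule (exact for a polygon)
area_quad <- integrate(fhat, lower = min(vx), upper = max(vx))$value
area_trap <- sum(diff(vx) * (head(vy, -1) + tail(vy, -1)) / 2)
print(c(integrate = area_quad, trapezoid = area_trap))

# plot(vx, vy, type = "l", xlab = "precip", ylab = "density")
```

The trapezoid sum reduces to $\delta \sum_j d_j$, which equals 1 whenever the bins have equal width; `integrate` should agree up to its own tolerance.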

Q12.4

Construct a frequency polygon density estimate for the precip dataset, using a bin width determined by substituting $\hat{\sigma} = IQR / 1.348$ for the standard deviation in the usual Normal Reference Rule for a frequency polygon.

Q12.5

Construct a frequency polygon density estimate for the precip dataset, using a bin width determined by the Normal Reference Rule for a frequency polygon adjusted for skewness. The skewness adjustment factor is given in Equation (12.8).

Q12.6

Construct an ASH density estimate for the faithful$eruptions dataset in R, using width $h$ determined by the Normal Reference Rule. Use a weight function corresponding to the biweight kernel, \[ K(t) = \frac{15}{16}(1-t^2)^2 \hspace{1em} \textrm{if} \hspace{1em} |t|<1, \hspace{2em} K(t)=0 \hspace{1em} \textrm{otherwise.} \]
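
A hedged sketch of a generalized ASH is given below. It assumes the estimate at fine bin $k$ is $\hat f_k = \frac{1}{nh} \sum_{i=1-m}^{m-1} w_m(i)\, \nu_{k+i}$, with weights $w_m(i) \propto K(i/m)$ scaled so that they sum to $m$, and it assumes $h = 2.576\,\hat\sigma n^{-1/5}$ as the reference width; substitute the rule from the text if it differs. The function name `ash_density` and the choice $m = 20$ are illustrative.

```r
# Biweight kernel from the exercise
biweight <- function(t) ifelse(abs(t) < 1, 15 / 16 * (1 - t^2)^2, 0)

ash_density <- function(x, h, m = 20, kernel = biweight) {
  n <- length(x)
  delta <- h / m                             # width of the fine bins
  a  <- min(x) - h - delta                   # pad the mesh by more than h
  nb <- ceiling((max(x) + h + delta - a) / delta)
  br <- a + delta * (0:nb)
  nu <- hist(x, breaks = br, plot = FALSE)$counts

  i <- (1 - m):(m - 1)
  w <- kernel(i / m)
  w <- m * w / sum(w)                        # weights sum to m

  # Zero-pad the counts so every window of 2m - 1 bins is defined
  nu_pad <- c(rep(0, m - 1), nu, rep(0, m - 1))
  fhat <- sapply(seq_len(nb), function(k) {
    sum(w * nu_pad[k:(k + 2 * m - 2)])
  }) / (n * h)

  list(x = br[-1] - delta / 2, y = fhat, delta = delta)
}

x <- faithful$eruptions
h <- 2.576 * sd(x) * length(x)^(-1/5)        # assumed reference width
est <- ash_density(x, h)
print(sum(est$y) * est$delta)                # total area under the estimate

# plot(est$x, est$y, type = "l", xlab = "eruptions", ylab = "density")
```

Because the mesh is padded by more than $h$ on each side, every observation receives its full set of weights, and the estimate integrates to 1 up to rounding.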

Q12.7

Construct an ASH density estimate for the precip dataset in R. Choose the best value for width $h^*$ empirically by computing the estimates over a range of possible values of $h$ and comparing the plots of the densities. Does the optimal value $h_n^{fp}$ correspond to the optimal value $h^*$ suggested by comparing the density plots?

Q12.8

The buffalo dataset in the gss package contains annual snowfall accumulations in Buffalo, New York from 1910 to 1973 (64 observations). These data were analyzed by Scott. Construct kernel density estimates of the data using Gaussian and biweight kernels. Compare the estimates for different choices of bandwidth. Is the estimate more influenced by the type of kernel or by the bandwidth?

Q12.9

Construct a kernel density estimate for simulated data from the normal location mixture $\frac{1}{2} N(0,1) + \frac{1}{2} N(3,1)$. Compare several choices of bandwidth, including (12.13) and (12.14). Plot the true density of the mixture over the density estimate, for comparison. Which choice of smoothing parameter appears to be best?
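
One way to organize the comparison is sketched below. The two reference bandwidths are assumed to be $1.06\,\hat\sigma n^{-1/5}$ and $0.9 \min(S, \mathrm{IQR}/1.34)\, n^{-1/5}$ (the latter is what `bw.nrd0` computes); whether these match (12.13) and (12.14) should be checked against the text. The sample size, the seed, and the two extra bandwidths are arbitrary, and the ISE is approximated by a Riemann sum on the evaluation grid.

```r
set.seed(3)                                   # arbitrary seed
n <- 500                                      # arbitrary sample size
comp <- rbinom(n, 1, 0.5)                     # mixture component labels
x <- rnorm(n, mean = 3 * comp, sd = 1)        # 0.5 N(0,1) + 0.5 N(3,1)

f_true <- function(t) 0.5 * dnorm(t) + 0.5 * dnorm(t, mean = 3)

bws <- c(
  gauss_ref = 1.06 * sd(x) * n^(-1/5),        # assumed reference rule
  silverman = bw.nrd0(x),                     # 0.9 * min(sd, IQR/1.34) * n^(-1/5)
  small     = 0.1,                            # deliberately undersmoothed
  large     = 1.0                             # deliberately oversmoothed
)

# Approximate ISE of each Gaussian kernel estimate on a common grid
ise <- sapply(bws, function(h) {
  d <- density(x, bw = h, kernel = "gaussian", from = -4, to = 7, n = 512)
  sum((d$y - f_true(d$x))^2) * diff(d$x)[1]
})
print(round(cbind(bandwidth = bws, ise = ise), 5))

# d <- density(x, bw = bws["silverman"]); plot(d); curve(f_true, add = TRUE, lty = 2)
```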

Q12.10

Apply the reflection boundary technique to obtain a better kernel density estimate for the precipitation data in Example 12.8. Compare the estimates in Example 12.8 and the improved estimates in a single graph. Also try setting from=0 or cut=0 in the density function.
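
The reflection idea can be sketched as follows, assuming the boundary is at 0 and the data are the `precip` vector: estimate the density of the augmented sample $\{x_i\} \cup \{-x_i\}$ with the bandwidth chosen from the original data, then double the estimate on $x \ge 0$. The bandwidth rule (`bw.nrd0`) and the grid limits are assumptions, not necessarily those of Example 12.8.

```r
x <- precip
h <- bw.nrd0(x)                       # bandwidth from the original sample (assumed rule)

# Ordinary estimate, which can place mass below the boundary at 0
d_plain <- density(x, bw = h)

# Reflection: estimate on the augmented sample, keep x >= 0, and double
d_ref <- density(c(x, -x), bw = h, from = 0, to = max(x) + 3 * h, n = 512)
f_reflect <- 2 * d_ref$y

# Trapezoid check that the reflected estimate integrates to about 1 on [0, Inf)
area <- sum(diff(d_ref$x) * (head(f_reflect, -1) + tail(f_reflect, -1)) / 2)
print(area)

# plot(d_plain, main = ""); lines(d_ref$x, f_reflect, lty = 2)
```

Using the bandwidth of the original sample matters here: letting `density` choose it from the doubled sample would change the amount of smoothing as well as the boundary behaviour.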

Q12.11

Write a bivariate density polygon plotting function based on Examples 12.12 and 12.13. Use Example 12.13 to check the results, and then apply your function to display the bivariate faithful data (Old Faithful geyser).

Q12.12

Plot a bivariate ASH density estimate of the geyser (MASS) data.

Q12.13

Generalize the bivariate ASH algorithm to compute an ASH density estimate for a $d$-dimensional multivariate density, $d \ge 2$.

Q12.14

Write a function to bin three-dimensional data into a three-way contingency table, following the method in the bin2d function of Example 12.12. Check the result on simulated $N_3(0,I)$ data. Compare the marginal frequencies returned by your function to the expected frequencies from a standard univariate normal distribution.
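
A possible shape for such a function is sketched below. It assumes that `bin2d` obtains break points from `hist` for each coordinate and cross-tabulates the results of `cut`; the name `bin3d`, its arguments, and the returned components are hypothetical and may not match Example 12.12.

```r
# Bin an n x 3 matrix into a three-way table of counts (hypothetical helper)
bin3d <- function(x, breaks = "Sturges") {
  stopifnot(is.matrix(x), ncol(x) == 3)
  h  <- lapply(1:3, function(j) hist(x[, j], breaks = breaks, plot = FALSE))
  br <- lapply(h, function(hj) hj$breaks)
  freq <- table(
    cut(x[, 1], br[[1]], include.lowest = TRUE),
    cut(x[, 2], br[[2]], include.lowest = TRUE),
    cut(x[, 3], br[[3]], include.lowest = TRUE)
  )
  list(freq = freq, breaks = br, mids = lapply(h, function(hj) hj$mids))
}

set.seed(4)                                   # arbitrary seed
n <- 2000                                     # arbitrary sample size
x <- matrix(rnorm(3 * n), ncol = 3)           # simulated N_3(0, I) data
b <- bin3d(x)

# Marginal counts of the first coordinate vs. N(0,1) expected counts
observed <- as.vector(margin.table(b$freq, 1))
expected <- n * diff(pnorm(b$breaks[[1]]))
print(round(cbind(observed, expected), 1))
```

The expected counts do not sum exactly to $n$, because the normal distribution places a small amount of mass outside the outermost breaks.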