1 Introduction
Q1.1
Generate a random sample \(x_1, \ldots, x_{100}\) of data from the \(t_4\) (df=4) distribution using the rt function. Use the MASS::truehist function to display a probability histogram of the sample.
set.seed(1234)
x <- rt(n=100, df=4)
MASS::truehist(x, las=1, col="#118ab2", border="grey80", nbins=20)
Q1.2
Add the \(t_4\) density curve (dt) to your histogram in Exercise 1.1 using the curve function with add=TRUE.
MASS::truehist(x, las=1, col="#118ab2", border="grey80", nbins=20)
curve(dt(x, df=4), from=-4, to=4, add=TRUE, type="l")
Q1.3
Add an estimated density curve to your histogram in Exercise 1.2 using density.
MASS::truehist(x, las=1, col="#118ab2", border="grey80", nbins=20)
curve(dt(x, df=4), from=-4, to=4, add=TRUE, type="l")
lines(density(x), col="firebrick", lwd=2)
Q1.4a
Write an R function f to implement the function \(f(x) = (x-a)/b\) that transforms an input vector \(x\) and returns the result. The function should take three input arguments: x, a, and b.
f <- function(x, a, b) (x-a)/b
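A quick hand-checkable call can confirm that f works elementwise on a vector. This is only an illustrative sketch: the input vector v and the values of a and b are made up for the check and are not part of the exercise.

```r
# The transformation from Exercise 1.4a
f <- function(x, a, b) (x - a) / b

# Hypothetical small input, chosen so the result is easy to verify by hand
v <- c(2, 4, 6)
res <- f(v, a = 2, b = 4)
print(res)  # 0.0 0.5 1.0
```

Because subtraction and division are vectorized in R, no loop is needed inside f.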
Q1.4b
To transform \(x\) to the interval \([0,1]\) we subtract the minimum value and divide by the range: y <- f(x, a=min(x), b=max(x)-min(x)). Generate a random sample of Normal(\(\mu=2\), \(\sigma=2\)) data using rnorm and use your function f to transform this sample to the interval \([0,1]\). Print a summary of both the sample x and the transformed sample y to check the result.
set.seed(1234)
x <- rnorm(100, mean=2, sd=2)
y <- f(x, min(x), max(x)-min(x))
data.frame(x=x, y=y) |> summary()
       x                 y
 Min.   :-2.6914   Min.   :0.0000
 1st Qu.: 0.2093   1st Qu.:0.2963
 Median : 1.2307   Median :0.4007
 Mean   : 1.6865   Mean   :0.4472
 3rd Qu.: 2.9424   3rd Qu.:0.5755
 Max.   : 7.0980   Max.   :1.0000
Q1.5
Refer to Exercise 1.4. Suppose that we want to transform the x sample so that it has mean zero and standard deviation one (studentize the sample). That is, we want \(z_i = (x_i-\bar{x})/s\) for \(i=1,\ldots,n\), where \(s\) is the standard deviation of the sample. Using your function f this is z <- f(x, a=mean(x), b=sd(x)). Display a summary and histogram of the studentized sample z. It should be centered exactly at zero. Use sd(z) to check that the studentized sample has standard deviation exactly 1.0.
z <- f(x, a=mean(x), b=sd(x))
c(mean=mean(z), sd=sd(z)) |> round(10)
mean sd
0 1
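The exercise also asks for a summary and a histogram of z. A minimal sketch of that step follows, assuming the same seed and Normal sample as in Exercise 1.4b. The use of base hist and the plot title are choices made here for illustration; MASS::truehist could be used instead.

```r
# Recreate the sample and the transformation so the block runs on its own
set.seed(1234)
f <- function(x, a, b) (x - a) / b
x <- rnorm(100, mean = 2, sd = 2)

# Studentize: subtract the sample mean, divide by the sample sd
z <- f(x, a = mean(x), b = sd(x))

# Numerical summary; the Mean entry should be zero up to rounding error
print(summary(z))

# Probability histogram of the studentized sample (freq = FALSE plots densities)
hist(z, freq = FALSE, las = 1, col = "#118ab2", border = "grey80",
     main = "Studentized sample")
```

In floating-point arithmetic the mean is zero only up to rounding error, which is why the solution rounds before printing.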
Q1.6
Using your function f of Exercise 1.4, center and scale your Normal(\(\mu=2\), \(\sigma=2\)) sample by subtracting the sample median and dividing by the sample interquartile range (IQR).
w <- f(x, a=median(x), b=IQR(x))
data.frame(x=x, w=w) |> summary()
       x                 w
 Min.   :-2.6914   Min.   :-1.4351
 1st Qu.: 0.2093   1st Qu.:-0.3737
 Median : 1.2307   Median : 0.0000
 Mean   : 1.6865   Mean   : 0.1667
 3rd Qu.: 2.9424   3rd Qu.: 0.6263
 Max.   : 7.0980   Max.   : 2.1468
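As with the studentized sample, the result can be checked directly: after this transformation the sample median should be 0 and the IQR should be 1. The sketch below assumes the same seed and sample as in Exercise 1.4b, and uses all.equal to allow for floating-point rounding.

```r
# Recreate the sample and the transformation so the block runs on its own
set.seed(1234)
f <- function(x, a, b) (x - a) / b
x <- rnorm(100, mean = 2, sd = 2)

# Center by the median and scale by the interquartile range
w <- f(x, a = median(x), b = IQR(x))

# Both checks should print TRUE (equality up to numerical tolerance)
print(isTRUE(all.equal(median(w), 0)))
print(isTRUE(all.equal(IQR(w), 1)))
```

The median and IQR are less sensitive to extreme values than the mean and standard deviation, so this scaling is a more robust alternative to studentizing.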
Q1.7
Refer to Example 1.14 where we displayed an array of scatterplots using ggplot with facet_wrap. One of the variables in the mpg data is drv, a character vector indicating whether the vehicle is front-wheel drive, rear-wheel drive, or four-wheel drive. Add color=drv in aes: aes(displ, hwy, color=drv) and display the revised plot. Your scatterplots should now have an automatically generated legend for drv color.
library(ggplot2)
mpg |>
  ggplot() +
  geom_point(aes(displ, hwy, color=drv)) +
  facet_wrap(~class)
Q1.8
This exercise is intended to serve as an introduction to report writing with R Markdown. Install the knitr package if it is not installed. Create an HTML report using R Markdown and knitr in RStudio. The report should include the code and output of Examples 1.12 and 1.14 with appropriate headings and a brief explanation of each example.
Solution Omitted. Note that Quarto supersedes RMarkdown. Visit https://quarto.org/ to learn more.