
Plotting with R: ggplot2 vs baseR

Histogram

ToDo: Draw a histogram and a kernel density curve for a sample drawn from the Beta(6,3) distribution, and compare both with the Beta(6,3) probability density function.
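As a reference point for the comparison, the Beta(6,3) density can be written in closed form as f(x) = x^5 (1 - x)^2 / B(6,3). The sketch below is only a sanity check that dbeta() matches this formula; the names grid, norm_const, closed_form, beta_mean and beta_mode are illustrative and not part of the original example.

```r
# Beta(6, 3) density in closed form: f(x) = x^5 * (1 - x)^2 / B(6, 3)
a <- 6
b <- 3
grid <- seq(0, 1, by = 0.01)           # illustrative evaluation grid
norm_const <- 1 / beta(a, b)           # normalising constant, 168 for a = 6, b = 3
closed_form <- norm_const * grid^(a - 1) * (1 - grid)^(b - 1)

# dbeta() should agree with the closed form up to floating-point error
all.equal(closed_form, dbeta(grid, a, b))

# Theoretical mean and mode, handy when reading the plots later
beta_mean <- a / (a + b)               # 2/3
beta_mode <- (a - 1) / (a + b - 2)     # 5/7
```

The mode at 5/7 is where the dashed theoretical curve in the later plots should peak.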

1. Generate Data

The sample contains 500 points, while the density curve is evaluated on a grid of 101 points. Because the two vectors have different lengths, they cannot be combined as columns of a single data frame for input into ggplot().

# beta sample
S <- 500
smpl <- rbeta(S, 6, 3)
# density of beta
x <- seq(0, 1, by = 0.01)
y <- dbeta(x, 6, 3)
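The length mismatch can be seen directly. The following is a minimal sketch, assuming a fixed seed (set.seed(1) is added here only for reproducibility and is not in the original); combined, sample_df and curve_df are illustrative names.

```r
set.seed(1)                       # assumption: fixed seed, for reproducibility only
S <- 500
smpl <- rbeta(S, 6, 3)            # 500 random draws
x <- seq(0, 1, by = 0.01)         # 101 grid points
y <- dbeta(x, 6, 3)

length(smpl)
length(x)

# data.frame() rejects columns of incompatible lengths;
# tryCatch() captures the error message instead of stopping the script
combined <- tryCatch(
  data.frame(smpl, y),
  error = function(e) conditionMessage(e)
)

# Two separate data frames work fine, one per ggplot layer
sample_df <- data.frame(smpl)
curve_df <- data.frame(x, y)
```

Keeping the sample and the curve in separate data frames is what later allows each ggplot2 layer to receive its own data argument.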

2. Plot with baseR

First, use hist() to draw the histogram, then add the curves with lines().

Use breaks to suggest the number of bins (hist() may adjust it to get round break points), freq = FALSE to show density instead of counts on the y-axis, and main to specify the title of the graph.

lines(density(smpl)) draws the kernel density estimate of the sample directly. density() returns an object of class "density", which is a list whose x and y components hold the curve.

lines(x, y) draws a line through the given x and y values. lty = "dashed" makes the line dashed, which distinguishes the theoretical density from the curve estimated from the sample.

hist(smpl, breaks = 20, freq = FALSE, main = "Monte Carlo Sample Distribution and Beta Density")
lines(density(smpl))
lines(x, y, lty = "dashed")

/img/r/hist_baser.png
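To see what the density scale and the density() object contain, both can be inspected without drawing anything. This is a sketch under the assumption of a fixed seed; h, bar_area and d are illustrative names.

```r
set.seed(1)                                    # assumption: fixed seed for reproducibility
smpl <- rbeta(500, 6, 3)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(smpl, breaks = 20, plot = FALSE)

# On the density scale, bar height times bar width sums to 1
bar_area <- sum(h$density * diff(h$breaks))

# density() returns a list of class "density"
d <- density(smpl)
class(d)
names(d)                                       # includes x, y and the bandwidth bw
length(d$x)                                    # number of evaluation points (512 by default)
```

Because both the histogram and the kernel estimate are on the density scale, they can share a y-axis with the theoretical curve from dbeta().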

3. Plot with ggplot2

Each layer of a ggplot is added with a geom_ (or stat_) function. Since the smpl data is used by two layers (the histogram and the density curve), it is passed to the main ggplot() call as a data frame, while the pdf points generated by dbeta are supplied separately through the data argument of geom_line().

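When no y aesthetic is specified for geom_histogram, the y-axis displays counts. aes(y = ..density..) changes the y-axis to density; in ggplot2 3.4.0 and later the dot-dot notation is deprecated and the same mapping is written aes(y = after_stat(density)). Use bins to specify the number of bins, or binwidth to specify the width of each bin. colour and fill are set to approximate the bar colors of the baseR plot.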

geom_density() directly draws the probability density curve of the sample, which is equivalent to lines(density(smpl)) in baseR.

labs(title ="") specifies the graph title.

geom_line() specifies x and y to draw the line, which is equivalent to lines(x,y) in baseR. linetype="dashed" specifies the line type as dashed.

stat_function() draws a given function directly, so no grid of points needs to be generated beforehand. Here fun = dbeta is the pdf of the beta distribution, and args passes its shape parameters.

library(ggplot2)
ggplot(data.frame(smpl), aes(x = smpl)) +
  # after_stat(density) replaces the deprecated ..density.. notation
  geom_histogram(mapping = aes(y = after_stat(density)),
                 bins = 20, colour = "black", fill = "grey") +
  geom_density() +
  #geom_line(data = data.frame(x, y),
  #          mapping = aes(x, y),
  #          linetype = "dashed") +
  stat_function(fun = dbeta, args = list(shape1 = 6, shape2 = 3), linetype = "dashed") +
  labs(title = "Monte Carlo Sample Distribution and Beta Density")

/img/r/hist_ggplot.png
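For comparison, the geom_line() variant described above can be sketched as follows, with the curve points supplied as a separate data frame. This assumes ggplot2 3.3.0 or later (for after_stat()) and a fixed seed; curve_df and p are illustrative names.

```r
library(ggplot2)

set.seed(1)                                   # assumption: fixed seed for reproducibility
smpl <- rbeta(500, 6, 3)
x <- seq(0, 1, by = 0.01)
curve_df <- data.frame(x = x, y = dbeta(x, 6, 3))

p <- ggplot(data.frame(smpl), aes(x = smpl)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 20, colour = "black", fill = "grey") +
  geom_density() +
  # This layer gets its own data and overrides the inherited x mapping
  geom_line(data = curve_df, mapping = aes(x = x, y = y),
            linetype = "dashed") +
  labs(title = "Monte Carlo Sample Distribution and Beta Density")

# print(p) draws the figure
```

The result should look the same as the stat_function() version; the difference is only whether the curve points are precomputed or evaluated by ggplot2.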

To be continued