Intro to GLM: Modeling in R and SAS
Data Description
The following table shows the number of claims recorded for two companies, A and B, at 12 time points each. Because the response is a count, the data can be fitted with a Poisson model.
| claims | company | time |
|---|---|---|
| 8 | A | 2 |
| 6 | A | 4 |
| 10 | A | 6 |
| 18 | A | 8 |
| 11 | A | 10 |
| 19 | A | 12 |
| 13 | A | 14 |
| 19 | A | 16 |
| 17 | A | 18 |
| 21 | A | 20 |
| 16 | A | 22 |
| 21 | A | 24 |
| 12 | B | 2 |
| 19 | B | 4 |
| 14 | B | 6 |
| 15 | B | 8 |
| 23 | B | 10 |
| 27 | B | 12 |
| 19 | B | 14 |
| 29 | B | 16 |
| 37 | B | 18 |
| 27 | B | 20 |
| 35 | B | 22 |
| 26 | B | 24 |
Data source: Boland, Philip J. Statistical and probabilistic methods in actuarial science. CRC Press, 2007.
Approach in R
Constructing Data
Use the data.frame() function to combine the columns into a data frame that can be passed to glm().
Use factor() to specify company as a categorical variable.
policy <- data.frame(claims = c(8,6,10,18,11,19,13,19,17,21,16,21,
                                12,19,14,15,23,27,19,29,37,27,35,26),
                     company = factor(c(rep("A",12),rep("B",12))),
                     time = c(rep(seq(2,24,2),2)))
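Before fitting, it can help to confirm that the data frame has the expected shape. The following is a minimal sketch using base R only; it rebuilds policy (with rep(..., each = ) as an equivalent shorthand) so that it runs on its own.

```r
# Rebuild the data frame from the table above so this block is self-contained.
policy <- data.frame(
  claims  = c(8, 6, 10, 18, 11, 19, 13, 19, 17, 21, 16, 21,
              12, 19, 14, 15, 23, 27, 19, 29, 37, 27, 35, 26),
  company = factor(rep(c("A", "B"), each = 12)),
  time    = rep(seq(2, 24, by = 2), times = 2)
)

str(policy)              # structure: column types and number of rows
levels(policy$company)   # level order; glm() treats the first level as the base level
table(policy$company)    # number of rows per company
```

If company shows up as character rather than factor in str(), the factor() call was skipped.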
glm() Function
In the glm() call, the formula claims ~ company + time gives the target variable and the predictors, family = poisson selects the Poisson distribution (with its default log link), and data names the data frame to use.
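Written out, the model fitted by this call can be sketched as follows. The symbols $\mu_i$, $\beta_0$, $\beta_1$, $\beta_2$ and the indicator notation are introduced here for illustration and do not appear in the original; the log link is the default for the Poisson family in R.

```latex
\begin{aligned}
\text{claims}_i &\sim \text{Poisson}(\mu_i), \\
\log \mu_i &= \beta_0 + \beta_1 \, \mathbf{1}[\text{company}_i = \text{B}] + \beta_2 \, \text{time}_i .
\end{aligned}
```

Company A is the base level, so $\beta_1$ measures how company B differs from company A on the log scale.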
lm <- glm(claims ~ company + time, family = poisson, data = policy)
summary(lm)
The summary() function prints the results of the fit:
## Call:
## glm(formula = claims ~ company + time, family = poisson, data = policy)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.5490  -0.8039  -0.2589   0.6722   1.7024
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.169665   0.126339  17.173  < 2e-16 ***
## companyB    0.458061   0.095500   4.796 1.61e-06 ***
## time        0.038313   0.006882   5.567 2.59e-08 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
##     Null deviance: 75.689  on 23  degrees of freedom
## Residual deviance: 20.422  on 21  degrees of freedom
## AIC: 139.63
##
## Number of Fisher Scoring iterations: 4
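Because the model uses a log link, the coefficients are on the log scale, and exponentiating them gives multiplicative effects on the expected number of claims. The sketch below refits the same model under the name fit (a name chosen for this example, not the original's) and adds a prediction for one illustrative case; the choice of company B at time 12 is arbitrary.

```r
# Rebuild the data and refit the model so this block is self-contained.
policy <- data.frame(
  claims  = c(8, 6, 10, 18, 11, 19, 13, 19, 17, 21, 16, 21,
              12, 19, 14, 15, 23, 27, 19, 29, 37, 27, 35, 26),
  company = factor(rep(c("A", "B"), each = 12)),
  time    = rep(seq(2, 24, by = 2), times = 2)
)
fit <- glm(claims ~ company + time, family = poisson, data = policy)

# Rate ratios: multiplicative change in expected claims per unit of each predictor.
rate_ratios <- exp(coef(fit))
print(round(rate_ratios, 3))

# Expected number of claims for an illustrative case (company B at time 12).
new_obs <- data.frame(company = factor("B", levels = levels(policy$company)),
                      time = 12)
pred <- predict(fit, newdata = new_obs, type = "response")
print(pred)
```

With the estimates reported above, exp(0.458061) is about 1.58, so company B is expected to have roughly 58% more claims than company A at the same time, and exp(0.038313) is about 1.039, an increase of about 3.9% per unit of time.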
Approach in SAS
Constructing Data
In SAS, data are usually read from external files. To enter data inline, use the cards statement (an alias of datalines).
The $ after company in the input statement declares company as a character variable.
/* Create dataset */
data policy;
    input claims company$ time;
    cards;
8 A 2
6 A 4
10 A 6
18 A 8
11 A 10
19 A 12
13 A 14
19 A 16
17 A 18
21 A 20
16 A 22
21 A 24
12 B 2
19 B 4
14 B 6
15 B 8
23 B 10
27 B 12
19 B 14
29 B 16
37 B 18
27 B 20
35 B 22
26 B 24
;
run;
proc genmod Procedure
class company; specifies the company variable as a categorical variable. R takes the first level in alphabetical order as the base level (here A), whereas SAS by default takes the last level in sort order (here B). Therefore, (ref="A") is used to set the base level explicitly and obtain the same model as in R.
model claims = company time specifies the target variable and the predictors.
/ dist=poisson link=log; specifies the Poisson distribution and the log link function.
/* Fit the Poisson regression model */
proc genmod data=policy;
    class company(ref="A");
    model claims = company time / dist=poisson link=log;
run;
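The effect of the reference level can also be checked from the R side. The sketch below is an illustration rather than part of the original analysis: it uses relevel() to make B the base level, which is the parameterization SAS would produce if ref="A" were omitted. The column name company_b and the object names fit_a and fit_b are introduced here for the example.

```r
# Rebuild the data so this block is self-contained.
policy <- data.frame(
  claims  = c(8, 6, 10, 18, 11, 19, 13, 19, 17, 21, 16, 21,
              12, 19, 14, 15, 23, 27, 19, 29, 37, 27, 35, 26),
  company = factor(rep(c("A", "B"), each = 12)),
  time    = rep(seq(2, 24, by = 2), times = 2)
)

# Base level A (R's default, and the SAS model with ref="A").
fit_a <- glm(claims ~ company + time, family = poisson, data = policy)

# Base level B (what the SAS default reference level would give).
policy$company_b <- relevel(policy$company, ref = "B")
fit_b <- glm(claims ~ company_b + time, family = poisson, data = policy)

print(coef(fit_a))
print(coef(fit_b))

# The two parameterizations describe the same fitted model.
same_fit <- all.equal(unname(fitted(fit_a)), unname(fitted(fit_b)),
                      tolerance = 1e-6)
print(same_fit)
```

Only the parameterization changes: the company coefficient flips sign and the intercept absorbs the difference, while the fitted values stay the same.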
SAS outputs the results as 6 tables in HTML format.
The GENMOD Procedure
Model Information
| Model Information | |
|---|---|
| Data Set | WORK.POLICY |
| Distribution | Poisson |
| Link Function | Log |
| Dependent Variable | claims |
Number of Observations
| Number of Observations | |
|---|---|
| Number of Observations Read | 24 |
| Number of Observations Used | 24 |
Class Level Information
| Class | Levels | Values |
|---|---|---|
| company | 2 | B A |
Criteria For Assessing Goodness Of Fit
| Criterion | DF | Value | Value/DF |
|---|---|---|---|
| Deviance | 21 | 20.4221 | 0.9725 |
| Scaled Deviance | 21 | 20.4221 | 0.9725 |
| Pearson Chi-Square | 21 | 20.6958 | 0.9855 |
| Scaled Pearson X2 | 21 | 20.6958 | 0.9855 |
| Log Likelihood | | 932.0033 | |
| Full Log Likelihood | | -66.8166 | |
| AIC (smaller is better) | | 139.6332 | |
| AICC (smaller is better) | | 140.8332 | |
| BIC (smaller is better) | | 143.1673 | |
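Most of these criteria can be reproduced in R, which gives a useful cross-check between the two systems. The sketch below uses base R only; the object name fit is chosen for this example, and the AICC line applies the usual small-sample correction formula by hand, which is an assumption about how the SAS value is defined.

```r
# Rebuild the data and refit the model so this block is self-contained.
policy <- data.frame(
  claims  = c(8, 6, 10, 18, 11, 19, 13, 19, 17, 21, 16, 21,
              12, 19, 14, 15, 23, 27, 19, 29, 37, 27, 35, 26),
  company = factor(rep(c("A", "B"), each = 12)),
  time    = rep(seq(2, 24, by = 2), times = 2)
)
fit <- glm(claims ~ company + time, family = poisson, data = policy)

n <- nrow(policy)          # number of observations
k <- length(coef(fit))     # number of estimated parameters

dev     <- deviance(fit)                              # residual deviance
pearson <- sum(residuals(fit, type = "pearson")^2)    # Pearson chi-square
ll      <- as.numeric(logLik(fit))                    # full log likelihood
aic     <- AIC(fit)
bic     <- BIC(fit)
aicc    <- aic + 2 * k * (k + 1) / (n - k - 1)        # small-sample corrected AIC

print(round(c(deviance = dev, deviance_per_df = dev / df.residual(fit),
              pearson = pearson, logLik = ll,
              AIC = aic, AICC = aicc, BIC = bic), 4))
```

The value of logLik() lines up with the Full Log Likelihood row rather than the Log Likelihood row, consistent with the AIC values agreeing between R and SAS. A Value/DF close to 1 for the deviance and Pearson rows is commonly read as showing no strong sign of overdispersion.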
Convergence Status
| Convergence Status |
|---|
| Algorithm converged. |
Analysis Of Parameter Estimates
| Parameter | Level | DF | Estimate | Standard Error | Wald 95% Lower Limit | Wald 95% Upper Limit | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|---|---|
| Intercept | | 1 | 2.1697 | 0.1263 | 1.9220 | 2.4173 | 294.93 | <.0001 |
| company | B | 1 | 0.4581 | 0.0955 | 0.2709 | 0.6452 | 23.01 | <.0001 |
| company | A | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
| time | | 1 | 0.0383 | 0.0069 | 0.0248 | 0.0518 | 31.00 | <.0001 |
| Scale | | 0 | 1.0000 | 0.0000 | 1.0000 | 1.0000 | | |
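The Wald statistics in this table follow directly from the estimates and standard errors, so they can be recomputed from the R fit. As a hedged sketch: confint.default() gives Wald-type limits (estimate plus or minus a normal quantile times the standard error), and squaring the z value from summary() gives the Wald chi-square on 1 degree of freedom. The object names are chosen for this example.

```r
# Rebuild the data and refit the model so this block is self-contained.
policy <- data.frame(
  claims  = c(8, 6, 10, 18, 11, 19, 13, 19, 17, 21, 16, 21,
              12, 19, 14, 15, 23, 27, 19, 29, 37, 27, 35, 26),
  company = factor(rep(c("A", "B"), each = 12)),
  time    = rep(seq(2, 24, by = 2), times = 2)
)
fit <- glm(claims ~ company + time, family = poisson, data = policy)

est <- summary(fit)$coefficients        # Estimate, Std. Error, z value, Pr(>|z|)

wald_ci    <- confint.default(fit, level = 0.95)   # Wald 95% confidence limits
wald_chisq <- est[, "z value"]^2                   # Wald chi-square statistics
p_value    <- pchisq(wald_chisq, df = 1, lower.tail = FALSE)

print(round(cbind(est[, c("Estimate", "Std. Error")], wald_ci,
                  chisq = wald_chisq), 4))
print(signif(p_value, 3))
```

Note that confint(fit) without .default computes profile-likelihood intervals, which differ slightly from the Wald limits that SAS reports here.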