
Intro to GLM: Modeling in R and SAS

Data Description

The following table shows the number of claims for two companies (A and B), each observed over twelve periods (time = 2, 4, ..., 24). Because the response is a count, it can be fitted using a Poisson model.

claims company time
8 A 2
6 A 4
10 A 6
18 A 8
11 A 10
19 A 12
13 A 14
19 A 16
17 A 18
21 A 20
16 A 22
21 A 24
12 B 2
19 B 4
14 B 6
15 B 8
23 B 10
27 B 12
19 B 14
29 B 16
37 B 18
27 B 20
35 B 22
26 B 24

Data source: Boland, Philip J. Statistical and probabilistic methods in actuarial science. CRC Press, 2007.

Approach in R

Constructing Data

Use the data.frame function to combine the columns into a dataframe for inputting into the glm() function.

Use factor to specify company as a categorical variable.

policy <- data.frame(claims = c(8,6,10,18,11,19,13,19,17,21,16,21,
                                12,19,14,15,23,27,19,29,37,27,35,26),
                     company = factor(c(rep("A",12),rep("B",12))),
                     time = c(rep(seq(2,24,2),2)))
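Before fitting, it can help to confirm that the data frame has the expected shape and that company really is a factor with A as its first level. The following is a minimal sketch of such a check; it rebuilds the same data so that it runs on its own, and the particular checks chosen here are only a suggestion.

```r
# Rebuild the data frame from the table above so this check is self-contained.
policy <- data.frame(
  claims  = c(8, 6, 10, 18, 11, 19, 13, 19, 17, 21, 16, 21,
              12, 19, 14, 15, 23, 27, 19, 29, 37, 27, 35, 26),
  company = factor(rep(c("A", "B"), each = 12)),
  time    = rep(seq(2, 24, by = 2), times = 2)
)

str(policy)                                  # structure: 24 rows, 3 columns
levels(policy$company)                       # the first level is the base level
table(policy$company)                        # number of rows per company
tapply(policy$claims, policy$company, sum)   # total claims per company
```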

glm() Function

In glm(), the formula target variable ~ predictors specifies the model, family specifies the Poisson distribution (which uses the log link by default), and data specifies the dataframe used.
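Written out, the model requested by this call can be sketched as follows. The symbols are introduced here for illustration only: Y_i is the claim count in row i, mu_i its expected value, and the indicator equals 1 for company B and 0 for company A (the base level).

```latex
\begin{aligned}
Y_i &\sim \text{Poisson}(\mu_i), \\
\log \mu_i &= \beta_0 + \beta_1 \,\mathbb{1}[\text{company}_i = \text{B}] + \beta_2 \,\text{time}_i .
\end{aligned}
```

Because the link is the log, each coefficient acts multiplicatively on the expected count: exp(beta_1) is the ratio of expected claims for company B relative to company A at the same time, and exp(beta_2) is the multiplicative change per unit of time.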

lm <- glm(claims ~ company + time, family = poisson, data = policy)
summary(lm)

Use summary to output the results.

## Call:
## glm(formula = claims ~ company + time, family = poisson, data = policy)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.5490  -0.8039  -0.2589   0.6722   1.7024
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.169665   0.126339  17.173  < 2e-16 ***
## companyB    0.458061   0.095500   4.796 1.61e-06 ***
## time        0.038313   0.006882   5.567 2.59e-08 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
##     Null deviance: 75.689  on 23  degrees of freedom
## Residual deviance: 20.422  on 21  degrees of freedom
## AIC: 139.63
##
## Number of Fisher Scoring iterations: 4
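The coefficients above are on the log scale, so exponentiating them gives rate ratios that are often easier to read. The sketch below refits the same model under the illustrative name fit (not a name used in the original code) and also predicts the expected claims for one hypothetical observation, company B at time 12, chosen only as an example.

```r
# Rebuild the data and refit the model so this example is self-contained.
policy <- data.frame(
  claims  = c(8, 6, 10, 18, 11, 19, 13, 19, 17, 21, 16, 21,
              12, 19, 14, 15, 23, 27, 19, 29, 37, 27, 35, 26),
  company = factor(rep(c("A", "B"), each = 12)),
  time    = rep(seq(2, 24, by = 2), times = 2)
)
fit <- glm(claims ~ company + time, family = poisson, data = policy)

# Rate ratios: multiplicative effects on the expected number of claims.
rate_ratios <- exp(coef(fit))
print(round(rate_ratios, 3))

# Expected claims for an illustrative new observation (company B, time 12).
new_obs <- data.frame(company = factor("B", levels = c("A", "B")), time = 12)
expected_claims <- predict(fit, newdata = new_obs, type = "response")
print(expected_claims)
```

With the estimates shown above, exp(0.458061) is roughly 1.58, so company B is expected to have about 58% more claims than company A at the same time, and exp(0.038313) is roughly 1.04, a rise of about 4% per unit of time.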

Approach in SAS

Constructing Data

In SAS, data is generally read from external files. To enter data inline instead, use the cards statement (an alias of datalines).

Use company$ in the input statement to read company as a character (string) variable.

/* Create dataset */
data policy;
    input claims company$ time;
    cards;
8 A 2
6 A 4
10 A 6
18 A 8
11 A 10
19 A 12
13 A 14
19 A 16
17 A 18
21 A 20
16 A 22
21 A 24
12 B 2
19 B 4
14 B 6
15 B 8
23 B 10
27 B 12
19 B 14
29 B 16
37 B 18
27 B 20
35 B 22
26 B 24
;
run;

proc genmod Procedure

class company; specifies the company variable as a categorical variable. R takes the first level in alphabetical order as the base level (here A), whereas SAS by default takes the last level in sorted order (here B). Therefore, (ref="A") is used to specify the base level to obtain the same model as in R.

model claims = company time specifies the target variable and predictors.

/ dist=poisson link=log; specifies the Poisson distribution and log link function.

/* Fit the Poisson regression model */
proc genmod data=policy;
    class company(ref="A");
    model claims = company time / dist=poisson link=log;
run;
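To see what the choice of base level changes, the same effect can be reproduced on the R side. The sketch below uses relevel() to make B the base level, which mirrors what SAS would do without ref="A"; the object names fit_ref_a, fit_ref_b and policy_b are illustrative and do not appear in the original code.

```r
# Rebuild the data so this comparison is self-contained.
policy <- data.frame(
  claims  = c(8, 6, 10, 18, 11, 19, 13, 19, 17, 21, 16, 21,
              12, 19, 14, 15, 23, 27, 19, 29, 37, 27, 35, 26),
  company = factor(rep(c("A", "B"), each = 12)),
  time    = rep(seq(2, 24, by = 2), times = 2)
)

# Base level A (R's default ordering).
fit_ref_a <- glm(claims ~ company + time, family = poisson, data = policy)

# Base level B, mirroring the SAS default of using the last level.
policy_b  <- transform(policy, company = relevel(company, ref = "B"))
fit_ref_b <- glm(claims ~ company + time, family = poisson, data = policy_b)

print(coef(fit_ref_a))   # reports companyB, the effect of B relative to A
print(coef(fit_ref_b))   # reports companyA, the effect of A relative to B

# The two fits describe the same model: only the parameterization differs.
same_fit <- isTRUE(all.equal(fitted(fit_ref_a), fitted(fit_ref_b),
                             tolerance = 1e-6))
print(same_fit)
```

Changing the base level flips the sign of the company coefficient and shifts the intercept, while the time coefficient and the fitted values stay the same.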

SAS outputs the results as 6 tables in HTML format.

The GENMOD Procedure

Model Information
Data Set            WORK.POLICY
Distribution        Poisson
Link Function       Log
Dependent Variable  claims

Number of Observations
Number of Observations Read  24
Number of Observations Used  24

Class Level Information
Class    Levels  Values
company  2       B A

Criteria For Assessing Goodness Of Fit
Criterion                 DF     Value  Value/DF
Deviance                  21   20.4221    0.9725
Scaled Deviance           21   20.4221    0.9725
Pearson Chi-Square        21   20.6958    0.9855
Scaled Pearson X2         21   20.6958    0.9855
Log Likelihood                932.0033
Full Log Likelihood           -66.8166
AIC (smaller is better)       139.6332
AICC (smaller is better)      140.8332
BIC (smaller is better)       143.1673

Convergence Status
Algorithm converged.

Analysis Of Parameter Estimates
Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept   1    2.1697          0.1263       1.9220       2.4173             294.93      <.0001
company B   1    0.4581          0.0955       0.2709       0.6452              23.01      <.0001
company A   0    0.0000          0.0000       0.0000       0.0000                  .           .
time        1    0.0383          0.0069       0.0248       0.0518              31.00      <.0001
Scale       0    1.0000          0.0000       1.0000       1.0000
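
The SAS tables report some quantities that summary() in R does not print directly, but they can be derived from the R fit. The sketch below is one way to do this, under the illustrative object name fit: the Wald chi-square is the square of R's z value, confint.default() gives Wald confidence limits, and logLik() corresponds to the Full Log Likelihood row.

```r
# Rebuild the data and refit the model so this comparison is self-contained.
policy <- data.frame(
  claims  = c(8, 6, 10, 18, 11, 19, 13, 19, 17, 21, 16, 21,
              12, 19, 14, 15, 23, 27, 19, 29, 37, 27, 35, 26),
  company = factor(rep(c("A", "B"), each = 12)),
  time    = rep(seq(2, 24, by = 2), times = 2)
)
fit <- glm(claims ~ company + time, family = poisson, data = policy)

est        <- coef(summary(fit))               # estimate, SE, z value, p value
wald_chisq <- est[, "z value"]^2               # SAS "Wald Chi-Square"
wald_ci    <- confint.default(fit, level = 0.95)  # SAS "Wald 95% Confidence Limits"

comparison <- cbind(est[, c("Estimate", "Std. Error")],
                    wald_ci,
                    "Wald Chi-Square" = wald_chisq)
print(round(comparison, 4))

print(logLik(fit))   # compare with "Full Log Likelihood"
print(AIC(fit))      # compare with "AIC (smaller is better)"
```

Note that confint.default() is used here deliberately: it computes Wald intervals from the standard errors, which is the kind of interval the SAS table reports.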