Intro to GLM: Modeling in R and SAS

2023-05-28

Data Description

The following table shows the number of claims for two different companies over various periods, which can be fitted using a Poisson model.

claims	company	time
8	A	2
6	A	4
10	A	6
18	A	8
11	A	10
19	A	12
13	A	14
19	A	16
17	A	18
21	A	20
16	A	22
21	A	24
12	B	2
19	B	4
14	B	6
15	B	8
23	B	10
27	B	12
19	B	14
29	B	16
37	B	18
27	B	20
35	B	22
26	B	24

Data source: Boland, Philip J. Statistical and probabilistic methods in actuarial science. CRC Press, 2007.

Approach in R

Constructing Data

Use the data.frame() function to combine the columns into a data frame that can be passed to the glm() function.

Use factor() to declare company as a categorical variable.

policy <- data.frame(claims = c(8,6,10,18,11,19,13,19,17,21,16,21,
                                12,19,14,15,23,27,19,29,37,27,35,26),
                     company = factor(c(rep("A",12),rep("B",12))),
                     time = c(rep(seq(2,24,2),2)))
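Before fitting, it can help to confirm that the data frame has the intended shape. The following is a minimal sketch using only base R functions (str(), levels(), table()); the checks themselves are an illustration and not part of the original walkthrough.

```r
# Rebuild the data frame as in the post so this block runs on its own.
policy <- data.frame(claims = c(8,6,10,18,11,19,13,19,17,21,16,21,
                                12,19,14,15,23,27,19,29,37,27,35,26),
                     company = factor(c(rep("A",12),rep("B",12))),
                     time = c(rep(seq(2,24,2),2)))

str(policy)              # structure: 24 rows, 3 columns, company stored as a factor
levels(policy$company)   # factor levels in order; the first one is the base level
table(policy$company)    # number of rows per company
```

If company shows up as character rather than factor, glm() will still convert it internally, but declaring the factor explicitly makes the base level visible up front.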

`glm()` Function

In the glm() call, the formula target ~ predictors specifies the model, family specifies the Poisson distribution (whose default link in R is the log link), and data specifies the data frame used.
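Written out, the model specified by the glm() call in this section can be sketched as follows. The symbols $\beta_0, \beta_1, \beta_2$, $x_i$ and $t_i$ are notation introduced here for illustration; they do not appear in the original post.

```latex
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \beta_0 + \beta_1 x_i + \beta_2 t_i,
\qquad
x_i =
\begin{cases}
1 & \text{if observation } i \text{ belongs to company B},\\
0 & \text{if observation } i \text{ belongs to company A},
\end{cases}
```

Here $Y_i$ is the claims count and $t_i$ is the time value of observation $i$. Company A is the base level, so its effect is absorbed into the intercept $\beta_0$.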

lm <- glm(claims ~ company + time, family = poisson, data = policy)
summary(lm)

The summary() call prints the coefficient estimates, the deviances, and the AIC:

## Call:
## glm(formula = claims ~ company + time, family = poisson, data = policy)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5490  -0.8039  -0.2589   0.6722   1.7024  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 2.169665   0.126339  17.173  < 2e-16 ***
## companyB    0.458061   0.095500   4.796 1.61e-06 ***
## time        0.038313   0.006882   5.567 2.59e-08 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 75.689  on 23  degrees of freedom
## Residual deviance: 20.422  on 21  degrees of freedom
## AIC: 139.63
## 
## Number of Fisher Scoring iterations: 4
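Because the model uses a log link, the coefficients in the summary are on the log scale. As a hedged sketch of how they might be read, the block below exponentiates them into multiplicative effects and computes expected counts at one time point. The object names fit, rate_ratios and new_data, and the choice of time = 12, are assumptions made for this illustration.

```r
# Rebuild the data and refit so this block runs on its own.
policy <- data.frame(claims = c(8,6,10,18,11,19,13,19,17,21,16,21,
                                12,19,14,15,23,27,19,29,37,27,35,26),
                     company = factor(c(rep("A",12),rep("B",12))),
                     time = c(rep(seq(2,24,2),2)))
fit <- glm(claims ~ company + time, family = poisson, data = policy)

# Exponentiated coefficients act multiplicatively on the expected count.
rate_ratios <- exp(coef(fit))
rate_ratios["companyB"]  # about 1.58: company B relative to company A at the same time
rate_ratios["time"]      # about 1.04: per one-unit increase in time

# Expected claims for each company at time = 12 (value chosen for illustration).
new_data <- data.frame(company = factor(c("A", "B"), levels = c("A", "B")),
                       time = 12)
predict(fit, newdata = new_data, type = "response")
```

With type = "response", predict() returns values on the count scale; the default type = "link" would return them on the log scale instead.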

Approach in SAS

Constructing Data

In SAS, data is generally read from external files. For manually entered data, use the cards statement (an alias of datalines).

Use company$ in the input statement to read company as a character variable.

/* Create dataset */
data policy;
    input claims company$ time;
    cards;
8 A 2
6 A 4
10 A 6
18 A 8
11 A 10
19 A 12
13 A 14
19 A 16
17 A 18
21 A 20
16 A 22
21 A 24
12 B 2
19 B 4
14 B 6
15 B 8
23 B 10
27 B 12
19 B 14
29 B 16
37 B 18
27 B 20
35 B 22
26 B 24
;
run;

`proc genmod` Procedure

class company; declares company as a categorical variable. R takes the first level in alphabetical order as the base level (here A), whereas SAS by default takes the last level in sorted order (here B). Therefore, (ref="A") is used to set the base level so that the parameter estimates match those from R.

model claims = company time specifies the target variable and the predictors.

/ dist=poisson link=log; specifies the Poisson distribution and the log link function.

/* Fit the Poisson regression model */
proc genmod data=policy;
    class company(ref="A");
    model claims = company time / dist=poisson link=log;
run;
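For comparison, the base level discussed above can also be inspected and changed on the R side. This is a small sketch using the base R function relevel(); the variable names company and company_b are made up for this example.

```r
# The company factor as constructed in the R section.
company <- factor(c(rep("A", 12), rep("B", 12)))
levels(company)      # first level listed is R's base level

# Make B the base level instead, mirroring what SAS does without ref="A".
company_b <- relevel(company, ref = "B")
levels(company_b)    # B now comes first and acts as the base level
```

Fitting with company_b in place of company would give the same fitted values, but the company coefficient would change sign and the intercept would shift, which is the difference that ref="A" avoids in SAS.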

SAS outputs the results as 6 tables in HTML format.

The GENMOD Procedure

Model Information

Data Set	WORK.POLICY
Distribution	Poisson
Link Function	Log
Dependent Variable	claims

Number of Observations

Number of Observations Read	24
Number of Observations Used	24

Class Level Information

Class	Levels	Values
company	2	B A

Criteria For Assessing Goodness Of Fit

Criterion	DF	Value	Value/DF
Deviance	21	20.4221	0.9725
Scaled Deviance	21	20.4221	0.9725
Pearson Chi-Square	21	20.6958	0.9855
Scaled Pearson X2	21	20.6958	0.9855
Log Likelihood		932.0033
Full Log Likelihood		-66.8166
AIC (smaller is better)		139.6332
AICC (smaller is better)		140.8332
BIC (smaller is better)		143.1673

Convergence Status

Algorithm converged.

Analysis Of Parameter Estimates

Parameter		DF	Estimate	Standard Error	Wald 95% Confidence Limits	Wald Chi-Square	Pr > ChiSq
Intercept		1	2.1697	0.1263	1.9220 2.4173	294.93	<.0001
company	B	1	0.4581	0.0955	0.2709 0.6452	23.01	<.0001
company	A	0	0.0000	0.0000	0.0000 0.0000	.	.
time		1	0.0383	0.0069	0.0248 0.0518	31.00	<.0001
Scale		0	1.0000	0.0000	1.0000 1.0000
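Several quantities in the SAS tables can be recovered from the R fit, which is a useful cross-check that both programs estimate the same model. The sketch below assumes the same data and model as above; the object names fit, z, wald_chisq and pearson are chosen for this illustration.

```r
# Rebuild the data and refit so this block runs on its own.
policy <- data.frame(claims = c(8,6,10,18,11,19,13,19,17,21,16,21,
                                12,19,14,15,23,27,19,29,37,27,35,26),
                     company = factor(c(rep("A",12),rep("B",12))),
                     time = c(rep(seq(2,24,2),2)))
fit <- glm(claims ~ company + time, family = poisson, data = policy)

# Wald chi-square in SAS is the square of the z value reported by R.
z <- coef(summary(fit))[, "z value"]
wald_chisq <- z^2
wald_chisq

# Wald 95% confidence limits (estimate +/- 1.96 * standard error).
confint.default(fit)

# Goodness-of-fit criteria.
deviance(fit)                                   # Deviance
deviance(fit) / df.residual(fit)                # Value/DF for the deviance
pearson <- sum(residuals(fit, type = "pearson")^2)
pearson                                         # Pearson Chi-Square
logLik(fit)                                     # corresponds to SAS "Full Log Likelihood"
AIC(fit)
BIC(fit)
```

confint.default() is used here rather than confint() because it computes Wald intervals, which is what the SAS table reports; confint() on a glm object would give profile-likelihood intervals instead.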
