How to deal with the glm.fit error “NA/NaN/Inf” for a logistic regression model in R?


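When we fit a logistic regression model with the glm function (a generalized linear model), we need to specify the distribution family as binomial. If the family is omitted, glm falls back to its default, family=gaussian, which expects a numeric response. When the response is a factor, as in factor(DV1) below, the arithmetic that glm.fit performs on it yields NA values, and the fit stops with the error “NA/NaN/Inf in 'y'”. Hence, family="binomial" needs to be used inside the glm function while creating the logistic regression model.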
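The behaviour described above can be reproduced in a few lines. The following is a minimal sketch, not part of the original examples: the data frame demo and the object names no_family and with_family are hypothetical, and the values are copied from the iv1 and DV1 columns printed in Example 1 so that the result does not depend on random numbers.

```r
# Hypothetical demo data: iv1 and DV1 values taken from the Example 1 printout
demo <- data.frame(
  x = c(5, 3, 3, 5, 8, 3, 6, 3, 9, 7, 6, 12, 6, 5, 4, 3, 4, 9, 7, 4),
  y = c(0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0)
)

# Without family, glm() uses the gaussian default and fails on a factor response.
# tryCatch() turns the error into its message so the script keeps running.
no_family <- tryCatch(
  suppressWarnings(glm(factor(y) ~ x, data = demo)),
  error = function(e) conditionMessage(e)
)
print(no_family)

# With family = "binomial" the same formula fits as a logistic regression
with_family <- glm(factor(y) ~ x, data = demo, family = "binomial")
print(family(with_family)$family)
```

Catching the error with tryCatch is only a convenience for demonstration; in real code the fix is simply to pass the family argument.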

Example 1

The following snippet creates a sample data frame −

iv1<-rpois(20,5)
iv2<-rpois(20,2)
iv3<-rpois(20,5)
DV1<-sample(0:1,20,replace=TRUE)
df1<-data.frame(iv1,iv2,iv3,DV1)
df1

The following dataframe is created −

  iv1  iv2 iv3 DV1
1   5   2   6  0
2   3   1   3  1
3   3   4   8  1
4   5   3   3  1
5   8   2   6  1
6   3   1   4  0
7   6   1   8  1
8   3   1   7  0
9   9   2   6  0
10  7   2   4  0
11  6   4   5  1
12 12   2   4  1
13  6   2   2  0
14  5   1   3  0
15  4   1  10  0
16  3   3   4  0
17  4   1   6  1
18  9   3   4  1
19  7   1   3  1
20  4   3   4  0

To see the error, fit the logistic regression model for the data in df1 without specifying the family by adding the following code to the above snippet −

Model_1<-glm(factor(DV1)~iv1+iv2+iv3,data=df1)

Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, :
  NA/NaN/Inf in 'y'
In addition: Warning messages:
1: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
2: In Ops.factor(eta, offset) : ‘-’ not meaningful for factors
3: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
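The warnings hint at where the NA values come from: arithmetic operators are not defined for factors, so R returns NA and warns. A small sketch of that behaviour follows; the vector y and the names res and num are made up for illustration and are not part of the example data.

```r
# Hypothetical factor response, as produced by factor(DV1)
y <- factor(c(0, 1, 1, 0))

# Subtraction on a factor is "not meaningful": R warns and returns only NA
res <- suppressWarnings(y - 0.5)
print(res)

# Converting the labels back to numbers makes the arithmetic valid again
num <- as.numeric(as.character(y)) - 0.5
print(num)
```

This is the kind of operation the gaussian family attempts on the response, which is why the fit ends with NA values in 'y'.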

To create the logistic regression model for the data in df1 with the distribution family set to binomial, add the following code to the above snippet −

# df1 already exists from the snippet above; only the family argument is new
Model_1<-glm(factor(DV1)~iv1+iv2+iv3,data=df1,family="binomial")
summary(Model_1)

Output

Running the above code with family="binomial" generates output similar to the following; the exact values vary from run to run because the data are generated randomly −

Call:
glm(formula = factor(DV1) ~ iv1 + iv2 + iv3, family = "binomial",
data = df1)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.61472  -1.05484  -0.07657   1.07422   1.71351

Coefficients:
             Estimate  Std. Error z value Pr(>|z|)
(Intercept) -2.59874   2.15616  -1.205   0.228
iv1          0.26684   0.22055   1.210   0.226
iv2          0.38736   0.47527   0.815   0.415
iv3          0.06822   0.23316   0.293   0.770

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 27.726 on 19 degrees of freedom
Residual deviance: 25.223 on 16 degrees of freedom
AIC: 33.223

Number of Fisher Scoring iterations: 4

Example 2

The following snippet creates a sample data frame −

x1<-runif(20,2,10)
x2<-rnorm(20)
DV2<-sample(0:1,20,replace=TRUE)
df2<-data.frame(x1,x2,DV2)
df2

The following dataframe is created −

     x1       x2         DV2
1  9.599662  -0.37487878  1
2  3.670901  -1.05763026  0
3  5.856532  -1.63384915  1
4  5.140322   0.70749809  1
5  7.215530  -0.45739769  0
6  2.347001   0.25501067  1
7  7.997737   0.32140975  0
8  4.880330   0.45770428  1
9  4.680856   1.36704134  1
10 3.720922   0.45992890  0
11 9.192565   0.91105622  0
12 7.699731  -0.35100775  1
13 3.183395   1.31957271  1
14 5.571414   0.82899477  0
15 6.724491   0.01077159  0
16 8.844951  -0.27490769  1
17 6.509826   0.25185960  1
18 9.098870  -1.75332078  1
19 2.230271  -0.52357984  1
20 4.004921   0.51763553  1

To see the error, fit the logistic regression model for the data in df2 without specifying the family by adding the following code to the above snippet −

Model_2<-glm(factor(DV2)~x1+x2,data=df2)

Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, :
  NA/NaN/Inf in 'y'
In addition: Warning messages:
1: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
2: In Ops.factor(eta, offset) : ‘-’ not meaningful for factors
3: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
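The error appears only because the response is a factor. If the 0/1 response is left numeric and the family is omitted, glm raises no error and quietly fits an ordinary linear (gaussian) model instead of a logistic one. The sketch below illustrates this; the vectors x_demo and dv_demo are hypothetical, with values rounded from the first eight rows of df2 shown above.

```r
# Hypothetical data: first eight rows of the df2 printout, x1 rounded
x_demo  <- c(9.6, 3.7, 5.9, 5.1, 7.2, 2.3, 8.0, 4.9)
dv_demo <- c(1, 0, 1, 1, 0, 1, 0, 1)

# Numeric response, no family: no error, but this is NOT logistic regression
silent_fit <- glm(dv_demo ~ x_demo)
print(family(silent_fit)$family)

# Same data with the family stated explicitly: a logistic regression
logit_fit <- glm(dv_demo ~ x_demo, family = "binomial")
print(family(logit_fit)$family)
```

Checking family(model)$family after fitting is a quick way to confirm which kind of model was actually estimated.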

To create the logistic regression model for the data in df2 with the distribution family set to binomial, add the following code to the above snippet −

# df2 already exists from the snippet above; only the family argument is new
Model_2<-glm(factor(DV2)~x1+x2,data=df2,family="binomial")
summary(Model_2)

Output

Running the above code with family="binomial" generates output similar to the following; the exact values vary from run to run because the data are generated randomly −

Call:
glm(formula = factor(DV2) ~ x1 + x2, family = "binomial", data = df2)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7809  -1.2987   0.8107   0.9623   1.0866

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.5657   1.4628    1.070   0.284
x1         -0.1536   0.2236   -0.687   0.492
x2         -0.3353   0.6104   -0.549   0.583

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 25.898 on 19 degrees of freedom
Residual deviance: 25.267 on 17 degrees of freedom
AIC: 31.267

Number of Fisher Scoring iterations: 4
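The family argument of glm can be given as a character string, as a family function, or as a family object; for the binomial family the default link is logit, so the three forms should give the same fit. The following sketch compares them on a small data set; the data frame d and the model names are hypothetical, with values copied from the iv1 and DV1 columns printed in Example 1.

```r
# Hypothetical data: iv1 and DV1 values from the Example 1 printout
d <- data.frame(
  x = c(5, 3, 3, 5, 8, 3, 6, 3, 9, 7, 6, 12, 6, 5, 4, 3, 4, 9, 7, 4),
  y = c(0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0)
)

# Three ways of requesting the binomial family
m_string   <- glm(factor(y) ~ x, data = d, family = "binomial")
m_function <- glm(factor(y) ~ x, data = d, family = binomial)
m_object   <- glm(factor(y) ~ x, data = d, family = binomial(link = "logit"))

# The estimated coefficients agree across all three specifications
print(all.equal(coef(m_string), coef(m_function)))
print(all.equal(coef(m_string), coef(m_object)))
```

The object form is the one to use when a different link, such as probit, is wanted.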
raja
Updated on 09-Nov-2021 07:02:40