Why do we get the warning 'newdata' had 1 row but variables found have X rows when predicting from a linear model in R?

We get the 'newdata' had 1 row warning when the data frame passed to newdata does not contain a column named after the explanatory (independent) variable used in the model. Without a matching column name, predict() cannot tell that we are passing, say, the mean of the explanatory variable. It falls back to the values of that variable it finds elsewhere (here, the original sample), so it returns one predicted value for every observation in the sample instead of a single prediction.
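A quick way to see the naming problem is to inspect the column names that data.frame() generates. The sketch below is illustrative only: the objects unnamed, named and fixed are hypothetical names introduced here, and the data mirror the setup used in this article.

```r
# Illustrative data, mirroring the article's setup
set.seed(123)
x <- rnorm(20, 0.05)
df <- data.frame(x)

# An unnamed expression gets an auto-generated column name, not "x"
unnamed <- data.frame(mean(df$x))
names(unnamed)   # "mean.df.x."

# The same happens with a bare constant
fixed <- data.frame(1.2)
names(fixed)     # "X1.2"

# Naming the column explicitly makes it match the predictor in y ~ x
named <- data.frame(x = mean(df$x))
names(named)     # "x"
```

Because neither auto-generated name is x, predict() has no column to match against the model formula.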

Example

Consider the following data frame −

set.seed(123)
x<-rnorm(20,0.05)
y<-rpois(20,5)
df<-data.frame(x,y)
df
            x y
 1 -0.5104756 3
 2 -0.1801775 4
 3  1.6087083 4
 4  0.1205084 4
 5  0.1792877 3
 6  1.7650650 3
 7  0.5109162 3
 8 -1.2150612 5
 9 -0.6368529 4
10 -0.3956620 7
11  1.2740818 2
12  0.4098138 5
13  0.4507715 7
14  0.1606827 2
15 -0.5058411 5
16  1.8369131 3
17  0.5478505 3
18 -1.9166172 6
19  0.7513559 8
20 -0.4227914 4

Creating the linear model −

M<-lm(y~x,data=df)

Now let’s predict the value of y for the mean of x, passing the mean to newdata without a column name −

predict(M,newdata=data.frame(mean(df$x)),interval="confidence")
   fit      lwr       upr
 1 4.645695 3.690676 5.600715
 2 4.459543 3.635161 5.283925
 3 3.451347 2.071115 4.831579
 4 4.290080 3.520452 5.059707
 5 4.256952 3.489416 5.024489
 6 3.363226 1.876124 4.850329
 7 4.070050 3.260221 4.879880
 8 5.042792 3.669549 6.416034
 9 4.716920 3.697691 5.736149
10 4.580988 3.678189 5.483786
11 3.639939 2.475080 4.804798
12 4.127031 3.339496 4.914565
13 4.103947 3.308320 4.899575
14 4.267438 3.499558 5.035318
15 4.643083 3.690292 5.595875
16 3.322734 1.785518 4.859949
17 4.049235 3.229372 4.869097
18 5.438181 3.566862 7.309500
19 3.934541 3.043288 4.825795
20 4.596277 3.681723 5.510832

Warning message −

'newdata' had 1 row but variables found have 20 rows
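In a script, this warning is easy to miss because predict() still returns a result. One possible safeguard, sketched below under the assumption that the data are set up as in this article, is to intercept the warning with tryCatch(). The name res is hypothetical. Note that the warning appears here because a standalone vector x exists alongside df; if x existed only as a column of df, predict() would most likely stop with an "object 'x' not found" error instead.

```r
# Same setup as in the article
set.seed(123)
x <- rnorm(20, 0.05)
y <- rpois(20, 5)
df <- data.frame(x, y)
M <- lm(y ~ x, data = df)

# Capture the warning text instead of silently getting 20 predictions
res <- tryCatch(
  predict(M, newdata = data.frame(mean(df$x)), interval = "confidence"),
  warning = function(w) conditionMessage(w)
)

# res is a character string holding the warning message,
# not a matrix of predictions
is.character(res)
res
```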

To get rid of this warning, the column in newdata must be named x, matching the explanatory variable in the model formula, as shown below −

predict(M,newdata=data.frame(x=mean(df$x)),interval="confidence")
   fit      lwr      upr
1 4.25 3.482529 5.017471
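Before calling predict(), it can help to check that every predictor in the model has a matching column in newdata. The helper below, missing_predictors, is a hypothetical function written for this illustration and is not part of base R; it relies on the standard functions terms(), delete.response() and all.vars().

```r
# Same setup as in the article
set.seed(123)
x <- rnorm(20, 0.05)
y <- rpois(20, 5)
df <- data.frame(x, y)
M <- lm(y ~ x, data = df)

# Hypothetical helper: names of predictors that newdata does not provide
missing_predictors <- function(model, newdata) {
  needed <- all.vars(delete.response(terms(model)))
  setdiff(needed, names(newdata))
}

# Unnamed column: the predictor x is reported as missing
bad <- missing_predictors(M, data.frame(mean(df$x)))
bad    # "x"

# Named column: nothing is missing
good <- missing_predictors(M, data.frame(x = mean(df$x)))
good   # character(0)
```

A non-empty result signals that predict() would not use the values supplied in newdata for those variables.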

The same thing happens when we try to predict y for a fixed value of x without naming the column −

predict(M,newdata=data.frame(1.2),interval="confidence")
    fit      lwr      upr
 1 4.645695 3.690676 5.600715
 2 4.459543 3.635161 5.283925
 3 3.451347 2.071115 4.831579
 4 4.290080 3.520452 5.059707
 5 4.256952 3.489416 5.024489
 6 3.363226 1.876124 4.850329
 7 4.070050 3.260221 4.879880
 8 5.042792 3.669549 6.416034
 9 4.716920 3.697691 5.736149
10 4.580988 3.678189 5.483786
11 3.639939 2.475080 4.804798
12 4.127031 3.339496 4.914565
13 4.103947 3.308320 4.899575
14 4.267438 3.499558 5.035318
15 4.643083 3.690292 5.595875
16 3.322734 1.785518 4.859949
17 4.049235 3.229372 4.869097
18 5.438181 3.566862 7.309500
19 3.934541 3.043288 4.825795
20 4.596277 3.681723 5.510832

Warning message −

'newdata' had 1 row but variables found have 20 rows

Naming the column x gives a single prediction for x = 1.2 −

predict(M,newdata=data.frame(x=1.2),interval="confidence")
       fit     lwr      upr
1 3.681691 2.56125 4.802131
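The same rule applies when predicting for several values at once: one named column x, with one row per value. A minimal sketch, assuming the model from this article; new_x and p are hypothetical names and the x values are chosen arbitrarily for illustration.

```r
# Same setup as in the article
set.seed(123)
x <- rnorm(20, 0.05)
y <- rpois(20, 5)
df <- data.frame(x, y)
M <- lm(y ~ x, data = df)

# One named column, three rows: three predictions, no warning
new_x <- data.frame(x = c(0, 0.5, 1.2))
p <- predict(M, newdata = new_x, interval = "confidence")

nrow(p)       # one row per row of new_x
colnames(p)   # "fit" "lwr" "upr"
```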
raja
Updated on 29-Aug-2020 08:01:43
