What are some examples of data sets with missing values in R?


Instructors often need to teach missing value imputation, so they need data sets that contain missing values, or they have to create one. R already provides some: for example, the airquality data set in base R and the food data set in the VIM package. Many other packages contain data sets with missing values, but exploring them all takes a lot of time. The examples below therefore use airquality and a few data sets from the VIM package.
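For the "create one" route, a minimal sketch is shown here. It assumes that blanking out a random 10% of the numeric cells of the built-in iris data set is acceptable for teaching purposes; the names values, n_missing and iris_missing are only illustrative, and the missingness produced this way is completely random, which real data rarely is.

```r
set.seed(42)                                  # reproducible NA positions

# Work on the four numeric columns as a matrix so cells can be indexed directly.
values <- as.matrix(iris[, 1:4])

# Number of cells to blank out: 10% of all numeric cells (an arbitrary choice).
n_missing <- round(0.10 * length(values))

# sample() draws distinct cell positions, so exactly n_missing cells become NA.
values[sample(length(values), n_missing)] <- NA

# Reattach the untouched Species column.
iris_missing <- data.frame(values, Species = iris$Species)

# Missing values per column.
colSums(is.na(iris_missing))
```

Changing the seed or the proportion gives a different data set with the same structure.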

Example 1

head(airquality,20)

Output

   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9
10    NA     194  8.6   69     5  10
11     7      NA  6.9   74     5  11
12    16     256  9.7   69     5  12
13    11     290  9.2   66     5  13
14    14     274 10.9   68     5  14
15    18      65 13.2   58     5  15
16    14     334 11.5   64     5  16
17    34     307 12.0   66     5  17
18     6      78 18.4   57     5  18
19    30     322 11.5   68     5  19
20    11      44  9.7   62     5  20

Example

summary(airquality)

Output

     Ozone           Solar.R           Wind             Temp
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00
 NA's   :37       NA's   :7
     Month            Day
 Min.   :5.000   Min.   : 1.0
 1st Qu.:6.000   1st Qu.: 8.0
 Median :7.000   Median :16.0
 Mean   :6.993   Mean   :15.8
 3rd Qu.:8.000   3rd Qu.:23.0
 Max.   :9.000   Max.   :31.0
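The NA's row of summary() can be cross-checked with a direct count. The following is a small sketch using only base R; the variable names na_per_column, n_complete and incomplete_rows are illustrative.

```r
# Missing values per column; keep only the columns that have any.
na_per_column <- colSums(is.na(airquality))
na_per_column[na_per_column > 0]

# Number of rows with no missing value in any column.
n_complete <- sum(complete.cases(airquality))
n_complete

# The rows that contain at least one NA, useful for showing students
# what an imputation method would have to fill in.
incomplete_rows <- airquality[!complete.cases(airquality), ]
head(incomplete_rows)
```

Only Ozone and Solar.R should appear in the first result, matching the 37 and 7 reported by summary().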

Example 2

Loading the VIM package (the data sets in Examples 2 to 5 come from it):

library(VIM)
summary(SBS5242)

Output

       PW              BWS_F              Umsatz              PERSA
 Min.   :   21.5   Min.   :   10.22   Min.   :    5.003   Min.   :    -0.60
 1st Qu.:  365.1   1st Qu.:  171.30   1st Qu.:  306.267   1st Qu.:     3.21
 Median :  985.1   Median :  503.18   Median :  801.245   Median :   105.09
 Mean   : 2515.4   Mean   : 1406.27   Mean   : 2100.978   Mean   :  3147.40
 3rd Qu.: 2691.6   3rd Qu.: 1309.43   3rd Qu.: 2548.480   3rd Qu.:  1002.91
 Max.   :43888.8   Max.   :35081.07   Max.   :23558.504   Max.   :127175.91
 NA's   :5         NA's   :5          NA's   :5           NA's   :5
     BEZWD             BEZWDVK             BESCH               USB
 Min.   :   10.43   Min.   : -0.6602   Min.   : -0.1676   Min.   : -0.6841
 1st Qu.:  192.40   1st Qu.:  0.0000   1st Qu.:  3.9790   1st Qu.:  0.5444
 Median :  517.46   Median :  5.5174   Median :  9.4356   Median :  4.7794
 Mean   : 1453.76   Mean   : 18.1511   Mean   : 17.6972   Mean   : 12.0593
 3rd Qu.: 1417.21   3rd Qu.: 20.4039   3rd Qu.: 20.1053   3rd Qu.: 18.0577
 Max.   :37577.19   Max.   :379.0521   Max.   :310.0948   Max.   :105.3674
 NA's   :5          NA's   :5          NA's   :5          NA's   :5
     ISACH
 Min.   :  -0.925
 1st Qu.:   3.753
 Median :  31.026
 Mean   : 191.003
 3rd Qu.: 127.189
 Max.   :6575.334
 NA's   :5

Example 3

summary(bcancer)

Output

       ID           clump_thickness  uniformity_cellsize  uniformity_cellshape
 Min.   :   61634   Min.   : 1.000   Min.   : 1.000       Min.   : 1.000
 1st Qu.:  870688   1st Qu.: 2.000   1st Qu.: 1.000       1st Qu.: 1.000
 Median : 1171710   Median : 4.000   Median : 1.000       Median : 1.000
 Mean   : 1071704   Mean   : 4.418   Mean   : 3.134       Mean   : 3.207
 3rd Qu.: 1238298   3rd Qu.: 6.000   3rd Qu.: 5.000       3rd Qu.: 5.000
 Max.   :13454352   Max.   :10.000   Max.   :10.000       Max.   :10.000

    adhesion      epithelial_cellsize   bare_nuclei       chromatin
 Min.   : 1.000   Min.   : 1.000       Min.   : 1.000   Min.   : 1.000
 1st Qu.: 1.000   1st Qu.: 2.000       1st Qu.: 1.000   1st Qu.: 2.000
 Median : 1.000   Median : 2.000       Median : 1.000   Median : 3.000
 Mean   : 2.807   Mean   : 3.216       Mean   : 3.545   Mean   : 3.438
 3rd Qu.: 4.000   3rd Qu.: 4.000       3rd Qu.: 6.000   3rd Qu.: 5.000
 Max.   :10.000   Max.   :10.000       Max.   :10.000   Max.   :10.000
                                       NA's   :16
 normal_nucleoli     mitoses             class
 Min.   : 1.000   Min.   : 1.000   benign   :458
 1st Qu.: 1.000   1st Qu.: 1.000   malignant:241
 Median : 1.000   Median : 1.000
 Mean   : 2.867   Mean   : 1.589
 3rd Qu.: 4.000   3rd Qu.: 1.000
 Max.   :10.000   Max.   :10.000

Example 4

summary(brittleness)

Output

   TK104       TK105       TK107
Min. :188.0 Min. :223.0 Min. :240.0
1st Qu.:369.5 1st Qu.:370.0 1st Qu.:425.0
Median :423.5 Median :460.0 Median :479.0
Mean :421.0 Mean :472.2 Mean :470.1
3rd Qu.:482.2 3rd Qu.:549.0 3rd Qu.:548.5
Max. :697.0 Max. :709.0 Max. :733.0
NA's :3 NA's :2
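When summary() output is wide or wrapped, it can be hard to see which columns the NA counts belong to. A per-column count makes this explicit. The sketch below defines a small helper, na_columns (a hypothetical name, not part of base R or VIM), and applies it to brittleness only if VIM is installed; loading the data with data(brittleness, package = "VIM") is an assumption about how the package exposes it.

```r
# Hypothetical helper: missing values per column, dropping columns without any.
na_columns <- function(df) {
  counts <- colSums(is.na(df))
  counts[counts > 0]
}

# Works on any data frame, for example the base R airquality data.
na_columns(airquality)

# Apply it to the VIM example only when the package is available.
if (requireNamespace("VIM", quietly = TRUE)) {
  data(brittleness, package = "VIM")   # assumed to load the data frame by name
  print(na_columns(brittleness))
}
```

The same helper can be reused for SBS5242, bcancer and food.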

Example 5

summary(food)

Output

    Country    Real.coffee    Instant.coffee       Tea          Sweetener
 Austria: 1   Min.   :27.00   Min.   :10.00   Min.   :40.00   Min.   : 2.0
 Belgium: 1   1st Qu.:71.50   1st Qu.:17.00   1st Qu.:62.50   1st Qu.:11.0
 Denmark: 1   Median :89.00   Median :39.00   Median :84.50   Median :19.0
 England: 1   Mean   :78.56   Mean   :39.25   Mean   :78.50   Mean   :18.0
 Finland: 1   3rd Qu.:96.00   3rd Qu.:54.25   3rd Qu.:92.25   3rd Qu.:26.5
 France : 1   Max.   :98.00   Max.   :86.00   Max.   :99.00   Max.   :35.0
 (Other):10                                                   NA's   :1
    Biscuits      Powder.soup       Tin.soup        Potatoes
 Min.   :22.00   Min.   :27.00   Min.   : 1.00   Min.   : 2.00
 1st Qu.:56.00   1st Qu.:36.25   1st Qu.: 3.75   1st Qu.: 6.50
 Median :62.00   Median :47.00   Median :11.50   Median :10.00
 Mean   :60.67   Mean   :49.00   Mean   :18.31   Mean   :12.75
 3rd Qu.:75.00   3rd Qu.:58.00   3rd Qu.:20.00   3rd Qu.:17.00
 Max.   :91.00   Max.   :75.00   Max.   :76.00   Max.   :39.00
 NA's   :1
  Frozen.fish    Frozen.veggies      Apples         Oranges
 Min.   : 4.00   Min.   : 2.00   Min.   :22.00   Min.   :42.00
 1st Qu.:13.75   1st Qu.: 6.50   1st Qu.:56.75   1st Qu.:65.25
 Median :19.50   Median :13.00   Median :71.50   Median :72.00
 Mean   :21.88   Mean   :15.88   Mean   :66.81   Mean   :70.50
 3rd Qu.:26.25   3rd Qu.:21.50   3rd Qu.:81.00   3rd Qu.:77.25
 Max.   :54.00   Max.   :45.00   Max.   :87.00   Max.   :94.00

  Tinned.fruit        Jam            Garlic          Butter
 Min.   : 8.00   Min.   :16.00   Min.   : 5.00   Min.   :31.00
 1st Qu.:28.00   1st Qu.:40.25   1st Qu.:11.00   1st Qu.:64.50
 Median :43.00   Median :54.00   Median :25.50   Median :83.00
 Mean   :41.94   Mean   :55.19   Mean   :42.31   Mean   :75.81
 3rd Qu.:50.75   3rd Qu.:72.00   3rd Qu.:81.50   3rd Qu.:94.00
 Max.   :89.00   Max.   :91.00   Max.   :91.00   Max.   :97.00

   Margarine       Olive.oil        Yoghurt       Crisp.bread
 Min.   :24.00   Min.   :13.00   Min.   : 2.00   Min.   : 3.00
 1st Qu.:47.75   1st Qu.:29.50   1st Qu.: 5.50   1st Qu.:10.50
 Median :79.00   Median :52.50   Median :13.00   Median :21.00
 Mean   :69.12   Mean   :54.19   Mean   :20.53   Mean   :27.75
 3rd Qu.:94.00   3rd Qu.:83.25   3rd Qu.:30.50   3rd Qu.:31.00
 Max.   :97.00   Max.   :94.00   Max.   :57.00   Max.   :93.00
                                 NA's   :1
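Exploring packages by hand for data sets with missing values is slow, but the search can be roughly automated. The sketch below scans the base datasets package; items, env and has_na are illustrative names, and the approach assumes the package is attached (it is by default) so that its objects can be fetched by name. The same idea should carry over to other packages by changing the package argument, although how each package stores its data may differ.

```r
# Object names listed by data() for the 'datasets' package.
items <- data(package = "datasets")$results[, "Item"]

# Entries such as "beaver1 (beavers)" name an object stored in another file;
# keep only the object name before the space.
items <- sub(" .*$", "", items)

env <- as.environment("package:datasets")

# TRUE for every object that exists in the package and contains at least one NA.
has_na <- vapply(items, function(nm) {
  if (!exists(nm, envir = env, inherits = FALSE)) return(FALSE)
  isTRUE(anyNA(get(nm, envir = env)))
}, logical(1))

# Names of the data sets with missing values.
names(has_na)[has_na]
```

The result should include airquality, along with any other base data sets that contain NA values.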

Updated on: 05-Dec-2020