How to split month and year from 6-digit numbers in an R data frame column?


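Sometimes data arrives in a form that is not ready for analysis. One such case is a date stored as a 6-digit number, such as 202105 for the fifth month of 2021, instead of in a date format such as 2021/05. We then need to split the number to extract the year and the month. This can be done with the transform function together with substr: the first four characters, substr(Date,1,4), give the year and the last two, substr(Date,5,6), give the month, as shown in the examples below. Note that substr converts the number to text, so the new columns hold character values such as "03" and "2021".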
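Because the values are numbers rather than text, the same split can also be done arithmetically. The sketch below is an alternative to the transform/substr approach used in the examples, not a replacement for it: integer division by 100 keeps the year and the remainder keeps the month, and both results stay numeric. It assumes every value is a whole number in YYYYMM form; the small data frame is illustrative only.

```r
# A minimal sketch, assuming the column holds whole numbers in YYYYMM form
# (e.g. 202105). Integer division and modulo split the value without
# converting it to text. The data frame here is illustrative only.
df <- data.frame(Date = c(202101, 202105, 202012))

df$Year  <- df$Date %/% 100   # drop the last two digits: 2021, 2021, 2020
df$Month <- df$Date %% 100    # keep only the last two digits: 1, 5, 12

df
```

Unlike substr, this gives Month as 1 rather than "01", which is usually more convenient for sorting and arithmetic.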

Example 1

Consider the below data frame −


> Date<-sample(c(202101,202102,202103),20,replace=TRUE)
> Response1<-rnorm(20)
> df1<-data.frame(Date,Response1)
> df1

Output

   Date    Response1
1 202103   0.946367628
2 202103   1.241718518
3 202101  -0.657920816
4 202103  -0.809622853
5 202102  -1.996697735
6 202103  -0.008161429
7 202103  -1.076252898
8 202103   1.677882433
9 202102   1.172454683
10 202102  0.150296104
11 202101 -0.084385416
12 202101 -1.448539196
13 202101  1.264693614
14 202101  0.256453476
15 202103 -0.466133896
16 202103 -0.469300432
17 202103  0.724691974
18 202101  0.663945007
19 202103  0.156898247
20 202103  0.364965504

Creating columns for Month and Year from Date in df1 −

> transform(df1,Month=substr(Date,5,6),Year=substr(Date,1,4))

Output

    Date     Response1 Month Year
1 202103   0.946367628 03    2021
2 202103   1.241718518 03    2021
3 202101  -0.657920816 01    2021
4 202103  -0.809622853 03    2021
5 202102  -1.996697735 02    2021
6 202103  -0.008161429 03    2021
7 202103  -1.076252898 03    2021
8 202103   1.677882433 03    2021
9 202102   1.172454683 02    2021
10 202102  0.150296104 02    2021
11 202101 -0.084385416 01    2021
12 202101 -1.448539196 01    2021
13 202101  1.264693614 01    2021
14 202101  0.256453476 01    2021
15 202103 -0.466133896 03    2021
16 202103 -0.469300432 03    2021
17 202103  0.724691974 03    2021
18 202101  0.663945007 01    2021
19 202103  0.156898247 03    2021
20 202103  0.364965504 03    2021
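The Month and Year columns created above are character values, which is why the months print with a leading zero. If numeric columns are needed, a minimal sketch is to wrap each substr call in as.integer inside the same transform call; the small data frame below is illustrative and is not df1 from the example.

```r
# A minimal sketch: substr() returns text ("03", "2021"), so as.integer()
# converts the extracted pieces into integer columns.
# The data frame here is illustrative, not df1 from the example above.
df <- data.frame(Date = c(202103, 202101, 202102), Response1 = c(0.5, -1.2, 0.3))

out <- transform(df,
                 Month = as.integer(substr(Date, 5, 6)),  # "03" -> 3
                 Year  = as.integer(substr(Date, 1, 4)))  # "2021" -> 2021

str(out)
```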

Example 2

Consider the below data frame −


> Date<-sample(c(202010,202011,202012),20,replace=TRUE)
> Rate<-rpois(20,10)
> df2<-data.frame(Date,Rate)
> df2

Output

   Date   Rate
1 202011  9
2 202010  14
3 202011  13
4 202012  16
5 202012  10
6 202012  8
7 202012  3
8 202011  20
9 202010  9
10 202011 13
11 202010 12
12 202011 14
13 202012 10
14 202011 15
15 202011 11
16 202011 5
17 202011 15
18 202010 15
19 202011 12
20 202012 5

Creating columns for Month and Year from Date in df2 −

> transform(df2,Year=substr(Date,1,4),Month=substr(Date,5,6))

Output

   Date   Rate Year Month
1 202011  9    2020  11
2 202010  14   2020  10
3 202011  13   2020  11
4 202012  16   2020  12
5 202012  10   2020  12
6 202012  8    2020  12
7 202012  3    2020  12
8 202011  20   2020  11
9 202010  9    2020  10
10 202011 13   2020  11
11 202010 12   2020  10
12 202011 14   2020  11
13 202012 10   2020  12
14 202011 15   2020  11
15 202011 11   2020  11
16 202011 5    2020  11
17 202011 15   2020  11
18 202010 15   2020  10
19 202011 12   2020  11
20 202012 5    2020  12
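Another option, if the values will later be used as real dates, is to convert them to R's Date class first. The sketch below is an alternative technique, not the method used in the examples: it assumes every value is a valid YYYYMM month, appends "01" so each value becomes the first day of its month, parses it with as.Date, and then extracts the year and month with format. The small data frame is illustrative only.

```r
# A minimal sketch, assuming every value is a valid YYYYMM month.
# Appending "01" turns 202011 into "20201101" (1 November 2020) so that
# as.Date() can parse it; format() then extracts year and month as text.
df <- data.frame(Date = c(202011, 202010, 202012), Rate = c(9, 14, 16))

d <- as.Date(paste0(df$Date, "01"), format = "%Y%m%d")

df$Year  <- format(d, "%Y")   # "2020" "2020" "2020"
df$Month <- format(d, "%m")   # "11" "10" "12"

df
```

A side benefit of this route is validation: a value with an impossible month, such as 202013, becomes NA instead of silently producing "13".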

Updated on: 04-Mar-2021
