Found 2038 Articles for R Programming

How to do an inner join and outer join of two data frames in R?

Nizamuddin Siddiqui
Updated on 06-Jul-2020 15:00:12

602 Views

An inner join returns only the rows whose keys appear in both tables. An outer join returns all rows from both tables, joining the records whose keys match and filling the remaining columns with NA where there is no match. Both can be done by using the merge function.

Example

Inner Join

> df1 = data.frame(CustomerId = c(1:5), Product = c(rep("Biscuit", 3), rep("Cream", 2)))
> df1
  CustomerId Product
1          1 Biscuit
2          2 Biscuit
3          3 Biscuit
4          4   Cream
5          5   Cream
> df2 = data.frame(CustomerId = c(2, 5, 6), City = c(rep("Chicago", 2), rep("NewYorkCity", 1)))
> df2
  CustomerId City ...
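The excerpt stops before the merge calls, so here is a minimal sketch of both joins on the two data frames defined in it. It assumes CustomerId is the join key; the names inner_join_result and outer_join_result are illustrative and do not come from the original article.

```r
# Data frames as defined in the excerpt
df1 <- data.frame(CustomerId = 1:5,
                  Product = c(rep("Biscuit", 3), rep("Cream", 2)))
df2 <- data.frame(CustomerId = c(2, 5, 6),
                  City = c(rep("Chicago", 2), "NewYorkCity"))

# Inner join: keep only CustomerId values present in both data frames
inner_join_result <- merge(df1, df2, by = "CustomerId")

# Full outer join: keep every CustomerId from either data frame;
# columns with no matching record are filled with NA
outer_join_result <- merge(df1, df2, by = "CustomerId", all = TRUE)

print(inner_join_result)
print(outer_join_result)
```

Passing all.x = TRUE or all.y = TRUE instead of all = TRUE gives a left or right outer join, respectively.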

Why should we use set.seed in R?

Nizamuddin Siddiqui
Updated on 06-Jul-2020 14:58:52

5K+ Views

set.seed is used to make sure that we get the same results from randomization. If we randomly select some observations for any task in R, or in any statistical software, we get different values every time, and this happens because of randomization. If we want to keep the values produced by the first random selection, we can either store them in an object after randomization or fix the randomization procedure so that we get the same results every time.

Example

Randomization without set.seed

> sample(1:10)
[1] 4 10 5 3 1 6 ...
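The effect of fixing the seed can be sketched as follows. The seed value 123 and the variable names are arbitrary choices for illustration, not taken from the original article.

```r
# Without a seed, repeated calls to sample() generally give different orderings
first_draw  <- sample(1:10)
second_draw <- sample(1:10)

# Setting the same seed before each call makes the draws identical
set.seed(123)   # 123 is an arbitrary seed
seeded_first <- sample(1:10)

set.seed(123)
seeded_second <- sample(1:10)

identical(seeded_first, seeded_second)   # TRUE
```

The seed has to be set again before each call that should be reproduced, because every random draw advances the generator's state.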

How to make a list of data frames in R?

Nizamuddin Siddiqui
Updated on 06-Jul-2020 14:57:34

203 Views

This can be done by using the list function.

Example

> df1
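As a minimal sketch of the idea, the following builds two small data frames and stores them in a list. The data frames, their columns, and the element names first and second are hypothetical, since the original example is cut off.

```r
# Two small, hypothetical data frames
df1 <- data.frame(x = 1:3, y = c("a", "b", "c"))
df2 <- data.frame(p = c(2.5, 3.5), q = c(TRUE, FALSE))

# Store them in a named list
df_list <- list(first = df1, second = df2)

# Access a single data frame by name or by position
df_list$first
df_list[[2]]

# Apply a function to every data frame in the list
sapply(df_list, nrow)
```

Naming the list elements is optional, but it makes individual data frames easier to retrieve later.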

What is the use of the tilde operator (~) in R?

Nizamuddin Siddiqui
Updated on 11-Jul-2020 12:55:17

3K+ Views

The tilde operator is used to define the relationship between the dependent variable and the independent variables in a statistical model formula. The variable on the left-hand side of the tilde is the dependent variable, and the variable(s) on the right-hand side are the independent variable(s). In other words, the tilde states that the dependent variable depends on the independent variable(s) listed to its right.

Example

> Regression_Model Regression_Data
> Regression_Model_New <- lm(y ~ ., data = Regression_Data)

This will have the same output as the previous model, but we cannot use tilde with dot if ...
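The equivalence between naming the predictors and using the dot can be sketched with the built-in mtcars data set. The choice of mpg as the response, wt and hp as predictors, and the object names are assumptions made for illustration; they are not the variables of the original article.

```r
# Keep only the response and the two predictors used in this sketch
model_data <- mtcars[, c("mpg", "wt", "hp")]

# Predictors named explicitly on the right-hand side of the tilde
model_named <- lm(mpg ~ wt + hp, data = model_data)

# The dot stands for every column of the data other than the response
model_dot <- lm(mpg ~ ., data = model_data)

# Both formulas fit the same model
all.equal(coef(model_named), coef(model_dot))   # TRUE
```

The dot only expands against the data argument, which is why the data frame is subset first so that it contains just the intended predictors.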

How to filter rows that contain a certain string in R?

Nizamuddin Siddiqui
Updated on 06-Jul-2020 14:55:13

1K+ Views

We can do this by using the filter function of the dplyr package together with the grepl function from base R.

Example

Consider the mtcars data set.

> data(mtcars)
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1 ...
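A minimal sketch of the filter-plus-grepl approach on mtcars follows. It assumes the dplyr package is installed; the search string "Merc", the helper column model, and the object names are illustrative choices, not taken from the original article.

```r
library(dplyr)

# The car names are stored as row names; copy them into a regular column
# so they can be filtered like any other variable
cars <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)

# Keep only the rows whose model name contains the string "Merc"
merc_cars <- filter(cars, grepl("Merc", model))
merc_cars$model

# The same selection in base R, without dplyr
merc_cars_base <- cars[grepl("Merc", cars$model), ]
```

grepl treats its pattern as a regular expression by default; passing fixed = TRUE matches the string literally.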

How to change the orientation and font size of x-axis labels using ggplot2 in R?

Nizamuddin Siddiqui
Updated on 06-Jul-2020 14:53:37

713 Views

This can be done by using the theme function in ggplot2.

Example

> df
                   x          y
1  long text label a -0.8080940
2  long text label b  0.2164785
3  long text label c  0.4694148
4  long text label d  0.7878956
5  long text label e -0.1836776
6  long text label f  0.7916155
7  long text label g  1.3170755
8  long text label h  0.4002917
9  long text label i  0.6890988
10 long text label j  0.6077572

Plot is created as follows −

> library(ggplot2)
> ggplot(df, aes(x=x, y=y)) + geom_point() + theme(text = element_text(size=20), axis.text.x = element_text(angle=90, hjust=1))
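For a version that runs from scratch, the following sketch rebuilds a data frame of the same shape and applies the same theme settings. It assumes ggplot2 is installed; how df is constructed here (paste, rnorm, and the seed) is a hypothetical reconstruction, so the y values will differ from those printed above.

```r
library(ggplot2)

# Hypothetical reconstruction of the data: ten long labels and random values
set.seed(1)   # arbitrary seed so the sketch is reproducible
df <- data.frame(x = paste("long text label", letters[1:10]),
                 y = rnorm(10))

# text sets the base font size for the whole plot;
# axis.text.x rotates the x-axis labels by 90 degrees and right-aligns them
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  theme(text = element_text(size = 20),
        axis.text.x = element_text(angle = 90, hjust = 1))

print(p)
```

Adding vjust = 0.5 inside element_text is a common tweak to centre the rotated labels on their tick marks.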

How to select only numeric columns from an R data frame?

Nizamuddin Siddiqui
Updated on 06-Jul-2020 14:51:29

740 Views

The easiest way to do it is by using the select_if function of the dplyr package, but we can also do it through lapply.

Using dplyr

> df
   X1 X2 X3 X4 X5
1   1 11 21  a  k
2   2 12 22  b  l
3   3 13 23  c  m
4   4 14 24  d  n
5   5 15 25  e  o
6   6 16 26  f  p
7   7 17 27  g  q
8   8 18 28  h  r
9   9 19 29  i  s
10 10 20 30  j  t
> library("dplyr")
> select_if(df, is.numeric)
   X1 X2 X3
1   1 11 21
2   2 12 22
3   3 13 23
4   4 14 24
5   5 15 25
6   6 16 26
7   7 17 27
8   8 18 28
9   9 19 29
10 10 20 30

Using lapply

> numeric_only <- unlist(lapply(df, is.numeric))
> df[ , numeric_only]
   X1 X2 X3
1   1 11 21
2   2 12 22
3   3 13 23
4   4 14 24
5   5 15 25
6   6 16 26
7   7 17 27
8   8 18 28
9   9 19 29
10 10 20 30
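The base R route can be sketched end to end as follows. The construction of df is inferred from the printed output above, and vapply is used here as a type-safe variant of lapply; both are assumptions of this sketch rather than the article's exact code.

```r
# Data frame matching the printed output: three numeric and two character columns
df <- data.frame(X1 = 1:10, X2 = 11:20, X3 = 21:30,
                 X4 = letters[1:10], X5 = letters[11:20])

# Logical vector with one TRUE/FALSE per column
numeric_only <- vapply(df, is.numeric, logical(1))

# drop = FALSE keeps the result a data frame even if only one column is numeric
numeric_df <- df[, numeric_only, drop = FALSE]

names(numeric_df)   # "X1" "X2" "X3"
```

In dplyr 1.0.0 and later, select(df, where(is.numeric)) expresses the same selection.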

How to delete columns by their name in data.table in R?

Nizamuddin Siddiqui
Updated on 06-Jul-2020 14:50:15

877 Views

We can do this by setting the column to NULL.

Example

> library(data.table)
> data_table[, x:=NULL]
> data_table
    numbers
 1:       1
 2:       2
 3:       3
 4:       4
 5:       5
 6:       6
 7:       7
 8:       8
 9:       9
10:      10

To delete two columns

> Data_table
    numbers
 1:       0
 2:       1
 3:       2
 4:       3
 5:       4
 6:       5
 7:       6
 8:       7
 9:       8
10:       9
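A self-contained sketch of deleting one column and then several columns by name is given below. It assumes the data.table package is installed; the tables and the column names x, y, and numbers are hypothetical stand-ins for the definitions missing from the excerpt.

```r
library(data.table)

# Hypothetical table with one column to delete
data_table <- data.table(x = letters[1:10], numbers = 1:10)

# Setting a column to NULL with := removes it in place (no copy is made)
data_table[, x := NULL]

# Hypothetical table with two columns to delete
two_col_table <- data.table(x = letters[1:10],
                            y = rnorm(10),
                            numbers = 0:9)

# Several columns can be removed at once by passing their names as a character vector
two_col_table[, c("x", "y") := NULL]

names(data_table)      # "numbers"
names(two_col_table)   # "numbers"
```

Because := modifies the table by reference, there is no need to assign the result back to the same name.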

How to simulate a discrete uniform random variable in R?

Nizamuddin Siddiqui
Updated on 11-Jul-2020 12:54:09

3K+ Views

There is no dedicated function in base R to simulate a discrete uniform random variable, as there is for other random variables such as the normal, Poisson, and exponential, but we can simulate it using the rdunif function of the purrr package.

The rdunif function has the following syntax −

> rdunif(n, b, a)

Here,
n = Number of random values to return
b = Maximum value of the distribution; it needs to be an integer because the distribution is discrete
a = Minimum value of the distribution; it needs to be an integer because the distribution is discrete

Example

Let’s say you want to simulate 10 ages between 21 to ...
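The same kind of draw can also be sketched with base R's sample function, which avoids the purrr dependency (recent purrr releases mark rdunif as deprecated). The upper bound of 50 and the seed are assumptions for illustration, because the range in the excerpt is cut off.

```r
n <- 10   # number of random values to return
a <- 21   # minimum value of the distribution
b <- 50   # maximum value (assumed here; the excerpt is truncated)

set.seed(42)   # arbitrary seed so the draw is reproducible

# Each integer from a to b is equally likely; replace = TRUE makes the
# draws independent, so the same age can appear more than once
ages <- sample(a:b, size = n, replace = TRUE)
ages
```

Without replace = TRUE, sample draws without replacement, which would not be a sequence of independent discrete uniform values.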

How to standardize columns in an R data frame?

Nizamuddin Siddiqui
Updated on 06-Jul-2020 14:46:24

188 Views

This can be done by using the scale function.

Example

> data
          x        y
1  49.57542 2.940931
2  49.51565 2.264866
3  50.70819 2.918803
4  49.09796 2.416676
5  49.90089 2.349696
6  49.03445 3.883145
7  51.29564 4.072614
8  49.11014 3.526852
9  49.41255 3.320530
10 49.42131 3.033730
> standardized_data <- scale(data)
> standardized_data
               x           y
 [1,] -0.1774447 -0.20927607
 [2,] -0.2579076 -1.28232321
 [3,]  1.3476023 -0.24439768
 [4,] -0.8202493 -1.04137095
 [5,]  0.2607412 -1.14768085
 [6,] -0.9057468  1.28619932
 [7,]  2.1384776  1.58692277
 [8,] -0.8038439  0.72069363
 [9,] -0.3967165  0.39321942
[10,] -0.3849124 -0.06198639
attr(,"scaled:center")
        x         y
49.707220  3.072784
attr(,"scaled:scale")
        x         y
0.7427788 0.6300430
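What scale computes can be checked with a short sketch: each column has its mean subtracted and is then divided by its standard deviation. The data here are randomly generated under assumed parameters, so the values differ from those printed above.

```r
# Hypothetical data with two numeric columns
set.seed(10)   # arbitrary seed so the sketch is reproducible
data <- data.frame(x = rnorm(10, mean = 50, sd = 1),
                   y = rnorm(10, mean = 3, sd = 0.5))

# scale() centres each column on its mean and divides by its standard deviation
standardized_data <- scale(data)

# After standardizing, every column has mean 0 (up to rounding) and sd 1
round(colMeans(standardized_data), 10)
apply(standardized_data, 2, sd)

# The same result computed by hand for column x
manual_x <- (data$x - mean(data$x)) / sd(data$x)
all.equal(manual_x, as.numeric(standardized_data[, "x"]))   # TRUE
```

scale returns a matrix; wrapping the result in as.data.frame gives a data frame again if one is needed.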
