Sometimes we get very dirty data, which is one reason data analysis is a difficult task. Most data scientists look for clean data, but it is rarely available because data warehouses often focus on data availability rather than data quality. One head-scratching situation is an unwanted character placed at random positions in the values, and the $ sign is one such character. We can remove it from an R data frame by using the lapply function.
Example
Consider the below data frame:
> x <- ...
> y <- ...
> df1 <- ...
> df1
Output
  x y
1 ...
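Since the excerpt's own data frame is cut off, here is a minimal sketch of the idea. The values and the positions of the $ signs are made up for illustration, and pairing lapply with gsub is an assumption about how the removal might be done, not necessarily the original article's code.

```r
# Hypothetical data: values and "$" positions are invented for illustration.
df1 <- data.frame(
  x = c("$12", "4$5", "78$", "90"),
  y = c("a$", "$b", "c", "d$d"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "$" as a literal character instead of the
# regular-expression end-of-string anchor.
df1[] <- lapply(df1, function(col) gsub("$", "", col, fixed = TRUE))

# Column x now holds only digits, so it can be converted to numeric.
df1$x <- as.numeric(df1$x)
print(df1)
```

Assigning to df1[] keeps the object a data frame; a plain df1 <- lapply(...) would return a list instead.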
To find the confidence intervals for the coefficients of an lm model (linear regression model), we can use the confint function; there is no need to pass the confidence level unless we want something other than the default of 95%. The same function can also be used for a glm model (generalized linear model). Check out the below examples to see the output of confint for a glm model.
Example 1
> set.seed(3214)
> x1 <- ...
> y1 <- ...
> Model1 <- glm(y1 ~ x1, family = "binomial")
> summary(Model1)
Output
Call:
glm(formula = y1 ~ x1, family = "binomial")

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.6360 -1.4156  0.7800  0.8567  0.9946

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.34851    1.17554   0.296    0.767
...
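The excerpt stops before confint is actually called, so the following is a small sketch on simulated data. The seed, sample size and coefficients are assumptions chosen for illustration and are not the article's values.

```r
set.seed(1)  # arbitrary seed, not the one used in the excerpt

# Simulated predictor with one continuous and one binary response.
x <- rnorm(50)
y_num <- 2 + 3 * x + rnorm(50)
y_bin <- rbinom(50, size = 1, prob = plogis(0.5 + x))

lm_fit <- lm(y_num ~ x)
glm_fit <- glm(y_bin ~ x, family = binomial)

# Default level is 0.95; pass level = ... only to change it.
ci_lm <- confint(lm_fit)
ci_lm90 <- confint(lm_fit, level = 0.90)

# For a glm, confint returns intervals on the scale of the linear predictor.
ci_glm <- confint(glm_fit)

print(ci_lm)
print(ci_lm90)
print(ci_glm)
```

For the binomial model the intervals are on the log-odds scale; exp(ci_glm) would express them as odds ratios.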
In R, as in most statistical analysis tools, the default font of axis labels is not italic, but we can make it italic using ggplot2. For this purpose, we can use the theme function of the ggplot2 package, where the axis.text.x argument controls the font of the X-axis tick labels.
Example
Consider the below data frame:
> x <- ...
> y <- ...
> df <- ...
> df
Output
  x  y
1 A 24
2 B 23
3 C 25
4 D 27
Loading ggplot2 package and creating a bar plot:
> library(ggplot2)
> ggplot(df, aes(x, y))+geom_bar(stat="identity")
Creating bar plot with italic X-axis labels:
> ggplot(df, aes(x, y))+geom_bar(stat="identity")+theme(axis.text.x=element_text(face=c("italic", "italic", "italic", ...
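As a compact sketch of the same idea, the data frame can be rebuilt from the printed output above (the exact construction code is an assumption). A single face = "italic" is used here instead of the truncated vector of faces, since one value applies to every tick label.

```r
library(ggplot2)

# Rebuilt from the printed output; the original construction code is not shown.
df <- data.frame(
  x = c("A", "B", "C", "D"),
  y = c(24, 23, 25, 27)
)

# axis.text.x styles the tick labels on the X-axis;
# axis.title.x would style the axis title instead.
p <- ggplot(df, aes(x, y)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(face = "italic"))

print(p)
```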
To find the correlation matrix for a data frame, we can pass the data frame to the cor function, but if the data frame contains missing values then it is not that straightforward. In such situations, we can set use = "complete.obs" in the cor function so that rows containing missing values are dropped before the correlation coefficients are calculated.
Example 1
Consider the below data frame:
> x1 <- ...
> x2 <- ...
> x3 <- ...
> df1 <- ...
> df1
Output
  x1 x2  x3
1 NA  3 512
2  8  7 512
3  5  2 520
4 NA  1  NA
5 NA  2 512
6 NA  4 ...
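A minimal sketch of the difference, using a small made-up data frame (the values below are illustrative and are not the excerpt's data):

```r
# Hypothetical data with a few missing values.
df1 <- data.frame(
  x1 = c(NA, 8, 5, NA, 3, 6),
  x2 = c(3, 7, 2, 1, 2, 4),
  x3 = c(512, 512, 520, NA, 512, 515)
)

# Without the use argument, any pair involving a column with NA yields NA.
cor_default <- cor(df1)

# "complete.obs" keeps only rows with no missing value in any column.
cor_complete <- cor(df1, use = "complete.obs")

print(cor_default)
print(cor_complete)
```

With "complete.obs" every coefficient is computed from the same set of rows, which is equivalent to calling cor on na.omit(df1).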
We can say that orthogonal is a synonym of perpendicular. If the inner product (a generalization of the dot product) of two polynomials is zero, then we call them orthogonal polynomials. In R, we can compute orthogonal polynomials by using the poly function, as shown in the below examples.
Example 1
> x <- ...
> x
Output
 [1]  1.53798786 -0.85463326  2.39444451  0.82559418 -2.22197322 -1.04243823
 [7] -0.04693054 -0.68691236 -1.63040923 -1.42408865
Example
> orthogonal_x <- ...
> orthogonal_x
Output
               1           2
[1,]  0.41743651 -0.01687537
[2,] -0.12158589 -0.21414848
[3,]  0.61038362  0.54027924
[4, ...
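The call to poly is lost in the excerpt. The sketch below assumes degree 2, which matches the two columns in the printed output, and uses freshly simulated values rather than the excerpt's x. It also checks the orthogonality numerically.

```r
set.seed(1)  # arbitrary seed for illustration
x <- rnorm(10)

# Columns are orthogonal polynomials of degree 1 and 2 evaluated at x.
orthogonal_x <- poly(x, degree = 2)

# The Gram matrix of the columns should be (numerically) the identity:
# off-diagonal entries near 0 mean the columns are orthogonal.
gram <- crossprod(orthogonal_x)
print(round(gram, 10))

# Each column is also orthogonal to the constant term.
print(round(colSums(orthogonal_x), 10))
```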
A boxplot is normally drawn from the whole data, using the five-number summary (minimum, first quartile, median, third quartile and maximum) rather than the mean and standard deviation. If we don't have the whole data but the mean and standard deviation are available, then a boxplot can still be created by deriving all the limits of the boxplot from them, using the mean as the measure of central tendency.
Example
Consider the below data frame:
> df <- ...
> df
Output
  mean  sd Category
1   24 1.1        A
2   25 2.1        B
3   27 1.5        C
4   24 1.8        D
Loading ggplot2 package and creating the boxplot of each category ...
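The plotting code is cut off in the excerpt, so this is only a sketch of one way to do it. The choice of limits (box at mean ± 1 sd, whiskers at mean ± 3 sd) is an assumption, not something the excerpt states; the data values are taken from the printed output above.

```r
library(ggplot2)

df <- data.frame(
  mean = c(24, 25, 27, 24),
  sd = c(1.1, 2.1, 1.5, 1.8),
  Category = c("A", "B", "C", "D")
)

# Assumed limits: box = mean +/- 1 sd, whiskers = mean +/- 3 sd.
box_df <- transform(
  df,
  ymin = mean - 3 * sd,
  lower = mean - sd,
  middle = mean,
  upper = mean + sd,
  ymax = mean + 3 * sd
)

# stat = "identity" tells geom_boxplot to use the supplied limits
# instead of computing quartiles from raw data.
p <- ggplot(box_df, aes(x = Category)) +
  geom_boxplot(
    aes(ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax),
    stat = "identity"
  )

print(p)
```

The middle line of each box is the mean here, not the median, so the figure should be labeled accordingly when it is shared.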
A 3D array is a three-dimensional array, and it is effectively a collection of 2D arrays (matrices). We can create a 3D array from a data frame in R by using the simplify2array function, which combines equally sized matrices taken from the data frame into a single 3D array.
Example 1
Consider the below data frame:
> set.seed(254)
> x <- ...
> y <- ...
> z <- ...
> a <- ...
> b <- ...
> c <- ...
> df1 <- ...
> df1
Output
  x y z a b  c
1 0 4 6 9 5  5
2 0 5 1 4 2  1
3 0 6 1 4 5  6
4 1 6 3 5 4 12
5 1 9 8 6 6 11
6 ...
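The excerpt does not show how simplify2array is called. One possible approach, assumed here, is to split the data frame by a grouping column, turn each piece into a matrix, and let simplify2array stack the matrices. The column names and values below are hypothetical.

```r
# Hypothetical data: g is a grouping column with equally sized groups.
df <- data.frame(
  g = rep(0:1, each = 3),
  a = c(4, 5, 6, 6, 9, 2),
  b = c(6, 1, 1, 3, 8, 7)
)

# One matrix per group; all matrices must have the same dimensions.
pieces <- lapply(split(df[, c("a", "b")], df$g), as.matrix)

# Stacks the matrices along a third dimension (higher = TRUE is the default).
arr <- simplify2array(pieces)

print(dim(arr))   # rows per group, columns, number of groups
print(arr[, , "1"])
```

If the groups have different numbers of rows, simplify2array returns the list unchanged rather than an array.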
To create a frequency table of a string vector, we just need to use the table function. For example, if we have a vector x that contains 100 randomly sampled values of the first five English letters, then the frequency table of x can be created by using table(x). The printed table is headed by the name of the vector.
Example 1
> x1 <- ...
> x1
Output
 [1] "d" "d" "a" "c" "a" "a" "c" "a" "d" "c" "a" "d" "d" "b" "c" "a" "b" "c" "d"
[20] "b"
Example
> table(x1)
Output
x1
a b c d
6 3 5 6
Example 2
> x2 <- ...
> x2
Output
[1] "w" "j" "p" "y" "r" ...
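The 100-value case described in the text can be sketched as follows; the seed is arbitrary and only there to make the sample reproducible.

```r
set.seed(1)  # arbitrary seed for reproducibility

# 100 values sampled with replacement from the first five letters.
x <- sample(letters[1:5], size = 100, replace = TRUE)

freq <- table(x)
print(freq)

# Relative frequencies, if proportions are more useful than counts.
print(prop.table(freq))
```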
A QR code is a two-dimensional barcode that encodes information, typically about the object on which it is printed. Scanning it gives us the information relevant to the object so that we can proceed with the next step of the operation. In R, we can create a QR code by using the qrcode_gen function of the qrcode package.
Example 1
Loading qrcode package:
> library(qrcode)
Creating a QR code for tutorialspoint:
> qrcode_gen('www.tutorialspoint.com')
Example 2
> qrcode_gen('www.tutorix.com')
Example 3
> qrcode_gen('www.r-project.org')
The easiest way to create a duplicate column in an R data frame is to assign the existing column to a new column with the $ operator; if we want a different name, we simply use the new name on the left-hand side. For example, if we have a data frame df that contains a column x and we want a new column x1 having the same values as x, then it can be done as df$x1 <- df$x.
Example
> set.seed(254)
> x <- ...
> y <- ...
> z <- ...
> a <- ...
> b <- ...
> c <- ...
> df <- ...
> df
Output
  x         y z           a        b        c
1 A 0.8709244 9 0.072625990 5.125432 26.84561
2 B 1.7993156 3 ...
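A short sketch of the assignment on a small made-up data frame (the values are illustrative, not the excerpt's):

```r
# Hypothetical data frame with a column x to duplicate.
df <- data.frame(
  x = c("A", "B", "C"),
  y = c(1.5, 2.5, 3.5)
)

# The new column x1 is a copy of x; later changes to x do not affect x1.
df$x1 <- df$x
print(df)
```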