Data Manipulation in R with data.table

Data manipulation is a crucial step in the data analysis process, as it allows us to prepare and organize our data in a form suited to a particular analysis or visualization. The right tools and techniques depend on the type and structure of the data and on the goals of the manipulation.

The data.table package provides an enhanced version of R's data.frame class. Its concise syntax and fast implementation make it easier and quicker to manipulate and work with large datasets.

data.table is one of the most downloaded R packages and a popular choice among data scientists.

Installing the data.table Package

Installing the data.table package is as simple as installing any other package. Run one of the following commands in the R console:

Installing the data.table package from CRAN

install.packages('data.table')

Installing the development version from GitLab

install.packages("data.table",
repos="https://Rdatatable.gitlab.io/data.table")
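In scripts that other people will run, a common pattern is to install the package only when it is missing and then check which version is loaded. This is a minimal sketch of that pattern, not an official installation recipe:

```r
# Install data.table from CRAN only if it is not already available
if (!requireNamespace("data.table", quietly = TRUE)) {
  install.packages("data.table")
}

# Load the package and report the installed version
library(data.table)
packageVersion("data.table")
```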

Importing Datasets

R comes with many built-in datasets that can be used as demo data to show how R functions work.

One popular built-in dataset is iris. It records four measurements (sepal length, sepal width, petal length, and petal width) for 150 flowers, 50 from each of three species.
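Because iris ships with base R, you can also follow along offline by converting the built-in copy with as.data.table(). The sketch below is an alternative to downloading the CSV used in this article (irisDT is just an illustrative name), and it confirms the row, column, and species counts:

```r
library(data.table)

# Convert R's built-in iris data.frame to a data.table
irisDT <- as.data.table(iris)

# Number of rows and columns (150 and 5)
dim(irisDT)

# Number of flowers per species (50 each)
table(irisDT$Species)
```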

Working with datasets in data.table is quite different from working with them as a plain data.frame. Let's look at this in more detail.

data.table provides the fread() function (short for "fast read"), which is data.table's counterpart to read.csv(). Like read.csv(), it can read a file stored locally as well as a file hosted on a website.
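To see the local-file case without a network connection, the sketch below writes the built-in iris data to a temporary file (a stand-in for your own CSV; csvPath and the other object names are illustrative) and reads it back with both functions. It also uses fread()'s text argument, which parses CSV content passed directly as a string:

```r
library(data.table)

# Write the built-in iris data to a temporary CSV file (stand-in for a local file)
csvPath <- tempfile(fileext = ".csv")
write.csv(iris, csvPath, row.names = FALSE)

# fread() reads the local file and returns a data.table
fromFile <- fread(csvPath)

# read.csv() reads the same file but returns a plain data.frame
fromBase <- read.csv(csvPath)

class(fromFile)  # "data.table" "data.frame"
class(fromBase)  # "data.frame"

# fread() can also parse CSV text supplied as a string
fromText <- fread(text = "x,y\n1,a\n2,b")
fromText
```

The key difference is the return type: fread() gives a data.table, while read.csv() gives a plain data.frame.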

Example

Consider the program below, which imports the iris data stored as a CSV file on the internet:

# Importing library
library(data.table)
# Creating a dataset
myDataset <- fread("https://raw.githubusercontent.com/gexijin/learnR/master/datasets/iris.csv")
# Check the class of the imported object
class(myDataset)

Output

[1] "data.table" "data.frame"

As you can see from the output above, the imported data is stored directly as a data.table.

A data.table inherits from the data.frame class, so it is also a data.frame. As a result, most functions that accept a data.frame work with a data.table as well.
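A quick way to confirm this is to call a few base R functions on a data.table. The sketch assumes the built-in iris data converted with as.data.table() rather than the downloaded CSV:

```r
library(data.table)

# Convert the built-in iris data.frame to a data.table
irisDT <- as.data.table(iris)

# A data.table is also a data.frame
is.data.frame(irisDT)           # TRUE
inherits(irisDT, "data.table")  # TRUE

# Base R functions written for data.frames accept it as well
nrow(irisDT)
colnames(irisDT)
summary(irisDT$Petal.Length)
```

One caveat: the [ operator behaves differently for a data.table, which is what the next sections explore.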

Displaying the Iris Dataset

Example

# Importing library
library(data.table)
# Creating a dataset
myDataset <- fread(
  "https://raw.githubusercontent.com/gexijin/learnR/master/datasets/iris.csv")
# print the iris dataset
print(myDataset)

Output

   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:        5.1         3.5          1.4         0.2    setosa
  2:        4.9         3.0          1.4         0.2    setosa
  3:        4.7         3.2          1.3         0.2    setosa
  4:        4.6         3.1          1.5         0.2    setosa
  5:        5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:        6.7         3.0          5.2         2.3 virginica
147:        6.3         2.5          5.0         1.9 virginica
148:        6.5         3.0          5.2         2.0 virginica
149:        6.2         3.4          5.4         2.3 virginica
150:        5.9         3.0          5.1         1.8 virginica

There are 150 rows and 5 columns in the Iris data set.

Let's print the first six rows of the iris dataset:

head(myDataset)

Output

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:          5.1         3.5          1.4         0.2  setosa
2:          4.9         3.0          1.4         0.2  setosa
3:          4.7         3.2          1.3         0.2  setosa
4:          4.6         3.1          1.5         0.2  setosa
5:          5.0         3.6          1.4         0.2  setosa
6:          5.4         3.9          1.7         0.4  setosa
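The dimensions quoted above can be checked directly, and head() and tail() accept an n argument to control how many rows are shown. This sketch again assumes the built-in iris data rather than the downloaded CSV:

```r
library(data.table)

irisDT <- as.data.table(iris)

# Number of rows and columns
dim(irisDT)

# First and last three rows
head(irisDT, n = 3)
tail(irisDT, n = 3)
```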

Filtering Rows Based on a Condition

A base R data.frame does not let you refer to its columns by bare name inside the square brackets, so filtering rows means repeating the object name for every column, as in df[df$Sepal.Length == 5.1, ] where df is a data.frame. This becomes cumbersome when a condition involves several columns.

Inside the square brackets of a data.table, column names can be used directly as variables. We can therefore filter rows simply by passing a condition on the columns inside the brackets:

myDataset[column_condition]

Here, column_condition is a logical expression on one or more columns; only the rows for which it is TRUE are returned.
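The contrast with a plain data.frame is easiest to see side by side. In this sketch (using the built-in iris data; irisDF, irisDT, baseRows, and dtRows are illustrative names), both approaches apply the same condition used in the example that follows and return the same rows:

```r
library(data.table)

irisDF <- iris                  # plain data.frame
irisDT <- as.data.table(iris)   # data.table copy

# Base data.frame: every column reference is prefixed with the object name
baseRows <- irisDF[irisDF$Sepal.Length == 5.1 & irisDF$Petal.Length == 1.4, ]

# data.table: column names are used directly inside the brackets
dtRows <- irisDT[Sepal.Length == 5.1 & Petal.Length == 1.4]

nrow(baseRows)
nrow(dtRows)
```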

Let us consider an example to filter the dataset with the condition "Sepal.Length==5.1 & Petal.Length==1.4".

Example

# Importing library
library(data.table)
# Creating a dataset
myDataset <- fread(
  "https://raw.githubusercontent.com/gexijin/learnR/master/datasets/iris.csv")
# data.table syntax to filter rows
# based on a column condition
myDataset[Sepal.Length == 5.1 & Petal.Length == 1.4, ]

Output

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:          5.1         3.5          1.4         0.2  setosa
2:          5.1         3.5          1.4         0.3  setosa

As you can see in the output, the two rows that satisfy the condition inside the square brackets are returned.
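Conditions can be combined in the usual R ways. A minimal sketch, again assuming the built-in iris data: | expresses OR, %in% tests membership in a set, and the special symbol .N placed in the second position counts the matching rows:

```r
library(data.table)

irisDT <- as.data.table(iris)

# OR condition: very short or very long sepals
irisDT[Sepal.Length < 4.5 | Sepal.Length > 7.5]

# %in% keeps rows whose Species is in the given set
irisDT[Species %in% c("setosa", "virginica") & Petal.Width >= 2.0]

# .N returns the number of rows that satisfy the condition
irisDT[Petal.Length > 6, .N]
```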

Selecting Columns

Now let's see how to select columns of a dataset using the data.table package. The basic syntax for selecting a column by its position is:

myDataset[, column_number, with = FALSE]

Here, column_number is the position of the column you want to select (columns are numbered starting from 1).
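In recent data.table releases (1.9.8 and later, as far as we know; check the NEWS file of your installed version), with = FALSE is no longer required when j is a plain column number or a vector of names. The sketch below, which assumes the built-in iris data, compares both forms and shows that [[ ]] returns the column as a plain vector instead of a one-column data.table:

```r
library(data.table)

irisDT <- as.data.table(iris)

# Explicit form used in this article
withFalse <- irisDT[, 2, with = FALSE]

# Newer versions: a numeric j selects the column without with = FALSE
plain <- irisDT[, 2]

# [[ ]] extracts the column as a plain numeric vector
vec <- irisDT[[2]]

class(withFalse)
class(vec)
```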

Example

Let's select the second column of the iris dataset:

# Importing library
library(data.table)

# Creating a dataset
myDataset <- fread(
  "https://raw.githubusercontent.com/gexijin/learnR/master/datasets/iris.csv")
# data.table syntax to subset the second column
myDataset[, 2, with = FALSE]

Output

     Sepal.Width
  1:         3.5
  2:         3.0
  3:         3.2
  4:         3.1
  5:         3.6
 ---            
146:         3.0
147:         2.5
148:         3.0
149:         3.4
150:         3.0

As you can see above in the output, the second column of the iris dataset is selected.

Example

Now let's select multiple columns. In the example below, we select two columns by name: 'Petal.Length' and 'Species'.

# Importing library
library(data.table)

# Creating a dataset
myDataset <- fread(
  "https://raw.githubusercontent.com/gexijin/learnR/master/datasets/iris.csv")

columns <- c('Petal.Length', 'Species')

# selecting two columns: 'Petal.Length' and 'Species'
myDataset[, columns, with = FALSE]

Output

     Petal.Length   Species
  1:          1.4    setosa
  2:          1.4    setosa
  3:          1.3    setosa
  4:          1.5    setosa
  5:          1.4    setosa
 ---                       
146:          5.2 virginica
147:          5.0 virginica
148:          5.2 virginica
149:          5.4 virginica
150:          5.1 virginica

Here we selected two columns, 'Petal.Length' and 'Species'.
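Two other ways of selecting columns are common in data.table code: prefixing a variable with .. so that data.table looks it up outside the table, and listing bare column names inside .(), which is an alias for list(). A sketch assuming the built-in iris data and a reasonably recent data.table version:

```r
library(data.table)

irisDT <- as.data.table(iris)
columns <- c("Petal.Length", "Species")

# The .. prefix tells data.table to look up 'columns' in the calling scope
byVariable <- irisDT[, ..columns]

# .() selects columns by bare name
byName <- irisDT[, .(Petal.Length, Species)]

names(byVariable)
names(byName)
```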

Conclusion

In this tutorial, we covered several data manipulation techniques with data.table: importing datasets, displaying them, filtering rows based on column conditions, and selecting columns. I hope this tutorial helps strengthen your knowledge of data science.

Bhuwanesh Nainwal

Updated on: 2026-03-11T20:07:51+05:30
