How to extract words from a string vector in R?


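To extract words from a string vector, we can use the word function of the stringr package. For example, if x holds a string of 100 words, the first 20 words can be extracted with the command word(x,start=1,end=20,sep=fixed(" ")). To begin at a different word, change the start value accordingly. Both start and end are word positions, and the words at both positions are included in the result. A word here is anything between two separators, so punctuation stays attached to the word it touches (for example "analysis,").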
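Because word works on every element of a character vector, it may help to see it on a small vector first. The sketch below is only an illustration: the vector sentences and the result names first_words and middle_words are hypothetical and not part of the example that follows.

```r
library(stringr)

# Hypothetical example vector with two short strings
sentences <- c("Jane saw a cat", "Jane sat down")

# First word of each string (end defaults to start when it is omitted)
first_words <- word(sentences, start = 1)

# Words 2 through 3 of each string, split on single spaces
middle_words <- word(sentences, start = 2, end = 3, sep = fixed(" "))

print(first_words)
print(middle_words)
```

Each call returns one result per input string, so a vector of length two gives back a vector of length two.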

Example

x<-c("R is a programming language and software environment for statistical analysis, graphics representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems like Linux, Windows and Mac. This programming language was named R, based on the first letter of first name of the two R authors (Robert Gentleman and Ross Ihaka), and partly a play on the name of the Bell Labs Language S.")
x

Output

[1] "R is a programming language and software environment for statistical analysis, graphics representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems like Linux, Windows and Mac. This programming language was named R, based on the first letter of first name of the two R authors (Robert Gentleman and Ross Ihaka), and partly a play on the name of the Bell Labs Language S."

Example

library(stringr)
word(x,start=1,end=5,sep=fixed(" "))

Output

[1] "R is a programming language"

Example

word(x,start=1,end=20,sep=fixed(" "))

Output

[1] "R is a programming language and software environment for statistical analysis, graphics representation and reporting. R was created by Ross"

Example

word(x,start=1,end=10,sep=fixed(" "))

Output

[1] "R is a programming language and software environment for statistical"

Example

word(x,start=1,end=15,sep=fixed(" "))

Output

[1] "R is a programming language and software environment for statistical analysis, graphics representation and reporting."

Example

word(x,start=1,end=50,sep=fixed(" "))

Output

[1] "R is a programming language and software environment for statistical analysis, graphics representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is freely available under the GNU General Public"

Example

word(x,start=11,end=20,sep=fixed(" "))

Output

[1] "analysis, graphics representation and reporting. R was created by Ross"

Example

word(x,start=51,end=60,sep=fixed(" "))

Output

[1] "License, and pre-compiled binary versions are provided for various operating"

Example

word(x,start=6,end=10,sep=fixed(" "))

Output

[1] "and software environment for statistical"

Example

word(x,start=11,end=60,sep=fixed(" "))

Output

[1] "analysis, graphics representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating"

Example

word(x,start=1,end=90,sep=fixed(" "))

Output

[1] "R is a programming language and software environment for statistical analysis, graphics representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems like Linux, Windows and Mac. This programming language was named R, based on the first letter of first name of the two R authors (Robert Gentleman and Ross Ihaka),"

Example

word(x,start=11,end=90,sep=fixed(" "))

Output

[1] "analysis, graphics representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems like Linux, Windows and Mac. This programming language was named R, based on the first letter of first name of the two R authors (Robert Gentleman and Ross Ihaka),"

Example

word(x,start=21,end=90,sep=fixed(" "))

Output

[1] "Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems like Linux, Windows and Mac. This programming language was named R, based on the first letter of first name of the two R authors (Robert Gentleman and Ross Ihaka),"

Example

word(x,start=51,end=100,sep=fixed(" "))

Output

[1] "License, and pre-compiled binary versions are provided for various operating systems like Linux, Windows and Mac. This programming language was named R, based on the first letter of first name of the two R authors (Robert Gentleman and Ross Ihaka), and partly a play on the name of the Bell"
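All of the calls above use positions that exist in the string. The sketch below shows, on a hypothetical short string s, one way to count the words first, how negative positions count back from the last word, and what to expect when end is larger than the number of words. The names s, n_words, last_three and too_far are made up for this illustration, and the missing-value result reflects the stringr behaviour as I understand it, so it is worth checking against the installed version.

```r
library(stringr)

# Hypothetical short string so the results are easy to check by eye
s <- "R is a programming language"

# Words separated by single spaces: one more than the number of spaces
n_words <- str_count(s, fixed(" ")) + 1

# Negative positions count back from the last word
last_three <- word(s, start = -3, end = -1)

# Asking for more words than the string contains
too_far <- word(s, start = 1, end = 10)

print(n_words)
print(last_three)
print(too_far)
```

Counting the words beforehand makes it easy to choose an end value that stays within the string and so avoid a missing value in the result.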

Updated on: 10-Feb-2021
