How to extract website names from their links in R?
If we have a list of website links and want the website name from each one, copying the names one by one is time-consuming. It is quicker to extract them with a function in R. The suffix_extract function of the urltools package does this. It works on host names, which we first obtain from the full links with the domain function of the same package, and it returns a data frame with the host, subdomain, domain and suffix of each link. The domain column usually holds the website name. The exception is a link served through a cache or proxy, such as the two cdn.ampproject.org links in the example that follows, where the domain is that of the cache rather than the original site.
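Before working through the full list, the two steps can be tried on a single link. This is a minimal sketch, assuming the urltools package is installed; the variable names link, host and parts are illustrative and not part of the original example.

```r
library(urltools)

# One link taken from the example vector used in this article
link <- "https://www.tutorialspoint.com/machine_learning/index.htm"

# Step 1: drop the scheme, path and query, keeping only the host name
host <- domain(link)
host            # "www.tutorialspoint.com"

# Step 2: split the host into subdomain, domain and public suffix
parts <- suffix_extract(host)
parts$domain    # "tutorialspoint"
```

Passing the full link straight to suffix_extract is not recommended, because the function expects host names rather than complete URLs.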
Loading urltools package −
library(urltools)
Website links stored in a vector −
Web_Links <- c(
  "https://www.grammarly.com/grammar-check",
  "https://sceptermarketing.com/comma-separated-lists-of-us-states-abbreviations-select-options-etc/",
  "https://www.tutorialspoint.com/machine_learning/index.htm",
  "https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sort",
  "https://www-islaah-in.cdn.ampproject.org/v/s/www.islaah.in/masail/13977/?amp=&usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16016175660203&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Fwww.islaah.in%2Fmasail%2F13977%2F",
  "http://qoitrat.org/Qa/searchtopic.php?Main=76&MainTopc=245",
  "https://theislamicinformation-com.cdn.ampproject.org/v/s/theislamicinformation.com/aqeeqah-for-baby-boy-and-girl/amp/?usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16015741096047&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Ftheislamicinformation.com%2Faqeeqah-for-baby-boy-and-girl%2F",
  "https://parenting.firstcry.com/articles/50-popular-turkish-baby-names-for-girls/",
  "https://www.amazon.in/SELF-CHEF-Delhi-Aloo-Tikki/dp/B089GW5ZPL/ref=asc_df_B089GW5ZPL/?tag=googleshopmob-21&linkCode=df0&hvadid=397060787211&hvpos=&hvnetw=g&hvrand=3239398407570685332&hvpone=&hvptwo=&hvqmt=&hvdev=m&hvdvcmdl=&hvlocint=&hvlocphy=9040189&hvtargid=pla-923173707999&psc=1&ext_vrnc=hi",
  "http://ridenow.co.in/?From=Bareilly&To=Delhi&submit=",
  "https://www.savaari.com/delhi/delhi-to-bareilly-cabs",
  "https://www.olxgroup.com/search/operations/delhi-ncr/all-brands",
  "https://unbelievable-facts.com/work-with-us",
  "https://www.tataaiginsurance.in/taig/taig/tata_aig/CorporateCustomerPortal/login.jsp",
  "https://www.dummies.com/programming/r/how-to-change-plot-options-in-r/",
  "http://www.sthda.com/english/wiki/add-titles-to-a-plot-in-r-software"
)
Printing the vector of website links −
Web_Links
[1] "https://www.grammarly.com/grammar-check"
[2] "https://sceptermarketing.com/comma-separated-lists-of-us-states-abbreviations-select-options-etc/"
[3] "https://www.tutorialspoint.com/machine_learning/index.htm"
[4] "https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sort"
[5] "https://www-islaah-in.cdn.ampproject.org/v/s/www.islaah.in/masail/13977/?amp=&usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16016175660203&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Fwww.islaah.in%2Fmasail%2F13977%2F"
[6] "http://qoitrat.org/Qa/searchtopic.php?Main=76&MainTopc=245"
[7] "https://theislamicinformation-com.cdn.ampproject.org/v/s/theislamicinformation.com/aqeeqah-for-baby-boy-and-girl/amp/?usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16015741096047&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Ftheislamicinformation.com%2Faqeeqah-for-baby-boy-and-girl%2F"
[8] "https://parenting.firstcry.com/articles/50-popular-turkish-baby-names-for-girls/"
[9] "https://www.amazon.in/SELF-CHEF-Delhi-Aloo-Tikki/dp/B089GW5ZPL/ref=asc_df_B089GW5ZPL/?tag=googleshopmob-21&linkCode=df0&hvadid=397060787211&hvpos=&hvnetw=g&hvrand=3239398407570685332&hvpone=&hvptwo=&hvqmt=&hvdev=m&hvdvcmdl=&hvlocint=&hvlocphy=9040189&hvtargid=pla-923173707999&psc=1&ext_vrnc=hi"
[10] "http://ridenow.co.in/?From=Bareilly&To=Delhi&submit="
[11] "https://www.savaari.com/delhi/delhi-to-bareilly-cabs"
[12] "https://www.olxgroup.com/search/operations/delhi-ncr/all-brands"
[13] "https://unbelievable-facts.com/work-with-us"
[14] "https://www.tataaiginsurance.in/taig/taig/tata_aig/CorporateCustomerPortal/login.jsp"
[15] "https://www.dummies.com/programming/r/how-to-change-plot-options-in-r/"
[16] "http://www.sthda.com/english/wiki/add-titles-to-a-plot-in-r-software"
Extracting website names −
suffix_extract(domain(Web_Links))
    host                                          subdomain
1   www.grammarly.com                             www
2   sceptermarketing.com                          <NA>
3   www.tutorialspoint.com                        www
4   www.rdocumentation.org                        www
5   www-islaah-in.cdn.ampproject.org              www-islaah-in.cdn
6   qoitrat.org                                   <NA>
7   theislamicinformation-com.cdn.ampproject.org  theislamicinformation-com.cdn
8   parenting.firstcry.com                        parenting
9   www.amazon.in                                 www
10  ridenow.co.in                                 <NA>
11  www.savaari.com                               www
12  www.olxgroup.com                              www
13  unbelievable-facts.com                        <NA>
14  www.tataaiginsurance.in                       www
15  www.dummies.com                               www
16  www.sthda.com                                 www
    domain              suffix
1   grammarly           com
2   sceptermarketing    com
3   tutorialspoint      com
4   rdocumentation      org
5   ampproject          org
6   qoitrat             org
7   ampproject          org
8   firstcry            com
9   amazon              in
10  ridenow             co.in
11  savaari             com
12  olxgroup            com
13  unbelievable-facts  com
14  tataaiginsurance    in
15  dummies             com
16  sthda               com
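Since the website names sit in the domain column of the returned data frame, they can be pulled out as a plain character vector. The following is a minimal sketch on three of the links from the vector above, assuming urltools is installed; Some_Links, Parts, Website_Names and Registered are illustrative names, not part of the original example.

```r
library(urltools)

# Three of the links from the article's Web_Links vector
Some_Links <- c(
  "https://www.grammarly.com/grammar-check",
  "https://parenting.firstcry.com/articles/50-popular-turkish-baby-names-for-girls/",
  "http://ridenow.co.in/?From=Bareilly&To=Delhi&submit="
)

# suffix_extract() returns a data frame; the domain column holds the names
Parts <- suffix_extract(domain(Some_Links))
Website_Names <- Parts$domain
Website_Names   # "grammarly" "firstcry" "ridenow"

# Name plus suffix, if the registered domain is wanted instead
Registered <- paste(Parts$domain, Parts$suffix, sep = ".")
Registered      # "grammarly.com" "firstcry.com" "ridenow.co.in"
```

Rows 5 and 7 of the output above show the limit of this approach: for links served from cdn.ampproject.org the extracted name is ampproject, not the site the page originally came from.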