How to extract text from a web page using Selenium and save it as a text file?


We can extract text from a web page with Selenium WebDriver using the getText method, and then save that text to a file. The getText method returns the text of an element only if the element is displayed (that is, not hidden by CSS).

We first have to locate the element on the page using any of the locators: id, class name, name, xpath, css selector, tag name, link text or partial link text. Once the text is obtained, we write it to a file with the help of the File class and the FileUtils class from Apache Commons IO.

Let us obtain the text "You are browsing the best resource for Online Education", which appears in an h4 element on the page https://www.tutorialspoint.com/index.htm.

Example

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import java.util.concurrent.TimeUnit;
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import java.nio.charset.Charset;
public class GetTxtSaveFile{
   public static void main(String[] args) {
      // backslashes in a Java string literal must be escaped
      System.setProperty("webdriver.gecko.driver",
         "C:\\Users\\ghs6kor\\Desktop\\Java\\geckodriver.exe");
      WebDriver driver = new FirefoxDriver();
      //implicit wait (Selenium 3 signature; Selenium 4 uses implicitlyWait(Duration.ofSeconds(5)))
      driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
      //URL launch
      driver.get("https://www.tutorialspoint.com/index.htm");
      // identify element
      WebElement e = driver.findElement(By.tagName("h4"));
      //obtain text
      String s = e.getText();
      //write text to file
      File f = new File("savetxt.txt");
      try{
         FileUtils.writeStringToFile(f, s, Charset.defaultCharset());
      }catch(IOException exc){
         exc.printStackTrace();
      }
      driver.quit();
   }
}
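
The file-writing step in the example above depends on Apache Commons IO. If that library is not on the classpath, the same step can be sketched with the standard java.nio.file API instead. This is a minimal sketch, not the method used in the example: the class name TextSaver and the method save are hypothetical, the sample string stands in for the value returned by e.getText(), and UTF-8 is chosen explicitly as an assumption in place of the platform default charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original example.
class TextSaver {

    // Writes the text to the target path as UTF-8.
    // Files.write creates the file if it is missing and overwrites it otherwise.
    static void save(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the value returned by e.getText()
        String s = "You are browsing the best resource for Online Education";
        // A temporary file is used here so the demo leaves the project untouched
        Path target = Files.createTempFile("savetxt", ".txt");
        save(target, s);
        System.out.println("Saved to " + target);
    }
}
```

In the Selenium program, the call would be TextSaver.save(Paths.get("savetxt.txt"), s) inside the existing try block, since save also throws IOException.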

Output

The savetxt.txt file is generated in the working directory of the program (typically the project root when run from an IDE) and contains the text captured from the page.
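
Because savetxt.txt is given as a relative path, where it ends up depends on the directory the program is started from. The following sketch shows one way to print the resolved location. The class name OutputLocation and the method resolve are hypothetical and used only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original example.
class OutputLocation {

    // Resolves a relative file name against the current working directory.
    static Path resolve(String relativeName) {
        return Paths.get(relativeName).toAbsolutePath();
    }

    public static void main(String[] args) {
        // Prints the full path at which savetxt.txt would be created
        System.out.println(resolve("savetxt.txt"));
    }
}
```

Printing this path before the write makes it easier to find the file when the program is launched from a different directory.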

Updated on: 06-Apr-2021
