jsoup - Loading File



Overview

The Jsoup.parse(File, String) method loads an HTML file from the file system and decodes it using the character encoding whose name is passed as the second argument, for example "UTF-8".

Syntax

Document document = Jsoup.parse(inputFile, "UTF-8");
System.out.println(document.title());

Where

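  • document − Document object representing the HTML DOM.

  • Jsoup − main class used to parse HTML; here it parses the contents of the given file.

  • inputFile − File object representing the file on the file system.

  • "UTF-8" − name of the character encoding used to decode the file.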
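Jsoup.parse(File, String) declares IOException, which is thrown when the file cannot be found or read. The following is a minimal sketch of one way to handle that, not the only way. The class name LoadFileSketch and the helper readTitle are hypothetical names introduced for illustration, and the sample file is written to a temporary location only so that the sketch runs on its own.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class LoadFileSketch {
   // Hypothetical helper: returns the page title, or null if the file cannot be read.
   static String readTitle(File inputFile) {
      try {
         // StandardCharsets.UTF_8.name() yields the string "UTF-8"
         Document document = Jsoup.parse(inputFile, StandardCharsets.UTF_8.name());
         return document.title();
      } catch (IOException e) {
         System.err.println("Could not read " + inputFile + ": " + e.getMessage());
         return null;
      }
   }

   public static void main(String[] args) throws IOException {
      // Write a small sample page to a temporary file so the sketch is self-contained
      File sample = File.createTempFile("jsoup-sample", ".htm");
      sample.deleteOnExit();
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body><p>Sample Content</p></body></html>";
      Files.write(sample.toPath(), html.getBytes(StandardCharsets.UTF_8));

      System.out.println(readTitle(sample));
   }
}
```

Returning null on failure is only one design choice; rethrowing the exception or returning an Optional may suit a larger program better.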

Get the data using the document object

Element body = document.body();

Here body is the document's body element. document.body() returns a single Element, whereas document.getElementsByTag("body") returns an Elements collection that contains that same element.
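To make the relationship between document.body() and document.getElementsByTag("body") concrete, the sketch below compares the two on an in-memory HTML string. The class name BodyLookupSketch and the helper sameBody are hypothetical, and Jsoup.parse(String) is used instead of a file only to keep the sketch self-contained.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class BodyLookupSketch {
   // Hypothetical helper: true when the tag lookup finds exactly one <body>
   // and it is the same node that document.body() returns.
   static boolean sameBody(String html) {
      Document document = Jsoup.parse(html);
      Element body = document.body();                         // a single Element
      Elements matches = document.getElementsByTag("body");   // a collection of Elements
      return matches.size() == 1 && matches.first() == body;
   }

   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body><p>Sample Content</p></body></html>";
      System.out.println(sameBody(html));
   }
}
```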

Read tag values

Elements paragraphs = body.getElementsByTag("p");
for (Element paragraph : paragraphs) {
   System.out.println(paragraph.text());
}
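The loop above prints each paragraph as it goes. As a small variation, the sketch below collects the text into a list, and also shows that text() returns the combined text of an element and its children with the markup removed. ParagraphTextSketch and paragraphTexts are hypothetical names, and the HTML string is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ParagraphTextSketch {
   // Hypothetical helper: returns the text of every <p> inside <body>, in document order.
   static List<String> paragraphTexts(String html) {
      Document document = Jsoup.parse(html);
      List<String> texts = new ArrayList<>();
      for (Element paragraph : document.body().getElementsByTag("p")) {
         // text() drops nested tags such as <b> and keeps only their text
         texts.add(paragraph.text());
      }
      return texts;
   }

   public static void main(String[] args) {
      String html = "<html><body><p>First <b>bold</b> paragraph</p>"
         + "<p>Second paragraph</p></body></html>";
      for (String text : paragraphTexts(html)) {
         System.out.println(text);
      }
   }
}
```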

The following HTML file, saved as test.htm, is used in this example −

<html>
   <head>
      <title>Sample Title</title>
   </head>
   <body>
      <p>Sample Content</p>
   </body>
</html>
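If you prefer to create this file from code rather than by hand, the following sketch writes the same markup to test.htm in the current working directory using only the Java standard library. SampleFileWriter and writeSample are hypothetical names. The file is written as UTF-8 so that it matches the encoding name passed to Jsoup.parse in the examples.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SampleFileWriter {
   // The sample page from this tutorial, one source line per array entry
   static final String SAMPLE_HTML = String.join("\n",
      "<html>",
      "   <head>",
      "      <title>Sample Title</title>",
      "   </head>",
      "   <body>",
      "      <p>Sample Content</p>",
      "   </body>",
      "</html>",
      "");

   // Hypothetical helper: writes the sample page to the given path as UTF-8,
   // replacing any existing file, and returns the path.
   static Path writeSample(Path target) throws IOException {
      return Files.write(target, SAMPLE_HTML.getBytes(StandardCharsets.UTF_8));
   }

   public static void main(String[] args) throws IOException {
      Path written = writeSample(Paths.get("test.htm"));
      System.out.println("Wrote " + written.toAbsolutePath());
   }
}
```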

Example - Parsing a local HTML file and reading its title

JsoupTester.java

package com.tutorialspoint;

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupTester {
   public static void main(String[] args) throws IOException {
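      // Relative path: resolved against the current working directory
      File input = new File("test.htm");
      // Parse the file, decoding its bytes as UTF-8
      Document document = Jsoup.parse(input, "UTF-8");
      // Print the text of the <title> element
      System.out.println(document.title());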
   }
}

Verify the result

Compile and run JsoupTester from the directory that contains test.htm to verify the result −

Sample Title

Example - Parsing a local HTML file and reading its body

JsoupTester.java

package com.tutorialspoint;

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTester {
   public static void main(String[] args) throws IOException {
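      // Relative path: resolved against the current working directory
      File input = new File("test.htm");
      // Parse the file, decoding its bytes as UTF-8
      Document document = Jsoup.parse(input, "UTF-8");
      // Get the <body> element, then every <p> element inside it
      Element body = document.body();
      Elements paragraphs = body.getElementsByTag("p");
      // Print the text content of each paragraph
      for (Element paragraph : paragraphs) {
         System.out.println(paragraph.text());
      }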
   }
}

Verify the result

Compile and run JsoupTester from the directory that contains test.htm to verify the result −

Sample Content