jsoup - Working with URLs




The following example shows the methods that return the relative and absolute URLs found in an HTML page.

Syntax

String url = "http://www.tutorialspoint.com/";
Document document = Jsoup.connect(url).get();
Element link = document.select("a").first();

System.out.println("Relative Link: " + link.attr("href"));
System.out.println("Absolute Link: " + link.attr("abs:href"));
System.out.println("Absolute Link: " + link.absUrl("href"));

Where

  • document − the Document object representing the HTML DOM.

  • Jsoup − the main class, used to connect to a URL and fetch its HTML content.

  • link − the Element object representing an anchor tag in the HTML.

  • link.attr("href") − returns the value of the href attribute exactly as written in the anchor tag. It may be relative or absolute.

  • link.attr("abs:href") − returns the absolute URL, resolved against the document's base URI.

  • link.absUrl("href") − returns the same absolute URL as attr("abs:href"). An empty string is returned if the value cannot be made absolute.
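
To make "resolving against the document's base URI" concrete, here is a minimal sketch that uses the standard library's java.net.URI rather than jsoup itself. It only illustrates the idea; jsoup's own resolution logic may differ in edge cases. The class name UriResolveSketch and the helper resolve are hypothetical names chosen for this illustration.

```java
import java.net.URI;

public class UriResolveSketch {

    // Hypothetical helper: resolves href against baseUri using java.net.URI.
    // This illustrates the idea only; it is not how jsoup is implemented.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "https://www.tutorialspoint.com/";

        // A relative reference is appended to the base.
        System.out.println(resolve(base, "index.htm"));

        // A reference that is already absolute is returned unchanged.
        System.out.println(resolve(base, "https://example.org/page.htm"));
    }
}
```

A relative value such as index.htm picks up the scheme, host and path of the base, while a value that is already absolute passes through untouched.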

Description

An Element object represents a DOM element and provides methods to get the relative as well as the absolute URLs present in the HTML page.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

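public class JsoupTester {
   public static void main(String[] args) throws IOException {

      // Fetch the page and parse it into a DOM; the page's location becomes the base URI.
      String url = "http://www.tutorialspoint.com/";
      Document document = Jsoup.connect(url).get();

      // first() returns null when the page contains no anchor tags.
      Element link = document.select("a").first();
      if (link == null) {
         System.out.println("No links found.");
         return;
      }

      // Raw attribute value, then the absolute URL obtained in two equivalent ways.
      System.out.println("Relative Link: " + link.attr("href"));
      System.out.println("Absolute Link: " + link.attr("abs:href"));
      System.out.println("Absolute Link: " + link.absUrl("href"));
   }
}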

Verify the result

Compile the class using the javac compiler as follows (the jsoup jar must be on the classpath):

C:\jsoup>javac JsoupTester.java

Now run the JsoupTester to see the result.

C:\jsoup>java JsoupTester

See the result.

Relative Link: index.htm
Absolute Link: https://www.tutorialspoint.com/index.htm
Absolute Link: https://www.tutorialspoint.com/index.htm
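
The same calls can be tried without a network connection. The sketch below assumes an in-memory HTML string and passes the base URI explicitly to Jsoup.parse. The class name OfflineLinkSketch, the helper absoluteLinks and the sample markup are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OfflineLinkSketch {

    // Hypothetical helper: parses html with the given base URI and
    // collects the absolute URL of every anchor that has an href.
    static List<String> absoluteLinks(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();
        for (Element link : document.select("a[href]")) {
            result.add(link.absUrl("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup (an assumption): one relative and one absolute link.
        String html = "<a href=\"index.htm\">Home</a>"
                    + "<a href=\"https://example.org/page.htm\">External</a>";

        for (String url : absoluteLinks(html, "https://www.tutorialspoint.com/")) {
            System.out.println(url);
        }
    }
}
```

Selecting a[href] instead of a skips anchors that have no href attribute, so the loop never has to deal with a missing value.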

