jsoup - Working with URLs



Overview

An Element object represents a DOM element and provides methods to get both the relative and the absolute URLs present in an HTML page.

Syntax

String url = "http://www.google.com/";
Document document = Jsoup.connect(url).get();
Element link = document.select("a").first();         

System.out.println("Relative Link: " + link.attr("href"));
System.out.println("Absolute Link: " + link.attr("abs:href"));
System.out.println("Absolute Link: " + link.absUrl("href"));

Where

  • document − Document object that represents the HTML DOM.

  • Jsoup − main class used to connect to a URL and get the HTML content.

  • link − Element object that represents the first anchor tag in the document.

  • link.attr("href") − returns the value of the href attribute as written in the anchor tag. It may be relative or absolute.

  • link.attr("abs:href") − returns the absolute URL after resolving href against the document's base URI.

  • link.absUrl("href") − returns the absolute URL after resolving href against the document's base URI; equivalent to link.attr("abs:href").
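The difference between the three calls is easiest to see without a network connection. The following is a minimal sketch, assuming jsoup is on the classpath. The HTML snippet, the base URI http://example.com/docs/ and the class name RelativeLinkDemo are made up for illustration. It uses Jsoup.parse(html, baseUri) so that the relative href has a base URI to resolve against.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class RelativeLinkDemo {
   // Hypothetical input: a single anchor with a relative href
   static final String HTML = "<a href=\"guide/index.html\">Guide</a>";
   // Hypothetical base URI that relative links are resolved against
   static final String BASE_URI = "http://example.com/docs/";

   // Parses the snippet and returns its first anchor
   static Element firstLink() {
      Document document = Jsoup.parse(HTML, BASE_URI);
      return document.select("a").first();
   }

   public static void main(String[] args) {
      Element link = firstLink();
      // Value exactly as written in the markup
      System.out.println("Relative Link: " + link.attr("href"));
      // Both calls resolve the href against BASE_URI
      System.out.println("Absolute Link: " + link.attr("abs:href"));
      System.out.println("Absolute Link: " + link.absUrl("href"));
   }
}
```

With this input, link.attr("href") yields guide/index.html, while both absolute forms yield http://example.com/docs/guide/index.html.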

Example - Selecting Attributes of a URL after Connecting

JsoupTester.java

package com.tutorialspoint;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
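   public static void main(String[] args) throws IOException {
      // Fetch the page and parse it into a Document
      String url = "http://www.google.com/";
      Document document = Jsoup.connect(url).get();

      // First anchor on the page; first() returns null if there is none
      Element link = document.select("a").first();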
      System.out.println("Relative Link: " + link.attr("href"));
      System.out.println("Absolute Link: " + link.attr("abs:href"));
      System.out.println("Href: " + link.absUrl("href"));
   }
}

Verify the result

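Compile and run JsoupTester to verify the result. The output depends on the live page; in the run shown here the first anchor already has an absolute href, so all three lines print the same URL −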

Relative Link: https://about.google/?fg=1&utm_source=google-IN&utm_medium=referral&utm_campaign=hp-header
Absolute Link: https://about.google/?fg=1&utm_source=google-IN&utm_medium=referral&utm_campaign=hp-header
Href: https://about.google/?fg=1&utm_source=google-IN&utm_medium=referral&utm_campaign=hp-header
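The example above inspects only the first anchor, and first() returns null when the page has no anchor at all. The following is a hedged sketch of a more defensive variant that loops over every anchor with an href attribute. The class name AllLinksDemo, the helper absoluteLinks and the sample markup are hypothetical, and the page is parsed from a string so the sketch runs offline.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AllLinksDemo {
   // Collects the absolute URL of every anchor that has an href attribute
   static List<String> absoluteLinks(Document document) {
      List<String> urls = new ArrayList<>();
      for (Element link : document.select("a[href]")) {
         urls.add(link.absUrl("href"));
      }
      return urls;
   }

   public static void main(String[] args) {
      // Hypothetical markup: one absolute and one root-relative link
      String html = "<a href=\"https://about.example.org/\">About</a>"
            + "<a href=\"/search?q=jsoup\">Search</a>";
      Document document = Jsoup.parse(html, "http://example.com/");

      List<String> urls = absoluteLinks(document);
      if (urls.isEmpty()) {
         System.out.println("No links found");
      }
      for (String url : urls) {
         System.out.println("Absolute Link: " + url);
      }
   }
}
```

Because the loop runs over the selection result rather than a single element, a page without anchors simply produces an empty list instead of a NullPointerException.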

Example - Getting an Exception while Connecting

A connection attempt can also fail with an exception. For example, in the run shown below, requesting tutorialspoint.com over http instead of https fails with an HttpStatusException (status 403).

JsoupTester.java

package com.tutorialspoint;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
   public static void main(String[] args) throws IOException {
      // Plain http request; in the run below the server answers with status 403
      String url = "http://www.tutorialspoint.com/";
      Document document = Jsoup.connect(url).get();

      Element link = document.select("a").first();
      System.out.println("Relative Link: " + link.attr("href"));
      System.out.println("Absolute Link: " + link.attr("abs:href"));
      System.out.println("Href: " + link.absUrl("href"));
   }
}

Verify the result

Compile and run the JsoupTester to verify the result −

Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=[http://www.tutorialspoint.com/]
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:913)
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:866)
	at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:365)
	at org.jsoup.helper.HttpConnection.get(HttpConnection.java:350)
	at com.tutorialspoint.JsoupTester.main(JsoupTester.java:13)
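An HTTP error status such as the 403 above is reported as org.jsoup.HttpStatusException, which is a subclass of IOException. The following is a minimal sketch of catching it instead of letting main fail, assuming jsoup is on the classpath. The class name SafeFetch and the helper describe are hypothetical, and whether a particular site answers with 403 depends on the server and may change over time.

```java
import java.io.IOException;

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SafeFetch {
   // Turns a connection failure into a short message instead of a stack trace
   static String describe(IOException e) {
      if (e instanceof HttpStatusException) {
         HttpStatusException status = (HttpStatusException) e;
         return "HTTP " + status.getStatusCode() + " for " + status.getUrl();
      }
      return "I/O problem: " + e.getMessage();
   }

   public static void main(String[] args) {
      String url = "http://www.tutorialspoint.com/";
      try {
         Document document = Jsoup.connect(url).get();
         System.out.println("Title: " + document.title());
      } catch (IOException e) {
         // Covers HttpStatusException as well as other network errors
         System.out.println(describe(e));
      }
   }
}
```

Catching IOException and checking for HttpStatusException inside the handler keeps a single catch block while still exposing the status code and the URL that failed.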