
jsoup - Working with URLs
Overview
An Element object represents a DOM element and provides methods to get both the relative and the absolute URLs present in the HTML page.
Syntax
String url = "http://www.google.com/";
Document document = Jsoup.connect(url).get();
Element link = document.select("a").first();

System.out.println("Relative Link: " + link.attr("href"));
System.out.println("Absolute Link: " + link.attr("abs:href"));
System.out.println("Absolute Link: " + link.absUrl("href"));
Where
document − Document object representing the HTML DOM.
Jsoup − main class used to connect to a URL and fetch the HTML content.
link − Element object representing the first anchor (a) tag in the document.
link.attr("href") − returns the value of href exactly as written in the anchor tag. It may be relative or absolute.
link.attr("abs:href") − returns the absolute URL obtained by resolving the href against the document's base URI.
link.absUrl("href") − returns the same absolute URL as attr("abs:href"). If the attribute is missing or cannot be resolved into a URL, an empty string is returned.
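To make "resolving against the base URI" concrete, the following is a minimal sketch that uses only the standard java.net.URI class. It is not jsoup's own code; it only illustrates the same kind of resolution. The class name UrlResolveSketch and the example.com addresses are assumptions chosen for illustration.

```java
import java.net.URI;

class UrlResolveSketch {

    // Resolves an href against a base URI, roughly what an absolute-URL lookup does.
    // (Hypothetical helper for illustration; not part of jsoup.)
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Assumed base URI of the page that contains the links.
        String base = "http://www.example.com/docs/index.html";

        // A relative href is resolved against the directory of the base URI.
        System.out.println(resolve(base, "guide.html"));
        // A root-relative href replaces the whole path.
        System.out.println(resolve(base, "/about"));
        // An href that is already absolute is returned unchanged.
        System.out.println(resolve(base, "https://other.example.org/x"));
    }
}
```

An href that is already absolute comes back unchanged, which is why attr("href") and absUrl("href") can print the same value for such links.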
Example - Selecting Attributes of a URL after Connecting
JsoupTester.java
package com.tutorialspoint;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
   public static void main(String[] args) throws IOException {
      String url = "http://www.google.com/";
      Document document = Jsoup.connect(url).get();
      Element link = document.select("a").first();

      System.out.println("Relative Link: " + link.attr("href"));
      System.out.println("Absolute Link: " + link.attr("abs:href"));
      System.out.println("Href: " + link.absUrl("href"));
   }
}
Verify the result
Compile and run the JsoupTester to verify the result −
Relative Link: https://about.google/?fg=1&utm_source=google-IN&utm_medium=referral&utm_campaign=hp-header
Absolute Link: https://about.google/?fg=1&utm_source=google-IN&utm_medium=referral&utm_campaign=hp-header
Href: https://about.google/?fg=1&utm_source=google-IN&utm_medium=referral&utm_campaign=hp-header
Example - Getting an Exception while Connecting
The connection itself can also fail with an exception. For example, requesting tutorialspoint.com over http instead of https results in an HttpStatusException, here with status 403.
JsoupTester.java
package com.tutorialspoint;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
   public static void main(String[] args) throws IOException {
      String url = "http://www.tutorialspoint.com/";
      Document document = Jsoup.connect(url).get();
      Element link = document.select("a").first();

      System.out.println("Relative Link: " + link.attr("href"));
      System.out.println("Absolute Link: " + link.attr("abs:href"));
      System.out.println("Href: " + link.absUrl("href"));
   }
}
Verify the result
Compile and run the JsoupTester to verify the result −
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=[http://www.tutorialspoint.com/]
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:913)
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:866)
	at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:365)
	at org.jsoup.helper.HttpConnection.get(HttpConnection.java:350)
	at com.tutorialspoint.JsoupTester.main(JsoupTester.java:13)
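Instead of letting the exception end the program, it can be caught and reported. The following is a minimal sketch, assuming jsoup is on the classpath; the class name SafeFetchSketch and the helper describe are hypothetical names introduced here for illustration. It relies on HttpStatusException being a subclass of IOException, so the more specific catch comes first.

```java
import java.io.IOException;

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SafeFetchSketch {

    // Builds a short message from the status code and URL carried by the exception.
    // (Hypothetical helper for illustration.)
    static String describe(HttpStatusException e) {
        return "HTTP " + e.getStatusCode() + " for " + e.getUrl();
    }

    public static void main(String[] args) {
        String url = "http://www.tutorialspoint.com/";
        try {
            Document document = Jsoup.connect(url).get();
            Element link = document.select("a").first();
            if (link == null) {
                // first() returns null when the selection is empty.
                System.out.println("No anchor tag found");
                return;
            }
            System.out.println("Absolute Link: " + link.absUrl("href"));
        } catch (HttpStatusException e) {
            // The server answered with an error status such as 403.
            System.out.println(describe(e));
        } catch (IOException e) {
            // Any other I/O failure, for example a timeout or an unknown host.
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```

Whether the request succeeds or fails depends on the server at the time of the run, so no output is shown here.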