 
jsoup - Overview
Introduction
jsoup is a Java-based library for working with HTML content. It provides a convenient API to extract and manipulate data, using the best of DOM, CSS, and jQuery-like methods. It implements the WHATWG HTML5 specification and parses HTML into the same DOM that modern browsers produce.
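As a rough illustration of the three access styles mentioned above (DOM methods, CSS selectors, and jQuery-like chaining), here is a minimal sketch. The sample markup, the `https://example.com/` base URI, and the class name `JsoupIntroDemo` are assumptions made up for this example; it also assumes a jsoup version recent enough to provide `selectFirst`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class JsoupIntroDemo {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
        "<html><head><title>Sample Page</title></head>"
        + "<body><div id=\"links\">"
        + "<a href=\"/docs\">Docs</a>"
        + "<a href=\"/faq\" class=\"ext\">FAQ</a>"
        + "</div></body></html>";

    static Document parse() {
        // The second argument is the base URI used to resolve relative links.
        return Jsoup.parse(HTML, "https://example.com/");
    }

    public static void main(String[] args) {
        Document doc = parse();

        // DOM-style access: look up an element by its id.
        Element links = doc.getElementById("links");
        System.out.println("Child elements: " + links.children().size());

        // CSS selector access: first anchor carrying the class "ext".
        Element faq = doc.selectFirst("a.ext");
        System.out.println("Link text: " + faq.text());
        System.out.println("Absolute URL: " + faq.absUrl("href"));

        // jQuery-like chaining: update every matched anchor in one expression.
        doc.select("a").addClass("seen").attr("rel", "nofollow");
        System.out.println(faq.outerHtml());
    }
}
```

The same `Document` object serves all three styles, so they can be mixed freely depending on which is most readable for a given lookup.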
Functionalities of jsoup
The jsoup library provides the following functionality.
- Multiple Read Support − It reads and parses HTML from a URL, a file, or a string.
- CSS Selectors − It can find and extract data using DOM traversal or CSS selectors.
- DOM Manipulation − It can manipulate HTML elements, attributes, and text.
- Prevent XSS attacks − It can clean user-submitted content against a specified safelist of permitted tags and attributes, to prevent XSS attacks.
- Tidy − It outputs tidy HTML.
- Handles invalid data − It can handle unclosed tags and implicit tags, and still reliably builds the document structure.
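A few of the features listed above can be sketched as follows: cleaning untrusted markup, repairing unclosed tags, and modifying an element. This is only an illustrative sketch. It assumes jsoup 1.14.2 or later, where the `Safelist` class is available (earlier releases used `Whitelist` for the same purpose). The input strings, the class name `JsoupFeaturesDemo`, and the helper names `sanitize` and `repair` are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist;

class JsoupFeaturesDemo {
    // Cleans untrusted markup against jsoup's built-in "basic" safelist,
    // which permits simple text formatting tags but not scripts or event handlers.
    static String sanitize(String untrusted) {
        return Jsoup.clean(untrusted, Safelist.basic());
    }

    // Parses markup with unclosed or implicit tags into a full document.
    static Document repair(String broken) {
        return Jsoup.parse(broken);
    }

    public static void main(String[] args) {
        // Hypothetical user-submitted content containing a script and an event handler.
        String untrusted =
            "<p>Hi <script>alert('xss')</script><b onclick=\"steal()\">there</b></p>";
        System.out.println(sanitize(untrusted));

        // The <li> elements are never closed, and <html>/<body> are missing.
        Document doc = repair("<ul><li>One<li>Two</ul>");
        System.out.println(doc.select("li").size() + " list items found");

        // DOM manipulation: change the text and an attribute of the first item.
        doc.selectFirst("li").text("First").attr("class", "top");

        // Tidy output: the body is serialized with closing tags restored.
        System.out.println(doc.body().html());
    }
}
```

`Safelist.basic()` is one of several presets. Which preset, if any, is appropriate depends on what markup the application is willing to accept from users.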