jsoup - Overview



jsoup is a Java library for working with HTML content. It provides a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. It implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do.

The jsoup library provides the following features.

  • Multiple Read Support − It reads and parses HTML from a URL, file, or string.

  • CSS Selectors − It can find and extract data, using DOM traversal or CSS selectors.

  • DOM Manipulation − It can manipulate the HTML elements, attributes, and text.

  • Prevent XSS attacks − It can clean user-submitted content against a specified safelist of permitted tags and attributes, to prevent XSS attacks.

  • Tidy − It outputs tidy HTML.

  • Handles invalid data − It handles unclosed tags and implicit tags, and reliably builds the document structure.
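
The features listed above can be sketched in a few lines of Java. This is a minimal illustration rather than an official sample: the class name, the helper method names, and the sample HTML are made up for the example, and it assumes a reasonably recent jsoup release on the classpath, one that provides the Safelist class (older releases used a class named Whitelist for the same purpose).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

// Hypothetical demo class; none of these helper names come from jsoup itself.
public class JsoupOverviewSketch {

    // Parse an HTML string and return the text of its <title> element.
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Count <p> elements; unclosed <p> tags are closed implicitly by the parser.
    static int paragraphCount(String html) {
        return Jsoup.parse(html).select("p").size();
    }

    // Use a CSS selector to find the first link and read its href attribute.
    static String firstLinkHref(String html) {
        Element link = Jsoup.parse(html).selectFirst("a[href]");
        return link == null ? "" : link.attr("href");
    }

    // Manipulate the DOM: add rel="nofollow" to every link, then return the body HTML.
    static String markLinksNoFollow(String html) {
        Document doc = Jsoup.parse(html);
        for (Element a : doc.select("a[href]")) {
            a.attr("rel", "nofollow");
        }
        return doc.body().html();
    }

    // Clean untrusted input against jsoup's built-in basic safelist.
    static String sanitize(String untrusted) {
        return Jsoup.clean(untrusted, Safelist.basic());
    }

    public static void main(String[] args) {
        // Sample input with deliberately unclosed <p> tags (made up for this sketch).
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>One<p>Two <a href=\"https://example.com/\">link</a></body></html>";

        System.out.println(titleOf(html));
        System.out.println(paragraphCount(html));
        System.out.println(firstLinkHref(html));
        System.out.println(markLinksNoFollow(html));
        System.out.println(sanitize("<p>Hi<script>alert(1)</script></p>"));
    }
}
```

Reading from a URL or a file follows the same pattern, using Jsoup.connect(url).get() or Jsoup.parse(file, charsetName) in place of Jsoup.parse(html); those calls are left out here so that the sketch runs without network or file access.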
