jsoup is a Java-based library for working with HTML content. It provides a convenient API for extracting and manipulating data, combining DOM traversal, CSS selectors, and jQuery-like methods. It implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do.

The jsoup library provides the following functionality.

  • Multiple Read Support − It reads and parses HTML from a URL, file, or string.

  • CSS Selectors − It can find and extract data, using DOM traversal or CSS selectors.

  • DOM Manipulation − It can manipulate the HTML elements, attributes, and text.

  • Prevent XSS attacks − It can clean user-submitted content against a given safelist of permitted tags and attributes, to prevent XSS attacks.

  • Tidy − It outputs tidy HTML.

  • Handles invalid data − jsoup can handle unclosed tags and implicit tags, and can reliably create the document structure.
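
The features listed above can be sketched in one short program. This is a minimal illustration, not a definitive usage pattern: the class name JsoupOverview and the helper methods parseFragment, firstLinkHref, and sanitize are hypothetical names chosen for this example, and the sample HTML is made up. It assumes jsoup 1.14.2 or later is on the classpath, since that is where the Safelist class was introduced (older releases call it Whitelist).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

public class JsoupOverview {

    // Read support + invalid data: parse a string that has no <html>, <head>,
    // or <body> tags and leaves both <p> elements unclosed.
    // (Jsoup.connect(url).get() and Jsoup.parse(file, charsetName) cover the
    // URL and file cases; they are not called here to avoid I/O.)
    static Document parseFragment() {
        String html = "<title>Demo</title><p>Hello <a href='/docs'>docs</a><p>Second";
        return Jsoup.parse(html);
    }

    // CSS selectors: return the href of the first link inside a paragraph,
    // or an empty string if there is no such link.
    static String firstLinkHref(Document doc) {
        Element link = doc.select("p > a[href]").first();
        return link == null ? "" : link.attr("href");
    }

    // XSS prevention: keep only the tags and attributes allowed by
    // the built-in "basic" safelist.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        Document doc = parseFragment();
        System.out.println(doc.title());            // Demo
        System.out.println(doc.select("p").size()); // 2 (implicit tags were closed)
        System.out.println(firstLinkHref(doc));     // /docs

        // DOM manipulation: change an attribute and the text of the link.
        Element link = doc.select("a").first();
        link.attr("rel", "nofollow");
        link.text("documentation");

        // Tidy output: print the normalized body markup.
        System.out.println(doc.body().html());

        // The <script> element is not in the basic safelist, so it is dropped.
        System.out.println(sanitize("<p>Hi<script>alert(1)</script></p>"));
    }
}
```

Each helper maps to one bullet, so a single feature can be tried in isolation. Fetching from a live URL works the same way once the Document is obtained; only the entry point changes.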