Open source Java HTML parser, with the best of HTML5 DOM methods and CSS selectors, for easy data extraction.
jsoup is a Java library designed for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.
Parse HTML from a URL, file, or string; find and extract data, using DOM traversal or CSS selectors.
Extract and manipulate data, using DOM traversal or CSS selectors.
Clean user-submitted content against a safelist (called a whitelist in older releases) to prevent XSS attacks.
Use CSS selectors to find elements, and then manipulate their attributes, text, and HTML.
Works on all major platforms that run a JVM. Early releases supported Java 1.5 onwards; recent releases require a newer Java baseline, so check the release notes for the version you use.
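The features listed above (parsing, selecting with CSS, manipulating attributes, and cleaning untrusted input) can be sketched in a few lines of Java. This is a minimal illustration, not an official example: the class name JsoupSketch, its helper methods, and the sample HTML are hypothetical, and it assumes jsoup 1.14.1 or later on the classpath, where the sanitizer class is named Safelist (older releases call it Whitelist).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;
import org.jsoup.select.Elements;

// Hypothetical helper class for illustration; not part of jsoup itself.
public class JsoupSketch {

    // Parse an HTML string and return the text of the first element
    // matching the CSS selector, or an empty string if nothing matches.
    static String firstText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element first = doc.selectFirst(cssQuery);
        return first == null ? "" : first.text();
    }

    // Select every link that has an href and set rel="nofollow" on it,
    // then return the modified body HTML.
    static String markLinksNoFollow(String html) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select("a[href]");
        links.attr("rel", "nofollow"); // applies to every matched element
        return doc.body().html();
    }

    // Sanitize untrusted HTML, keeping only the tags and attributes
    // permitted by the basic safelist.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String html = "<div><p class=\"intro\">Hello <b>jsoup</b></p>"
                + "<a href=\"https://jsoup.org/\">site</a></div>";
        System.out.println(firstText(html, "p.intro"));
        System.out.println(markLinksNoFollow(html));
        System.out.println(sanitize("<p>Hi<script>alert(1)</script></p>"));
    }
}
```

The sketch parses from a string to stay self-contained. Fetching a page would typically go through Jsoup.connect(url).get(), and a local file through Jsoup.parse(file, charsetName); both return the same Document type used here.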
jsoup is an open-source project distributed under the MIT License.
https://github.com/jhy/jsoup
Comprehensive documentation available at https://jsoup.org/cookbook/
Active community support through forums and GitHub issues.
Contributions are welcome. Please read the contributing guide on GitHub.
The latest stable version is 1.14.3 as of the last update.
jsoup is self-contained and has no runtime dependencies, making it lightweight and easy to integrate into projects.