HTML/XML Parser for Java
Which HTML parsers have the following features?
- Reliable and bug-free
- Parses HTML and XML
- Handles erroneous HTML
- Has a DOM implementation
- Relatively simple, object-oriented API
Which parser do you think is best?
Check out Web Harvest. It is both a library you can use and a data extraction tool, which sounds like exactly what you want. You write XML script files that tell the scraper what information to extract and from where. The provided GUI is very useful for quickly testing the scripts.
Check out the project's samples page to see if it's a good fit for what you are trying to do.
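For comparison, here is a rough sketch of what the "has a DOM implementation" requirement looks like in code. It uses only the XML parser bundled with the JDK (javax.xml.parsers), not Web Harvest, so take it as a baseline rather than a recommendation. The class name DomSketch, the helper linkTargets, and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of Web Harvest or any other library.
class DomSketch {

    // Parses well-formed markup into a W3C DOM tree and returns the href
    // attribute of every <a> element, in document order.
    static List<String> linkTargets(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList anchors = doc.getElementsByTagName("a");
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            hrefs.add(anchor.getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Sample input invented for this sketch; note that every tag is closed.
        String xml = "<html><body>"
                + "<a href=\"/one\">One</a>"
                + "<p><a href=\"/two\">Two</a></p>"
                + "</body></html>";
        System.out.println(linkTargets(xml)); // prints [/one, /two]
    }
}
```

Note that this parser only accepts well-formed XML. Feed it real-world HTML with an unclosed tag and parse() throws a SAXException, so it does not meet the "handles erroneous HTML" requirement on its own.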