Xidel – HTML/XML/JSON data extraction tool
From the webpage:
…
Features
It supports:
- Extract expressions:
  - CSS 3 Selectors: to extract simple elements
  - XPath 3.0: to extract values and calculate things with them
  - XQuery 3.0: to create new documents from the extracted values
  - JSONiq: to work with JSON APIs
  - Templates: to extract several expressions in an easy way using an annotated version of the page for pattern-matching
  - XPath 2.0/XQuery 1.0: compatibility mode for the old XPath/XQuery version
- Following:
  - HTTP Codes: Redirections like 30x are automatically followed, while keeping things like cookies
  - Links: It can follow all links on a page as well as some extracted values
  - Forms: It can fill in arbitrary data and submit the form
- Output formats:
  - Adhoc: just prints the data in a human-readable format
  - XML: encodes the data as XML
  - HTML: encodes the data as HTML
  - JSON: encodes the data as JSON
  - bash/cmd: exports the data as shell variables
- Connections: HTTP / HTTPS as well as local files or stdin
- Systems: Windows (using wininet), Linux (using synapse+openssl), Mac (synapse)
…
Xidel is a very good excuse to practice your XML skills (XPath/XQuery) on a daily basis!
It is also a portable way to share web scraping scripts for websites.
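If you have not written an extract expression before, here is a minimal sketch of the idea. It does not use Xidel: it uses Python's standard library, whose ElementTree module supports only a small XPath subset, far less than the XPath 3.0 that Xidel offers. The sample page and the `extract_links` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, invented for this illustration.
PAGE = """<html>
  <body>
    <h1>Links</h1>
    <a href="https://example.org/a">First</a>
    <p><a href="https://example.org/b">Second</a></p>
  </body>
</html>"""


def extract_links(markup):
    """Return (text, href) pairs for every <a> element that has an href."""
    root = ET.fromstring(markup)
    # ".//a[@href]" selects all descendant <a> elements carrying an href
    # attribute; this stays within ElementTree's limited XPath subset.
    return [(a.text, a.get("href")) for a in root.findall(".//a[@href]")]


links = extract_links(PAGE)
for text, href in links:
    print(f"{text}: {href}")
```

With Xidel itself, the same extraction should look something like `xidel page.html -e '//a/@href'`, where `page.html` is an assumed local file. Check the Xidel documentation for the exact options.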
Enjoy!