Slice and dice HTML on the command line using CSS selectors.
Let's say you want to grab all the links on http://example.com/foo/bar:
$ que "a->href" "http://example.com/foo/bar"
Let's say that gave you 3 lines that looked like this:
/some/url?val=1
/some/url2?val=2
/some/url3?val=3
Ugh, that's not very helpful, so let's modify our argument a bit:
$ que "a->http://example.com{href}" "http://example.com/foo/bar"
Now, that will print:
http://example.com/some/url?val=1
http://example.com/some/url2?val=2
http://example.com/some/url3?val=3
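Prefixing works here because all three links are root-relative. If a page mixes relative and absolute links, one alternative is to post-process the plain `href` output. The following is only a sketch using Python's standard `urllib.parse.urljoin`, not a feature of que; the `absolutize` helper is a hypothetical name introduced for illustration.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve each href against base_url (hypothetical helper, not part of que)."""
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the page the links were scraped from.
    return [urljoin(base_url, href.strip()) for href in hrefs if href.strip()]


if __name__ == "__main__":
    lines = ["/some/url?val=1", "/some/url2?val=2", "/some/url3?val=3"]
    for url in absolutize("http://example.com/foo/bar", lines):
        print(url)
```

In practice the list of hrefs would come from the piped output of `que "a->href"`, read one line at a time.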
Not sure how to use CSS Selectors?
- Beautiful Soup CSS select docs
- jQuery's CSS Selector docs
- Sauce Labs Tutorial
- W3Schools CSS Selector Reference
The selector is divided into two parts separated by ->. The first part is the traditional CSS selector covered in the links above, and the second part is the attributes you want to print to the screen for each match:
$ css.selector->attribute,...
The attribute part uses Python's string formatting syntax, so you can embed the attributes you want within a larger string, as in http://example.com{href}.
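To make the two-part argument concrete, here is a minimal sketch of how such a string could be split and rendered with `str.format`. It is an illustration under assumptions, not que's actual implementation: `split_query` and `render` are hypothetical names, and the sketch handles only a single attribute or a single template, not a comma-separated list.

```python
def split_query(query):
    """Split 'css.selector->attributes' into (selector, attribute part)."""
    # partition splits on the first "->" only, so a template that itself
    # contains "->" later on stays intact in the attribute part.
    selector, _, attributes = query.partition("->")
    return selector.strip(), attributes.strip()


def render(template, attrs):
    """Fill a template such as 'http://example.com{href}' from a tag's attributes."""
    # Assumption: a bare name like "href" is shorthand for "{href}".
    if "{" not in template:
        template = "{" + template + "}"
    return template.format(**attrs)


if __name__ == "__main__":
    selector, template = split_query("a->http://example.com{href}")
    print(selector)
    print(render(template, {"href": "/some/url?val=1"}))
```

The dictionary passed to `render` stands in for the attributes of one matched element.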
Find all the "Download" links on a page:
que supports the non-standard :contains CSS selector:
$ curl http://example.com | que "a:contains(Download)->href"
Select all the links whose data attribute is exactly "foo" or starts with "foo-" (the CSS |= operator):
$ curl http://example.com | que "a[data|=foo]->href"
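In standard CSS, `[attr|=value]` matches when the attribute is exactly `value` or begins with `value` followed by a hyphen, whereas `[attr^=value]` is the plain prefix match. The sketch below spells out both rules in Python as defined by the CSS specification; the function names are hypothetical and it says nothing about how que evaluates selectors internally.

```python
def dash_match(value, prefix):
    """CSS [attr|=prefix]: value equals prefix or starts with prefix + '-'."""
    return value == prefix or value.startswith(prefix + "-")


def prefix_match(value, prefix):
    """CSS [attr^=prefix]: value starts with prefix (an empty prefix never matches)."""
    return prefix != "" and value.startswith(prefix)


if __name__ == "__main__":
    for value in ("foo", "foo-bar", "foobar"):
        print(value, dash_match(value, "foo"), prefix_match(value, "foo"))
```

So a link with `data="foobar"` would be matched by `^=` but not by `|=`.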
You can use pip to install the stable release:
$ pip install que
or the latest and greatest (which might be different from what's on PyPI):
$ pip install git+https://github.com/jaymon/que#egg=que
- If you need a way more fully featured HTML command-line parser, try hq.