Add an extract method #2523
Comments
I'm building something to do this right now. I dig the direction you're going. Some ideas...
What if an array meant that an array should be returned? In addition to being more ergonomic, I think it will also ensure that the result's type is properly inferred. Also, I think there's a pretty easy way to allow objects to mean objects, even when there are config objects in the mix...
const res = $.extract({
singleStr: 'h1', // throws if more than one element selected
arrayOfStr: ['h1, h2'], // uses multiple selectors
tupleOfStr: ['h1', 'h2'], // literally a tuple, throws if either selector returns more than one item
arrayOfObj: [{
// an object is a config if it exactly matches config type, otherwise object return is expected
int: {selector: '#length'},
}],
});
That object idea does leave room for ambiguity, and it will be a bit annoying to type. What about support for nested
const res = $.extract({
// nested object
meta: $.extract('#scope', {
deep: $.extract({}),
}),
});
Regarding scoping for performance, do you think there are major gains to be had from scanning the entire extract tree to optimize all the selectors automatically? In my scraping, I trawl every bit of the DOM for data redundancy, but the selectors are grouped by the desired bit of data, not their position in the DOM. I often wonder if I'm missing out, but I haven't had a chance to test it.
Most scraping also includes data processing. How about first-class support for funcs? Here too, you can infer types, and for more than just strings...
const res = $.extract({
singleStr: ($) => $('h1').text().trim(),
arrayOfStr: [
($) => $('h1, h2').toArray().map((el) => $(el).text().trim()),
],
arrayOfObj: [{
int: ($) => parseInt($('#length').text()) || 0,
}],
});
The last thing I'll comment on is
type SelectConfig = {
selector?: string;
// XOR these...
parse?: <T>($: CheerioAPI) => T;
content?: 'text' | 'html';
prop?: keyof HTMLElement;
attr?: string;
data?: string;
style?: string;
}
Anyway, really cool you're thinking about this direction! This really must be a huge portion of what Cheerio users are doing. |
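The "XOR these" comment in the type above could be enforced by the compiler instead of being left as a convention. Here is a minimal sketch of one way to do that, under some assumptions: `OutputOptions`, `ExactlyOne`, `SelectConfigSketch` and `outputKind` are hypothetical names made up for illustration and are not part of Cheerio, `prop` is typed as a plain `string` so the snippet does not depend on DOM typings, and the `parse` option is left out to keep it dependency-free.

```typescript
// Hypothetical sketch, not Cheerio's API: require exactly one output option.
type OutputOptions = {
  content: 'text' | 'html';
  prop: string;
  attr: string;
  data: string;
  style: string;
};
type OutputKey = keyof OutputOptions;

// For each key K, build a variant where K is required and every other key is forbidden.
type ExactlyOne<T> = {
  [K in keyof T]: Pick<T, K> & { [O in Exclude<keyof T, K>]?: never };
}[keyof T];

type SelectConfigSketch = { selector?: string } & ExactlyOne<OutputOptions>;

const OUTPUT_KEYS: readonly OutputKey[] = ['content', 'prop', 'attr', 'data', 'style'];

// Runtime counterpart for plain-JS callers, who get no help from the compiler.
function outputKind(config: SelectConfigSketch): OutputKey {
  const present = OUTPUT_KEYS.filter((key) => config[key] !== undefined);
  const [only] = present;
  if (present.length !== 1 || only === undefined) {
    throw new TypeError(`expected exactly one output option, got ${present.length}`);
  }
  return only;
}

// Accepted: a single output option.
const link: SelectConfigSketch = { selector: 'a', attr: 'href' };
const kind = outputKind(link);

// A config such as { selector: 'a', attr: 'href', prop: 'href' } is meant to be
// rejected at compile time, because every variant forbids the other keys.
```

The runtime check duplicates what the type expresses, which seems worthwhile only if users without TypeScript are a concern.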
Thanks for the feedback! See some responses below.
That's the idea!
An individual selector should stand for the first match; I've added a
The idea with multiple array elements was to allow users to extract different properties. E.g.
$.extract({
titles: [
// The document's `<title>` tag. Will use the `textContent`.
'title',
// The Open Graph `title` property. Will use the `content` attribute.
{ selector: 'meta[property="og:title"]', out: 'content' }
]
})
Ideally, there should still be a way to limit the number of elements retrieved. That way, we could support use-cases such as https://github.com/microlinkhq/metascraper/blob/b3379a9300ad1ed6de155592866b1e555e1f5382/packages/metascraper-title/index.js
I tried to model this by allowing
$.extract({
posts: [
{
selector: ".post",
out: {
title: ".title",
body: ".body",
link: { selector: "a:has(> .title)", out: "href" },
},
},
],
})
This extracts the title, body and link for every post; all the nested selectors are relative to
Yes, although this is quite complicated to do and won't be a part of the initial version of this.
I tried to achieve this by allowing functions for the
There are currently three different values: (1) a string that will be passed to Cheerio's
If we didn't overload the object, the alternative would be runtime errors for users that don't use TS. Removing that potential issue seems worth the added complexity. As for using |
Just to clarify, I like this solution. Two selectors within an array meaning two values. I'm not sure if it's just me, but I still read your initial spec to mean that
FWIW, this is exactly how I scrape data now. There's no knowing when Amazon is going to change their DOM, so I have a bunch of selectors for each bit of data, plus a test that picks the best match. I know
The one thing about
I'm still inclined to want to keep all my selectors grouped by the data they return (a la the above example) because it makes it much easier to process in the next step. (Otherwise I need to maintain two mappings, rather than just one.)
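For readers following along, the "several selectors per bit of data, plus a test that picks the best match" approach can be sketched roughly like this. Everything here is an assumption for illustration: `FieldSpec`, `Lookup` and `extractBest` are hypothetical names, the selectors are invented, and the `lookup` callback stands in for whatever actually reads the DOM (with Cheerio it could wrap something like `$(selector).first().text()`), so the snippet runs without any dependency.

```typescript
// Hypothetical sketch: selectors stay grouped by the field they feed,
// and a per-field score decides which candidate wins.
type Lookup = (selector: string) => string | undefined;

interface FieldSpec {
  selectors: string[];
  // Higher is better; a score of 0 or less rejects the candidate.
  score: (value: string) => number;
}

function extractBest<K extends string>(
  fields: Record<K, FieldSpec>,
  lookup: Lookup,
): Record<K, string | undefined> {
  const result = {} as Record<K, string | undefined>;
  for (const key of Object.keys(fields) as K[]) {
    const { selectors, score } = fields[key];
    let best: string | undefined;
    let bestScore = 0;
    for (const selector of selectors) {
      const value = lookup(selector)?.trim();
      if (!value) continue; // selector matched nothing or only whitespace
      const candidateScore = score(value);
      if (candidateScore > bestScore) {
        best = value;
        bestScore = candidateScore;
      }
    }
    result[key] = best;
  }
  return result;
}

// A Map stands in for the document; keys are made-up selectors.
const fakeDom = new Map<string, string>([
  ['#price', ''],
  ['.price', ' $19.99 '],
  ['[data-price]', 'n/a'],
]);

const best = extractBest(
  {
    price: {
      selectors: ['#price', '.price', '[data-price]'],
      score: (value) => (/^\$\d+(\.\d{2})?$/.test(value) ? value.length : 0),
    },
  },
  (selector) => fakeDom.get(selector),
);
```

Keeping the candidates and the scoring rule under one key means there is a single mapping from field to extraction logic, which is the grouping argued for above.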
I don't think there'd be any benefit if you still had to nest the function. A string selector with an out func accomplishes the same thing. I was just thinking about streamlining the interface a bit. It's okay either way.
I just took a look at the prop API. It's much cooler than I'd realized. Maybe the only thing I'll suggest then is that the prop name be changed from |
Hi and thanks for the great library! I noticed the
cheerio blows up with
I'm using version |
This was just merged and a new release hasn't been issued yet. I'm working through my list for remaining changes, so this hopefully won't take long. |
Hi, it looks like this isn't released yet. Any timing updates? |
Any estimation for the new release? |
What happened to this feature? It's exactly what I needed and seems to be documented, but it doesn't seem to be available. |
Liked what's been discussed here. Needed this and grew tired of waiting for Cheerio, so I just published my implementation of these ideas + own takes: Might be useful to others as well. |
Thanks for writing this and sharing @denkan, I've been playing with it this evening and it's exactly what I need. I will feed back any bugs I find in your own GitHub repo |
Any update? |
Life... am i right? Great work so far on this all ya'all. Much needed library for sure. Keep up the good work. |
This should not be documented in the user guide, if it's not actually released yet: |
Since the website is also here in this repo, perhaps it would be better to have each release with a corresponding tag. And only the latest released version of the website (with relevant docs) would get actually deployed to the web. |
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Change | Age | Adoption | Passing | Confidence | |---|---|---|---|---|---| | [cheerio](https://cheerio.js.org/) ([source](https://togithub.com/cheeriojs/cheerio)) | [`1.0.0-rc.9` -> `1.0.0-rc.12`](https://renovatebot.com/diffs/npm/cheerio/1.0.0-rc.9/1.0.0-rc.12) | [![age](https://developer.mend.io/api/mc/badges/age/npm/cheerio/1.0.0-rc.12?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/cheerio/1.0.0-rc.12?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/cheerio/1.0.0-rc.9/1.0.0-rc.12?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/cheerio/1.0.0-rc.9/1.0.0-rc.12?slim=true)](https://docs.renovatebot.com/merge-confidence/) | --- ### Release Notes <details> <summary>cheeriojs/cheerio (cheerio)</summary> ### [`v1.0.0-rc.12`](https://togithub.com/cheeriojs/cheerio/releases/tag/v1.0.0-rc.12) [Compare Source](https://togithub.com/cheeriojs/cheerio/compare/v1.0.0-rc.11...v1.0.0-rc.12) Bugfix release. 
Fixed issues: - Align `prop` undefined handling with jQuery by [@fb55](https://togithub.com/fb55) in [https://github.com/cheeriojs/cheerio/pull/2557](https://togithub.com/cheeriojs/cheerio/pull/2557) - Allow deep imports of `cheerio/lib/utils` by [@blixt](https://togithub.com/blixt) in [https://github.com/cheeriojs/cheerio/pull/2601](https://togithub.com/cheeriojs/cheerio/pull/2601) #### New Contributors - [@blixt](https://togithub.com/blixt) made their first contribution in [https://github.com/cheeriojs/cheerio/pull/2601](https://togithub.com/cheeriojs/cheerio/pull/2601) **Full Changelog**: cheeriojs/cheerio@v1.0.0-rc.11...v1.0.0-rc.12 ### [`v1.0.0-rc.11`](https://togithub.com/cheeriojs/cheerio/releases/tag/v1.0.0-rc.11) [Compare Source](https://togithub.com/cheeriojs/cheerio/compare/v1.0.0-rc.10...v1.0.0-rc.11) `cheerio@1.0.0-rc.11` is hopefully the last RC before the 1.0.0 release of Cheerio. There are two APIs that will be added for the next major release: An `extract` method ([https://github.com/cheeriojs/cheerio/issues/2523](https://togithub.com/cheeriojs/cheerio/issues/2523)) and NodeJS specific loader methods ([https://github.com/cheeriojs/cheerio/issues/2051](https://togithub.com/cheeriojs/cheerio/issues/2051)). These are still in flux and I'd appreciate feedback on the proposals. A big thank you to everyone that contributed to this release! This includes code contributors, as well as the amazing financial support on [GitHub Sponsors](https://togithub.com/sponsors/cheeriojs)! Under the hood, a lot of work for this release went into updating parse5, cheerio's default HTML parser. Have a look at [parse5's release notes](https://togithub.com/inikulin/parse5/releases/tag/v7.0.0) to see what has changed there. #### Breaking - Cheerio is now a dual CommonJS and ESM module. That means that deep imports will now fail in newer versions of Node.
[https://github.com/cheeriojs/cheerio/pull/2508](https://togithub.com/cheeriojs/cheerio/pull/2508) - `script` and `style` contents are added again in `.text()` [https://github.com/cheeriojs/cheerio/pull/2509](https://togithub.com/cheeriojs/cheerio/pull/2509) - To keep the old behavior, switch `.text()` to `.prop('innerText')` - The TypeScript types inherited from upstream dependencies have changed. [https://github.com/cheeriojs/cheerio/pull/2503](https://togithub.com/cheeriojs/cheerio/pull/2503) - Node types are now using tagged unions, which will make consumption a bit easier. #### Features - Relevant options are now forwarded to `cheerio-select` [https://github.com/cheeriojs/cheerio/pull/2511](https://togithub.com/cheeriojs/cheerio/pull/2511) - Custom pseudo classes can now be specified [using the `pseudos` option](https://cheerio.js.org/interfaces/CheerioOptions.html#pseudos). - For the `.prop()` method: - Add `textContent` and `innerText` props [https://github.com/cheeriojs/cheerio/pull/2214](https://togithub.com/cheeriojs/cheerio/pull/2214) - Users can now specify a `baseURI` option, which will lead to `href` and `src` props to be resolved as URLs. 
[https://github.com/cheeriojs/cheerio/pull/2510](https://togithub.com/cheeriojs/cheerio/pull/2510) - Added a `slim` export, which will always use htmlparser2 [https://github.com/cheeriojs/cheerio/pull/1960](https://togithub.com/cheeriojs/cheerio/pull/1960) #### Fixes - Have `text` turn passed values to strings [https://github.com/cheeriojs/cheerio/pull/2047](https://togithub.com/cheeriojs/cheerio/pull/2047) - Include `undefined` in the return type of `get` by [@​glen-84](https://togithub.com/glen-84) in [https://github.com/cheeriojs/cheerio/pull/2392](https://togithub.com/cheeriojs/cheerio/pull/2392) - Recognise comments as HTML [https://github.com/cheeriojs/cheerio/pull/2504](https://togithub.com/cheeriojs/cheerio/pull/2504) - Add missing `undefined` return value [https://github.com/cheeriojs/cheerio/pull/2505](https://togithub.com/cheeriojs/cheerio/pull/2505) - Export missing static methods [https://github.com/cheeriojs/cheerio/pull/2506](https://togithub.com/cheeriojs/cheerio/pull/2506) - Have style parsing add malformed fields to previous field [https://github.com/cheeriojs/cheerio/pull/2521](https://togithub.com/cheeriojs/cheerio/pull/2521) #### Refactor - Use `domutils` module directly [https://github.com/cheeriojs/cheerio/pull/1928](https://togithub.com/cheeriojs/cheerio/pull/1928) - Hand-roll `isHTML` [https://github.com/cheeriojs/cheerio/pull/1935](https://togithub.com/cheeriojs/cheerio/pull/1935) - Move initialization logic to `load` [https://github.com/cheeriojs/cheerio/pull/1951](https://togithub.com/cheeriojs/cheerio/pull/1951) - Only return elements in `closest` [https://github.com/cheeriojs/cheerio/pull/2057](https://togithub.com/cheeriojs/cheerio/pull/2057) - Remove unnecessary code, be more explicit [https://github.com/cheeriojs/cheerio/pull/2279](https://togithub.com/cheeriojs/cheerio/pull/2279) - Use stricter TS, ESLint configs [https://github.com/cheeriojs/cheerio/pull/2507](https://togithub.com/cheeriojs/cheerio/pull/2507) - Update exported 
values [https://github.com/cheeriojs/cheerio/pull/2512](https://togithub.com/cheeriojs/cheerio/pull/2512) #### Development Experience - Migrate husky to v6 by [@​DavideViolante](https://togithub.com/DavideViolante) in [https://github.com/cheeriojs/cheerio/pull/1934](https://togithub.com/cheeriojs/cheerio/pull/1934) - Update CI by [@​XhmikosR](https://togithub.com/XhmikosR) in [https://github.com/cheeriojs/cheerio/pull/2149](https://togithub.com/cheeriojs/cheerio/pull/2149) - Set permissions for GitHub actions by [@​neilnaveen](https://togithub.com/neilnaveen) in [https://github.com/cheeriojs/cheerio/pull/2453](https://togithub.com/cheeriojs/cheerio/pull/2453) #### Docs - Update README "is not a web browser" section by [@​mxschmitt](https://togithub.com/mxschmitt) in [https://github.com/cheeriojs/cheerio/pull/2127](https://togithub.com/cheeriojs/cheerio/pull/2127) #### New Contributors - [@​DavideViolante](https://togithub.com/DavideViolante) made their first contribution in [https://github.com/cheeriojs/cheerio/pull/1934](https://togithub.com/cheeriojs/cheerio/pull/1934) - [@​mxschmitt](https://togithub.com/mxschmitt) made their first contribution in [https://github.com/cheeriojs/cheerio/pull/2127](https://togithub.com/cheeriojs/cheerio/pull/2127) - [@​glen-84](https://togithub.com/glen-84) made their first contribution in [https://github.com/cheeriojs/cheerio/pull/2392](https://togithub.com/cheeriojs/cheerio/pull/2392) - [@​neilnaveen](https://togithub.com/neilnaveen) made their first contribution in [https://github.com/cheeriojs/cheerio/pull/2453](https://togithub.com/cheeriojs/cheerio/pull/2453) **Full Changelog**: cheeriojs/cheerio@v1.0.0-rc.10...v1.0.0-rc.11 ### [`v1.0.0-rc.10`](https://togithub.com/cheeriojs/cheerio/releases/tag/v1.0.0-rc.10) [Compare Source](https://togithub.com/cheeriojs/cheerio/compare/v1.0.0-rc.9...v1.0.0-rc.10) **Fixes:** - `.html(node)` now moves passed nodes ([#​1923](https://togithub.com/cheeriojs/cheerio/issues/1923), fixes 
[#​940](https://togithub.com/cheeriojs/cheerio/issues/940)) [`258b26b`](https://togithub.com/cheeriojs/cheerio/commit/258b26b) - Boolean attributes are no longer special in xmlMode ([#​1903](https://togithub.com/cheeriojs/cheerio/issues/1903), fixes [#​1805](https://togithub.com/cheeriojs/cheerio/issues/1805)) [`b393e4a`](https://togithub.com/cheeriojs/cheerio/commit/b393e4a) - Rename parser adapter files ([#​1873](https://togithub.com/cheeriojs/cheerio/issues/1873), fixes [#​1847](https://togithub.com/cheeriojs/cheerio/issues/1847)) [`8f55dd8`](https://togithub.com/cheeriojs/cheerio/commit/8f55dd8) - Make `filter` work on all collections ([#​1870](https://togithub.com/cheeriojs/cheerio/issues/1870), fixes [#​1867](https://togithub.com/cheeriojs/cheerio/issues/1867)) [`fb8d31e`](https://togithub.com/cheeriojs/cheerio/commit/fb8d31e) - Bump cheerio-select ([#​1922](https://togithub.com/cheeriojs/cheerio/issues/1922), fixes https://www.npmjs.com/advisories/1754) [`5cd2b9c`](https://togithub.com/cheeriojs/cheerio/commit/5cd2b9c) **Documentation:** - Document how to define TS types for Plug-Ins ([#​1915](https://togithub.com/cheeriojs/cheerio/issues/1915), fixes [#​1778](https://togithub.com/cheeriojs/cheerio/issues/1778)) [`880fd2c`](https://togithub.com/cheeriojs/cheerio/commit/880fd2c) - Remove obsolete Testing section [`e0c7cbb`](https://togithub.com/cheeriojs/cheerio/commit/e0c7cbb) - Remove now-invalid `require` [`5dfbd35`](https://togithub.com/cheeriojs/cheerio/commit/5dfbd35) **Refactors:** - Wrap shared behavior in `traversing` ([#​1909](https://togithub.com/cheeriojs/cheerio/issues/1909)) [`58e090a`](https://togithub.com/cheeriojs/cheerio/commit/58e090a) - Move `is` to `traversing`, optimize ([#​1908](https://togithub.com/cheeriojs/cheerio/issues/1908)) [`1c6fa3e`](https://togithub.com/cheeriojs/cheerio/commit/1c6fa3e) - Change order of arguments of internal `domEach` ([#​1892](https://togithub.com/cheeriojs/cheerio/issues/1892)) 
[`feda230`](https://togithub.com/cheeriojs/cheerio/commit/feda230) - Have `load` export a function ([#​1869](https://togithub.com/cheeriojs/cheerio/issues/1869)) [`c370f4e`](https://togithub.com/cheeriojs/cheerio/commit/c370f4e) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/sammyfilly/Canary-nextjs).
Ugh, why is this feature documented if it's not actually released yet? 😢 |
Super confusing and time consuming to read docs added by this commit 976b087 for a proposed feature with no apparent implementation work evident in the repo. A new user like me, while not wanting to be mistaken for an ungrateful or entitled whiner, is left wondering if this kind of thing is representative of what I should expect from the rest of cheerio or if this is a rare exception. |
Remove it from the docs if it's not in the latest release. |
Where is |
May 2024, still not implemented and still on the docs? Or why am I getting |
Why would you take the time to document a feature that's not implemented? So weird! |
It is implemented, just not released yet. |
It's been a year and more, and the documentation shows it. Can haz, plz? |
So do we just check out and build the main branch to get this? |
A much needed feature, documented for 2+ years, implemented but not released... It would be funny if we didn't all lose time on this. It shouldn't need saying that documentation and features should be released together |
Fixed with the 1.0 release! |
One common use-case for cheerio is to extract multiple values from a document, and store them in an object. Doing so manually currently isn't a great experience. There are several packages built on top of cheerio that improve this: For example https://github.com/matthewmueller/x-ray and https://github.com/IonicaBizau/scrape-it. Commercial scraping providers also allow bulk extractions: https://www.scrapingbee.com/documentation/data-extraction/
We should add an API to make this use-case easier. The API should be along the lines of: