feat(data): add jsDelivr hits #263
This is cool, thanks!
@Haroenv we didn't make any index config changes, as I thought that's best left up to you, but we'd want to:
cc @MartinKolarik, to see if this is proper usage of the API (seeing that we will be calling it ±1000000 times spread over a day, and a few thousand times regularly as well)
I haven't looked at the implementation yet, but note that the process regularly gets shut down and started again, so any timers won't work :) I suggest it follow the same flow as requesting things from npm
I believe this will work fine, but let me know what you think after you check the code. It loads all stats on startup (there's no perf difference in getting stats for all packages vs just a hundred) and then updates the data every 24 hours if the process is still running.
@@ -14,11 +14,9 @@ export function info() {
  }));
}

const logWarning = ({ error, type, packages }) => {
@Haroenv just a note that changes in this file are a bugfix not related to jsDelivr. This function expected packages to be an array but it was actually a string everywhere, so it would fail on packages.join()
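The failure mode described in this comment can be illustrated with a minimal sketch. `formatPackages` is a hypothetical helper, not code from this PR; it only shows why calling `.join()` on a string throws and one possible way to accept either shape.

```javascript
// Hypothetical helper, not taken from the PR: accepts either a single
// package name (string) or a list of names (array).
function formatPackages(packages) {
  // Strings have no join() method, so 'foo'.join(',') throws a TypeError.
  // Normalizing to an array first makes both call styles safe.
  const list = Array.isArray(packages) ? packages : [packages];
  return list.join(',');
}

console.log(formatPackages('lodash')); // lodash
console.log(formatPackages(['lodash', 'react'])); // lodash,react
```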
src/jsDelivr.js
Outdated
import c from './config.js';
import log from './log.js';

const hits = new Map();
Does this work in memory? I'd expect this to blow up the memory usage (which is already at / over the limit of our runtime now).
I like the idea of fetching it beforehand, but I wonder if it shouldn't be something like this:
- export a function which starts this caching
- call that at init phase in index.js rather than as a side-effect of this module
- persist and read it from disk (decide whether this is possible / worth it, we're using Heroku)
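The first two bullets above could look roughly like the sketch below. The names `startHitsCache` and `fetchHits`, and the shape of the fetched data (an array of `{ name, hits }` objects), are assumptions for illustration only; the real endpoint response and module layout may differ. The function is left unexported here so the snippet runs on its own.

```javascript
// Sketch only: module state is created here, but nothing is fetched and
// no timer is started until startHitsCache() is called explicitly.
const hits = new Map();

// fetchHits is a hypothetical async function injected by the caller;
// it is assumed to resolve to an array of { name, hits } objects.
async function startHitsCache(fetchHits) {
  const entries = await fetchHits();
  hits.clear();
  for (const { name, hits: count } of entries) {
    hits.set(name, count);
  }
  return hits.size;
}

// In the init phase (e.g. index.js) the caller would opt in explicitly:
startHitsCache(async () => [{ name: 'jquery', hits: 1000 }]).then(size => {
  console.log(`cached hits for ${size} package(s)`);
});
```

Keeping the fetch behind an explicit call means importing the module has no network or timer side-effects, which also makes it easier to test.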
It is called in init phase, the side-effect here is only the setInterval which runs later to refresh the data. I agree that it isn't so nice but wasn't sure if there's a better place to put it.
It's less than 1 MB in JSON, I'd expect not much more in memory.
I thought we were talking about a significantly bigger amount of data, 1MB shouldn't be a problem
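The size estimate in this thread can be sanity-checked with a rough measurement like the one below. The data is synthetic (made-up package names and counts, and an arbitrary entry count), so the numbers it prints say nothing about the real jsDelivr dataset; it only shows one way to compare the JSON size with the approximate heap growth of a `Map`.

```javascript
// Synthetic data only: 10,000 made-up package names with made-up counts.
const entries = Array.from({ length: 10000 }, (_, i) => [`package-${i}`, i * 7]);

const before = process.memoryUsage().heapUsed;
const hits = new Map(entries);
const after = process.memoryUsage().heapUsed;

// Size of the same data serialized as JSON, in bytes.
const jsonBytes = Buffer.byteLength(JSON.stringify(entries));

console.log(`JSON size: ${jsonBytes} bytes`);
// heapUsed is noisy (GC can run in between), so treat this as a rough hint.
console.log(`approx. heap growth for the Map: ${after - before} bytes`);
```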
src/jsDelivr.js
Outdated
  return pkgs.map(({ name }) => ({ jsDelivrHits: hits.get(name) || 0 }));
}

setInterval(loadHits, 24 * 60 * 60 * 1000);
nit: ms('1 day') is preferable here
also, we do a weekly "bootstrap", where we throw away all data and start from scratch. It's probably enough to have data up to date weekly, rather than adding a timer IMO
src/jsDelivr.js
Outdated

export async function loadHits() {
  const hitsJSONpromise = got(c.jsDelivrHitsEndpoint, { json: true }).catch(
    error => {
I was wondering if it would be better to reject with the error here because if this fails at the start there won't be hits for any packages.
yes, we should probably reject here, since otherwise every package will log if this can't be loaded
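A minimal sketch of the behaviour agreed on in this thread: let the load reject so the failure is reported once by the caller, instead of swallowing it and having every package lookup complain later. `fetchJson` is a hypothetical stand-in for the `got(...)` call, the response shape is assumed, and `getHits` is an assumed name for the lookup function whose body appears in the diff.

```javascript
const hits = new Map();

// fetchJson is a hypothetical stand-in for the HTTP call in the PR;
// it is assumed to resolve to an array of { name, hits } objects.
async function loadHits(fetchJson) {
  // No .catch() here: if the request fails, the returned promise rejects
  // and the caller decides what to do (log once, retry, abort startup).
  const entries = await fetchJson();
  hits.clear();
  entries.forEach(({ name, hits: count }) => hits.set(name, count));
}

// Mirrors the lookup from the diff: unknown packages default to 0 hits.
function getHits(pkgs) {
  return pkgs.map(({ name }) => ({ jsDelivrHits: hits.get(name) || 0 }));
}

// The caller handles the single failure in one place.
loadHits(async () => {
  throw new Error('endpoint unavailable');
}).catch(error => console.error(`could not load hits: ${error.message}`));
```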
Updated as per comments
@Haroenv anything else we should address?
Nothing particular, I just haven't had the time to try this out (see the perf changes)
Ok 😄 @drgy one thing we should probably do is add this to the schema in the readme: https://github.com/algolia/npm-search#schema
I'll try this first thing tomorrow
Awesome.
seems to be working ok so far, will see how it changes over the coming 20-ish hours the replication takes. Some packages already have the jsDelivr number there in the -bootstrap index
Fully live now
All live now, you're good to use
@Haroenv thanks! Now the follow-up question is, how do we use this as an additional ranking criterion?
We'd have to set up a replica with that different ranking. I'll take a look at that soon-ish, since I need to discuss whether there's room on the machine we're using for double the records :)
Would it be possible to use jsDelivr hits and npm downloads at the same time? I'd like to achieve something like this for packages with the same textual relevance:
Alternatively, the 2nd and 3rd position could be swapped, in which case this might make sense even for the main index and remove the need for a replica.
That could be possible as well. If you have a suggestion for settings that work for npm downloads as well as jsDelivr, I'll accept the PR. However, note that the tie breaking is already more complex than that (you can see it in config.js)
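The exact ordering proposed above is not preserved in this thread, so the sketch below only illustrates the general idea of breaking ties on two popularity signals in sequence. It is not the project's actual ranking configuration: `npmDownloads` is an assumed field name, and the choice of "download magnitude first, jsDelivr hits second" is an arbitrary example.

```javascript
// Number of digits in a non-negative count, used as a coarse "magnitude".
const magnitude = n => String(Math.max(0, Math.trunc(n))).length;

// Hypothetical tie-breaker for packages with equal textual relevance.
// Only jsDelivrHits is a field named in this thread; npmDownloads is assumed.
function compareTies(a, b) {
  // First criterion: order of magnitude of npm downloads, descending.
  const byDownloads = magnitude(b.npmDownloads) - magnitude(a.npmDownloads);
  if (byDownloads !== 0) return byDownloads;
  // Second criterion: raw jsDelivr hits, descending.
  return b.jsDelivrHits - a.jsDelivrHits;
}

const ranked = [
  { name: 'a', npmDownloads: 5000, jsDelivrHits: 10 },
  { name: 'b', npmDownloads: 9000, jsDelivrHits: 500 },
  { name: 'c', npmDownloads: 120000, jsDelivrHits: 1 },
].sort(compareTies);

console.log(ranked.map(pkg => pkg.name).join(', ')); // c, b, a
```

Here `a` and `b` fall in the same download magnitude, so jsDelivr hits decide between them, while `c` wins on downloads alone.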
# 1.0.0 (2021-07-19) ### Bug Fixes * 1.0.1 ([#655](#655)) ([5c2cb7f](5c2cb7f)) * add expiresAt field ([#643](#643)) ([dba5d2a](dba5d2a)) * add new worker to bootstrap ([#636](#636)) ([ebbe3df](ebbe3df)) * cache dns ([#654](#654)) ([e80d437](e80d437)) * cache total downloads ([#653](#653)) ([99be307](99be307)) * deprecated facets should be boolean ([#638](#638)) ([19d30d0](19d30d0)) * docker build ([#651](#651)) ([947058d](947058d)) * expiresAt can be a numericFilter ([#664](#664)) ([e89fd14](e89fd14)) * improve logging + remove catchup ([#647](#647)) ([cbc545d](cbc545d)) * increase mem + round downloadRatio ([#644](#644)) ([8ef8425](8ef8425)) * mini fixes ([#659](#659)) ([d34bcc1](d34bcc1)) * setup circleci ([#593](#593)) ([4472405](4472405)) * stop using unpkg ([#658](#658)) ([aae2d86](aae2d86)) * throw outside try ([#661](#661)) ([d36a77a](d36a77a)) * typo ([#637](#637)) ([94851af](94851af)) * up semantic release ([#667](#667)) ([94d8d6c](94d8d6c)) * various ([#663](#663)) ([18fea1e](18fea1e)) * **algolia:** missing config param ([#387](#387)) ([d25ea19](d25ea19)) * **alternative names:** remove prismjs -> prismjs.js ([a1bad34](a1bad34)) * **deps:** update dependency @sentry/node to v5.10.2 ([9c445b0](9c445b0)) * **deps:** update dependency @sentry/node to v5.11.0 ([a858954](a858954)) * **deps:** update dependency @sentry/node to v5.12.4 ([efd6140](efd6140)) * **deps:** update dependency @sentry/node to v5.15.4 ([965fffb](965fffb)) * **deps:** update dependency @sentry/node to v5.15.5 ([89f234e](89f234e)) * **deps:** update dependency @sentry/node to v5.17.0 ([3563f6d](3563f6d)) * **deps:** update dependency @sentry/node to v5.19.1 ([394cb8c](394cb8c)) * **deps:** update dependency @sentry/node to v5.30.0 ([56421c5](56421c5)) * **deps:** update dependency @sentry/node to v5.6.2 ([667e12f](667e12f)) * **deps:** update dependency @sentry/node to v5.7.0 ([55b410d](55b410d)) * **deps:** update dependency @sentry/node to v5.7.1 ([bec31ba](bec31ba)) * **deps:** update 
dependency @sentry/node to v5.9.0 ([6599c79](6599c79)) * **deps:** update dependency algoliasearch to v3.34.0 ([11f49b6](11f49b6)) * **deps:** update dependency algoliasearch to v3.35.0 ([c4faa7a](c4faa7a)) * **deps:** update dependency algoliasearch to v3.35.1 ([837ba44](837ba44)) * **deps:** update dependency algoliasearch to v4.9.3 ([#628](#628)) ([78e3617](78e3617)) * **deps:** update dependency async to v2.6.3 ([4a9cf53](4a9cf53)) * **deps:** update dependency async to v3.2.0 ([3aa436e](3aa436e)) * **deps:** update dependency bunyan to v1.8.15 ([912e7bc](912e7bc)) * **deps:** update dependency dotenv to v8.1.0 ([b785e8f](b785e8f)) * **deps:** update dependency dotenv to v8.2.0 ([ad5f3fb](ad5f3fb)) * **deps:** update dependency dtrace-provider to v0.8.8 ([4879231](4879231)) * **deps:** update dependency gravatar-url to v3.1.0 ([f66b8ee](f66b8ee)) * **deps:** update dependency hot-shots to v6.4.1 ([f84aa5f](f84aa5f)) * **deps:** update dependency hot-shots to v6.5.1 ([2bdeb8e](2bdeb8e)) * **deps:** update dependency hot-shots to v6.8.1 ([1a58429](1a58429)) * **deps:** update dependency hot-shots to v6.8.2 ([a09e193](a09e193)) * **deps:** update dependency hot-shots to v6.8.5 ([871e2e5](871e2e5)) * **deps:** update dependency hot-shots to v6.8.7 ([fc61f4b](fc61f4b)) * **deps:** update dependency lodash to v4.17.13 [security] ([ad8a7ea](ad8a7ea)) * **deps:** update dependency lodash to v4.17.14 ([10e1777](10e1777)) * **deps:** update dependency lodash to v4.17.15 ([a0f2d0d](a0f2d0d)) * **deps:** update dependency lodash to v4.17.19 [security] ([38bd4e0](38bd4e0)) * **deps:** update dependency lodash to v4.17.21 ([baf7442](baf7442)) * **deps:** update dependency ms to v2.1.3 ([b4f0289](b4f0289)) * **deps:** update dependency nano to v8.2.2 ([a4befee](a4befee)) * **deps:** update dependency nano to v8.2.3 ([2c2272c](2c2272c)) * **deps:** update dependency nice-package to v3.1.2 ([55d8953](55d8953)) * **deps:** update dependency object-sizeof to v1.5.1 
([33296d3](33296d3)) * **deps:** update dependency object-sizeof to v1.5.2 ([eeb434a](eeb434a)) * **deps:** update dependency object-sizeof to v1.6.0 ([715f2f6](715f2f6)) * **deps:** update dependency object-sizeof to v1.6.1 ([24945f3](24945f3)) * **dev:** upgrade env ([#592](#592)) ([3c66c56](3c66c56)) * **dev:** upgrade env /2 ([#595](#595)) ([a86cd71](a86cd71)) * **formatPkg:** remove non-existing versions ([c37d6d6](c37d6d6)), closes [#534](#534) * **package.json:** add repo url ([#649](#649)) ([6b248b5](6b248b5)) * empty change ([#405](#405)) ([475e366](475e366)) * id of null ([#406](#406)) ([8e5fb1d](8e5fb1d)) * kill process regurlarly, for cache and bootstrap ([#412](#412)) ([9c778b2](9c778b2)) * **esm:** avoid errors, slightly deal with arrays ([f5eefa9](f5eefa9)) * **formatPkg:** cleaned main can be an array ([#395](#395)) ([7ef7f2f](7ef7f2f)) * **getFilesList:** call using package object ([6b954d5](6b954d5)) * **jsdelivr:** fetch just npm hits ([#375](#375)) ([25d29dd](25d29dd)), closes [#371](#371) * **lint:** correct setup to require extension ([#381](#381)) ([29afbd5](29afbd5)) * **saveDocs:** filter out wrong docs more robustly ([bc81351](bc81351)) * **size:** more exact truncating of readme ([#559](#559)) ([f6187c1](f6187c1)) * **ts:** main can be array ([b619daa](b619daa)) * **TS:** infer definitions correctly ([#357](#357)) ([143aa06](143aa06)) * **TS:** pass correct object ([cdf334b](cdf334b)) * **TS:** support scoped packages ([#364](#364)) ([655e86a](655e86a)) * **unpkg:** remove json flag + add unit test ([#392](#392)) ([d706694](d706694)) * import correctly got ([bb11884](bb11884)) * multiple small bugs after [#379](#379) ([#380](#380)) ([0580052](0580052)) * **config:** fully correct objectIDs ([b25fd81](b25fd81)) * **config:** use allowed chars for objectID ([34f41bb](34f41bb)) * **deps:** update dependency algoliasearch to v3.27.0 ([6c87eed](6c87eed)) * **deps:** update dependency algoliasearch to v3.27.1 ([0985d20](0985d20)) * **deps:** 
update dependency algoliasearch to v3.28.0 ([d48ad9c](d48ad9c)) * **deps:** update dependency algoliasearch to v3.29.0 ([d6057d5](d6057d5)) * **deps:** update dependency algoliasearch to v3.30.0 ([1a571ad](1a571ad)) * **deps:** update dependency algoliasearch to v3.31.0 ([5448c89](5448c89)) * **deps:** update dependency algoliasearch to v3.32.0 ([f52c1a8](f52c1a8)) * **deps:** update dependency algoliasearch to v3.32.1 ([c93f30f](c93f30f)) * **deps:** update dependency algoliasearch to v3.33.0 ([e26d4d9](e26d4d9)) * **deps:** update dependency async to v2.6.2 ([f9a9cb3](f9a9cb3)) * **deps:** update dependency babel-preset-env to v1.7.0 ([9081d2d](9081d2d)) * **deps:** update dependency bunyan-debug-stream to v1.1.0 ([f3c9d7e](f3c9d7e)) * **deps:** update dependency bunyan-debug-stream to v1.1.1 ([deccb8b](deccb8b)) * **deps:** update dependency dotenv to v6 ([#213](#213)) ([1b40279](1b40279)) * **deps:** update dependency dotenv to v6.1.0 ([0c8cc10](0c8cc10)) * **deps:** update dependency dotenv to v6.2.0 ([a54c1eb](a54c1eb)) * **deps:** update dependency got to v8.3.1 ([2376f53](2376f53)) * **deps:** update dependency got to v8.3.2 ([fcf2550](fcf2550)) * **deps:** update dependency hosted-git-info to v2.7.1 ([751b0af](751b0af)) * **deps:** update dependency lodash to v4.17.10 ([075a877](075a877)) * **deps:** update dependency lodash to v4.17.11 ([e49680a](e49680a)) * **deps:** update dependency ms to v2.1.2 ([cb207be](cb207be)) * **deps:** update dependency nice-package to v3.0.4 ([7a2b490](7a2b490)) * **deps:** update dependency nice-package to v3.1.0 ([361d409](361d409)) * **deps:** update dependency object-sizeof to v1.3.0 ([976f0fd](976f0fd)) * **deps:** update dependency object-sizeof to v1.3.1 ([fe25f6a](fe25f6a)) * **deps:** update dependency object-sizeof to v1.4.0 ([ad57ee8](ad57ee8)) * **formatPkg:** correct name ([b8175f3](b8175f3)) * **formatPkg:** don't discard packages without author, but with owners[] ([da66fb9](da66fb9)) * **npm:** allow undefined 
downloads ([a0d9c5a](a0d9c5a)) * **npm:** catch errors ([483c0c4](483c0c4)) * **stage:** push correct stage to statemanager ([00b0571](00b0571)) * **ts:** no double slashes ([dd84f88](dd84f88)) * **unpkg:** catch errors ([4efcd01](4efcd01)) * set settings on bootstrap when we start ([e35c0d1](e35c0d1)) * wait for deletion to happen beore continuing ([0734436](0734436)) * **bootstrap:** move to production only in bootstrap ([#126](#126)) ([b26dce6](b26dce6)) * **changelog:** add defaults to catch errors properly ([91e6ebd](91e6ebd)) * **changelog:** fall back to master if the gitHead is undefined ([52fe6ff](52fe6ff)) * **changelogs:** guard for null and undefined ([0a0a748](0a0a748)) * **computed:** use the cleaned package to match keys ([44a839c](44a839c)) * **deletes:** handle npm deletions ([1ad5025](1ad5025)) * **dependedUpon:** encode start and en keys ([24c5fe9](24c5fe9)) * **deps:** pin dependencies ([d1c1377](d1c1377)) * **deps:** update dependency algoliasearch to v3.24.11 ([e8a61bc](e8a61bc)) * **deps:** update dependency algoliasearch to v3.24.12 ([cea8a73](cea8a73)) * **deps:** update dependency algoliasearch to v3.25.1 ([7457f4e](7457f4e)) * **deps:** update dependency algoliasearch to v3.26.0 ([6fde846](6fde846)) * **deps:** update dependency dotenv to v5.0.0 ([#107](#107)) ([e972e19](e972e19)) * **deps:** update dependency dotenv to v5.0.1 ([acc314c](acc314c)) * **deps:** update dependency got to v8.0.3 ([2717b36](2717b36)) * **deps:** update dependency got to v8.2.0 ([64c2318](64c2318)) * **deps:** update dependency got to v8.3.0 ([19efaf8](19efaf8)) * **deps:** update dependency hosted-git-info to v2.6.0 ([0091297](0091297)) * **deps:** update dependency lodash to v4.17.5 ([d07ad04](d07ad04)) * **downloads:** be resilient for 404 or downloads endpoint for a chunk ([866fbcf](866fbcf)) * **downloads:** filter out scoped packages ([76f571a](76f571a)), closes [#36](#36) * **downloads:** set default of 0 ([d157450](d157450)) * **formatPkg:** rewrite get 
info into separate functions ([92706ce](92706ce)) * **gitHead:** fix bad backward compat ([dc34d24](dc34d24)) * **gitHead:** fix bad backward compat ([195108f](195108f)) * **gitHead:** put back gitHead ([92d373d](92d373d)), closes [#53](#53) [#64](#64) * **memleak:** in watch mode, do not use promise chain ([3f2e860](3f2e860)) * **memleak:** maybe fix it ([711b830](711b830)) * **memory:** don't keep a reference of the `chain` in watch ([f973913](f973913)) * **merge:** bad merge from me ([359f498](359f498)) * **schema:** backwards-compatible ([1b24b21](1b24b21)) * **settings:** put synonyms and rules in the configure file ([#128](#128)) ([af8e709](af8e709)), closes [#123](#123) * **stateManager:** don't assume starting at "zzz" ([c79e6cd](c79e6cd)) * **timeouts:** increase pouch timeout ([#174](#174)) ([a9ccb77](a9ccb77)) * **url:** try to fix url for good ([7da9daf](7da9daf)) * **watch:** add missing return ([30f6e43](30f6e43)) * **watch:** avoid memleak by not piling up docs ([#130](#130)) ([4522ee5](4522ee5)) ### Features * add health API ([#650](#650)) ([95587a3](95587a3)) * add methods to process a single package ([#652](#652)) ([a3c41f3](a3c41f3)) * prepare docker ([#648](#648)) ([21b5d02](21b5d02)) * process package in queue instead of batch ([#656](#656)) ([c4f2aa2](c4f2aa2)) * **babel:** add a forced keyword to babel plugins ([440f344](440f344)) * **changelog:** add changes variations ([e5ce4dc](e5ce4dc)) * **changelog:** detect /changelog.markdown ([bcf21a1](bcf21a1)) * **changelog:** get from jsDelivr filelist if possible ([#640](#640)) ([dd386d2](dd386d2)) * **data:** add "bin" ([446d212](446d212)) * **data:** add "versions" attribute ([766a9c3](766a9c3)) * **data:** add concatenated name ([72ab12e](72ab12e)), closes [#33](#33) * **data:** add flagging of type=module ([#386](#386)) ([7cd0765](7cd0765)) * **data:** add jsDelivr hits ([#263](#263)) ([adff89d](adff89d)) * **deprecated:** add the attribute for faceting ([#160](#160)) ([afe02c8](afe02c8)), 
closes [#159](#159) * **devDeps:** add devDependencies ([01058ef](01058ef)) * **faceting:** allow searching in keywords and owner ([8dd2cda](8dd2cda)) * **formatPkg:** add .js to alternative names ([#383](#383)) ([8463308](8463308)), closes [#217](#217) * **jsDelivr:** move code, add tests, preload data correctly ([#384](#384)) ([373d341](373d341)) * **keywords:** add webpack-scaffold ([#296](#296)) ([d4e57a7](d4e57a7)) * **npm:** Include directory details from repository objects ([#320](#320)) ([ccb1766](ccb1766)) * **process:** redo bootstrap after X amount of time ([a79d999](a79d999)), closes [#20](#20) * **quality:** add a flag for very low quality packages ([314cafb](314cafb)) * **query rules:** add filtering on attr:value ([#221](#221)) ([ebcbf56](ebcbf56)) * **ranking:** do tie breaking based on the magnitude of downloads ([#178](#178)) ([85b631f](85b631f)) * **relevance:** add some synonyms ([#192](#192)) ([760f34a](760f34a)) * **relevance:** enable alternative names query rule ([#195](#195)) ([01217e8](01217e8)), closes [#194](#194) * **relevance:** put name, description and eywords on same level ([#188](#188)) ([ee62193](ee62193)) * **relevance:** use jsDelivr hits for ranking ([#269](#269)) ([9039f76](9039f76)) * **relevancy:** add deprecated in account when sorting ([0b2add3](0b2add3)) * **requests:** add user-agent and httpsAgent ([#646](#646)) ([5a48ad3](5a48ad3)) * **schema:** move git head into githubRepo ([5cbf4e4](5cbf4e4)) * **tracking:** save which stage is currently activated ([dbb7b98](dbb7b98)) * **ts:** allow faceting ([e19e0b0](e19e0b0)) * **ts:** use jsdelivr to check for d.ts ([#645](#645)) ([fbe2e97](fbe2e97)) * **typescript:** pre-load definitely typed pkg ([#639](#639)) ([3968726](3968726)) * add Sentry ([#390](#390)) ([8c08fd5](8c08fd5)) * experimental modules compat ([4f31ab3](4f31ab3)) * full TS migration ([#626](#626)) ([fddc2a8](fddc2a8)) * refacto (part 2) ([#396](#396)) ([2df582b](2df582b)) * **sentry:** wait for the right 
amount of time. ([#391](#391)) ([d2f00e2](d2f00e2)) * move algolia ([#385](#385)) ([e5d7bec](e5d7bec)) * refacto (part 1) ([#371](#371)) ([c024451](c024451)) * upgrade packages ([#374](#374)) ([3c70053](3c70053)) * **relevance:** merge all the query rules ([#194](#194)) ([9a24fcc](9a24fcc)) * **settings:** allow to make a PR which changes both the settings and the data ([#179](#179)) ([e8f7c2a](e8f7c2a)) * **tags:** add `tags` to the schema ([57a476e](57a476e)) * **third-party:** add handling of Angular CLI schematics, and rework registry subset ([#169](#169)) ([bfab179](bfab179)) * **vue-cli:** add a forced keyword to vue-cli plugins ([3d6ed42](3d6ed42)) * **yeoman:** Identify yeoman generators through computedKeywords ([#181](#181)) ([08c81af](08c81af)) * Add repository info ([#101](#101)) ([29f6fa0](29f6fa0)) ### Reverts * Revert "Revert "chore(deps): update babel monorepo to v7.6.2"" ([4cf094e](4cf094e)) * Revert "Revert "chore(deps): update dependency lint-staged to v9.4.0"" ([11bd8d6](11bd8d6))
As discussed in #187