-
-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
thoughts on a more modular design #2
Comments
Thanks for the thoughtful feedback, much appreciated! :) I agree on all points, this was basically the direction I wanted this module to go but then a deadline came up. 😅 But potential feedback like this was the reason I already published the current version, despite not being too happy with it's current form. I must say I'm very impressed by how modular and clean your modules are (e.g. The only concern I have is that the event listeners exposed by Puppeteer might not be sufficient for all use cases and some monkey patching might still be required, but this can definitely be avoided in most common scenarios. Let me look into this a little bit, will take inspiration from your packages and test out some concepts and make a new branch as a basis for discussions. :) Thanks! |
Sounds good -- yeah, I hear you about the event listeners, so as long as we ensure that plugins are capable of monkey patching I think it should be fine. Happy to help out with this btw once you have your mind set on a direction. |
Alright, I gave it a first shot: https://github.com/berstend/puppeteer-extra/tree/rewrite const puppeteer = require('puppeteer-extra')
puppeteer.use(require('puppeteer-extra-plugin-font-size')())
puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')({
customFn: (ua) => 'MyCoolAgent/' + ua})
)
const browser = await puppeteer.launch({headless: false}) As you can tell the project structure is There is currently no monkey patching involved, puppeteers event listeners where more powerful than I thought. In terms of how things are set up: Main module
Plugins:
Caveats:
Happy to hear your thoughts on this, these fundamental concepts are hard to change later when others are developing plugins as well 😄 |
This is really awesome @berstend -- definitely in line with what I was thinking! A few comments / thoughts:
|
Thanks for the feedback @transitive-bullshit :)
This was a fun exercise. It's interesting to balance the trade-offs between making the plugin framework as agnostic as possible and still being a little opinionated to make the typical 90% use cases very simple (while the other 10% still possible of course). Would be great if you have time for another look, I think the current design should work well and stay maintainable but I'm happy about feedback from a potential plugin developer. :)
Yeah definitely, this was one of the main reasons to start the project 😄 |
The new Writing the remaining plugins was a breeze with the new modular design. Especially in the new stealth plugin this really shines, as it is itself just requiring separate plugins on demand using the agnostic dependency system. Thanks again for the suggestion, I feel I improved as a developer through that exercise. 👍 Closing this PR for now, let me know when you have further feedback/suggestions. :) |
Really happy with how this turned out as well :) I'll definitely be submitting some PRs for more plugins in the coming weeks. Keep up the great work! |
There are some really helpful plugins here. Definitely cool stuff. Isn't it the case that when you call const myPuppeteer = puppeteer
.use(stealth)
.use(anonymizeUA)
const myOtherPuppeteer = puppeteer
.use(recaptcha)
const myBrowser = await myPuppeteer.launch()
const myOtherBrowser = await myOtherPuppeteer.launch() |
Hey @m59peacemaker - thanks for using this relevant PR for your comment 😄👍 You are correct, the current behaviour is not really intuitive: const puppeteerExtra1 = require("puppeteer-extra")
const puppeteerExtra2 = require("puppeteer-extra")
const puppeteerVanilla = require("puppeteer")
const checkForBeer = name => async browser => {
const page = await browser.newPage()
await page.goto("https://httpbin.org/headers", {
waitUntil: "domcontentloaded"
})
const content = await page.content()
console.log(`- ${name}: User agent includes beer:`, content.includes("Beer"))
await browser.close()
}
const UserAgentPlugin = require("puppeteer-extra-plugin-anonymize-ua")
const uaPlugin = UserAgentPlugin({
customFn: ua => ua.replace("Chrome", "Beer")
})
puppeteerExtra1.use(uaPlugin)
async function main() {
await puppeteerExtra1
.launch({ headless: true })
.then(checkForBeer("puppeteerExtra1 (beer)"))
await puppeteerExtra2
.launch({ headless: true })
.then(checkForBeer("puppeteerExtra2 (no beer?)"))
await puppeteerVanilla
.launch({ headless: true })
.then(checkForBeer("puppeteerVanilla (no beer?)"))
}
main() Output: ❯❯❯ node foo.js
- puppeteerExtra1 (beer): User agent includes beer: true
- puppeteerExtra2 (no beer?): User agent includes beer: true
- puppeteerVanilla (no beer?): User agent includes beer: false It would be a good idea to return a new instance here, so composition of different puppeteer browser configs are possible. The culprit for that behavior can be found here, we already instantiate a The reason for that was to keep the usage as simpel as possible and to imitate how puppeteer is being used (being a "drop-in replacement" and all). For them that works as there's no modifiers on a I'm wondering what would be the nicest API here (in terms of developer experience). I'm also growing a distaste of default exports the more I move to typescript, so this would be a good chance to overhaul the API signature of the whole project in general. 😄 Also in our pursuit to "hide" class constructors (and not having the developer to use the pesky Happy to hear thoughts on what the perfect API signature would look like to others and overhaul this in a major release. |
FWIW chaining |
As far as the drop in replacement concern, I think instantiating the drop in replacement is reasonable for most people. Also having the instantiated version somewhere is an option. // original:
const puppeteer = require('puppeteer')
puppeteer.launch()
// some drop-in possibilities:
const puppeteer = require('puppeteer-extra')()
const puppeteer = require('puppeteer-extra/drop-in') I'm glad you're into the suggestion! |
Yeah definitely! This project is actively maintained and I'm always happy to improve things based on suggestions. If I had all the time in the world I would rewrite most things anyway (with now improved domain knowledge) 😄 To keep your suggestion backwards compatible (not a hard requirement for a future solution): // Current behaviour, single instance (most common use-case?)
const puppeteer = require('puppeteer-extra')
// New addition to support instantiating multiple puppeteer-extra's
const puppeteer = require('puppeteer-extra/multi')() To throw another possibility into the mix (that would emphasize the lightweight nature of this project): const puppeteer = require('puppeteer')
const { makeExtra } = require('puppeteer-extra')
makeExtra(puppeteer) // or `enableExtras()`
puppeteer
.use(stealth)
.use(anonymizeUA) or const vanillaPuppeteer = require('puppeteer')
const { makeExtra } = require('puppeteer-extra')
const puppeteer = makeExtra(vanillaPuppeteer)
.use(stealth)
.use(anonymizeUA) or const { makePuppeteer } = require('puppeteer-extra')
const puppeteer = makePuppeteer()
.use(stealth)
.use(anonymizeUA) I've noticed that certain cloud environments (apify, etc.) come with puppeteer pre-shipped and require it automatically, this approach would make it possible to still use puppeteer-extra then. I'm gonna think about this for a little bit, also in regards to #21 and look at how similar projects approach that. In case multi puppeteer-extra is desperately needed, here's a workaround. 😄 |
This is now supported 😄 const { addExtra } = require('puppeteer-extra')
const puppeteer = addExtra(require('puppeteer-firefox')) |
I'm definitely onboard with the core idea behind this module, and I have several additional use cases I'd like to add, but there are a few things currently stopping me from jumping in:
newPage
seems like it could lead to trouble; why not just listen to the built-in Browser event targetcreated which would do the same thing but in a less intrusive way?This seems like a perfect use case for a plugin pattern, where the only addition to the public puppeteer API is a single use method that takes in a plugin and adds it to a private list of registered plugins.
PuppeteerExtra.launch
would then create a new browser and run it through each of the registered plugins to add whatever hooks / functionality that plugin wants to the browser instance.The only special case here is plugins that want to affect the functionality / params of
puppeteer.launch
, so any registered plugin which defines alaunch
method would be allowed to edit the launch params before creating the browser instance.This is a very similar pattern to how passport / express / babel work, and it scales really nicely once you get the base
puppeteer-extra
andpuppeteer-extra-plugin
interfaces defined.Looking forward to hearing your thoughts :)
The text was updated successfully, but these errors were encountered: