Automated org research tooling #100
Debugging scraper
How to look at the browser?
Reconsider what to scrape
LinkedIn does not allow scraping, at least of its private pages. If we try to collect employee info, we may be blocked by its bot verification check. Caching and a best-effort mindset would help, but they mean more work for less outcome, which affects our answer to the question: is it worth going down this route? After all, you can always just visit the site. Review data (numeric and qualitative), however, is still relatively easy to retrieve. A research hub may therefore be feasible and useful: it contains various sections, allows (and expects) some sections to be left empty (due to network issues, page structure changes, bot checks, etc.), and applies caching to minimize scraping. We imagine such a research hub should be:
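A rough sketch of the best-effort, cached hub idea, in Go to match the scraper choice. Everything named here (`Hub`, `Section`, `Fetcher`, `ErrBlocked`, the section names) is made up for illustration and is not existing code in this project. A section whose fetch fails is returned empty with the error recorded, and only successful fetches are cached.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrBlocked is a hypothetical error a fetcher returns when it hits a bot check.
var ErrBlocked = errors.New("blocked by bot check")

// Section is one part of a research report. When the fetch failed,
// Data stays empty and Err holds the reason.
type Section struct {
	Name string
	Data string
	Err  string
}

// Fetcher retrieves the content of one section for one organisation.
type Fetcher func(org string) (string, error)

type cacheEntry struct {
	section Section
	fetched time.Time
}

// Hub runs every fetcher on a best-effort basis and caches successes.
type Hub struct {
	TTL      time.Duration
	Fetchers map[string]Fetcher
	cache    map[string]cacheEntry
}

func NewHub(ttl time.Duration, fetchers map[string]Fetcher) *Hub {
	return &Hub{TTL: ttl, Fetchers: fetchers, cache: map[string]cacheEntry{}}
}

// Research returns one Section per fetcher. A failing fetcher yields an
// empty Section instead of aborting the whole report, and is not cached,
// so it is retried on the next call.
func (h *Hub) Research(org string) map[string]Section {
	out := make(map[string]Section, len(h.Fetchers))
	for name, fetch := range h.Fetchers {
		key := org + "/" + name
		if e, ok := h.cache[key]; ok && time.Since(e.fetched) < h.TTL {
			out[name] = e.section // fresh enough, skip scraping
			continue
		}
		data, err := fetch(org)
		if err != nil {
			out[name] = Section{Name: name, Err: err.Error()}
			continue
		}
		s := Section{Name: name, Data: data}
		h.cache[key] = cacheEntry{section: s, fetched: time.Now()}
		out[name] = s
	}
	return out
}

func main() {
	calls := 0
	hub := NewHub(time.Hour, map[string]Fetcher{
		"reviews": func(org string) (string, error) {
			calls++
			return "rating for " + org, nil
		},
		"employees": func(org string) (string, error) {
			return "", ErrBlocked
		},
	})
	hub.Research("example-org")
	report := hub.Research("example-org") // "reviews" now comes from the cache
	fmt.Println(report["reviews"].Data, "|", report["employees"].Err, "| fetches:", calls)
}
```

The cache here is in-memory only; a real service would probably persist it so that restarts do not trigger a fresh round of scraping.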
Develop a micro service that can research the following:
Then, we can have a cron job POST the data to appl-tracky and display it in the UI.
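The POST step might look roughly like the sketch below. The `/org-research/` path, the base URL, and the `OrgReport` shape are assumptions for illustration; appl-tracky's actual API may differ. A cron job would run a small binary like this on a schedule.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// OrgReport is a hypothetical payload: one entry per research section.
type OrgReport struct {
	Org      string            `json:"org"`
	Sections map[string]string `json:"sections"`
}

// buildRequest serialises the report and prepares a JSON POST request.
// The "/org-research/" path is an assumed endpoint, not a confirmed one.
func buildRequest(baseURL string, r OrgReport) (*http.Request, error) {
	body, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/org-research/", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// newClient returns an HTTP client with a timeout so a hung server
// cannot stall the cron job indefinitely.
func newClient() *http.Client {
	return &http.Client{Timeout: 10 * time.Second}
}

// push sends the request and treats any non-2xx status as an error.
func push(client *http.Client, req *http.Request) error {
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	req, err := buildRequest("http://localhost:8000", OrgReport{
		Org:      "example-org",
		Sections: map[string]string{"reviews": "placeholder review summary"},
	})
	if err != nil {
		panic(err)
	}
	// Only print here; call push(newClient(), req) once a server is available.
	fmt.Println(req.Method, req.URL)
}
```

Building the request separately from sending it keeps the payload logic testable without a running appl-tracky instance.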
Scraping in Go
We need a Go scraper that can handle JavaScript-rendered pages; this blog post sums up some great scrapers.