Commit
✨ Follow dynamically-built URLs (#146)
* ✨ Follow dynamically-built URLs
* Add tests and documentation
* Update OSX chromedriver version
1 parent 391771f, commit 3032901
Showing 10 changed files with 164 additions and 3 deletions.
@@ -0,0 +1,40 @@
# Helper Functions

Here is a list of functions that can be useful for web scraping.

## `follow_url()`

This function allows adding dynamically created URLs to the list of URLs to be scraped.

=== "Python"

```python
from typing import Dict

from bs4 import BeautifulSoup
from dude import follow_url, select


@select(css=".url", group_css=".custom-group")
def url(element: BeautifulSoup) -> Dict:
    # Queue the link found on this page so that it is scraped as well.
    follow_url(element["href"])
    return {"url": element["href"]}
```
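
To make the idea of "adding URLs to the list of URLs to be scraped" concrete, here is a minimal conceptual sketch of a pending-URL list that grows while it is being consumed. It is not dude's implementation: `UrlFrontier`, its `follow()` method, the `LINKS` table, and the skip-already-seen behavior are all hypothetical and exist only for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterator, List, Set


class UrlFrontier:
    """Hypothetical pending-URL list; an illustration, not dude's internals."""

    def __init__(self, start_urls: List[str]) -> None:
        self._pending: Deque[str] = deque()
        self._seen: Set[str] = set()
        for url in start_urls:
            self.follow(url)

    def follow(self, url: str) -> bool:
        """Queue a URL unless it was queued before; return True if it was added."""
        if url in self._seen:  # assumption: revisiting the same URL is unwanted
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def __iter__(self) -> Iterator[str]:
        # Pop until empty, so URLs added during iteration are still visited.
        while self._pending:
            yield self._pending.popleft()


# Toy link graph standing in for real pages (made-up URLs).
LINKS: Dict[str, List[str]] = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
}

frontier = UrlFrontier(["https://example.com/"])
visited: List[str] = []
for current in frontier:
    visited.append(current)
    for href in LINKS.get(current, []):
        frontier.follow(href)  # plays the role of follow_url() in this sketch
```

In this sketch each page is visited once, in the order it was discovered, even though two pages link back to URLs that were already queued.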

## `get_current_url()`

This function returns the URL that is currently being scraped.
It is useful together with `follow_url()`, for example to resolve relative links against the current page.

=== "Python"

```python
from typing import Dict
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from dude import follow_url, get_current_url, select


@select(css=".url", group_css=".custom-group")
def url(element: BeautifulSoup) -> Dict:
    # Resolve the (possibly relative) href against the page being scraped, then queue it.
    follow_url(urljoin(get_current_url(), element["href"]))
    return {"url": element["href"]}
```
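
The reason for combining the two helpers is that `href` attributes are often relative, so they need a base URL before they can be followed. The sketch below uses only the standard library's `urljoin` to show how different kinds of `href` values resolve. The `resolve_link` helper and all example URLs are hypothetical, and `current_url` stands in for whatever `get_current_url()` would return.

```python
from urllib.parse import urljoin


def resolve_link(current_url: str, href: str) -> str:
    """Hypothetical helper: turn an href found on a page into an absolute URL."""
    return urljoin(current_url, href)


current_url = "https://example.com/blog/page/1"  # made-up value for illustration

# Root-relative path: replaces the whole path of the current URL.
root_relative = resolve_link(current_url, "/post/42")

# Plain relative path: replaces only the last path segment.
sibling = resolve_link(current_url, "2")

# Already-absolute URL: returned unchanged.
absolute = resolve_link(current_url, "https://other.example/x")

print(root_relative)  # https://example.com/post/42
print(sibling)        # https://example.com/blog/page/2
print(absolute)       # https://other.example/x
```

Because an absolute `href` passes through unchanged, wrapping every link in `urljoin` is safe whether or not the page mixes relative and absolute links.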
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "pydude"
-version = "0.18.0"
+version = "0.19.0"
 repository = "https://github.com/roniemartinez/dude"
 description = "dude uncomplicated data extraction"
 authors = ["Ronie Martinez <[email protected]>"]