Commit
chore: migrate docs to material for mkdocs (#37)
Showing 31 changed files with 1,763 additions and 458 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,4 +20,4 @@ | |
"prHourlyLimit": 3, | ||
"automerge": false, | ||
"printConfig": true | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
version: 2 | ||
|
||
build: | ||
  os: ubuntu-24.04 | ||
  tools: | ||
    python: "3.12" | ||
|
||
python: | ||
  install: | ||
    - requirements: docs/requirements.txt | ||
|
||
mkdocs: | ||
  configuration: mkdocs.yml |
This file was deleted.
File renamed without changes.
File renamed without changes
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Integrations | ||
|
||
Because trrex builds a regular expression pattern, its output can be used | ||
by any library that expects a regular expression. | ||
|
||
## Working with pandas | ||
|
||
```python | ||
|
||
import trrex as tx | ||
import pandas as pd | ||
|
||
df = pd.DataFrame(["The quick brown fox", "jumps over", "the lazy dog"], columns=["text"]) | ||
pattern = tx.make(["dog", "fox"]) | ||
df["text"].str.contains(pattern) | ||
``` | ||
|
||
As the example above shows, the pattern works with any pandas function | ||
that accepts a regular expression. | ||
|
||
## Efficient gazetteer for spaCy | ||
|
||
It can be used in conjunction with the spaCy EntityRuler to build a | ||
gazetteer: | ||
|
||
```python | ||
import trrex as tx | ||
from spacy.lang.en import English | ||
|
||
nlp = English() | ||
ruler = nlp.add_pipe("entity_ruler") | ||
patterns = [ | ||
    { | ||
        "label": "ORG", | ||
        "pattern": [{"TEXT": {"REGEX": tx.make(["Amazon", "Apple", "Netflix", "Netlify"])}}], | ||
    }, | ||
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}, | ||
] | ||
ruler.add_patterns(patterns) | ||
|
||
doc = nlp("Netflix HQ is in Los Gatos.") | ||
[(ent.text, ent.label_) for ent in doc.ents] | ||
``` | ||
|
||
## Fuzzy matching with regex | ||
|
||
We can take advantage of the fuzzy matching of the regex module: | ||
|
||
```python | ||
import regex | ||
import trrex as tx | ||
|
||
pattern = tx.make(["monkey", "monster", "dog", "cat"], prefix="", suffix=r"{1<=e<=2}") | ||
regex.search(pattern, "This is really a master dag", regex.BESTMATCH) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
# trrex: efficient keyword mining with regular expressions | ||
|
||
The package includes a function that represents a collection of keywords | ||
(strings) as a regular expression. This regular expression can be used | ||
for multiple purposes, such as keyword replacement, keyword extraction, | ||
fuzzy matching, and other similar tasks. | ||
|
||
```python | ||
import re | ||
import trrex as tx | ||
|
||
pattern = tx.make(["baby", "bat", "bad"]) | ||
re.findall(pattern, "The baby was scared by the bad bat.") | ||
``` | ||
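To give some intuition for how a list of keywords can become a single pattern, the sketch below builds a character trie and serializes it as a regular expression using only the standard library. It is an illustration under assumptions, not trrex's actual implementation: `build_trie`, `trie_to_regex`, and `make_pattern` are hypothetical names, and the pattern trrex generates may differ.

```python
import re


def build_trie(words):
    """Store each keyword as a path of characters in nested dicts."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # empty key marks the end of a keyword
    return trie


def trie_to_regex(node):
    """Serialize a trie node as a regex fragment with shared prefixes."""
    if list(node) == [""]:
        return ""  # leaf: nothing left to match
    optional = "" in node  # a keyword ends here, so the rest is optional
    alternatives = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if len(alternatives) == 1 and not optional:
        return alternatives[0]
    body = "(?:" + "|".join(alternatives) + ")"
    return body + "?" if optional else body


def make_pattern(words):
    """Wrap the trie regex in word boundaries so only whole words match."""
    return r"\b(?:" + trie_to_regex(build_trie(words)) + r")\b"


pattern = make_pattern(["baby", "bat", "bad"])
matches = re.findall(pattern, "The baby was scared by the bad bat.")
```

Because the three keywords share the prefix `ba`, the sketch emits that prefix once and branches only where the keywords diverge. `matches` holds the keywords in the order they appear in the text.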
|
||
## Installation | ||
|
||
First, obtain at least Python 3.6 and virtualenv if you do not already | ||
have them. Using a virtual environment is strongly recommended, since it | ||
will help you to avoid clutter in your system-wide libraries. Once the | ||
requirements are met, you can use pip: | ||
|
||
```bash | ||
pip install trrex | ||
``` | ||
|
||
## Examples | ||
|
||
Here are some quick examples of what you can do with trrex. | ||
|
||
To begin, import re and trrex: | ||
|
||
```python | ||
import re | ||
import trrex as tx | ||
``` | ||
|
||
### Search for any keyword | ||
|
||
You can search for keywords by using re.search: | ||
|
||
```python | ||
keywords = tx.make(["baby", "bad", "bat"]) | ||
match = re.search(keywords, "I saw a bat") | ||
``` | ||
|
||
In this case we find *bat*, the only keyword appearing in the text. | ||
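The object returned by `re.search` can be inspected to see which keyword matched and where. The sketch below uses a hand-written stand-in pattern so that it runs with the standard library alone; that pattern string is an assumption, and the one trrex generates may differ.

```python
import re

# Hand-written stand-in for a keyword pattern (assumption, not trrex output).
keywords = r"\b(?:ba(?:by|d|t))\b"

match = re.search(keywords, "I saw a bat")
if match is not None:
    found = match.group(0)      # the keyword that matched
    start, end = match.span()   # character offsets of the match
else:
    found, start, end = None, -1, -1

# re.search returns None when no keyword appears in the text.
no_match = re.search(keywords, "I saw a cat")
```

Checking for `None` before calling `group` avoids an `AttributeError` on texts that contain no keyword.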
|
||
### Replace a keyword | ||
|
||
You can replace a keyword by using re.sub: | ||
|
||
```python | ||
keywords = tx.make(["baby", "bad", "bat"]) | ||
replaced = re.sub(keywords, "bowl", "The bat is round") | ||
``` |
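The replacement passed to `re.sub` can be either a fixed string or a function that receives each match. The sketch below shows both, using a hand-written stand-in pattern with word-boundary anchors; that pattern is an assumption made so the example needs only the standard library, and the pattern trrex generates may differ.

```python
import re

# Hand-written stand-in pattern with word-boundary anchors (assumption,
# not trrex output).
keywords = r"\b(?:ba(?:by|d|t))\b"

# Fixed replacement string: only whole-word keywords are replaced,
# so the "bat" inside "bathtub" is left alone.
replaced = re.sub(keywords, "bowl", "The bat is in the bathtub")

# Callable replacement: the function receives each match object.
shouted = re.sub(keywords, lambda m: m.group(0).upper(), "a bad bat scared the baby")
```

With word-boundary anchors in place, keywords embedded in longer words are not rewritten, which is usually what keyword replacement needs.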
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: trrex |