feat: Datasets with loaders #430

Open
wants to merge 27 commits into
base: main
42871c7
HF loader addition
Repcak2000 Mar 15, 2024
e53a1ad
feat: requirements update
Repcak2000 Apr 17, 2024
c4861f1
fix(pre-commit.ci): auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 17, 2024
e6d7ef1
fix: ruff reformat check
Repcak2000 Apr 17, 2024
af089c6
fix: SRAI consultations
Repcak2000 Apr 19, 2024
534fd4c
chore: minor changes ;)
Calychas Apr 19, 2024
bf9b982
chore: pre-commit fixes
Calychas Apr 19, 2024
3fe1461
update tooling
Calychas Apr 22, 2024
697697f
add changelog
Calychas Apr 22, 2024
9c0eeab
chore: hf -> huggingface
Calychas Apr 22, 2024
c71849f
chore: remove config.yamls
Calychas Apr 23, 2024
4052073
chore: update lock
RaczeQ Apr 26, 2024
3a1b986
chore: update lock
RaczeQ Apr 26, 2024
3d6f640
chore: update licenses
RaczeQ Apr 26, 2024
157d0b3
chore: change manual pre commit dependency graph
RaczeQ Apr 26, 2024
e10bd8e
chore: change manual pre commit dependency graph
RaczeQ Apr 26, 2024
41ccc5a
chore: change manual pre commit dependency graph
RaczeQ Apr 26, 2024
6c6c6ef
chore: remove keplergl from dev dependencies
RaczeQ Apr 26, 2024
3694fdb
chore: change lock
RaczeQ Apr 26, 2024
5996089
update vscode settings, add .tool-versions
Calychas Apr 26, 2024
dbfec76
test: add for the rest of hf datasets
Calychas Apr 26, 2024
70d116a
chore: remove union syntactic sugar
Calychas Apr 26, 2024
d7ad17c
update pdm to 2.15.1
Calychas Apr 26, 2024
ae0e243
fix: notebooks
Calychas Apr 26, 2024
73839d5
chore: lock torch version for intel-based macos
RaczeQ Apr 28, 2024
01be194
chore: bump quackosm version
RaczeQ Apr 29, 2024
f01ac4f
update lock
Calychas Jun 3, 2024
9 changes: 7 additions & 2 deletions .github/workflows/run-manual-pre-commit.yml
@@ -29,7 +29,12 @@ jobs:
       - uses: pre-commit/[email protected]
         with:
           extra_args: --all-files --hook-stage manual --verbose
-      - name: Show dependencies graph
+      - name: Show dependencies graph (current lock)
         run: |
-          pdm install -d -G license --skip=post_install
+          pdm install --skip=post_install
           pdm run pipdeptree --license
+      - name: Show dependencies graph (newest dependencies)
+        run: |
+          pdm lock --lockfile pdm.newest.lock --strategy no_cross_platform -dG:all
+          pdm install --lockfile pdm.newest.lock --skip=post_install
+          pdm run pipdeptree --license
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -8,7 +8,7 @@ repos:
       - id: conventional-pre-commit
         stages: [commit-msg]
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: 'v0.3.7'
+    rev: 'v0.4.1'
     hooks:
       - id: ruff
         types_or: [ python, pyi, jupyter ]
@@ -28,7 +28,7 @@ repos:
         args: ["--config-file", "pyproject.toml"]
         additional_dependencies: ['types-requests', 'types-six']
   - repo: https://github.com/pdm-project/pdm
-    rev: 2.14.0
+    rev: 2.15.0
     hooks:
       - id: pdm-lock-check
       - id: pdm-export
1 change: 1 addition & 0 deletions .tool-versions
@@ -0,0 +1 @@
+pdm 2.15.4
2 changes: 1 addition & 1 deletion .vscode/extensions.json
@@ -1,3 +1,3 @@
 {
-    "recommendations": ["njpwerner.autodocstring", "charliermarsh.ruff"]
+    "recommendations": ["njpwerner.autodocstring", "charliermarsh.ruff", "matangover.mypy"]
 }
8 changes: 3 additions & 5 deletions .vscode/settings.json.default
Original file line number Diff line number Diff line change
@@ -1,19 +1,17 @@
{
"python.linting.enabled": true,
"python.linting.mypyEnabled": true,
"[python]": {
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.fixAll": true,
"source.organizeImports": true
"source.fixAll": "always",
"source.organizeImports": "always"
}
},
"python.formatting.provider": "black",
"autoDocstring.docstringFormat": "google",
"python.testing.pytestArgs": [
"tests"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"python.analysis.typeCheckingMode": "off",
"mypy.runUsingActiveInterpreter": true,
}
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Changed
+
+### Added
+
+- Initial implementation of datasets [#430](https://github.com/kraina-ai/srai/pull/430) for feature enrichment and benchmarking.
+
 ## [0.7.4] - 2024-05-05

 ### Added
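The changelog entry above describes datasets that come with loaders. As a rough illustration of that pairing, here is a minimal sketch. `BaseDataset`, `InMemoryDataset`, `_fetch`, and `_preprocess` are hypothetical names invented for this example and are not taken from the srai code in this PR; only the `load(hf_token=..., version=...)` call shape mirrors the example notebooks. Plain lists of dicts stand in for the GeoDataFrames the notebooks work with.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseDataset(ABC):
    """Hypothetical base class: subclasses define how raw rows are fetched."""

    @abstractmethod
    def _fetch(self, hf_token: Optional[str], version: Optional[str]) -> list[dict]:
        """Return raw records from wherever the data lives."""

    def _preprocess(self, rows: list[dict]) -> list[dict]:
        """Assumed cleaning step: drop rows that lack coordinates."""
        return [r for r in rows if r.get("lat") is not None and r.get("lon") is not None]

    def load(self, hf_token: Optional[str] = None, version: Optional[str] = None) -> list[dict]:
        """Fetch, then preprocess; same call shape as in the notebooks."""
        return self._preprocess(self._fetch(hf_token, version))


class InMemoryDataset(BaseDataset):
    """Stand-in loader used here instead of a real Hugging Face download."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def _fetch(self, hf_token: Optional[str], version: Optional[str]) -> list[dict]:
        return list(self._rows)


# Usage: the row without a latitude is filtered out by the preprocessing step.
dataset = InMemoryDataset([{"lat": 41.88, "lon": -87.63}, {"lat": None, "lon": 0.0}])
rows = dataset.load()
```

Keeping the fetch step separate from the cleaning step is one way to let each dataset reuse the same `load` entry point; whether the PR structures its classes this way is not shown in this part of the diff.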
67 changes: 67 additions & 0 deletions examples/datasets/airbnb_multicity.ipynb
@@ -0,0 +1,67 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from srai.datasets import AirbnbMulticityDataset\n",
"\n",
"%load_ext dotenv\n",
"%dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"airbnb_multicity = AirbnbMulticityDataset()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hf_token = os.getenv(\"HF_TOKEN\")\n",
"airbnb_multicity_gdf = airbnb_multicity.load(hf_token=hf_token)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"airbnb_multicity_gdf.head()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
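The notebook above reads the token with `os.getenv("HF_TOKEN")`, which returns `None` when the variable is unset. The sketch below shows one way to normalise that lookup so that a blank value is treated the same as a missing one. `get_hf_token` is a hypothetical helper written for this example and is not part of srai.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Mapping[str, str] = os.environ, name: str = "HF_TOKEN") -> Optional[str]:
    """Return the token from the given mapping, or None if it is unset or blank."""
    token = env.get(name, "").strip()
    return token or None


# Usage: pass an explicit mapping to see the behaviour without touching the real environment.
present = get_hf_token({"HF_TOKEN": " abc "})
missing = get_hf_token({})
```

Taking the environment as a parameter keeps the helper easy to exercise without modifying `os.environ`.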
74 changes: 74 additions & 0 deletions examples/datasets/brightkite.ipynb
@@ -0,0 +1,74 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from srai.datasets import BrightkiteDataset\n",
"\n",
"%load_ext dotenv\n",
"%dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"brightkite = BrightkiteDataset()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hf_token = os.getenv(\"HF_TOKEN\")\n",
"brightkite_gdf = brightkite.load(hf_token=hf_token)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"brightkite_gdf.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
106 changes: 106 additions & 0 deletions examples/datasets/chicago_crime.ipynb
@@ -0,0 +1,106 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from srai.datasets import ChicagoCrimeDataset\n",
"\n",
"%load_ext dotenv\n",
"%dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chicago_crime = ChicagoCrimeDataset()\n",
"hf_token = os.getenv(\"HF_TOKEN\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Load default data "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chicago_crime_default = chicago_crime.load(hf_token=hf_token)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chicago_crime_default.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Load data from 2022"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chicago_crime_2022 = chicago_crime.load(hf_token=hf_token, version=\"2022\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chicago_crime_2022.head(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
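The notebook above calls `load` once without a `version` and once with `version="2022"`. A minimal sketch of how a loader could resolve that optional argument is given below. `resolve_version` and the version lists used in the usage lines are assumptions made for illustration; the diff shown here does not reveal which versions exist or which one is the default.

```python
from typing import Optional, Sequence


def resolve_version(requested: Optional[str], available: Sequence[str], default: str) -> str:
    """Pick the dataset version to load, falling back to a default when none is given."""
    if requested is None:
        return default
    if requested not in available:
        raise ValueError(f"Unknown version {requested!r}; expected one of {list(available)}")
    return requested


# Usage with placeholder version names (assumed, not taken from the PR).
versions = ["2021", "2022"]
default_choice = resolve_version(None, versions, default="2021")
explicit_choice = resolve_version("2022", versions, default="2021")
```

Raising on an unknown version, rather than silently falling back to the default, makes a typo in the notebook visible immediately.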