Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chore: Analyze dependency and imports using deptry #534

Merged
merged 4 commits into from
Dec 4, 2024

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented Nov 16, 2024

I wanted to know if we could auto-detect dependencies which are added to the project but not used today. The deptry tool has some fale positivies (mostly because SQLAlchemy modules aren't imported directly) but otherwise seems to add value.

This PR resolves (or ignores) all dependency issues, resulting in removal of a couple unused packages, and declaring transitive dependencies which we referenced directly but not directly declared as needed.

When wanting to not add version constraints for transitive dependencies also used by PyAirbyte, in some cases I applied "*" version restriction - this makes the dependency explicit while not adding any new restrictions beyond what the dependencies already restrict to.

$ poetry run deptry .
Scanning 115 files...

airbyte/_connector_base.py:13:8: DEP001 'yaml' imported but missing from the dependency definitions
airbyte/_executors/declarative.py:11:8: DEP001 'yaml' imported but missing from the dependency definitions
airbyte/_executors/util.py:9:8: DEP001 'yaml' imported but missing from the dependency definitions
airbyte/_message_iterators.py:13:1: DEP003 'typing_extensions' imported but it is a transitive dependency
airbyte/_util/document_rendering.py:8:8: DEP001 'yaml' imported but missing from the dependency definitions
airbyte/_util/meta.py:112:16: DEP001 'IPython' imported but missing from the dependency definitions
airbyte/_util/telemetry.py:44:8: DEP001 'yaml' imported but missing from the dependency definitions
airbyte/caches/_state_backend.py:10:1: DEP003 'pytz' imported but it is a transitive dependency
airbyte/cli.py:71:8: DEP003 'click' imported but it is a transitive dependency
airbyte/cli.py:72:8: DEP001 'yaml' imported but missing from the dependency definitions
airbyte/records.py:74:8: DEP003 'pytz' imported but it is a transitive dependency
airbyte/secrets/base.py:11:1: DEP003 'pydantic_core' imported but it is a transitive dependency
airbyte/sources/base.py:11:8: DEP001 'yaml' imported but missing from the dependency definitions
airbyte/validate.py:17:8: DEP001 'yaml' imported but missing from the dependency definitions
docs/generate.py:16:8: DEP004 'pdoc' imported but declared as a dev dependency
examples/run_declarative_manifest_source.py:13:8: DEP001 'yaml' imported but missing from the dependency definitions
examples/run_perf_test_reads.py:79:1: DEP003 'typing_extensions' imported but it is a transitive dependency
pyproject.toml: DEP002 'duckdb' defined as a dependency but not used in the codebase
pyproject.toml: DEP002 'numpy' defined as a dependency but not used in the codebase
pyproject.toml: DEP002 'psycopg' defined as a dependency but not used in the codebase
pyproject.toml: DEP002 'psycopg2-binary' defined as a dependency but not used in the codebase
pyproject.toml: DEP002 'types-pyyaml' defined as a dependency but not used in the codebase
pyproject.toml: DEP002 'sqlalchemy-bigquery' defined as a dependency but not used in the codebase
pyproject.toml: DEP002 'pyiceberg' defined as a dependency but not used in the codebase
pyproject.toml: DEP002 'grpcio' defined as a dependency but not used in the codebase
Found 25 dependency issues.

For more information, see the documentation: https://deptry.com/

Summary by CodeRabbit

  • New Features

    • Introduced a new job for dependency analysis in the CI/CD workflow to enhance dependency management.
    • Added several new dependencies to improve functionality and development processes.
  • Chores

    • Updated Python version compatibility to support versions >=3.10,<3.13.
    • Maintained existing dependency configurations for stability.

Copy link

vercel bot commented Nov 16, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
pyairbyte-api-docs ❌ Failed (Inspect) Nov 16, 2024 6:07am

Copy link

coderabbitai bot commented Nov 16, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces significant updates to the pyproject.toml file, focusing on dependency management by adding multiple new dependencies and removing some existing ones. Notably, airbyte-protocol-models-pdv2, click, and others have been added, while numpy has been removed. Additionally, the development dependencies section has been expanded with new packages like coverage and pytest-mock. A new job for dependency analysis has also been added to the GitHub Actions workflow, enhancing the CI/CD pipeline without altering existing jobs.

Changes

File Change Summary
pyproject.toml Added multiple dependencies including airbyte-protocol-models-pdv2, click, and others; removed numpy; updated dev dependencies and added [tool.deptry.per_rule_ignores].
.github/workflows/python_pytest.yml Introduced a new job dependency-analysis for performing dependency checks using Deptry.

Sequence Diagram(s)

sequenceDiagram
    participant Developer
    participant CI/CD
    participant DependencyAnalysis

    Developer->>CI/CD: Push changes
    CI/CD->>DependencyAnalysis: Trigger dependency analysis job
    DependencyAnalysis->>CI/CD: Analyze dependencies
    CI/CD-->>Developer: Analysis complete
Loading

Possibly related PRs

  • Chore(deps): Bump CDK, nltk, and python-ulid to latest #537: This PR updates the pyproject.toml file with modifications to dependency specifications, which is directly related to the changes made in the main PR regarding dependency management.
  • Chore: Bump cdk to 6.6.4 #540: This PR also modifies the pyproject.toml file by updating the version constraint for the airbyte-cdk dependency, which aligns with the main PR's focus on dependency updates.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
pyproject.toml (3)

75-75: LGTM! Quick thought about tool integration though...

The addition of deptry looks good! Since we have quite a comprehensive set of dev tools already (ruff, mypy, etc.), would you consider adding a deptry check to the [tool.poe.tasks] section to make it part of our standard checks? Something like:

[tool.poe.tasks]
-check = { shell = "ruff check . && mypy . && pytest --collect-only -qq" }
+check = { shell = "ruff check . && mypy . && deptry . && pytest --collect-only -qq" }

wdyt? 🤔


Line range hint 182-183: Quick thought about tool overlap! 🤔

I notice we have ruff's I (isort) rules enabled and comprehensive import configurations. Since deptry also analyzes imports, should we document somewhere (maybe in README.md) how these tools complement each other? This could help clarify:

  • ruff (I): import sorting and conventions
  • deptry: dependency usage analysis
    wdyt?

Also applies to: 205-206


Hey there! I've got some interesting findings about those dependencies! 🔍

After checking the actual usage patterns, here's what I found:

  • yaml is actively used throughout the codebase for config loading and dumping - this should definitely be added as a direct dependency
  • pytz is actively used in records.py for timezone handling - this needs to be added too
  • duckdb is used in several files including processors and caches - this is a needed dependency
  • psycopg is used in test fixtures (conftest.py) - this is required for testing
  • numpy shows no direct usage or imports as np - maybe we could drop this one?

Would you like to update the dependencies like this? 👇

  • Add: pyyaml and pytz as direct dependencies
  • Keep: duckdb and psycopg as they're actively used
  • Remove: numpy since it's not used directly (unless it's a transitive dependency?)

wdyt? 🤔

🔗 Analysis chain

Line range hint 18-47: Let's verify those dependency findings from deptry! 🕵️

The deptry scan revealed some interesting insights:

  1. Missing dependencies ('yaml', 'pytz')
  2. Unused dependencies ('duckdb', 'numpy', 'psycopg')

Would you like me to help verify these findings? We can run a quick check to confirm the usage patterns.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify dependency usage patterns

# Check for yaml imports
echo "Checking for yaml imports..."
rg -l "import yaml|from yaml" || echo "No yaml imports found"

# Check for pytz imports
echo "Checking for pytz imports..."
rg -l "import pytz|from pytz" || echo "No pytz imports found"

# Check for duckdb usage
echo "Checking for duckdb usage..."
rg -l "import duckdb|from duckdb" || echo "No duckdb imports found"

# Check for numpy usage
echo "Checking for numpy usage..."
rg -l "import numpy|from numpy" || echo "No numpy imports found"

# Check for psycopg usage
echo "Checking for psycopg usage..."
rg -l "import psycopg|from psycopg" || echo "No psycopg imports found"

Length of output: 1089


Script:

#!/bin/bash
# Let's check the actual usage patterns more deeply to understand if these are just imports or actual usage

# For yaml - check actual usage beyond imports
echo "Checking yaml usage patterns..."
rg "yaml\." -A 2

# For numpy - check if it might be used as 'np'
echo "Checking for numpy as np..."
rg -l "import numpy as np|from numpy as np"

# For psycopg - check actual usage in conftest
echo "Checking psycopg usage in conftest..."
rg "psycopg" tests/conftest.py -A 2

# For duckdb - check if it's used in actual data processing
echo "Checking duckdb actual usage..."
rg "duckdb\." -A 2

# Check if pytz is used beyond imports
echo "Checking pytz usage patterns..."
rg "pytz\." -A 2

Length of output: 4507

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 186d0c5 and 814a103.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)

@aaronsteers
Copy link
Contributor Author

aaronsteers commented Dec 4, 2024

/poetry-lock

poetry lock job started... Check job output.

poetry lock applied successfully.

Resolving dependencies... changes
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
pyproject.toml (2)

Line range hint 14-14: Python version constraint looks good, but shall we document the reason for <3.13? 🤔

The constraint aligns well with sqlalchemy-bigquery's requirement. Would it be helpful to add a comment explaining this relationship, similar to how we documented other version constraints? wdyt?


75-75: Consider integrating deptry into the CI pipeline? 🛠️

Since we're adding deptry for dependency analysis, would it make sense to add it to the [tool.poe.tasks] section as a check task? This way, we could run it alongside other checks like ruff and mypy. wdyt?

Example addition:

[tool.poe.tasks]
-check = { shell = "ruff check . && mypy . && pytest --collect-only -qq" }
+check = { shell = "ruff check . && mypy . && deptry . && pytest --collect-only -qq" }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 814a103 and a46098d.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)
🔇 Additional comments (1)
pyproject.toml (1)

Line range hint 17-43: Shall we address the dependency issues flagged by deptry? 📦

The scan revealed several potential issues:

  1. Unused dependencies:

    • duckdb
    • numpy
    • psycopg
  2. Missing dependencies:

    • yaml
    • pytz

Would you like me to help verify these findings? We could:

  1. Check for indirect usage of the "unused" dependencies
  2. Verify if the missing ones are actually transitive dependencies

Let's check for actual usage:

@aaronsteers aaronsteers marked this pull request as ready for review December 4, 2024 05:23
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
.github/workflows/python_pytest.yml (1)

191-211: Consider uploading deptry results as an artifact.

The job looks good! For better visibility and debugging, would you consider uploading the deptry results as an artifact? This could help track dependency issues over time, wdyt?

 - name: Run Deptry
   run: |
     poetry run deptry . > deptry-results.txt
+
+- name: Upload Deptry Results
+  if: always()
+  uses: actions/upload-artifact@v4
+  with:
+    name: deptry-analysis
+    path: deptry-results.txt
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a46098d and 6924285.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • .github/workflows/python_pytest.yml (1 hunks)
  • pyproject.toml (3 hunks)
🔇 Additional comments (1)
pyproject.toml (1)

347-361: LGTM! Well-structured deptry configuration.

The deptry configuration is clear and well-documented. The ignore rules are thoughtfully organized with helpful comments explaining each exclusion.

pyproject.toml Show resolved Hide resolved
@aaronsteers aaronsteers changed the title chore: analyze dependency and imports using deptry Chore: Analyze dependency and imports using deptry Dec 4, 2024
@aaronsteers aaronsteers changed the title Chore: Analyze dependency and imports using deptry Chore: Analyze dependency and imports using deptry Dec 4, 2024
@aaronsteers aaronsteers enabled auto-merge (squash) December 4, 2024 05:25
@aaronsteers aaronsteers merged commit fa505de into main Dec 4, 2024
20 of 22 checks passed
@aaronsteers aaronsteers deleted the a/chore/validate-dependencies-with-deptry branch December 4, 2024 05:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant