
Feat: Add Cloud deploy functionality out of experimental status #419

Merged · 33 commits · Dec 10, 2024

Conversation

aaronsteers
Contributor

@aaronsteers aaronsteers commented Oct 15, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced client credentials-based authentication for API interactions.
    • Added utility functions for generating unique identifiers.
    • New methods for listing workspaces, sources, and destinations.
    • Enhanced CloudWorkspace with new resource listing methods and improved error handling.
    • Added functionality for translating cache types into destination configurations.
    • Introduced a new exception class for handling duplicate resource errors.
    • Added new test functions for workspace, source, and destination retrieval.
  • Bug Fixes

    • Improved error handling in various methods, ensuring more specific exceptions are raised.
  • Documentation

    • Updated documentation to reflect new methods and authentication changes.
  • Tests

    • Enhanced integration tests for better clarity and consistency, utilizing public methods for CRUD operations.
    • Updated test fixtures for improved organization and clarity.
    • Removed obsolete tests and improved type annotations across various test files.
  • Chores

    • Updated dependency versions and modified configuration files for improved maintainability.


coderabbitai bot commented Oct 15, 2024

📝 Walkthrough

Walkthrough

The pull request introduces substantial changes to the Airbyte codebase, focusing on enhancing API interactions and transitioning from API key-based to client credentials-based authentication. Key updates include the addition of new utility functions for listing resources, modifications to existing functions to accommodate the new authentication model, and improvements in error handling and logic across various classes. New utility functions for text manipulation are also introduced, while several test files are updated to align with these changes, ensuring consistency and clarity in the testing framework.
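The authentication transition described above follows the standard OAuth2 `client_credentials` grant: instead of sending a long-lived API key, the client exchanges its ID and secret for a short-lived access token. As a rough sketch (the request body below follows RFC 6749; the exact token endpoint and field handling in `airbyte-api` may differ):

```python
def client_credentials_payload(client_id: str, client_secret: str) -> dict[str, str]:
    """Build the body of an OAuth2 client_credentials token request.

    The returned dict would be POSTed to the token endpoint to obtain a
    short-lived access token, replacing the old long-lived API key.
    """
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
```

This is why the updated function signatures require both `client_id` and `client_secret` rather than a single `api_key` parameter.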

Changes

File Path Change Summary
airbyte/_util/api_util.py Transitioned to client credentials-based authentication. Updated function signatures to require client_id and client_secret. Added functions for listing workspaces, sources, and destinations. Enhanced error handling and refactored response logic. Updated create_connection to use stream_configurations_obj.
airbyte/_util/text_util.py Introduced utility functions generate_ulid and generate_random_suffix for generating unique identifiers.
airbyte/caches/base.py Removed private attributes and added class-level attributes. Updated method return types and added new methods for length and bool checks.
airbyte/cloud/connections.py Added properties for source and destination. Updated logic in existing properties and method signatures to use client credentials. Renamed and updated deletion logic in permanently_delete method.
airbyte/cloud/connectors.py Introduced abstract base class CloudConnector with properties and methods for managing cloud connectors.
airbyte/cloud/experimental.py Deleted file containing experimental features.
airbyte/cloud/workspaces.py Refactored authentication parameters to use client credentials. Updated method signatures for deployment methods and added new resource listing methods. Enhanced error handling and removed deprecated methods.
airbyte/exceptions.py Added AirbyteDuplicateResourcesError exception class and modified AirbyteMissingResourceError for proper formatting.
airbyte/sources/base.py Simplified Source class structure by removing private attributes and changing connector_type declaration.
docs/generate.py Updated documentation generation to exclude experimental modules and include the CLI module.
tests/conftest.py Modified test execution handling and Docker container management. Updated test prioritization and fixture definitions.
airbyte/destinations/base.py Changed connector_type declaration and updated logic in _write_airbyte_message_stream method.
tests/integration_tests/cloud/test_cloud_sql_reads.py Updated integration tests to utilize public methods for deployment and running connections. Changed parameter types and assertion checks.
airbyte/caches/bigquery.py Added class attributes and property methods for paired destination configurations.
airbyte/caches/duckdb.py Similar modifications as bigquery.py for paired destination configurations.
airbyte/caches/motherduck.py Added paired destination attributes and configuration methods.
airbyte/caches/postgres.py Updated class attributes and added methods for paired destination configurations.
airbyte/caches/snowflake.py Similar modifications as postgres.py for paired destination configurations.
airbyte/cloud/sync_results.py Updated attribute types and modified methods to use client credentials.
airbyte/secrets/google_gsm.py Changed return type of get_secret method to always return a SecretString. Enhanced credential handling logic in the constructor.
pyproject.toml Updated airbyte-api dependency version and added a warning filter for pytest.
tests/integration_tests/cloud/conftest.py Enhanced handling of cloud credentials and fixture definitions. Updated to use structured credentials format.
tests/integration_tests/cloud/test_cloud_api_util.py Added tests for new API utility functions and updated existing tests to align with new parameter structures.
tests/integration_tests/test_all_cache_types.py Updated type annotations and parameter names in tests for cache types.
airbyte/destinations/_translate_cache_to_dest.py Introduced functions for translating cache types into destination configurations.
airbyte/destinations/_translate_dest_to_cache.py Introduced functions for translating destination configurations into cache implementations.
airbyte/destinations/util.py Updated get_noop_destination function signature to include a new parameter for installation behavior.
airbyte/shared/state_providers.py Enhanced filtering logic in stream_state_artifacts property and updated get_stream_state method for improved flexibility.
airbyte/sources/__init__.py Added registry submodule to exports for type checking.
airbyte/sources/util.py Deprecated get_connector function and updated get_benchmark_source to include a new parameter for installation behavior.
tests/integration_tests/conftest.py Enhanced fixture functionality and updated handling of configurations for various caches.
tests/integration_tests/test_source_test_fixture.py Updated type annotations and function signatures for clarity and consistency.
tests/integration_tests/test_state_handling.py Updated tests for state handling to improve clarity and correctness in cache name generation.
tests/integration_tests/test_source_faker_integration.py Removed test function verifying the availability of the source-faker executable due to changes in installation behavior.

Suggested labels

enable-ai-review



coderabbitai bot left a comment

Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (11)
airbyte/_util/text_util.py (3)

9-11: The generate_ulid() function looks good, but how about enhancing the docstring?

The function is well-implemented, but the docstring could be more informative. What do you think about expanding it to explain what a ULID is and its characteristics? Something like this, perhaps?

def generate_ulid() -> str:
    """Generate a new ULID (Universally Unique Lexicographically Sortable Identifier).

    A ULID is a 128-bit identifier that combines a timestamp with randomly generated bits,
    resulting in a sortable, unique identifier that's more compact than a UUID.

    Returns:
        str: A 26-character string representation of the ULID.
    """
    return str(ulid.ULID())

This would provide more context for users of the function. WDYT?
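The sortability claim in the suggested docstring comes from the ULID layout: the first 10 characters encode a millisecond timestamp in Crockford base32, so later ULIDs compare lexicographically greater. The toy generator below is not the `ulid` library's implementation, just an illustration of that property:

```python
import os
import time

# Crockford base32 alphabet: ASCII-ascending, so numeric order == lexicographic order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def _encode_base32(value: int, length: int) -> str:
    """Encode an integer as a fixed-width Crockford base32 string."""
    chars = []
    for _ in range(length):
        chars.append(CROCKFORD[value & 0x1F])  # low 5 bits per character
        value >>= 5
    return "".join(reversed(chars))


def toy_ulid() -> str:
    """Illustrative ULID: 10-char millisecond timestamp + 16 random chars (26 total)."""
    timestamp_ms = int(time.time() * 1000)
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return _encode_base32(timestamp_ms, 10) + _encode_base32(randomness, 16)
```

Because the timestamp prefix is fixed-width and the alphabet is ordered, a ULID generated later always sorts after an earlier one once the millisecond ticks over.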


14-22: Great implementation of generate_random_suffix()! How about a tiny optimization?

The function looks solid, and I love the detailed docstring explaining its behavior and limitations. The use of ULID for a sortable suffix is clever!

Just a small suggestion: we could simplify the implementation slightly by using string slicing in a single line. What do you think about this?

def generate_random_suffix() -> str:
    """Generate a random suffix for use in temporary names.

    By default, this function generates a ULID and returns a 9-character string
    which will be monotonically sortable. It is not guaranteed to be unique but
    is sufficient for small-scale and medium-scale use cases.
    """
    return generate_ulid()[:6] + generate_ulid()[-3:]

This achieves the same result but in a slightly more concise way. WDYT?


1-22: The overall structure of the file looks great! Any plans for more text utilities?

I really like how clean and focused this file is. The two functions are well-organized and clearly related. Great job on keeping it simple and to the point!

Just curious, do you have any plans to add more text utility functions in the future? This file seems like a great place for them. Maybe something for text sanitization, truncation, or other common text operations you might need across the project? No pressure, just thinking ahead!

Keep up the awesome work! 😊

airbyte/exceptions.py (2)

489-494: LGTM! New exception class looks good.

The new AirbyteDuplicateResourcesError class is well-structured and follows the established pattern. It provides a clear way to handle duplicate resource scenarios.

Quick thought: Would it be helpful to add a default value for the guidance attribute, similar to other exception classes in this file? Something like "Please choose a unique name for the resource." wdyt?


497-497: Hmm, duplicate "Custom Warnings" sections?

I noticed we now have two "Custom Warnings" sections in the file. The new one here and another at the end of the file. Should we consider consolidating these for better organization? Maybe move AirbyteMultipleResourcesError to the existing section at the end? What do you think?

airbyte/sources/base.py (1)

54-54: Simplified connector_type attribute, but lost type hinting. Thoughts on keeping both?

The connector_type attribute has been simplified from a type-annotated literal to a simple string assignment. While this makes the code cleaner, we've lost the type hinting.

What do you think about keeping the type hint for better IDE support and static type checking? We could do something like:

connector_type: Literal["source"] = "source"

This way, we maintain the simplicity while preserving the type information. WDYT?

airbyte/cloud/connectors.py (1)

39-39: Consider updating docstring to generalize connector type—wdyt?

Since the connector_url method applies to both sources and destinations, would it make sense to update the docstring to "Get the URL of the connector"? Let me know your thoughts.

airbyte/cloud/connections.py (1)

209-219: Enhance the method docstring to reflect cascading deletions.

Since cascade_delete_source and cascade_delete_destination allow deleting the source and destination, should we update the method description to reflect this additional functionality? Wdyt?

airbyte/_util/api_util.py (3)

104-105: Suggestion: Clarify Error Message

The error message could be more precise. Maybe rephrase it to "You can provide either 'name' or 'name_filter', but not both." to enhance clarity. Wdyt?


143-143: Update Docstring in list_sources

The docstring currently says "Get a connection." Should we update it to "List sources." to reflect the function's purpose? Wdyt?


179-179: Update Docstring in list_destinations

The docstring states "Get a connection." Consider updating it to "List destinations." to accurately describe the function. Wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f3bc4c6 and 9b40875.

📒 Files selected for processing (9)
  • airbyte/_util/api_util.py (17 hunks)
  • airbyte/_util/text_util.py (1 hunks)
  • airbyte/caches/base.py (0 hunks)
  • airbyte/cloud/connections.py (5 hunks)
  • airbyte/cloud/connectors.py (1 hunks)
  • airbyte/cloud/experimental.py (0 hunks)
  • airbyte/cloud/workspaces.py (6 hunks)
  • airbyte/exceptions.py (1 hunks)
  • airbyte/sources/base.py (1 hunks)
💤 Files with no reviewable changes (2)
  • airbyte/caches/base.py
  • airbyte/cloud/experimental.py
🧰 Additional context used
🔇 Additional comments (16)
airbyte/_util/text_util.py (1)

1-7: LGTM! The file header and imports look good.

The copyright notice, module docstring, and imports are all appropriate for the functionality provided. Nice job using from __future__ import annotations for better type hinting support! 👍

airbyte/exceptions.py (1)

489-489: Thumbs up for the indentation fix!

Nice catch on aligning the resource_name_or_id attribute. It's these little details that keep our codebase tidy and consistent. 👍

airbyte/sources/base.py (2)

54-54: Overall, good cleanup! Any other simplifications in mind?

These changes look good overall! You've simplified the connector_type attribute and removed some unused private attributes. This kind of cleanup helps keep the codebase lean and maintainable.

Do you have any other simplifications or cleanups in mind for this class? It might be worth taking a look at other attributes or methods that could benefit from similar treatment.


54-54: Removed private attributes. Can we double-check their usage?

I noticed that three private attributes (_deployed_api_root, _deployed_workspace_id, and _deployed_source_id) have been removed. This is great for code cleanliness if they were unused!

Just to be on the safe side, could we run a quick search through the codebase to ensure these weren't used anywhere else? Maybe something like:

This would help us catch any lingering references. What do you think?

airbyte/cloud/connectors.py (1)

40-40: Verify URL construction for correctness—wdyt?

Just to double-check, the connector_url method constructs the URL by appending {self.connector_type}s, resulting in paths like .../sources/{connector_id} or .../destinations/{connector_id}. Does this match the expected URL structure in Airbyte Cloud? Let me know if this aligns with the API endpoints.
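The construction under discussion can be reduced to a pure function for checking. The workspace URL format below is an assumption for illustration; only the pluralization of `connector_type` mirrors the code being reviewed:

```python
def connector_url(workspace_url: str, connector_type: str, connector_id: str) -> str:
    """Build the connector page URL by pluralizing the connector type as the path segment."""
    return f"{workspace_url}/{connector_type}s/{connector_id}"
```

With `connector_type` constrained to `"source"` or `"destination"`, this yields exactly the `.../sources/{id}` and `.../destinations/{id}` paths the comment describes.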

airbyte/cloud/connections.py (5)

9-9: Importing CloudDestination and CloudSource looks good.

This addition enables the usage of these classes later in the code.


105-105: Good handling of None for streams.

Using or [] ensures that we don't encounter a TypeError when self._connection_info.configurations.streams is None. Nice work!


113-113: Properly handling None for prefix.

Returning an empty string when self._connection_info.prefix is None prevents potential issues when using this value. Looks good!


174-176: Adding _latest_job_info to SyncResult instantiation makes sense.

This provides SyncResult with access to the latest job information, which can be useful for further processing. Good addition!


223-227: Confirming safe deletion when cascading is enabled.

When cascade_delete_source or cascade_delete_destination are True, the source or destination will be permanently deleted. Should we add safeguards or confirmations to prevent accidental deletions of these resources? Wdyt?
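One lightweight form of the safeguard suggested here is an explicit confirmation flag. The `confirm_cascade` parameter below is hypothetical (not part of the PR's API), and the stub returns what would be deleted instead of calling the Cloud API:

```python
def permanently_delete(
    connection_id: str,
    *,
    cascade_delete_source: bool = False,
    cascade_delete_destination: bool = False,
    confirm_cascade: bool = False,  # hypothetical safeguard flag, not in the PR
) -> list[str]:
    """Stubbed deletion that refuses to cascade without explicit confirmation."""
    if (cascade_delete_source or cascade_delete_destination) and not confirm_cascade:
        raise ValueError(
            "Cascading deletes permanently remove the source/destination. "
            "Pass confirm_cascade=True to proceed."
        )
    deleted = [f"connection:{connection_id}"]
    if cascade_delete_source:
        deleted.append("source")
    if cascade_delete_destination:
        deleted.append("destination")
    return deleted
```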

airbyte/_util/api_util.py (6)

17-17: Approved: Enhancement of Type Hints

The addition of TYPE_CHECKING and Any from the typing module improves type hinting and code clarity.


27-27: Approved: Importing PyAirbyteInputError

Including PyAirbyteInputError in the exceptions enhances error handling for input validation.


31-33: Approved: Conditional Import of Callable

Using if TYPE_CHECKING to conditionally import Callable optimizes runtime performance by avoiding unnecessary imports.


235-239: Approved: Improved Exception Handling in get_connection

The updated exception handling provides more informative error messages, enhancing debugging and user experience.


563-569: Approved: Refactoring Stream Configurations

Wrapping stream_configurations in models.StreamConfigurations ensures compatibility with the API requirements.


390-397: Approved: Enhanced Error Handling in get_source

The updated error handling in get_source correctly raises an AirbyteMissingResourceError when a source is not found, improving reliability.

coderabbitai bot left a comment

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1d4f53e and b435550.

📒 Files selected for processing (5)
  • airbyte/destinations/_util.py (3 hunks)
  • airbyte/destinations/base.py (1 hunks)
  • tests/integration_tests/cloud/test_cloud_sql_reads.py (3 hunks)
  • tests/integration_tests/cloud/test_cloud_sync.py (1 hunks)
  • tests/integration_tests/cloud/test_cloud_workspaces.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
airbyte/destinations/_util.py

41-41: Unused function argument: cache_class

(ARG001)


43-43: Missing issue link on the line following this TODO

(TD003)


48-48: Unused function argument: destination_class

(ARG001)


50-50: Missing issue link on the line following this TODO

(TD003)


172-172: Avoid specifying long messages outside the exception class

(TRY003)


187-187: Avoid specifying long messages outside the exception class

(TRY003)

tests/integration_tests/cloud/test_cloud_workspaces.py

67-68: SyntaxError: Expected an identifier

🔇 Additional comments (4)
airbyte/destinations/_util.py (3)

81-81: LGTM! 👍

Good catch on ensuring type consistency by converting db_path to string.


171-177: Great security practices! 🔒

Nice work on:

  1. Validating the required API key
  2. Using SecretString for sensitive data handling
🧰 Tools
🪛 Ruff (0.8.0)

172-172: Avoid specifying long messages outside the exception class

(TRY003)


185-193: Solid improvements to configuration handling! 👌

Nice work on:

  1. Proper port defaulting logic
  2. Required password validation
  3. Default schema handling
🧰 Tools
🪛 Ruff (0.8.0)

187-187: Avoid specifying long messages outside the exception class

(TRY003)

tests/integration_tests/cloud/test_cloud_workspaces.py (1)

69-73: Verify deploy_connection parameters

The connection deployment looks good, but should we also specify the table_prefix parameter since it's being asserted later? wdyt?

     connection: CloudConnection = cloud_workspace.deploy_connection(
         connection_name="test-connection",
         source=source,
         cache=cache,
+        table_prefix="abc_deleteme_"
     )

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
tests/integration_tests/cloud/test_cloud_workspaces.py (2)

Line range hint 36-36: Consider enhancing the skip message with more context.

The current skip message doesn't provide enough information about the flakiness. Could we add more details about the specific issues and potential fixes? This would help other developers understand and eventually fix the problem, wdyt?

🧰 Tools
🪛 Ruff (0.8.0)

14-14: airbyte.destinations.base.Destination imported but unused

Remove unused import: airbyte.destinations.base.Destination

(F401)


64-68: Consider adding deployment success assertions.

The test jumps directly to checking stream names without verifying if the connection was deployed successfully. Should we add some basic assertions first? For example:

assert connection.id is not None
assert connection.status == "active"  # or whatever the expected status is
airbyte/caches/postgres.py (1)

35-35: LGTM! Consider adding docstring for the new attribute?

The _paired_destination_name attribute looks good and follows the consistent pattern across cache implementations. Would you consider adding a brief docstring to explain its purpose in the deployment context? wdyt?

airbyte/caches/snowflake.py (1)

37-37: LGTM! Consider grouping class attributes?

The implementation looks good! Would you like to consider grouping all class attributes together at the top of the class (moving _paired_destination_name next to dedupe_mode)? This could improve readability, but I'm fine either way - wdyt?

airbyte/caches/duckdb.py (1)

Line range hint 35-41: Consider documenting the architectural decision?

The addition of _paired_destination_name across all cache implementations suggests an important architectural pattern linking caches with specific destinations. Since this is part of the Cloud deployment functionality, would it be helpful to document this design decision in an ADR (Architecture Decision Record) or the deployment documentation? This could help future contributors understand the relationship between caches and destinations in the cloud context. wdyt?

airbyte/destinations/_util.py (1)

185-193: Consider adding type validation for port? 🤔

While the port conversion looks good, would you consider adding a range validation to ensure it's a valid port number (0-65535)? Here's a suggestion:

-    port: int = int(destination_configuration.port) if destination_configuration.port else 5432
+    if destination_configuration.port:
+        try:
+            port = int(destination_configuration.port)
+        except ValueError as e:
+            raise ValueError(f"Invalid port format: {e}") from e
+        if not 0 <= port <= 65535:
+            raise ValueError(f"Invalid port number: {port}")
+    else:
+        port = 5432
🧰 Tools
🪛 Ruff (0.8.0)

187-187: Avoid specifying long messages outside the exception class

(TRY003)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b435550 and be9225d.

📒 Files selected for processing (9)
  • airbyte/caches/base.py (1 hunks)
  • airbyte/caches/bigquery.py (1 hunks)
  • airbyte/caches/duckdb.py (1 hunks)
  • airbyte/caches/motherduck.py (1 hunks)
  • airbyte/caches/postgres.py (1 hunks)
  • airbyte/caches/snowflake.py (1 hunks)
  • airbyte/destinations/_util.py (3 hunks)
  • airbyte/destinations/base.py (2 hunks)
  • tests/integration_tests/cloud/test_cloud_workspaces.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte/caches/base.py
  • airbyte/destinations/base.py
🧰 Additional context used
🪛 Ruff (0.8.0)
tests/integration_tests/cloud/test_cloud_workspaces.py

14-14: airbyte.destinations.base.Destination imported but unused

Remove unused import: airbyte.destinations.base.Destination

(F401)


47-47: Undefined name DestinationMotherDuck

(F821)

airbyte/destinations/_util.py

41-41: Unused function argument: cache_type

(ARG001)


43-43: Missing issue link on the line following this TODO

(TD003)


48-48: Unused function argument: destination_type

(ARG001)


50-50: Missing issue link on the line following this TODO

(TD003)


172-172: Avoid specifying long messages outside the exception class

(TRY003)


187-187: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (7)
tests/integration_tests/cloud/test_cloud_workspaces.py (1)

27-32: LGTM! Clean and well-structured test.

The test follows good practices with proper type hints and cleanup. The transition to public methods improves the test's clarity.

airbyte/caches/duckdb.py (1)

41-41: LGTM! Shall we verify the deployment integration?

The implementation looks consistent with other cache classes. Since this is part of adding Cloud deployment functionality (per PR title), should we verify how these paired destinations are used in the deployment flow?

✅ Verification successful

Let me gather more information about the deployment integration by checking the implementation of these test files and the cache usage.


The deployment integration looks solid! 🚀

The _paired_destination_name attribute is correctly integrated into the deployment flow, as evidenced by:

  • All cache classes consistently implement this attribute
  • The deployment tests in test_cloud_workspaces.py and test_cloud_sync.py verify the cache deployment functionality
  • The cache deployment is tested alongside source deployment in the connection flow

The implementation aligns perfectly with other cache types (Snowflake, BigQuery, Postgres, etc.) and is properly tested in the cloud integration test suite. Would you like to proceed with merging? 😊

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of _paired_destination_name in deployment-related code
echo "Searching for deployment-related usage..."
rg -l "deploy" | xargs rg "_paired_destination_name"

# Look for any deployment configuration or tests
echo "Checking for deployment tests..."
fd -e py | xargs rg "test.*deploy"

Length of output: 684


Script:

#!/bin/bash
# Check the implementation of deployment tests
echo "Checking deployment test implementations..."
rg -A 10 "test_deploy.*connection" tests/integration_tests/cloud/

# Look for cache usage in deployment tests
echo "Checking cache usage in deployment tests..."
rg "Cache|cache" tests/integration_tests/cloud/

# Check for paired destination usage
echo "Checking paired destination usage..."
rg "_paired_destination_name" airbyte/

Length of output: 6631

airbyte/caches/bigquery.py (1)

35-35: LGTM! The paired destination name is correctly mapped.

The addition of _paired_destination_name aligns with the standardization across cache implementations.

airbyte/caches/motherduck.py (1)

37-37: LGTM! The paired destination name is correctly mapped.

The addition of _paired_destination_name aligns with the standardization across cache implementations.

airbyte/destinations/_util.py (3)

81-81: LGTM! Explicit string conversion for destination path.

The explicit conversion of db_path to string ensures type safety.


171-177: LGTM! Good validation and secure handling of API key.

The changes properly validate the MotherDuck API key and wrap it in SecretString for enhanced security.

🧰 Tools
🪛 Ruff (0.8.0)

172-172: Avoid specifying long messages outside the exception class

(TRY003)


40-52: ⚠️ Potential issue

Consider implementing proper type mapping system? 🤔

The TODO implementations with hardcoded returns don't handle different cache/destination class types. Since we now have _paired_destination_name attributes in cache classes, we could leverage those for the mapping, wdyt?

Here's a suggestion that uses the new attributes:

def cache_type_to_destination_type(
    cache_type: type[CacheBase],
) -> type[Destination]:
-   # TODO: Fixme
-   return DestinationSnowflake
+   destination_name = getattr(cache_type, '_paired_destination_name', None)
+   if not destination_name:
+       raise ValueError(f"Cache type {cache_type.__name__} doesn't specify a paired destination")
+   
+   mapping = {
+       'destination-snowflake': DestinationSnowflake,
+       'destination-duckdb': DestinationDuckdb,
+       'destination-postgres': DestinationPostgres,
+       'destination-bigquery': DestinationBigquery,
+   }
+   if destination_name not in mapping:
+       raise ValueError(f"Unsupported destination name: {destination_name}")
+   return mapping[destination_name]
🧰 Tools
🪛 Ruff (0.8.0)

41-41: Unused function argument: cache_type

(ARG001)


43-43: Missing issue link on the line following this TODO

(TD003)


48-48: Unused function argument: destination_type

(ARG001)


50-50: Missing issue link on the line following this TODO

(TD003)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (10)
airbyte/cloud/sync_results.py (2)

173-173: Question about using asdict() for destination configuration

Is using asdict() on destination_response.configuration necessary here? If destination_response.configuration is already a dictionary, we might not need to convert it. Could there be cases where this conversion might cause issues? Wdyt?


199-199: Handling None values in metrics

Adding or 0 to bytes_synced and records_synced ensures that we return 0 when the values are None. This helps prevent errors, but should we also consider logging a warning if these values are unexpectedly None to aid in debugging? Wdyt?

Also applies to: 204-204

airbyte/cloud/workspaces.py (2)

11-11: Remove unused imports as indicated by static analysis

Static analysis tools suggest that the following imports are unused and can be removed:

  • http.client at line 11
  • pydoc.cli at line 12
  • pydantic.Secret at line 16

Cleaning up unused imports helps keep the codebase clean. Wdyt?

Also applies to: 12-12, 16-16

🧰 Tools
🪛 Ruff (0.8.0)

11-11: http.client imported but unused

Remove unused import: http.client

(F401)


23-23: Consider moving SecretString import into type-checking block

Static analysis indicates that SecretString is used only for type hints. Should we move the import into the TYPE_CHECKING block to avoid unnecessary runtime imports? Wdyt?

🧰 Tools
🪛 Ruff (0.8.0)

23-23: Move application import airbyte.secrets.base.SecretString into a type-checking block

Move into type-checking block

(TC001)
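The `TYPE_CHECKING` pattern the linter is asking for looks roughly like this (a minimal sketch; the `airbyte.secrets.base.SecretString` import path is taken from the review comment above, and `mask` is a hypothetical function used only to show the annotation):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only while a type checker runs; never executed at runtime,
    # so there is no import cost and no runtime dependency.
    from airbyte.secrets.base import SecretString


def mask(value: SecretString) -> str:
    # With `from __future__ import annotations`, the hint above is a plain
    # string at runtime, so the guarded import is sufficient.
    return "****"
```

Because the guarded import never runs, this also avoids import cycles between modules that only reference each other in annotations.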

airbyte/_util/api_util.py (2)

17-17: Remove unused imports as indicated by static analysis

Static analysis tools suggest that the following imports are unused and can be removed:

  • pydoc.cli at line 17
  • token at line 18
  • airbyte.secrets.base.SecretString at line 31

Cleaning up unused imports helps maintain code cleanliness. Wdyt?

Also applies to: 18-18, 31-31

🧰 Tools
🪛 Ruff (0.8.0)

17-17: pydoc.cli imported but unused

Remove unused import: pydoc.cli

(F401)


31-31: Consider moving SecretString import into type-checking block

Since SecretString appears to be used only for type annotations, should we move the import into the TYPE_CHECKING block to avoid unnecessary runtime imports? Wdyt?

🧰 Tools
🪛 Ruff (0.8.0)

31-31: Move application import airbyte.secrets.base.SecretString into a type-checking block

Move into type-checking block

(TC001)

tests/integration_tests/cloud/conftest.py (1)

7-7: Remove unused import as indicated by static analysis

Static analysis indicates that the import pydoc.cli at line 7 is unused and can be removed. Wdyt?

🧰 Tools
🪛 Ruff (0.8.0)

7-7: pydoc.cli imported but unused

Remove unused import: pydoc.cli

(F401)

tests/integration_tests/cloud/test_cloud_api_util.py (2)

18-30: Consider enhancing test coverage for workspace retrieval.

The test verifies the basic functionality, but we could make it more robust. Would you consider adding assertions for other workspace properties or error cases? For example:

  • Invalid credentials
  • Non-existent workspace
  • Malformed workspace ID

9-9: Remove unused import.

The asdict import from dataclasses isn't used in this file.

-from dataclasses import asdict
🧰 Tools
🪛 Ruff (0.8.0)

9-9: dataclasses.asdict imported but unused

Remove unused import: dataclasses.asdict

(F401)

airbyte/destinations/_util.py (1)

183-184: Consider using custom exceptions for better error handling.

Instead of using ValueError with string messages, would you consider creating custom exceptions? This would make error handling more structured and maintainable. Something like:

class CredentialsError(Exception):
    """Raised when required credentials are missing."""
    pass

# Then use it like:
if not destination_configuration.motherduck_api_key:
    raise CredentialsError("MotherDuck API key is required")

Also applies to: 198-199

🧰 Tools
🪛 Ruff (0.8.0)

184-184: Avoid specifying long messages outside the exception class

(TRY003)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between be9225d and d8b40d1.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • airbyte/_util/api_util.py (18 hunks)
  • airbyte/cloud/connections.py (8 hunks)
  • airbyte/cloud/sync_results.py (6 hunks)
  • airbyte/cloud/workspaces.py (7 hunks)
  • airbyte/destinations/_util.py (5 hunks)
  • airbyte/exceptions.py (2 hunks)
  • airbyte/secrets/google_gsm.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/integration_tests/cloud/conftest.py (4 hunks)
  • tests/integration_tests/cloud/test_cloud_api_util.py (8 hunks)
  • tests/integration_tests/cloud/test_cloud_sql_reads.py (4 hunks)
  • tests/integration_tests/test_all_cache_types.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte/exceptions.py
  • airbyte/cloud/connections.py
🧰 Additional context used
🪛 Ruff (0.8.0)
tests/integration_tests/cloud/conftest.py

7-7: pydoc.cli imported but unused

Remove unused import: pydoc.cli

(F401)

tests/integration_tests/cloud/test_cloud_api_util.py

9-9: dataclasses.asdict imported but unused

Remove unused import: dataclasses.asdict

(F401)

airbyte/cloud/workspaces.py

11-11: http.client imported but unused

Remove unused import: http.client

(F401)


12-12: pydoc.cli imported but unused

Remove unused import: pydoc.cli

(F401)


16-16: pydantic.Secret imported but unused

Remove unused import: pydantic.Secret

(F401)


23-23: Move application import airbyte.secrets.base.SecretString into a type-checking block

Move into type-checking block

(TC001)

airbyte/_util/api_util.py

17-17: pydoc.cli imported but unused

Remove unused import: pydoc.cli

(F401)


18-18: token imported but unused

Remove unused import: token

(F401)


31-31: Move application import airbyte.secrets.base.SecretString into a type-checking block

Move into type-checking block

(TC001)

airbyte/destinations/_util.py

42-42: Unused function argument: cache_type

(ARG001)


44-44: Missing issue link on the line following this TODO

(TD003)


49-49: Unused function argument: destination_type

(ARG001)


51-51: Missing issue link on the line following this TODO

(TD003)


184-184: Avoid specifying long messages outside the exception class

(TRY003)


199-199: Avoid specifying long messages outside the exception class

(TRY003)


250-250: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (14)
airbyte/cloud/workspaces.py (5)

232-235: Consider making selected_streams a required parameter

In the deploy_connection method, selected_streams has a default value of None, but the code raises an error if it's not provided:

if not selected_streams:
    raise exc.PyAirbyteInputError(
        guidance="You must provide `selected_streams` when creating a connection."
    )

Would it be clearer to make selected_streams a required parameter without a default value to reflect that it's mandatory? Wdyt?


243-248: Redundant check for selected_streams

There appears to be a second check for selected_streams being provided:

if not selected_streams:
    raise exc.PyAirbyteInputError(
        guidance=(
            "You must provide `selected_streams` when creating a connection "
            "from an existing destination."
        )
    )

Since we already have this check earlier in the method, perhaps this block can be removed to avoid redundancy. Wdyt?


100-100: ⚠️ Potential issue

Possible inconsistency in resource_type when raising duplicate resource error

In the deploy_source method, when raising AirbyteDuplicateResourcesError, the resource_type is set to "destination":

raise exc.AirbyteDuplicateResourcesError(
    resource_type="destination",
    resource_name=name,
)

Since we're dealing with sources here, should resource_type be "source" instead? Wdyt?


171-175: ⚠️ Potential issue

Use a tuple for multiple types in isinstance check

In the permanently_delete_source method, the isinstance check uses the bitwise | operator:

if not isinstance(source, str | CloudSource):
    # ...

However, isinstance only accepts `X | Y` unions on Python 3.10+ (PEP 604); earlier versions expect a type or a tuple of types. Should we update this to use a tuple for broader compatibility?

if not isinstance(source, (str, CloudSource)):
    # ...

Wdyt?


194-197: ⚠️ Potential issue

Use a tuple for multiple types in isinstance check

Similarly, in the permanently_delete_destination method, the isinstance check uses str | CloudDestination:

if not isinstance(destination, str | CloudDestination):
    # ...

Should we change this to a tuple for proper type checking?

if not isinstance(destination, (str, CloudDestination)):
    # ...

Wdyt?

airbyte/_util/api_util.py (1)

548-564: 🛠️ Refactor suggestion

Question: Simplify destination type handling

Currently, the code checks each destination type individually. Would it be beneficial to use a mapping or a factory pattern to reduce repetition and improve scalability? For example:

destination_type_mapping = {
    "snowflake": models.DestinationSnowflake,
    "bigquery": models.DestinationBigquery,
    "postgres": models.DestinationPostgres,
    "duckdb": models.DestinationDuckdb,
}

destination_class = destination_type_mapping.get(destination_type)
if destination_class:
    response.destination_response.configuration = destination_class(**raw_configuration)

Wdyt?

tests/integration_tests/cloud/test_cloud_sql_reads.py (2)

68-72: ⚠️ Potential issue

Fix method parameter to pass the connection object

The permanently_delete_connection method expects a connection parameter, but we're passing connection_id=connection. Should we update the call to pass the connection object directly?

cloud_workspace.permanently_delete_connection(
-    connection_id=connection,
+    connection=connection,
    delete_source=True,
    delete_destination=True,
)

Wdyt?


74-76: ⚠️ Potential issue

Update parameters to match method definitions

The permanently_delete_source and permanently_delete_destination methods now expect source and destination parameters instead of source_id and destination_id. Should we update these calls?

cloud_workspace.permanently_delete_source(
-    source_id=source_id
+    source=source_id
)
cloud_workspace.permanently_delete_destination(
-    destination_id=destination_id
+    destination=destination_id
)

Wdyt?

tests/integration_tests/cloud/test_cloud_api_util.py (1)

Line range hint 33-57: LGTM! Authentication mechanism updated correctly.

The transition from API key to client credentials is implemented properly with appropriate type annotations using SecretString.

🧰 Tools
🪛 Ruff (0.8.0)

9-9: dataclasses.asdict imported but unused

Remove unused import: dataclasses.asdict

(F401)

airbyte/destinations/_util.py (1)

41-52: ⚠️ Potential issue

Implement TODO functions with proper type mapping.

The current implementation with hardcoded returns doesn't handle different cache/destination class types. Would you consider implementing a proper mapping system? Here's a suggestion, wdyt?

def cache_type_to_destination_type(
    cache_type: type[CacheBase],
) -> type[Destination]:
-   # TODO: Fixme
-   return DestinationSnowflake
+   mapping = {
+       SnowflakeCache: DestinationSnowflake,
+       DuckDBCache: DestinationDuckdb,
+       PostgresCache: DestinationPostgres,
+       # Add other mappings as needed
+   }
+   if cache_type not in mapping:
+       raise ValueError(f"Unsupported cache type: {cache_type.__name__}")
+   return mapping[cache_type]

def destination_type_to_cache_type(
    destination_type: type[Destination],
) -> type[CacheBase]:
-   # TODO: Fixme
-   return SnowflakeCache
+   mapping = {
+       DestinationSnowflake: SnowflakeCache,
+       DestinationDuckdb: DuckDBCache,
+       DestinationPostgres: PostgresCache,
+       # Add other mappings as needed
+   }
+   if destination_type not in mapping:
+       raise ValueError(f"Unsupported destination type: {destination_type.__name__}")
+   return mapping[destination_type]

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.0)

42-42: Unused function argument: cache_type

(ARG001)


44-44: Missing issue link on the line following this TODO

(TD003)


49-49: Unused function argument: destination_type

(ARG001)


51-51: Missing issue link on the line following this TODO

(TD003)

tests/integration_tests/test_all_cache_types.py (1)

179-179: LGTM! Type annotations improve code clarity.

The addition of explicit ReadResult type annotations enhances type safety and makes the code more maintainable.

Also applies to: 194-194

airbyte/secrets/google_gsm.py (2)

Line range hint 64-108: LGTM! Robust credential handling implementation

The credential handling logic is well-structured with clear error messages and proper fallback mechanisms. Nice work! 👍


164-164: ⚠️ Potential issue

Breaking change in return type - needs verification

The return type has changed from SecretString | None to SecretString. This means the method will never return None, which could break existing code that handles the None case. Should we verify that all callers are updated to handle this change? wdyt?

pyproject.toml (1)

18-18: Verify airbyte-api version upgrade impact

The upgrade from ^0.49.2 to ^0.52.1 is a minor version bump. Should we check the changelog for any breaking changes that might affect our codebase? Also, would it be worth verifying compatibility with other dependencies? wdyt?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (16)
airbyte/destinations/_translate_cache_to_dest.py (1)

79-84: Update docstring to reflect the correct cache type

Could we update the docstring to reflect that this function gets the destination configuration from the MotherDuck cache instead of the DuckDB cache? WDYT?

Apply this diff to fix the issue:

-    """Get the destination configuration from the DuckDB cache."""
+    """Get the destination configuration from the MotherDuck cache."""
airbyte/destinations/_translate_dest_to_cache.py (2)

39-41: Update docstring to accurately describe the function purpose

Could we update the docstring to reflect that this function creates a cache from a destination configuration? WDYT?

Apply this diff to fix the issue:

-    """Get the destination configuration from the cache."""
+    """Create a cache instance from the destination configuration."""

81-84: Update docstring to reflect MotherDuck cache

Could we update the docstring to indicate that this function creates a new MotherDuck cache instead of a DuckDB cache? WDYT?

Apply this diff to fix the issue:

-    """Create a new DuckDB cache from the destination configuration."""
+    """Create a new MotherDuck cache from the destination configuration."""
🧰 Tools
🪛 Ruff (0.8.0)

83-83: Avoid specifying long messages outside the exception class

(TRY003)

airbyte/caches/duckdb.py (1)

51-54: Update docstring to reflect return type

Could we update the docstring to reflect that this method returns a DestinationDuckdb object instead of a dictionary? WDYT?

Apply this diff to fix the issue:

-    """Return a dictionary of destination configuration values."""
+    """Return a `DestinationDuckdb` configuration object."""
airbyte/caches/postgres.py (1)

53-61: Consider adding validation for database configuration

The clone method looks good, but what do you think about adding some basic validation for the configuration parameters? For instance, checking if the port is within valid range (0-65535)? wdyt?

def clone_as_cloud_destination_config(self) -> DestinationPostgres:
    """Return a DestinationPostgres instance with the same configuration."""
+   if not 0 <= self.port <= 65535:
+       raise ValueError(f"Invalid port number: {self.port}")
    return DestinationPostgres(
        host=self.host,
        port=self.port,
        username=self.username,
        password=self.password,
        database=self.database,
    )
airbyte/destinations/util.py (1)

78-81: Consider enhancing docstring with parameter description

The new parameter looks good! Would you like to add its description to the docstring? Something like:

def get_noop_destination(
    *,
    install_if_missing: bool = True,
) -> Destination:
    """Get a devnull (no-op) destination.

    This is useful for performance benchmarking of sources, without
    adding the overhead of writing data to a real destination.
+
+   Args:
+       install_if_missing: If True, automatically install the dev-null destination
+           connector if not found locally.
    """
tests/integration_tests/cloud/conftest.py (2)

112-114: Consider creating an issue for the DevNullDestination limitation

The TODO comment highlights an important limitation. Would you like me to help create a GitHub issue to track this? This could help ensure it's not forgotten in the future. 🤔


135-141: Consider updating the dictionary comment

The comment mentions "cache types" but the dictionary is now about destination fixtures. Should we update the comment to reflect this change? wdyt?

-    deployable_destination_fixtures: dict[str, str] = {
+    # Map of destination types to their respective fixture names, ordered by test execution speed
+    deployable_destination_fixtures: dict[str, str] = {
airbyte/shared/state_providers.py (1)

55-55: LGTM! Nice defensive programming! 🛡️

The additional null checks make the code more robust. Would you consider adding type hints to make the expectations even more explicit? wdyt?

-            if state_msg and state_msg.type == AirbyteStateType.STREAM and state_msg.stream
+            if state_msg is not None and state_msg.type == AirbyteStateType.STREAM and state_msg.stream is not None
tests/integration_tests/conftest.py (1)

65-66: Consider extracting the schema name prefix to a constant? 🎯

I notice "test_deleteme_" is used in multiple places. Would it make sense to extract this to a constant at the module level? This would make it easier to update if needed, wdyt?

+TEST_SCHEMA_PREFIX = "test_deleteme_"
+
 @pytest.fixture
 def new_motherduck_cache(
     motherduck_secrets,
 ) -> MotherDuckCache:
     return MotherDuckCache(
         database="my_db",  # TODO: Use a dedicated DB for testing
-        schema_name=f"test_deleteme_{text_util.generate_random_suffix()}",
+        schema_name=f"{TEST_SCHEMA_PREFIX}{text_util.generate_random_suffix()}",

Also applies to: 76-77, 114-116
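One possible shape for such a helper (a hedged sketch — the real `text_util.generate_random_suffix` implementation isn't shown in this diff; this version combines an epoch timestamp with random hex so suffixes sort roughly by creation time, which helps when scanning for leaked test resources):

```python
import secrets
import time


def generate_random_suffix() -> str:
    """Return a short, unique, roughly time-ordered suffix for test resources."""
    # 8 hex chars of epoch seconds + 8 hex chars of randomness = 16 chars total.
    return f"{int(time.time()):08x}{secrets.token_hex(4)}"


def generate_test_schema_name(prefix: str = "test_deleteme_") -> str:
    """Build a disposable schema name using the shared prefix constant."""
    return f"{prefix}{generate_random_suffix()}"
```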

airbyte/sources/util.py (1)

128-129: Could we update the docstring for the new parameter? 📝

The install_if_missing parameter has been added, but it's not documented in the function's docstring. Would you like me to suggest the docstring update? wdyt?

    Args:
        num_records (int | str): The number of records to generate. Defaults to "5e5", or
            500,000 records.
            Can be an integer (`1000`) or a string in scientific notation.
            For example, `"5e6"` will generate 5 million records.
+       install_if_missing (bool, optional): Whether to install the connector if not available.
+           Defaults to True.
tests/integration_tests/test_state_handling.py (1)

Line range hint 71-81: Consider enhancing assertion messages for better test diagnostics, wdyt?

The test verifies state persistence well, but adding descriptive messages to assertions could make debugging failures easier.

-    assert result2.processed_records == 0
+    assert result2.processed_records == 0, "Expected no new records due to state persistence"
-    assert len(list(state_provider.state_message_artifacts)) > 0
+    assert len(list(state_provider.state_message_artifacts)) > 0, "Expected state artifacts to be present"
tests/integration_tests/cloud/test_cloud_api_util.py (1)

158-164: Consider extracting the name generation pattern into a helper function, wdyt?

The pattern of creating names with "deleteme-" prefix and random suffix is repeated multiple times. Would it be clearer to extract this into a helper function?

def generate_test_resource_name(resource_type: str) -> str:
    return f"deleteme-{resource_type}-dummy{text_util.generate_random_suffix()}"
airbyte/cloud/sync_results.py (1)

199-199: Consider logging when None values are encountered

While returning 0 for None values is a safe default, it might hide underlying issues. Would it be helpful to add debug logging when None values are encountered?

 @property
 def bytes_synced(self) -> int:
+    if self._fetch_latest_job_info().bytes_synced is None:
+        logger.debug("bytes_synced returned None for job %d", self.job_id)
     return self._fetch_latest_job_info().bytes_synced or 0

Also applies to: 204-204

airbyte/cloud/workspaces.py (1)

178-182: Consider using tuple for type checking

The isinstance checks use the | operator for union types, which might not be compatible with all Python versions. Would you consider using tuples instead? Wdyt?

-    if not isinstance(source, str | CloudSource):
+    if not isinstance(source, (str, CloudSource)):

-    if not isinstance(destination, str | CloudDestination):
+    if not isinstance(destination, (str, CloudDestination)):

Also applies to: 201-204

airbyte/_util/api_util.py (1)

550-572: Consider implementing a destination type registry pattern

The current implementation uses multiple if-statements for different destination types. Would you consider using a mapping or registry pattern to make this more maintainable? Wdyt?

Example approach:

DESTINATION_TYPE_MAPPING = {
    "snowflake": models.DestinationSnowflake,
    "bigquery": models.DestinationBigquery,
    "postgres": models.DestinationPostgres,
    "duckdb": models.DestinationDuckdb,
}

# Then use:
destination_class = DESTINATION_TYPE_MAPPING.get(destination_type)
if destination_class:
    response.destination_response.configuration = destination_class(**raw_configuration)
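Exercising that registry idea end to end, with stand-in classes (the real `models.Destination*` classes from `airbyte-api` are assumptions here, replaced by dummies):

```python
class DestinationSnowflake:
    """Stand-in for the airbyte-api model class."""
    def __init__(self, **config):
        self.config = config


class DestinationDuckdb:
    """Stand-in for the airbyte-api model class."""
    def __init__(self, **config):
        self.config = config


DESTINATION_TYPE_MAPPING = {
    "snowflake": DestinationSnowflake,
    "duckdb": DestinationDuckdb,
}


def parse_configuration(destination_type: str, raw_configuration: dict):
    """Dispatch on destination type via the registry instead of if-chains."""
    destination_class = DESTINATION_TYPE_MAPPING.get(destination_type)
    if destination_class is None:
        raise ValueError(f"Unsupported destination type: {destination_type}")
    return destination_class(**raw_configuration)
```

Adding a new destination then becomes a one-line registry entry rather than another `if` branch.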
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d8b40d1 and 7c9a6ac.

📒 Files selected for processing (27)
  • airbyte/_util/api_util.py (18 hunks)
  • airbyte/_util/text_util.py (1 hunks)
  • airbyte/caches/base.py (4 hunks)
  • airbyte/caches/bigquery.py (1 hunks)
  • airbyte/caches/duckdb.py (2 hunks)
  • airbyte/caches/motherduck.py (2 hunks)
  • airbyte/caches/postgres.py (2 hunks)
  • airbyte/caches/snowflake.py (1 hunks)
  • airbyte/cloud/sync_results.py (6 hunks)
  • airbyte/cloud/workspaces.py (5 hunks)
  • airbyte/destinations/_translate_cache_to_dest.py (1 hunks)
  • airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
  • airbyte/destinations/util.py (2 hunks)
  • airbyte/shared/state_providers.py (1 hunks)
  • airbyte/sources/__init__.py (1 hunks)
  • airbyte/sources/util.py (2 hunks)
  • pyproject.toml (2 hunks)
  • tests/conftest.py (2 hunks)
  • tests/integration_tests/cloud/conftest.py (3 hunks)
  • tests/integration_tests/cloud/test_cloud_api_util.py (7 hunks)
  • tests/integration_tests/cloud/test_cloud_sql_reads.py (5 hunks)
  • tests/integration_tests/cloud/test_cloud_sync.py (3 hunks)
  • tests/integration_tests/cloud/test_cloud_workspaces.py (2 hunks)
  • tests/integration_tests/conftest.py (2 hunks)
  • tests/integration_tests/test_all_cache_types.py (3 hunks)
  • tests/integration_tests/test_source_test_fixture.py (26 hunks)
  • tests/integration_tests/test_state_handling.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • airbyte/_util/text_util.py
  • airbyte/caches/snowflake.py
  • airbyte/caches/bigquery.py
  • tests/integration_tests/test_all_cache_types.py
  • tests/integration_tests/cloud/test_cloud_sync.py
  • pyproject.toml
  • airbyte/caches/base.py
  • tests/conftest.py
  • airbyte/caches/motherduck.py
  • tests/integration_tests/cloud/test_cloud_workspaces.py
  • tests/integration_tests/cloud/test_cloud_sql_reads.py
🧰 Additional context used
🪛 Ruff (0.8.0)
airbyte/destinations/_translate_dest_to_cache.py

83-83: Avoid specifying long messages outside the exception class

(TRY003)


98-98: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (15)
airbyte/sources/__init__.py (1)

25-25: LGTM!

The addition of registry to the imports and __all__ list looks good.

Also applies to: 32-32

airbyte/caches/postgres.py (2)

48-52: LGTM: Clean implementation of destination config translation

The property implementation looks clean and follows good practices by delegating to the translation function. Nice work! 👍


45-46: ⚠️ Potential issue

Verify destination pairing configuration

I notice that paired_destination_name is set to "destination-bigquery" while the class is for PostgresCache. Should this be "destination-postgres" instead? wdyt? 🤔

✅ Verification successful

Interesting pattern found in cache implementations! 🤔

Looking at all cache implementations (BigQuery, Snowflake, MotherDuck, and Postgres), I notice they all have paired_destination_name = "destination-bigquery". This seems to be an intentional pattern where all caches are designed to pair with BigQuery as the destination, while their respective paired_destination_config_class points to their own destination type (e.g., DestinationPostgres for PostgresCache).

This appears to be a deliberate architectural decision rather than a mistake. The code is consistent across all implementations and the translation logic in _translate_cache_to_dest.py supports this pattern.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any references to this pairing in the codebase
rg -l "paired_destination_name.*destination-bigquery.*PostgresCache" 
rg -l "paired_destination_name.*destination-postgres.*PostgresCache"

Length of output: 141


Script:

#!/bin/bash
# Let's check the base class and related cache implementations
ast-grep --pattern 'class $_Cache:
  $$$
  paired_destination_name: ClassVar[str | None] = $_
  $$$'

# Also check for any other cache classes with destination pairing
ast-grep --pattern 'class $_ {
  $$$
  paired_destination_name = $_
  $$$
}'

# Let's look at the PostgresCache class implementation
ast-grep --pattern 'class PostgresCache($_):
  $$$'

# Check for any test files that might validate this pairing
fd -e py | xargs rg -l "PostgresCache.*destination.*bigquery"

Length of output: 527


Script:

#!/bin/bash
# Let's try a simpler search for the file content
cat airbyte/caches/postgres.py

# Search for any test files related to PostgresCache
fd -e py test | xargs rg -l "PostgresCache"

# Look for any BigQuery related configurations
rg -l "destination-bigquery"

Length of output: 2371


Script:

#!/bin/bash
# Let's check other cache implementations to understand the pairing pattern
cat airbyte/caches/bigquery.py
cat airbyte/caches/snowflake.py
cat airbyte/caches/motherduck.py

# Also check the destination translation implementation
cat airbyte/destinations/_translate_cache_to_dest.py

Length of output: 10968

tests/integration_tests/cloud/conftest.py (1)

53-54: LGTM: Clean credentials handling implementation

The transition from API key to client credentials is well implemented with proper typing and JSON parsing. Nice work on improving security! 🔐

Also applies to: 57-63

tests/integration_tests/conftest.py (1)

93-104: LGTM! Nice cleanup handling! 🧹

The schema cleanup in the teardown is well implemented. The use of SQLAlchemy's future mode and proper connection handling is great!

airbyte/sources/util.py (1)

171-171: LGTM! Nice parameter forwarding! 🎯

The install_if_missing parameter is correctly forwarded to get_source, maintaining consistency in behavior.

tests/integration_tests/test_state_handling.py (2)

141-143: LGTM! Good use of type annotations

The explicit type annotation for state_provider improves code clarity and maintainability.


132-133: Consider addressing the type system issue instead of suppressing it

The type ignore comment suggests there's an underlying issue with the type system. Would it be better to properly define the missing 'global' class or update the type hints to match the actual types being used?

airbyte/cloud/sync_results.py (1)

138-138: ⚠️ Potential issue

Verify compatibility with job_id type change

The job_id type has changed from str to int. This could be a breaking change if any code expects string job IDs.

✅ Verification successful

Let me search for broader patterns to understand how job_id is used across the codebase. This will help us verify if the type change from str to int could cause compatibility issues.


The previous search revealed two instances of job_id typed as int, but let's search more broadly without file type restrictions to ensure we catch all usages and potential type conversions.


The job_id type change looks safe to me - what do you think? 🤔

From the codebase analysis, all job_id type hints are consistently using int or int | None. The only string conversion happens in the API utility layer (_util/api_util.py) where it's explicitly cast to string for API resource naming, which is a safe practice. No code expects or relies on job_id being a string type.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for job_id usage in the codebase
ast-grep --pattern 'job_id: str'
rg "job_id.*=.*str\(" -A 2

Length of output: 63


Script:

#!/bin/bash
# Let's search for job_id usage patterns more broadly
rg "job_id" -t python -A 2

# Also search for type hints or annotations related to job_id
rg "job_id.*:" -t python -A 2

# Look for any job_id in function signatures
ast-grep --pattern 'def $_(job_id$_)'

# Check for job_id in class attributes or type definitions
ast-grep --pattern 'job_id: $_'

Length of output: 300


Script:

#!/bin/bash
# Let's try with ripgrep without file type filtering
rg "job_id" -A 2

# Search for any string conversions or type casting of job_id
rg "str\(.*job_id" -A 2
rg "job_id.*str\(" -A 2

# Look for job_id in function calls or assignments
rg "job_id\s*=" -A 2
rg "job_id:" -A 2

Length of output: 6211

airbyte/cloud/workspaces.py (3)

36-43: LGTM: Secure handling of credentials

The post-init method properly ensures that both client ID and secret are handled securely using SecretString.


94-100: ⚠️ Potential issue

Incorrect resource type in error message

The error message for duplicate source names incorrectly uses "destination" as the resource type.

-                    resource_type="destination",
+                    resource_type="source",

Likely invalid or redundant comment.


239-255: 🛠️ Refactor suggestion

Redundant validation check

There are two identical checks for selected_streams. The second check appears redundant since we already validate this condition earlier. Wdyt?

     if not selected_streams:
         raise exc.PyAirbyteInputError(
             guidance="You must provide `selected_streams` when creating a connection."
         )

     # Resolve source ID
     source_id: str
     source_id = source if isinstance(source, str) else source.connector_id

     # destination is already deployed
     destination_id = destination if isinstance(destination, str) else destination.connector_id
-    if not selected_streams:
-        raise exc.PyAirbyteInputError(
-            guidance=(
-                "You must provide `selected_streams` when creating a connection "
-                "from an existing destination."
-            )
-        )

Likely invalid or redundant comment.

airbyte/_util/api_util.py (1)

52-63: LGTM: Secure token URL construction

The authentication implementation correctly constructs the token URL and handles credentials securely.
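A minimal sketch of what client-credentials token retrieval looks like, assuming a JSON token endpoint at `/applications/token` (the exact path and payload field names are assumptions, not confirmed from this diff):

```python
import json
import urllib.request


def build_token_url(api_root: str) -> str:
    # Assumed endpoint path; normalize a trailing slash on the root.
    return f"{api_root.rstrip('/')}/applications/token"


def fetch_access_token(api_root: str, client_id: str, client_secret: str) -> str:
    """Exchange client credentials for a bearer token (client_credentials grant)."""
    payload = json.dumps({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
    }).encode()
    request = urllib.request.Request(
        build_token_url(api_root),
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```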

tests/integration_tests/test_source_test_fixture.py (2)

77-78: LGTM: Improved type annotations

The change from SqlProcessorBase to CacheBase makes the function more flexible by accepting any cache implementation.


419-420: LGTM: Consistent random suffix generation

Good use of text_util.generate_random_suffix() for generating cache names, replacing the previous ulid usage.

Also applies to: 496-496

airbyte/destinations/_translate_cache_to_dest.py (outdated, resolved)
airbyte/destinations/_translate_dest_to_cache.py (outdated, resolved)
tests/integration_tests/cloud/test_cloud_api_util.py (outdated, resolved)
airbyte/cloud/sync_results.py (resolved)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (8)
tests/integration_tests/cloud/conftest.py (4)

53-54: Consider a more descriptive type alias name?

The type alias CloudAPICreds is a bit generic. What do you think about something more specific like CloudAPIClientCredentials to better reflect that it's specifically for client credentials? wdyt? 🤔


112-114: Should we track the DevNullDestination limitation?

The TODO comment mentions a Cloud limitation preventing the use of DevNullDestination. Would it be helpful to create a GitHub issue to track this limitation and its workaround? I can help create one if you'd like! 🤝


135-140: Clean up commented destinations?

There are some commented-out destinations in the test generation. Should we remove these if they're not being used, or are they planned for future implementation? wdyt? 🧹


107-116: Consider documenting the BigQuery workaround?

Since we're using BigQuery as a temporary workaround for DevNullDestination, should we add a note in the docstring explaining why this specific destination was chosen and any performance implications? This could help future maintainers understand the decision. wdyt? 📝

 def deployable_dummy_destination(
     new_bigquery_destination: Destination,
 ) -> Destination:
-    """A local PyAirbyte `Destination` object.
+    """A local PyAirbyte `Destination` object.
+    
+    Note: Currently using BigQuery as a workaround since DevNullDestination
+    is not supported in Cloud. While this works, it may have performance
+    implications for tests.
 
     # TODO: Use DevNullDestination instead of BigQueryDestination.
     Problem is that 'dev-null' is not accepted on Cloud as of now.
tests/integration_tests/test_source_test_fixture.py (2)

34-35: Consider adding docstrings to test functions?

The type annotations are great! Would you consider adding docstrings to test functions to explain the test scenarios and expectations? This could help other developers understand the test cases better, wdyt?

Example:

def test_registry_get() -> None:
    """Test that the registry correctly returns metadata for a known connector.
    
    Expects:
    - Metadata object is returned
    - Correct name and version are set
    """

Also applies to: 100-101, 145-147, 194-214, 222-227, 237-238, 302-310, 311-316, 882-887, 889-896, 899-908, 910-944


77-78: Consider adding validation for None streams?

The type update to allow streams: list[str] | None is good! Would you consider adding explicit validation when streams is None but the cache is empty? This could provide clearer error messages, wdyt?

def assert_data_matches_cache(
    expected_test_stream_data: dict[str, list[dict[str, str | int]]],
    cache: CacheBase,
    streams: list[str] | None = None,
) -> None:
+   if streams is None and not expected_test_stream_data:
+       raise ValueError("No streams specified and expected_test_stream_data is empty")
    for stream_name in streams or expected_test_stream_data.keys():
airbyte/destinations/_translate_dest_to_cache.py (2)

50-64: Consider simplifying destination type extraction

The current logic for extracting destination type is a bit complex. What do you think about simplifying it like this? WDYT?

-    if isinstance(destination_configuration, dict):
-        try:
-            destination_type = (
-                destination_configuration.get("DESTINATION_TYPE")
-                or destination_configuration["destinationType"]
-            )
-            if hasattr(destination_configuration, "value"):
-                destination_type = destination_type.value
-            else:
-                destination_type = str(destination_type)
-        except KeyError as ex:
-            raise ValueError(
-                f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
-            ) from ex
-    else:
-        destination_type = str(destination_configuration.DESTINATION_TYPE)
+    try:
+        if isinstance(destination_configuration, dict):
+            destination_type = (
+                destination_configuration.get("DESTINATION_TYPE")
+                or destination_configuration["destinationType"]
+            )
+            # Extract the raw value from an Enum member before stringifying;
+            # calling str() first would make the .value check dead code.
+            destination_type = str(getattr(destination_type, "value", destination_type))
+        else:
+            destination_type = str(destination_configuration.DESTINATION_TYPE)
+    except (KeyError, AttributeError) as ex:
+        raise ValueError(
+            "Could not determine destination type. Ensure 'destinationType' or 'DESTINATION_TYPE' is provided."
+        ) from ex
🧰 Tools
🪛 Ruff (0.8.0)

60-62: Avoid specifying long messages outside the exception class

(TRY003)


113-115: Consider moving validation before port parsing

What do you think about moving the password validation before the port parsing? This way, we fail fast if the required credential is missing. WDYT?

+    if not destination_configuration.password:
+        raise ValueError("Password is required for Postgres cache.")
+
     port: int = int(destination_configuration.port) if destination_configuration.port else 5432
-    if not destination_configuration.password:
-        raise ValueError("Password is required for Postgres cache.")
🧰 Tools
🪛 Ruff (0.8.0)

115-115: Avoid specifying long messages outside the exception class

(TRY003)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7c9a6ac and d58bf93.

📒 Files selected for processing (6)
  • airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
  • tests/integration_tests/cloud/conftest.py (3 hunks)
  • tests/integration_tests/cloud/test_cloud_sync.py (3 hunks)
  • tests/integration_tests/conftest.py (2 hunks)
  • tests/integration_tests/test_source_faker_integration.py (0 hunks)
  • tests/integration_tests/test_source_test_fixture.py (26 hunks)
💤 Files with no reviewable changes (1)
  • tests/integration_tests/test_source_faker_integration.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration_tests/cloud/test_cloud_sync.py
  • tests/integration_tests/conftest.py
🧰 Additional context used
🪛 Ruff (0.8.0)
airbyte/destinations/_translate_dest_to_cache.py

60-62: Avoid specifying long messages outside the exception class

(TRY003)


100-100: Avoid specifying long messages outside the exception class

(TRY003)


115-115: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
tests/integration_tests/cloud/conftest.py (1)

85-94: Clean transition to client credentials! 👍

The updated cloud_workspace fixture nicely handles the transition from API key to client credentials. The parameter ordering and type hints look great!

tests/integration_tests/test_source_test_fixture.py (2)

21-24: LGTM!

The new imports are well-organized and necessary for the type annotations and text utility changes.


419-419: LGTM! Verify text_util usage across codebase.

The switch to text_util.generate_random_suffix() looks good and provides consistency. Let's verify this pattern is used consistently across the codebase.

Also applies to: 439-439, 496-496

✅ Verification successful

LGTM! The codebase shows consistent usage patterns

The verification shows a clean transition pattern:

  • text_util.generate_random_suffix() is consistently used across test files for generating unique identifiers
  • The underlying implementation in text_util.py actually uses ULID internally, making this a proper abstraction
  • Remaining ULID usages are either in dependencies (pyproject.toml, poetry.lock) or part of the core functionality where direct ULID usage is appropriate
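As a rough illustration of what such an abstraction provides (this is a stand-in, not the real `text_util.generate_random_suffix`, which reportedly derives from a ULID): a time-ordered prefix plus random characters yields suffixes that sort roughly by creation time and rarely collide.

```python
import secrets
import string
import time


def generate_random_suffix() -> str:
    """Illustrative stand-in for text_util.generate_random_suffix():
    a millisecond timestamp (hex) plus six random lowercase/digit chars."""
    timestamp_hex = format(int(time.time() * 1000), "x")
    alphabet = string.ascii_lowercase + string.digits
    random_part = "".join(secrets.choice(alphabet) for _ in range(6))
    return f"{timestamp_hex}{random_part}"
```

Centralizing suffix generation this way means a later switch of the underlying scheme (ULID, UUID, or otherwise) touches one function instead of every test.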
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining ulid usage and verify text_util adoption
# Test: Search for ulid usage that might need updating
rg 'ulid'

# Test: Verify consistent usage of text_util.generate_random_suffix
rg 'text_util\.generate_random_suffix'

Length of output: 3980

airbyte/destinations/_translate_dest_to_cache.py (2)

43-43: Typo in function name mapping persists

The typo "biqquery" still exists in the function mapping. Let's fix this to maintain consistency?


136-136: Add validation for Snowflake host format

The host parsing assumes a specific format. Should we add validation to ensure the host contains ".snowflakecomputing"? WDYT?
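One possible shape for that validation, as a hypothetical helper (the function name and error wording are assumptions):

```python
def parse_snowflake_account(host: str) -> str:
    """Validate a host of the form '<account>.snowflakecomputing.com'
    and return the account identifier. Illustrative sketch only."""
    marker = ".snowflakecomputing"
    if marker not in host:
        raise ValueError(
            f"Unexpected Snowflake host {host!r}; expected '<account>{marker}.com'."
        )
    return host.split(marker, 1)[0]
```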

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
airbyte/destinations/_translate_dest_to_cache.py (3)

61-64: Enhance error message with valid destination types

What do you think about making the error message more helpful by including the list of valid destination types? This could help users quickly identify the issue. wdyt?

-                f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
+                f"Missing 'destinationType' in keys {list(destination_configuration.keys())}. "
+                f"Valid types are: {list(conversion_fn_map.keys())}"

101-103: Consider creating a specific MotherDuck destination type

The function accepts DestinationDuckdb but is specifically for MotherDuck. Should we create a separate DestinationMotherduck type to make this more explicit and type-safe? wdyt?


119-119: Consider extracting default port as a constant

What do you think about moving the default port value to a constant at the module level? This would make it easier to maintain and more visible. wdyt?

+DEFAULT_POSTGRES_PORT = 5432
+
 def postgres_destination_to_cache(
     destination_configuration: DestinationPostgres,
 ) -> PostgresCache:
     """Create a new Postgres cache from the destination configuration."""
-    port: int = int(destination_configuration.port) if destination_configuration.port else 5432
+    port: int = int(destination_configuration.port) if destination_configuration.port else DEFAULT_POSTGRES_PORT
tests/integration_tests/cloud/test_cloud_sql_reads.py (1)

136-149: Consider extracting common test parameters

These parameters are duplicated across multiple test functions. Should we extract them into a shared fixture or constant to make maintenance easier? wdyt?

+CLOUD_CONNECTION_TEST_CASES = [
+    pytest.param("c7b4d838-a612-495a-9d91-a14e477add51", id="Faker->Snowflake"),
+    pytest.param("0e1d6b32-b8e3-4b68-91a3-3a314599c782", id="Faker->BigQuery"),
+    pytest.param("", id="Faker->Postgres", marks=pytest.mark.skip(reason="Not yet supported")),
+    pytest.param("", id="Faker->MotherDuck", marks=pytest.mark.skip(reason="Not yet supported")),
+]

-@pytest.mark.parametrize(
-    "deployed_connection_id",
-    [
-        pytest.param("c7b4d838-a612-495a-9d91-a14e477add51", id="Faker->Snowflake"),
-        ...
-    ],
-)
+@pytest.mark.parametrize("deployed_connection_id", CLOUD_CONNECTION_TEST_CASES)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d58bf93 and 413abf4.

📒 Files selected for processing (4)
  • airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
  • pyproject.toml (3 hunks)
  • tests/integration_tests/cloud/test_cloud_sql_reads.py (6 hunks)
  • tests/integration_tests/test_source_faker_integration.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • tests/integration_tests/test_source_faker_integration.py
🔇 Additional comments (1)
airbyte/destinations/_translate_dest_to_cache.py (1)

43-43: Fix typo in function name mapping

The mapping uses "biqquery" instead of "bigquery" which could cause runtime errors.

tests/integration_tests/cloud/test_cloud_sql_reads.py (outdated, resolved)
tests/integration_tests/cloud/test_cloud_sql_reads.py (outdated, resolved)
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (13)
airbyte/destinations/_translate_dest_to_cache.py (4)

51-65: Consider simplifying the destination type extraction logic

The current implementation has multiple nested conditions. What do you think about simplifying it like this? WDYT?

     if isinstance(destination_configuration, dict):
         try:
-            destination_type = (
-                destination_configuration.get("DESTINATION_TYPE")
-                or destination_configuration["destinationType"]
-            )
-            if hasattr(destination_type, "value"):
-                destination_type = destination_type.value
-            elif hasattr(destination_type, "_value_"):
-                destination_type = destination_type._value_
-            else:
-                destination_type = str(destination_type)
+            dest_type = destination_configuration.get("DESTINATION_TYPE") or destination_configuration["destinationType"]
+            destination_type = getattr(dest_type, "value", None) or getattr(dest_type, "_value_", None) or str(dest_type)
         except KeyError as ex:
             raise ValueError(
-                f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
+                f"Missing destination type. Available keys: {', '.join(destination_configuration.keys())}"
             ) from ex

146-164: Consider simplifying Snowflake password handling logic

The password extraction logic is quite complex. Would you consider simplifying it like this? WDYT?

-    snowflake_password: str | None = None
-    if (
-        destination_configuration.credentials
-        and hasattr(destination_configuration.credentials, "password")
-        and isinstance(destination_configuration.credentials.password, str)  # type: ignore [attr-defined]
-    ):
-        destination_password = str(destination_configuration.credentials.password)  # type: ignore [attr-defined]
-        if "****" in destination_password:
-            try:
-                snowflake_password = get_secret(password_secret_name)
-            except ValueError as ex:
-                raise PyAirbyteSecretNotFoundError(
-                    "Password is required for Snowflake cache, but it was not available."
-                ) from ex
-        else:
-            snowflake_password = get_secret(destination_password)
-    else:
-        snowflake_password = get_secret(password_secret_name)
+    try:
+        if hasattr(destination_configuration.credentials, "password"):
+            password = str(destination_configuration.credentials.password)  # type: ignore [attr-defined]
+            secret_name = password_secret_name if "****" in password else password
+        else:
+            secret_name = password_secret_name
+        
+        snowflake_password = get_secret(secret_name)
+    except ValueError as ex:
+        raise PyAirbyteSecretNotFoundError(
+            "Password is required for Snowflake cache, but it was not available."
+        ) from ex

150-150: Consider using isinstance with typing.cast for better type safety

Instead of using type ignore, we could use typing.cast for better type safety. WDYT?

-        and isinstance(destination_configuration.credentials.password, str)  # type: ignore [attr-defined]
+        and isinstance(typing.cast(Any, destination_configuration.credentials).password, str)

73-90: Consider enhancing docstrings with parameter and return type descriptions

The docstrings could be more informative. Would you like to add parameter and return type descriptions? For example:

def biqquery_destination_to_cache(
    destination_configuration: DestinationBigquery | dict[str, Any],
) -> BigQueryCache:
    """Create a new BigQuery cache from the destination configuration.

    Args:
        destination_configuration: BigQuery destination configuration object or dict
            containing project_id and dataset_id.

    Returns:
        BigQueryCache: Configured BigQuery cache instance.

    Note:
        We may have to inject credentials, because they are obfuscated when config
        is returned from the REST API.
    """

Also applies to: 92-99, 102-113

tests/integration_tests/cloud/conftest.py (2)

6-6: Remove unused import?

The auto import from enum appears to be unused in this file. Should we remove it to keep the imports clean? wdyt?

-from enum import auto
🧰 Tools
🪛 Ruff (0.8.0)

6-6: enum.auto imported but unused

Remove unused import: enum.auto

(F401)


124-127: Track TODO for DevNullDestination implementation

The comment indicates that using DevNullDestination would be preferable but is currently blocked by Cloud limitations. Would you like me to create a GitHub issue to track this improvement? This could help with test performance and simplicity once Cloud supports it.

tests/integration_tests/cloud/test_cloud_sql_reads.py (2)

12-12: Remove unused import?

The CloudSource import appears to be unused. Should we remove it? wdyt?

-from airbyte.cloud.connectors import CloudSource
🧰 Tools
🪛 Ruff (0.8.0)

12-12: airbyte.cloud.connectors.CloudSource imported but unused

Remove unused import: airbyte.cloud.connectors.CloudSource

(F401)


30-84: Consider removing commented-out test?

Since this test is redundant due to Cloud auto-retries, should we remove it entirely instead of keeping it commented out? This would improve code maintainability. wdyt?

If you decide to keep it, we should add the missing import:

+from airbyte.utils import text_util
tests/integration_tests/cloud/test_cloud_workspaces.py (5)

14-14: Remove unnecessary empty line, wdyt?

This empty line doesn't serve any purpose and can be removed to maintain a cleaner file structure.


19-19: Enhance docstring to be more descriptive, wdyt?

The current docstring mentions "source" but this is a destination test. Consider updating it to something like:

-    """Test deploying a source to a workspace."""
+    """Test deploying a destination to a workspace and verify cleanup.
+    
+    This test verifies that:
+    1. A destination can be successfully deployed with a random name suffix
+    2. The deployed destination can be properly cleaned up
+    """

37-41: Consider parameterizing the source configuration, wdyt?

The source configuration is currently hardcoded. To make the test more maintainable and reusable, consider moving the configuration to a fixture or test constants:

-    cloud_source: CloudSource = cloud_workspace.deploy_source(
-        name="test-source",
-        source=source,
-    )
+    cloud_source: CloudSource = cloud_workspace.deploy_source(
+        name="test-source",
+        source=source,
+        random_name_suffix=True,  # For consistency with other tests
+    )

51-54: Consider adding random_name_suffix for consistency, wdyt?

Other test functions use random_name_suffix=True to prevent naming conflicts. Should we add it here too?

     cloud_source: CloudSource = cloud_workspace.deploy_source(
         name="test-source",
         source=deployable_dummy_source,
+        random_name_suffix=True,
     )

63-63: Update docstring to reflect current functionality, wdyt?

The docstring still mentions "cache" but the function now tests source and destination deployment:

-    """Test deploying a source and cache to a workspace as a new connection."""
+    """Test deploying a source and destination to a workspace as a new connection.
+    
+    Verifies that:
+    1. Source and destination can be deployed with random name suffixes
+    2. A connection can be established between them
+    3. Resources are properly cleaned up using cascade deletion
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 413abf4 and 3e16ee4.

📒 Files selected for processing (4)
  • airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
  • tests/integration_tests/cloud/conftest.py (3 hunks)
  • tests/integration_tests/cloud/test_cloud_sql_reads.py (5 hunks)
  • tests/integration_tests/cloud/test_cloud_workspaces.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
tests/integration_tests/cloud/test_cloud_sql_reads.py

12-12: airbyte.cloud.connectors.CloudSource imported but unused

Remove unused import: airbyte.cloud.connectors.CloudSource

(F401)

tests/integration_tests/cloud/conftest.py

6-6: enum.auto imported but unused

Remove unused import: enum.auto

(F401)

🔇 Additional comments (6)
airbyte/destinations/_translate_dest_to_cache.py (1)

44-44: Typo in function name biqquery_destination_to_cache

The function name contains a typo: "biqquery" should be "bigquery".

tests/integration_tests/cloud/conftest.py (2)

57-80: LGTM! Great security improvement

The transition from API key to client credentials (client_id and client_secret) is a more secure approach, following OAuth best practices.


164-184: LGTM! Well-structured credential management

The session-scoped fixtures for BigQuery and Snowflake credentials are well-implemented with proper:

  • Credential retrieval from secure storage
  • Environment variable setup
  • Cleanup in fixture teardown

Also applies to: 187-205
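The set-then-restore pattern these fixtures follow can be sketched with a small context manager (names are illustrative); a session-scoped pytest fixture would wrap the same logic around its `yield`:

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(values: dict[str, str]):
    """Set environment variables for the duration of the block, then
    restore whatever was there before (including previously unset keys)."""
    previous = {key: os.environ.get(key) for key in values}
    os.environ.update(values)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```

In a conftest, the same body becomes a `@pytest.fixture(scope="session")`, with the `yield` separating setup from teardown.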

tests/integration_tests/cloud/test_cloud_sql_reads.py (2)

174-176: Add cache type verification?

Should we enhance the test by verifying the cache type matches the expected destination? This was suggested in a previous review. wdyt?

 cache = sync_result.get_sql_cache()
 sqlalchemy_url = cache.get_sql_alchemy_url()
 engine: Engine = sync_result.get_sql_engine()
+
+# Verify cache type based on destination
+if deployed_connection_id == "c7b4d838-a612-495a-9d91-a14e477add51":
+    assert isinstance(cache, SnowflakeCache)
+elif deployed_connection_id == "0e1d6b32-b8e3-4b68-91a3-3a314599c782":
+    assert isinstance(cache, BigQueryCache)

140-154: LGTM! Well-structured test parameterization

The test parameterization is clear and well-organized with:

  • Descriptive test IDs
  • Clear skip markers for unsupported destinations
  • Consistent structure across test functions
tests/integration_tests/cloud/test_cloud_workspaces.py (1)

81-82: Are these assertions still relevant, wdyt?

There are commented out assertions that might be important for testing. If they're still relevant, should we uncomment and update them? If not, should we remove them?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (8)
airbyte/destinations/_translate_cache_to_dest.py (4)

11-11: Consider removing the commented import

Hey! I noticed there's a commented out import for StandardInserts. Should we remove it if it's not being used? This would help keep the imports clean and avoid confusion. WDYT? 🤔


43-54: Consider documenting the purpose of duplicate mappings

I see we have both class names (e.g., "BigQueryCache") and lowercase variants (e.g., "bigquery") in the mapping. Could we add a comment explaining why both forms are needed? This would help future maintainers understand the design decision. WDYT? 🤔
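A hypothetical illustration of the dual-key mapping in question (converter and key names are placeholders): both the PyAirbyte cache class name and the lowercase connector-style key resolve to the same converter, so lookups work whether the caller has a cache instance (`type(cache).__name__`) or a plain destination-type string.

```python
def duckdb_cache_to_destination(cache: object) -> dict:
    """Placeholder converter for illustration."""
    return {"destinationType": "duckdb"}


conversion_fn_map = {
    "DuckDBCache": duckdb_cache_to_destination,  # PyAirbyte class name
    "duckdb": duckdb_cache_to_destination,       # connector-style key
}
```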


107-109: Consider documenting the case transformations

I notice we're using .upper() for database and schema names in Snowflake. Could we add a comment explaining why this is necessary? This would help others understand Snowflake's case sensitivity requirements. WDYT? 🤔


130-130: Consider making the dataset location configurable

The BigQuery dataset location is hardcoded to "US". Should we make this configurable through the cache configuration to support other regions? This would provide more flexibility for users with data residency requirements. WDYT? 🤔

tests/integration_tests/cloud/test_cloud_sync.py (1)

72-79: Consider enhancing the error message with more details.

The error handling is good, but would it be helpful to include what types are actually supported? For example:

-                "new_deployable_destination must be a dictionary or a dataclass. "
-                f"Instead, it is a {type(new_deployable_destination)}."
+                "new_deployable_destination must be a dictionary or a dataclass that can be converted to dict. "
+                f"Got {type(new_deployable_destination)}. Supported types: dict, @dataclass-decorated classes."

wdyt?

airbyte/sources/util.py (1)

140-144: Consider enhancing the install_if_missing parameter documentation.

Would it be helpful to explain the behavior when set to False? For example:

-        install_if_missing: Whether to install the source if it is not available locally.
+        install_if_missing: Whether to install the source if it is not available locally.
+            When False, raises an error if the source is not already installed.

wdyt?

airbyte/_util/api_util.py (2)

132-138: Standardize error handling across list functions

The error handling patterns in list_* functions are similar but not identical. Should we extract this into a common helper? Wdyt?

Example approach:

def handle_list_response(response, resource_type: str, workspace_id: str) -> None:
    if not status_ok(response.status_code):
        raise AirbyteError(
            context={
                "workspace_id": workspace_id,
                "resource_type": resource_type,
                "response": response,
            }
        )

Also applies to: 175-181, 215-221, 253-259


616-631: Consider using a configuration class for connection creation

The function has many parameters (hence the noqa comment). Should we introduce a ConnectionConfig class to group these parameters? Wdyt?

Example approach:

@dataclass
class ConnectionConfig:
    name: str
    source_id: str
    destination_id: str
    workspace_id: str | None
    prefix: str
    selected_stream_names: list[str]

def create_connection(
    config: ConnectionConfig,
    *,
    api_root: str,
    client_id: SecretString,
    client_secret: SecretString,
) -> models.ConnectionResponse:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9d52230 and 90bf1a2.

📒 Files selected for processing (9)
  • airbyte/_message_iterators.py (3 hunks)
  • airbyte/_util/api_util.py (18 hunks)
  • airbyte/cloud/workspaces.py (4 hunks)
  • airbyte/destinations/_translate_cache_to_dest.py (1 hunks)
  • airbyte/sources/util.py (3 hunks)
  • tests/integration_tests/cloud/conftest.py (5 hunks)
  • tests/integration_tests/cloud/test_cloud_api_util.py (7 hunks)
  • tests/integration_tests/cloud/test_cloud_sql_reads.py (4 hunks)
  • tests/integration_tests/cloud/test_cloud_sync.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • airbyte/_message_iterators.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration_tests/cloud/test_cloud_sql_reads.py
  • tests/integration_tests/cloud/conftest.py
🔇 Additional comments (15)
airbyte/destinations/_translate_cache_to_dest.py (2)

57-60: Consider combining exception messages in the ValueError

Could we merge the exception messages into a single string to prevent the error message from being displayed as a tuple? WDYT?

-        raise ValueError(
-            "Cannot convert cache type to destination configuration. Cache type not supported. ",
-            f"Supported cache types: {list(conversion_fn_map.keys())}",
-        )
+        raise ValueError(
+            "Cannot convert cache type to destination configuration. Cache type not supported. "
+            f"Supported cache types: {list(conversion_fn_map.keys())}"
+        )
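The difference is easy to see in isolation: multiple arguments to an exception are stored in `.args` and rendered as a tuple, while adjacent string literals concatenate at parse time into a single message.

```python
# Two arguments -> exception.args is a 2-tuple, and str() shows the tuple repr.
err_tuple = ValueError(
    "Cache type not supported. ",
    "Supported cache types: ['duckdb']",
)

# Adjacent literals concatenate -> a single-element args tuple, clean message.
err_single = ValueError(
    "Cache type not supported. "
    "Supported cache types: ['duckdb']"
)
```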

1-133: Overall implementation looks great! 👍

The code is well-structured, properly typed, and implements a clean translation layer between cache and destination configurations. The conversion functions are clear and handle their specific cases well.

tests/integration_tests/cloud/test_cloud_sync.py (2)

35-37: LGTM! Good practice to verify connection first.

Getting the connection before running sync ensures the connection exists and is accessible.


47-49: LGTM! Consistent with the pattern of verifying connection first.

Getting the connection before accessing sync logs follows the same good practice as the previous change.

airbyte/sources/util.py (1)

128-129: LGTM! Consistent parameter addition.

The new parameter maintains consistency with the get_source function's interface.

tests/integration_tests/cloud/test_cloud_api_util.py (3)

159-165: LGTM! Good use of consistent naming and utilities.

The use of text_util.generate_random_suffix() and clear naming conventions makes the test more maintainable.


14-15: LGTM! Good security practice with SecretString.

Proper use of SecretString type for sensitive credentials enhances security.

Also applies to: 20-21, 90-91, 120-121, 155-156


74-84: ⚠️ Potential issue

Critical: Function is using wrong API call

The test function test_list_destinations is actually calling api_util.list_sources() instead of api_util.list_destinations(). This means the test is not testing what it claims to test.

-    result: list[SourceResponse] = api_util.list_sources(
+    result: list[DestinationResponse] = api_util.list_destinations(
         workspace_id=workspace_id,
         api_root=airbyte_cloud_api_root,
         client_id=airbyte_cloud_client_id,
         client_secret=airbyte_cloud_client_secret,
     )
     assert (
         result
         and len(result) > 0
-        and all(isinstance(source, SourceResponse) for source in result)
+        and all(isinstance(destination, DestinationResponse) for destination in result)
     )
airbyte/cloud/workspaces.py (6)

36-43: LGTM! Secure credential handling implementation.

The transition to client credentials with SecretString wrapper provides better security than the previous API key approach.


250-255: Remove redundant selected_streams check

There's a duplicate check for selected_streams. Since we already verify this earlier in the method, should we remove this second check? Wdyt?


97-100: ⚠️ Potential issue

Fix incorrect resource type in error message

The error message uses "destination" for a source deployment.

Apply this fix:

 raise exc.AirbyteDuplicateResourcesError(
-    resource_type="destination",
+    resource_type="source",
     resource_name=name,
 )

178-182: ⚠️ Potential issue

Update isinstance check to use tuple syntax

Passing a union type (`str | CloudSource`) to `isinstance()` is only supported on Python 3.10+; the tuple form works on every interpreter version.

Apply this fix:

-if not isinstance(source, str | CloudSource):
+if not isinstance(source, (str, CloudSource)):

201-204: ⚠️ Potential issue

Update isinstance check to use tuple syntax

Passing a union type (`str | CloudDestination`) to `isinstance()` is only supported on Python 3.10+; the tuple form works on every interpreter version.

Apply this fix:

-if not isinstance(destination, str | CloudDestination):
+if not isinstance(destination, (str, CloudDestination)):
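The two spellings can be compared directly. `CloudSourceStub` below is a hypothetical stand-in for the real `CloudSource` class:

```python
import sys

class CloudSourceStub:
    # Stand-in for airbyte.cloud's CloudSource, used only for illustration.
    pass

src = "my-source-id"

# Tuple form: accepted on all Python versions.
assert isinstance(src, (str, CloudSourceStub))

# Union form: PEP 604 unions in isinstance() require Python 3.10+.
if sys.version_info >= (3, 10):
    assert isinstance(src, str | CloudSourceStub)
```

On interpreters older than 3.10, the union form raises `TypeError`, which is why the tuple form is the more portable choice.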

222-224: 🛠️ Refactor suggestion

Consider making selected_streams a required parameter

Since selected_streams is mandatory, should we make it a required parameter without a default value? Wdyt?

-selected_streams: list[str] | None = None,
+selected_streams: list[str],
airbyte/_util/api_util.py (1)

556-572: Consider using a mapping pattern for destination types

The current implementation checks each destination type individually. Would it be cleaner to use a mapping? Wdyt?

Example approach:

DESTINATION_TYPE_MAPPING = {
    "snowflake": models.DestinationSnowflake,
    "bigquery": models.DestinationBigquery,
    "postgres": models.DestinationPostgres,
    "duckdb": models.DestinationDuckdb,
}

destination_type = raw_response.get("destinationType")
if destination_type in DESTINATION_TYPE_MAPPING:
    model_class = DESTINATION_TYPE_MAPPING[destination_type]
    response.destination_response.configuration = model_class(**raw_configuration)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (18)
tests/integration_tests/test_source_test_fixture.py (2)

48-53: Great addition of descriptive assertion messages! Consider standardizing the message format.

The added error messages will make test failures more actionable. However, I notice slight inconsistencies in message formatting:

  • Some end with periods, others don't
  • Some use single quotes, others don't

What do you think about standardizing these for consistency? Here's a suggestion:

-            f"Column '{internal_column}' should exist in stream data."
+            f"Column '{internal_column}' should exist in stream data"

Also applies to: 64-70, 706-708, 711-713, 718-720


34-34: Excellent type annotation additions! Consider adding docstring type hints too.

The return type annotations make the code more maintainable and help catch type-related issues early. What do you think about also adding type hints in the docstrings for better IDE support? For example:

def test_registry_get() -> None:
    """Test registry get functionality.
    
    Returns:
        None
    """

Also applies to: 100-100, 145-145, 194-194, 214-214, 222-222, 227-227, 237-237, 311-311, 363-363, 381-383, 415-415, 435-435, 488-488, 527-527, 553-553, 575-575, 676-676, 786-786, 845-845, 882-882, 889-889, 899-899, 910-910

tests/integration_tests/test_bigquery_cache.py (2)

17-19: How about enhancing the error message for easier debugging? wdyt?

The assertion looks good, but we could make debugging easier by including the actual values in the error message. Here's a suggestion:

 assert new_bigquery_cache.dataset_name == new_bigquery_cache.schema_name, (
-    "Dataset name should be the same as schema name."
+    f"Dataset name '{new_bigquery_cache.dataset_name}' should be the same as schema name '{new_bigquery_cache.schema_name}'."
 )

25-27: Two small suggestions to improve this assertion, what do you think?

  1. For consistency with the previous assertion style and to avoid unnecessary method calls in tests, we could use property access if available.
  2. The error message could be more descriptive by including the actual values.
-    assert new_bigquery_cache.get_database_name() == new_bigquery_cache.project_name, (
-        "Database name should be the same as project name."
+    assert new_bigquery_cache.database_name == new_bigquery_cache.project_name, (
+        f"Database name '{new_bigquery_cache.database_name}' should be the same as project name '{new_bigquery_cache.project_name}'."
 )

Note: Only apply the property access change if database_name is available as a property. If not, we can keep the method call but still enhance the error message.

tests/unit_tests/test_exceptions.py (1)

24-26: LGTM! Clean formatting change! ✨

The multi-line assertion improves readability. Hey, what do you think about adding a test case for nested exception inheritance? It could help ensure subclasses also follow the naming convention, wdyt?

airbyte/_writers/jsonl.py (1)

37-38: The type hint change looks good! A couple of thoughts...

  1. The string literal type hint "IO[str]" is a good practice for forward compatibility 👍

  2. I notice the comment "Avoiding context manager" - would it be helpful to add a brief docstring explaining why we're managing the file handle manually? This could help future maintainers understand the intentional design choice, wdyt?

examples/run_bigquery_destination.py (1)

22-29: Hey there! Mind if I suggest a small readability improvement? 🤔

The current parentheses structure works, but we could make it even more readable by using a different line break pattern. What do you think about this approach? wdyt?

-bigquery_destination_secret: dict = (
-    GoogleGSMSecretManager(
-        project=AIRBYTE_INTERNAL_GCP_PROJECT,
-        credentials_json=ab.get_secret("GCP_GSM_CREDENTIALS"),
-    )
-    .get_secret(SECRET_NAME)
-    .parse_json()
-)
+bigquery_destination_secret: dict = GoogleGSMSecretManager(
+    project=AIRBYTE_INTERNAL_GCP_PROJECT,
+    credentials_json=ab.get_secret("GCP_GSM_CREDENTIALS"),
+).get_secret(SECRET_NAME).parse_json()
examples/run_bigquery_faker.py (1)

23-30: For consistency, shall we apply the same readability improvement here? 🎨

Similar to the previous file, we could make this more readable. This would keep both files consistent in their style. What are your thoughts? wdyt?

-bigquery_destination_secret: dict = (
-    GoogleGSMSecretManager(
-        project=AIRBYTE_INTERNAL_GCP_PROJECT,
-        credentials_json=ab.get_secret("GCP_GSM_CREDENTIALS"),
-    )
-    .get_secret(SECRET_NAME)
-    .parse_json()
-)
+bigquery_destination_secret: dict = GoogleGSMSecretManager(
+    project=AIRBYTE_INTERNAL_GCP_PROJECT,
+    credentials_json=ab.get_secret("GCP_GSM_CREDENTIALS"),
+).get_secret(SECRET_NAME).parse_json()
airbyte/_util/name_normalizers.py (1)

73-73: LGTM! Consider adding more test cases?

The change to use a raw string for the regex pattern is a good practice. While reviewing the examples in the docstring, I noticed they cover most cases but might miss some edge cases. Would you consider adding test cases for:

  • Unicode characters (e.g., "héllo")
  • Emojis (e.g., "hello👋")
  • Multiple consecutive special characters (e.g., "hello!!!world")

wdyt?

airbyte/secrets/util.py (1)

66-66: LGTM! Consider using TYPE_CHECKING for cleaner imports?

The change to use string literal for type casting is good practice. Since we're already importing TYPE_CHECKING, we could potentially move the SecretManager import under the TYPE_CHECKING block to avoid circular imports. Something like:

if TYPE_CHECKING:
    from collections.abc import Iterable
    from airbyte.secrets.base import SecretManager

This would allow us to use SecretManager directly in the cast. wdyt?

airbyte/_executors/declarative.py (1)

57-57: LGTM! Consider using TypeAlias for manifest type?

The change to use string literal for type casting is good. Since we're dealing with a specific manifest structure, would it make sense to define a type alias for better type safety? Something like:

from typing import TypeAlias

ManifestDict: TypeAlias = dict[str, Any]  # or more specific type hints

This could make the code more maintainable and self-documenting. wdyt?

airbyte/shared/state_providers.py (1)

55-55: LGTM! The additional null checks make the code more robust.

The added checks for state_msg and state_msg.stream help prevent potential null pointer exceptions. Would you consider adding a docstring comment explaining these safety checks for future maintainers? wdyt?

tests/unit_tests/test_anonymous_usage_stats.py (1)

58-67: LGTM! The assertions are now more readable and thorough.

The reformatted assertions with detailed error messages will make debugging failed tests easier. Would you consider adding a test case for partial/incomplete properties to ensure robustness? wdyt?

airbyte/_executors/base.py (1)

165-167: LGTM! Consider clarifying the comment.

The type casting change to string literal looks good. However, the comment "metadata is not None here" might be slightly misleading since metadata could be None but name would be set (due to the earlier validation). What do you think about updating it to be more precise? WDYT?

-        )  # metadata is not None here
+        )  # Either name is set or metadata is valid here
airbyte/destinations/_translate_dest_to_cache.py (2)

146-164: Consider simplifying the password handling logic.

The nested conditionals and multiple paths for password retrieval could be simplified. What do you think about consolidating the logic? WDYT?

-    snowflake_password: str | None = None
-    if (
-        destination_configuration.credentials
-        and hasattr(destination_configuration.credentials, "password")
-        and isinstance(destination_configuration.credentials.password, str)
-    ):
-        destination_password = str(destination_configuration.credentials.password)
-        if "****" in destination_password:
-            try:
-                snowflake_password = get_secret(password_secret_name)
-            except ValueError as ex:
-                raise PyAirbyteSecretNotFoundError(
-                    "Password is required for Snowflake cache, but it was not available."
-                ) from ex
-        else:
-            snowflake_password = get_secret(destination_password)
-    else:
-        snowflake_password = get_secret(password_secret_name)
+    try:
+        if (
+            destination_configuration.credentials
+            and hasattr(destination_configuration.credentials, "password")
+            and isinstance(destination_configuration.credentials.password, str)
+            and "****" not in destination_configuration.credentials.password
+        ):
+            secret_name = str(destination_configuration.credentials.password)
+        else:
+            secret_name = password_secret_name
+        
+        snowflake_password = get_secret(secret_name)
+    except ValueError as ex:
+        raise PyAirbyteSecretNotFoundError(
+            f"Failed to retrieve Snowflake password from secret '{secret_name}'"
+        ) from ex

121-122: Enhance error message for Postgres password.

The error message could be more descriptive. How about including more context? WDYT?

-        raise ValueError("Password is required for Postgres cache.")
+        raise ValueError(
+            "Password is required for Postgres cache. Please ensure it's provided in the destination configuration."
+        )
airbyte/_processors/sql/bigquery.py (1)

102-102: Consider enhancing the type conversion documentation?

The comment "BigQuery uses STRING for all strings" is accurate but could be more detailed. Maybe we could add information about BigQuery's type system and why we're using a string literal here, wdyt?

-        return cast("sqlalchemy.types.TypeEngine", "String")  # BigQuery uses STRING for all strings
+        return cast("sqlalchemy.types.TypeEngine", "String")  # BigQuery uses STRING type for all string/varchar variations, as it doesn't distinguish between different string lengths
airbyte/sources/registry.py (1)

24-26: Consider reducing the blank lines after logger initialization?

The logger initialization looks good, but maybe we could reduce the two blank lines to one to match PEP 8 style, wdyt?

 logger = logging.getLogger("airbyte")
-
-
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 90bf1a2 and 2ab2e68.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (39)
  • airbyte/__init__.py (1 hunks)
  • airbyte/_executors/base.py (2 hunks)
  • airbyte/_executors/declarative.py (1 hunks)
  • airbyte/_executors/util.py (1 hunks)
  • airbyte/_message_iterators.py (5 hunks)
  • airbyte/_processors/sql/bigquery.py (1 hunks)
  • airbyte/_util/api_util.py (18 hunks)
  • airbyte/_util/name_normalizers.py (1 hunks)
  • airbyte/_util/telemetry.py (1 hunks)
  • airbyte/_writers/jsonl.py (1 hunks)
  • airbyte/caches/__init__.py (1 hunks)
  • airbyte/cloud/__init__.py (1 hunks)
  • airbyte/cloud/connections.py (8 hunks)
  • airbyte/cloud/workspaces.py (4 hunks)
  • airbyte/datasets/_base.py (2 hunks)
  • airbyte/datasets/_sql.py (4 hunks)
  • airbyte/destinations/__init__.py (1 hunks)
  • airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
  • airbyte/destinations/base.py (4 hunks)
  • airbyte/progress.py (1 hunks)
  • airbyte/secrets/__init__.py (1 hunks)
  • airbyte/secrets/base.py (3 hunks)
  • airbyte/secrets/util.py (1 hunks)
  • airbyte/shared/sql_processor.py (4 hunks)
  • airbyte/shared/state_providers.py (2 hunks)
  • airbyte/sources/__init__.py (1 hunks)
  • airbyte/sources/registry.py (3 hunks)
  • airbyte/types.py (1 hunks)
  • examples/run_bigquery_destination.py (1 hunks)
  • examples/run_bigquery_faker.py (1 hunks)
  • examples/run_perf_test_reads.py (1 hunks)
  • pyproject.toml (4 hunks)
  • tests/docs_tests/test_docs_checked_in.py (1 hunks)
  • tests/integration_tests/test_all_cache_types.py (4 hunks)
  • tests/integration_tests/test_bigquery_cache.py (1 hunks)
  • tests/integration_tests/test_source_test_fixture.py (28 hunks)
  • tests/unit_tests/test_anonymous_usage_stats.py (1 hunks)
  • tests/unit_tests/test_exceptions.py (1 hunks)
  • tests/unit_tests/test_progress.py (1 hunks)
✅ Files skipped from review due to trivial changes (10)
  • airbyte/cloud/__init__.py
  • airbyte/destinations/__init__.py
  • airbyte/progress.py
  • airbyte/_executors/util.py
  • tests/unit_tests/test_progress.py
  • examples/run_perf_test_reads.py
  • airbyte/caches/__init__.py
  • airbyte/_util/telemetry.py
  • airbyte/secrets/__init__.py
  • airbyte/__init__.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • airbyte/sources/__init__.py
  • airbyte/_message_iterators.py
  • airbyte/destinations/base.py
  • tests/integration_tests/test_all_cache_types.py
  • pyproject.toml
  • airbyte/cloud/connections.py
👮 Files not reviewed due to content moderation or server errors (3)
  • airbyte/cloud/workspaces.py
  • airbyte/_util/api_util.py
  • airbyte/shared/sql_processor.py
🧰 Additional context used
📓 Path-based instructions (2)
examples/run_bigquery_destination.py (1)

Pattern examples/*: These scripts are intended for demonstration purposes only. They are not meant to represent production code. For these scripts, you should prefer brevity over robust error handling and handling of edge cases. These are demos - which should be as simple as possible to handle the 'blessed' use cases.

examples/run_bigquery_faker.py (1)

Pattern examples/*: These scripts are intended for demonstration purposes only. They are not meant to represent production code. For these scripts, you should prefer brevity over robust error handling and handling of edge cases. These are demos - which should be as simple as possible to handle the 'blessed' use cases.

🔇 Additional comments (14)
tests/integration_tests/test_source_test_fixture.py (3)

21-24: LGTM! Clean import additions.

The new imports for text_util and CacheBase are well-organized and necessary for the changes.


419-419: Nice improvement using text_util for cache name generation!

Good call replacing ulid with text_util.generate_random_suffix(). This makes the code more consistent with the rest of the codebase and uses our internal utilities effectively.

Also applies to: 439-439, 496-496


77-78: Smart update to assert_data_matches_cache parameters!

Good improvement changing SqlProcessorBase to CacheBase and making streams optional. This makes the function more flexible and reusable across different cache implementations.

tests/docs_tests/test_docs_checked_in.py (1)

25-27: LGTM! Nice formatting improvement! 👍

The multi-line formatting makes the assertion message more readable while maintaining the same functionality.

airbyte/datasets/_base.py (2)

14-15: Nice optimization with TYPE_CHECKING! 👍

Moving these imports under TYPE_CHECKING is a great practice for reducing runtime overhead. The code looks good!


42-42: Let's verify the type casting change impact

The switch to string literal type casting ("Iterator[dict[str, Any]]") looks good and follows best practices for forward references. However, we should verify that this doesn't affect any downstream type checking. Would you like me to help check for any potential impacts?

✅ Verification successful

Type casting change looks safe and consistent with codebase patterns

The string literal type casting is perfectly aligned with how Iterator types are used throughout the codebase. I found that:

  • The same Iterator[dict[str, Any]] type is consistently used across multiple files (_base.py, _sql.py, _lazy.py)
  • The DataFrame construction pattern in _base.py is isolated and doesn't have any dependent type patterns elsewhere
  • No other DataFrame constructions in the codebase rely on this specific typing pattern
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any type-related imports or usage of Iterator in the codebase

# Search for Iterator type usage
rg "Iterator\[" --type python

# Search for pandas DataFrame construction with Iterator
rg "DataFrame\(" -A 2 --type python

Length of output: 131


Script:

#!/bin/bash
# Let's try with correct file type flags and search patterns

# Search for Iterator type usage
rg "Iterator\[" -t py

# Search for DataFrame construction patterns
rg "DataFrame\(" -t py -A 2

# Also check for typing imports
rg "^from typing import.*Iterator" -t py

Length of output: 3537

airbyte/shared/state_providers.py (1)

99-99: LGTM! The enhanced flexibility in handling missing states is valuable.

The addition of None as a valid option for not_found parameter makes sense. Should we verify that all existing callers of this method are compatible with this change?

✅ Verification successful

The previous ast-grep pattern might have been too specific. Let's try with ripgrep to get a broader view of the usage patterns and understand the context better.


The change is safe and all existing callers are compatible! 🎉

Looking at the codebase usage:

  • Most calls use the default behavior (no explicit not_found parameter)
  • One caller in test_state_handling.py already uses None
  • The JoinedStateProvider internally uses None when delegating to the secondary provider

The change is backward compatible since it only adds None as an option while preserving the existing "raise" default.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all calls to get_stream_state to verify compatibility
ast-grep --pattern 'get_stream_state($$$)'

Length of output: 44


Script:

#!/bin/bash
# Search for all calls to get_stream_state with ripgrep for better coverage
rg "get_stream_state\(" -A 2

# Also search for the class definition to understand the context
rg "class.*StateProvider" -A 5

Length of output: 3308

airbyte/types.py (1)

44-44: LGTM! The string literal type hint is more precise.

The change to use "str" instead of str in cast follows type annotation best practices. Should we apply this consistently to other similar casts in the codebase for uniformity? wdyt?

✅ Verification successful

The previous ast-grep pattern might have been too specific. Let's try a broader search to find all cast usages and then analyze the type hints being used. This will help us verify if there's inconsistency in type casting patterns across the codebase.


The codebase already follows a consistent pattern with string literal type hints!

I noticed that all string type casts in the codebase are already using the "str" string literal format, as seen in:

  • airbyte/_util/telemetry.py: cast("str", result)
  • airbyte/cloud/connections.py: cast("str", self._source_id)
  • airbyte/cloud/connections.py: cast("str", self._destination_id)

The change you made aligns perfectly with the existing codebase conventions. No further updates needed! 😊

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other cast calls that might need similar updates
ast-grep --pattern 'cast(str, $$$)'

Length of output: 37


Script:

#!/bin/bash
# Search for cast calls with str using ripgrep for better pattern matching
rg "cast\(str," -A 2

# Also search for string literal type hints in cast
rg 'cast\("str",' -A 2

# Try a more flexible ast-grep pattern that might catch both variants
ast-grep --pattern 'cast($_, $_)'

Length of output: 3546

airbyte/destinations/_translate_dest_to_cache.py (1)

44-44: Typo in function name biqquery_destination_to_cache

The function name contains a typo: "biqquery" should be "bigquery".

airbyte/datasets/_sql.py (1)

44-44: LGTM! Type hint changes look good.

The changes to use string literals for type casting and the reordering of union types in the parameter hints are consistent with best practices. These changes will help prevent circular imports and improve type checking performance.

Also applies to: 89-89, 101-101, 150-150

airbyte/secrets/base.py (2)

127-134: LGTM! Nice parameter naming improvement.

The parameter rename from _core_schema to core_schema_ aligns better with Pydantic's conventions, and the unused parameter is properly marked.


229-229: LGTM! Good use of string literal type annotation.

The change to use a string literal type annotation for SecretString is a good practice to avoid potential forward reference issues.

airbyte/sources/registry.py (2)

283-283: LGTM! Good type annotation update.

The change to ConnectorMetadata | None follows modern Python type annotation conventions.


321-321: LGTM! Helpful logging additions.

The new logging statements provide clear context about Docker detection and its impact on connector availability.

Also applies to: 325-325

tests/integration_tests/test_bigquery_cache.py (outdated comment, resolved)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (7)
tests/integration_tests/cloud/test_cloud_workspaces.py (1)

15-25: Consider enhancing test assertions, wdyt?

The test looks good but could be more robust with additional assertions to verify:

  • The destination name contains the random suffix
  • The destination type matches the dummy destination type
airbyte/destinations/_translate_dest_to_cache.py (2)

63-65: Enhance error message with destination type, wdyt?

The error message could be more helpful by including the actual destination type that was attempted to be used.

-                f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
+                f"Missing 'destinationType' in configuration. Available keys: {list(destination_configuration.keys())}. "
+                f"Supported types: {list(conversion_fn_map.keys())}"

146-164: Consider simplifying password handling logic

The password retrieval logic has multiple branches and nested conditions. Consider extracting this into a separate helper function for better maintainability and testability.

+def _get_snowflake_password(
+    credentials: Any,
+    password_secret_name: str
+) -> str:
+    """Extract Snowflake password from credentials or secrets."""
+    if (
+        credentials
+        and hasattr(credentials, "password")
+        and isinstance(credentials.password, str)
+    ):
+        password = str(credentials.password)
+        return get_secret(password_secret_name if "****" in password else password)
+    return get_secret(password_secret_name)
+
 def snowflake_destination_to_cache(
     destination_configuration: DestinationSnowflake | dict[str, Any],
     password_secret_name: str = SNOWFLAKE_PASSWORD_SECRET_NAME,
 ) -> SnowflakeCache:
     if isinstance(destination_configuration, dict):
         destination_configuration = DestinationSnowflake(**destination_configuration)
-
-    snowflake_password: str | None = None
-    if (
-        destination_configuration.credentials
-        and hasattr(destination_configuration.credentials, "password")
-        and isinstance(destination_configuration.credentials.password, str)
-    ):
-        destination_password = str(destination_configuration.credentials.password)
-        if "****" in destination_password:
-            try:
-                snowflake_password = get_secret(password_secret_name)
-            except ValueError as ex:
-                raise PyAirbyteSecretNotFoundError(
-                    "Password is required for Snowflake cache, but it was not available."
-                ) from ex
-        else:
-            snowflake_password = get_secret(destination_password)
-    else:
-        snowflake_password = get_secret(password_secret_name)
+    try:
+        snowflake_password = _get_snowflake_password(
+            destination_configuration.credentials,
+            password_secret_name
+        )
+    except ValueError as ex:
+        raise PyAirbyteSecretNotFoundError(
+            "Password is required for Snowflake cache, but it was not available."
+        ) from ex
tests/integration_tests/cloud/test_cloud_api_util.py (2)

44-46: Enhance workspace list assertions, wdyt?

Consider adding more specific assertions to verify:

  • The provided workspace_id exists in the returned list
  • The workspace properties (name, status) are as expected
     assert result
     assert len(result) > 0
     assert all(isinstance(workspace, WorkspaceResponse) for workspace in result)
+    assert any(w.workspace_id == workspace_id for w in result)

Line range hint 152-228: Consider using a context manager for resource cleanup

The test creates multiple resources that need cleanup. To ensure cleanup happens even if the test fails, consider using a context manager or fixture.

Example approach:

@contextmanager
def temporary_connection_resources(
    workspace_id: str,
    api_root: str,
    client_id: SecretString,
    client_secret: SecretString,
    motherduck_api_key: str,
) -> Generator[tuple[SourceResponse, DestinationResponse, ConnectionResponse], None, None]:
    """Create and cleanup source, destination and connection for testing."""
    source = destination = connection = None
    try:
        # Create resources...
        yield source, destination, connection
    finally:
        # Cleanup resources in reverse order...
        for resource in filter(None, [connection, source, destination]):
            try:
                # Delete resource...
                pass
            except Exception:
                logger.exception(f"Failed to cleanup {resource}")
airbyte/_util/api_util.py (2)

558-568: Consider enhancing destination type handling.

The current mapping approach works but could be more maintainable. Would it make sense to:

  1. Move the mapping to a module-level constant?
  2. Consider using a factory pattern or registry for better extensibility?

Wdyt?


175-181: Consider reducing error handling duplication.

The error handling pattern is repeated across list functions. Would it make sense to extract this into a reusable helper function? Something like:

def handle_list_response(response: Any, resource_type: str, workspace_id: str) -> None:
    if not status_ok(response.status_code):
        raise AirbyteError(
            context={
                "workspace_id": workspace_id,
                "response": response,
            }
        )

Wdyt?

Also applies to: 215-221, 253-259

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8a1092e and 452bc0e.

📒 Files selected for processing (8)
  • airbyte/_util/api_util.py (18 hunks)
  • airbyte/caches/base.py (4 hunks)
  • airbyte/cloud/connections.py (8 hunks)
  • airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
  • tests/integration_tests/cloud/test_cloud_api_util.py (7 hunks)
  • tests/integration_tests/cloud/test_cloud_sql_reads.py (5 hunks)
  • tests/integration_tests/cloud/test_cloud_workspaces.py (2 hunks)
  • tests/integration_tests/test_bigquery_cache.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/integration_tests/test_bigquery_cache.py
  • airbyte/cloud/connections.py
  • tests/integration_tests/cloud/test_cloud_sql_reads.py
🔇 Additional comments (6)
tests/integration_tests/cloud/test_cloud_workspaces.py (1)

80-84: ⚠️ Potential issue

Fix incorrect assertion type

The assertion is checking for SourceResponse instead of DestinationResponse. This could mask issues with the returned data type.

-        and all(isinstance(source, SourceResponse) for source in result)
+        and all(isinstance(destination, DestinationResponse) for destination in result)

Likely invalid or redundant comment.

airbyte/caches/base.py (4)

65-74: LGTM! Clean architectural change.

The shift from deployment-specific attributes to a destination pairing mechanism improves the design by better separating concerns and making the relationship between caches and destinations more explicit.


240-244: LGTM! Nice Pythonic implementation.

The __len__ method provides a clean way to get the number of streams using Python's built-in len() function.


245-252: LGTM! Good defensive programming.

Making caches always truthy prevents subtle bugs where empty caches might be incorrectly treated as False in boolean contexts.
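The subtle bug being prevented is worth spelling out: when a class defines `__len__` but not `__bool__`, Python falls back to `__len__` for truthiness, so an empty cache silently evaluates as `False`. A minimal sketch (using hypothetical stand-in classes, not the real `CacheBase`):

```python
class LenOnlyCache:
    """Defines __len__ only; empty instances become falsy."""
    def __init__(self, streams):
        self.streams = streams

    def __len__(self):
        return len(self.streams)


class TruthyCache(LenOnlyCache):
    """Adds __bool__ so a cache object is always 'present'."""
    def __bool__(self):
        # A cache exists even when it holds zero streams.
        return True


assert not LenOnlyCache([])        # surprising: empty cache is falsy
assert TruthyCache([])             # fixed: always truthy
assert len(TruthyCache([])) == 0   # len() still reports stream count
```

With the explicit `__bool__`, code like `if cache: ...` behaves the same whether the cache is empty or populated.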


296-298: Verify the impact of return type change.

The return type change from DatasetBase to CachedDataset is more specific, but could this be a breaking change for existing clients? Wdyt?

✅ Verification successful

Let me gather more information about the inheritance hierarchy and usage of CachedDataset.


The return type change is safe to proceed with

From the codebase analysis, I can see that CachedDataset inherits from SQLDataset, which in turn inherits from DatasetBase. All client code already expects and checks for CachedDataset specifically, as evidenced by the type hints and isinstance checks in the test files and actual usage. The change makes the return type more precise without breaking the Liskov Substitution Principle.

Key observations:

  • CachedDataset is the expected concrete type in ReadResult, SyncResult, and test assertions
  • No code was found that relies on the base DatasetBase type specifically
  • The inheritance chain (CachedDataset -> SQLDataset -> DatasetBase) preserves type compatibility
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of DatasetBase type hints or isinstance checks
rg -l "DatasetBase" | xargs -I {} rg -A 2 -B 2 "DatasetBase" {}

Length of output: 1181


Script:

#!/bin/bash
# Check if CachedDataset inherits from DatasetBase and its usage
ast-grep --pattern 'class CachedDataset($_) {
  $$$
}'

# Search for CachedDataset usage
rg -A 2 -B 2 "CachedDataset"

Length of output: 14588

airbyte/_util/api_util.py (1)

52-63: LGTM! Security improvement with client credentials.

The switch from API key to client credentials authentication is a good security practice, providing better access control and token management.
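For context, the client-credentials grant exchanges a client ID and secret for a short-lived access token. The sketch below builds the standard OAuth2 request payload only — the field names follow RFC 6749 conventions, and the actual Airbyte token endpoint and SDK wrapper are assumptions not shown here:

```python
def build_token_request(client_id: str, client_secret: str) -> dict:
    """Build an OAuth2 client-credentials grant payload.

    Illustrative only: the real PyAirbyte code delegates this exchange to
    the airbyte_api SDK rather than constructing the payload by hand.
    """
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }


payload = build_token_request("my-client-id", "my-client-secret")
```

Because the resulting token expires, a leaked token is far less damaging than a leaked long-lived API key, which is the access-control benefit noted above.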

@aaronsteers aaronsteers marked this pull request as ready for review December 10, 2024 01:12
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
airbyte/cloud/connectors.py (1)

35-36: Update the docstring to reflect both sources and destinations

Currently, the docstring for connector_url mentions "source connector," but since this class represents both sources and destinations, would it make sense to change it to "Get the URL of the connector" for clarity? Wdyt?

airbyte/cloud/workspaces.py (1)

36-43: Consider changing type annotations of client_id and client_secret to str

In the CloudWorkspace class, client_id and client_secret are annotated as SecretString, and then they're wrapped again with SecretString in the __post_init__ method. To avoid wrapping an already wrapped object, should we change the type annotations to str? This way, the constructor accepts strings and wraps them appropriately. Wdyt?
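The suggestion above can be sketched as follows. `SecretString` here is a minimal stand-in for the real `airbyte.secrets` class, and the idempotent wrapping guard is an illustrative choice, not the PR's actual implementation:

```python
from dataclasses import dataclass


class SecretString(str):
    """Stand-in for airbyte.secrets.SecretString: a str that hides its value."""

    def __repr__(self) -> str:
        return "SecretString(****)"


@dataclass
class CloudWorkspace:
    workspace_id: str
    client_id: str  # annotated as plain str, per the suggestion
    client_secret: str

    def __post_init__(self) -> None:
        # Wrap exactly once; skip values that are already wrapped.
        if not isinstance(self.client_id, SecretString):
            self.client_id = SecretString(self.client_id)
        if not isinstance(self.client_secret, SecretString):
            self.client_secret = SecretString(self.client_secret)


ws = CloudWorkspace("ws-123", "id", "secret")
```

The constructor then accepts ordinary strings, and the secret values are redacted in `repr()` output while still comparing equal to their underlying strings.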

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 452bc0e and 349df46.

📒 Files selected for processing (4)
  • airbyte/cloud/connectors.py (1 hunks)
  • airbyte/cloud/workspaces.py (4 hunks)
  • pyproject.toml (4 hunks)
  • tests/integration_tests/cloud/test_cloud_sql_reads.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml

@aaronsteers aaronsteers changed the title from "Feat: Add previously experimental Cloud "deploy" functionality (DRAFT - DO NOT MERGE)" to "Feat: Add Cloud deploy functionality out of experimental status" Dec 10, 2024
@aaronsteers aaronsteers enabled auto-merge (squash) December 10, 2024 01:30
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (10)
tests/integration_tests/cloud/test_cloud_api_util.py (3)

17-47: Consider enhancing workspace test assertions, wdyt?

The new workspace tests look good, but we could make them more robust. Maybe add assertions for specific workspace properties beyond just the ID? For example:

  • Workspace name
  • Workspace status
  • Organization ID

This would help catch any unexpected changes in the workspace structure.
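A sketch of what those richer assertions might look like. The attribute names (`workspace_id`, `name`) are assumptions based on common Airbyte API response shapes, not the verified `WorkspaceResponse` schema:

```python
from types import SimpleNamespace


def assert_workspace_shape(workspace) -> None:
    # Check more than just the ID, per the suggestion above.
    assert workspace.workspace_id, "workspace_id should be non-empty"
    assert isinstance(workspace.name, str) and workspace.name, "name should be set"


# Stand-in for a real API response object in this sketch:
fake = SimpleNamespace(workspace_id="ws-123", name="My Workspace")
assert_workspace_shape(fake)
```

A helper like this keeps the extra checks in one place so every workspace test catches structural regressions consistently.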


Line range hint 153-227: Should we improve test isolation for the connection test?

The connection test is doing a lot of setup and teardown. What do you think about:

  1. Moving the source/destination creation to fixtures?
  2. Using pytest.fixture(autouse=True) for cleanup?

This would make the test more focused and resilient to failures during setup/teardown.

Example:

@pytest.fixture
def test_source(workspace_id, airbyte_cloud_api_root, airbyte_cloud_client_id, airbyte_cloud_client_secret):
    source = api_util.create_source(...)
    yield source
    api_util.delete_source(source.source_id, ...)

@pytest.fixture
def test_destination(workspace_id, airbyte_cloud_api_root, airbyte_cloud_client_id, airbyte_cloud_client_secret, motherduck_api_key):
    destination = api_util.create_destination(...)
    yield destination
    api_util.delete_destination(destination.destination_id, ...)

Line range hint 88-227: Should we add error handling for cleanup operations?

The delete operations in teardown could fail silently. Consider wrapping them in try-except blocks to:

  1. Log cleanup failures
  2. Continue with remaining cleanup even if one fails
  3. Maybe retry failed cleanups

This would help prevent orphaned test resources, wdyt?

Example:

def safe_delete_source(source_id, **kwargs):
    try:
        api_util.delete_source(source_id, **kwargs)
    except Exception as e:
        logger.error(f"Failed to delete source {source_id}: {e}")
        # Optionally implement retry logic here
airbyte/destinations/_translate_dest_to_cache.py (6)

2-2: Consider enhancing the module docstring?

The current docstring "Cloud destinations for Airbyte" could be more descriptive. Maybe something like "Translation utilities for converting Airbyte destination configurations to cache implementations"? WDYT?


52-62: Consider simplifying the destination type extraction?

The nested attribute checks could be simplified using a helper function. Something like this maybe? WDYT?

-            destination_type = (
-                destination_configuration.get("DESTINATION_TYPE")
-                or destination_configuration["destinationType"]
-            )
-            if hasattr(destination_type, "value"):
-                destination_type = destination_type.value
-            elif hasattr(destination_type, "_value_"):
-                destination_type = destination_type._value_
-            else:
-                destination_type = str(destination_type)
+            def get_value(obj):
+                if hasattr(obj, "value"): return obj.value
+                if hasattr(obj, "_value_"): return obj._value_
+                return str(obj)
+            
+            destination_type = get_value(
+                destination_configuration.get("DESTINATION_TYPE")
+                or destination_configuration["destinationType"]
+            )

63-65: Enhance the error message with available types?

The error message could be more helpful by including the available destination type keys. Maybe something like this? WDYT?

-                f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
+                f"Missing destination type. Expected 'DESTINATION_TYPE' or 'destinationType' in configuration. "
+                f"Available keys: {list(destination_configuration.keys())}."

153-171: Consider simplifying Snowflake password handling logic?

The nested conditions for password handling could be simplified using early returns. Something like this maybe? WDYT?

-    snowflake_password: str | None = None
-    if (
-        destination_configuration.credentials
-        and hasattr(destination_configuration.credentials, "password")
-        and isinstance(destination_configuration.credentials.password, str)
-    ):
-        destination_password = str(destination_configuration.credentials.password)
-        if "****" in destination_password:
-            try:
-                snowflake_password = get_secret(password_secret_name)
-            except ValueError as ex:
-                raise PyAirbyteSecretNotFoundError(
-                    "Password is required for Snowflake cache, but it was not available."
-                ) from ex
-        else:
-            snowflake_password = get_secret(destination_password)
-    else:
-        snowflake_password = get_secret(password_secret_name)
+    if not (destination_configuration.credentials
+            and hasattr(destination_configuration.credentials, "password")
+            and isinstance(destination_configuration.credentials.password, str)):
+        return get_secret(password_secret_name)
+
+    destination_password = str(destination_configuration.credentials.password)
+    if "****" in destination_password:
+        try:
+            return get_secret(password_secret_name)
+        except ValueError as ex:
+            raise PyAirbyteSecretNotFoundError(
+                "Password is required for Snowflake cache, but it was not available."
+            ) from ex
+
+    return get_secret(destination_password)

127-127: Consider adding port range validation?

Maybe we could add validation to ensure the port number is within a valid range (0-65535)? WDYT?

-    port: int = int(destination_configuration.port) if destination_configuration.port else 5432
+    port: int = int(destination_configuration.port) if destination_configuration.port else 5432
+    if not (0 <= port <= 65535):
+        raise ValueError(f"Invalid port number: {port}. Must be between 0 and 65535.")

173-173: Consider validating Snowflake account format?

The account extraction assumes a specific format with ".snowflakecomputing". Should we add validation to ensure the host format is correct? WDYT?

+    if ".snowflakecomputing" not in destination_configuration.host:
+        raise ValueError(
+            f"Invalid Snowflake host format: {destination_configuration.host}. "
+            "Expected format: <account>.snowflakecomputing.com"
+        )
airbyte/destinations/_translate_cache_to_dest.py (1)

78-79: Update the docstring to reflect the correct cache type

The docstring mentions "DuckDB cache," but this function is for the "MotherDuck cache." Should we update it to accurately describe the function? WDYT?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 349df46 and 1bdcab1.

📒 Files selected for processing (3)
  • airbyte/destinations/_translate_cache_to_dest.py (1 hunks)
  • airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
  • tests/integration_tests/cloud/test_cloud_api_util.py (7 hunks)
🔇 Additional comments (3)
tests/integration_tests/cloud/test_cloud_api_util.py (1)

68-84: Great fix on the destinations test! 🎉

The test is now correctly using list_destinations() instead of list_sources(). The assertions are also properly typed with DestinationResponse. Nice work on maintaining consistency with the other list operation tests!

airbyte/destinations/_translate_dest_to_cache.py (1)

1-180: Overall implementation looks solid!

The code is well-structured with good error handling and type hints. The suggested improvements are mostly optional refinements to make the code even more robust and maintainable.

airbyte/destinations/_translate_cache_to_dest.py (1)

56-60: Consider combining the exception messages into a single string

Currently, the ValueError is raised with multiple strings, which may cause the error message to be displayed as a tuple. How about merging them into a single string to improve readability? WDYT?
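The tuple-rendering behavior is easy to demonstrate: when an exception is constructed with multiple arguments, `str()` formats its `args` tuple with parentheses and quotes instead of a single sentence.

```python
# Constructed with two strings: args becomes a 2-tuple.
multi = ValueError("Unsupported cache type.", "Expected one of: duckdb, motherduck.")

# Constructed with one merged string: str() yields the readable message.
single = ValueError("Unsupported cache type. Expected one of: duckdb, motherduck.")
```

This is why merging the fragments into one string (or using implicit string concatenation across lines) gives a cleaner traceback.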

airbyte/destinations/_translate_cache_to_dest.py (outdated comment, resolved)
@aaronsteers
Copy link
Contributor Author

aaronsteers commented Dec 10, 2024

/fix-pr

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.
(This job requires that the PR author has "Allow edits from maintainers" enabled.)

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
airbyte/destinations/_translate_cache_to_dest.py (2)

Line range hint 170-170: Consider making account extraction more robust.

The current account extraction assumes a specific URL format. Would it be helpful to add validation or error handling for cases where the URL format differs? WDYT?

-        account=destination_configuration.host.split(".snowflakecomputing")[0],
+        account=destination_configuration.host.split(".snowflakecomputing.com")[0] if ".snowflakecomputing.com" in destination_configuration.host else destination_configuration.host,

128-128: Consider making dataset location configurable.

The dataset location is currently hardcoded to "US". Should we make this configurable through the destination configuration? WDYT?

airbyte/destinations/_translate_dest_to_cache.py (1)

48-59: Consider simplifying destination type extraction.

The current implementation has multiple checks for different attribute types. Could we simplify this using a helper function? WDYT?

+def _extract_destination_type(destination_type: Any) -> str:
+    """Extract string value from destination type."""
+    if hasattr(destination_type, "value"):
+        return destination_type.value
+    if hasattr(destination_type, "_value_"):
+        return destination_type._value_
+    return str(destination_type)
+
 if isinstance(destination_configuration, dict):
     try:
         destination_type = (
             destination_configuration.get("DESTINATION_TYPE")
             or destination_configuration["destinationType"]
         )
-        if hasattr(destination_type, "value"):
-            destination_type = destination_type.value
-        elif hasattr(destination_type, "_value_"):
-            destination_type = destination_type._value_
-        else:
-            destination_type = str(destination_type)
+        destination_type = _extract_destination_type(destination_type)
airbyte/_util/api_util.py (1)

552-570: Consider improving destination type mapping.

The current implementation uses a hardcoded dictionary for destination type mapping. Would it be more maintainable to move this to a configuration or constant? WDYT?

+DESTINATION_TYPE_MAPPING = {
+    "snowflake": models.DestinationSnowflake,
+    "bigquery": models.DestinationBigquery,
+    "postgres": models.DestinationPostgres,
+    "duckdb": models.DestinationDuckdb,
+}
+
 if status_ok(response.status_code) and response.destination_response:
     raw_response: dict[str, Any] = json.loads(response.raw_response.text)
     raw_configuration: dict[str, Any] = raw_response["configuration"]
     destination_type = raw_response.get("destinationType")
-    destination_mapping = {
-        "snowflake": models.DestinationSnowflake,
-        "bigquery": models.DestinationBigquery,
-        "postgres": models.DestinationPostgres,
-        "duckdb": models.DestinationDuckdb,
-    }
-    if destination_type in destination_mapping:
-        response.destination_response.configuration = destination_mapping[destination_type](
+    if destination_type in DESTINATION_TYPE_MAPPING:
+        response.destination_response.configuration = DESTINATION_TYPE_MAPPING[destination_type](
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ec5398b and 93fb13c.

📒 Files selected for processing (3)
  • airbyte/_util/api_util.py (18 hunks)
  • airbyte/destinations/_translate_cache_to_dest.py (1 hunks)
  • airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
🔇 Additional comments (5)
airbyte/destinations/_translate_cache_to_dest.py (4)

36-61: LGTM! Clean and maintainable implementation.

The mapping approach makes it easy to add new cache types, and the error message is informative.


64-71: LGTM! Simple and correct implementation.

The function correctly maps the required fields from DuckDB cache to destination configuration.


74-82: LGTM! Secure handling of sensitive data.

Good use of SecretString for the API key and consistent schema handling with DuckDB.


Line range hint 120-135: LGTM! Thorough implementation with good defaults.

The function includes appropriate defaults for port and schema, with proper password validation.

airbyte/_util/api_util.py (1)

54-65: LGTM! Secure implementation of client credentials authentication.

Good transition from API key to client credentials-based authentication.

@aaronsteers aaronsteers merged commit 5e472fc into main Dec 10, 2024
18 checks passed
@aaronsteers aaronsteers deleted the aj/feat/cloud-deploy-features-from-experimental branch December 10, 2024 02:48