Feat: Add Cloud deploy functionality out of experimental status #419
Conversation
📝 Walkthrough

The pull request introduces substantial changes to the Airbyte codebase, focusing on enhancing API interactions and transitioning from API key-based to client credentials-based authentication. Key updates include the addition of new utility functions for listing resources, modifications to existing functions to accommodate the new authentication model, and improvements in error handling and logic across various classes. New utility functions for text manipulation are also introduced, while several test files are updated to align with these changes, ensuring consistency and clarity in the testing framework.
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (11)
airbyte/_util/text_util.py (3)
`9-11`: The `generate_ulid()` function looks good, but how about enhancing the docstring?

The function is well-implemented, but the docstring could be more informative. What do you think about expanding it to explain what a ULID is and its characteristics? Something like this, perhaps?

```python
def generate_ulid() -> str:
    """Generate a new ULID (Universally Unique Lexicographically Sortable Identifier).

    A ULID is a 128-bit identifier that combines a timestamp with randomly
    generated bits, resulting in a sortable, unique identifier that's more
    compact than a UUID.

    Returns:
        str: A 26-character string representation of the ULID.
    """
    return str(ulid.ULID())
```

This would provide more context for users of the function. WDYT?
`14-22`: Great implementation of `generate_random_suffix()`! How about a tiny optimization?

The function looks solid, and I love the detailed docstring explaining its behavior and limitations. The use of ULID for a sortable suffix is clever!

Just a small suggestion: we could simplify the implementation slightly by using string slicing in a single line. What do you think about this?

```python
def generate_random_suffix() -> str:
    """Generate a random suffix for use in temporary names.

    By default, this function generates a ULID and returns a 9-character string
    which will be monotonically sortable. It is not guaranteed to be unique but
    is sufficient for small-scale and medium-scale use cases.
    """
    return generate_ulid()[:6] + generate_ulid()[-3:]
```

This achieves the same result but in a slightly more concise way. WDYT?
`1-22`: The overall structure of the file looks great! Any plans for more text utilities?

I really like how clean and focused this file is. The two functions are well-organized and clearly related. Great job on keeping it simple and to the point!
Just curious, do you have any plans to add more text utility functions in the future? This file seems like a great place for them. Maybe something for text sanitization, truncation, or other common text operations you might need across the project? No pressure, just thinking ahead!
Keep up the awesome work! 😊
airbyte/exceptions.py (2)
`489-494`: LGTM! New exception class looks good.

The new `AirbyteDuplicateResourcesError` class is well-structured and follows the established pattern. It provides a clear way to handle duplicate resource scenarios.

Quick thought: Would it be helpful to add a default value for the `guidance` attribute, similar to other exception classes in this file? Something like "Please choose a unique name for the resource." wdyt?
`497-497`: Hmm, duplicate "Custom Warnings" sections?

I noticed we now have two "Custom Warnings" sections in the file. The new one here and another at the end of the file. Should we consider consolidating these for better organization? Maybe move `AirbyteMultipleResourcesError` to the existing section at the end? What do you think?

airbyte/sources/base.py (1)
`54-54`: Simplified `connector_type` attribute, but lost type hinting. Thoughts on keeping both?

The `connector_type` attribute has been simplified from a type-annotated literal to a simple string assignment. While this makes the code cleaner, we've lost the type hinting.

What do you think about keeping the type hint for better IDE support and static type checking? We could do something like:

```python
connector_type: Literal["source"] = "source"
```

This way, we maintain the simplicity while preserving the type information. WDYT?
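To illustrate why the annotation matters: with `Literal`, a type checker flags any value other than `"source"`, while a plain `str` accepts anything. A standalone sketch (these stub classes are not the real PyAirbyte classes):

```python
from typing import Literal


class Source:
    # mypy/pyright reject any assignment other than the literal "source";
    # at runtime this behaves like a plain class attribute.
    connector_type: Literal["source"] = "source"


class Destination:
    connector_type: Literal["destination"] = "destination"


print(Source.connector_type, Destination.connector_type)  # source destination
```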
airbyte/cloud/connectors.py (1)
`39-39`: Consider updating docstring to generalize connector type—wdyt?

Since the `connector_url` method applies to both sources and destinations, would it make sense to update the docstring to "Get the URL of the connector"? Let me know your thoughts.

airbyte/cloud/connections.py (1)
`209-219`: Enhance the method docstring to reflect cascading deletions.

Since `cascade_delete_source` and `cascade_delete_destination` allow deleting the source and destination, should we update the method description to reflect this additional functionality? Wdyt?

airbyte/_util/api_util.py (3)
`104-105`: Suggestion: Clarify Error Message

The error message could be more precise. Maybe rephrase it to "You can provide either 'name' or 'name_filter', but not both." to enhance clarity. Wdyt?
`143-143`: Update Docstring in `list_sources`

The docstring currently says "Get a connection." Should we update it to "List sources." to reflect the function's purpose? Wdyt?
`179-179`: Update Docstring in `list_destinations`

The docstring states "Get a connection." Consider updating it to "List destinations." to accurately describe the function. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- airbyte/_util/api_util.py (17 hunks)
- airbyte/_util/text_util.py (1 hunks)
- airbyte/caches/base.py (0 hunks)
- airbyte/cloud/connections.py (5 hunks)
- airbyte/cloud/connectors.py (1 hunks)
- airbyte/cloud/experimental.py (0 hunks)
- airbyte/cloud/workspaces.py (6 hunks)
- airbyte/exceptions.py (1 hunks)
- airbyte/sources/base.py (1 hunks)
💤 Files with no reviewable changes (2)
- airbyte/caches/base.py
- airbyte/cloud/experimental.py
🧰 Additional context used
🔇 Additional comments (16)
airbyte/_util/text_util.py (1)
`1-7`: LGTM! The file header and imports look good.

The copyright notice, module docstring, and imports are all appropriate for the functionality provided. Nice job using `from __future__ import annotations` for better type hinting support! 👍

airbyte/exceptions.py (1)
`489-489`: Thumbs up for the indentation fix!

Nice catch on aligning the `resource_name_or_id` attribute. It's these little details that keep our codebase tidy and consistent. 👍

airbyte/sources/base.py (2)
`54-54`: Overall, good cleanup! Any other simplifications in mind?

These changes look good overall! You've simplified the `connector_type` attribute and removed some unused private attributes. This kind of cleanup helps keep the codebase lean and maintainable.

Do you have any other simplifications or cleanups in mind for this class? It might be worth taking a look at other attributes or methods that could benefit from similar treatment.
`54-54`: Removed private attributes. Can we double-check their usage?

I noticed that three private attributes (`_deployed_api_root`, `_deployed_workspace_id`, and `_deployed_source_id`) have been removed. This is great for code cleanliness if they were unused!

Just to be on the safe side, could we run a quick search through the codebase to ensure these weren't used anywhere else? Maybe something like:

This would help us catch any lingering references. What do you think?
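The search command itself was elided from this page; a hedged stand-in using plain `grep` (demonstrated on a throwaway directory so the snippet is self-contained — in practice you would point it at the repo root) could look like:

```shell
# Scratch file standing in for the repo, so the command below is runnable as-is.
mkdir -p /tmp/pyab_attr_check
printf 'self._deployed_api_root = api_root\n' > /tmp/pyab_attr_check/base.py

# Search for lingering references to the removed private attributes.
grep -rn "_deployed_api_root\|_deployed_workspace_id\|_deployed_source_id" /tmp/pyab_attr_check
```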
airbyte/cloud/connectors.py (1)
`40-40`: Verify URL construction for correctness—wdyt?

Just to double-check, the `connector_url` method constructs the URL by appending `{self.connector_type}s`, resulting in paths like `.../sources/{connector_id}` or `.../destinations/{connector_id}`. Does this match the expected URL structure in Airbyte Cloud? Let me know if this aligns with the API endpoints.

airbyte/cloud/connections.py (5)
`9-9`: Importing `CloudDestination` and `CloudSource` looks good.

This addition enables the usage of these classes later in the code.
`105-105`: Good handling of `None` for `streams`.

Using `or []` ensures that we don't encounter a `TypeError` when `self._connection_info.configurations.streams` is `None`. Nice work!
`113-113`: Properly handling `None` for `prefix`.

Returning an empty string when `self._connection_info.prefix` is `None` prevents potential issues when using this value. Looks good!
`174-176`: Adding `_latest_job_info` to `SyncResult` instantiation makes sense.

This provides `SyncResult` with access to the latest job information, which can be useful for further processing. Good addition!
`223-227`: Confirming safe deletion when cascading is enabled.

When `cascade_delete_source` or `cascade_delete_destination` are `True`, the source or destination will be permanently deleted. Should we add safeguards or confirmations to prevent accidental deletions of these resources? Wdyt?

airbyte/_util/api_util.py (6)
`17-17`: Approved: Enhancement of Type Hints

The addition of `TYPE_CHECKING` and `Any` from the `typing` module improves type hinting and code clarity.
`27-27`: Approved: Importing `PyAirbyteInputError`

Including `PyAirbyteInputError` in the exceptions enhances error handling for input validation.
`31-33`: Approved: Conditional Import of `Callable`

Using `if TYPE_CHECKING` to conditionally import `Callable` optimizes runtime performance by avoiding unnecessary imports.
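For readers unfamiliar with the pattern, a minimal self-contained sketch (the `apply` function is illustrative):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers; skipped at runtime,
    # so the import adds no startup cost.
    from collections.abc import Callable


def apply(fn: Callable[[int], int], value: int) -> int:
    # Thanks to `from __future__ import annotations`, the Callable
    # annotation is stored as a string and never evaluated at runtime.
    return fn(value)


print(apply(lambda x: x + 1, 41))  # 42
```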
`235-239`: Approved: Improved Exception Handling in `get_connection`

The updated exception handling provides more informative error messages, enhancing debugging and user experience.
`563-569`: Approved: Refactoring Stream Configurations

Wrapping `stream_configurations` in `models.StreamConfigurations` ensures compatibility with the API requirements.
`390-397`: Approved: Enhanced Error Handling in `get_source`

The updated error handling in `get_source` correctly raises an `AirbyteMissingResourceError` when a source is not found, improving reliability.
Actionable comments posted: 7
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- airbyte/destinations/_util.py (3 hunks)
- airbyte/destinations/base.py (1 hunks)
- tests/integration_tests/cloud/test_cloud_sql_reads.py (3 hunks)
- tests/integration_tests/cloud/test_cloud_sync.py (1 hunks)
- tests/integration_tests/cloud/test_cloud_workspaces.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
airbyte/destinations/_util.py
41-41: Unused function argument: cache_class
(ARG001)
43-43: Missing issue link on the line following this TODO
(TD003)
48-48: Unused function argument: destination_class
(ARG001)
50-50: Missing issue link on the line following this TODO
(TD003)
172-172: Avoid specifying long messages outside the exception class
(TRY003)
187-187: Avoid specifying long messages outside the exception class
(TRY003)
tests/integration_tests/cloud/test_cloud_workspaces.py
67-68: SyntaxError: Expected an identifier
🔇 Additional comments (4)
airbyte/destinations/_util.py (3)
`81-81`: LGTM! 👍

Good catch on ensuring type consistency by converting `db_path` to string.
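A tiny illustration of why the explicit conversion matters (the config shape here is hypothetical, not the actual destination config):

```python
import json
from pathlib import Path

db_path = Path("/tmp/analytics.duckdb")

# json (like many serializers) rejects Path objects, so the config
# should store the explicit string form.
config = {"destination_path": str(db_path)}

print(json.dumps(config))
```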
171-177
: Great security practices! 🔒
Nice work on:
- Validating the required API key
- Using SecretString for sensitive data handling
🧰 Tools
🪛 Ruff (0.8.0)
172-172: Avoid specifying long messages outside the exception class
(TRY003)
185-193
: Solid improvements to configuration handling! 👌
Nice work on:
- Proper port defaulting logic
- Required password validation
- Default schema handling
🧰 Tools
🪛 Ruff (0.8.0)
187-187: Avoid specifying long messages outside the exception class
(TRY003)
tests/integration_tests/cloud/test_cloud_workspaces.py (1)
69-73
: Verify deploy_connection parameters
The connection deployment looks good, but should we also specify the table_prefix
parameter since it's being asserted later? wdyt?
```diff
 connection: CloudConnection = cloud_workspace.deploy_connection(
     connection_name="test-connection",
     source=source,
     cache=cache,
+    table_prefix="abc_deleteme_",
 )
```
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
tests/integration_tests/cloud/test_cloud_workspaces.py (2)
Line range hint `36-36`: Consider enhancing the skip message with more context.

The current skip message doesn't provide enough information about the flakiness. Could we add more details about the specific issues and potential fixes? This would help other developers understand and eventually fix the problem, wdyt?
🧰 Tools
🪛 Ruff (0.8.0)
14-14: `airbyte.destinations.base.Destination` imported but unused

Remove unused import: `airbyte.destinations.base.Destination`

(F401)
`64-68`: Consider adding deployment success assertions.

The test jumps directly to checking stream names without verifying if the connection was deployed successfully. Should we add some basic assertions first? For example:

```python
assert connection.id is not None
assert connection.status == "active"  # or whatever the expected status is
```

airbyte/caches/postgres.py (1)
`35-35`: LGTM! Consider adding docstring for the new attribute?

The `_paired_destination_name` attribute looks good and follows the consistent pattern across cache implementations. Would you consider adding a brief docstring to explain its purpose in the deployment context? wdyt?

airbyte/caches/snowflake.py (1)
`37-37`: LGTM! Consider grouping class attributes?

The implementation looks good! Would you like to consider grouping all class attributes together at the top of the class (moving `_paired_destination_name` next to `dedupe_mode`)? This could improve readability, but I'm fine either way - wdyt?

airbyte/caches/duckdb.py (1)
Line range hint `35-41`: Consider documenting the architectural decision?

The addition of `_paired_destination_name` across all cache implementations suggests an important architectural pattern linking caches with specific destinations. Since this is part of the Cloud deployment functionality, would it be helpful to document this design decision in an ADR (Architecture Decision Record) or the deployment documentation? This could help future contributors understand the relationship between caches and destinations in the cloud context. wdyt?

airbyte/destinations/_util.py (1)
`185-193`: Consider adding type validation for port? 🤔

While the port conversion looks good, would you consider adding a range validation to ensure it's a valid port number (0-65535)? Here's a suggestion (with the range check outside the `try`, so it isn't re-wrapped by the format error):

```diff
-port: int = int(destination_configuration.port) if destination_configuration.port else 5432
+if destination_configuration.port:
+    try:
+        port = int(destination_configuration.port)
+    except ValueError as e:
+        raise ValueError(f"Invalid port format: {e}") from e
+    if not 0 <= port <= 65535:
+        raise ValueError(f"Invalid port number: {port}")
+else:
+    port = 5432
```

🧰 Tools
🪛 Ruff (0.8.0)
187-187: Avoid specifying long messages outside the exception class
(TRY003)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- airbyte/caches/base.py (1 hunks)
- airbyte/caches/bigquery.py (1 hunks)
- airbyte/caches/duckdb.py (1 hunks)
- airbyte/caches/motherduck.py (1 hunks)
- airbyte/caches/postgres.py (1 hunks)
- airbyte/caches/snowflake.py (1 hunks)
- airbyte/destinations/_util.py (3 hunks)
- airbyte/destinations/base.py (2 hunks)
- tests/integration_tests/cloud/test_cloud_workspaces.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte/caches/base.py
- airbyte/destinations/base.py
🧰 Additional context used
🪛 Ruff (0.8.0)
tests/integration_tests/cloud/test_cloud_workspaces.py
14-14: airbyte.destinations.base.Destination
imported but unused
Remove unused import: airbyte.destinations.base.Destination
(F401)
47-47: Undefined name DestinationMotherDuck
(F821)
airbyte/destinations/_util.py
41-41: Unused function argument: cache_type
(ARG001)
43-43: Missing issue link on the line following this TODO
(TD003)
48-48: Unused function argument: destination_type
(ARG001)
50-50: Missing issue link on the line following this TODO
(TD003)
172-172: Avoid specifying long messages outside the exception class
(TRY003)
187-187: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (7)
tests/integration_tests/cloud/test_cloud_workspaces.py (1)
27-32
: LGTM! Clean and well-structured test.
The test follows good practices with proper type hints and cleanup. The transition to public methods improves the test's clarity.
airbyte/caches/duckdb.py (1)
41-41
: LGTM! Shall we verify the deployment integration?
The implementation looks consistent with other cache classes. Since this is part of adding Cloud deployment functionality (per PR title), should we verify how these paired destinations are used in the deployment flow?
✅ Verification successful
Let me gather more information about the deployment integration by checking the implementation of these test files and the cache usage.
The deployment integration looks solid! 🚀
The _paired_destination_name
attribute is correctly integrated into the deployment flow, as evidenced by:
- All cache classes consistently implement this attribute
- The deployment tests in
test_cloud_workspaces.py
andtest_cloud_sync.py
verify the cache deployment functionality - The cache deployment is tested alongside source deployment in the connection flow
The implementation aligns perfectly with other cache types (Snowflake, BigQuery, Postgres, etc.) and is properly tested in the cloud integration test suite. Would you like to proceed with merging? 😊
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usage of _paired_destination_name in deployment-related code
echo "Searching for deployment-related usage..."
rg -l "deploy" | xargs rg "_paired_destination_name"
# Look for any deployment configuration or tests
echo "Checking for deployment tests..."
fd -e py | xargs rg "test.*deploy"
Length of output: 684
Script:
#!/bin/bash
# Check the implementation of deployment tests
echo "Checking deployment test implementations..."
rg -A 10 "test_deploy.*connection" tests/integration_tests/cloud/
# Look for cache usage in deployment tests
echo "Checking cache usage in deployment tests..."
rg "Cache|cache" tests/integration_tests/cloud/
# Check for paired destination usage
echo "Checking paired destination usage..."
rg "_paired_destination_name" airbyte/
Length of output: 6631
airbyte/caches/bigquery.py (1)
35-35
: LGTM! The paired destination name is correctly mapped.
The addition of _paired_destination_name
aligns with the standardization across cache implementations.
airbyte/caches/motherduck.py (1)
37-37
: LGTM! The paired destination name is correctly mapped.
The addition of _paired_destination_name
aligns with the standardization across cache implementations.
airbyte/destinations/_util.py (3)
81-81
: LGTM! Explicit string conversion for destination path.
The explicit conversion of db_path
to string ensures type safety.
171-177
: LGTM! Good validation and secure handling of API key.
The changes properly validate the MotherDuck API key and wrap it in SecretString for enhanced security.
🧰 Tools
🪛 Ruff (0.8.0)
172-172: Avoid specifying long messages outside the exception class
(TRY003)
40-52
:
Consider implementing proper type mapping system? 🤔
The TODO implementations with hardcoded returns don't handle different cache/destination class types. Since we now have _paired_destination_name
attributes in cache classes, we could leverage those for the mapping, wdyt?
Here's a suggestion that uses the new attributes:
def cache_type_to_destination_type(
cache_type: type[CacheBase],
) -> type[Destination]:
- # TODO: Fixme
- return DestinationSnowflake
+ destination_name = getattr(cache_type, '_paired_destination_name', None)
+ if not destination_name:
+ raise ValueError(f"Cache type {cache_type.__name__} doesn't specify a paired destination")
+
+ mapping = {
+ 'destination-snowflake': DestinationSnowflake,
+ 'destination-duckdb': DestinationDuckdb,
+ 'destination-postgres': DestinationPostgres,
+ 'destination-bigquery': DestinationBigquery,
+ }
+ if destination_name not in mapping:
+ raise ValueError(f"Unsupported destination name: {destination_name}")
+ return mapping[destination_name]
🧰 Tools
🪛 Ruff (0.8.0)
41-41: Unused function argument: cache_type
(ARG001)
43-43: Missing issue link on the line following this TODO
(TD003)
48-48: Unused function argument: destination_type
(ARG001)
50-50: Missing issue link on the line following this TODO
(TD003)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (10)
airbyte/cloud/sync_results.py (2)
`173-173`: Question about using `asdict()` for destination configuration

Is using `asdict()` on `destination_response.configuration` necessary here? If `destination_response.configuration` is already a dictionary, we might not need to convert it. Could there be cases where this conversion might cause issues? Wdyt?
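For context on the distinction: `dataclasses.asdict()` only accepts dataclass instances, and raises `TypeError` on a plain dict — which is exactly why the call matters only when the configuration is a dataclass. A standalone demo (the `Config` class is made up):

```python
from dataclasses import asdict, dataclass


@dataclass
class Config:
    host: str
    port: int


# Dataclass instance: asdict() converts it (recursively) to a plain dict.
print(asdict(Config(host="db.internal", port=5432)))  # {'host': 'db.internal', 'port': 5432}

# Plain dict: asdict() raises TypeError.
try:
    asdict({"host": "db.internal"})
except TypeError as exc:
    print(f"TypeError: {exc}")
```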
`199-199`: Handling `None` values in metrics

Adding `or 0` to `bytes_synced` and `records_synced` ensures that we return `0` when the values are `None`. This helps prevent errors, but should we also consider logging a warning if these values are unexpectedly `None` to aid in debugging? Wdyt?

Also applies to: 204-204
airbyte/cloud/workspaces.py (2)
`11-11`: Remove unused imports as indicated by static analysis

Static analysis tools suggest that the following imports are unused and can be removed:

- `http.client` at line 11
- `pydoc.cli` at line 12
- `pydantic.Secret` at line 16

Cleaning up unused imports helps keep the codebase clean. Wdyt?
🧰 Tools
🪛 Ruff (0.8.0)
11-11:
http.client
imported but unusedRemove unused import:
http.client
(F401)
23-23
: Consider movingSecretString
import into type-checking blockStatic analysis indicates that
SecretString
is used only for type hints. Should we move the import into theTYPE_CHECKING
block to avoid unnecessary runtime imports? Wdyt?🧰 Tools
🪛 Ruff (0.8.0)
23-23: Move application import
airbyte.secrets.base.SecretString
into a type-checking blockMove into type-checking block
(TC001)
airbyte/_util/api_util.py (2)
17-17
: Remove unused imports as indicated by static analysisStatic analysis tools suggest that the following imports are unused and can be removed:
pydoc.cli
at line 17token
at line 18airbyte.secrets.base.SecretString
at line 31Cleaning up unused imports helps maintain code cleanliness. Wdyt?
Also applies to: 18-18, 31-31
🧰 Tools
🪛 Ruff (0.8.0)
17-17:
pydoc.cli
imported but unusedRemove unused import:
pydoc.cli
(F401)
31-31
: Consider movingSecretString
import into type-checking blockSince
SecretString
appears to be used only for type annotations, should we move the import into theTYPE_CHECKING
block to avoid unnecessary runtime imports? Wdyt?🧰 Tools
🪛 Ruff (0.8.0)
31-31: Move application import
airbyte.secrets.base.SecretString
into a type-checking blockMove into type-checking block
(TC001)
tests/integration_tests/cloud/conftest.py (1)
`7-7`: Remove unused import as indicated by static analysis

Static analysis indicates that the import `pydoc.cli` at line 7 is unused and can be removed. Wdyt?

🧰 Tools
🪛 Ruff (0.8.0)
7-7:
pydoc.cli
imported but unusedRemove unused import:
pydoc.cli
(F401)
tests/integration_tests/cloud/test_cloud_api_util.py (2)
`18-30`: Consider enhancing test coverage for workspace retrieval.

The test verifies the basic functionality, but we could make it more robust. Would you consider adding assertions for other workspace properties or error cases? For example:

- Invalid credentials
- Non-existent workspace
- Malformed workspace ID
`9-9`: Remove unused import.

The `asdict` import from dataclasses isn't used in this file.

```diff
-from dataclasses import asdict
```
🧰 Tools
🪛 Ruff (0.8.0)
9-9:
dataclasses.asdict
imported but unusedRemove unused import:
dataclasses.asdict
(F401)
airbyte/destinations/_util.py (1)
`183-184`: Consider using custom exceptions for better error handling.

Instead of using ValueError with string messages, would you consider creating custom exceptions? This would make error handling more structured and maintainable. Something like:

```python
class CredentialsError(Exception):
    """Raised when required credentials are missing."""

# Then use it like:
if not destination_configuration.motherduck_api_key:
    raise CredentialsError("MotherDuck API key is required")
```

Also applies to: 198-199
🧰 Tools
🪛 Ruff (0.8.0)
184-184: Avoid specifying long messages outside the exception class
(TRY003)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
- poetry.lock is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- airbyte/_util/api_util.py (18 hunks)
- airbyte/cloud/connections.py (8 hunks)
- airbyte/cloud/sync_results.py (6 hunks)
- airbyte/cloud/workspaces.py (7 hunks)
- airbyte/destinations/_util.py (5 hunks)
- airbyte/exceptions.py (2 hunks)
- airbyte/secrets/google_gsm.py (1 hunks)
- pyproject.toml (1 hunks)
- tests/integration_tests/cloud/conftest.py (4 hunks)
- tests/integration_tests/cloud/test_cloud_api_util.py (8 hunks)
- tests/integration_tests/cloud/test_cloud_sql_reads.py (4 hunks)
- tests/integration_tests/test_all_cache_types.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte/exceptions.py
- airbyte/cloud/connections.py
🧰 Additional context used
🪛 Ruff (0.8.0)
tests/integration_tests/cloud/conftest.py
7-7: pydoc.cli
imported but unused
Remove unused import: pydoc.cli
(F401)
tests/integration_tests/cloud/test_cloud_api_util.py
9-9: dataclasses.asdict
imported but unused
Remove unused import: dataclasses.asdict
(F401)
airbyte/cloud/workspaces.py
11-11: http.client
imported but unused
Remove unused import: http.client
(F401)
12-12: pydoc.cli
imported but unused
Remove unused import: pydoc.cli
(F401)
16-16: pydantic.Secret
imported but unused
Remove unused import: pydantic.Secret
(F401)
23-23: Move application import airbyte.secrets.base.SecretString
into a type-checking block
Move into type-checking block
(TC001)
airbyte/_util/api_util.py
17-17: pydoc.cli
imported but unused
Remove unused import: pydoc.cli
(F401)
18-18: token
imported but unused
Remove unused import: token
(F401)
31-31: Move application import airbyte.secrets.base.SecretString
into a type-checking block
Move into type-checking block
(TC001)
airbyte/destinations/_util.py
42-42: Unused function argument: cache_type
(ARG001)
44-44: Missing issue link on the line following this TODO
(TD003)
49-49: Unused function argument: destination_type
(ARG001)
51-51: Missing issue link on the line following this TODO
(TD003)
184-184: Avoid specifying long messages outside the exception class
(TRY003)
199-199: Avoid specifying long messages outside the exception class
(TRY003)
250-250: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (14)
airbyte/cloud/workspaces.py (5)
`232-235`: Consider making `selected_streams` a required parameter

In the `deploy_connection` method, `selected_streams` has a default value of `None`, but the code raises an error if it's not provided:

```python
if not selected_streams:
    raise exc.PyAirbyteInputError(
        guidance="You must provide `selected_streams` when creating a connection."
    )
```

Would it be clearer to make `selected_streams` a required parameter without a default value to reflect that it's mandatory? Wdyt?
`243-248`: Redundant check for `selected_streams`

There appears to be a second check for `selected_streams` being provided:

```python
if not selected_streams:
    raise exc.PyAirbyteInputError(
        guidance=(
            "You must provide `selected_streams` when creating a connection "
            "from an existing destination."
        )
    )
```

Since we already have this check earlier in the method, perhaps this block can be removed to avoid redundancy. Wdyt?
`100-100`: Possible inconsistency in `resource_type` when raising duplicate resource error

In the `deploy_source` method, when raising `AirbyteDuplicateResourcesError`, the `resource_type` is set to `"destination"`:

```python
raise exc.AirbyteDuplicateResourcesError(
    resource_type="destination",
    resource_name=name,
)
```

Since we're dealing with sources here, should `resource_type` be `"source"` instead? Wdyt?
`171-175`: Use a tuple for multiple types in `isinstance` check

In the `permanently_delete_source` method, the `isinstance` check uses the bitwise `|` operator:

```python
if not isinstance(source, str | CloudSource):
    # ...
```

On Python 3.10+ this is valid — PEP 604 unions are accepted by `isinstance` — but on earlier versions `isinstance` expects a type or a tuple of types. If older interpreters need to be supported, should we update this to use a tuple instead?

```python
if not isinstance(source, (str, CloudSource)):
    # ...
```

Wdyt?
`194-197`: Use a tuple for multiple types in `isinstance` check

Similarly, in the `permanently_delete_destination` method, the `isinstance` check uses `str | CloudDestination`:

```python
if not isinstance(destination, str | CloudDestination):
    # ...
```

Should we change this to a tuple for consistency and for compatibility with pre-3.10 interpreters?

```python
if not isinstance(destination, (str, CloudDestination)):
    # ...
```

Wdyt?
airbyte/_util/api_util.py (1)
`548-564`: 🛠️ Refactor suggestion

Question: Simplify destination type handling

Currently, the code checks each destination type individually. Would it be beneficial to use a mapping or a factory pattern to reduce repetition and improve scalability? For example:

```python
destination_type_mapping = {
    "snowflake": models.DestinationSnowflake,
    "bigquery": models.DestinationBigquery,
    "postgres": models.DestinationPostgres,
    "duckdb": models.DestinationDuckdb,
}
destination_class = destination_type_mapping.get(destination_type)
if destination_class:
    response.destination_response.configuration = destination_class(**raw_configuration)
```

Wdyt?
tests/integration_tests/cloud/test_cloud_sql_reads.py (2)
`68-72`: Fix method parameter to pass the connection object

The `permanently_delete_connection` method expects a `connection` parameter, but we're passing `connection_id=connection`. Should we update the call to pass the connection object directly?

```diff
 cloud_workspace.permanently_delete_connection(
-    connection_id=connection,
+    connection=connection,
     delete_source=True,
     delete_destination=True,
 )
```

Wdyt?
`74-76`: Update parameters to match method definitions

The `permanently_delete_source` and `permanently_delete_destination` methods now expect `source` and `destination` parameters instead of `source_id` and `destination_id`. Should we update these calls?

```diff
 cloud_workspace.permanently_delete_source(
-    source_id=source_id,
+    source=source_id,
 )
 cloud_workspace.permanently_delete_destination(
-    destination_id=destination_id,
+    destination=destination_id,
 )
```

Wdyt?
tests/integration_tests/cloud/test_cloud_api_util.py (1)
Line range hint 33-57
: LGTM! Authentication mechanism updated correctly.
The transition from API key to client credentials is implemented properly with appropriate type annotations using SecretString.
🧰 Tools
🪛 Ruff (0.8.0)
9-9: `dataclasses.asdict` imported but unused. Remove unused import: `dataclasses.asdict`.
(F401)
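The comment above highlights the move to `SecretString` type annotations for credentials. The main benefit of a dedicated secret type can be sketched with a plain `str` subclass that masks its `repr` (this is an illustrative sketch only, not PyAirbyte's actual `SecretString` implementation):

```python
class MaskedSecret(str):
    """A str subclass whose repr hides the value, so secrets don't leak
    when containers (lists/dicts) holding them are logged.

    Illustrative sketch only -- not PyAirbyte's actual SecretString.
    """

    def __repr__(self) -> str:
        return "<MaskedSecret: ****>"


token = MaskedSecret("super-secret")
```

Because it is still a `str`, the value can be passed anywhere a plain string is expected, while accidental `repr`-based logging stays masked.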
airbyte/destinations/_util.py (1)
41-52: Implement TODO functions with proper type mapping.
The current implementation with hardcoded returns doesn't handle different cache/destination class types. Would you consider implementing a proper mapping system? Here's a suggestion, wdyt?
```diff
 def cache_type_to_destination_type(
     cache_type: type[CacheBase],
 ) -> type[Destination]:
-    # TODO: Fixme
-    return DestinationSnowflake
+    mapping = {
+        SnowflakeCache: DestinationSnowflake,
+        DuckDBCache: DestinationDuckdb,
+        PostgresCache: DestinationPostgres,
+        # Add other mappings as needed
+    }
+    if cache_type not in mapping:
+        raise ValueError(f"Unsupported cache type: {cache_type.__name__}")
+    return mapping[cache_type]

 def destination_type_to_cache_type(
     destination_type: type[Destination],
 ) -> type[CacheBase]:
-    # TODO: Fixme
-    return SnowflakeCache
+    mapping = {
+        DestinationSnowflake: SnowflakeCache,
+        DestinationDuckdb: DuckDBCache,
+        DestinationPostgres: PostgresCache,
+        # Add other mappings as needed
+    }
+    if destination_type not in mapping:
+        raise ValueError(f"Unsupported destination type: {destination_type.__name__}")
+    return mapping[destination_type]
```
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.0)
42-42: Unused function argument: cache_type
(ARG001)
44-44: Missing issue link on the line following this TODO
(TD003)
49-49: Unused function argument: destination_type
(ARG001)
51-51: Missing issue link on the line following this TODO
(TD003)
tests/integration_tests/test_all_cache_types.py (1)
179-179
: LGTM! Type annotations improve code clarity.
The addition of explicit ReadResult type annotations enhances type safety and makes the code more maintainable.
Also applies to: 194-194
airbyte/secrets/google_gsm.py (2)
Line range hint 64-108: LGTM! Robust credential handling implementation
The credential handling logic is well-structured with clear error messages and proper fallback mechanisms. Nice work! 👍
164-164: Breaking change in return type - needs verification

The return type has changed from `SecretString | None` to `SecretString`. This means the method will never return None, which could break existing code that handles the None case. Should we verify that all callers are updated to handle this change? wdyt?
pyproject.toml (1)
18-18: Verify airbyte-api version upgrade impact

The upgrade from ^0.49.2 to ^0.52.1 is a minor version bump. Should we check the changelog for any breaking changes that might affect our codebase? Also, would it be worth verifying compatibility with other dependencies? wdyt?
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (16)
airbyte/destinations/_translate_cache_to_dest.py (1)
79-84: Update docstring to reflect the correct cache type

Could we update the docstring to reflect that this function gets the destination configuration from the MotherDuck cache instead of the DuckDB cache? WDYT?

Apply this diff to fix the issue:

```diff
- """Get the destination configuration from the DuckDB cache."""
+ """Get the destination configuration from the MotherDuck cache."""
```

airbyte/destinations/_translate_dest_to_cache.py (2)
39-41: Update docstring to accurately describe the function purpose

Could we update the docstring to reflect that this function creates a cache from a destination configuration? WDYT?

Apply this diff to fix the issue:

```diff
- """Get the destination configuration from the cache."""
+ """Create a cache instance from the destination configuration."""
```

81-84: Update docstring to reflect MotherDuck cache

Could we update the docstring to indicate that this function creates a new MotherDuck cache instead of a DuckDB cache? WDYT?

Apply this diff to fix the issue:

```diff
- """Create a new DuckDB cache from the destination configuration."""
+ """Create a new MotherDuck cache from the destination configuration."""
```

🧰 Tools
🪛 Ruff (0.8.0)
83-83: Avoid specifying long messages outside the exception class
(TRY003)
airbyte/caches/duckdb.py (1)
51-54: Update docstring to reflect return type

Could we update the docstring to reflect that this method returns a `DestinationDuckdb` object instead of a dictionary? WDYT?

Apply this diff to fix the issue:

```diff
- """Return a dictionary of destination configuration values."""
+ """Return a `DestinationDuckdb` configuration object."""
```

airbyte/caches/postgres.py (1)
53-61: Consider adding validation for database configuration

The clone method looks good, but what do you think about adding some basic validation for the configuration parameters? For instance, checking if the port is within valid range (0-65535)? wdyt?

```diff
 def clone_as_cloud_destination_config(self) -> DestinationPostgres:
     """Return a DestinationPostgres instance with the same configuration."""
+    if not 0 <= self.port <= 65535:
+        raise ValueError(f"Invalid port number: {self.port}")
     return DestinationPostgres(
         host=self.host,
         port=self.port,
         username=self.username,
         password=self.password,
         database=self.database,
     )
```

airbyte/destinations/util.py (1)
78-81: Consider enhancing docstring with parameter description

The new parameter looks good! Would you like to add its description to the docstring? Something like:

```diff
 def get_noop_destination(
     *,
     install_if_missing: bool = True,
 ) -> Destination:
     """Get a devnull (no-op) destination.

     This is useful for performance benchmarking of sources, without adding
     the overhead of writing data to a real destination.
+
+    Args:
+        install_if_missing: If True, automatically install the dev-null destination
+            connector if not found locally.
     """
```

tests/integration_tests/cloud/conftest.py (2)
112-114: Consider creating an issue for the DevNullDestination limitation

The TODO comment highlights an important limitation. Would you like me to help create a GitHub issue to track this? This could help ensure it's not forgotten in the future. 🤔
135-141: Consider updating the dictionary comment

The comment mentions "cache types" but the dictionary is now about destination fixtures. Should we update the comment to reflect this change? wdyt?

```diff
+ # Map of destination types to their respective fixture names, ordered by test execution speed
  deployable_destination_fixtures: dict[str, str] = {
```

airbyte/shared/state_providers.py (1)
55-55: LGTM! Nice defensive programming! 🛡️

The additional null checks make the code more robust. Would you consider adding type hints to make the expectations even more explicit? wdyt?

```diff
- if state_msg and state_msg.type == AirbyteStateType.STREAM and state_msg.stream
+ if state_msg is not None and state_msg.type == AirbyteStateType.STREAM and state_msg.stream is not None
```

tests/integration_tests/conftest.py (1)
65-66: Consider extracting the schema name prefix to a constant? 🎯

I notice "test_deleteme_" is used in multiple places. Would it make sense to extract this to a constant at the module level? This would make it easier to update if needed, wdyt?

```diff
+TEST_SCHEMA_PREFIX = "test_deleteme_"
+
 @pytest.fixture
 def new_motherduck_cache(
     motherduck_secrets,
 ) -> MotherDuckCache:
     return MotherDuckCache(
         database="my_db",  # TODO: Use a dedicated DB for testing
-        schema_name=f"test_deleteme_{text_util.generate_random_suffix()}",
+        schema_name=f"{TEST_SCHEMA_PREFIX}{text_util.generate_random_suffix()}",
```

Also applies to: 76-77, 114-116
airbyte/sources/util.py (1)
128-129: Could we update the docstring for the new parameter? 📝

The `install_if_missing` parameter has been added, but it's not documented in the function's docstring. Would you like me to suggest the docstring update? wdyt?

```diff
 Args:
     num_records (int | str): The number of records to generate. Defaults to "5e5",
         or 500,000 records. Can be an integer (`1000`) or a string in scientific
         notation. For example, `"5e6"` will generate 5 million records.
+    install_if_missing (bool, optional): Whether to install the connector if not
+        available. Defaults to True.
```

tests/integration_tests/test_state_handling.py (1)
Line range hint 71-81: Consider enhancing assertion messages for better test diagnostics, wdyt?

The test verifies state persistence well, but adding descriptive messages to assertions could make debugging failures easier.

```diff
- assert result2.processed_records == 0
+ assert result2.processed_records == 0, "Expected no new records due to state persistence"
- assert len(list(state_provider.state_message_artifacts)) > 0
+ assert len(list(state_provider.state_message_artifacts)) > 0, "Expected state artifacts to be present"
```

tests/integration_tests/cloud/test_cloud_api_util.py (1)
158-164: Consider extracting the name generation pattern into a helper function, wdyt?

The pattern of creating names with "deleteme-" prefix and random suffix is repeated multiple times. Would it be clearer to extract this into a helper function?

```python
def generate_test_resource_name(resource_type: str) -> str:
    return f"deleteme-{resource_type}-dummy{text_util.generate_random_suffix()}"
```

airbyte/cloud/sync_results.py (1)
199-199: Consider logging when None values are encountered

While returning 0 for None values is a safe default, it might hide underlying issues. Would it be helpful to add debug logging when None values are encountered?

```diff
 @property
 def bytes_synced(self) -> int:
+    if self._fetch_latest_job_info().bytes_synced is None:
+        logger.debug("bytes_synced returned None for job %d", self.job_id)
     return self._fetch_latest_job_info().bytes_synced or 0
```

Also applies to: 204-204
airbyte/cloud/workspaces.py (1)
178-182: Consider using tuple for type checking

The `isinstance` checks use the `|` operator for union types, which might not be compatible with all Python versions. Would you consider using tuples instead? Wdyt?

```diff
- if not isinstance(source, str | CloudSource):
+ if not isinstance(source, (str, CloudSource)):
- if not isinstance(destination, str | CloudDestination):
+ if not isinstance(destination, (str, CloudDestination)):
```

Also applies to: 201-204
airbyte/_util/api_util.py (1)
550-572: Consider implementing a destination type registry pattern

The current implementation uses multiple if-statements for different destination types. Would you consider using a mapping or registry pattern to make this more maintainable? Wdyt?

Example approach:

```python
DESTINATION_TYPE_MAPPING = {
    "snowflake": models.DestinationSnowflake,
    "bigquery": models.DestinationBigquery,
    "postgres": models.DestinationPostgres,
    "duckdb": models.DestinationDuckdb,
}

# Then use:
destination_class = DESTINATION_TYPE_MAPPING.get(destination_type)
if destination_class:
    response.destination_response.configuration = destination_class(**raw_configuration)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (27)
- airbyte/_util/api_util.py (18 hunks)
- airbyte/_util/text_util.py (1 hunks)
- airbyte/caches/base.py (4 hunks)
- airbyte/caches/bigquery.py (1 hunks)
- airbyte/caches/duckdb.py (2 hunks)
- airbyte/caches/motherduck.py (2 hunks)
- airbyte/caches/postgres.py (2 hunks)
- airbyte/caches/snowflake.py (1 hunks)
- airbyte/cloud/sync_results.py (6 hunks)
- airbyte/cloud/workspaces.py (5 hunks)
- airbyte/destinations/_translate_cache_to_dest.py (1 hunks)
- airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
- airbyte/destinations/util.py (2 hunks)
- airbyte/shared/state_providers.py (1 hunks)
- airbyte/sources/__init__.py (1 hunks)
- airbyte/sources/util.py (2 hunks)
- pyproject.toml (2 hunks)
- tests/conftest.py (2 hunks)
- tests/integration_tests/cloud/conftest.py (3 hunks)
- tests/integration_tests/cloud/test_cloud_api_util.py (7 hunks)
- tests/integration_tests/cloud/test_cloud_sql_reads.py (5 hunks)
- tests/integration_tests/cloud/test_cloud_sync.py (3 hunks)
- tests/integration_tests/cloud/test_cloud_workspaces.py (2 hunks)
- tests/integration_tests/conftest.py (2 hunks)
- tests/integration_tests/test_all_cache_types.py (3 hunks)
- tests/integration_tests/test_source_test_fixture.py (26 hunks)
- tests/integration_tests/test_state_handling.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
- airbyte/_util/text_util.py
- airbyte/caches/snowflake.py
- airbyte/caches/bigquery.py
- tests/integration_tests/test_all_cache_types.py
- tests/integration_tests/cloud/test_cloud_sync.py
- pyproject.toml
- airbyte/caches/base.py
- tests/conftest.py
- airbyte/caches/motherduck.py
- tests/integration_tests/cloud/test_cloud_workspaces.py
- tests/integration_tests/cloud/test_cloud_sql_reads.py
🧰 Additional context used
🪛 Ruff (0.8.0)
airbyte/destinations/_translate_dest_to_cache.py
83-83: Avoid specifying long messages outside the exception class
(TRY003)
98-98: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (15)
airbyte/sources/__init__.py (1)
25-25: LGTM!

The addition of `registry` to the imports and `__all__` list looks good.
Also applies to: 32-32
airbyte/caches/postgres.py (2)
48-52: LGTM: Clean implementation of destination config translation
The property implementation looks clean and follows good practices by delegating to the translation function. Nice work! 👍
45-46: Verify destination pairing configuration

I notice that `paired_destination_name` is set to "destination-bigquery" while the class is for PostgresCache. Should this be "destination-postgres" instead? wdyt? 🤔
✅ Verification successful
Interesting pattern found in cache implementations! 🤔
Looking at all cache implementations (BigQuery, Snowflake, MotherDuck, and Postgres), I notice they all have `paired_destination_name = "destination-bigquery"`. This seems to be an intentional pattern where all caches are designed to pair with BigQuery as the destination, while their respective `paired_destination_config_class` points to their own destination type (e.g., `DestinationPostgres` for PostgresCache).

This appears to be a deliberate architectural decision rather than a mistake. The code is consistent across all implementations and the translation logic in `_translate_cache_to_dest.py` supports this pattern.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if there are any references to this pairing in the codebase
rg -l "paired_destination_name.*destination-bigquery.*PostgresCache"
rg -l "paired_destination_name.*destination-postgres.*PostgresCache"
Length of output: 141
Script:
#!/bin/bash
# Let's check the base class and related cache implementations
ast-grep --pattern 'class $_Cache:
$$$
paired_destination_name: ClassVar[str | None] = $_
$$$'
# Also check for any other cache classes with destination pairing
ast-grep --pattern 'class $_ {
$$$
paired_destination_name = $_
$$$
}'
# Let's look at the PostgresCache class implementation
ast-grep --pattern 'class PostgresCache($_):
$$$'
# Check for any test files that might validate this pairing
fd -e py -exec rg -l "PostgresCache.*destination.*bigquery" {} \;
Length of output: 527
Script:
#!/bin/bash
# Let's try a simpler search for the file content
cat airbyte/caches/postgres.py
# Search for any test files related to PostgresCache
fd -e py test | xargs rg -l "PostgresCache"
# Look for any BigQuery related configurations
rg -l "destination-bigquery"
Length of output: 2371
Script:
#!/bin/bash
# Let's check other cache implementations to understand the pairing pattern
cat airbyte/caches/bigquery.py
cat airbyte/caches/snowflake.py
cat airbyte/caches/motherduck.py
# Also check the destination translation implementation
cat airbyte/destinations/_translate_cache_to_dest.py
Length of output: 10968
tests/integration_tests/cloud/conftest.py (1)
53-54: LGTM: Clean credentials handling implementation
The transition from API key to client credentials is well implemented with proper typing and JSON parsing. Nice work on improving security! 🔐
Also applies to: 57-63
tests/integration_tests/conftest.py (1)
93-104: LGTM! Nice cleanup handling! 🧹
The schema cleanup in the teardown is well implemented. The use of SQLAlchemy's future mode and proper connection handling is great!
airbyte/sources/util.py (1)
171-171: LGTM! Nice parameter forwarding! 🎯

The `install_if_missing` parameter is correctly forwarded to `get_source`, maintaining consistency in behavior.
tests/integration_tests/test_state_handling.py (2)
141-143: LGTM! Good use of type annotations
The explicit type annotation for state_provider improves code clarity and maintainability.
132-133: Consider addressing the type system issue instead of suppressing it
The type ignore comment suggests there's an underlying issue with the type system. Would it be better to properly define the missing 'global' class or update the type hints to match the actual types being used?
airbyte/cloud/sync_results.py (1)
138-138: Verify compatibility with job_id type change

The `job_id` type has changed from `str` to `int`. This could be a breaking change if any code expects string job IDs.
✅ Verification successful
Let me search for broader patterns to understand how job_id is used across the codebase. This will help us verify if the type change from str to int could cause compatibility issues.
The previous search revealed two instances of `job_id` typed as `int`, but let's search more broadly without file type restrictions to ensure we catch all usages and potential type conversions.
The job_id type change looks safe to me - what do you think? 🤔
From the codebase analysis, all `job_id` type hints are consistently using `int` or `int | None`. The only string conversion happens in the API utility layer (`_util/api_util.py`), where it's explicitly cast to string for API resource naming, which is a safe practice. No code expects or relies on `job_id` being a string type.
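The boundary pattern described above can be sketched like this (the function and path format below are illustrative, not the actual `api_util` code): keep `job_id` as an `int` everywhere internally and convert to `str` only at the API resource-naming boundary.

```python
def job_artifact_name(workspace_id: str, job_id: int) -> str:
    # The int-to-str conversion happens only here, at the API boundary;
    # all internal code keeps job_id as an int.
    return f"workspaces/{workspace_id}/jobs/{job_id}/logs"
```

This keeps comparisons and arithmetic on `job_id` type-safe internally while still producing string identifiers where the API requires them.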
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for job_id usage in the codebase
ast-grep --pattern 'job_id: str'
rg "job_id.*=.*str\(" -A 2
Length of output: 63
Script:
#!/bin/bash
# Let's search for job_id usage patterns more broadly
rg "job_id" -t python -A 2
# Also search for type hints or annotations related to job_id
rg "job_id.*:" -t python -A 2
# Look for any job_id in function signatures
ast-grep --pattern 'def $_(job_id$_)'
# Check for job_id in class attributes or type definitions
ast-grep --pattern 'job_id: $_'
Length of output: 300
Script:
#!/bin/bash
# Let's try with ripgrep without file type filtering
rg "job_id" -A 2
# Search for any string conversions or type casting of job_id
rg "str\(.*job_id" -A 2
rg "job_id.*str\(" -A 2
# Look for job_id in function calls or assignments
rg "job_id\s*=" -A 2
rg "job_id:" -A 2
Length of output: 6211
airbyte/cloud/workspaces.py (3)
36-43: LGTM: Secure handling of credentials
The post-init method properly ensures that both client ID and secret are handled securely using SecretString.
94-100: Incorrect resource type in error message

The error message for duplicate source names incorrectly uses "destination" as the resource type.

```diff
- resource_type="destination",
+ resource_type="source",
```

Likely invalid or redundant comment.
239-255: 🛠️ Refactor suggestion — Redundant validation check

There are two identical checks for `selected_streams`. The second check appears redundant since we already validate this condition earlier. Wdyt?
```diff
 if not selected_streams:
     raise exc.PyAirbyteInputError(
         guidance="You must provide `selected_streams` when creating a connection."
     )

 # Resolve source ID
 source_id: str
 source_id = source if isinstance(source, str) else source.connector_id

 # destination is already deployed
 destination_id = destination if isinstance(destination, str) else destination.connector_id

-if not selected_streams:
-    raise exc.PyAirbyteInputError(
-        guidance=(
-            "You must provide `selected_streams` when creating a connection "
-            "from an existing destination."
-        )
-    )
```
Likely invalid or redundant comment.
airbyte/_util/api_util.py (1)
52-63: LGTM: Secure token URL construction
The authentication implementation correctly constructs the token URL and handles credentials securely.
tests/integration_tests/test_source_test_fixture.py (2)
77-78: LGTM: Improved type annotations

The change from `SqlProcessorBase` to `CacheBase` makes the function more flexible by accepting any cache implementation.
419-420: LGTM: Consistent random suffix generation

Good use of `text_util.generate_random_suffix()` for generating cache names, replacing the previous `ulid` usage.
Also applies to: 496-496
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (8)
tests/integration_tests/cloud/conftest.py (4)
53-54: Consider a more descriptive type alias name?

The type alias `CloudAPICreds` is a bit generic. What do you think about something more specific like `CloudAPIClientCredentials` to better reflect that it's specifically for client credentials? wdyt? 🤔
112-114: Should we track the DevNullDestination limitation?

The TODO comment mentions a Cloud limitation preventing the use of DevNullDestination. Would it be helpful to create a GitHub issue to track this limitation and its workaround? I can help create one if you'd like! 🤝
135-140: Clean up commented destinations?

There are some commented-out destinations in the test generation. Should we remove these if they're not being used, or are they planned for future implementation? wdyt? 🧹
107-116: Consider documenting the BigQuery workaround?

Since we're using BigQuery as a temporary workaround for DevNullDestination, should we add a note in the docstring explaining why this specific destination was chosen and any performance implications? This could help future maintainers understand the decision. wdyt? 📝

```diff
 def deployable_dummy_destination(
     new_bigquery_destination: Destination,
 ) -> Destination:
-    """A local PyAirbyte `Destination` object.
+    """A local PyAirbyte `Destination` object.
+
+    Note: Currently using BigQuery as a workaround since DevNullDestination
+    is not supported in Cloud. While this works, it may have performance
+    implications for tests.

     # TODO: Use DevNullDestination instead of BigQueryDestination.
     Problem is that 'dev-null' is not accepted on Cloud as of now.
```

tests/integration_tests/test_source_test_fixture.py (2)
34-35: Consider adding docstrings to test functions?

The type annotations are great! Would you consider adding docstrings to test functions to explain the test scenarios and expectations? This could help other developers understand the test cases better, wdyt?

Example:

```python
def test_registry_get() -> None:
    """Test that the registry correctly returns metadata for a known connector.

    Expects:
    - Metadata object is returned
    - Correct name and version are set
    """
```

Also applies to: 100-101, 145-147, 194-214, 222-227, 237-238, 302-310, 311-316, 882-887, 889-896, 899-908, 910-944
77-78: Consider adding validation for None streams?

The type update to allow `streams: list[str] | None` is good! Would you consider adding explicit validation when `streams` is None but the cache is empty? This could provide clearer error messages, wdyt?

```diff
 def assert_data_matches_cache(
     expected_test_stream_data: dict[str, list[dict[str, str | int]]],
     cache: CacheBase,
     streams: list[str] | None = None,
 ) -> None:
+    if streams is None and not expected_test_stream_data:
+        raise ValueError("No streams specified and expected_test_stream_data is empty")
     for stream_name in streams or expected_test_stream_data.keys():
```

airbyte/destinations/_translate_dest_to_cache.py (2)
50-64: Consider simplifying destination type extraction

The current logic for extracting destination type is a bit complex. What do you think about simplifying it like this? WDYT?

```diff
-    if isinstance(destination_configuration, dict):
-        try:
-            destination_type = (
-                destination_configuration.get("DESTINATION_TYPE")
-                or destination_configuration["destinationType"]
-            )
-            if hasattr(destination_configuration, "value"):
-                destination_type = destination_type.value
-            else:
-                destination_type = str(destination_type)
-        except KeyError as ex:
-            raise ValueError(
-                f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
-            ) from ex
-    else:
-        destination_type = str(destination_configuration.DESTINATION_TYPE)
+    try:
+        if isinstance(destination_configuration, dict):
+            destination_type = str(
+                destination_configuration.get("DESTINATION_TYPE")
+                or destination_configuration["destinationType"]
+            )
+            if hasattr(destination_type, "value"):
+                destination_type = destination_type.value
+        else:
+            destination_type = str(destination_configuration.DESTINATION_TYPE)
+    except (KeyError, AttributeError) as ex:
+        raise ValueError(
+            "Could not determine destination type. Ensure 'destinationType' or 'DESTINATION_TYPE' is provided."
+        ) from ex
```

🧰 Tools
🪛 Ruff (0.8.0)
60-62: Avoid specifying long messages outside the exception class
(TRY003)
113-115: Consider moving validation before port parsing

What do you think about moving the password validation before the port parsing? This way, we fail fast if the required credential is missing. WDYT?

```diff
+    if not destination_configuration.password:
+        raise ValueError("Password is required for Postgres cache.")
+
     port: int = int(destination_configuration.port) if destination_configuration.port else 5432
-    if not destination_configuration.password:
-        raise ValueError("Password is required for Postgres cache.")
```

🧰 Tools
🪛 Ruff (0.8.0)
115-115: Avoid specifying long messages outside the exception class
(TRY003)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
- tests/integration_tests/cloud/conftest.py (3 hunks)
- tests/integration_tests/cloud/test_cloud_sync.py (3 hunks)
- tests/integration_tests/conftest.py (2 hunks)
- tests/integration_tests/test_source_faker_integration.py (0 hunks)
- tests/integration_tests/test_source_test_fixture.py (26 hunks)
💤 Files with no reviewable changes (1)
- tests/integration_tests/test_source_faker_integration.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/integration_tests/cloud/test_cloud_sync.py
- tests/integration_tests/conftest.py
🧰 Additional context used
🪛 Ruff (0.8.0)
airbyte/destinations/_translate_dest_to_cache.py
60-62: Avoid specifying long messages outside the exception class
(TRY003)
100-100: Avoid specifying long messages outside the exception class
(TRY003)
115-115: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (5)
tests/integration_tests/cloud/conftest.py (1)
85-94: Clean transition to client credentials! 👍

The updated `cloud_workspace` fixture nicely handles the transition from API key to client credentials. The parameter ordering and type hints look great!
tests/integration_tests/test_source_test_fixture.py (2)
21-24: LGTM!

The new imports are well-organized and necessary for the type annotations and text utility changes.
419-419: LGTM! Verify text_util usage across codebase.

The switch to `text_util.generate_random_suffix()` looks good and provides consistency. Let's verify this pattern is used consistently across the codebase.
Also applies to: 439-439, 496-496
✅ Verification successful
LGTM! The codebase shows consistent usage patterns
The verification shows a clean transition pattern:

- `text_util.generate_random_suffix()` is consistently used across test files for generating unique identifiers
- The underlying implementation in `text_util.py` actually uses ULID internally, making this a proper abstraction
- Remaining ULID usages are either in dependencies (pyproject.toml, poetry.lock) or part of the core functionality where direct ULID usage is appropriate
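For context, a ULID-style suffix helper can be sketched like this (illustrative only; the real `text_util.generate_random_suffix` reportedly uses a ULID library and may differ in format). The useful ULID property is a time-ordered prefix plus a random tail, so generated names are collision-resistant and roughly sortable by creation time:

```python
import secrets
import time


def generate_random_suffix_sketch() -> str:
    """Return a short, time-prefixed, collision-resistant suffix.

    Illustrative sketch of a ULID-like scheme, not PyAirbyte's helper.
    """
    timestamp = format(time.time_ns() // 1_000_000, "012x")  # ms since epoch, 12 hex chars
    randomness = secrets.token_hex(4)  # 8 hex chars of entropy
    return f"{timestamp}{randomness}"
```

Centralizing suffix generation behind one helper (as the PR does) means the scheme can later change without touching every test.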
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining ulid usage and verify text_util adoption
# Test: Search for ulid usage that might need updating
rg 'ulid'
# Test: Verify consistent usage of text_util.generate_random_suffix
rg 'text_util\.generate_random_suffix'
Length of output: 3980
airbyte/destinations/_translate_dest_to_cache.py (2)
43-43: Typo in function name mapping persists

The typo "biqquery" still exists in the function mapping. Let's fix this to maintain consistency?
136-136: Add validation for Snowflake host format

The host parsing assumes a specific format. Should we add validation to ensure the host contains ".snowflakecomputing"? WDYT?
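A minimal sketch of the suggested check (the helper name and the choice of `ValueError` are assumptions for illustration; only the ".snowflakecomputing" marker comes from the comment above):

```python
def account_from_snowflake_host(host: str) -> str:
    """Extract the account identifier from a Snowflake host string.

    Illustrative sketch of the validation suggested above.
    """
    marker = ".snowflakecomputing"
    if marker not in host:
        raise ValueError(f"Unexpected Snowflake host format: {host!r}")
    # Everything before the marker is the account identifier.
    return host.split(marker, 1)[0]
```

Failing fast with an explicit error here is friendlier than silently producing a wrong account name from an unexpected host string.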
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (4)
airbyte/destinations/_translate_dest_to_cache.py (3)
61-64: Enhance error message with valid destination types

What do you think about making the error message more helpful by including the list of valid destination types? This could help users quickly identify the issue. wdyt?

```diff
- f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
+ f"Missing 'destinationType' in keys {list(destination_configuration.keys())}. "
+ f"Valid types are: {list(conversion_fn_map.keys())}"
```
101-103: Consider creating a specific MotherDuck destination type

The function accepts `DestinationDuckdb` but is specifically for MotherDuck. Should we create a separate `DestinationMotherduck` type to make this more explicit and type-safe? wdyt?
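One way to make that explicit can be sketched as a marker subclass (purely illustrative; neither class below is the real airbyte-api model, and the `destination_path` field is an assumption):

```python
class DestinationDuckdbSketch:
    """Stand-in for the DuckDB destination config model."""

    def __init__(self, destination_path: str) -> None:
        self.destination_path = destination_path


class DestinationMotherduckSketch(DestinationDuckdbSketch):
    """Marker subclass so isinstance checks and type hints can distinguish
    MotherDuck configs from plain DuckDB ones without duplicating fields."""
```

A marker subclass keeps the shared configuration in one place while letting translation code branch on the more specific type.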
119-119: Consider extracting default port as a constant

What do you think about moving the default port value to a constant at the module level? This would make it easier to maintain and more visible. wdyt?

```diff
+DEFAULT_POSTGRES_PORT = 5432
+
 def postgres_destination_to_cache(
     destination_configuration: DestinationPostgres,
 ) -> PostgresCache:
     """Create a new Postgres cache from the destination configuration."""
-    port: int = int(destination_configuration.port) if destination_configuration.port else 5432
+    port: int = int(destination_configuration.port) if destination_configuration.port else DEFAULT_POSTGRES_PORT
```

tests/integration_tests/cloud/test_cloud_sql_reads.py (1)
136-149: Consider extracting common test parameters

These parameters are duplicated across multiple test functions. Should we extract them into a shared fixture or constant to make maintenance easier? wdyt?

```diff
+CLOUD_CONNECTION_TEST_CASES = [
+    pytest.param("c7b4d838-a612-495a-9d91-a14e477add51", id="Faker->Snowflake"),
+    pytest.param("0e1d6b32-b8e3-4b68-91a3-3a314599c782", id="Faker->BigQuery"),
+    pytest.param("", id="Faker->Postgres", marks=pytest.mark.skip(reason="Not yet supported")),
+    pytest.param("", id="Faker->MotherDuck", marks=pytest.mark.skip(reason="Not yet supported")),
+]

-@pytest.mark.parametrize(
-    "deployed_connection_id",
-    [
-        pytest.param("c7b4d838-a612-495a-9d91-a14e477add51", id="Faker->Snowflake"),
-        ...
-    ],
-)
+@pytest.mark.parametrize("deployed_connection_id", CLOUD_CONNECTION_TEST_CASES)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
- pyproject.toml (3 hunks)
- tests/integration_tests/cloud/test_cloud_sql_reads.py (6 hunks)
- tests/integration_tests/test_source_faker_integration.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pyproject.toml
- tests/integration_tests/test_source_faker_integration.py
🔇 Additional comments (1)
airbyte/destinations/_translate_dest_to_cache.py (1)
43-43: Fix typo in function name mapping

The mapping uses "biqquery" instead of "bigquery", which could cause runtime errors.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (13)
airbyte/destinations/_translate_dest_to_cache.py (4)
Lines 51-65: Consider simplifying the destination type extraction logic. The current implementation has multiple nested conditions. What do you think about simplifying it like this? WDYT?
```diff
 if isinstance(destination_configuration, dict):
     try:
-        destination_type = (
-            destination_configuration.get("DESTINATION_TYPE")
-            or destination_configuration["destinationType"]
-        )
-        if hasattr(destination_type, "value"):
-            destination_type = destination_type.value
-        elif hasattr(destination_type, "_value_"):
-            destination_type = destination_type._value_
-        else:
-            destination_type = str(destination_type)
+        dest_type = destination_configuration.get("DESTINATION_TYPE") or destination_configuration["destinationType"]
+        destination_type = getattr(dest_type, "value", None) or getattr(dest_type, "_value_", None) or str(dest_type)
     except KeyError as ex:
         raise ValueError(
-            f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
+            f"Missing destination type. Available keys: {', '.join(destination_configuration.keys())}"
        ) from ex
```
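To illustrate the `getattr` fallback chain suggested above, here is a minimal self-contained sketch. The `DestinationType` enum and `extract_type` helper are hypothetical stand-ins for the generated API model's type field, for illustration only:

```python
from enum import Enum


class DestinationType(Enum):
    """Hypothetical enum standing in for the API model's destination type field."""

    BIGQUERY = "bigquery"
    SNOWFLAKE = "snowflake"


def extract_type(raw: object) -> str:
    """Return the string value whether `raw` is an Enum member or a plain string."""
    # getattr returns None for plain strings, so str() is the final fallback.
    return getattr(raw, "value", None) or str(raw)


print(extract_type(DestinationType.BIGQUERY))  # -> bigquery
print(extract_type("snowflake"))  # -> snowflake
```

The same one-liner handles enum members, enum-like objects, and raw strings, which is what makes it shorter than the original if/elif chain.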
Lines 146-164: Consider simplifying Snowflake password handling logic. The password extraction logic is quite complex. Would you consider simplifying it like this? WDYT?
```diff
-snowflake_password: str | None = None
-if (
-    destination_configuration.credentials
-    and hasattr(destination_configuration.credentials, "password")
-    and isinstance(destination_configuration.credentials.password, str)  # type: ignore [attr-defined]
-):
-    destination_password = str(destination_configuration.credentials.password)  # type: ignore [attr-defined]
-    if "****" in destination_password:
-        try:
-            snowflake_password = get_secret(password_secret_name)
-        except ValueError as ex:
-            raise PyAirbyteSecretNotFoundError(
-                "Password is required for Snowflake cache, but it was not available."
-            ) from ex
-    else:
-        snowflake_password = get_secret(destination_password)
-else:
-    snowflake_password = get_secret(password_secret_name)
+try:
+    if hasattr(destination_configuration.credentials, "password"):
+        password = str(destination_configuration.credentials.password)  # type: ignore [attr-defined]
+        secret_name = password_secret_name if "****" in password else password
+    else:
+        secret_name = password_secret_name
+
+    snowflake_password = get_secret(secret_name)
+except ValueError as ex:
+    raise PyAirbyteSecretNotFoundError(
+        "Password is required for Snowflake cache, but it was not available."
+    ) from ex
```
Line 150: Consider using isinstance with typing.cast for better type safety. Instead of using a type-ignore comment, we could use `typing.cast` for better type safety. WDYT?

```diff
-    and isinstance(destination_configuration.credentials.password, str)  # type: ignore [attr-defined]
+    and isinstance(typing.cast(Any, destination_configuration.credentials).password, str)
```
Lines 73-90: Consider enhancing docstrings with parameter and return type descriptions. The docstrings could be more informative. Would you like to add parameter and return type descriptions? For example:

```python
def biqquery_destination_to_cache(
    destination_configuration: DestinationBigquery | dict[str, Any],
) -> BigQueryCache:
    """Create a new BigQuery cache from the destination configuration.

    Args:
        destination_configuration: BigQuery destination configuration object
            or dict containing project_id and dataset_id.

    Returns:
        BigQueryCache: Configured BigQuery cache instance.

    Note:
        We may have to inject credentials, because they are obfuscated when
        config is returned from the REST API.
    """
```

Also applies to: 92-99, 102-113
tests/integration_tests/cloud/conftest.py (2)
Line 6: Remove unused import? The `auto` import from `enum` appears to be unused in this file. Should we remove it to keep the imports clean? wdyt?

```diff
-from enum import auto
```

🧰 Tools
🪛 Ruff (0.8.0)
6-6: `enum.auto` imported but unused. Remove unused import: `enum.auto` (F401)
124-127
: Track TODO for DevNullDestination implementationThe comment indicates that using DevNullDestination would be preferable but is currently blocked by Cloud limitations. Would you like me to create a GitHub issue to track this improvement? This could help with test performance and simplicity once Cloud supports it.
tests/integration_tests/cloud/test_cloud_sql_reads.py (2)
12-12
: Remove unused import?The
CloudSource
import appears to be unused. Should we remove it? wdyt?-from airbyte.cloud.connectors import CloudSource
🧰 Tools
🪛 Ruff (0.8.0)
12-12:
airbyte.cloud.connectors.CloudSource
imported but unusedRemove unused import:
airbyte.cloud.connectors.CloudSource
(F401)
30-84
: Consider removing commented-out test?Since this test is redundant due to Cloud auto-retries, should we remove it entirely instead of keeping it commented out? This would improve code maintainability. wdyt?
If you decide to keep it, we should add the missing import:
+from airbyte.utils import text_util
tests/integration_tests/cloud/test_cloud_workspaces.py (5)
Line 14: Remove unnecessary empty line, wdyt? This empty line doesn't serve any purpose and can be removed to maintain a cleaner file structure.
Line 19: Enhance docstring to be more descriptive, wdyt? The current docstring mentions "source" but this is a destination test. Consider updating it to something like:

```diff
-    """Test deploying a source to a workspace."""
+    """Test deploying a destination to a workspace and verify cleanup.
+
+    This test verifies that:
+    1. A destination can be successfully deployed with a random name suffix
+    2. The deployed destination can be properly cleaned up
+    """
```
Lines 37-41: Consider parameterizing the source configuration, wdyt? The source configuration is currently hardcoded. To make the test more maintainable and reusable, consider moving the configuration to a fixture or test constants:

```diff
-cloud_source: CloudSource = cloud_workspace.deploy_source(
-    name="test-source",
-    source=source,
-)
+cloud_source: CloudSource = cloud_workspace.deploy_source(
+    name="test-source",
+    source=source,
+    random_name_suffix=True,  # For consistency with other tests
+)
```
Lines 51-54: Consider adding random_name_suffix for consistency, wdyt? Other test functions use `random_name_suffix=True` to prevent naming conflicts. Should we add it here too?

```diff
 cloud_source: CloudSource = cloud_workspace.deploy_source(
     name="test-source",
     source=deployable_dummy_source,
+    random_name_suffix=True,
 )
```
Line 63: Update docstring to reflect current functionality, wdyt? The docstring still mentions "cache" but the function now tests source and destination deployment:

```diff
-    """Test deploying a source and cache to a workspace as a new connection."""
+    """Test deploying a source and destination to a workspace as a new connection.
+
+    Verifies that:
+    1. Source and destination can be deployed with random name suffixes
+    2. A connection can be established between them
+    3. Resources are properly cleaned up using cascade deletion
+    """
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
- tests/integration_tests/cloud/conftest.py (3 hunks)
- tests/integration_tests/cloud/test_cloud_sql_reads.py (5 hunks)
- tests/integration_tests/cloud/test_cloud_workspaces.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
tests/integration_tests/cloud/test_cloud_sql_reads.py

12-12: `airbyte.cloud.connectors.CloudSource` imported but unused. Remove unused import: `airbyte.cloud.connectors.CloudSource` (F401)

tests/integration_tests/cloud/conftest.py

6-6: `enum.auto` imported but unused. Remove unused import: `enum.auto` (F401)
🔇 Additional comments (6)
airbyte/destinations/_translate_dest_to_cache.py (1)
Line 44: Typo in function name `biqquery_destination_to_cache`. The function name contains a typo: "biqquery" should be "bigquery".
tests/integration_tests/cloud/conftest.py (2)
Lines 57-80: LGTM! Great security improvement. The transition from API key to client credentials (client_id and client_secret) is a more secure approach, following OAuth best practices.
Lines 164-184: LGTM! Well-structured credential management. The session-scoped fixtures for BigQuery and Snowflake credentials are well-implemented, with proper:
- Credential retrieval from secure storage
- Environment variable setup
- Cleanup in fixture teardown

Also applies to: 187-205
tests/integration_tests/cloud/test_cloud_sql_reads.py (2)
Lines 174-176: Add cache type verification? Should we enhance the test by verifying the cache type matches the expected destination? This was suggested in a previous review. wdyt?

```diff
 cache = sync_result.get_sql_cache()
 sqlalchemy_url = cache.get_sql_alchemy_url()
 engine: Engine = sync_result.get_sql_engine()
+
+# Verify cache type based on destination
+if deployed_connection_id == "c7b4d838-a612-495a-9d91-a14e477add51":
+    assert isinstance(cache, SnowflakeCache)
+elif deployed_connection_id == "0e1d6b32-b8e3-4b68-91a3-3a314599c782":
+    assert isinstance(cache, BigQueryCache)
```
Lines 140-154: LGTM! Well-structured test parameterization. The test parameterization is clear and well-organized, with:
- Descriptive test IDs
- Clear skip markers for unsupported destinations
- Consistent structure across test functions
tests/integration_tests/cloud/test_cloud_workspaces.py (1)
Lines 81-82: Are these assertions still relevant, wdyt? There are commented-out assertions that might be important for testing. If they're still relevant, should we uncomment and update them? If not, should we remove them?
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (8)
airbyte/destinations/_translate_cache_to_dest.py (4)
Line 11: Consider removing the commented import. Hey! I noticed there's a commented-out import for `StandardInserts`. Should we remove it if it's not being used? This would help keep the imports clean and avoid confusion. WDYT? 🤔
Lines 43-54: Consider documenting the purpose of duplicate mappings. I see we have both class names (e.g., "BigQueryCache") and lowercase variants (e.g., "bigquery") in the mapping. Could we add a comment explaining why both forms are needed? This would help future maintainers understand the design decision. WDYT? 🤔
Lines 107-109: Consider documenting the case transformations. I notice we're using `.upper()` for database and schema names in Snowflake. Could we add a comment explaining why this is necessary? This would help others understand Snowflake's case-sensitivity requirements. WDYT? 🤔
Line 130: Consider making the dataset location configurable. The BigQuery dataset location is hardcoded to "US". Should we make this configurable through the cache configuration to support other regions? This would provide more flexibility for users with data residency requirements. WDYT? 🤔
tests/integration_tests/cloud/test_cloud_sync.py (1)
Lines 72-79: Consider enhancing the error message with more details. The error handling is good, but would it be helpful to include what types are actually supported? For example:

```diff
-    "new_deployable_destination must be a dictionary or a dataclass. "
-    f"Instead, it is a {type(new_deployable_destination)}."
+    "new_deployable_destination must be a dictionary or a dataclass that can be converted to dict. "
+    f"Got {type(new_deployable_destination)}. Supported types: dict, @dataclass-decorated classes."
```

wdyt?
airbyte/sources/util.py (1)
Lines 140-144: Consider enhancing the install_if_missing parameter documentation. Would it be helpful to explain the behavior when set to False? For example:

```diff
-    install_if_missing: Whether to install the source if it is not available locally.
+    install_if_missing: Whether to install the source if it is not available locally.
+        When False, raises an error if the source is not already installed.
```

wdyt?
airbyte/_util/api_util.py (2)
Lines 132-138: Standardize error handling across list functions. The error handling patterns in the list_* functions are similar but not identical. Should we extract this into a common helper? Wdyt?

Example approach:

```python
def handle_list_response(response, resource_type: str, workspace_id: str) -> None:
    if not status_ok(response.status_code):
        raise AirbyteError(
            context={
                "workspace_id": workspace_id,
                "resource_type": resource_type,
                "response": response,
            }
        )
```

Also applies to: 175-181, 215-221, 253-259
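To show the shared-helper idea end to end, here is a runnable sketch. The `FakeResponse`, `AirbyteError`, and `status_ok` definitions below are hypothetical stand-ins; the real types live in PyAirbyte and the generated API client:

```python
from dataclasses import dataclass


def status_ok(status_code: int) -> bool:
    """Stand-in for the real helper: any 2xx code is a success."""
    return 200 <= status_code < 300


class AirbyteError(Exception):
    """Stand-in for airbyte.exceptions.AirbyteError."""

    def __init__(self, context: dict) -> None:
        super().__init__(f"API error: {context}")
        self.context = context


@dataclass
class FakeResponse:
    status_code: int


def handle_list_response(response: FakeResponse, resource_type: str, workspace_id: str) -> None:
    """Raise a uniform, context-rich error for any failed list_* call."""
    if not status_ok(response.status_code):
        raise AirbyteError(
            context={
                "workspace_id": workspace_id,
                "resource_type": resource_type,
                "response": response,
            }
        )


# A successful call returns None; a failing one raises with full context.
handle_list_response(FakeResponse(200), "sources", "ws-123")
try:
    handle_list_response(FakeResponse(403), "destinations", "ws-123")
except AirbyteError as err:
    print(err.context["resource_type"])  # -> destinations
```

Because every list function reports failures through one code path, the error context stays consistent no matter which resource type failed.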
Lines 616-631: Consider using a configuration class for connection creation. The function has many parameters (hence the noqa comment). Should we introduce a ConnectionConfig class to group these parameters? Wdyt?

Example approach:

```python
@dataclass
class ConnectionConfig:
    name: str
    source_id: str
    destination_id: str
    workspace_id: str | None
    prefix: str
    selected_stream_names: list[str]

def create_connection(
    config: ConnectionConfig,
    *,
    api_root: str,
    client_id: SecretString,
    client_secret: SecretString,
) -> models.ConnectionResponse:
    ...
```
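Grouping the arguments this way also gives callers keyword-style clarity and sensible defaults. A runnable sketch of the idea — field names mirror the suggestion, but the defaults here are assumptions for illustration, not the current API:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConnectionConfig:
    """Bundle of connection-creation parameters (sketch, not the real signature)."""

    name: str
    source_id: str
    destination_id: str
    workspace_id: str | None = None
    prefix: str = ""
    # default_factory gives each instance its own fresh list
    selected_stream_names: list[str] = field(default_factory=list)


config = ConnectionConfig(
    name="faker-to-snowflake",
    source_id="src-001",
    destination_id="dst-001",
    selected_stream_names=["users", "products"],
)
print(config.selected_stream_names)  # -> ['users', 'products']
```

A single `config` argument keeps the `create_connection` signature stable even if more optional settings are added later.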
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- airbyte/_message_iterators.py (3 hunks)
- airbyte/_util/api_util.py (18 hunks)
- airbyte/cloud/workspaces.py (4 hunks)
- airbyte/destinations/_translate_cache_to_dest.py (1 hunks)
- airbyte/sources/util.py (3 hunks)
- tests/integration_tests/cloud/conftest.py (5 hunks)
- tests/integration_tests/cloud/test_cloud_api_util.py (7 hunks)
- tests/integration_tests/cloud/test_cloud_sql_reads.py (4 hunks)
- tests/integration_tests/cloud/test_cloud_sync.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- airbyte/_message_iterators.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/integration_tests/cloud/test_cloud_sql_reads.py
- tests/integration_tests/cloud/conftest.py
🔇 Additional comments (15)
airbyte/destinations/_translate_cache_to_dest.py (2)
Lines 57-60: Consider combining exception messages in the ValueError. Could we merge the exception messages into a single string to prevent the error message from being displayed as a tuple? WDYT?

```diff
-raise ValueError(
-    "Cannot convert cache type to destination configuration. Cache type not supported. ",
-    f"Supported cache types: {list(conversion_fn_map.keys())}",
-)
+raise ValueError(
+    "Cannot convert cache type to destination configuration. Cache type not supported. "
+    f"Supported cache types: {list(conversion_fn_map.keys())}"
+)
```
Lines 1-133: Overall implementation looks great! 👍 The code is well-structured, properly typed, and implements a clean translation layer between cache and destination configurations. The conversion functions are clear and handle their specific cases well.
tests/integration_tests/cloud/test_cloud_sync.py (2)
Lines 35-37: LGTM! Good practice to verify the connection first. Getting the connection before running the sync ensures the connection exists and is accessible.
Lines 47-49: LGTM! Consistent with the pattern of verifying the connection first. Getting the connection before accessing sync logs follows the same good practice as the previous change.
airbyte/sources/util.py (1)
Lines 128-129: LGTM! Consistent parameter addition. The new parameter maintains consistency with the `get_source` function's interface.
tests/integration_tests/cloud/test_cloud_api_util.py (3)
Lines 159-165: LGTM! Good use of consistent naming and utilities. The use of `text_util.generate_random_suffix()` and clear naming conventions makes the test more maintainable.
Lines 14-15: LGTM! Good security practice with SecretString. Proper use of the `SecretString` type for sensitive credentials enhances security. Also applies to: 20-21, 90-91, 120-121, 155-156
Lines 74-84: Critical: Function is using the wrong API call. The test function `test_list_destinations` is actually calling `api_util.list_sources()` instead of `api_util.list_destinations()`. This means the test is not testing what it claims to test.

```diff
-result: list[SourceResponse] = api_util.list_sources(
+result: list[DestinationResponse] = api_util.list_destinations(
     workspace_id=workspace_id,
     api_root=airbyte_cloud_api_root,
     client_id=airbyte_cloud_client_id,
     client_secret=airbyte_cloud_client_secret,
 )
 assert (
     result
     and len(result) > 0
-    and all(isinstance(source, SourceResponse) for source in result)
+    and all(isinstance(destination, DestinationResponse) for destination in result)
 )
```
airbyte/cloud/workspaces.py (6)
Lines 36-43: LGTM! Secure credential handling implementation. The transition to client credentials with a SecretString wrapper provides better security than the previous API key approach.
Lines 250-255: Remove the redundant selected_streams check. There's a duplicate check for selected_streams. Since we already verify this earlier in the method, should we remove this second check? Wdyt?
Lines 97-100: Fix incorrect resource type in error message. The error message uses "destination" for a source deployment. Apply this fix:

```diff
 raise exc.AirbyteDuplicateResourcesError(
-    resource_type="destination",
+    resource_type="source",
     resource_name=name,
 )
```
Lines 178-182: Update the isinstance check to use tuple syntax. The isinstance check should use a tuple for multiple types. Apply this fix:

```diff
-if not isinstance(source, str | CloudSource):
+if not isinstance(source, (str, CloudSource)):
```
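For context on the tuple form: `isinstance` has accepted a tuple of types on every Python version, while the `X | Y` union form only works at runtime on Python 3.10+. A quick sketch with a stand-in class (`CloudSource` here is a hypothetical placeholder, not the real connector class):

```python
class CloudSource:
    """Stand-in for airbyte.cloud.connectors.CloudSource."""


def is_valid_source_arg(source: object) -> bool:
    # Tuple syntax works on all supported Python versions.
    return isinstance(source, (str, CloudSource))


print(is_valid_source_arg("source-id-123"))  # -> True
print(is_valid_source_arg(CloudSource()))  # -> True
print(is_valid_source_arg(42))  # -> False
```

If the project's minimum supported Python is 3.10+, both spellings behave identically at runtime, so this is mostly a compatibility and style question.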
Lines 201-204: Update the isinstance check to use tuple syntax. The isinstance check should use a tuple for multiple types. Apply this fix:

```diff
-if not isinstance(destination, str | CloudDestination):
+if not isinstance(destination, (str, CloudDestination)):
```
Lines 222-224: 🛠️ Refactor suggestion: Consider making selected_streams a required parameter. Since selected_streams is mandatory, should we make it a required parameter without a default value? Wdyt?

```diff
-selected_streams: list[str] | None = None,
+selected_streams: list[str],
```
airbyte/_util/api_util.py (1)
Lines 556-572: Consider using a mapping pattern for destination types. The current implementation checks each destination type individually. Would it be cleaner to use a mapping? Wdyt?

Example approach:

```python
DESTINATION_TYPE_MAPPING = {
    "snowflake": models.DestinationSnowflake,
    "bigquery": models.DestinationBigquery,
    "postgres": models.DestinationPostgres,
    "duckdb": models.DestinationDuckdb,
}

destination_type = raw_response.get("destinationType")
if destination_type in DESTINATION_TYPE_MAPPING:
    model_class = DESTINATION_TYPE_MAPPING[destination_type]
    response.destination_response.configuration = model_class(**raw_configuration)
```
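The dispatch pattern above replaces an if/elif chain with a data-driven lookup. Here is a self-contained sketch using hypothetical stand-in model classes — the real ones come from the generated `airbyte_api` models:

```python
from dataclasses import dataclass


@dataclass
class DestinationSnowflake:
    host: str = ""


@dataclass
class DestinationBigquery:
    project_id: str = ""


# Map API type strings to their configuration model classes.
DESTINATION_TYPE_MAPPING = {
    "snowflake": DestinationSnowflake,
    "bigquery": DestinationBigquery,
}


def parse_configuration(raw_response: dict, raw_configuration: dict):
    """Instantiate the matching model class, or return None for unknown types."""
    model_class = DESTINATION_TYPE_MAPPING.get(raw_response.get("destinationType"))
    return model_class(**raw_configuration) if model_class else None


config = parse_configuration({"destinationType": "bigquery"}, {"project_id": "my-project"})
print(type(config).__name__)  # -> DestinationBigquery
```

Adding a new destination then means adding one dictionary entry rather than another branch, which keeps the parsing function closed to modification.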
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (18)
tests/integration_tests/test_source_test_fixture.py (2)
Lines 48-53: Great addition of descriptive assertion messages! Consider standardizing the message format. The added error messages will make test failures more actionable. However, I notice slight inconsistencies in message formatting:
- Some end with periods, others don't
- Some use single quotes, others don't

What do you think about standardizing these for consistency? Here's a suggestion:

```diff
-    f"Column '{internal_column}' should exist in stream data."
+    f"Column '{internal_column}' should exist in stream data"
```

Also applies to: 64-70, 706-708, 711-713, 718-720
Line 34: Excellent type annotation additions! Consider adding docstring type hints too. The return type annotations make the code more maintainable and help catch type-related issues early. What do you think about also adding type hints in the docstrings for better IDE support? For example:

```python
def test_registry_get() -> None:
    """Test registry get functionality.

    Returns:
        None
    """
```

Also applies to: 100-100, 145-145, 194-194, 214-214, 222-222, 227-227, 237-237, 311-311, 363-363, 381-383, 415-415, 435-435, 488-488, 527-527, 553-553, 575-575, 676-676, 786-786, 845-845, 882-882, 889-889, 899-899, 910-910
tests/integration_tests/test_bigquery_cache.py (2)
Lines 17-19: How about enhancing the error message for easier debugging? wdyt? The assertion looks good, but we could make debugging easier by including the actual values in the error message. Here's a suggestion:

```diff
 assert new_bigquery_cache.dataset_name == new_bigquery_cache.schema_name, (
-    "Dataset name should be the same as schema name."
+    f"Dataset name '{new_bigquery_cache.dataset_name}' should be the same as schema name '{new_bigquery_cache.schema_name}'."
 )
```
Lines 25-27: Two small suggestions to improve this assertion, what do you think?
1. For consistency with the previous assertion style and to avoid unnecessary method calls in tests, we could use property access if available.
2. The error message could be more descriptive by including the actual values.

```diff
-assert new_bigquery_cache.get_database_name() == new_bigquery_cache.project_name, (
-    "Database name should be the same as project name."
+assert new_bigquery_cache.database_name == new_bigquery_cache.project_name, (
+    f"Database name '{new_bigquery_cache.database_name}' should be the same as project name '{new_bigquery_cache.project_name}'."
 )
```

Note: Only apply the property access change if `database_name` is available as a property. If not, we can keep the method call but still enhance the error message.

tests/unit_tests/test_exceptions.py (1)
Lines 24-26: LGTM! Clean formatting change! ✨ The multi-line assertion improves readability. Hey, what do you think about adding a test case for nested exception inheritance? It could help ensure subclasses also follow the naming convention, wdyt?

airbyte/_writers/jsonl.py (1)
airbyte/_writers/jsonl.py (1)
37-38
: The type hint change looks good! A couple of thoughts...
The string literal type hint
"IO[str]"
is a good practice for forward compatibility 👍I notice the comment "Avoiding context manager" - would it be helpful to add a brief docstring explaining why we're managing the file handle manually? This could help future maintainers understand the intentional design choice, wdyt?
examples/run_bigquery_destination.py (1)
22-29
: Hey there! Mind if I suggest a small readability improvement? 🤔The current parentheses structure works, but we could make it even more readable by using a different line break pattern. What do you think about this approach? wdyt?
-bigquery_destination_secret: dict = ( - GoogleGSMSecretManager( - project=AIRBYTE_INTERNAL_GCP_PROJECT, - credentials_json=ab.get_secret("GCP_GSM_CREDENTIALS"), - ) - .get_secret(SECRET_NAME) - .parse_json() -) +bigquery_destination_secret: dict = GoogleGSMSecretManager( + project=AIRBYTE_INTERNAL_GCP_PROJECT, + credentials_json=ab.get_secret("GCP_GSM_CREDENTIALS"), +).get_secret(SECRET_NAME).parse_json()examples/run_bigquery_faker.py (1)
Lines 23-30: For consistency, shall we apply the same readability improvement here? 🎨 Similar to the previous file, we could make this more readable. This would keep both files consistent in their style. What are your thoughts? wdyt?

```diff
-bigquery_destination_secret: dict = (
-    GoogleGSMSecretManager(
-        project=AIRBYTE_INTERNAL_GCP_PROJECT,
-        credentials_json=ab.get_secret("GCP_GSM_CREDENTIALS"),
-    )
-    .get_secret(SECRET_NAME)
-    .parse_json()
-)
+bigquery_destination_secret: dict = GoogleGSMSecretManager(
+    project=AIRBYTE_INTERNAL_GCP_PROJECT,
+    credentials_json=ab.get_secret("GCP_GSM_CREDENTIALS"),
+).get_secret(SECRET_NAME).parse_json()
```

airbyte/_util/name_normalizers.py (1)
Line 73: LGTM! Consider adding more test cases? The change to use a raw string for the regex pattern is a good practice. While reviewing the examples in the docstring, I noticed they cover most cases but might miss some edge cases. Would you consider adding test cases for:
- Unicode characters (e.g., "héllo")
- Emojis (e.g., "hello👋")
- Multiple consecutive special characters (e.g., "hello!!!world")

wdyt?

airbyte/secrets/util.py (1)
airbyte/secrets/util.py (1)
66-66
: LGTM! Consider using TYPE_CHECKING for cleaner imports?The change to use string literal for type casting is good practice. Since we're already importing TYPE_CHECKING, we could potentially move the SecretManager import under the TYPE_CHECKING block to avoid circular imports. Something like:
if TYPE_CHECKING: from collections.abc import Iterable from airbyte.secrets.base import SecretManagerThis would allow us to use
SecretManager
directly in the cast. wdyt?airbyte/_executors/declarative.py (1)
Line 57: LGTM! Consider using TypeAlias for the manifest type? The change to use a string literal for type casting is good. Since we're dealing with a specific manifest structure, would it make sense to define a type alias for better type safety? Something like:

```python
from typing import TypeAlias

ManifestDict: TypeAlias = dict[str, Any]  # or more specific type hints
```

This could make the code more maintainable and self-documenting. wdyt?

airbyte/shared/state_providers.py (1)
airbyte/shared/state_providers.py (1)
55-55
: LGTM! The additional null checks make the code more robust.The added checks for
state_msg
andstate_msg.stream
help prevent potential null pointer exceptions. Would you consider adding a docstring comment explaining these safety checks for future maintainers? wdyt?tests/unit_tests/test_anonymous_usage_stats.py (1)
Lines 58-67: LGTM! The assertions are now more readable and thorough. The reformatted assertions with detailed error messages will make debugging failed tests easier. Would you consider adding a test case for partial/incomplete properties to ensure robustness? wdyt?

airbyte/_executors/base.py (1)
airbyte/_executors/base.py (1)
165-167
: LGTM! Consider clarifying the comment.The type casting change to string literal looks good. However, the comment "metadata is not None here" might be slightly misleading since
metadata
could be None butname
would be set (due to the earlier validation). What do you think about updating it to be more precise? WDYT?- ) # metadata is not None here + ) # Either name is set or metadata is valid hereairbyte/destinations/_translate_dest_to_cache.py (2)
Lines 146-164: Consider simplifying the password handling logic. The nested conditionals and multiple paths for password retrieval could be simplified. What do you think about consolidating the logic? WDYT?

```diff
-snowflake_password: str | None = None
-if (
-    destination_configuration.credentials
-    and hasattr(destination_configuration.credentials, "password")
-    and isinstance(destination_configuration.credentials.password, str)
-):
-    destination_password = str(destination_configuration.credentials.password)
-    if "****" in destination_password:
-        try:
-            snowflake_password = get_secret(password_secret_name)
-        except ValueError as ex:
-            raise PyAirbyteSecretNotFoundError(
-                "Password is required for Snowflake cache, but it was not available."
-            ) from ex
-    else:
-        snowflake_password = get_secret(destination_password)
-else:
-    snowflake_password = get_secret(password_secret_name)
+try:
+    if (
+        destination_configuration.credentials
+        and hasattr(destination_configuration.credentials, "password")
+        and isinstance(destination_configuration.credentials.password, str)
+        and "****" not in destination_configuration.credentials.password
+    ):
+        secret_name = str(destination_configuration.credentials.password)
+    else:
+        secret_name = password_secret_name
+
+    snowflake_password = get_secret(secret_name)
+except ValueError as ex:
+    raise PyAirbyteSecretNotFoundError(
+        f"Failed to retrieve Snowflake password from secret '{secret_name}'"
+    ) from ex
```
Lines 121-122: Enhance the error message for the Postgres password. The error message could be more descriptive. How about including more context? WDYT?

```diff
-raise ValueError("Password is required for Postgres cache.")
+raise ValueError(
+    "Password is required for Postgres cache. Please ensure it's provided in the destination configuration."
+)
```

airbyte/_processors/sql/bigquery.py (1)
Line 102: Consider enhancing the type conversion documentation? The comment "BigQuery uses STRING for all strings" is accurate but could be more detailed. Maybe we could add information about BigQuery's type system and why we're using a string literal here, wdyt?

```diff
-return cast("sqlalchemy.types.TypeEngine", "String")  # BigQuery uses STRING for all strings
+return cast("sqlalchemy.types.TypeEngine", "String")  # BigQuery uses the STRING type for all string/varchar variations, as it doesn't distinguish between different string lengths
```

airbyte/sources/registry.py (1)
Lines 24-26: Consider reducing the blank lines after logger initialization? The logger initialization looks good, but maybe we could reduce the two blank lines to one to match PEP 8 style, wdyt?

```diff
 logger = logging.getLogger("airbyte")
-
-
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
- `poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (39)
- airbyte/__init__.py (1 hunks)
- airbyte/_executors/base.py (2 hunks)
- airbyte/_executors/declarative.py (1 hunks)
- airbyte/_executors/util.py (1 hunks)
- airbyte/_message_iterators.py (5 hunks)
- airbyte/_processors/sql/bigquery.py (1 hunks)
- airbyte/_util/api_util.py (18 hunks)
- airbyte/_util/name_normalizers.py (1 hunks)
- airbyte/_util/telemetry.py (1 hunks)
- airbyte/_writers/jsonl.py (1 hunks)
- airbyte/caches/__init__.py (1 hunks)
- airbyte/cloud/__init__.py (1 hunks)
- airbyte/cloud/connections.py (8 hunks)
- airbyte/cloud/workspaces.py (4 hunks)
- airbyte/datasets/_base.py (2 hunks)
- airbyte/datasets/_sql.py (4 hunks)
- airbyte/destinations/__init__.py (1 hunks)
- airbyte/destinations/_translate_dest_to_cache.py (1 hunks)
- airbyte/destinations/base.py (4 hunks)
- airbyte/progress.py (1 hunks)
- airbyte/secrets/__init__.py (1 hunks)
- airbyte/secrets/base.py (3 hunks)
- airbyte/secrets/util.py (1 hunks)
- airbyte/shared/sql_processor.py (4 hunks)
- airbyte/shared/state_providers.py (2 hunks)
- airbyte/sources/__init__.py (1 hunks)
- airbyte/sources/registry.py (3 hunks)
- airbyte/types.py (1 hunks)
- examples/run_bigquery_destination.py (1 hunks)
- examples/run_bigquery_faker.py (1 hunks)
- examples/run_perf_test_reads.py (1 hunks)
- pyproject.toml (4 hunks)
- tests/docs_tests/test_docs_checked_in.py (1 hunks)
- tests/integration_tests/test_all_cache_types.py (4 hunks)
- tests/integration_tests/test_bigquery_cache.py (1 hunks)
- tests/integration_tests/test_source_test_fixture.py (28 hunks)
- tests/unit_tests/test_anonymous_usage_stats.py (1 hunks)
- tests/unit_tests/test_exceptions.py (1 hunks)
- tests/unit_tests/test_progress.py (1 hunks)
✅ Files skipped from review due to trivial changes (10)
- airbyte/cloud/__init__.py
- airbyte/destinations/__init__.py
- airbyte/progress.py
- airbyte/_executors/util.py
- tests/unit_tests/test_progress.py
- examples/run_perf_test_reads.py
- airbyte/caches/__init__.py
- airbyte/_util/telemetry.py
- airbyte/secrets/__init__.py
- airbyte/__init__.py
🚧 Files skipped from review as they are similar to previous changes (6)
- airbyte/sources/__init__.py
- airbyte/_message_iterators.py
- airbyte/destinations/base.py
- tests/integration_tests/test_all_cache_types.py
- pyproject.toml
- airbyte/cloud/connections.py
👮 Files not reviewed due to content moderation or server errors (3)
- airbyte/cloud/workspaces.py
- airbyte/_util/api_util.py
- airbyte/shared/sql_processor.py
🧰 Additional context used
📓 Path-based instructions (2)
examples/run_bigquery_destination.py (1)

Pattern `examples/*`: These scripts are intended for demonstration purposes only. They are not meant to represent production code. For these scripts, you should prefer brevity over robust error handling and handling of edge cases. These are demos - which should be as simple as possible to handle the 'blessed' use cases.

examples/run_bigquery_faker.py (1)

Pattern `examples/*`: These scripts are intended for demonstration purposes only. They are not meant to represent production code. For these scripts, you should prefer brevity over robust error handling and handling of edge cases. These are demos - which should be as simple as possible to handle the 'blessed' use cases.
🔇 Additional comments (14)
tests/integration_tests/test_source_test_fixture.py (3)
Lines 21-24: LGTM! Clean import additions. The new imports for `text_util` and `CacheBase` are well-organized and necessary for the changes.
Line 419: Nice improvement using text_util for cache name generation! Good call replacing `ulid` with `text_util.generate_random_suffix()`. This makes the code more consistent with the rest of the codebase and uses our internal utilities effectively. Also applies to: 439-439, 496-496
Lines 77-78: Smart update to the assert_data_matches_cache parameters! Good improvement changing `SqlProcessorBase` to `CacheBase` and making `streams` optional. This makes the function more flexible and reusable across different cache implementations.
tests/docs_tests/test_docs_checked_in.py (1)
25-27
: LGTM! Nice formatting improvement! 👍
The multi-line formatting makes the assertion message more readable while maintaining the same functionality.
airbyte/datasets/_base.py (2)
14-15
: Nice optimization with TYPE_CHECKING! 👍
Moving these imports under TYPE_CHECKING is a great practice for reducing runtime overhead. The code looks good!
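For readers unfamiliar with the pattern, here is a minimal self-contained sketch of `TYPE_CHECKING`-guarded imports — the module and function names are illustrative, not the actual PyAirbyte code:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; skipped at runtime,
    # so this import adds no import-time overhead.
    from collections.abc import Iterator


def first_item(items: Iterator[int]) -> int:
    # With `from __future__ import annotations`, annotations are lazy strings,
    # so referencing the TYPE_CHECKING-only name here is safe at runtime.
    return next(iter(items))


print(first_item(iter([10, 20, 30])))  # → 10
```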
42-42
: Let's verify the type casting change impact
The switch to string literal type casting ("Iterator[dict[str, Any]]"
) looks good and follows best practices for forward references. However, we should verify that this doesn't affect any downstream type checking. Would you like me to help check for any potential impacts?
✅ Verification successful
Type casting change looks safe and consistent with codebase patterns
The string literal type casting is perfectly aligned with how Iterator types are used throughout the codebase. I found that:
- The same
Iterator[dict[str, Any]]
type is consistently used across multiple files (_base.py
,_sql.py
,_lazy.py
) - The DataFrame construction pattern in
_base.py
is isolated and doesn't have any dependent type patterns elsewhere - No other DataFrame constructions in the codebase rely on this specific typing pattern
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any type-related imports or usage of Iterator in the codebase
# Search for Iterator type usage
rg "Iterator\[" --type python
# Search for pandas DataFrame construction with Iterator
rg "DataFrame\(" -A 2 --type python
Length of output: 131
Script:
#!/bin/bash
# Let's try with correct file type flags and search patterns
# Search for Iterator type usage
rg "Iterator\[" -t py
# Search for DataFrame construction patterns
rg "DataFrame\(" -t py -A 2
# Also check for typing imports
rg "^from typing import.*Iterator" -t py
Length of output: 3537
airbyte/shared/state_providers.py (1)
99-99
: LGTM! The enhanced flexibility in handling missing states is valuable.
The addition of None
as a valid option for not_found
parameter makes sense. Should we verify that all existing callers of this method are compatible with this change?
✅ Verification successful
The previous ast-grep pattern might have been too specific. Let's try with ripgrep to get a broader view of the usage patterns and understand the context better.
The change is safe and all existing callers are compatible! 🎉
Looking at the codebase usage:
- Most calls use the default behavior (no explicit
not_found
parameter) - One caller in
test_state_handling.py
already usesNone
- The
JoinedStateProvider
internally usesNone
when delegating to the secondary provider
The change is backward compatible since it only adds None
as an option while preserving the existing "raise"
default.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for all calls to get_stream_state to verify compatibility
ast-grep --pattern 'get_stream_state($$$)'
Length of output: 44
Script:
#!/bin/bash
# Search for all calls to get_stream_state with ripgrep for better coverage
rg "get_stream_state\(" -A 2
# Also search for the class definition to understand the context
rg "class.*StateProvider" -A 5
Length of output: 3308
airbyte/types.py (1)
44-44
: LGTM! The string literal type hint is more precise.
The change to use "str"
instead of str
in cast follows type annotation best practices. Should we apply this consistently to other similar casts in the codebase for uniformity? wdyt?
✅ Verification successful
The previous ast-grep pattern might have been too specific. Let's try a broader search to find all cast usages and then analyze the type hints being used. This will help us verify if there's inconsistency in type casting patterns across the codebase.
The codebase already follows a consistent pattern with string literal type hints!
I noticed that all string type casts in the codebase are already using the "str"
string literal format, as seen in:
airbyte/_util/telemetry.py
:cast("str", result)
airbyte/cloud/connections.py
:cast("str", self._source_id)
airbyte/cloud/connections.py
:cast("str", self._destination_id)
The change you made aligns perfectly with the existing codebase conventions. No further updates needed! 😊
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other cast calls that might need similar updates
ast-grep --pattern 'cast(str, $$$)'
Length of output: 37
Script:
#!/bin/bash
# Search for cast calls with str using ripgrep for better pattern matching
rg "cast\(str," -A 2
# Also search for string literal type hints in cast
rg 'cast\("str",' -A 2
# Try a more flexible ast-grep pattern that might catch both variants
ast-grep --pattern 'cast($_, $_)'
Length of output: 3546
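A quick runnable illustration of the two forms: at runtime `typing.cast` simply returns its second argument unchanged, and only the type checker reads the first argument, so the string-literal form is behaviorally identical while avoiding evaluation of the type expression:

```python
from typing import cast

raw: object = "workspace-123"

# Both forms are equivalent at runtime; the string-literal form is the
# convention this codebase already follows.
a = cast(str, raw)
b = cast("str", raw)

assert a == b == "workspace-123"
print(b.upper())  # → WORKSPACE-123
```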
airbyte/destinations/_translate_dest_to_cache.py (1)
44-44
: Typo in function name biqquery_destination_to_cache
The function name contains a typo: "biqquery" should be "bigquery".
airbyte/datasets/_sql.py (1)
44-44
: LGTM! Type hint changes look good.
The changes to use string literals for type casting and the reordering of union types in the parameter hints are consistent with best practices. These changes will help prevent circular imports and improve type checking performance.
Also applies to: 89-89, 101-101, 150-150
airbyte/secrets/base.py (2)
127-134
: LGTM! Nice parameter naming improvement.
The parameter rename from _core_schema
to core_schema_
aligns better with Pydantic's conventions, and the unused parameter is properly marked.
229-229
: LGTM! Good use of string literal type annotation.
The change to use a string literal type annotation for SecretString
is a good practice to avoid potential forward reference issues.
airbyte/sources/registry.py (2)
283-283
: LGTM! Good type annotation update.
The change to ConnectorMetadata | None
follows modern Python type annotation conventions.
321-321
: LGTM! Helpful logging additions.
The new logging statements provide clear context about Docker detection and its impact on connector availability.
Also applies to: 325-325
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
tests/integration_tests/cloud/test_cloud_workspaces.py (1)
15-25
: Consider enhancing test assertions, wdyt?
The test looks good but could be more robust with additional assertions to verify:
- The destination name contains the random suffix
- The destination type matches the dummy destination type
airbyte/destinations/_translate_dest_to_cache.py (2)
63-65
: Enhance error message with destination type, wdyt?
The error message could be more helpful by including the actual destination type that was attempted to be used.
- f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
+ f"Missing 'destinationType' in configuration. Available keys: {list(destination_configuration.keys())}. "
+ f"Supported types: {list(conversion_fn_map.keys())}"
146-164
: Consider simplifying password handling logic
The password retrieval logic has multiple branches and nested conditions. Consider extracting this into a separate helper function for better maintainability and testability.
+def _get_snowflake_password(
+    credentials: Any,
+    password_secret_name: str
+) -> str:
+    """Extract Snowflake password from credentials or secrets."""
+    if (
+        credentials
+        and hasattr(credentials, "password")
+        and isinstance(credentials.password, str)
+    ):
+        password = str(credentials.password)
+        return get_secret(password_secret_name if "****" in password else password)
+    return get_secret(password_secret_name)
+
 def snowflake_destination_to_cache(
     destination_configuration: DestinationSnowflake | dict[str, Any],
     password_secret_name: str = SNOWFLAKE_PASSWORD_SECRET_NAME,
 ) -> SnowflakeCache:
     if isinstance(destination_configuration, dict):
         destination_configuration = DestinationSnowflake(**destination_configuration)
-
-    snowflake_password: str | None = None
-    if (
-        destination_configuration.credentials
-        and hasattr(destination_configuration.credentials, "password")
-        and isinstance(destination_configuration.credentials.password, str)
-    ):
-        destination_password = str(destination_configuration.credentials.password)
-        if "****" in destination_password:
-            try:
-                snowflake_password = get_secret(password_secret_name)
-            except ValueError as ex:
-                raise PyAirbyteSecretNotFoundError(
-                    "Password is required for Snowflake cache, but it was not available."
-                ) from ex
-        else:
-            snowflake_password = get_secret(destination_password)
-    else:
-        snowflake_password = get_secret(password_secret_name)
+    try:
+        snowflake_password = _get_snowflake_password(
+            destination_configuration.credentials,
+            password_secret_name
+        )
+    except ValueError as ex:
+        raise PyAirbyteSecretNotFoundError(
+            "Password is required for Snowflake cache, but it was not available."
+        ) from ex
tests/integration_tests/cloud/test_cloud_api_util.py (2)
44-46
: Enhance workspace list assertions, wdyt?
Consider adding more specific assertions to verify:
- The provided workspace_id exists in the returned list
- The workspace properties (name, status) are as expected
assert result
assert len(result) > 0
assert all(isinstance(workspace, WorkspaceResponse) for workspace in result)
+ assert any(w.workspace_id == workspace_id for w in result)
Line range hint
152-228
: Consider using a context manager for resource cleanup
The test creates multiple resources that need cleanup. To ensure cleanup happens even if the test fails, consider using a context manager or fixture.
Example approach:
@contextmanager
def temporary_connection_resources(
    workspace_id: str,
    api_root: str,
    client_id: SecretString,
    client_secret: SecretString,
    motherduck_api_key: str,
) -> Generator[tuple[SourceResponse, DestinationResponse, ConnectionResponse], None, None]:
    """Create and cleanup source, destination and connection for testing."""
    source = destination = connection = None
    try:
        # Create resources...
        yield source, destination, connection
    finally:
        # Cleanup resources in reverse order...
        for resource in filter(None, [connection, source, destination]):
            try:
                # Delete resource...
                pass
            except Exception:
                logger.exception(f"Failed to cleanup {resource}")
airbyte/_util/api_util.py (2)
558-568
: Consider enhancing destination type handling.
The current mapping approach works but could be more maintainable. Would it make sense to:
- Move the mapping to a module-level constant?
- Consider using a factory pattern or registry for better extensibility?
Wdyt?
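A hedged sketch of the registry idea, using stand-in dataclasses rather than the real `airbyte-api` models — new destination types would register themselves instead of editing a dictionary inside the function:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Module-level registry mapping destination-type names to model classes.
_DESTINATION_REGISTRY: dict[str, Callable[..., Any]] = {}


def register_destination(name: str):
    """Class decorator that adds the class to the registry under `name`."""
    def decorator(cls):
        _DESTINATION_REGISTRY[name] = cls
        return cls
    return decorator


@register_destination("duckdb")
@dataclass
class DestinationDuckdb:  # stand-in for the real model
    destination_path: str
    schema: str = "main"


def build_destination(destination_type: str, config: dict[str, Any]):
    try:
        cls = _DESTINATION_REGISTRY[destination_type]
    except KeyError:
        raise ValueError(
            f"Unknown destination type {destination_type!r}; "
            f"known: {sorted(_DESTINATION_REGISTRY)}"
        ) from None
    return cls(**config)


dest = build_destination("duckdb", {"destination_path": "/tmp/db.duckdb"})
print(dest)
```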
175-181
: Consider reducing error handling duplication.
The error handling pattern is repeated across list functions. Would it make sense to extract this into a reusable helper function? Something like:
def handle_list_response(response: Any, resource_type: str, workspace_id: str) -> None:
    if not status_ok(response.status_code):
        raise AirbyteError(
            context={
                "workspace_id": workspace_id,
                "response": response,
            }
        )
Wdyt?
Also applies to: 215-221, 253-259
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
airbyte/_util/api_util.py
(18 hunks)airbyte/caches/base.py
(4 hunks)airbyte/cloud/connections.py
(8 hunks)airbyte/destinations/_translate_dest_to_cache.py
(1 hunks)tests/integration_tests/cloud/test_cloud_api_util.py
(7 hunks)tests/integration_tests/cloud/test_cloud_sql_reads.py
(5 hunks)tests/integration_tests/cloud/test_cloud_workspaces.py
(2 hunks)tests/integration_tests/test_bigquery_cache.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/integration_tests/test_bigquery_cache.py
- airbyte/cloud/connections.py
- tests/integration_tests/cloud/test_cloud_sql_reads.py
🔇 Additional comments (6)
tests/integration_tests/cloud/test_cloud_workspaces.py (1)
80-84
:
Fix incorrect assertion type
The assertion is checking for SourceResponse
instead of DestinationResponse
. This could mask issues with the returned data type.
- and all(isinstance(source, SourceResponse) for source in result)
+ and all(isinstance(destination, DestinationResponse) for destination in result)
Likely invalid or redundant comment.
airbyte/caches/base.py (4)
65-74
: LGTM! Clean architectural change.
The shift from deployment-specific attributes to a destination pairing mechanism improves the design by better separating concerns and making the relationship between caches and destinations more explicit.
240-244
: LGTM! Nice Pythonic implementation.
The __len__
method provides a clean way to get the number of streams using Python's built-in len() function.
245-252
: LGTM! Good defensive programming.
Making caches always truthy prevents subtle bugs where empty caches might be incorrectly treated as False in boolean contexts.
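The interaction between `__len__` and `__bool__` is easy to trip over; a toy sketch (illustrative names, not the actual `CacheBase` code) of why defining `__bool__` explicitly matters:

```python
class Cache:
    """Toy cache holding named streams."""

    def __init__(self, streams: dict[str, list]) -> None:
        self._streams = streams

    def __len__(self) -> int:
        return len(self._streams)

    def __bool__(self) -> bool:
        # Without this override, Python falls back to __len__, so an empty
        # cache would be falsy and `if cache:` would silently stop working
        # as a None-check.
        return True


empty = Cache({})
print(len(empty))   # → 0
print(bool(empty))  # → True, despite being empty
```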
296-298
: Verify the impact of return type change.
The return type change from DatasetBase
to CachedDataset
is more specific, but could this be a breaking change for existing clients? Wdyt?
✅ Verification successful
Let me gather more information about the inheritance hierarchy and usage of CachedDataset.
The return type change is safe to proceed with
From the codebase analysis, I can see that CachedDataset
inherits from SQLDataset
, which in turn inherits from DatasetBase
. All client code already expects and checks for CachedDataset
specifically, as evidenced by the type hints and isinstance checks in the test files and actual usage. The change makes the return type more precise without breaking the Liskov Substitution Principle.
Key observations:
CachedDataset
is the expected concrete type inReadResult
,SyncResult
, and test assertions- No code was found that relies on the base
DatasetBase
type specifically - The inheritance chain (CachedDataset -> SQLDataset -> DatasetBase) preserves type compatibility
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usage of DatasetBase type hints or isinstance checks
rg -l "DatasetBase" | xargs -I {} rg -A 2 -B 2 "DatasetBase" {}
Length of output: 1181
Script:
#!/bin/bash
# Check if CachedDataset inherits from DatasetBase and its usage
ast-grep --pattern 'class CachedDataset($_) {
$$$
}'
# Search for CachedDataset usage
rg -A 2 -B 2 "CachedDataset"
Length of output: 14588
airbyte/_util/api_util.py (1)
52-63
: LGTM! Security improvement with client credentials.
The switch from API key to client credentials authentication is a good security practice, providing better access control and token management.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
airbyte/cloud/connectors.py (1)
35-36
: Update the docstring to reflect both sources and destinations
Currently, the docstring for
connector_url
mentions "source connector," but since this class represents both sources and destinations, would it make sense to change it to "Get the URL of the connector" for clarity? Wdyt?airbyte/cloud/workspaces.py (1)
36-43
: Consider changing type annotations ofclient_id
andclient_secret
tostr
In the
CloudWorkspace
class,client_id
andclient_secret
are annotated asSecretString
, and then they're wrapped again withSecretString
in the__post_init__
method. To avoid wrapping an already wrapped object, should we change the type annotations tostr
? This way, the constructor accepts strings and wraps them appropriately. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
airbyte/cloud/connectors.py
(1 hunks)airbyte/cloud/workspaces.py
(4 hunks)pyproject.toml
(4 hunks)tests/integration_tests/cloud/test_cloud_sql_reads.py
(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
…features-from-experimental
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (10)
tests/integration_tests/cloud/test_cloud_api_util.py (3)
17-47
: Consider enhancing workspace test assertions, wdyt?
The new workspace tests look good, but we could make them more robust. Maybe add assertions for specific workspace properties beyond just the ID? For example:
- Workspace name
- Workspace status
- Organization ID
This would help catch any unexpected changes in the workspace structure.
Line range hint
153-227
: Should we improve test isolation for the connection test?
The connection test is doing a lot of setup and teardown. What do you think about:
- Moving the source/destination creation to fixtures?
- Using
pytest.fixture(autouse=True)
for cleanup?
This would make the test more focused and resilient to failures during setup/teardown.
Example:
@pytest.fixture
def test_source(workspace_id, airbyte_cloud_api_root, airbyte_cloud_client_id, airbyte_cloud_client_secret):
    source = api_util.create_source(...)
    yield source
    api_util.delete_source(source.source_id, ...)

@pytest.fixture
def test_destination(workspace_id, airbyte_cloud_api_root, airbyte_cloud_client_id, airbyte_cloud_client_secret, motherduck_api_key):
    destination = api_util.create_destination(...)
    yield destination
    api_util.delete_destination(destination.destination_id, ...)
Line range hint
88-227
: Should we add error handling for cleanup operations?
The delete operations in teardown could fail silently. Consider wrapping them in try-except blocks to:
- Log cleanup failures
- Continue with remaining cleanup even if one fails
- Maybe retry failed cleanups
This would help prevent orphaned test resources, wdyt?
Example:
def safe_delete_source(source_id, **kwargs):
    try:
        api_util.delete_source(source_id, **kwargs)
    except Exception as e:
        logger.error(f"Failed to delete source {source_id}: {e}")
        # Optionally implement retry logic here
2-2
: Consider enhancing the module docstring?
The current docstring "Cloud destinations for Airbyte" could be more descriptive. Maybe something like "Translation utilities for converting Airbyte destination configurations to cache implementations"? WDYT?
52-62
: Consider simplifying the destination type extraction?
The nested attribute checks could be simplified using a helper function. Something like this maybe? WDYT?
- destination_type = (
-     destination_configuration.get("DESTINATION_TYPE")
-     or destination_configuration["destinationType"]
- )
- if hasattr(destination_type, "value"):
-     destination_type = destination_type.value
- elif hasattr(destination_type, "_value_"):
-     destination_type = destination_type._value_
- else:
-     destination_type = str(destination_type)
+ def get_value(obj):
+     if hasattr(obj, "value"): return obj.value
+     if hasattr(obj, "_value_"): return obj._value_
+     return str(obj)
+
+ destination_type = get_value(
+     destination_configuration.get("DESTINATION_TYPE")
+     or destination_configuration["destinationType"]
+ )
63-65
: Enhance the error message with available types?
The error message could be more helpful by including the available destination type keys. Maybe something like this? WDYT?
- f"Missing 'destinationType' in keys {list(destination_configuration.keys())}."
+ f"Missing destination type. Expected 'DESTINATION_TYPE' or 'destinationType' in configuration. "
+ f"Available keys: {list(destination_configuration.keys())}."
153-171
: Consider simplifying Snowflake password handling logic?
The nested conditions for password handling could be simplified using early returns. Something like this maybe? WDYT?
- snowflake_password: str | None = None
- if (
-     destination_configuration.credentials
-     and hasattr(destination_configuration.credentials, "password")
-     and isinstance(destination_configuration.credentials.password, str)
- ):
-     destination_password = str(destination_configuration.credentials.password)
-     if "****" in destination_password:
-         try:
-             snowflake_password = get_secret(password_secret_name)
-         except ValueError as ex:
-             raise PyAirbyteSecretNotFoundError(
-                 "Password is required for Snowflake cache, but it was not available."
-             ) from ex
-     else:
-         snowflake_password = get_secret(destination_password)
- else:
-     snowflake_password = get_secret(password_secret_name)
+ if not (destination_configuration.credentials
+         and hasattr(destination_configuration.credentials, "password")
+         and isinstance(destination_configuration.credentials.password, str)):
+     return get_secret(password_secret_name)
+
+ destination_password = str(destination_configuration.credentials.password)
+ if "****" in destination_password:
+     try:
+         return get_secret(password_secret_name)
+     except ValueError as ex:
+         raise PyAirbyteSecretNotFoundError(
+             "Password is required for Snowflake cache, but it was not available."
+         ) from ex
+
+ return get_secret(destination_password)
127-127
: Consider adding port range validation?Maybe we could add validation to ensure the port number is within a valid range (0-65535)? WDYT?
- port: int = int(destination_configuration.port) if destination_configuration.port else 5432
+ port: int = int(destination_configuration.port) if destination_configuration.port else 5432
+ if not (0 <= port <= 65535):
+     raise ValueError(f"Invalid port number: {port}. Must be between 0 and 65535.")
173-173
: Consider validating Snowflake account format?
The account extraction assumes a specific format with ".snowflakecomputing". Should we add validation to ensure the host format is correct? WDYT?
+ if ".snowflakecomputing" not in destination_configuration.host:
+     raise ValueError(
+         f"Invalid Snowflake host format: {destination_configuration.host}. "
+         "Expected format: <account>.snowflakecomputing.com"
+     )
78-79
: Update the docstring to reflect the correct cache type
The docstring mentions "DuckDB cache," but this function is for the "MotherDuck cache." Should we update it to accurately describe the function? WDYT?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
airbyte/destinations/_translate_cache_to_dest.py
(1 hunks)airbyte/destinations/_translate_dest_to_cache.py
(1 hunks)tests/integration_tests/cloud/test_cloud_api_util.py
(7 hunks)
🔇 Additional comments (3)
tests/integration_tests/cloud/test_cloud_api_util.py (1)
68-84
: Great fix on the destinations test! 🎉
The test is now correctly using list_destinations()
instead of list_sources()
. The assertions are also properly typed with DestinationResponse
. Nice work on maintaining consistency with the other list operation tests!
airbyte/destinations/_translate_dest_to_cache.py (1)
1-180
: Overall implementation looks solid!
The code is well-structured with good error handling and type hints. The suggested improvements are mostly optional refinements to make the code even more robust and maintainable.
airbyte/destinations/_translate_cache_to_dest.py (1)
56-60
: Consider combining the exception messages into a single string
Currently, the ValueError
is raised with multiple strings, which may cause the error message to be displayed as a tuple. How about merging them into a single string to improve readability? WDYT?
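To see why this matters, a minimal demonstration: passing multiple strings to an exception stores them as an `args` tuple, while adjacent string literals concatenate into one message. This is standard Python exception behavior, independent of this codebase:

```python
# Multiple arguments: str(ex) renders the args tuple, not a sentence.
try:
    raise ValueError("Unsupported cache type. ", "Expected one of: duckdb, motherduck.")
except ValueError as ex:
    print(str(ex))  # → ('Unsupported cache type. ', 'Expected one of: duckdb, motherduck.')

# Adjacent literals merge into a single string before the call.
try:
    raise ValueError(
        "Unsupported cache type. "
        "Expected one of: duckdb, motherduck."
    )
except ValueError as ex:
    print(str(ex))  # → Unsupported cache type. Expected one of: duckdb, motherduck.
```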
/fix-pr
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
airbyte/destinations/_translate_cache_to_dest.py (2)
Line range hint
170-170
: Consider making account extraction more robust.The current account extraction assumes a specific URL format. Would it be helpful to add validation or error handling for cases where the URL format differs? WDYT?
- account=destination_configuration.host.split(".snowflakecomputing")[0],
+ account=destination_configuration.host.split(".snowflakecomputing.com")[0] if ".snowflakecomputing.com" in destination_configuration.host else destination_configuration.host,
128-128
: Consider making dataset location configurable.
The dataset location is currently hardcoded to "US". Should we make this configurable through the destination configuration? WDYT?
airbyte/destinations/_translate_dest_to_cache.py (1)
48-59
: Consider simplifying destination type extraction.
The current implementation has multiple checks for different attribute types. Could we simplify this using a helper function? WDYT?
+def _extract_destination_type(destination_type: Any) -> str:
+    """Extract string value from destination type."""
+    if hasattr(destination_type, "value"):
+        return destination_type.value
+    if hasattr(destination_type, "_value_"):
+        return destination_type._value_
+    return str(destination_type)
+
 if isinstance(destination_configuration, dict):
     try:
         destination_type = (
             destination_configuration.get("DESTINATION_TYPE")
             or destination_configuration["destinationType"]
         )
-        if hasattr(destination_type, "value"):
-            destination_type = destination_type.value
-        elif hasattr(destination_type, "_value_"):
-            destination_type = destination_type._value_
-        else:
-            destination_type = str(destination_type)
+        destination_type = _extract_destination_type(destination_type)
552-570
: Consider improving destination type mapping.
The current implementation uses a hardcoded dictionary for destination type mapping. Would it be more maintainable to move this to a configuration or constant? WDYT?
+DESTINATION_TYPE_MAPPING = {
+    "snowflake": models.DestinationSnowflake,
+    "bigquery": models.DestinationBigquery,
+    "postgres": models.DestinationPostgres,
+    "duckdb": models.DestinationDuckdb,
+}
+
 if status_ok(response.status_code) and response.destination_response:
     raw_response: dict[str, Any] = json.loads(response.raw_response.text)
     raw_configuration: dict[str, Any] = raw_response["configuration"]
     destination_type = raw_response.get("destinationType")
-    destination_mapping = {
-        "snowflake": models.DestinationSnowflake,
-        "bigquery": models.DestinationBigquery,
-        "postgres": models.DestinationPostgres,
-        "duckdb": models.DestinationDuckdb,
-    }
-    if destination_type in destination_mapping:
-        response.destination_response.configuration = destination_mapping[destination_type](
+    if destination_type in DESTINATION_TYPE_MAPPING:
+        response.destination_response.configuration = DESTINATION_TYPE_MAPPING[destination_type](
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
airbyte/_util/api_util.py
(18 hunks)airbyte/destinations/_translate_cache_to_dest.py
(1 hunks)airbyte/destinations/_translate_dest_to_cache.py
(1 hunks)
🔇 Additional comments (5)
airbyte/destinations/_translate_cache_to_dest.py (4)
36-61
: LGTM! Clean and maintainable implementation.
The mapping approach makes it easy to add new cache types, and the error message is informative.
64-71
: LGTM! Simple and correct implementation.
The function correctly maps the required fields from DuckDB cache to destination configuration.
74-82
: LGTM! Secure handling of sensitive data.
Good use of SecretString for the API key and consistent schema handling with DuckDB.
Line range hint 120-135
: LGTM! Thorough implementation with good defaults.
The function includes appropriate defaults for port and schema, with proper password validation.
airbyte/_util/api_util.py (1)
54-65
: LGTM! Secure implementation of client credentials authentication.
Good transition from API key to client credentials-based authentication.
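For context, a hedged sketch of the standard OAuth2 `client_credentials` grant that this switch implies. The field names follow RFC 6749; the token endpoint and response shape are assumptions about the standard flow, not confirmed PyAirbyte internals:

```python
def build_token_request(client_id: str, client_secret: str) -> dict[str, str]:
    """Form body for a client_credentials token request (RFC 6749 §4.4)."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }


def bearer_header(access_token: str) -> dict[str, str]:
    """Subsequent API calls attach the short-lived token, not the raw secret."""
    return {"Authorization": f"Bearer {access_token}"}


payload = build_token_request("my-client-id", "my-client-secret")
print(payload["grant_type"])                      # → client_credentials
print(bearer_header("tok-123")["Authorization"])  # → Bearer tok-123
```

The security win over a static API key is that the secret is exchanged once for a short-lived token, which is what actually travels with each request.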
Summary by CodeRabbit
Release Notes
New Features
CloudWorkspace
with new resource listing methods and improved error handling.Bug Fixes
Documentation
Tests
Chores