-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Add stats and performance logging #360
Conversation
Warning Rate limit exceeded@aaronsteers has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 19 minutes and 48 seconds before requesting another review. How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughWalkthroughThe pull request introduces several enhancements across multiple files, including the addition of methods for retrieving connector and writer runtime information, a new function for generating structured stream success messages, and the introduction of data classes for managing runtime information. It also simplifies telemetry handling and enhances logging capabilities for performance metrics. These changes collectively aim to improve the tracking of connector performance, configuration integrity, and overall logging functionalities within the Airbyte application. Changes
Possibly related PRs
Suggested labels
Wdyt? Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range and nitpick comments (3)
airbyte/_util/hashing.py (1)
14-35
: LGTM! Just a couple of suggestions to improve the function's readability and maintainability. WDYT?The function looks great! It correctly handles different types of input objects, recursively hashes nested dictionaries and lists, and uses a secure hash algorithm (SHA-256) to generate one-way hashes. The use of a seed to ensure a unique domain of hashes is also a good practice.
I have a couple of suggestions to improve the function's readability and maintainability:
- Consider adding type hints for the return value and the input parameter. This will make it easier for other developers to understand the expected input and output of the function.
- Consider adding a docstring that explains the purpose of the function and the input and output parameters. This will make it easier for other developers to understand what the function does and how to use it.
Here's an example of how you could apply these suggestions:
+from typing import Any + def one_way_hash( - obj: Mapping | list | object, + obj: Mapping[Any, Any] | list[Any] | object, /, -) -> str: +) -> str: + """ + Generate a one-way hash of the given object. + + Args: + obj: The object to hash. Can be a mapping, list, or any other object. + + Returns: + A one-way hash of the given object as a string. + """ ...airbyte/logs.py (1)
213-267
: Looks good! Just a minor suggestion.The function correctly sets up a structured logger for performance metrics using the
structlog
library. It handles the case when the logging root or log file path is not valid and configures the logger with appropriate processors and a file handler. The implementation looks solid.One minor suggestion: Consider adding a docstring to document the return type and behavior of the function, especially the case when it returns a no-op logger. Wdyt?
airbyte/_connector_base.py (1)
128-142
: LGTM! Just a minor suggestion.The
config_hash
property method looks good! It correctly handles the case when_config_dict
isNone
and potential exceptions during hashing.What do you think about adding a docstring to explain the purpose and behavior of this method? It could help with maintainability and readability.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- airbyte/_connector_base.py (2 hunks)
- airbyte/_util/hashing.py (1 hunks)
- airbyte/_util/telemetry.py (4 hunks)
- airbyte/logs.py (1 hunks)
- airbyte/progress.py (5 hunks)
Additional comments not posted (11)
airbyte/logs.py (1)
195-210
: LGTM!The function correctly handles the case when
AIRBYTE_LOGGING_ROOT
isNone
and attempts to create the logging directory. It also logs a warning if the directory creation fails and returnsNone
. The implementation looks good to me.airbyte/_util/telemetry.py (7)
48-50
: LGTM!The import statements have been updated correctly to reflect the removal of the
one_way_hash
function from this file and importing it from theairbyte._util.hashing
module instead.
202-202
: Looks good!The addition of the
config_hash
attribute to theSourceTelemetryInfo
dataclass is a valuable enhancement for tracking the configuration states of sources in the telemetry data.
216-216
: LGTM!Assigning the
config_hash
from thesource
object to theSourceTelemetryInfo
instance in thefrom_source
class method is the correct way to populate the newly added attribute.
227-227
: Looks good!Adding the
config_hash
attribute to theDestinationTelemetryInfo
dataclass enables tracking the configuration states of destinations in the telemetry data, which is a valuable enhancement.
235-246
: LGTM!The updates to the
from_destination
class method ofDestinationTelemetryInfo
handle the cases when thedestination
parameter isNone
or a string by returning aDestinationTelemetryInfo
instance with default values. This ensures consistency in the telemetry data.
248-254
: Looks good!The additional condition in the
from_destination
class method ofDestinationTelemetryInfo
handles the case when thedestination
parameter is an instance ofDestination
. It correctly populates theDestinationTelemetryInfo
instance with the relevant attributes, including theconfig_hash
, ensuring comprehensive telemetry data for destinations.
260-260
: LGTM!Populating the
version
attribute of theDestinationTelemetryInfo
instance with thereported_version
of the destination's executor is the correct approach. The explicit casting ofdestination.executor
to theExecutor
type adds type safety and improves code readability.airbyte/progress.py (3)
436-453
: LGTM!The
_job_info
property method is a nice addition that provides a structured way to retrieve job information. The use ofasdict
is a good choice for converting dataclass instances to dictionaries. The function also handles the case whenself._source
,self._cache
, orself._destination
isNone
by not including their information in the returned dictionary.
Line range hint
454-517
: Excellent refactoring!The changes to the
_log_read_metrics
method significantly improve the structure and readability of the logging functionality. The use of the_job_info
property simplifies the code and provides a more organized way to include job information in the logs. The switch to using aBoundLogger
for performance metrics is also a great choice as it enhances the clarity and organization of log entries.The function correctly handles the case when no records have been read or the file logger is not available by returning early. Overall, these changes make it easier to capture and analyze job-related telemetry data.
Line range hint
28-56
: LGTM!The changes to the imports in the
_get_status_message
method are necessary to support the modifications made to the_log_read_metrics
method. The use ofasdict
and the telemetry-related imports indicate that the status message generation now includes more detailed telemetry information.These changes are consistent with the overall goal of improving the logging and telemetry capabilities of the Airbyte application. Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Outside diff range and nitpick comments (5)
airbyte/_util/connector_info.py (1)
25-30
: LGTM, but consider adding a docstring to the class.The
ConnectorRuntimeInfo
class is correctly defined as a data class that inherits fromRuntimeInfoBase
. Thename
,executor_type
,version
, andconfig_hash
attributes are properly defined with the correct types and default values.Wdyt about adding a docstring to the class to provide a brief description of its purpose? This could help improve the code's readability and maintainability.
airbyte/_writers/base.py (1)
34-39
: Looks good, but consider implementingconfig_hash
.The
_get_writer_runtime_info
method is implemented correctly and follows the expected return type. However, I noticed that theconfig_hash
property currently returnsNone
.What do you think about implementing the
config_hash
property to return a meaningful hash of the writer's configuration? This would enable better tracking and logging of writer configurations. Let me know if you need any help with that!airbyte/_message_iterators.py (1)
36-51
: LGTM! Just a minor suggestion.The
_new_stream_success_message
function looks good! It correctly constructs anAirbyteMessage
object for a stream success message using the appropriate types and models. The docstring is also clear and concise.One minor suggestion: Consider adding a type hint for the return value to make it explicit that the function returns an
AirbyteMessage
object. Wdyt?def _new_stream_success_message(stream_name: str) -> AirbyteMessage:Also, did you consider including additional details in the success message, such as the number of records processed or the processing time? That could provide more insights into the stream processing. What are your thoughts?
airbyte/_util/telemetry.py (1)
Line range hint
1-309
: Overall, the changes look great! 🚀The changes enhance the telemetry capabilities by simplifying the data structures used for tracking sources and destinations while removing redundant hashing logic. This is a nice improvement to the telemetry code.
Just a minor suggestion: What do you think about adding a brief comment explaining the purpose of the
ConnectorRuntimeInfo
andWriterRuntimeInfo
classes? This could help future readers understand their role in the telemetry code. wdyt?airbyte/logs.py (1)
218-272
: Looks good! Just a minor suggestion.The function is well-implemented and correctly sets up a structured logger for performance metrics using the
structlog
library. The logger configuration is appropriate, and it handles the scenarios where the logging root or log file path is invalid.One minor suggestion: The directory creation logic at lines 249-257 seems to be a duplicate of the logic at lines 206-213 in the
get_global_stats_log_path
function. Sinceget_global_stats_log_path
is already called before this block, the directory should already exist at this point. Wdyt about removing this duplicate logic to simplify the code a bit?
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (8)
- airbyte/_connector_base.py (3 hunks)
- airbyte/_message_iterators.py (2 hunks)
- airbyte/_util/connector_info.py (1 hunks)
- airbyte/_util/telemetry.py (6 hunks)
- airbyte/_writers/base.py (2 hunks)
- airbyte/caches/base.py (1 hunks)
- airbyte/logs.py (3 hunks)
- airbyte/progress.py (8 hunks)
Additional comments not posted (20)
airbyte/_util/connector_info.py (2)
13-16
: LGTM!The
RuntimeInfoBase
class is correctly defined and theto_dict
method is properly implemented to convert the class attributes to a dictionary, excluding the attributes withNone
values.
19-22
: LGTM!The
WriterRuntimeInfo
class is correctly defined as a data class that inherits fromRuntimeInfoBase
. Thetype
andconfig_hash
attributes are properly defined with the correct types and default values.airbyte/_util/telemetry.py (10)
40-41
: LGTM!The imports are necessary for type hinting and casting.
48-51
: LGTM!The imports are necessary for using the
ConnectorRuntimeInfo
andWriterRuntimeInfo
classes in the telemetry code.
52-52
: LGTM!The import is necessary for using the
one_way_hash
function in the telemetry code.
201-203
: LGTM!The changes to the function signature are consistent with the simplification of telemetry information for sources and destinations. This looks good to me!
224-225
: LGTM!The change is consistent with the simplification of telemetry information for sources. This looks good to me!
226-228
: LGTM!The change is consistent with the simplification of telemetry information for destinations. This looks good to me!
229-230
: LGTM!The change is consistent with the addition of cache information to the telemetry payload. This looks good to me!
267-268
: LGTM!The change is consistent with the simplification of telemetry information for sources. This looks good to me!
287-288
: LGTM!The change is consistent with the simplification of telemetry information for sources. This looks good to me!
303-303
: LGTM!The change is consistent with the simplification of telemetry information for sources. This looks good to me!
airbyte/logs.py (1)
200-215
: LGTM!The function logic is correct, and the implementation is accurate. It handles the directory creation failure scenario appropriately by logging a warning and returning
None
.airbyte/_connector_base.py (2)
81-88
: LGTM!The
_get_connector_runtime_info
method looks good. It correctly constructs and returns aConnectorRuntimeInfo
object with the relevant metadata. Using theconfig_hash
property to get the hash of the current configuration is a nice touch for tracking configuration integrity.
138-152
: Looks good!The
config_hash
property method is implemented correctly. It computes the hash of the current configuration using theone_way_hash
function and handles edge cases appropriately by returningNone
when the config is not set or when an exception occurs during hashing. This ensures the method doesn't fail unexpectedly. Nice work!airbyte/progress.py (5)
385-408
: LGTM!The
_send_telemetry
method looks good. It consolidates the telemetry sending process and is used appropriately in the other methods. Accessing the non-public APIs to get the runtime info for participants is acceptable for an internal method.
433-448
: Looks good!The
_job_info
property is a nice addition. It constructs a structured dictionary containing job-related information, which is used to improve the organization of logged performance metrics in the_log_read_metrics
method. Accessing the non-public APIs to get the runtime info for participants is acceptable for an internal property.
Line range hint
450-512
: Looks great!The updates to the
_log_read_metrics
method look good. It now leverages the new_job_info
property to improve the organization of logged performance metrics, replacing the previous approach of separately logging each component of the job description. The method logs the performance metrics to both the file logger and the global stats logger, which is a nice touch.
Line range hint
526-542
: LGTM!The updates to the
log_success
method look good. It now leverages the new_send_telemetry
method to send telemetry data with theEventState.SUCCEEDED
state and the total number of records read.
Line range hint
544-559
: Looks good!The updates to the
log_failure
method look good. It now leverages the new_send_telemetry
method to send telemetry data with theEventState.FAILED
state, the total number of records read, and the exception that caused the failure.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation