-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 Source Iterable: Add email validation to list_users stream method execution #8380
🎉 Source Iterable: Add email validation to list_users stream method execution #8380
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add unittests for this functionality, please.
airbyte-integrations/connectors/source-iterable/unit_tests/test_streams.py
Outdated
Show resolved
Hide resolved
does this fix mean that we now skip some records? can't we fix broken emails somehow? or use another way of retrieving this data? |
airbyte-integrations/connectors/source-iterable/source_iterable/api.py
Outdated
Show resolved
Hide resolved
Add unit test `test_events_slices_response`.
…le-connection-failing-with-error
I've found the solution on how to properly resolve the issue. My fault that I could not found it before. |
Update setup.py requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great find and unit testing. My only question is about schema changes:
is this schema change going to break existing user workflows? How to guarantee in the future we understand the schema coming from Iterable? It seems every event can have a different schema. I took the events from the account in question on the OC ticket and counted how often the fields appear in the events coming back from the API:
{'createdAt': 4, 'signupSource': 1, 'emailListIds': 2, 'itblInternal': 4, '_type': 4, 'messageTypeIds': 2, 'channelIds': 2, 'email': 4, 'profileUpdatedAt': 1, 'productRecommendationCount': 1, 'campaignId': 2, 'contentId': 1, 'messageId': 3, 'messageBusId': 1, 'templateId': 3, 'messageTypeId': 1, 'catalogCollectionCount': 1, 'catalogLookupCount': 1, 'channelId': 1, 'recipientState': 2, 'unsubSource': 1}
did you have a strategy in mind when changing the schema for how to handle this?
@@ -1,7 +1,97 @@ | |||
{ | |||
"properties": { | |||
"data": { | |||
"type": ["null", "object"] | |||
"lastName": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this schema described anywhere in the API? is it guaranteed that the schema will be the same for a customer's account?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not described anywhere, that what I've got from the records output.
Previously we had this schema:
{
"properties": {
"data": {
"type": ["null", "object"]
}
},
"type": ["null", "object"]
}
which doesn't describe any fields at all.
The output records in previous and current endpoints are almost the same. Except the eventType
in prev endpoint output became _type
in current endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So perhaps the only solution on how to be sure about the schema is to put the old schema back.
@@ -1,7 +1,97 @@ | |||
{ | |||
"properties": { | |||
"data": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this field removed? is the schema between the two endpoints different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, previously we had output records in the following format:
{"type": "RECORD", "record": {"stream": "events", "data": {"data": {...}}, "emitted_at": 123}}
.
It actually doesn't have any sense, just an odd additional wrapper.
…le-connection-failing-with-error
/test connector=connectors/source-iterable
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@htrueman resolving the schema is a blocking issue here, without describing the schema the output will not be useful for the user. My suggestion:
- for fields which we know exist for all events put them in the top level schema
- All other properties which are event dependent should go inside a dict called
data
oradditional_properties
inside the record
Update events stream unit tests. Update events stream schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @htrueman !
/publish connector=connectors/source-iterable
|
…xecution (airbytehq#8380) * Update iterable.md docs. * Update `Events` stream to use `export/userEvents` endpoint. Update `events` stream schema. Add `test_events_parse_response` unit test. * Update events parse_response data collection. Update events stream unit tests. Update events stream schema. * Bump docker version
What
Closes https://github.com/airbytehq/oncall/issues/43
The broken emails may appear in https://api.iterable.com/api/lists/getUsers response. That affects the
Events
stream read as we we use these emails to generate theEvents
stream urls so we get the 404 Error.How
Events
stream).ListUsers.parse_response()
method execution was added. So now we yield records with valid email only.Recommended reading order
api.py
test_streams.py
🚨 User Impact 🚨
None
Pre-merge Checklist
/test connector=connectors/<name>
command is passing./publish
command described here