-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TDL-19319 Make replication key as automatic inclusion #195
TDL-19319 Make replication key as automatic inclusion #195
Conversation
* Added versionTimestamp in contacts stream. * Updated bookmark test case. * Updated comments. * Updated replication method of contacts stream.
* Initial commit for request_timeout * Updated unit test case error * Removed path of unittest file * Updated config file * Updated readme file
tap_hubspot/__init__.py
Outdated
@@ -599,14 +599,23 @@ def sync_companies(STATE, ctx): | |||
def has_selected_custom_field(mdata): | |||
top_level_custom_props = [x for x in mdata if len(x) == 2 and 'property_' in x[1]] | |||
for prop in top_level_custom_props: | |||
if mdata.get(prop, {}).get('selected') == True: | |||
if (mdata.get(prop, {}).get('selected') == True) or (mdata.get(prop, {}).get('inclusion') == "automatic"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return True
if the custom field is automatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prijendev can you add this statement as the Code comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment in the code.
tap_hubspot/__init__.py
Outdated
@@ -599,14 +599,23 @@ def sync_companies(STATE, ctx): | |||
def has_selected_custom_field(mdata): | |||
top_level_custom_props = [x for x in mdata if len(x) == 2 and 'property_' in x[1]] | |||
for prop in top_level_custom_props: | |||
if mdata.get(prop, {}).get('selected') == True: | |||
if (mdata.get(prop, {}).get('selected') == True) or (mdata.get(prop, {}).get('inclusion') == "automatic"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prijendev can you add this statement as the Code comment
@@ -105,7 +105,7 @@ def expected_metadata(self): # DOCS_BUG https://stitchdata.atlassian.net/browse | |||
"deals": { | |||
self.PRIMARY_KEYS: {"dealId"}, | |||
self.REPLICATION_METHOD: self.INCREMENTAL, | |||
self.REPLICATION_KEYS: {"hs_lastmodifieddate"}, | |||
self.REPLICATION_KEYS: {"property_hs_lastmodifieddate"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this include the property_
prefix from the denesting method? None of the other replicaiton keys have this. And I think it will break every customer's connection that is replicating this stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, hs_lastmodifieddate
is the bookmark key for the deals
stream. hs_lastmodifieddate
is available in the properties field at the nested level.
For example, the response structure:
{
"dealId": 1,
"properties": {
"hs_lastmodifieddate": 1653590080602
}
}
As hs_lastmodifieddate
is not available at the 1st level, it can not be marked as automatic inclusion and we can not select that particular field even. Also, tap includes all nested fields of the properties field as custom fields in the schema by appending the prefix property_
along with each field.
For example, the schema of the deals stream looks like as below,
{
"dealId": {"type": ["null", "integer"]},
"properties": {"type": "object", "properties": {
"hs_lastmodifieddate": {},
"hs_likelihood_to_close": {}
}
},
"property_hs_lastmodifieddate": {},
"property_hs_likelihood_to_close": {}
}
That's why we are just marking property_hs_lastmodifieddate
as the inclusion of automatic. So, I am not sure in which case it will break the customer's connection because we have not removed or updated any fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this nested structure is true of the other streams as well, but companies
for example denests the key and does not prefix property_
onto the key. I think it will break downstream for customers because the value of the replication key is changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kspeer825 We have changed the bookmark key just for the deals
stream. This change will not affect any other streams.
If bookmark(last_modifieddate
) is already available for the deals stream, then tap will use only that value for the first time. (#196 )
After completing sync, the tap will write an updated bookmark in the new bookmark key(i.e. property_lastmodified_date
).
We haven't removed the existing bookmark as well, so in case we need to revert back, we can able to.
After sync, the bookmark will look like the below.
"bookmarks": {
"stream_id_1": {
"old_bookmark": "2022-04-08T00:00:00",
"current_bookmark": "2022-04-15T00:00:00"
}
}
We have incorporated this change to fix the customer's issue. So, @KrisPersonal shall we plan an alpha release first for the particular customer?
@@ -69,7 +69,7 @@ def test_run(self): | |||
expected_keys = self.expected_automatic_fields().get(stream) | |||
|
|||
# BUG_TDL-9939 https://jira.talendforge.org/browse/TDL-9939 Replication keys are not included as an automatic field for these streams | |||
if stream in {'companies', 'deals', 'subscription_changes', 'email_events'}: | |||
if stream in {'companies', 'subscription_changes', 'email_events'}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this change for the other streams with replication keys that are missing automatic inclusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kspeer825 Yes, the replication key is not automatic for companies, subscription_changes, and email_events streams.
- The
companies
stream has the same issue(link) of the replication key as thedeals
stream. So, can we create a card to resolve this issue and work on that? subscription_changes
andemail_events
streams are retrieving records in the date window. These streams are saving thestart date of the last window
as a bookmark value instartTimestamp
field.
email_events
stream has acreated
field in the API response andsubscription_changes
has atimestamp
field in the response that we may use as a replication key for the respective stream. But, we are not sure whether filter parameters(Start timestamp
,End timestamp
) are following these keys or not as we can't find any doc stating the same, so we asked a question for the same in the community(link).
Possible solutions:- We can add a new
startTimestamp
field in the record explicitly and mark that field as an automatic to keep current behavior as is. - Update the bookmark key for
email_events
andsubscription_changes
ascreated
andtimestamp
respectively if we got a positive reply from the support.
- We can add a new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Sounds good
- Are you able to verify the usage of these keys by exploratory testing? If you have a record before the StartTimestamp, another record after the EndTimestamp, and another record between those two I think you'd answer that question. I supposed you'd actually need two more records at either of those fence posts to see if the API treats either inclusively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per discussion with @cosimon , we concluded to make a change in the replication key just for the deals
stream and company
stream as this problem is for 2 streams only. We are not making any changes in subscription_changes
and email_events
as of now.
https://github.com/singer-io/tap-hubspot into TDL-19319-make-replication-key-as-automatic-inclusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
* Clearing bookmark for existing connection. * Updated bookmark logic to utilize older bookmark key. * Updated logic to keep older bookmark key in the state. * Added unittest cases. * Updated unit test cases. * Resolved review comments.
* fixed pylint issues TDL-15447 * changing methods to functions since there is no-self-use * pylint issues * more pylint issues * added pylint disable comment for some scenarios * remove duplicate-code from config.yaml * pylint comment for test_bookmarks * remove duplicate code * removed wrong-import-order
* Qa/save test artifacts (singer-io#201) * add logging and save test logs as artifacts * name should be static method so it plays well with BaseCase Co-authored-by: kspeer <[email protected]> * skip fields missing from schema for contact_lists (singer-io#202) Co-authored-by: kspeer <[email protected]> * add limitExempt to all fields test workaround (singer-io#203) Co-authored-by: kspeer <[email protected]> * TDL-19319 Make replication key as automatic inclusion (singer-io#195) * TDL 19241 add version timestamp in contacts (singer-io#191) * Added versionTimestamp in contacts stream. * Updated bookmark test case. * Updated comments. * Updated replication method of contacts stream. * TDL 16686 Implement request timeout (singer-io#177) * Initial commit for request_timeout * Updated unit test case error * Removed path of unittest file * Updated config file * Updated readme file * Initial commit. * Updated interruptes sync offset test case * Resolved comments. * Replaced replication key of deals stream with property_hs_lastmodifieddate field. * Updated discover test case. * Updated ingterrupted_sync test case. * Utilize bookmark of existing connection (singer-io#196) * Clearing bookmark for existing connection. * Updated bookmark logic to utilize older bookmark key. * Updated logic to keep older bookmark key in the state. * Added unittest cases. * Updated unit test cases. * Resolved review comments. * Updated get_start to use old bookmark for companies stream method. * Tdl 15447 fix pylint issues (singer-io#204) * fixed pylint issues TDL-15447 * changing methods to functions since there is no-self-use * pylint issues * more pylint issues * added pylint disable comment for some scenarios * remove duplicate-code from config.yaml * pylint comment for test_bookmarks * remove duplicate code * removed wrong-import-order Co-authored-by: kethan1122 <[email protected]> --------- Co-authored-by: Kyle Speer <[email protected]> Co-authored-by: kspeer <[email protected]> Co-authored-by: Prijen Khokhani <[email protected]> Co-authored-by: kethan1122 <[email protected]>
Description of change
property_hs_lastmodifieddate
field.deals
stream in the tap toINCREMENTAL
.base.py
of tap-tester.automatic_field
andstart_date
tap tester test case.Manual QA steps
Risks
Rollback steps