Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy up synchronized methods in OCF framework #8391

Merged
merged 2 commits into from
Sep 20, 2024

Conversation

mandy-chessell
Copy link
Contributor

Background

The OCF framework has 3 standard methods - initialize(), start() and disconnect(). These methods are invoked in that order to define the active lifetime of a connector. The OIF extends this with the refresh() method that is called multiple times between start() and disconnect().

Many of the OMIS and OMES services that extend the OIF and GAF respectively support the ability for the connector to register a listener to receive particular types of events. These events are processed in different threads and may coincide with the invocation of start(), refresh() or disconnect().

The OCF also has a boolean flag isActive that was true between the start() and disconnect() calls; false otherwise. This is to allow a connector to check it is still active - particularly when processing events on different threads.

The OIF and GAF pass a context to their connectors between initialize() and start(). Initially, the OIF and GAF were clearing the context after disconnect(). As a result, the synchronized keyword was added to the start(), refresh() and disconnect() methods of the frameworks - and then implemented inconsistently throughout the connector implementations. This was to "protect" the setting of the context object. The frameworks no longer clear the context object on disconnect().

There are also synchronized methods in the OIF's RequestedCatalogTargetsManager. Recently, we have seen a deadlock in the docker deployment of the OpenLineageEventReceiverConnector. No deadlock occurs when running the same scenario outside of docker so there is a subtle timing difference.

This use of synchronized is too course-grained for many connectors, and unnecessary. There is no proof at this point that the deadlock is caused by the use synchronized on the framework methods, but it seems reasonable to clean up the frameworks, to eliminate this legacy from the investigation.

Description of the changes

This PR removes the synchronized key from the start(), refresh() and disconnect(). It create a new method to wrap the setting of isActive flag and this, along with the existing isActive() method are synchronized. Similarly, synchronized is removed from the RequestedCatalogTargetsManager methods and an inner class is now supporting the map used by RequestedCatalogTargetsManager to maintain the list of active catalog targets. The inner class's methods are synchronized.

The result of the changes is that thread locking is only active for 1-2 lines of code on order to protect the use of key variables in multi-threaded operation.

Individual connectors may still add synchronized to their methods. In fact the KafkaOpenMetadataTopicConnector used by OpenLineageEventReceiverConnector makes extensive use of the synchronized keyword. This will be the next point of investigation if this PR does not resolve the deadlock.

Related Issue(s)

The Google Tink methods used in the encrypted config document store have been deprecated for some time. This connector was written (and used as the default) at a time when certificates and passwords were a part of the configuration document. With the introduction of the secrets connector, these are now stored in protected areas outside of the configuration document so that an encrypted store is not necessary. This PR switches the default config store to the clear-text file implementation. In a later release, the encrypted file version of the connector will probably be removed.

Testing

Since the problem only occurs in docker, the only regression testing has occured on the main code at this point. We need to get the change into dockerhub to determine if the fix has resolved the deadlock.

Release Notes & Documentation

The change of the default config file store is being added to the release notes in a separate PR on egeria-docs.

Additional notes

None

@mandy-chessell mandy-chessell merged commit 85da3dc into odpi:main Sep 20, 2024
4 checks passed
@@ -32,15 +28,14 @@
*
* @param governanceContext specialist context for this type of governance action.
*/
public synchronized void setGovernanceContext(GovernanceActionContext governanceContext)
public void setGovernanceContext(GovernanceActionContext governanceContext)

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
GovernanceActionServiceConnector.setGovernanceContext
; it is advisable to add an Override annotation.
@@ -25,7 +25,7 @@
*
* @param governanceContext specialist context for this type of governance action.
*/
public synchronized void setGovernanceContext(GovernanceActionContext governanceContext)
public void setGovernanceContext(GovernanceActionContext governanceContext)

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
GovernanceActionServiceConnector.setGovernanceContext
; it is advisable to add an Override annotation.
@@ -33,15 +31,14 @@
*
* @param governanceContext specialist context for this type of governance action.
*/
public synchronized void setGovernanceContext(GovernanceActionContext governanceContext)
public void setGovernanceContext(GovernanceActionContext governanceContext)

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
GovernanceActionServiceConnector.setGovernanceContext
; it is advisable to add an Override annotation.
@@ -28,15 +26,14 @@
*
* @param governanceContext specialist context for this type of governance action.
*/
public synchronized void setGovernanceContext(GovernanceActionContext governanceContext)
public void setGovernanceContext(GovernanceActionContext governanceContext)

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
GovernanceActionServiceConnector.setGovernanceContext
; it is advisable to add an Override annotation.
@@ -36,7 +34,7 @@
*
* @param governanceContext specialist context for this type of governance action.
*/
public synchronized void setGovernanceContext(GovernanceActionContext governanceContext)
public void setGovernanceContext(GovernanceActionContext governanceContext)

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
GovernanceActionServiceConnector.setGovernanceContext
; it is advisable to add an Override annotation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant