Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation for Google Cloud Data Loss Prevention #9651

Merged
merged 1 commit into from
Aug 27, 2020

Conversation

OmairK
Copy link
Contributor

@OmairK OmairK commented Jul 4, 2020


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added area:docs provider:google Google (including GCP) related issues labels Jul 4, 2020
@OmairK OmairK changed the title Documentation for Google Cloud Data Loss Prevention (#8201) Documentation for Google Cloud Data Loss Prevention Jul 4, 2020
@potiuk
Copy link
Member

potiuk commented Jul 4, 2020

Still some checks to fix :(

@OmairK
Copy link
Contributor Author

OmairK commented Jul 4, 2020

Still some checks to fix :(

I fixed the static checks, one of the integration check is breaking.

Comment on lines 32 to 109
Templates
^^^^^^^^^

Templates can be used to create and persist
configuration information to use with the Cloud Data Loss Prevention.
There are two types of templates supported by Cloud DLP:

* `Inspection Template <https://cloud.google.com/dlp/docs/creating-templates-inspect>`__,
* `De-Identification Template <https://cloud.google.com/dlp/docs/creating-templates-deid>`__,

Here we will be using identification template for our example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information can be outdate in some future. So I would suggest to skip this section and let the user use the Google docs

@turbaszek
Copy link
Member

I have the impression that there are more DLP operators but they are not covered in the example DAG. I think we should add the documentation. I'm ok with not extending the DAG itself.

@OmairK
Copy link
Contributor Author

OmairK commented Jul 6, 2020

I have the impression that there are more DLP operators but they are not covered in the example DAG. I think we should add the documentation. I'm ok with not extending the DAG itself.

I have added documentation for all the DLP operators. Should I add other DAGs to extend the example?

@mik-laj
Copy link
Member

mik-laj commented Jul 6, 2020

@OmairK Example tasks for the rest operators will be helpful.

@potiuk
Copy link
Member

potiuk commented Jul 23, 2020

Hey @OmairK - I re-run the latest commit run and there are few errors but not a mypy one :).

  1. You will need to rebase to latest master, there were some changes in Dockerfile/requirements recently that will make your CI checks run much faster when you rebase.

  2. You have some trailing whitespace - that's why the static checks fail. But if you install pre-commit locally and run it with --all-files those should get fixed automatically.

  3. There are some doc errors - all of them have good description, except one that is "document missing in a TOC tree" - you need to add a link to the dlp index in the relevant place in one of the doc/*.rst files.

  4. In the "Backport packages" you have an import error from the example dag of dlp - and the reason is that task_id parameter is missing in the example_dlp.

I hope it's helpful.

J.

@OmairK OmairK force-pushed the master branch 3 times, most recently from b3b737d to 8c2fde5 Compare July 23, 2020 22:51
@OmairK
Copy link
Contributor Author

OmairK commented Jul 24, 2020

Hey @OmairK - I re-run the latest commit run and there are few errors but not a mypy one :).

Thanks alot @potiuk, but this wasn't the PR with mypy errors. I have mentioned you there, sorry for the trouble.
I have extended the example dag and fixed the issue with the docs (I forgot to build docs after the rebase). Is there anything I could do to make the documentation better?

Comment on lines 116 to 112
create_info_type = CloudDLPCreateStoredInfoTypeOperator(
project_id=GCP_PROJECT,
config=custom_info_types,
stored_info_type_id=custom_info_type_id,
dag=dag,
task_id="create_info_type",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This task fails with:

  File "/airflow/airflow/providers/google/cloud/hooks/dlp.py", line 432, in create_stored_info_type
    metadata=metadata,
  File ".virtualenvs/airflow/lib/python3.6/site-packages/google/cloud/dlp_v2/gapic/dlp_service_client.py", line 2625, in create_stored_info_type
    parent=parent, config=config, stored_info_type_id=stored_info_type_id
TypeError: Parameter to MergeFrom() must be instance of same class: expected google.privacy.dlp.v2.StoredInfoTypeConfig got list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to help with solving this but probably it would be best if the same person who develops the DAG run the DAG to verify if the example works. @mik-laj @kaxil

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

Comment on lines +162 to +152
create_trigger = CloudDLPCreateJobTriggerOperator(
project_id=GCP_PROJECT,
job_trigger=JOB_TRIGGER,
trigger_id=TRIGGER_ID,
dag=dag,
task_id="create_trigger",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This task fails with this error:

  File "/airflow/airflow/providers/google/cloud/operators/dlp.py", line 433, in execute
    metadata=self.metadata,
  File "/airflow/airflow/providers/google/common/hooks/base_google.py", line 376, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/airflow/airflow/providers/google/cloud/hooks/dlp.py", line 375, in create_job_trigger
    metadata=metadata,
  File "/.virtualenvs/airflow/lib/python3.6/site-packages/google/cloud/dlp_v2/gapic/dlp_service_client.py", line 2557, in create_job_trigger
    request, retry=retry, timeout=timeout, metadata=metadata
  File "/.virtualenvs/airflow/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
    return wrapped_func(*args, **kwargs)
  File "/.virtualenvs/airflow/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 `StorageConfig` must be set.

@turbaszek
Copy link
Member

@OmairK would you mind rebasing? I'm ok with merging this PR with not working system tests. We need them to debug the DLP operators that are reported to be not working properly.

@OmairK OmairK force-pushed the master branch 2 times, most recently from 627998f to 79627d9 Compare August 24, 2020 11:52
@mik-laj
Copy link
Member

mik-laj commented Aug 25, 2020

@OmairK Can you do a rebase and leave a comment so we get notifications? Without it, we don't know that you've found the time to respond to our request.

@OmairK
Copy link
Contributor Author

OmairK commented Aug 25, 2020

@OmairK Can you do a rebase and leave a comment so we get notifications? Without it, we don't know that you've found the time to respond to our request.

@mik-laj I did the rebase. I informed @turbaszek via slack, I thought against sending two notifications for the same thing.

@turbaszek
Copy link
Member

@OmairK will you be able to fix the statics checks?

@OmairK
Copy link
Contributor Author

OmairK commented Aug 26, 2020

@OmairK will you be able to fix the statics checks?

@turbaszek Done 😅

@potiuk potiuk merged commit 91ff31a into apache:master Aug 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants