Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cli): improve docker quickstart #7184

Merged
merged 18 commits into from
Feb 3, 2023

Conversation

hsheth2
Copy link
Collaborator

@hsheth2 hsheth2 commented Jan 31, 2023

This is mainly an improvement to telemetry, but contains a few other things too.

  • Reduces the number of containers in the "required" list, and moved most to the "check if present list"
  • Adds granular exception types for better reporting e.g. DockerNotRunningError, DockerComposeVersionError, DockerLowMemoryError
  • Include system arch in telemetry
  • Include certain quickstart args in telemetry
  • Include per-container statuses in quickstart failures
  • Refactors much of the quickstart code
  • Maintains the same error messages and CLI formatting as before

When quickstart fails, telemetry pushes a payload that looks like this:

{'arch': 'x86_64',
 'arg_arch': None,
 'arg_backup': False,
 'arg_build_locally': False,
 'arg_kafka_setup': False,
 'arg_pull_images': False,
 'arg_restore': False,
 'arg_restore_indices': False,
 'arg_standalone_consumers': False,
 'arg_stop': False,
 'arg_version': None,
 'container_broker': 'UNHEALTHY',
 'container_datahub-frontend-react': 'OK',
 'container_datahub-gms': 'OK',
 'container_elasticsearch': 'OK',
 'container_elasticsearch-setup': 'OK',
 'container_mysql': 'OK',
 'container_mysql-setup': 'OK',
 'container_schema-registry': 'OK',
 'container_zookeeper': 'OK',
 'datahub_version': 'unavailable (installed in develop mode)',
 'error': 'datahub.cli.docker_check.QuickstartError',
 'function': 'datahub.cli.docker_cli.quickstart',
 'os': 'Darwin',
 'python_version': '3.9.4',
 'server_id': 'n/a',
 'server_type': 'n/a',
 'server_version': 'n/a',
 'status': 'error'}

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Jan 31, 2023
"mysql",
"mysql-setup",
"cassandra",
"cassandra-setup",
"neo4j",
"elasticsearch-setup",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question -- Why does this need to appear in both ENSURE_EXIT_SUCCESS and CONTAINERS_TO_CHECK_IF_PRESENT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also does kafka-setup belong in here as well? (If needed in both places)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes adding kafka-setup

except docker.errors.DockerException as error:
yield None, error
raise DockerNotRunningError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice!

try:
client.ping()
except docker.errors.DockerException as error:
raise DockerNotRunningError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. This is much better.

if error:
issues.append("Docker doesn't seem to be running. Did you start it?")
return issues
def run_quickstart_preflight_checks(client: docker.DockerClient) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like the name of this method.


if preflight_only:
return issues
class ContainerStatus(enum.Enum):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor code style --

Organization in this file confused me a bit... Is there a convention around where to put classes vs methods?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have any established convention for it right now

elif "Health" in container.attrs["State"]:
if container.attrs["State"]["Health"]["Status"] == "starting":
issues.append(f"{container.name} is still starting")
status = ContainerStatus.STARTING
elif container.attrs["State"]["Health"]["Status"] != "healthy":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: are there any other states we should be aware of? since this will count any state aside from starting and healthy as "unhealthy"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i haven't seen any other ones

@@ -245,20 +238,16 @@ def _docker_compose_v2() -> Union[List[str], Literal[False]]:
# instead of as a plugin.
return ["docker-compose"]

click.secho(
raise DockerComposeVersionError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - so do these exceptions still appear as beautifully to the end user?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - that's what the changes in entrypoints.py are for

@@ -273,8 +262,6 @@ def _attempt_stop(quickstart_compose_file: List[pathlib.Path]) -> None:
if compose_files_for_stopping:
# docker-compose stop
compose = _docker_compose_v2()
if not compose:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that with exceptions we can clean this up

issues = check_local_docker_containers()
if not issues:
status = check_docker_quickstart()
if status.is_ok():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Niceeeeee

@@ -739,7 +736,7 @@ def quickstart(
check=True,
env=_docker_subprocess_env(),
)
click.secho("Finished building docker images!")
logger.info("Finished building docker images!")

# Start it up! (with retries)
max_wait_time = datetime.timedelta(minutes=6)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we should bump this wait interval? Or make it configurable. I feel like sometimes maybe we just need to give it more time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure i can bump it to 8

after some point it's just not going to start and any additional time is just adding to user frustration for no reason

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true true. I'm think in CI it might account for some of what we're seeing

# Unless the exception is a ValidationError or explicitly opts out of stack traces,
# we should show the stack trace.

if isinstance(exc, ValidationError) or isinstance(exc.__cause__, ValidationError):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice. So any exceptions that we know root cause for does not result in a full exception trace. This makes sense - and in fact is closely aligned with how we should think about source exception handling

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup exactly. we're a bit away from enabling this for other exception types, but adding the scaffolding here will help us in the future

@runtime_checkable
class ExceptionWithProps(Protocol):
def get_telemetry_props(self) -> Dict[str, Any]:
...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean... pass?

Copy link
Collaborator Author

@hsheth2 hsheth2 Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it's a python thing

... is convention over pass for typing-related stuff

# Don't print the full stack trace for simple config errors.
logger.error(exc)
logger.debug("Error: %s", exc, exc_info=exc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice -- good we can still get this


properties = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean

Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of changes. Especially nerve-racking is the wrapping of with_telemetry in that outer function (since it affects everywhere) Have you been able to verify that telemetry is coming out as expected?

Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great. I think this will vastly improve our ability to provide exceptional support to the Community!

@anshbansal anshbansal merged commit d2bcdc3 into datahub-project:master Feb 3, 2023
ericyomi pushed a commit to ericyomi/datahub that referenced this pull request Feb 8, 2023
oleg-ruban pushed a commit to RChygir/datahub that referenced this pull request Feb 28, 2023
@hsheth2 hsheth2 deleted the docker-quickstart branch February 14, 2024 05:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants