Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5065] Add colors to console log #5681

Merged
merged 1 commit into from
Jul 30, 2019

Conversation

turbaszek
Copy link
Member

@turbaszek turbaszek commented Jul 29, 2019

This commit adds custom formatter for Airflow console log.

Make sure you have checked all steps below.

Jira

Description

  • This PR adds custom formatter for Airflow console log:

61932490-d58ee880-af83-11e9-9232-cb383f920ca4

Tests

  • My PR adds the following unit tests

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@mik-laj
Copy link
Member

mik-laj commented Jul 29, 2019

How does it look in Web UI? Very useful feature. This has long been missing for me. Now logs are very unreadable.

@turbaszek
Copy link
Member Author

In UI you can spot some ASCII codes 😕

*** Reading local file: /airflow/logs/example_gcp_dataproc_create_cluster/create_cluster/2019-07-29T14:14:17.519593+00:00/1.log
[2019-07-29 14:16:35,052] {taskinstance.py:614} INFO - Dependencies all met for <TaskInstance: example_gcp_dataproc_create_cluster.create_cluster 2019-07-29T14:14:17.519593+00:00 [queued]>
[2019-07-29 14:16:35,061] {taskinstance.py:614} INFO - Dependencies all met for <TaskInstance: example_gcp_dataproc_create_cluster.create_cluster 2019-07-29T14:14:17.519593+00:00 [queued]>
[2019-07-29 14:16:35,061] {taskinstance.py:832} INFO - 
--------------------------------------------------------------------------------
[2019-07-29 14:16:35,061] {taskinstance.py:833} INFO - Starting attempt 1 of 1
[2019-07-29 14:16:35,061] {taskinstance.py:834} INFO - 
--------------------------------------------------------------------------------
[2019-07-29 14:16:35,079] {taskinstance.py:853} INFO - Executing <Task(DataprocClusterCreateOperator): create_cluster> on 2019-07-29T14:14:17.519593+00:00
[2019-07-29 14:16:35,079] {base_task_runner.py:127} INFO - Running: ['airflow', 'tasks', 'run', 'example_gcp_dataproc_create_cluster', 'create_cluster', '2019-07-29T14:14:17.519593+00:00', '--job_id', '4', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/example_gcp_dataproc_create_cluster.py', '--cfg_path', '/tmp/tmpjta0f2_e']
[2019-07-29 14:16:36,836] {base_task_runner.py:113} INFO - Job 4: Subtask create_cluster /workspace/airflow/models/dagbag.py:21: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
[2019-07-29 14:16:36,836] {base_task_runner.py:113} INFO - Job 4: Subtask create_cluster   import imp
[2019-07-29 14:16:37,008] {base_task_runner.py:113} INFO - Job 4: Subtask create_cluster [�[34m2019-07-29 14:16:37,008�[0m] {�[34msettings.py�[0m:173} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=78770
[2019-07-29 14:16:37,129] {base_task_runner.py:113} INFO - Job 4: Subtask create_cluster [�[34m2019-07-29 14:16:37,128�[0m] {�[34m__init__.py�[0m:51} INFO - Using executor �[1mLocalExecutor�[0m
[2019-07-29 14:16:37,743] {base_task_runner.py:113} INFO - Job 4: Subtask create_cluster [�[34m2019-07-29 14:16:37,741�[0m] {�[34mdagbag.py�[0m:86} INFO - Filling up the DagBag from �[1m/airflow/dags/example_gcp_dataproc_create_cluster.py�[0m
[2019-07-29 14:16:37,947] {base_task_runner.py:113} INFO - Job 4: Subtask create_cluster [�[34m2019-07-29 14:16:37,946�[0m] {�[34mcli.py�[0m:536} INFO - Running �[1m<TaskInstance: example_gcp_dataproc_create_cluster.create_cluster 2019-07-29T14:14:17.519593+00:00 [running]>�[0m on host �[1md40eb3bb69b0�[0m
[2019-07-29 14:16:37,966] {dataproc_operator.py:433} INFO - Creating cluster: cluster-tomasz36-build
[2019-07-29 14:16:37,967] {logging_mixin.py:98} INFO - [�[34m2019-07-29 14:16:37,967�[0m] {�[34mgcp_api_base_hook.py�[0m:98} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
[2019-07-29 14:16:40,973] {logging_mixin.py:98} INFO - [�[34m2019-07-29 14:16:40,973�[0m] {�[34m_metadata.py�[0m:86} INFO - Compute Engine Metadata server unavailable onattempt 1 of 3
[2019-07-29 14:16:43,943] {logging_mixin.py:98} INFO - [�[34m2019-07-29 14:16:43,943�[0m] {�[34m_metadata.py�[0m:86} INFO - Compute Engine Metadata server unavailable onattempt 2 of 3
[2019-07-29 14:16:43,944] {logging_mixin.py:98} INFO - [�[34m2019-07-29 14:16:43,944�[0m] {�[34m_metadata.py�[0m:86} INFO - Compute Engine Metadata server unavailable onattempt 3 of 3
[2019-07-29 14:16:43,945] {taskinstance.py:1045} ERROR - Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Traceback (most recent call last):
  File "/workspace/airflow/models/taskinstance.py", line 920, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/workspace/airflow/contrib/operators/dataproc_operator.py", line 66, in execute
    self.hook.wait(self.start())
  File "/workspace/airflow/contrib/operators/dataproc_operator.py", line 437, in start
    self.hook.get_conn().projects().regions().clusters().create(  # pylint: disable=no-member
  File "/workspace/airflow/contrib/hooks/gcp_dataproc_hook.py", line 442, in get_conn
    http_authorized = self._authorize()
  File "/workspace/airflow/contrib/hooks/gcp_api_base_hook.py", line 145, in _authorize
    credentials = self._get_credentials()
  File "/workspace/airflow/contrib/hooks/gcp_api_base_hook.py", line 100, in _get_credentials
    credentials, _ = google.auth.default(scopes=scopes)
  File "/root/.virtualenvs/airflow36/lib/python3.6/site-packages/google/auth/_default.py", line 317, in default
    raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)

potiuk
potiuk previously approved these changes Jul 29, 2019
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it a lot - having coloured error logs in console might be really helpful in detecting errors. I just wonder if it can have some @ashb @BasPH @feluelle @kaxil -> what do you think - is there any reason console log should not be coloured for compatibility ? I think any automation of the logs uses Files not reading from stdout so we should be rather safe ?

@potiuk
Copy link
Member

potiuk commented Jul 29, 2019

In UI you can spot some ASCII codes 😕

We can check if stdout has terminal and only use colours if it has @nuclearpinguin :

https://stackoverflow.com/questions/1077113/how-do-i-detect-whether-sys-stdout-is-attached-to-terminal-or-not

@potiuk potiuk dismissed their stale review July 29, 2019 14:28

Ascii codes.

@BasPH
Copy link
Contributor

BasPH commented Jul 29, 2019

I like it a lot - having coloured error logs in console might be really helpful in detecting errors. I just wonder if it can have some @ashb @BasPH @feluelle @kaxil -> what do you think - is there any reason console log should not be coloured for compatibility ? I think any automation of the logs uses Files not reading from stdout so we should be rather safe ?

@potiuk I imagine the logs to be read by humans and thus coloured logs are a big plus.

Copy link
Contributor

@BasPH BasPH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like the coloured logs! I've made some line-by-line comments.

One more general question I have is if you investigated out-of-the-box log colouring packages, instead of implementing & maintaining our own logic?

airflow/config_templates/airflow_local_settings.py Outdated Show resolved Hide resolved
airflow/config_templates/airflow_local_settings.py Outdated Show resolved Hide resolved
airflow/utils/log/colored_log.py Outdated Show resolved Hide resolved
airflow/utils/log/colored_log.py Outdated Show resolved Hide resolved
airflow/utils/log/colored_log.py Outdated Show resolved Hide resolved
@turbaszek
Copy link
Member Author

Thanks, I like the coloured logs! I've made some line-by-line comments.

One more general question I have is if you investigated out-of-the-box log colouring packages, instead of implementing & maintaining our own logic?

I think the logic behind adding colors (color_symbol + txt + reset_symbol) is so simple we can have our own. Additionally it's one dependency less.

Copy link
Member

@feluelle feluelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- Like
+ it
! a
# lot

@ashb
Copy link
Member

ashb commented Jul 29, 2019

I'll look at this tomorrow. Like the idea, but I wonder if we can do something like git does (where you have %Cgeeen or similar to give control to users and if there is a module that already does this.

@potiuk
Copy link
Member

potiuk commented Jul 29, 2019

I made a quick check - and I think indeed we could use for example this: https://github.com/borntyping/python-colorlog instead of our custom Formatter. It will then be more configurable for the user. Possibly we should consider using it instead. I wonder if this has any disadvantages @nuclearpinguin ?

@mik-laj
Copy link
Member

mik-laj commented Jul 29, 2019

@potiuk Does this library write arguments using a different text style? I think not. One request from my side was that the arguments should be written using a different text style.

@turbaszek
Copy link
Member Author

Some of my thoughts:

  • allowing users to add colours - I think it's not the best idea because we can end up with totally inconsistent usage like green errors and red info.
  • I think the colouring should be as simple as it can be. We can even implement our own logic instead of using termcolors.

@potiuk
Copy link
Member

potiuk commented Jul 29, 2019

@mik-laj @nuclearpinguin -> I have no idea if it supports it, that's a good question. But it would be worthwhile to check if this (or other library) already provides similar or the same features.
I think it's good to make a deliberate decision on whether we cannot use some existing library.

I think it's a good question just raised - whether customisable colours in logging (configured via string as Ash mentioned) is more important that our own "opinionated" way of colouring.

And maybe we do not have to choose - if one of the ready-to-use libraries already provides enough flexibility. Just a search away and I also found another library for that, which seems super-customisable. https://coloredlogs.readthedocs.io/en/latest/api.html#examples-of-customization.

I am not saying one approach is better than the other. There is a value in both approaches, but it would be great to know that we considered alternatives (especially that they are easily available), and why we chose this one.

@ashb
Copy link
Member

ashb commented Jul 30, 2019

coloredlogs doesn't look like it gives the flexability we want (the different colours for each of the parts is really nice) but https://github.com/borntyping/python-colorlog (pip install colorlog) is close:

from colorlog import ColoredFormatter
import logging

logging.basicConfig(level=logging.INFO)

log = logging.getLogger(__name__)

log_format = '[%(blue)s%(asctime)s%(reset)s] {%(blue)s%(filename)s%(reset)s:%(lineno)d} %(log_color)s%(levelname)s%(reset)s - %(message_log_color)s%(message)s'
formatter = ColoredFormatter(
    log_format,
    log_colors={
        'WARNING': 'yellow',
        'ERROR': 'red',
        'CRITICAL': 'red,bold',
    },
    secondary_log_colors={
        'message': {
            'INFO': 'bold',
        }
    },
)

logging.root.handlers[0].setFormatter(formatter)

log.info("Usiung connection to: %s", {'id': 'proxy_postgres_tcp'})
log.warning("Hi")
log.error("Hi")

try:
    raise RuntimeError("x")
except Exception:
    log.critical("err")

Looks like:

Screenshot 2019-07-30 at 11 56 19

And it automatically detects when in a TTY. But it may not be written in a way that lets us get the colours we want.

@turbaszek
Copy link
Member Author

Both coloredlogs and colorlog has no out of the box option for formatting message arguments and exception traceback(?), that's why I decided to make a custom formatter. Additionally my implementation works only on LogRecord thus not modifying message format as in case of colorlog.

To use colorlog it's enough to add this in DEFAULT_LOGGING_CONFIG:

        'airflow': {
            'class': 'colorlog.ColoredFormatter',
            'format': "%(blue)s[%(asctime)s] {%(filename)s:%(lineno)d}%(reset)s %(log_color)s%(levelname)s - %(message)s%(reset)s"
        }

Personally I really like the idea of changing color / boldness / background of attributes in log messages. But this could be easily added aby extending colorlog.ColoredFormatter.

@ashb
Copy link
Member

ashb commented Jul 30, 2019

Not changing the LogRecord does seem like a good idea.

@turbaszek turbaszek force-pushed the colored-log branch 2 times, most recently from 1222cc7 to 68ccde1 Compare July 30, 2019 13:15
@turbaszek
Copy link
Member Author

Finally I decided to extend colorlog.TTYColoredFormatter by adding colors to message's arguments and error traceback. The code is simpler and allows to define color format like this:

'[%(blue)s%(asctime)s%(reset)s] {%(blue)s%(filename)s%(reset)s:%(lineno)d} %(log_color)s%(levelname)s%(reset)s - %(message_log_color)s%(message)s'

Now two dependencies are required: colorlog and termcolor.

@ashb
Copy link
Member

ashb commented Jul 30, 2019

colorlog and termcolor.

Checked and those are both MIT so that's cool.

@ashb
Copy link
Member

ashb commented Jul 30, 2019

Confirmed task logs look okay now, but I noticed that we have one file being written with ascii escapes in: ~/airflow/logs/scheduler/latest/example-dag.py.log

I think we can hack around that by adding os.environ['AIRFLOW__CORE__COLORED_CONSOLE_LOG'] = 'False' in to airflow/utils/dag_processing.py around line 565 (find the reload_module)

@ashb
Copy link
Member

ashb commented Jul 30, 2019

I pushed a fixup commit with this change directly to your branch, ack but that now conflicts. I'll fix that too.

This commit adds custom formatter for Airflow console log.
@turbaszek
Copy link
Member Author

I pushed a fixup commit with this change directly to your branch, ack but that now conflicts. I'll fix that too.

Thanks!

@ashb
Copy link
Member

ashb commented Jul 30, 2019

Boom, all green, merging!

@ashb ashb merged commit f76d9da into apache:master Jul 30, 2019
ashb pushed a commit to ashb/airflow that referenced this pull request Jul 30, 2019
This commit adds custom formatter for Airflow console log to display colours
when connected to a TTY

(cherry picked from commit 2d7b1a0)
ashb pushed a commit that referenced this pull request Jul 30, 2019
This commit adds custom formatter for Airflow console log to display colours
when connected to a TTY

(cherry picked from commit 2d7b1a0)
dharamsk pushed a commit to postmates/airflow that referenced this pull request Aug 8, 2019
This commit adds custom formatter for Airflow console log to display colours
when connected to a TTY

(cherry picked from commit 2d7b1a0)
@yuqian90
Copy link
Contributor

Since 1.10.4, some ANSCI escape characters are showing up in the Web UI like this:

[2019-08-13 17:40:52,693] {logging_mixin.py:95} INFO - [�[34m2019-08-13 17:40:52,686�[0m] {�[34mlog.py:�[0m1051} INFO�[0m - Hello

Any idea how to suppress these?

@turbaszek
Copy link
Member Author

@yuqian90 check this fix:
#6046

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants