Data visualization script fails to generate certain graphs #4975

Closed
GGP1 opened this issue Feb 19, 2024 · 5 comments · Fixed by #5021

@GGP1
Member

GGP1 commented Feb 19, 2024

Description

During the research performed in https://github.com/wazuh/wazuh-jenkins/issues/4748, we found that the data visualization script used to generate graphs from the collected data does not work on the log and statistics files, except for the logcollectord ones.

Artifacts: artifacts.zip

These were the errors found:

Type error
Traceback (most recent call last):
  File "/usr/local/bin/data-visualizer", line 33, in <module>
    sys.exit(load_entry_point('wazuh-testing==4.8.0', 'console_scripts', 'data-visualizer')())
  File "/usr/local/lib/python3.9/site-packages/wazuh_testing-4.8.0-py3.9.egg/wazuh_testing/scripts/data_visualizations.py", line 33, in main
    dv.plot()
  File "/usr/local/lib/python3.9/site-packages/wazuh_testing-4.8.0-py3.9.egg/wazuh_testing/tools/performance/visualization.py", line 305, in plot
    self._plot_cluster_dataset()
  File "/usr/local/lib/python3.9/site-packages/wazuh_testing-4.8.0-py3.9.egg/wazuh_testing/tools/performance/visualization.py", line 286, in _plot_cluster_dataset
    self._plot_data(elements=list(self.dataframe['activity'].unique()), generic_label='Managers')
  File "/usr/local/lib/python3.9/site-packages/wazuh_testing-4.8.0-py3.9.egg/wazuh_testing/tools/performance/visualization.py", line 226, in _plot_data
    statistics=DataVisualizer._get_statistics(
  File "/usr/local/lib/python3.9/site-packages/wazuh_testing-4.8.0-py3.9.egg/wazuh_testing/tools/performance/visualization.py", line 141, in _get_statistics
    statistics += f"Mean: {round(pd.DataFrame.mean(df), 3)}\n"
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 11666, in mean
    result = super().mean(axis, skipna, numeric_only, **kwargs)
TypeError: super(type, obj): obj must be an instance or subtype of type
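A plausible reading of this traceback, assuming the `df` passed to `pd.DataFrame.mean(df)` in `_get_statistics` is a single column (a `Series`) rather than a `DataFrame`: the last frame shows `DataFrame.mean` calling zero-argument `super().mean(...)`, which Python expands to `super(DataFrame, self)`, and that raises this TypeError whenever `self` is not a `DataFrame`. The sketch reproduces the mechanism with simplified stand-in classes (hypothetical, not pandas code) so it runs without pandas:

```python
# Simplified stand-ins for pandas classes (NOT the real pandas code), used only
# to reproduce the failure mode seen in _get_statistics.


class NDFrame:
    """Hypothetical shared base class holding a list of numbers."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


class DataFrame(NDFrame):
    def mean(self):
        # Zero-argument super() means super(DataFrame, self), so `self`
        # must be a DataFrame (or subclass) instance.
        return super().mean()


class Series(NDFrame):
    pass


def buggy_mean(column):
    # Mirrors the failing pattern: an unbound DataFrame method applied
    # to an object that is not a DataFrame.
    return round(DataFrame.mean(column), 3)


def fixed_mean(column):
    # Calling the method on the object dispatches to the object's own class.
    return round(column.mean(), 3)


if __name__ == "__main__":
    column = Series([1.0, 2.0, 4.0])
    try:
        buggy_mean(column)
    except TypeError as exc:
        print(f"TypeError: {exc}")
    print(fixed_mean(column))
```

With real pandas, the equivalent change would be calling the method on the object itself, e.g. `round(df.mean(), 3)`, which works for both a `Series` and a `DataFrame`; this is a suggestion, not necessarily the change made in #5021.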
Key error
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 153, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 182, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'queued_msgs'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/bin/data-visualizer", line 33, in <module>
    sys.exit(load_entry_point('wazuh-testing==4.8.0', 'console_scripts', 'data-visualizer')())
  File "/usr/local/lib/python3.9/site-packages/wazuh_testing-4.8.0-py3.9.egg/wazuh_testing/scripts/data_visualizations.py", line 33, in main
    dv.plot()
  File "/usr/local/lib/python3.9/site-packages/wazuh_testing-4.8.0-py3.9.egg/wazuh_testing/tools/performance/visualization.py", line 299, in plot
    self._plot_remoted_dataset()
  File "/usr/local/lib/python3.9/site-packages/wazuh_testing-4.8.0-py3.9.egg/wazuh_testing/tools/performance/visualization.py", line 263, in _plot_remoted_dataset
    self._plot_data(elements=columns, title=title, generic_label=element)
  File "/usr/local/lib/python3.9/site-packages/wazuh_testing-4.8.0-py3.9.egg/wazuh_testing/tools/performance/visualization.py", line 243, in _plot_data
    self._basic_plot(ax, self.dataframe[element], label=element, color=color)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 4090, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3809, in get_loc
    raise KeyError(key) from err
KeyError: 'queued_msgs'
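The KeyError comes from `self.dataframe[element]` looking up `queued_msgs`, a column the plotting code expects but the CSV file does not contain. One way to make such mismatches easier to diagnose is to check the header up front and report every missing column at once. A minimal stdlib sketch; `check_columns`, `MissingColumnsError` and the sample header are hypothetical, and only `queued_msgs` comes from the traceback:

```python
import csv
import io


class MissingColumnsError(ValueError):
    """Raised when a statistics CSV lacks columns the plotter expects."""


def check_columns(csv_file, expected):
    """Return the CSV header, raising if any expected column is absent."""
    header = next(csv.reader(csv_file))
    missing = [col for col in expected if col not in header]
    if missing:
        raise MissingColumnsError(
            f"columns not found in CSV header: {', '.join(missing)}"
        )
    return header


if __name__ == "__main__":
    # Hypothetical header; it deliberately lacks 'queued_msgs'.
    sample = io.StringIO("timestamp,total_queue_size\n2024-02-19,10\n")
    try:
        check_columns(sample, ["total_queue_size", "queued_msgs"])
    except MissingColumnsError as exc:
        print(exc)
```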

We should investigate the root cause of the failures, apply a fix and verify that all graphs are generated successfully.

@wazuhci wazuhci moved this to Blocked in Release 4.8.0 Feb 21, 2024
@wazuhci wazuhci moved this from Blocked to Backlog in Release 4.8.0 Feb 21, 2024
@GGP1 GGP1 self-assigned this Feb 23, 2024
@wazuhci wazuhci moved this from Backlog to In progress in Release 4.8.0 Feb 23, 2024
@GGP1
Member Author

GGP1 commented Feb 23, 2024

Update

I took some artifacts from a recent build and tested graph generation the same way the pipeline does. The cluster, remoted and analysisd graphs were failing, but after applying some fixes they were generated successfully.

Statistics files used: artifacts.zip

Binaries 🟢

Artifacts: binaries.zip

(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s wazuh-analysisd.csv wazuh-apid_child_1.csv wazuh-apid_child_2.csv wazuh-apid_child_3.csv wazuh-apid.csv wazuh_clusterd_child_1.csv wazuh_clusterd_child_2.csv wazuh_clusterd.csv wazuh-logcollector.csv wazuh-remoted.csv -t binary -d /tmp/test -n binaries
(venv) gasti@pop-os:~/work/wazuh-qa$ ls /tmp/test/
binaries_CPU.svg              binaries_Disk_Write_Speed.svg  binaries_PSS.svg       binaries_SWAP.svg  binaries_Write_Ops.svg
binaries_Disk_Read_Speed.svg  binaries_Disk_Written.svg      binaries_Read_Ops.svg  binaries_USS.svg
binaries_Disk_Read.svg        binaries_FD.svg                binaries_RSS.svg       binaries_VMS.svg

Cluster (fixed) 🟢

These graphs were previously failing; after some changes I was able to generate them successfully.

Artifacts: cluster.zip

(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s agent-groups_send.csv agent-info_sync.csv integrity_check.csv integrity_sync.csv -t cluster -d /tmp/test -n cluster
(venv) gasti@pop-os:~/work/wazuh-qa$ ls /tmp/test
cluster_agent-groups_send.svg  cluster_agent-info_sync.svg  cluster_integrity_check.svg  cluster_integrity_sync.svg

Logcollector 🟢

Artifacts: logcollector.zip

(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s active-responses_log.csv -t logcollector -d /tmp/test -n active-responses_log
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s audit_log.csv -t logcollector -d /tmp/test -n audit_log
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s df.csv -t logcollector -d /tmp/test -n df
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s last.csv -t logcollector -d /tmp/test -n last
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s maillog.csv -t logcollector -d /tmp/test -n maillog
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s messages.csv -t logcollector -d /tmp/test -n messages
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s netstat.csv -t logcollector -d /tmp/test -n netstat
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s secure.csv -t logcollector -d /tmp/test -n secure
(venv) gasti@pop-os:~/work/wazuh-qa$ ls /tmp/test
active-responses_log_bytes.svg         df_bytes.svg           maillog_bytes.svg          netstat_bytes.svg
active-responses_log_events.svg        df_events.svg          maillog_events.svg         netstat_events.svg
active-responses_log_target_drops.svg  df_target_drops.svg    maillog_target_drops.svg   netstat_target_drops.svg
audit_log_bytes.svg                    last_bytes.svg         messages_bytes.svg         secure_bytes.svg
audit_log_events.svg                   last_events.svg        messages_events.svg        secure_events.svg
audit_log_target_drops.svg             last_target_drops.svg  messages_target_drops.svg  secure_target_drops.svg
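The eight logcollector invocations above differ only in the input file and the `-n` name, which in this session is the file name without its `.csv` extension. A hypothetical helper that builds the same command lines, using only the `-s`, `-t`, `-d` and `-n` flags shown above, could look like this; by default it prints the commands rather than running them:

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(csv_files, target, dest):
    """Build one data-visualizer argv list per CSV file.

    The -n name is the file name without extension,
    e.g. "audit_log.csv" -> "audit_log".
    """
    return [
        ["data-visualizer", "-s", f, "-t", target, "-d", dest, "-n", Path(f).stem]
        for f in csv_files
    ]


def run_commands(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing graph


if __name__ == "__main__":
    files = ["active-responses_log.csv", "audit_log.csv", "df.csv", "last.csv",
             "maillog.csv", "messages.csv", "netstat.csv", "secure.csv"]
    run_commands(build_commands(files, "logcollector", "/tmp/test"))
```

Passing `dry_run=False` runs each command with `subprocess.run(..., check=True)`, assuming `data-visualizer` is on `PATH` as in the session above.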

Remoted (fixed) 🟢

Previously failing; it worked after removing a header that was not present in the .csv file.

Artifacts: remoted.zip

(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s wazuh-remoted_stats.csv -t remote -d /tmp/test -n remoted
(venv) gasti@pop-os:~/work/wazuh-qa$ ls /tmp/test
remoted_events_info.svg  remoted_queue_size.svg  remoted_recv_bytes.svg  remoted_tcp_sessions.svg

Analysisd (fixed) 🟢

Previously failing; it worked after removing some headers that were not present in the .csv file.

Artifacts: analysisd.zip

(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s wazuh-analysisd_stats.csv -t analysis -d /tmp/test -n analysisd
(venv) gasti@pop-os:~/work/wazuh-qa$ ls /tmp/test
analysisd_alerts_info.svg  analysisd_decoded_events.svg  analysisd_queue_usage.svg
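For remoted and analysisd, the fix was removing headers that the .csv files did not contain. An alternative, sketched here as an assumption rather than what the linked PR did, is to derive each graph's columns from the header at runtime: keep the columns that are present, drop graphs left with none, and report what was skipped so schema drift stays visible. The graph titles echo the output names above (`alerts_info`, `decoded_events`, `queue_usage`), but all column names are hypothetical:

```python
def filter_graphs(graphs, header):
    """Split each graph's expected columns into present and missing ones.

    Returns (kept, skipped): kept maps title -> columns found in header
    (graphs with none found are dropped); skipped maps title -> missing columns.
    """
    present = set(header)
    kept, skipped = {}, {}
    for title, columns in graphs.items():
        available = [c for c in columns if c in present]
        missing = [c for c in columns if c not in present]
        if available:
            kept[title] = available
        if missing:
            skipped[title] = missing
    return kept, skipped


if __name__ == "__main__":
    # Hypothetical graph definitions and header, for illustration only.
    graphs = {
        "alerts_info": ["alerts_written", "alerts_dropped"],
        "decoded_events": ["events_decoded"],
        "queue_usage": ["event_queue_usage", "removed_queue_column"],
    }
    header = ["timestamp", "alerts_written", "events_decoded", "event_queue_usage"]
    kept, skipped = filter_graphs(graphs, header)
    print("plotting:", kept)
    print("skipped:", skipped)
```

Failing fast on any missing column is stricter; this tolerant version keeps the pipeline producing graphs but relies on someone reading the `skipped` report.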

@GGP1
Member Author

GGP1 commented Feb 23, 2024

Update

During #4890, the statistics generation script was changed so that it now gathers data from the API. The files generated from the API information are completely different from the ones used previously, and the data visualization script was not updated to match, so generating graphs from them will fail.

I created the following issue to solve this: #5022

@GGP1 GGP1 linked a pull request Feb 23, 2024 that will close this issue
@wazuhci wazuhci moved this from In progress to Pending review in Release 4.8.0 Feb 23, 2024
@wazuhci wazuhci moved this from Pending review to In review in Release 4.8.0 Feb 28, 2024
@wazuhci wazuhci moved this from In review to Pending final review in Release 4.8.0 Feb 28, 2024
@wazuhci wazuhci moved this from Pending final review to In final review in Release 4.8.0 Feb 29, 2024
@wazuhci wazuhci moved this from In final review to On hold in Release 4.8.0 Feb 29, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 4.8.0 Feb 29, 2024
@GGP1
Member Author

GGP1 commented Feb 29, 2024

Update

Fixed some linting errors and updated the changelog.

@wazuhci wazuhci moved this from In progress to In final review in Release 4.8.0 Feb 29, 2024
@wazuhci wazuhci moved this from In final review to Pending final review in Release 4.8.0 Feb 29, 2024
@wazuhci wazuhci moved this from Pending final review to In progress in Release 4.8.0 Feb 29, 2024
@GGP1
Member Author

GGP1 commented Mar 1, 2024

Update

I launched a build with the changes in the QA package, but the graphs are not being generated because the column names of every file inside the data folder have changed.

Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/471

I will leave the issue on hold until we decide how to proceed.

@wazuhci wazuhci moved this from In progress to On hold in Release 4.8.0 Mar 1, 2024
@GGP1
Member Author

GGP1 commented Mar 1, 2024

Update

We decided to fix graph generation for the logcollector files only. A few changes were required to parse the logcollectord statistics files correctly because their column names have changed since the last tests.

Statistics files used: artifacts.zip

Logcollectord (fixed) 🟢

Graphs: graphs.zip

(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s active-responses_log.csv -t logcollector -d /tmp/test -n active-responses_log
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s audit_log.csv -t logcollector -d /tmp/test -n audit_log
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s df.csv -t logcollector -d /tmp/test -n df
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s last.csv -t logcollector -d /tmp/test -n last
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s maillog.csv -t logcollector -d /tmp/test -n maillog
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s messages.csv -t logcollector -d /tmp/test -n messages
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s netstat.csv -t logcollector -d /tmp/test -n netstat
(venv) gasti@pop-os:~/work/wazuh-qa$ data-visualizer -s secure.csv -t logcollector -d /tmp/test -n secure
(venv) gasti@pop-os:~/work/wazuh-qa$ ls /tmp/test/
 active-responses_log_Bytes.svg          'audit_log_Target Drops.svg'   last_Events.svg             messages_Bytes.svg          'netstat_Target Drops.svg'
 active-responses_log_Events.svg          df_Bytes.svg                 'last_Target Drops.svg'      messages_Events.svg          secure_Bytes.svg
'active-responses_log_Target Drops.svg'   df_Events.svg                 maillog_Bytes.svg          'messages_Target Drops.svg'   secure_Events.svg
 audit_log_Bytes.svg                     'df_Target Drops.svg'          maillog_Events.svg          netstat_Bytes.svg           'secure_Target Drops.svg'
 audit_log_Events.svg                     last_Bytes.svg               'maillog_Target Drops.svg'   netstat_Events.svg
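Compared with the February 23 run, the graph names changed from `df_bytes.svg`, `df_events.svg` and `df_target_drops.svg` to `df_Bytes.svg`, `df_Events.svg` and `'df_Target Drops.svg'`. Assuming those names mirror the new column names in the logcollector CSV files, a parser that must accept both spellings could normalize header names before matching them. A minimal sketch; `normalize_column` and `column_lookup` are hypothetical helpers:

```python
import re


def normalize_column(name):
    """Map 'Target Drops' and 'target_drops' to the same key."""
    # Trim, collapse runs of whitespace into underscores, then lowercase.
    return re.sub(r"\s+", "_", name.strip()).lower()


def column_lookup(header):
    """Map normalized names back to the header's original spelling."""
    return {normalize_column(col): col for col in header}


if __name__ == "__main__":
    new_header = ["Bytes", "Events", "Target Drops"]
    lookup = column_lookup(new_header)
    # Code written against the old snake_case names can still find the new columns.
    print(lookup["target_drops"])
```

If two headers normalize to the same key, the later one silently wins in `column_lookup`; a stricter version would raise on collisions.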

@wazuhci wazuhci moved this from On hold to In progress in Release 4.8.0 Mar 1, 2024
@wazuhci wazuhci moved this from In progress to Pending final review in Release 4.8.0 Mar 1, 2024
@wazuhci wazuhci moved this from Pending final review to In final review in Release 4.8.0 Mar 7, 2024
@wazuhci wazuhci moved this from In final review to Pending final review in Release 4.8.0 Mar 7, 2024
@wazuhci wazuhci moved this from Pending final review to Done in Release 4.8.0 Mar 7, 2024
Status: Done

2 participants