
Pyspark vulnerability in datahub-ingestion image

High
david-leifker published GHSA-2q7w-7r2r-572w Aug 14, 2023

Package

datahub-ingestion

Affected versions

non-slim

Patched versions

>=v0.10.1-slim

Description

Details

The DataHub images contain Maven/Java-based vulnerabilities introduced by PySpark's upstream dependencies.

These vulnerabilities affect DataHub components, including:
datahub-ingestion

The vulnerable JARs are located at the following paths:
/usr/local/lib/python3.10/site-packages/pyspark/jars/ivy-2.5.0.jar
/usr/local/lib/python3.10/site-packages/pyspark/jars/ivy-2.4.0.jar
/usr/local/lib/python3.10/site-packages/pyspark/jars/hadoop-common-3.2.0.jar
/usr/local/lib/python3.10/site-packages/pyspark/jars/hadoop-client-api-3.3.2.jar
/usr/local/lib/python3.10/site-packages/pyspark/jars/parquet-jackson-1.12.2.jar
/usr/local/lib/python3.10/site-packages/pyspark/jars/hadoop-client-runtime-3.3.2.jar
/usr/local/lib/python3.10/site-packages/pyspark/jars/jackson-databind-2.13.3.jar
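
To check whether a given environment still ships the affected JARs, a minimal sketch like the one below could scan pyspark's `jars` directory for the artifacts named in the paths above. It assumes the `/usr/local/lib/python3.10/site-packages/pyspark/jars` layout shown there; the function name `find_flagged_jars` and the prefix list are hypothetical helpers, not part of DataHub or PySpark.

```python
from pathlib import Path

# Hypothetical JAR file-name prefixes, taken from the paths listed in this advisory.
FLAGGED_PREFIXES = (
    "ivy-",
    "hadoop-common-",
    "hadoop-client-api-",
    "hadoop-client-runtime-",
    "parquet-jackson-",
    "jackson-databind-",
)

# Location used by the non-slim datahub-ingestion image, per the advisory.
DEFAULT_JARS_DIR = Path("/usr/local/lib/python3.10/site-packages/pyspark/jars")


def find_flagged_jars(jars_dir: Path = DEFAULT_JARS_DIR) -> list[str]:
    """Return the sorted names of JARs in jars_dir that start with a flagged prefix."""
    if not jars_dir.is_dir():
        # No pyspark jars directory at all (e.g. a -slim image): nothing to report.
        return []
    return sorted(
        jar.name
        for jar in jars_dir.glob("*.jar")
        if jar.name.startswith(FLAGGED_PREFIXES)
    )


if __name__ == "__main__":
    for name in find_flagged_jars():
        print(name)
```

Matching on name prefixes rather than exact file names also catches other versions of the same artifacts, at the cost of flagging versions that may already be fixed.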

The ivy JAR bundled in the pyspark site package introduces Critical- and High-severity vulnerabilities in the following ivy versions:

v2.4.0+
v2.5.0+

CVE
CVE-2022-37865
GHSA-8wm5-8h9c-47pc
CVE-2022-16612
GHSA-f8vc-wfc8-hxqh
hadoop-common v3.2.0, v3.3.2 {GHSA-rmpj-7c96-mrg8, GHSA-8wm5-8h9c-47pc, GHSA-gx2c-fvhc-ph4j}
log4j v1.2.17 {GHSA-65fg-84f6-3jq3, https://github.com/advisories/GHSA-2qrg-x229-3v8q}
snakeyaml v1.2.4, v1.30 {https://github.com/advisories/GHSA-mjmj-j48q-9wg2}
json-smart v2.3 {https://github.com/advisories/GHSA-v528-7hrm-frqp}
commons-text v1.9, v1.6 {https://github.com/advisories/GHSA-599f-7c49-w659}
nimbus-jose-jwt v4.4.1.1 {https://github.com/advisories/GHSA-f6vf-pq8c-69m4}
protobuf-java v2.5.0 {GHSA-jwvw-v7c5-m82h}
jackson-databind
mesos {https://github.com/advisories/GHSA-p2xq-vcm7-xjj6}
libthrift {GHSA-rj7p-rfgp-852x, https://github.com/advisories/GHSA-g2fg-mr77-6vrm}
hadoop-hdfs-client v3.2.0 {https://github.com/advisories/GHSA-rj7p-rfgp-852x}

Resolution
Published a -slim variant of the image (v0.10.1-slim and later) that does not include the pyspark dependencies. Non-slim images remain affected.
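
The affected and patched version fields reduce to a simple rule: only `-slim` tags at v0.10.1 or later exclude pyspark. The sketch below encodes that rule for an image tag string. The `vMAJOR.MINOR.PATCH[-slim]` tag format and the name `is_patched_tag` are assumptions for illustration, not a DataHub API.

```python
import re

# Assumed tag format: "v<major>.<minor>.<patch>" with an optional "-slim" suffix.
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(-slim)?$")

# First patched release according to the advisory (>=v0.10.1-slim).
FIRST_PATCHED = (0, 10, 1)


def is_patched_tag(tag: str) -> bool:
    """Return True if tag is a -slim tag at or after v0.10.1."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognised tag format: {tag!r}")
    version = tuple(int(part) for part in match.group(1, 2, 3))
    is_slim = match.group(4) is not None
    # Non-slim images still bundle pyspark and its JARs, so they stay affected.
    return is_slim and version >= FIRST_PATCHED
```

For example, `is_patched_tag("v0.10.1-slim")` returns `True`, while `is_patched_tag("v0.10.1")` returns `False` because the non-slim variant still ships pyspark.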

Severity

High

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
7.7 / 10

CVSS v3 base metrics

Attack vector
Local
Attack complexity
Low
Privileges required
None
User interaction
None
Scope
Unchanged
Confidentiality
None
Integrity
High
Availability
High

Attack vector: More severe the more remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:H
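
The vector string above encodes the eight base metrics listed earlier. As a sketch, assuming the CVSS v3.1 base-score equations and metric weights published by FIRST, the code below parses such a vector and recomputes the base score; for this advisory's vector it yields 7.7, which falls in the High band. The function names are illustrative and not part of any GitHub or DataHub API.

```python
import math

# CVSS v3.1 base-metric weights (FIRST specification).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope: Unchanged (U) or Changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.50},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, avoiding float artefacts."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector such as 'CVSS:3.1/AV:L/...'."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"not a CVSS v3.1 vector: {vector!r}")
    metrics = dict(part.split(":") for part in parts)
    scope = metrics["S"]

    # Impact Sub-Score from the Confidentiality, Integrity and Availability weights.
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


def severity(score: float) -> str:
    """Map a base score to the CVSS v3.x qualitative severity rating."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

Here the local attack vector (0.55) keeps exploitability low at about 2.5, while High integrity and availability impact contribute about 5.2, giving `base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:H")` of 7.7 and `severity(7.7)` of `"High"`.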

CVE ID

No known CVE

Weaknesses

No CWEs

Credits