Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Presto-Trino Does Not Work Superset Version 1.5.1 - Solution Found #20428

Closed
Thelin90 opened this issue Jun 17, 2022 · 3 comments
Closed

Presto-Trino Does Not Work Superset Version 1.5.1 - Solution Found #20428

Thelin90 opened this issue Jun 17, 2022 · 3 comments
Labels
#bug Bug report

Comments

@Thelin90
Copy link
Contributor

I am currently deploying the latest version of superset in my local k8s (helm). I have added trino package, but also tried sqlalchemy-trino . And I can see it gets installed, all fine.
Superset boots up and all look fine.

I have a presto - trino deployment running in local k8s (coordinator + hive metastore). For the presto-trino deployment I am running, I am not using any user/password, since I just want to verify the functionality.

When I connect to trino, the database connection is OK! Works fine no worries.

I can then extract metadata information around catalog and schema. But when I try to display data, I get this weird error logs from superset:

Traceback (most recent call last):
  File "/app/superset/sql_lab.py", line 275, in execute_sql_statement
    db_engine_spec.handle_cursor(cursor, query, session)
  File "/app/superset/db_engine_specs/presto.py", line 970, in handle_cursor
    polled = cursor.poll()
AttributeError: 'Cursor' object has no attribute 'poll'

How to reproduce the bug

  1. Run a presto-trino cluster with version >= 385 . Then install superset via docs: https://superset.apache.org/docs/installation/running-on-kubernetes/

  2. Use official superset helm with image tag, and modify to install sqlalchemy-trino, I have used following values.yaml:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default values for superset.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

# User ID directive. This user must have enough permissions to run the bootstrap script
# Runn containers as root is not recommended in production. Change this to another UID - e.g. 1000 to be more secure
runAsUser: 0

# Create custom service account for Superset. If create: true and name is not provided, superset.fullname will be used.
# serviceAccountName: superset
serviceAccount:
  create: false

# Install additional packages and do any other bootstrap configuration in this script
# For production clusters it's recommended to build own image with this step done in CI
bootstrapScript: |
  #!/bin/bash
  rm -rf /var/lib/apt/lists/* && \
  pip install \
    psycopg2-binary==2.9.1 \
    --upgrade trino \
    --upgrade pyhive \
    redis==3.5.3 && \
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
## The name of the secret which we will use to generate a superset_config.py file
## Note: this secret must have the key superset_config.py in it and can include other files as well
##
configFromSecret: '{{ template "superset.fullname" . }}-config'

## The name of the secret which we will use to populate env vars in deployed pods
## This can be useful for secret keys, etc.
##
envFromSecret: '{{ template "superset.fullname" . }}-env'
## This can be a list of template strings
envFromSecrets: []

## Extra environment variables that will be passed into pods
##
extraEnv: {}
  # Extend timeout to allow long running queries.
  # GUNICORN_TIMEOUT: 300


  # OAUTH_HOME_DOMAIN: ..
  # # If a whitelist is not set, any address that can use your OAuth2 endpoint will be able to login.
  # #   this includes any random Gmail address if your OAuth2 Web App is set to External.
  # OAUTH_WHITELIST_REGEX: ...

## Extra environment variables in RAW format that will be passed into pods
##
extraEnvRaw: []
  # Load DB password from other secret (e.g. for zalando operator)
  # - name: DB_PASS
  #   valueFrom:
  #     secretKeyRef:
  #       name: superset.superset-postgres.credentials.postgresql.acid.zalan.do
  #       key: password

## Extra environment variables to pass as secrets
##
extraSecretEnv: {}
  # MAPBOX_API_KEY: ...

  # # Google API Keys: https://console.cloud.google.com/apis/credentials
  # GOOGLE_KEY: ...
  # GOOGLE_SECRET: ...

extraConfigs: {}
  # import_datasources.yaml: |
  #     databases:
  #     - allow_file_upload: true
  #       allow_ctas: true
  #       allow_cvas: true
  #       database_name: example-db
  #       extra: "{\r\n    \"metadata_params\": {},\r\n    \"engine_params\": {},\r\n    \"\
  #         metadata_cache_timeout\": {},\r\n    \"schemas_allowed_for_file_upload\": []\r\n\
  #         }"
  #       sqlalchemy_uri: example://example-db.local
  #       tables: []

extraSecrets: {}

extraVolumes: []
 # - name: customConfig
 #   configMap:
 #     name: '{{ template "superset.fullname" . }}-custom-config'
 # - name: additionalSecret
 #   secret:
 #     secretName: my-secret
 #     defaultMode: 0600

extraVolumeMounts: []
 # - name: customConfig
 #   mountPath: /mnt/config
 #   readOnly: true
 # - name: additionalSecret:
 #   mountPath: /mnt/secret

# A dictionary of overrides to append at the end of superset_config.py - the name does not matter
# WARNING: the order is not guaranteed
configOverrides: {}
  #  extend_timeout: |
  #    # Extend timeout to allow long running queries.
  #    SUPERSET_WEBSERVER_TIMEOUT = ...
  # enable_oauth: |
  #   from flask_appbuilder.security.manager import (AUTH_DB, AUTH_OAUTH)
  #   AUTH_TYPE = AUTH_OAUTH

  #   OAUTH_PROVIDERS = [
  #       {
  #           "name": "google",
  #           "whitelist": [ os.getenv("OAUTH_WHITELIST_REGEX", "") ],
  #           "icon": "fa-google",
  #           "token_key": "access_token",
  #           "remote_app": {
  #               "client_id": os.environ.get("GOOGLE_KEY"),
  #               "client_secret": os.environ.get("GOOGLE_SECRET"),
  #               "api_base_url": "https://www.googleapis.com/oauth2/v2/",
  #               "client_kwargs": {"scope": "email profile"},
  #               "request_token_url": None,
  #               "access_token_url": "https://accounts.google.com/o/oauth2/token",
  #               "authorize_url": "https://accounts.google.com/o/oauth2/auth",
  #               "authorize_params": {"hd": os.getenv("OAUTH_HOME_DOMAIN", "")}
  #           }
  #       }
  #   ]
  #   # Map Authlib roles to superset roles
  #   AUTH_ROLE_ADMIN = 'Admin'
  #   AUTH_ROLE_PUBLIC = 'Public'
  #   # Will allow user self registration, allowing to create Flask users from Authorized User
  #   AUTH_USER_REGISTRATION = True
  #   # The default user self registration role
  #   AUTH_USER_REGISTRATION_ROLE = "Admin"
  # secret: |
  #   # Generate your own secret key for encryption. Use openssl rand -base64 42 to generate a good key
  #   SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
# Same as above but the values are files
configOverridesFiles: {}
  # extend_timeout: extend_timeout.py
  # enable_oauth: enable_oauth.py


configMountPath: "/app/pythonpath"

extraConfigMountPath: "/app/configs"

image:
  repository: apache/superset
  tag: latest
  pullPolicy: IfNotPresent

imagePullSecrets: []

initImage:
  repository: busybox
  tag: latest
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 8088
  annotations: {}
    # cloud.google.com/load-balancer-type: "Internal"
  loadBalancerIP: null

ingress:
  enabled: false
  # ingressClassName: nginx
  annotations: {}
    # kubernetes.io/tls-acme: "true"
    ## Extend timeout to allow long running queries.
    # nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
    # nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    # nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
  path: /
  pathType: ImplementationSpecific
  hosts:
    - chart-example.local
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # The limits below will apply to all Superset components. To set individual resource limitations refer to the pod specific values below.
  # The pod specific values will overwrite anything that is set here.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

##
## Custom hostAliases for all superset pods
## https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/
hostAliases: []
# - hostnames:
#   - nodns.my.lan
#   ip: 18.27.36.45


##
## Superset node configuration
supersetNode:
  command:
    - "/bin/sh"
    - "-c"
    - ". {{ .Values.configMountPath }}/superset_bootstrap.sh; /usr/bin/run-server.sh"
  connections:
    # Change in case of bringing your own redis and then also set redis.enabled:false
    redis_host: '{{ template "superset.fullname" . }}-redis-headless'
    # redis_password: superset
    redis_port: "6379"
    # You need to change below configuration incase bringing own PostgresSQL instance and also set postgresql.enabled:false
    db_host: '{{ template "superset.fullname" . }}-postgresql'
    db_port: "5432"
    db_user: superset
    db_pass: superset
    db_name: superset
  env: {}
  forceReload: false # If true, forces deployment to reload on each upgrade
  initContainers:
    - name: wait-for-postgres
      image: "{{ .Values.initImage.repository }}:{{ .Values.initImage.tag }}"
      imagePullPolicy: "{{ .Values.initImage.pullPolicy }}"
      envFrom:
        - secretRef:
            name: '{{ tpl .Values.envFromSecret . }}'
      command: [ "/bin/sh", "-c", "until nc -zv $DB_HOST $DB_PORT -w1; do echo 'waiting for db'; sleep 1; done" ]
  ## Annotations to be added to supersetNode deployment
  deploymentAnnotations: {}
  ## Annotations to be added to supersetNode pods
  podAnnotations: {}
  ## Labels to be added to supersetNode pods
  podLabels: {}
  # Resource settings for the supersetNode pods - these settings overwrite might existing values from the global resources object defined above.
  resources: {}
    # limits:
    #  cpu: 100m
    #  memory: 128Mi
    # requests:
    #  cpu: 100m
    #  memory: 128Mi
##
## Superset worker configuration
supersetWorker:
  command:
    - "/bin/sh"
    - "-c"
    - ". {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker"
  forceReload: false # If true, forces deployment to reload on each upgrade
  initContainers:
    - name: wait-for-postgres
      image: "{{ .Values.initImage.repository }}:{{ .Values.initImage.tag }}"
      imagePullPolicy: "{{ .Values.initImage.pullPolicy }}"
      envFrom:
        - secretRef:
            name: '{{ tpl .Values.envFromSecret . }}'
      command: [ "/bin/sh", "-c", "until nc -zv $DB_HOST $DB_PORT -w1; do echo 'waiting for db'; sleep 1; done" ]
  ## Annotations to be added to supersetWorker deployment
  deploymentAnnotations: {}
  ## Annotations to be added to supersetWorker pods
  podAnnotations: {}
  ## Labels to be added to supersetWorker pods
  podLabels: {}
  # Resource settings for the supersetWorker pods - these settings overwrite might existing values from the global resources object defined above.
  resources: {}
    # limits:
    #  cpu: 100m
    #  memory: 128Mi
    # requests:
    #  cpu: 100m
    #  memory: 128Mi
##
## Superset beat configuration (to trigger scheduled jobs like reports)
supersetCeleryBeat:
  # This is only required if you intend to use alerts and reports
  enabled: false
  command:
    - "/bin/sh"
    - "-c"
    - ". {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule"
  forceReload: false # If true, forces deployment to reload on each upgrade
  initContainers:
    - name: wait-for-postgres
      image: "{{ .Values.initImage.repository }}:{{ .Values.initImage.tag }}"
      imagePullPolicy: "{{ .Values.initImage.pullPolicy }}"
      envFrom:
        - secretRef:
            name: '{{ tpl .Values.envFromSecret . }}'
      command: [ "/bin/sh", "-c", "until nc -zv $DB_HOST $DB_PORT -w1; do echo 'waiting for db'; sleep 1; done" ]
  ## Annotations to be added to supersetCeleryBeat deployment
  deploymentAnnotations: {}
  ## Annotations to be added to supersetCeleryBeat pods
  podAnnotations: {}
  ## Labels to be added to supersetCeleryBeat pods
  podLabels: {}
  # Resource settings for the CeleryBeat pods - these settings overwrite might existing values from the global resources object defined above.
  resources: {}
    # limits:
    #  cpu: 100m
    #  memory: 128Mi
    # requests:
    #  cpu: 100m
    #  memory: 128Mi
##
## Init job configuration
init:
  # Configure resources
  # Warning: fab command consumes a lot of ram and can
  # cause the process to be killed due to OOM if it exceeds limit
  # Make sure you are giving a strong password for the admin user creation( else make sure you are changing after setup)
  # Also change the admin email to your own custom email.
  resources: {}
    # limits:
    #   cpu:
    #   memory:
    # requests:
    #   cpu:
    #   memory:
  command:
    - "/bin/sh"
    - "-c"
    - ". {{ .Values.configMountPath }}/superset_bootstrap.sh; . {{ .Values.configMountPath }}/superset_init.sh"
  enabled: true
  loadExamples: false
  createAdmin: true
  adminUser:
    username: admin
    firstname: Superset
    lastname: Admin
    email: [email protected]
    password: admin
  initContainers:
    - name: wait-for-postgres
      image: "{{ .Values.initImage.repository }}:{{ .Values.initImage.tag }}"
      imagePullPolicy: "{{ .Values.initImage.pullPolicy }}"
      envFrom:
        - secretRef:
            name: '{{ tpl .Values.envFromSecret . }}'
      command: [ "/bin/sh", "-c", "until nc -zv $DB_HOST $DB_PORT -w1; do echo 'waiting for db'; sleep 1; done" ]
  initscript: |-
    #!/bin/sh
    set -eu
    echo "Upgrading DB schema..."
    superset db upgrade
    echo "Initializing roles..."
    superset init
    {{ if .Values.init.createAdmin }}
    echo "Creating admin user..."
    superset fab create-admin \
                    --username {{ .Values.init.adminUser.username }} \
                    --firstname {{ .Values.init.adminUser.firstname }} \
                    --lastname {{ .Values.init.adminUser.lastname }} \
                    --email {{ .Values.init.adminUser.email }} \
                    --password {{ .Values.init.adminUser.password }} \
                    || true
    {{- end }}
    {{ if .Values.init.loadExamples }}
    echo "Loading examples..."
    superset load_examples
    {{- end }}
    if [ -f "{{ .Values.extraConfigMountPath }}/import_datasources.yaml" ]; then
      echo "Importing database connections.... "
      superset import_datasources -p {{ .Values.extraConfigMountPath }}/import_datasources.yaml
    fi
  ## Annotations to be added to init job pods
  podAnnotations: {}
##
## Configuration values for the postgresql dependency.
## ref: https://github.com/kubernetes/charts/blob/master/stable/postgresql/README.md
postgresql:
  ##
  ## Use the PostgreSQL chart dependency.
  ## Set to false if bringing your own PostgreSQL.
  enabled: true

  ## Authentication parameters
  auth:
    ## The name of an existing secret that contains the postgres password.
    existingSecret:
    ## PostgreSQL name for a custom user to create
    username: superset
    ## PostgreSQL password for the custom user to create. Ignored if `auth.existingSecret` with key `password` is provided
    password: superset
    ## PostgreSQL name for a custom database to create
    database: superset


  ## PostgreSQL Primary parameters
  primary:
    ##
    ## Persistent Volume Storage configuration.
    ## ref: https://kubernetes.io/docs/user-guide/persistent-volumes
    persistence:
      ##
      ## Enable PostgreSQL persistence using Persistent Volume Claims.
      enabled: true
      ##
      ## Persistant class
      # storageClass: classname
      ##
      ## Access modes:
      accessModes:
        - ReadWriteOnce
    ## PostgreSQL port
    service:
      ports:
        postgresql: "5432"

## Configuration values for the Redis dependency.
## ref: https://github.com/bitnami/charts/blob/master/bitnami/redis
## More documentation can be found here: https://artifacthub.io/packages/helm/bitnami/redis
redis:
  ##
  ## Use the redis chart dependency.
  ##
  ## If you are bringing your own redis, you can set the host in supersetNode.connections.redis_host
  ##
  ## Set to false if bringing your own redis.
  enabled: true
  ##
  ## Set architecture to standalone/replication
  architecture: standalone
  ##
  ## Auth configuration:
  ##
  auth:
    ## Enable password authentication
    enabled: false
    ## The name of an existing secret that contains the redis password.
    existingSecret: ""
    ## Name of the key containing the secret.
    existingSecretKey: ""
    ## Redis password
    password: superset
  ##
  ## Master configuration
  ##
  master:
    ##
    ## Image configuration
    # image:
      ##
      ## docker registry secret names (list)
      # pullSecrets: nil
    ##
    ## Configure persistance
    persistence:
      ##
      ## Use a PVC to persist data.
      enabled: false
      ##
      ## Persistant class
      # storageClass: classname
      ##
      ## Access mode:
      accessModes:
      - ReadWriteOnce

nodeSelector: {}

tolerations: []

affinity: {}

Expected results

Being able to query delta.io table via presto-trino deployment.

Actual results

No output.

Root Cause To Problem

There was a chance to the codebase via this commit:
b08e21e

Solution

After downgrading to 1.5.0, querying works, so the change from:

BaseEngineSpec -> PrestoEngineSpec is causing the issue.

Screenshots

When troubles:
image (3)
image (2)
image (1)

When downgrading to 1.5.0

image

Environment

  • browser type and version:
  • superset version: 1.5.1
  • python version: Python 3.8.12
@Thelin90 Thelin90 added the #bug Bug report label Jun 17, 2022
@quguangyu0321
Copy link

thank you, I also had the same problem in latest version superset. although I can read metadata of Trino, but I can not query it.

@Thelin90
Copy link
Contributor Author

@quguangyu0321 slack conversation here: https://apache-superset.slack.com/archives/C015WAZL0KH/p1655449901353689

I will be trying to raise a PR to resolve this now

@Thelin90
Copy link
Contributor Author

Closing issue now resolved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#bug Bug report
Projects
None yet
Development

No branches or pull requests

2 participants