Skip to content

PyAthenaJDBC is a Python DB API 2.0 (PEP 249) compliant wrapper for Amazon Athena JDBC driver.

License

Notifications You must be signed in to change notification settings

dwa/PyAthenaJDBC

 
 

Repository files navigation

https://circleci.com/gh/laughingman7743/PyAthenaJDBC.svg?style=shield

PyAthenaJDBC

PyAthenaJDBC is a Python DB API 2.0 (PEP 249) compliant wrapper for Amazon Athena JDBC driver.

Requirements

  • Python
    • CPython 2.6, 2,7, 3,4, 3.5
  • Java
    • Java >= 8

Installation

$ pip install PyAthenaJDBC

Usage

Basic usage

from pyathenajdbc import connect

conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
               region_name='us-west-2')
try:
    with conn.cursor() as cursor:
        cursor.execute("""
        SELECT * FROM one_row
        """)
        print(cursor.description)
        print(cursor.fetchall())
finally:
    conn.close()

Cursor iteration

from pyathenajdbc import connect

conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
               region_name='us-west-2')
try:
    with conn.cursor() as cursor:
        cursor.execute("""
        SELECT * FROM many_rows LIMIT 10
        """)
        for row in cursor:
            print(row)
finally:
    conn.close()

Query with parameter

from pyathenajdbc import connect

conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
               region_name='us-west-2')
try:
    with conn.cursor() as cursor:
        cursor.execute("""
        SELECT col_int FROM one_row_complex where col_int = {0}
        """, 2147483647)
        print(cursor.fetchall())

        cursor.execute("""
        SELECT col_string FROM one_row_complex where col_string = {param}
        """, param='a string')
        print(cursor.fetchall())
finally:
    conn.close()

Minimal example for Pandas DataFrame

from pyathenajdbc import connect
import pandas as pd

conn = connect(access_key='YOUR_ACCESS_KEY_ID',
               secret_key='YOUR_SECRET_ACCESS_KEY',
               s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
               region_name='us-west-2',
               jvm_path='/path/to/jvm')  # optional, as used by JPype
df = pd.read_sql("SELECT * FROM many_rows LIMIT 10", conn)

As Pandas DataFrame

import contextlib
from pyathenajdbc import connect
from pyathenajdbc.util import as_pandas

with contextlib.closing(
        connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/'
                region_name='us-west-2'))) as conn:
    with conn.cursor() as cursor:
        cursor.execute("""
        SELECT * FROM many_rows
        """)
        df = as_pandas(cursor)
print(df.describe())

Examples

Redash query runner example

See examples/redash/athena.py

Credential

Support AWS CLI credentials configuration.

Credential Files

~/.aws/credentials

[default]
aws_access_key_id=YOUR_ACCESS_KEY_ID
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY

~/.aws/config

[default]
region=us-west-2
output=json

Environment variables

$ export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
$ export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
$ export AWS_DEFAULT_REGION=us-west-2

Additional environment variable:

$ export AWS_ATHENA_S3_STAGING_DIR=s3://YOUR_S3_BUCKET/path/to/

Testing

Depends on the AWS CLI credentials and the following environment variables:

~/.aws/credentials

[default]
aws_access_key_id=YOUR_ACCESS_KEY_ID
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY

Environment variables

$ export AWS_DEFAULT_REGION=us-west-2
$ export AWS_ATHENA_S3_STAGING_DIR=s3://YOUR_S3_BUCKET/path/to/

Run test

$ pip install pytest awscli
$ scripts/upload_test_data.sh
$ py.test
$ scripts/delete_test_data.sh

Run test multiple Python versions

$ pip install tox awscli
$ scripts/upload_test_data.sh
$ pyenv local 2.6.9 2.7.12 3.4.5 3.5.2
$ tox
$ scripts/delete_test_data.sh

About

PyAthenaJDBC is a Python DB API 2.0 (PEP 249) compliant wrapper for Amazon Athena JDBC driver.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 98.8%
  • Shell 1.2%