Raw reflection limits display columns to 100 #61

Closed
darian-heede opened this issue Nov 8, 2022 · 5 comments · Fixed by #90 or #97

darian-heede commented Nov 8, 2022

Description

When creating a raw reflection on a view that has more than 100 columns without specifying the display configuration, the resulting raw reflection only has the first 100 display columns.

The following model is used to create the reflection:

{{
  config(
    materialized = 'reflection',
    schema = 'test',
    reflection_type = 'raw'
  )
}}

-- depends_on: {{ ref('view_with_over_100_columns') }}

The dbt.log shows that only the first 100 columns are used in the display section:

  alter dataset "db"."test"."view_with_over_100_columns"
    create raw reflection "ref_raw_view_with_over_100_columns"
    using
      display ("<only the first 100 columns>")

Interestingly, the macro dremio__get_columns_in_relation that is used to get the columns uses a query that returns all available columns:

select column_name as column_name
    ,lower(data_type) as data_type
    ,character_maximum_length
    ,numeric_precision
    ,numeric_scale
from information_schema.columns
where ilike(table_schema, 'db.test')
and ilike(table_name, 'view_with_over_100_columns')
order by ordinal_position

Expectation

I would expect all available columns to be used as display columns when the display configuration is not specified in the model.

Environment

dbt-dremio version: 1.1.0
dbt-core version: 1.3.0
Python: 3.8.5
OS: Linux 18a2b7f4c743 6.0.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 03 Nov 2022 18:01:58 +0000 x86_64 GNU/Linux

fabrice-etanchaud commented Nov 8, 2022

Hi @darian-heede, in the Dremio jobs panel, could you please double-check that the ALTER DATASET ... CREATE RAW REFLECTION ... statement contains only the first 100 columns? There is no display truncation in the code. Thank you!

darian-heede commented

Hi @fabrice-etanchaud, thanks for your quick reply! I can confirm that the SQL submitted in the Jobs panel contains only the first 100 columns. I also checked the query executed just before it, which fetches the view's columns for the default display. It looks fine and returns all columns:

/* {"app": "dbt", "dbt_version": "1.1.2", "profile_name": "dremio_test", "target_name": "dev", "node_id": "model.dremio_test.ref_raw_test_view"} */

    select column_name as column_name
        ,lower(data_type) as data_type
        ,character_maximum_length
        ,numeric_precision
        ,numeric_scale
    from information_schema.columns
    where ilike(table_schema, 'db.test')
    and ilike(table_name, 'test_view')
    order by ordinal_position

I also skimmed the code and can't find any place where the truncation would happen. Can you reproduce the issue?

darian-heede commented Nov 9, 2022

Hi @fabrice-etanchaud, we found the cause. The limit used when populating the job results is hard-coded to limit=100 here:

def _populate_job_results(self):
    if self._job_results == None:
        self._job_results = job_results(
            self._parameters, self._job_id, offset=0, limit=100, ssl_verify=True
        )

Changing this limit directly changes the number of display columns. A quick fix is to raise the limit far enough to cover our use case (the maximum is 500), but it would probably be better to implement some form of paging for the results.
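To make the link between the limit and the display columns concrete, here is a minimal sketch. It does not call Dremio: `fake_job_results` and `display_clause` are hypothetical stand-ins, and the `rowCount`/`rows` response shape is an assumption about what a job-results call returns.

```python
from typing import Any, Dict, List


def fake_job_results(total_rows: int, offset: int, limit: int) -> Dict[str, Any]:
    """Hypothetical stand-in for a job-results call: returns one window of rows."""
    rows = [{"column_name": f"col_{i}"} for i in range(total_rows)]
    return {"rowCount": total_rows, "rows": rows[offset:offset + limit]}


def display_clause(rows: List[Dict[str, str]]) -> str:
    """Build a display (...) clause from whatever column rows were fetched."""
    names = ", ".join(f'"{row["column_name"]}"' for row in rows)
    return f"display ({names})"


# A view with 150 columns, fetched once with two different limits.
for limit in (100, 500):
    page = fake_job_results(total_rows=150, offset=0, limit=limit)
    clause = display_clause(page["rows"])
    print(limit, len(page["rows"]), clause[:40])
```

With a single request, the number of display columns can never exceed the limit, which matches the observed behavior.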

jlarue26 commented

Thanks @darian-heede for the investigation. Pagination is the way to go.

We will fix this in two phases:

  1. Raise the limit to 500 (band-aid) in the next release, dbt-dremio 1.3.1 (2nd week of Dec)
  2. Implement pagination in dbt-dremio 1.3.2 (TBD)
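As a rough illustration of the pagination phase, the sketch below keeps requesting pages until every row has been collected. It is not the adapter's implementation: `fetch_all_rows`, `make_fake_fetcher`, and the `fetch_page(offset, limit)` callable are hypothetical names, and the `rowCount`/`rows` response shape is an assumption.

```python
from typing import Any, Callable, Dict, List

Page = Dict[str, Any]


def fetch_all_rows(fetch_page: Callable[[int, int], Page], page_size: int = 100) -> List[Any]:
    """Collect every row by calling fetch_page(offset, limit) repeatedly."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    rows: List[Any] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        batch = page.get("rows", [])
        rows.extend(batch)
        offset += len(batch)
        total = page.get("rowCount")
        # Stop on an empty or short page, or once the reported total is reached.
        if not batch or len(batch) < page_size or (total is not None and offset >= total):
            return rows


def make_fake_fetcher(total_rows: int) -> Callable[[int, int], Page]:
    """Hypothetical in-memory replacement for the real job-results request."""
    data = [{"column_name": f"col_{i}"} for i in range(total_rows)]

    def fetch_page(offset: int, limit: int) -> Page:
        return {"rowCount": total_rows, "rows": data[offset:offset + limit]}

    return fetch_page


columns = fetch_all_rows(make_fake_fetcher(250), page_size=100)
print(len(columns))
```

Keeping the page size at or below the 500-row maximum mentioned above means the loop works regardless of how many columns the view has.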

ArgusLi self-assigned this Nov 24, 2022
ArgusLi linked a pull request Nov 24, 2022 that will close this issue
ArgusLi added a commit that referenced this issue Nov 24, 2022
### Summary

Update _populate_job_results pagination limit to be 500

### Description

Change the job_results() call in _populate_job_results() to use a limit
of 500, which is Dremio's maximum. This is part 1 of the fix; part 2 is
implementing pagination.

### Test Results

Ran all of the tests using software. All passed apart from a
known failure in tests/functional/adapter/grants/test_model_grants.py.

### Changelog

-   [x] Added a summary of what this PR accomplishes to CHANGELOG.md

### Related Issue

#61

ArgusLi commented Nov 24, 2022

Pagination has yet to be completed.

ArgusLi reopened this Nov 24, 2022
ArgusLi added a commit that referenced this issue Dec 12, 2022
### Summary

Implement job results pagination

### Description

- Add optional row_limit argument to _populate_job_results().
- Set row_limit default to 100.
- Create test_job_result.py unit test that tests pagination.

### Test Results
#### smoke_test
softwareUP --> PASS
softwarePAT --> PASS

### Changelog

-   [x] Added a summary of what this PR accomplishes to CHANGELOG.md

### Related Issue

Resolves #61