
Presto template processor functions are not available in Trino #19400

Open
3 tasks done
aaronfeng opened this issue Mar 28, 2022 · 5 comments
Labels
#bug Bug report

Comments

@aaronfeng

aaronfeng commented Mar 28, 2022

Hi,

We are in the process of migrating from Presto to Trino and noticed that the template processor functions don't work with Trino. We have a lot of datasets that use the first_latest_partition function.

How to reproduce the bug

The query below works with the Presto driver, but it doesn't work with Trino.

SELECT *
FROM foo 
WHERE ds='{{ presto.first_latest_partition('foo') }}'

Expected results

Template processor functions should also work with Trino.

Actual results

When using the Trino driver, an error is thrown because there are no template processor functions for Trino.

https://github.com/apache/superset/blob/master/superset/jinja_context.py#L553
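
The failure mode can be illustrated outside Superset with a minimal sketch of engine-keyed processor lookup. It assumes that processors are chosen by database backend name and that a backend with no registered processor falls back to a base processor whose context has no presto namespace. Every class and function below is a hypothetical stand-in, not Superset's actual API.

```python
from typing import Any


class UndefinedNamespaceError(Exception):
    """Raised when a template refers to a macro namespace missing from the context."""


class BaseProcessor:
    """Hypothetical base processor: no engine-specific macros in its context."""

    def context(self) -> dict[str, Any]:
        return {}


class PrestoProcessor(BaseProcessor):
    """Hypothetical processor exposing its macros under the 'presto' name."""

    def first_latest_partition(self, table: str) -> str:
        # A real processor would look up partition metadata; this is a fixed placeholder.
        return f"latest_of_{table}"

    def context(self) -> dict[str, Any]:
        return {"presto": self}


# Hypothetical registry keyed by backend name; "trino" is deliberately missing.
PROCESSORS: dict[str, type[BaseProcessor]] = {"presto": PrestoProcessor}


def get_processor(backend: str) -> BaseProcessor:
    # Unregistered backends fall back to the base processor.
    return PROCESSORS.get(backend, BaseProcessor)()


def render_first_latest_partition(backend: str, table: str) -> str:
    """Resolve presto.first_latest_partition(table) against the backend's context."""
    context = get_processor(backend).context()
    if "presto" not in context:
        raise UndefinedNamespaceError(f"'presto' is undefined for backend {backend!r}")
    return context["presto"].first_latest_partition(table)


print(render_first_latest_partition("presto", "foo"))  # latest_of_foo
try:
    render_first_latest_partition("trino", "foo")
except UndefinedNamespaceError as exc:
    print(exc)  # 'presto' is undefined for backend 'trino'
```

Registering a processor for "trino" that also exposes the presto name is what makes the same template resolve on both backends.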

Screenshots

Environment

  • superset version: 1.3.0

Looking at the code on master, this appears to affect 1.4.0 as well.

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.
@aaronfeng aaronfeng added the #bug Bug report label Mar 28, 2022
@pashkash

pashkash commented Sep 7, 2022

@aaronfeng any luck with this?

@aaronfeng
Author

@pashkash No. The workaround is to use

max(ds)

but I believe that will scan all partitions.
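
The cost concern can be illustrated with a toy model. This is an illustration only, not how Trino plans queries: a partition-metadata lookup reads just the list of partition values, whereas a naive max(ds) reads every row of every partition. Both return the same answer; only the amount of data touched differs. Whether a real Trino query can be answered from metadata instead depends on the connector and its settings, so treat the row counts below as the worst case the comment above is worried about.

```python
from typing import Optional

# Toy table: partition value (ds) -> rows stored in that partition.
# ISO dates compare correctly as strings, so max() picks the newest one.
partitions: dict[str, list[int]] = {
    "2022-03-26": [1, 2, 3],
    "2022-03-27": [4, 5],
    "2022-03-28": [6, 7, 8, 9],
}


def latest_from_metadata(parts: dict[str, list[int]]) -> tuple[Optional[str], int]:
    """Pick the latest ds by reading only partition keys; no rows are read."""
    return (max(parts) if parts else None), 0


def latest_from_full_scan(parts: dict[str, list[int]]) -> tuple[Optional[str], int]:
    """Emulate max(ds) evaluated row by row; returns (latest ds, rows read)."""
    latest: Optional[str] = None
    rows_read = 0
    for ds, rows in parts.items():
        for _row in rows:
            rows_read += 1
            if latest is None or ds > latest:
                latest = ds
    return latest, rows_read


print(latest_from_metadata(partitions))   # ('2022-03-28', 0)
print(latest_from_full_scan(partitions))  # ('2022-03-28', 9)
```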

@rusackas
Member

Maybe @bkyryliuk, @nytai, or @villebro can confirm whether this is an issue we ought to keep open, or close as inactive/deprioritized?

@rusackas
Member

This is at risk of being closed as stale, but meanwhile, hopefully @dosu-bot can give us a little added insight.


dosubot bot commented Jun 10, 2024

Hey @rusackas! I'm Dosu, a bot here to assist you with bugs, questions, and contributions while you're waiting for a human maintainer. Let's squash those bugs together! 🤖

To make the first_latest_partition template processor function work with Trino in Superset, you can use the TrinoTemplateProcessor class. It inherits from PrestoTemplateProcessor, so it provides the same macros, and for backwards compatibility it exposes them under both the trino and presto names in the template context.
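
The backwards-compatibility part is small: the Trino context holds one macro object and registers it under both the trino and presto names, so existing {{ presto.first_latest_partition(...) }} templates keep rendering. The stdlib-only sketch below illustrates just that idea. PartitionMacros, build_trino_context and the regex-based render function are hypothetical stand-ins for Superset's processor and for Jinja, and they only handle the single-argument macro calls used in this thread.

```python
import re
from typing import Any

# Toy stand-in for Jinja: matches single-argument macro calls such as
# {{ presto.first_latest_partition('foo') }}.
MACRO_CALL = re.compile(r"\{\{\s*(\w+)\.(\w+)\('([^']*)'\)\s*\}\}")


class PartitionMacros:
    """Hypothetical macro namespace; a real one would query partition metadata."""

    def first_latest_partition(self, table: str) -> str:
        return f"latest_of_{table}"


def build_trino_context() -> dict[str, Any]:
    context: dict[str, Any] = {"trino": PartitionMacros()}
    # Backwards compatibility: expose the same object under the old Presto name.
    context["presto"] = context["trino"]
    return context


def render(sql: str, context: dict[str, Any]) -> str:
    def call_macro(match: re.Match) -> str:
        namespace, method, arg = match.groups()
        return getattr(context[namespace], method)(arg)

    return MACRO_CALL.sub(call_macro, sql)


sql = "SELECT * FROM foo WHERE ds='{{ presto.first_latest_partition('foo') }}'"
print(render(sql, build_trino_context()))
# SELECT * FROM foo WHERE ds='latest_of_foo'
```

Aliasing the existing object, rather than building a second one, keeps both names pointing at the same implementation.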

Here's how you can set it up:

  1. Define the TrinoTemplateProcessor class:

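    # Imports needed when this class is defined outside superset/jinja_context.py
    from typing import Any
    
    from superset.jinja_context import PrestoTemplateProcessor, validate_template_context
    
    
    class TrinoTemplateProcessor(PrestoTemplateProcessor):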
        engine = "trino"
    
        def process_template(self, sql: str, **kwargs: Any) -> str:
            template = self.env.from_string(sql)
            kwargs.update(self._context)
    
            # Backwards compatibility if migrating from Presto.
            context = validate_template_context(self.engine, kwargs)
            context["presto"] = context["trino"]
            return template.render(context)
  2. Register the class in the DEFAULT_PROCESSORS dictionary (defined in superset/jinja_context.py):

    DEFAULT_PROCESSORS = {
        "presto": PrestoTemplateProcessor,
        "hive": HiveTemplateProcessor,
        "spark": SparkTemplateProcessor,
        "trino": TrinoTemplateProcessor,
    }
  3. Make sure TrinoEngineSpec supports latest_partition. This is already the case: its get_extra_table_metadata method calls cls.latest_partition:

    class TrinoEngineSpec(PrestoBaseEngineSpec):
        engine = "trino"
        engine_name = "Trino"
        allows_alias_to_source_column = False
    
        @classmethod
        def get_extra_table_metadata(
            cls,
            database: Database,
            table: Table,
        ) -> dict[str, Any]:
            metadata = {}
    
            if indexes := database.get_indexes(table):
                col_names, latest_parts = cls.latest_partition(
                    database,
                    table,
                    show_first=True,
                    indexes=indexes,
                )
    
                if not latest_parts:
                    latest_parts = tuple([None] * len(col_names))
    
                metadata["partitions"] = {
                    "cols": sorted(
                        list(
                            {
                                column_name
                                for index in indexes
                                if index.get("name") == "partition"
                                for column_name in index.get("column_names", [])
                            }
                        )
                    ),
                    "latest": dict(zip(col_names, latest_parts)),
                    "partitionQuery": cls._partition_query(
                        table=table,
                        indexes=indexes,
                        database=database,
                    ),
                }
    
            if database.has_view(Table(table.table, table.schema)):
                with database.get_inspector(
                    catalog=table.catalog,
                    schema=table.schema,
                ) as inspector:
                    metadata["view"] = inspector.get_view_definition(
                        table.table,
                        table.schema,
                    )
    
            return metadata
  4. Testing: make sure the template processor is tested for Trino. Refer to the existing tests to see how the template processor is exercised for different backends, including Trino:

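    from unittest import mock
    
    from flask.ctx import AppContext
    from pytest_mock import MockerFixture
    
    from superset.jinja_context import get_template_processor
    
    
    def test_template_trino(app_context: AppContext, mocker: MockerFixture) -> None: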
        lp_mock = mocker.patch(
            "superset.jinja_context.TrinoTemplateProcessor.latest_partition"
        )
        lp_mock.return_value = "the_latest"
        database = mock.Mock()
        database.backend = "trino"
        template = "{{ trino.latest_partition('my_table') }}"
        tp = get_template_processor(database=database)
        assert tp.process_template(template) == "the_latest"
    
        # Backwards compatibility if migrating from Presto.
        template = "{{ presto.latest_partition('my_table') }}"
        tp = get_template_processor(database=database)
        assert tp.process_template(template) == "the_latest"
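
The partition-related part of get_extra_table_metadata in step 3 reduces to two pure data transformations: collecting partition column names from the index list, and pairing the latest partition values with column names, padding with None when no partition is found. The standalone sketch below mirrors that logic. The function names are hypothetical, and the index dictionaries are hand-written examples shaped like the ones the snippet reads (name and column_names keys).

```python
from typing import Any, Optional, Sequence


def partition_columns(indexes: list[dict[str, Any]]) -> list[str]:
    # Same comprehension as the snippet: only indexes named "partition" count.
    return sorted(
        {
            column_name
            for index in indexes
            if index.get("name") == "partition"
            for column_name in index.get("column_names", [])
        }
    )


def latest_partition_map(
    col_names: Sequence[str], latest_parts: Optional[Sequence[Optional[str]]]
) -> dict[str, Optional[str]]:
    # When no latest partition is found, report None for every column.
    if not latest_parts:
        latest_parts = tuple([None] * len(col_names))
    return dict(zip(col_names, latest_parts))


indexes = [
    {"name": "partition", "column_names": ["ds", "hour"]},
    {"name": "other", "column_names": ["id"]},
]
print(partition_columns(indexes))  # ['ds', 'hour']
print(latest_partition_map(["ds", "hour"], ("2022-03-28", "23")))
# {'ds': '2022-03-28', 'hour': '23'}
print(latest_partition_map(["ds", "hour"], None))  # {'ds': None, 'hour': None}
```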

By following these steps, the first_latest_partition function should work with Trino in the same way it works with Presto [1][2][3].


Projects
None yet
Development

No branches or pull requests

3 participants