Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert_urns_to_lowercase: inconsistency of usage in SQLAlchemySource.get_db & get_identifier methods #9857

Closed
sleeperdeep opened this issue Feb 14, 2024 · 2 comments · Fixed by #10773
Labels
bug Bug report stale

Comments

@sleeperdeep
Copy link
Contributor

Describe the bug
Link to slack thread.

To Reproduce
Using

def get_db_name(self, inspector: Inspector) -> str:
engine = inspector.engine
if engine and hasattr(engine, "url") and hasattr(engine.url, "database"):
if engine.url.database is None:
return ""
return str(engine.url.database).strip('"').lower()
else:
raise Exception("Unable to get database name from Sqlalchemy inspector")

can lead to unexpected behaviour of urn creating. Despite of convert_urns_to_lowercase = True or False - database name will be in lowercase anyway.

Additional context
Investigating of relations between configs & modules leaded to next additional questions:
1.

class SQLServerConfig(BasicSQLAlchemyConfig):

has attribute

convert_urns_to_lowercase: bool = Field(
default=False,
description="Enable to convert the SQL Server assets urns to lowercase",
)

but at the same moment LowerCaseDatasetUrnConfigMixin also have this attribute

class LowerCaseDatasetUrnConfigMixin(ConfigModel):
convert_urns_to_lowercase: bool = Field(
default=False,
description="Whether to convert dataset urns to lowercase.",
)

So, is this mistake or those attributes should perform different actions? If yes, why they have same name? If no, can we get rid off it from child class?

When I`m trying to find explicit using of this attributes in code, I see only next things:

a)

if self.config.convert_urns_to_lowercase

b)

c)

auto_lowercase_dataset_urns: Optional[MetadataWorkUnitProcessor] = None
if (
self.ctx.pipeline_config
and self.ctx.pipeline_config.source
and self.ctx.pipeline_config.source.config
and (
(
hasattr(
self.ctx.pipeline_config.source.config,
"convert_urns_to_lowercase",
)
and self.ctx.pipeline_config.source.config.convert_urns_to_lowercase
)
or (
hasattr(self.ctx.pipeline_config.source.config, "get")
and self.ctx.pipeline_config.source.config.get(
"convert_urns_to_lowercase"
)
)
)
):
auto_lowercase_dataset_urns = auto_lowercase_urns

I did not check it yet, but does it mean, that we can rid off all explicit using of convert_urns_to_lowercase attribute, because this parent class have method:

def auto_lowercase_urns(
stream: Iterable[MetadataWorkUnit],
) -> Iterable[MetadataWorkUnit]:
"""Lowercase all dataset urns"""
for wu in stream:
try:
old_urn = wu.get_urn()
lowercase_dataset_urns(wu.metadata)
wu.id = wu.id.replace(old_urn, wu.get_urn())
yield wu
except Exception as e:
logger.warning(f"Failed to lowercase urns for {wu}: {e}", exc_info=True)
yield wu

Or, if take a look at this from opposite point of view, I see only two explicit usage of convert_urns_to_lowercase attribute: in SQLServerSource and SnowflakeCommonMixin. Does it mean, that in all others sources this mechanizm (converting urns to lowercase) does not work?

@hsheth2 Can you, please, look at it one more time and after it I will create PR according to Slack thread. Thanks.

@sleeperdeep sleeperdeep added the bug Bug report label Feb 14, 2024
@sleeperdeep sleeperdeep changed the title convert_urns_to_lowercase: inconsistency of usage in SQLAlchemySource.get_db & cources get_identifier methods convert_urns_to_lowercase: inconsistency of usage in SQLAlchemySource.get_db & get_identifier methods Feb 14, 2024
Copy link

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

@github-actions github-actions bot added the stale label Mar 16, 2024
Copy link

This issue was closed because it has been inactive for 30 days since being marked as stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug report stale
Projects
None yet
1 participant