Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allows the usage of Hive Metastore #1533

Merged
merged 6 commits into from
Feb 23, 2016
Merged

Conversation

rizzatti
Copy link
Contributor

@rizzatti rizzatti commented Feb 2, 2016

The usage of Hive Metastore greatly speeds up table and partition lookup in luigi.
Although the MetastoreClient was already implemented, the code always looked for one of two alternatives of the hive command, which is slow.

This PR's settings are only active if

[hive]
release: metastore

is set

import hive_metastore.ttypes
partition_str = self.partition_spec(partition)
thrift_table = client.get_partition_by_name(database, table, partition_str)
except hive_metastore.ttypes.NoSuchObjectException as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you fix this flake8 issue?
./luigi/contrib/hive.py:187:71: F841 local variable 'e' is assigned to but never used

(btw you can run tox -e flake8 locally to test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can fix that.
@erikbern This PR requires either the Hive python libs in the pythonpath or hive-thrift-py package installed. Do you have any idea on how to handle this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just leave it as it is. Any dependency in luigi.contrib shouldn't be in requirements.txt. Optionally throw a warning if the right package isn't installed (and you should probably add a note that installing hive-thrift-py is highly recommended)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can do that.
But that's the reason Travis tests are failing as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you see that? When I look at the Travis tests, all I see is the error above
./luigi/contrib/hive.py:187:71: F841 local variable 'e' is assigned to but never used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also add test time dependencies in tox.ini:

https://github.com/spotify/luigi/blob/master/tox.ini

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake. You're right. Previously there was an error, but my last commit had fixed it. I thought it still had something to do with the libraries, but if I had checked travis sooner, and seen it was this simple, I would have fixed it earlier.

erikbern added a commit that referenced this pull request Feb 23, 2016
Allows the usage of Hive Metastore
@erikbern erikbern merged commit 0f03cba into spotify:master Feb 23, 2016
@rizzatti
Copy link
Contributor Author

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants