-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch Dataset Versions #272
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will pyarrow.dataset.dataset be able to read this at all? or we need lance.dataset for this to work with versioning?
cpp/src/lance/format/manifest.cc
Outdated
@@ -85,4 +85,6 @@ void Manifest::AppendFragments(const std::vector<std::shared_ptr<DataFragment>>& | |||
fragments_.insert(std::end(fragments_), std::begin(fragments), std::end(fragments)); | |||
} | |||
|
|||
arrow::DatasetVersion Manifest::ToDatasetVersion() const { return arrow::DatasetVersion{version_}; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not sure what's the right C++ convention but here's we're not turning a Manifest into a DatastVersion, instead we're getting it's version_ as a Datasetversion right? Isn't GetDatasetVersion more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm. will change.
|
||
ARROW_ASSIGN_OR_RAISE(auto file_infos, impl_->fs->GetFileInfo(selector)); | ||
for (const auto& finfo : file_infos) { | ||
ARROW_ASSIGN_OR_RAISE(auto manifest, OpenManifest(impl_->fs, finfo.path())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need an explicit close or are these automatically cleaned up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this opens as arrow::RandomInputFile
which will be closed once it is out of scope.
std::make_shared<lance::format::DataFragment>(lance::format::DataFile(path, field_ids))); | ||
} | ||
return fragments; | ||
return paths | views::transform([&field_ids](auto& path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just confirming: these don't need any error handling because they're not actually opening the resources right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, there is no system call in this transform.
We need |
Provide
LanceDataset::versions()
andLanceDataset::latest_version()
methods to fetch available versions.