Only load manifest once within the dataset and share Manifest among the readers #155
@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.

#include "lance/arrow/reader.h"
#include "lance/io/reader.h"
why move this?
I removed the public FileReader interface. We did not use it publicly before, and it only acts as a proxy to lance::io::FileReader.
This PR needs to pass Schema / Manifest to FileReader. If we kept using lance::arrow::FileReader, we would need to make Schema / Manifest public as well. I feel that is not necessary at the moment.
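For readers of this thread, here is a minimal sketch of the idea in the PR title: the dataset loads the manifest once and hands the same instance to every reader. All types below (`Manifest`, `FileReader`, `Dataset`, `OpenReaders`, `manifest_loads`) are hypothetical stand-ins written for illustration, not the actual lance classes or signatures.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the manifest; the real class holds more state.
struct Manifest {
  std::vector<std::string> data_files;
};

// Hypothetical reader that receives an already-loaded manifest
// instead of parsing it again from the file.
class FileReader {
 public:
  FileReader(std::string path, std::shared_ptr<const Manifest> manifest)
      : path_(std::move(path)), manifest_(std::move(manifest)) {}

  const std::string& path() const { return path_; }
  const std::shared_ptr<const Manifest>& manifest() const { return manifest_; }

 private:
  std::string path_;
  std::shared_ptr<const Manifest> manifest_;
};

// Hypothetical dataset: loads the manifest once at construction and
// shares it with every reader it opens.
class Dataset {
 public:
  explicit Dataset(std::vector<std::string> data_files)
      : manifest_(LoadManifest(std::move(data_files))) {}

  std::vector<FileReader> OpenReaders() const {
    std::vector<FileReader> readers;
    for (const auto& path : manifest_->data_files) {
      // Only the shared_ptr is copied; the manifest itself is not re-read.
      readers.emplace_back(path, manifest_);
    }
    return readers;
  }

  // Counts how many times the (simulated) manifest load ran.
  int manifest_loads() const { return manifest_loads_; }

 private:
  std::shared_ptr<const Manifest> LoadManifest(std::vector<std::string> files) {
    ++manifest_loads_;
    return std::make_shared<Manifest>(Manifest{std::move(files)});
  }

  int manifest_loads_ = 0;  // Declared first so it is initialized before manifest_.
  std::shared_ptr<const Manifest> manifest_;
};
```

Under these assumptions, the reader constructor takes the manifest as a parameter, which is why Schema / Manifest would have to become public if a public reader wrapper stayed in place.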
cpp/src/lance/format/schema.h
Outdated
@@ -232,6 +232,7 @@ class Field final {
friend class FieldVisitor;
friend class ToArrowVisitor;
friend class WriteDictionaryVisitor;
friend class LoadDictionaryVisitor;
What's the difference between Read and Load?
Oh, this is loading the string values of the dictionary. I am fine with renaming it. WDYT?
Let's just be consistent if it's doing the same thing
fixed.
@@ -345,6 +346,7 @@ std::shared_ptr<Field> Field::Copy(bool include_children) const {
new_field->logical_type_ = logical_type_;
new_field->extension_name_ = extension_name_;
new_field->encoding_ = encoding_;
new_field->dictionary_ = dictionary_;
this is for dictionary arrays? or is this some other dictionary?
Yes, this is the dictionary value array (i.e., the strings); it does not contain indices.
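To illustrate why copying `dictionary_` in `Field::Copy` matters, here is a rough sketch. It assumes the dictionary values are held through a shared pointer, so a copied field reuses the values that were already loaded rather than reading them again. The `Field` class, `DictionaryValues` alias, and `set_dictionary` below are simplified, hypothetical stand-ins, not the real lance::format::Field.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: the dictionary value array (strings only, no indices).
using DictionaryValues = std::vector<std::string>;

// Hypothetical, simplified field holding an optional shared dictionary.
class Field {
 public:
  explicit Field(std::string name) : name_(std::move(name)) {}

  void set_dictionary(std::shared_ptr<const DictionaryValues> dictionary) {
    dictionary_ = std::move(dictionary);
  }

  const std::shared_ptr<const DictionaryValues>& dictionary() const {
    return dictionary_;
  }

  // Copies the field. The dictionary pointer is copied, so both fields
  // refer to the same already-loaded values.
  std::shared_ptr<Field> Copy() const {
    auto new_field = std::make_shared<Field>(name_);
    new_field->dictionary_ = dictionary_;
    return new_field;
  }

 private:
  std::string name_;
  std::shared_ptr<const DictionaryValues> dictionary_;
};
```

Without the `dictionary_` assignment in `Copy`, the copied field in this sketch would come back with a null dictionary and the values would have to be loaded a second time.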
one small nit in the comments
Additionally, it only reads the dictionary columns once, to avoid re-reading the dictionary each time a file is opened.