-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iterator for DAC vectors #1366
Iterator for DAC vectors #1366
Conversation
|
||
/** | ||
* Normally `operator[]` should return a reference or const reference | ||
* to the array element. Given we only support arrays of basic types and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? We don't support vectors of strings? Flatbuffers does...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't support them yet through DDL, but I don't remember exactly. @chuan can confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to support strings. I don't remember any blocking technical reasons not to.
dac_vector_t<T_type>::dac_vector_t(const flatbuffers::Vector<T_type>* vector_ptr) | ||
: m_vector(vector_ptr) | ||
{ | ||
static_assert(std::is_arithmetic<T_type>::value, "dac_vector_t only supports basic types!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "basic type" is sufficiently descriptive. At least I would have assumed that it included strings. If you're going to use std::is_arithmetic
as the type bound then you should make the message directly reflect its semantics: dac_vector_t only supports integer and floating-point types!
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps say "scalar types", like flatbuffers does?
template <typename T_type> | ||
typename dac_vector_const_iterator_t<T_type>::reference dac_vector_const_iterator_t<T_type>::operator*() const | ||
{ | ||
return *(m_iterator_data + m_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No bounds checking for m_index
? Surely that would be trivial to add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we OK with runtime penalty for bounds checking ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all indexed structures should be bounds-checked by default, with a possible opt-out if it can be demonstrated that performance requirements require bounds checking to be omitted. Our system introduces unavoidable overheads (e.g., from transaction logging and copy-on-write updates) that should dwarf any possible overhead from bounds checking. If you can demonstrate a microbenchmark that indicates otherwise, I would certainly reconsider.
template <typename T_type> | ||
typename dac_vector_const_iterator_t<T_type>::pointer dac_vector_const_iterator_t<T_type>::operator->() const | ||
{ | ||
return m_iterator_data + m_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, why no bounds checking?
template <typename T_type> | ||
dac_vector_const_iterator_t<T_type>& dac_vector_const_iterator_t<T_type>::operator++() | ||
{ | ||
++m_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be trivial to add bounds checking on m_index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or does it not really matter until a dereference? In that case we should add the bounds checking there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be able to iterate until we hit the iterator returned by end()
. From what I can tell, this will be an index equal to the size of the array.
typename dac_vector_const_iterator_t<T_type>::difference_type dac_vector_const_iterator_t<T_type>::operator-( | ||
const dac_vector_const_iterator_t<T_type>& rhs) const | ||
{ | ||
return m_index - rhs.m_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overflow checking?
template <typename T_type> | ||
dac_vector_const_iterator_t<T_type>& dac_vector_const_iterator_t<T_type>::operator--() | ||
{ | ||
--m_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No underflow checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above. We need to be able to iterate until we hit the 'rendof the array (reverse iterator end). To that point we also need to add an
rend` method.
dac_vector_const_iterator_t<T_type>& dac_vector_const_iterator_t<T_type>::operator+=( | ||
dac_vector_const_iterator_t<T_type>::difference_type rhs) | ||
{ | ||
m_index += rhs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overflow checking?
dac_vector_const_iterator_t<T_type>& dac_vector_const_iterator_t<T_type>::operator-=( | ||
dac_vector_const_iterator_t<T_type>::difference_type rhs) | ||
{ | ||
m_index -= rhs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Underflow checking?
|
||
// Forward declaring so `dac_vector_const_iterator_t` can use it. | ||
template <typename T_type> | ||
class dac_vector_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we decide if we want to rename the file to match the class name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do this change independently of any other considerations.
const T_type* m_iterator_data; | ||
uint32_t m_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it enough to just use a pointer for the iterator implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, but I also think that it's worth adding boundary checks on the index, so I'd keep the index and add the vector size as well here. Just because the standard iterator interface allows some undefined behavior doesn't mean that we should not provide a safer implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive my ignorance, We use m_index
as an offset to the base pointer. How do we know uint32_t
is always the right type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good starting implementation, but i agree with Tobin's comments: the iterator operations should throw out of bound exceptions for bad index parameters.
bool operator>=(const dac_vector_const_iterator_t<T_type>& rhs) const; | ||
|
||
private: | ||
const T_type* m_iterator_data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is part of an "iterator" class, we could drop the "iterator" part and just call this m_data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, from looking closer at the implementation, something like m_base_data
might be better. I initially thought like @chuan that this references the current element, rather than the base element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving based on discussions we had in today's db syncup. Bounds checking can be added later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice and clean code. None of my comments are blocking.
Please address outstanding comments (eg. boundary check).
{ | ||
protected: | ||
array_field_test() | ||
: db_catalog_test_base_t(std::string("addr_book.ddl")){}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not necessary to make std::string
explicit (I believe...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably, would, just so folks know its from std::
I don't know where someone is using namespace std
in the included headers (hopefully none of ours) but I'm a fan of calling out std::
explicitly in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not talking about std::
(which I like better too) I'm talking about the entire std::string
, we don't do that almost anywhere...
const int32_t q1_sales = 200; | ||
const int32_t q2_sales = 300; | ||
const int32_t q3_sales = 500; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to keep these into separated consts? Seeing them I would expect they being used somewhere (besides filling the array) but they aren't. I would remove and if clang-tidy complains suppress the warning.
gaia_id_t id = customer_t::insert_row(customer_name, sales_by_quarter); | ||
txn.commit(); | ||
|
||
auto c = customer_t::get(id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writing customer won't hurt.
const int32_t q3_sales = 300; | ||
|
||
auto_transaction_t txn; | ||
auto w = customer_writer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also write:
customer_writer w;
So that it's clear you are calling a class constructor.
// Pick up our template implementation. These still | ||
// need to be in the header so that template specializations | ||
// that are declared later will pick up the definitions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to comment this, otherwise we should comment it everywhere.
const T_type* m_iterator_data; | ||
uint32_t m_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive my ignorance, We use m_index
as an offset to the base pointer. How do we know uint32_t
is always the right type?
txn.commit(); | ||
|
||
std::vector<int32_t> sales; | ||
for (auto sale : customer.sales_by_quarter()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also add an auto& sale : customer.sales_by_quarter()
iteration test.
/** | ||
* Constructs an iterator that does not point to any data. | ||
*/ | ||
dac_vector_const_iterator_t(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this just be = default
;
// | ||
|
||
template <typename T_type> | ||
dac_vector_const_iterator_t<T_type>::dac_vector_const_iterator_t() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per my comment in the header, you can remove this implementation and just use the default one provided by C++. I would then default initialize members for m_index
and m_iterator_data
@@ -0,0 +1,206 @@ | |||
///////////////////////////////////////////// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought: we know a-priori every type we could support for this class since we only support primitive types. We can actually have a C++ file here and then instantiate the specializations we support as well as make it easier to change this code later.
const dac_vector_const_iterator_t<T_type>& rhs); | ||
|
||
/** | ||
* The base class of array fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who inherits from dac_vector_t
? Nobody as far I know so this is not a base class.
EXPECT_TRUE(std::equal(sales.begin(), sales.end(), customer.sales_by_quarter().data())); | ||
|
||
sales.clear(); | ||
for (auto sales_iter = customer.sales_by_quarter().begin(); sales_iter != customer.sales_by_quarter().end(); ++sales_iter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a "random access" iterator, we should add a test for iterating in reverse as well. This would involve adding rbegin
and rend
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. I'm requesting the following changes:
- Since this is a random access iterator we should be able to support reverse iteration. To that end, add
rbegin
andrend
methods - [perhaps optional]. If we only support scalar types then we have a finite subset of template instantiations for the array. We could then provide a
.cpp
file instead on an.inc
file with explicit template specializations.
…iteration # Conflicts: # production/inc/gaia/direct_access/dac_vector.hpp
* @tparam T_type The type of `dac_vector_t` elements that the iterator traverses. | ||
*/ | ||
template <typename T_type> | ||
class dac_vector_const_reverse_iterator_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, do you need an entirely new class here? I thought you only had to add rend
and rbegin
methods and then honor the operator-
overloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is kind of modeled after std with forward and backward iterators as separate classes. It is possible to make a single iterator class with a flag to switch the behavior between forward an backward movement but I believe it would just overcomplicate the code. Another option is to create a base class and push common stuff there so these classes will be a little bit smaller
GAIAPLAT-1493
We want to use range-based for-loops on array fields (in procedural code), so I implemented a
const_iterator
for DAC vectors (the DAC API behind "array fields").DAC code before:
and after:
Out of scope
In this PR, I will not:
dac_vector_t
dac_vector_const_iterator_t
If we want those out-of-scope items, put new JIRA tasks on the backlog.