This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

Support limit, offset, top-k serial and parallel versions in codegen #1435

Merged

Merged 17 commits into cmu-db:master from pmenon:limit on Aug 6, 2018

Conversation

pmenon
Member

@pmenon pmenon commented Jun 27, 2018

After #1385 was merged in, I resurrected LIMIT, OFFSET from my old branch, and implemented Top-K optimizations. These both work for serial and parallel execution, and are needed for TPC-C.

Not quite ready for merging in, but folks can take a look at it.

Contributor

@tcm-marcel tcm-marcel left a comment


The code looks good, but I haven't tried it out yet. I do have some questions, though (this one and some in the code):

  • It seems like limit and offset are compile-time constants. That means the compiled plan cannot be reused for other limit/offset values. Is that correct?

@@ -417,7 +417,7 @@ void Aggregation::DoNullCheck(
default: { break; }
}
}
agg_is_null.ElseBlock("Agg.IfAggIsNotNull");
Contributor


Is there a reason to remove the naming? I found all these names very helpful when looking at the IR. I know that all the string handling introduces some overhead; maybe we could drop them in Release mode.

Member Author


Most students will probably use printfs to trace through the code. We still allow naming of the 'if' statements, and the IR tracks predecessors, which conveys the same information.

HANDLE_EXPLICIT_CALL_INST(peloton_sorter_sort,
peloton::codegen::util::Sorter::Sort)
HANDLE_EXPLICIT_CALL_INST(peloton_sorter_sortparallel,
peloton::codegen::util::Sorter::SortParallel)
HANDLE_EXPLICIT_CALL_INST(peloton_sorter_sorttopkparallel,
peloton::codegen::util::Sorter::SortTopKParallel)
Contributor


I see you added more explicit call instructions for the bytecode interpreter. FYI: Once the interpreter is connected, it will print a debug message for every function call where no explicit call instruction exists, so it will be easy to find those.

}

// Generate loading code and save in cache
val = load_func();
Contributor


Where does the caching happen?

Member Author


The loaded values are inserted into the cached_vars_ member variable that is probed earlier.

codegen->CreateStore(next_limit, limit_count_ptr);
} else {
// Parallel mode. Atomically increment the count.
next_limit = codegen->CreateAtomicRMW(
Contributor

@tcm-marcel tcm-marcel Jun 29, 2018


Do you know what this looks like in IR and what LLVM compiles it to? (I haven't looked at it yet; Google was not helpful.) Whatever it is, I have the feeling that the interpreter doesn't support it yet.

Member Author

@pmenon pmenon Jul 2, 2018


We generate LLVM's atomicrmw with sequential consistency. This may lower to different instructions depending on architecture, but on the few Intel Skylake+ machines I tested, it uses lock xadd.

In the interpreter, you can use our peloton::atomic_add(), which boils down to a locked xadd.

child_pipeline_(this, Pipeline::Parallelism::Serial) {
// Aggregations happen serially (for now ...)
pipeline.SetSerial();
child_pipeline_(this, Pipeline::Parallelism::Flexible) {
Contributor


Do I understand correctly: the child pipeline of the order-by (before the hashing) can be parallel, but the order-by itself has to be serial because of the aggregations?

Member Author


This says that the consumer portion of the order-by (i.e., the child pipeline) can be executed in parallel, and that the producer portion (i.e., the scanning of the sorter) will be serial. It carries no information about other operators in the pipeline.

@@ -97,8 +131,11 @@ void Sorter::Sort() {

timer.Stop();

LOG_DEBUG("Sorted %zu tuples in %.2f ms", tuples_.size(),
timer.GetDuration());
#ifndef NDEBUG
Contributor


This should be #ifdef LOG_DEBUG_ENABLED

* Sift down the element at the root of the heap while maintaining the heap
* property.
*/
void HeapSiftDown();
Contributor


What is the difference between this function and the STL's push_heap()?

Member Author


In essence, they are the same. But push_heap alone doesn't give us the functionality we need: we need to remove the smallest (largest) element from the min (max) heap and insert a new element. We could use a pair of push_heap and pop_heap calls, but we've managed to achieve the same functionality with a single O(log n) operation.

@coveralls

coveralls commented Jul 1, 2018

Coverage Status

Coverage increased (+0.05%) to 76.421% when pulling 0e2de1e on pmenon:limit into 2406b76 on cmu-db:master.

@pmenon
Member Author

pmenon commented Jul 22, 2018

@tcm-marcel Want to take a last crack at this? Don't want this to get stale.

Contributor

@tcm-marcel tcm-marcel left a comment


Sorry this took so long. I think it is good to go!

@apavlo
Member

apavlo commented Aug 3, 2018

@tcm-marcel I think this is stuck on the Jenkins mac build?

@tcm-marcel
Contributor

The Jenkins Mac build fails because no space is left on the hard drive:

/tmp/workspace/peloton_PR-1435-XOL5JG3LYATJV2HWKIVAIOIZQP2OCJC5FY5VVGBXQZ6BT33JVMUA@2: No space left on device

@apavlo I am not familiar with the Jenkins Mac machine. But on Travis the Mac build succeeds.

@apavlo
Member

apavlo commented Aug 4, 2018

@crd477 Can you check this?

@crd477

crd477 commented Aug 6, 2018

Fixed with the quick fix of restarting the VM from a clean snapshot. The more complete fix of doing this cleanly and regularly was still on my TODO list before I left for vacation last week, and I'll get back to it today.
I also restarted the Jenkins builds for master and #1435. They should pass soon.

@pmenon pmenon merged commit 1de8979 into cmu-db:master Aug 6, 2018
@crd477

crd477 commented Aug 7, 2018

The macOS Jenkins worker should be stable now. I grew the VM's disk and added a workspace cleanup cronjob as the other Jenkins workers have.

mtunique pushed a commit to mtunique/peloton that referenced this pull request Apr 16, 2019
…mu-db#1435)

* Initial limit commit

* Cleanup OrderByPlan

* Don't use SQL types for the result of the comparison when sorting

* Sorting for Top-K works. Probably slow.

* Optimize pop-then-push into single operation

* Fix lang::If to allow injection of custom then and else blocks

* Added early exit block to consumer context to allow operators to quick exit

* More comments

* Fast-exit limits

* Cleanup comments. Enabled parallel sorting.

* Fixed bytecode compilation. Added explicit call handlers for sorter to bytecode builder. Fixed parallel sorting top-k

* Set limit and offset to 0 when no limit/offset provided

* Fix offset

* Fix sort test in optimizer to have a well-defined order (some sort values were equal, producing non-deterministic output). Fix the order-by translator test to push a limit plan on top; this wasn't there before because we didn't need a limit

* Cleanup

* Fix limit without order-by clause