Summary:
Commit 59b3636f505a7dc75ae91ab76348b20a89528e90 introduces read batching
for INSERT ON CONFLICT. Commit 3f2ac8c27617a8775320f4edb0c829f059bd1eff
refactors the map part of the design. This commit refactors the
ExecInsert part of the design and introduces RETURNING support.
- Decoupling of YB's read batching from PG's foreign data wrapper
batching: before this change, the INSERT ON CONFLICT read batching
flow sticks closely to the INSERT FDW batching.
- Slots were added to per-relinfo buffers, and flushing happened
whenever any of these buffers hit the batch size. This caused some
inconsistency in the order rows were inserted, and depending on the
batch size, it could give different results. Change these buffers
to be per-plan so that an INSERT that gets routed to multiple
partitioned tables still maintains insertion order. One unobvious
behavior change is that, now, in case of a mixed stream of input
values, the write RPCs will increase as each batch touches a lot
more tables whereas previously, each batch touched one table.
Furthermore, since only one buffer exists per plan now rather than
one buffer per partitioned table, the memory usage of storing these
buffers reduces by that much.
- The flushing also used to happen after we exceeded the batch size,
not beforehand. For example, if the batch size is 3, flushing
happened when trying to insert the fourth value. This causes some
headache when trying to support RETURNING as that fourth value is
already consumed from input while we are trying to flush the
previous first to third values and trying to return those values.
Restructure the flow to remove this odd behavior: now, the flush
would happen after batching the third value, before reading the
fourth value. This involves a major code refactor, splitting
ExecInsert into YbExecInsertPrologue and YbExecInsertAct. The
previous code duplicated part of ExecInsert with
YbFlushSlotsFromBatch. Now, that is deduplicated by
YbExecInsertAct, which is called from both ExecInsert and
YbFlushSlotsFromBatch. While there is this deduplication, this
restructuring introduces some duplication elsewhere, such as
rechecking indexes whether they should be considered for ON CONFLICT
between YbBatchFetchConflictingRows and ExecCheckIndexConstraints.
Hopefully, these are minor enough that they can be deduplicated in
the future, and the bigger goal is to reuse the pieces of this new
structure for supporting read batching of the MERGE command in the
future.
- INSERT FDW batching calls ExecPendingInserts to flush _all_ buffers,
including those for other tables in other plan nodes (see
estate->es_insert_pending_result_relations), right before a BEFORE
INSERT or BEFORE UPDATE trigger or when the current plan node's
ExecModifyTable is exausted and we are flushing out the remainder.
This is not necessarily correct if we want to stick close to
non-batched behavior. Neither is it always correct to flush only
the current buffer and ignore other plan nodes' buffers. This is
because, in the lifetime of a batching cycle, we could get context
switched out during the batching phase (which calls for flushing
other plan nodes' buffers) or during the flushing phase (which calls
for not flushing other plan nodes' buffers). Two simple examples of
these cases is a WITH CTE getting referenced (for the first time) in
the values being inserted or it getting referenced (for the first
time) in an ON CONFLICT DO UPDATE expression, respectively. Since
it is not simple to always determine which buffers ought to be
flushed, default to flushing only the current buffer as that is
simpler to implement (don't have to worry about infinite loops).
- Stop reusing the fields that INSERT FDW batching uses for keeping
track of batching, such as ri_NumSlots and ri_Slots, except for
ri_BatchSize. Store the rest of the information in a new struct
YbInsertOnConflictBatchState.
- RETURNING support: besides the restructuring to make INSERT ON
CONFLICT read batching with RETURNING easier to support, make several
additional notable changes:
- To support context switching out after having consumed a value from
input, store the slot and plan slot in new fields in
YbInsertOnConflictBatchState called pickup_slot and
pickup_plan_slot. Once flushing iscompleted, context is restored
using these pickup slots.
- Change the conflict map to be per-plan instead of global. This is
necessary in case we get context-switched out during flushing phase
and interact with the conflict map of a different plan. Keep the
intent set global as we want to detect whether the same row is
inserted twice across plans. Truthfully, such detection should not
be fully global as it should not mix into plans created by triggers,
but that can be loosened in the future. Along the same lines, there
are other deficiencies with the map design such as how it is not
updated when a plan does INSERT, UPDATE, or DELETE without involving
INSERT ON CONFLICT as the map/set alteration only happens within
INSERT ON CONFLICT code. Maybe it is better to go back to design
closer to before 3f2ac8c27617a8775320f4edb0c829f059bd1eff where the
map/set updates happen in the index INSERT/DELETE/UPDATE paths to
capture all the situations. Push these finer details to be future
work.
Add a bunch of test coverage for these finer cases involving WITH CTEs
and triggers. Initial code by Karthik; finalizing design and adding
tests by me.
Jira: DB-13712
Test Plan:
On Almalinux 8:
#!/usr/bin/env bash
set -euo pipefail
./yb_build.sh fastdebug --gcc11
find java/yb-pgsql/src/test/java/org/yb/pgsql -name 'TestPgRegressInsertOnConflict*' \
| grep -oE 'TestPgRegress\w+' \
| while read -r testname; do
./yb_build.sh fastdebug --gcc11 --java-test "$testname" --sj
done
Co-authored-by: Karthik Ramanathan <[email protected]>
Reviewers: kramanathan, amartsinchyk
Reviewed By: kramanathan
Subscribers: aagrawal, svc_phabricator, mihnea, yql, smishra
Differential Revision: https://phorge.dev.yugabyte.com/D39229