Release 2.25.1.0-b12: [#24648] YSQL: batch INSERT ON CONFLICT RETURNING · yugabyte/yugabyte-db

2.25.1.0-b12
a120c25
Compare

Choose a tag to compare

Loading

View all tags
2.25.1.0-b12: [#24648] YSQL: batch INSERT ON CONFLICT RETURNING
2.25.1.0-b12
a120c25
Compare

Choose a tag to compare

Loading

View all tags
jaki tagged this 13 Dec 01:18
Summary:
Commit 59b3636f505a7dc75ae91ab76348b20a89528e90 introduces read batching
for INSERT ON CONFLICT.  Commit 3f2ac8c27617a8775320f4edb0c829f059bd1eff
refactors the map part of the design.  This commit refactors the
ExecInsert part of the design and introduces RETURNING support.

- Decoupling of YB's read batching from PG's foreign data wrapper
  batching: before this change, the INSERT ON CONFLICT read batching
  flow sticks closely to the INSERT FDW batching.

  - Slots were added to per-relinfo buffers, and flushing happened
    whenever any of these buffers hit the batch size.  This caused some
    inconsistency in the order rows were inserted, and depending on the
    batch size, it could give different results.  Change these buffers
    to be per-plan so that an INSERT that gets routed to multiple
    partitioned tables still maintains insertion order.  One unobvious
    behavior change is that, now, in case of a mixed stream of input
    values, the write RPCs will increase as each batch touches a lot
    more tables whereas previously, each batch touched one table.
    Furthermore, since only one buffer exists per plan now rather than
    one buffer per partitioned table, the memory usage of storing these
    buffers reduces by that much.
  - The flushing also used to happen after we exceeded the batch size,
    not beforehand.  For example, if the batch size is 3, flushing
    happened when trying to insert the fourth value.  This causes some
    headache when trying to support RETURNING as that fourth value is
    already consumed from input while we are trying to flush the
    previous first to third values and trying to return those values.
    Restructure the flow to remove this odd behavior: now, the flush
    would happen after batching the third value, before reading the
    fourth value.  This involves a major code refactor, splitting
    ExecInsert into YbExecInsertPrologue and YbExecInsertAct.  The
    previous code duplicated part of ExecInsert with
    YbFlushSlotsFromBatch.  Now, that is deduplicated by
    YbExecInsertAct, which is called from both ExecInsert and
    YbFlushSlotsFromBatch.  While there is this deduplication, this
    restructuring introduces some duplication elsewhere, such as
    rechecking indexes whether they should be considered for ON CONFLICT
    between YbBatchFetchConflictingRows and ExecCheckIndexConstraints.
    Hopefully, these are minor enough that they can be deduplicated in
    the future, and the bigger goal is to reuse the pieces of this new
    structure for supporting read batching of the MERGE command in the
    future.
  - INSERT FDW batching calls ExecPendingInserts to flush _all_ buffers,
    including those for other tables in other plan nodes (see
    estate->es_insert_pending_result_relations), right before a BEFORE
    INSERT or BEFORE UPDATE trigger or when the current plan node's
    ExecModifyTable is exausted and we are flushing out the remainder.
    This is not necessarily correct if we want to stick close to
    non-batched behavior.  Neither is it always correct to flush only
    the current buffer and ignore other plan nodes' buffers.  This is
    because, in the lifetime of a batching cycle, we could get context
    switched out during the batching phase (which calls for flushing
    other plan nodes' buffers) or during the flushing phase (which calls
    for not flushing other plan nodes' buffers).  Two simple examples of
    these cases is a WITH CTE getting referenced (for the first time) in
    the values being inserted or it getting referenced (for the first
    time) in an ON CONFLICT DO UPDATE expression, respectively.  Since
    it is not simple to always determine which buffers ought to be
    flushed, default to flushing only the current buffer as that is
    simpler to implement (don't have to worry about infinite loops).
  - Stop reusing the fields that INSERT FDW batching uses for keeping
    track of batching, such as ri_NumSlots and ri_Slots, except for
    ri_BatchSize.  Store the rest of the information in a new struct
    YbInsertOnConflictBatchState.

- RETURNING support: besides the restructuring to make INSERT ON
  CONFLICT read batching with RETURNING easier to support, make several
  additional notable changes:

  - To support context switching out after having consumed a value from
    input, store the slot and plan slot in new fields in
    YbInsertOnConflictBatchState called pickup_slot and
    pickup_plan_slot.  Once flushing iscompleted, context is restored
    using these pickup slots.
  - Change the conflict map to be per-plan instead of global.  This is
    necessary in case we get context-switched out during flushing phase
    and interact with the conflict map of a different plan.  Keep the
    intent set global as we want to detect whether the same row is
    inserted twice across plans.  Truthfully, such detection should not
    be fully global as it should not mix into plans created by triggers,
    but that can be loosened in the future.  Along the same lines, there
    are other deficiencies with the map design such as how it is not
    updated when a plan does INSERT, UPDATE, or DELETE without involving
    INSERT ON CONFLICT as the map/set alteration only happens within
    INSERT ON CONFLICT code.  Maybe it is better to go back to design
    closer to before 3f2ac8c27617a8775320f4edb0c829f059bd1eff where the
    map/set updates happen in the index INSERT/DELETE/UPDATE paths to
    capture all the situations.  Push these finer details to be future
    work.

Add a bunch of test coverage for these finer cases involving WITH CTEs
and triggers.  Initial code by Karthik; finalizing design and adding
tests by me.
Jira: DB-13712

Test Plan:
On Almalinux 8:

    #!/usr/bin/env bash
    set -euo pipefail
    ./yb_build.sh fastdebug --gcc11
    find java/yb-pgsql/src/test/java/org/yb/pgsql -name 'TestPgRegressInsertOnConflict*' \
    | grep -oE 'TestPgRegress\w+' \
    | while read -r testname; do
      ./yb_build.sh fastdebug --gcc11 --java-test "$testname" --sj
    done

Co-authored-by: Karthik Ramanathan <[email protected]>

Reviewers: kramanathan, amartsinchyk

Reviewed By: kramanathan

Subscribers: aagrawal, svc_phabricator, mihnea, yql, smishra

Differential Revision: https://phorge.dev.yugabyte.com/D39229
Assets 2
Provide feedback

Saved searches

Use saved searches to filter your results more quickly