Backfill process never completes during periods of high INSERT query rates #583

Open
santhosh-programmer opened this issue Jan 10, 2025 · 1 comment

@santhosh-programmer

Hello team,

For testing, I tried the example 14_add_reviews_table.json to create a reviews table, followed by 17_add_rating_column.json to add a rating column.

I initially inserted 50k dummy rows into this table.

Then, to test the alter-column operation, I started the 18_change_column_type.json migration and ran pgbench in parallel as follows:

pgbench -f insert_script.sql -c 1 -T 2147483647 postgres

insert_script.sql:

INSERT INTO public.reviews (username, product, review, rating)
SELECT
    'user_' || generate_series(1, 10000),
    'Product_' || generate_series(1, 10000),
    'This is a dummy review for Product_' || generate_series(1, 10000),
    (random() * 5)::integer::text;

Because the table receives a high rate of INSERT queries from pgbench, the backfill procedure never finishes. Although the table initially contained 50k rows, the backfill kept processing all the newer rows inserted by pgbench.

Since the trigger is created before the backfill starts, I think it is redundant to backfill the newer rows inserted after trigger creation, as those rows are already handled by the trigger. Below is a console screenshot where the backfill kept going past 6 million+ rows created by pgbench, effectively running forever.

[Screenshot: backfill progress output continuing past 6 million rows]

In a large-scale production environment we can expect a high volume of INSERT queries, because many connections send INSERTs in parallel. In that case, what should be done to avoid the infinite backfill problem? Should we block INSERT queries by manually acquiring a lock until the backfill completes? If so, doesn't that violate the concept of zero downtime?
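For reference, manually blocking INSERTs would look roughly like the sketch below (table name taken from the example above). An EXCLUSIVE table lock still allows plain SELECTs but conflicts with the ROW EXCLUSIVE lock taken by INSERT/UPDATE/DELETE, so every writer would stall for the entire backfill, which seems to be exactly the downtime the tool is meant to avoid:

BEGIN;
-- EXCLUSIVE conflicts with the locks taken by INSERT/UPDATE/DELETE,
-- so writers block here while plain SELECTs still go through.
LOCK TABLE public.reviews IN EXCLUSIVE MODE;

-- ... run the backfill batches here ...

COMMIT;  -- writers resume only after the whole backfill commits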

@andrew-farries
Collaborator

Thanks for opening this @santhosh-programmer 🙏

I've added the issue to the v1 milestone as it's something we need to address.

We need some way to prevent the backfill process from touching new rows added after the backfill starts. A couple of ideas off the top of my head:

  • Use transaction ids to ensure that only rows visible before the backfill started are considered for each batch
  • Have the backfill run entirely inside one transaction whose isolation level excludes rows committed by other transactions after it begins (see the sketch below)
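A minimal sketch of the second idea, not pgroll's actual implementation: it assumes the backfill works by "touching" rows in primary-key batches so the existing sync trigger rewrites them, that the reviews table's serial primary key is called id, and that :last_id is a placeholder for the batching cursor the tool would maintain.

BEGIN ISOLATION LEVEL REPEATABLE READ;
-- The snapshot taken by the first statement fixes the set of visible rows;
-- anything committed later (e.g. the pgbench INSERTs) is invisible to this
-- transaction and is handled by the trigger instead.

-- One batch; the tool repeats this with an advancing :last_id until it
-- returns no rows.
WITH batch AS (
    SELECT id
    FROM public.reviews
    WHERE id > :last_id
    ORDER BY id
    LIMIT 1000
)
UPDATE public.reviews AS r
SET id = r.id          -- no-op write that fires the sync trigger
FROM batch
WHERE r.id = batch.id
RETURNING r.id;

COMMIT;  -- once the loop sees an empty batch

The tradeoff is that a single long-running transaction keeps its snapshot open for the whole backfill and holds back VACUUM, so the transaction-id filtering in the first idea may be a better fit for very large tables.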

andrew-farries added this to the v1 milestone Jan 13, 2025