
Return bools from archive/delete_batch #95

Merged
ChuckHend merged 18 commits into tembo-io:main on Sep 3, 2023

Conversation

craigpastro
Contributor

Closes #89.

This is an attempt at resolving #89. I also refactored archive in terms of archive_batch. Please let me know if you want me to revert that part.

It has been quite a while since I have written any Rust. I get the feeling I should be doing more map-ing and and_then-ing and using less mut 🙂

Before

pgmq=# select * from pgmq_archive('poop', array[1, 3, 5]);
 pgmq_archive
--------------
 t
(1 row)

pgmq=# select * from pgmq_archive('poop', array[1, 3, 5]);
 pgmq_archive
--------------
 t
(1 row)

After

pgmq=# select * from pgmq_archive('poop', array[1, 3, 5]);
 pgmq_archive 
--------------
 t
 t
 f
(3 rows)

pgmq=# select * from pgmq_archive('poop', array[1, 3, 5]);
 pgmq_archive 
--------------
 f
 f
 f
(3 rows)
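The refactor of archive in terms of archive_batch can be sketched roughly like this. This is a hypothetical standalone model, not the actual pgrx/SPI-based implementation; the in-memory Vec stands in for the queue table:

```rust
// Hypothetical sketch: expressing the single-message archive in terms
// of the batch version so both share one code path. The real pgmq code
// runs SQL through pgrx/SPI; a Vec<i64> stands in for the queue here.
fn archive_batch(queue: &mut Vec<i64>, msg_ids: &[i64]) -> Vec<bool> {
    // Return one bool per requested id: true if the message existed
    // (and was archived), false otherwise.
    msg_ids
        .iter()
        .map(|id| {
            if let Some(pos) = queue.iter().position(|m| m == id) {
                queue.remove(pos);
                true
            } else {
                false
            }
        })
        .collect()
}

fn archive(queue: &mut Vec<i64>, msg_id: i64) -> bool {
    // A single archive is just a batch of size one.
    archive_batch(queue, &[msg_id])[0]
}

fn main() {
    // Mirrors the After example: two of the three ids exist.
    let mut queue = vec![1, 2, 3];
    assert_eq!(archive_batch(&mut queue, &[1, 3, 5]), vec![true, true, false]);
    // A second call with the same ids finds none of them.
    assert_eq!(archive_batch(&mut queue, &[1, 3, 5]), vec![false, false, false]);
    assert!(archive(&mut queue, 2));
    println!("ok");
}
```

This mirrors the After output above: one row per requested id instead of a single aggregate bool.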

@craigpastro craigpastro marked this pull request as draft August 30, 2023 18:26
@craigpastro craigpastro marked this pull request as ready for review August 30, 2023 18:47
@ChuckHend
Member

ChuckHend commented Aug 30, 2023

First, thank you for contributing!

This is an attempt at resolving #89. I also refactored archive in terms of archive_batch. Please let me know if you want me to revert that part.

I think it's ok to overload the functions so that batch and single operations use the same API. Underneath though, do you know how much perf difference there will be from treating the single operations the same as a batch? e.g. a batch of size 1 vs a single? I have not benchmarked it yet.

@craigpastro
Contributor Author

First, thank you for contributing!

This is an attempt at resolving #89. I also refactored archive in terms of archive_batch. Please let me know if you want me to revert that part.

I think it's ok to overload the functions so that batch and single operations use the same API. Underneath though, do you know how much perf difference there will be from treating the single operations the same as a batch? e.g. a batch of size 1 vs a single? I have not benchmarked it yet.

Excellent question. Let me see if I can run one with pgbench and I'll check back in.

Another thing to try I suppose is to add a quick benchmark test (based on https://doc.rust-lang.org/unstable-book/library-features/test.html) in another PR, and then we can compare them that way too.

@craigpastro
Contributor Author

So I wrote a small script that looks like

select pgmq_create('q');
select * from pgmq_send('q', '{"foo":"bar"}');
select * from pgmq_archive('q', 1);
... (a bunch more)
select pgmq_drop_queue('q');

and ran pgbench with that. The results are below. There doesn't seem to be much of a difference, but I am not an expert on pgbench or Postgres benchmarks. Any suggestions? Thanks!

Current

$ pgbench -h localhost -p 28815 -U craig --client=1 --jobs=1 -t 500 -n --file=pgmq_bench.sql pgmq
pgbench (15.3 (Homebrew))
transaction type: pgmq_bench.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 500
number of transactions actually processed: 500/500
number of failed transactions: 0 (0.000%)
latency average = 8.198 ms
initial connection time = 5.505 ms
tps = 121.983560 (without initial connection time)

This PR

$ pgbench -h localhost -p 28815 -U craig --client=1 --jobs=1 -t 500 -n --file=pgmq_bench.sql pgmq
pgbench (15.3 (Homebrew))
transaction type: pgmq_bench.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 500
number of transactions actually processed: 500/500
number of failed transactions: 0 (0.000%)
latency average = 8.246 ms
initial connection time = 4.762 ms
tps = 121.263713 (without initial connection time)

@craigpastro
Contributor Author

craigpastro commented Aug 31, 2023

There is no #[pg_bench] attribute in pgrx so I don't think something like https://doc.rust-lang.org/unstable-book/library-features/test.html would work. Ah, but maybe an integration test could work. Let me check.

Update: I don't see how it can work with #[tokio::test].

@ChuckHend
Member

The IN is about 0.01 ms slower than the = according to this bench, just comparing the deletes. I'm curious if there are situations that could make that better/worse.

pgbench postgres://user:abc@localhost:28815/pgmq -P 1 -r -T 60 -f in.sql -f eq.sql
transaction type: multiple scripts
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 102134
number of failed transactions: 0 (0.000%)
latency average = 0.587 ms
latency stddev = 0.120 ms
initial connection time = 3.283 ms
tps = 1702.194694 (without initial connection time)
SQL script 1: in.sql
 - weight: 1 (targets 50.0% of total)
 - 51290 transactions (50.2% of total, tps = 854.813929)
 - number of failed transactions: 0 (0.000%)
 - latency average = 0.593 ms
 - latency stddev = 0.146 ms
 - statement latencies in milliseconds and failures:
         0.593           0  delete from bencher where record_id in (select max(record_id) from bencher)
SQL script 2: eq.sql
 - weight: 1 (targets 50.0% of total)
 - 50844 transactions (49.8% of total, tps = 847.380764)
 - number of failed transactions: 0 (0.000%)
 - latency average = 0.582 ms
 - latency stddev = 0.086 ms
 - statement latencies in milliseconds and failures:
         0.582           0  delete from bencher where record_id = (select max(record_id) from bencher)

@craigpastro
Contributor Author

🤔 It's unlikely, but archive_batch also has to go through some string construction and manipulation, which could add a slight latency. Let me try replacing IN with = ANY to avoid the string manipulation (it is also nicer, I think). Could you then run the benchmark again?
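The difference between the two query-building styles can be sketched like this. Neither function is the real pgmq code; the table and column names are just illustrative, and the point is that = ANY($1) keeps the SQL text constant while IN (...) needs per-id string formatting:

```rust
// Hypothetical sketch of the two styles discussed above.
fn in_list_query(ids: &[i64]) -> String {
    // IN (...) requires formatting every id into the SQL text itself.
    let list: Vec<String> = ids.iter().map(|i| i.to_string()).collect();
    format!("DELETE FROM pgmq_q WHERE msg_id IN ({})", list.join(", "))
}

fn any_query() -> &'static str {
    // = ANY($1) takes the whole array as one bound parameter, so the
    // SQL text is constant regardless of batch size.
    "DELETE FROM pgmq_q WHERE msg_id = ANY($1)"
}

fn main() {
    assert_eq!(
        in_list_query(&[1, 3, 5]),
        "DELETE FROM pgmq_q WHERE msg_id IN (1, 3, 5)"
    );
    assert_eq!(any_query(), "DELETE FROM pgmq_q WHERE msg_id = ANY($1)");
    println!("ok");
}
```

Besides skipping the string manipulation, the parameterized form also avoids interpolating values into SQL text at all.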

@craigpastro
Contributor Author

I tried EXPLAIN ANALYZE just to see what happens with = 1 versus = ANY(array[1]). The plan is the same except for the index condition, but I can't see that having much of an impact. It just so happens here that the former was a bit slower, but those times fluctuate quite a bit.

pgmq=# EXPLAIN ANALYZE SELECT * FROM pgmq_q WHERE msg_id = 1;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Index Scan using pgmq_q_pkey on pgmq_q  (cost=0.15..8.17 rows=1 width=60) (actual time=0.019..0.020 rows=0 loops=1)
   Index Cond: (msg_id = 1)
 Planning Time: 0.236 ms
 Execution Time: 0.130 ms
(4 rows)

pgmq=# EXPLAIN ANALYZE SELECT * FROM pgmq_q WHERE msg_id = ANY(array[1]);
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Index Scan using pgmq_q_pkey on pgmq_q  (cost=0.15..8.17 rows=1 width=60) (actual time=0.023..0.024 rows=0 loops=1)
   Index Cond: (msg_id = ANY ('{1}'::integer[]))
 Planning Time: 0.224 ms
 Execution Time: 0.058 ms
(4 rows)

What was kind of interesting is that in (...) gets transformed into = ANY(...), or at least that is how I interpret the plan below:

pgmq=# EXPLAIN ANALYZE SELECT * FROM pgmq_q WHERE msg_id in (1,2,3);
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on pgmq_q  (cost=4.47..11.58 rows=3 width=60) (actual time=0.009..0.010 rows=0 loops=1)
   Recheck Cond: (msg_id = ANY ('{1,2,3}'::bigint[]))
   ->  Bitmap Index Scan on pgmq_q_pkey  (cost=0.00..4.47 rows=3 width=0) (actual time=0.005..0.006 rows=0 loops=1)
         Index Cond: (msg_id = ANY ('{1,2,3}'::bigint[]))
 Planning Time: 0.459 ms
 Execution Time: 0.112 ms
(6 rows)

@ChuckHend
Member

Will need to bump core/Cargo.toml and ./Cargo.toml versions.

Also add an empty migration script for this version.

@craigpastro
Contributor Author

👋 @ChuckHend. Bumped minor versions and added empty migration scripts. I added 0.21 to 0.22 too since we are now at 0.22. Is that correct? Thanks!

@ChuckHend
Member

👋 @ChuckHend. Bumped minor versions and added empty migration scripts. I added 0.21 to 0.22 too since we are now at 0.22. Is that correct? Thanks!

Yes! Good call. It's easy to forget those. Wish we had CI checks for them.

@ChuckHend
Member

ChuckHend commented Sep 2, 2023

I think the migration script will need some handling in it.

Upgrading an existing install of 0.21.0 to 0.22.0 fails, I think, because we need to drop the existing functions first, since we've changed the return signature.
https://www.postgresql.org/docs/current/sql-createfunction.html
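A migration along those lines would presumably have to drop the old functions before the new definitions are created. This is only a rough sketch; the exact 0.21.0 signatures are assumptions and would need to be checked:

```sql
-- pgmq--0.21.0--0.22.0.sql (sketch only; the function signatures
-- below are assumptions, not verified against 0.21.0)
DROP FUNCTION IF EXISTS pgmq_archive(text, bigint[]);
DROP FUNCTION IF EXISTS pgmq_delete(text, bigint[]);
-- the pgrx-generated CREATE FUNCTION statements for the new
-- set-returning versions would then follow
```

Postgres requires DROP FUNCTION here because CREATE OR REPLACE FUNCTION cannot change a function's return type.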

Install 0.21.0

git checkout main
cargo pgrx run

pgmq=# drop extension pgmq; create extension pgmq; select pgmq_create('test_queue');

Then checkout this PR and update the extension.

gh pr checkout 95
cargo pgrx run

pgmq=# alter extension pgmq update to '0.22.0';
ALTER EXTENSION
pgmq=# select * from  pgmq_delete('test_queue', ARRAY[1::bigint, 2::bigint]);
ERROR:  table-function protocol for value-per-call mode was not followed

That fails, because I think the extension has to be completely dropped when the signature changes. Maybe pgrx can be configured to do this?

However, if we completely drop the extension (which drops all the objects), it works as I'd expect it to.

pgmq=# drop extension pgmq; create extension pgmq; select pgmq_create('test_queue');
DROP EXTENSION
CREATE EXTENSION
 pgmq_create 
-------------
 
(1 row)
pgmq=# select * from  pgmq_delete('test_queue', ARRAY[1::bigint, 2::bigint]);
 pgmq_delete 
-------------
 f
 f
(2 rows)

So I don't think there's anything wrong with the code, just the migration.

@craigpastro
Contributor Author

🤔 I spent a bunch of time trying to get this to work before realizing it doesn't really work like I thought it would. This issue pgcentralfoundation/pgrx#121 seems to indicate that alter extension pgmq update will never work for this.

@@ -157,7 +157,7 @@ async fn test_ext_send_read_delete() {
.expect("failed to delete");
assert!(deleted);

-    // try to delete a message that doesnt exist
+    // try to delete a message that doesn't exist
Member


This is triggering CI to fail on pgmq-rs, but only because it's telling us that the version in pgmq-rs/Cargo.toml already exists on crates.io. We can ignore that CI error.

@ChuckHend ChuckHend merged commit a4bd31c into tembo-io:main Sep 3, 2023
@craigpastro craigpastro deleted the return-bools-from-archive-batch branch September 11, 2023 17:22
Successfully merging this pull request may close these issues.

Return value of the batch values of pgmq_archive and pgmq_delete