Increased "search update" reads after #175 #178

Closed · lemon24 opened this issue Jul 18, 2020 · 4 comments

lemon24 (Owner) commented Jul 18, 2020

After resolving #175, I started seeing increased search update reads, leading to the EBS burst balance being exhausted on the 4GB volume where I have reader deployed.

While this is not necessarily a bug, it is a regression (and reader should support my current use case).


I continue to suspect the increased number of DELETE FROM entries_search queries as the root cause of this (each query results in a full entries_search scan); I'll have to confirm this by measuring.

After a bit of thinking, this can probably be fixed by storing the rowids of the entries_search rows corresponding to an entry in entry_search_sync_state; queries by rowid should use an index, unlike those for any of the other columns (virtual tables can't have indexes).
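
For illustration, a sketch of the difference, using the query shapes that show up in the logs below (parameter binding assumed; the real statements have more columns):

```sql
-- Current: FTS5 virtual tables have no indexes on _id/_feed,
-- so this has to scan the whole entries_search table.
DELETE FROM entries_search WHERE (_id, _feed) = (:id, :feed);

-- Proposed: deleting by rowid is a direct lookup, no scan.
DELETE FROM entries_search WHERE rowid = :rowid;
```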

One can get the rowid of the last insert using the last_insert_rowid() function. However, this will require us to bump the minimum SQLite version from 3.15 to 3.18 (last_insert_rowid() only works correctly for FTS5 tables starting with 3.18). It shouldn't be an issue, since an up-to-date Ubuntu 18.04 (my main/oldest deployment target) seems to have at least 3.22. We should probably still allow using 3.15 if search is not enabled.
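
A sketch of how capturing the rowid could look; the entries_search column names and the es_rowid column are placeholders, not the actual reader schema:

```sql
-- Insert the search row, then remember its rowid in the sync state table.
-- (Placeholder columns; the real schema differs, and an entry can end up
-- with more than one entries_search row.)
INSERT INTO entries_search (title, content, _id, _feed)
VALUES (:title, :content, :id, :feed);

-- last_insert_rowid() is only reliable for FTS5 tables starting with SQLite 3.18.
UPDATE entry_search_sync_state
SET es_rowid = last_insert_rowid()
WHERE id = :id AND feed = :feed;
```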


Another thing we could explore is to increase the page_size (the SQLite defaults may not be optimized for SSDs/EBS), but we should probably fix the code first.
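
For reference, a sketch of what bumping the page size would involve (speculative; 16384 is just an example value, not a recommendation). The page size of an existing database can only be changed with a VACUUM, and not while it is in WAL mode:

```sql
PRAGMA journal_mode = DELETE;  -- leave WAL first, if it's enabled
PRAGMA page_size = 16384;      -- the default is 4096 since SQLite 3.12
VACUUM;                        -- rewrites the file with the new page size
PRAGMA journal_mode = WAL;
```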

lemon24 added 4 commits that referenced this issue Jul 20, 2020

lemon24 (Owner, Author) commented Jul 20, 2020

So, in about 5 hours of minutely update search (graph below), it managed to read 21 GiB, most of it (all but <0.1%) by the two queries below; based on the VolumeReadBytes metric for that volume, that's about 97.5% of all the reads.

It's interesting that although none of the _delete_from_search queries deleted anything, and all of them had 10356 reads, 35 out of 304 are reported as having read 0 bytes. Maybe related to this, on a VirtualBox VM with roughly the same OS (Ubuntu 18.04), all the queries reported reading 0 bytes; per this answer, it may have to do with the underlying storage.

$ python scripts/debug_storage_stats.py < update.log.2020-07-18T19:57 -s read_mib | head -n28
_search.py _delete_from_search
DELETE FROM entries_search WHERE (_id, ...
<no exc>
       duration    read_count    write_count    read_mib    write_mib
n      304                304            304     304              304
sum    787.3579       3148224              0   13341.867            0
avg      2.59           10356              0      43.888            0
min      0.03           10356              0       0                0
p50      2.983          10356              0      50.834            0
p90      3.1553         10356              0      52.849            0
p95      3.2638         10356              0      53.627            0
p99      3.501          10356              0      55.514            0
max      3.6226         10356              0      56.918            0

_search.py _insert_into_search_one_chunk
SELECT entries.id, entries.feed, entries.last_updated, coalesce(feeds.user_title, ...
<no exc>
       duration    read_count    write_count    read_mib    write_mib
n      304                304            304     304              304
sum    190.7367       1716080              0    8085.676            0
avg      0.6274          5645              0      26.598            0
min      0.0144          5645              0       0                0
p50      0.7163          5645              0      31.133            0
p90      0.7646          5645              0      31.375            0
p95      0.7962          5645              0      31.41             0
p99      0.8324          5645              0      31.566            0
max      0.9313          5645              0      31.699            0

[graph: VolumeReadBytes for the volume]

lemon24 (Owner, Author) commented Jul 21, 2020

This gist has output from explain query plan and .stats for the 2 queries mentioned above, run on the same copy of my database from 2020-06-23, with SQLite 3.18.
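
For reference, output like that can be collected in the sqlite3 shell roughly like this (a sketch):

```
.eqp on     -- print EXPLAIN QUERY PLAN output before each statement
.stats on   -- print statistics after each statement
-- ...then run the two statements against a copy of the database.
```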

The _delete_from_search DELETE FROM entries_search does a full scan when deleting by (_id, _feed), but not when deleting by rowid, resulting in a 1000x decrease in "bytes read from storage" for 4 entries. This confirms the approach suggested in the first comment is correct.

The _insert_into_search_one_chunk SELECT FROM entries_search_sync_state JOIN entries JOIN feeds reads a lot of stuff because it scans entries before going to entries_search_sync_state (which appears in a WHERE clause). Running analyze seems to fix this; again, the decrease in bytes read is >1000x.

#143 would result in analyze being run regularly; alternatively, we could force a particular join order using CROSS JOIN.
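
A sketch of the CROSS JOIN variant; column names other than entries.id/entries.feed/entries.last_updated and the esss alias are guesses, and the to_update filter is hypothetical. In SQLite, CROSS JOIN keeps the written table order, so entries_search_sync_state stays the outer loop regardless of whether ANALYZE statistics exist:

```sql
SELECT entries.id, entries.feed, entries.last_updated
FROM entries_search_sync_state AS esss
CROSS JOIN entries
    ON entries.id = esss.id AND entries.feed = esss.feed
JOIN feeds
    ON feeds.url = entries.feed
WHERE esss.to_update;  -- hypothetical flag marking entries that need (re)indexing
```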

lemon24 added 2 commits that referenced this issue Jul 22, 2020

lemon24 (Owner, Author) commented Jul 22, 2020

From 2 hours of logs after deploying the changes above:

$ python scripts/debug_storage_stats.py < update.log.2020-07-22T19:51 -s read_mib | grep -vE '^(<no exc>|min|p[0-9]+)' | head -n24 
_search.py _insert_into_search_one_chunk
SELECT entries.id, entries.feed, entries.last_updated, esss.es_rowids, ...
       duration    read_count    write_count    read_mib    write_mib
n      127              127              127     127              127
sum      7.293       733895                0     192.449            0
avg      0.0574        5778.7              0       1.515            0
max      0.8802        5787                0      30.812            0

_search.py _delete_from_search_one_chunk
SELECT id, feed, es_rowids FROM ...
       duration    read_count    write_count    read_mib    write_mib
n      124                124            124     124              124
sum      0.5267         35464              0       4.328            0
avg      0.0042           286              0       0.035            0
max      0.1102           286              0       1.312            0

_search.py _insert_into_search_one_chunk
DELETE FROM entries_search WHERE rowid ...
       duration    read_count    write_count    read_mib    write_mib
n        7                7                7       7                7
sum      0.0253          26                0       0.34             0
avg      0.0036           3.7              0       0.049            0
max      0.0229           6                0       0.25             0


lemon24 added a commit that referenced this issue Jul 22, 2020

lemon24 (Owner, Author) commented Jul 22, 2020

I ran ANALYZE manually on the main database at 20:26.

Update: For 11 hours since then:

$ python scripts/debug_storage_stats.py < update.log.2020-07-23T07:21 -s read_mib | grep -vE '^(<no exc>|min|p[0-9]+)' | head -n16
_search.py _delete_from_search_one_chunk
SELECT id, feed, es_rowids FROM ...
       duration    read_count    write_count    read_mib    write_mib
n      654                654            654     654              654
sum      1.6969        187044              0       9.734            0
avg      0.0026           286              0       0.015            0
max      0.1088           286              0       1.477            0

_search.py _insert_into_search_one_chunk
DELETE FROM entries_search WHERE rowid ...
       duration    read_count    write_count    read_mib    write_mib
n        1                  1              1       1                1
sum      0.0022             6              0       0.055            0
avg      0.0022             6              0       0.055            0
max      0.0022             6              0       0.055            0

