Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for retaining bookmarks order on deletion + printing multiline descriptions #818

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

LeXofLeviafan
Copy link
Collaborator

  • implementing a retained_order parameter for deletion-related operations (when truthy, compactdb() updates multiple record indices at once in a way that prevents their reordering)
  • adding a --retained-order CLI option to allow for using said functionality from CLI
    (...there doesn't seem to be any deletion support in the interactive shell 🤔)
  • changing Bukuserver to use this functionality by default
  • fixing a bug in delete_resultset()
  • ensuring that DB file is resized when committing (batch) deletions
  • additionally, fixing text wrapping during record printing (for proper output of multiline descriptions, and to allow for linebreaks between tags instead of hyphens)

@jarun honestly, I'd say it would make sense to always retain ordering in CLI deletions as well, since this at least guarantees consistent bookmarks order (and IDs are not consistent either way). …Alternatively we could drop the "old" DB compression logic altogether; though the new one is slower when deleting a large arbitrary resultset. (Then again, it's not like such updates need to be done very often.)

Screenshots

printing records with multiline data

(in case of tags, we're tricking textwrap into inserting linebreaks after commas rather than hyphens)
sample data
(an example of multiline description which looks better when rendered as such)
usecase

batch deletion of search results (with retained order of remaining records) + resized DB file
# deleting all https:// bookmarks from generated.db
yes | buku --nostdin --db generated.db --markers :https:// --delete --retain-order

(tail output after removing 9987 records)
deletion log
(remaining records still retain their original order)
remaining records
(the file size got reduced compared to the original database)
file size

batch deletion of random records (with retained order)
# deleting a bunch of randomly picked bookmarks from generated.db
yes | buku --nostdin --db generated.db --np --markers :: --random 9995 --delete --retain-order

(tail output after removing 9995 records)
deletion log
(remaining records still retain their original order)
remaining records

batch deletion of an index range (with retained order)
yes | buku --nostdin --db generated.db --np --delete 3-9997 --retain-order

(tail output after removing 9995 sequential records)
deletion log
(remaining records still retain their original order)
remaining records

batch deletion of an index range without retained order

(removed indices are replaced in a single DB update)
deletion log
(the last 10 records of resulting database)
resulting database
(this also works when there's not enough records with larger indices to replace all that got removed)
deletion log 2
(the last 4 records of resulting database)
resulting database 2

webUI

(GUI deletion uses retained order now since it's a more convenient and intuitively expected UX)
image
image
image

fillwidth = columns - INDENT
desc_lines = [s for line in row.desc.splitlines()
for s in textwrap.wrap(line, width=fillwidth) or ['']]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting into lines then wrapping each separately (the fallback value is for empty lines since wrap() swallows them)

desc_lines = [s for line in row.desc.splitlines()
for s in textwrap.wrap(line, width=fillwidth) or ['']]
TR = str.maketrans(',-', '-,') # we want breaks after commas rather than hyphens
tag_lines = textwrap.wrap(row.tags.translate(TR), width=fillwidth)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap() inserts linebreaks after hyphens by default; which is not really a good thing when displaying tags. On the other hand, it doesn't insert linebreaks after commas (unless those are followed by whitespace); so what we want is to replace - with , and vice versa before passing tags string to wrap() (then revert the change during the printing).

print(DESC_STR % line, end='')
ln_num += 1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed ln_num since it's simpler to just use enumerate()

with self.lock:
for pos, row in reversed(list(enumerate(results))):
self.delete_rec(row[0], delay_commit=True)
for pos, id in reversed(list(enumerate(ids))):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no actual guarantee that the input list is sorted by index and contains unique entries; which led to a bunch of warnings about missing indices (likely accompanied by removing wrong records) when I was testing this on certain searches.

if idx not in ids:
ids += (idx,)

ids = set(args.delete)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually care about the order, so using set() is a simpler approach here

@@ -1456,17 +1464,18 @@ class BukuDb:
if delay_delete:
self.conn.commit()
else:
self.commit_delete()
self.commit_delete(retain_order=retain_order)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases, this value is passed directly as the respective parameter of calls which may result in record deletion

self.conn.commit()
self.cur.execute('VACUUM')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VACUUM is necessary to shrink the file after deletion (particularly when deleting a large amount of records). Note that it effectively creates (and commits) a transaction of its own, so placing it before conn.commit() results in an error.

@@ -1841,28 +1850,34 @@ class BukuDb:
delay_commit : bool
True if record should not be committed to the DB,
leaving commit responsibility to caller. Default is False.
upto : int, optional
If specified, multiple indices are moved at once.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is used specifically for optimization of range deletions (whether we're using retain_order or not, there's no reason to make multiple updates when a single one would suffice, without a worse outcome).

"""

# Return if the last index left in DB was just deleted
max_id = self.get_max_id()
if not max_id:
if not max_id or (upto and upto < index):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a range was specified, it needs to be non-empty

with self.lock:
if retain_order or (upto or 0) > index:
step = (max(max_id - upto, upto + 1 - index) if not retain_order else
1 if not upto else upto + 1 - index)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's 3 possibility here:

  • retain-order without upto
  • retain-order with upto
  • upto without retain-order

The 3rd case only differs from the 2nd one in that we want to support the sub-case where there's more records with larger indices remaining than was in the deleted range ("Indices 9998-10000 moved to 9990-9992"). This is covered by the max() subexpression.

msg = f'Indices {index+step}-{max_id} moved to {index}-{max_id-step}'
else:
self.cur.execute('UPDATE bookmarks SET id = ? WHERE id = ?', (index, max_id))
msg = f'Index {max_id} moved to {index}'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single UPDATE should suffice for our purpose

@@ -2114,9 +2138,6 @@ class BukuDb:
return False

if self.delete_rec_all():
with self.lock:
self.cur.execute('VACUUM')
self.conn.commit()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VACUUM is now invoked directly by delete_rec_all() after the commit (same as for other deletion operations)

@@ -6332,22 +6355,17 @@ POSITIONAL ARGUMENTS:
try:
vals = [int(x) for x in args.delete[0].split('-')]
if len(vals) == 2:
bdb.delete_rec(0, vals[0], vals[1], True)
bdb.delete_rec(0, vals[0], vals[1], is_range=True, retain_order=args.retain_order)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to specify explicitly what this True actually stands for

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant