-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: overlay type definitions and operations #648
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,264 @@ | ||
//! A utility for managing in-memory overlays. | ||
//! | ||
//! This module exposes two types. The user-facing [`Overlay`] is opaque and frozen. The internal | ||
//! [`LiveOverlay`] is meant to be used within a session only. | ||
//! | ||
//! Overlays contain weak references to all their ancestors. This allows ancestors to be dropped or | ||
//! committed during the lifetime of the overlay. Importantly, this means that memory is cleaned | ||
//! up gracefully as overlays are dropped and committed. | ||
//! | ||
//! However, creating a new [`LiveOverlay`] requires the user to provide strong references to each | ||
//! of the ancestors which are still alive. It is still the user's responsibility to ensure | ||
//! that all live ancestors are provided, or else data will go silently missing. | ||
//! | ||
//! Looking up a value in an overlay does not have a varying cost. First there is a look-up in an | ||
//! index to see which ancestor has the data, and then a query on the ancestor's storage is done. | ||
//! | ||
//! Creating a new overlay is an O(n) operation in the amount of changes relative to the parent, | ||
//! both in terms of new changes and outdated ancestors. | ||
|
||
#![allow(dead_code)] | ||
|
||
use crate::{beatree::ValueChange, page_cache::Page, page_diff::PageDiff}; | ||
use nomt_core::{page_id::PageId, trie::KeyPath}; | ||
|
||
use std::collections::HashMap; | ||
use std::sync::{Arc, Weak}; | ||
|
||
/// An in-memory overlay of merkle tree and b-tree changes. | ||
pub struct Overlay { | ||
inner: Arc<OverlayInner>, | ||
} | ||
|
||
struct OverlayInner { | ||
index: Index, | ||
data: Arc<Data>, | ||
seqn: u64, | ||
// ordered by recency. | ||
ancestor_data: Vec<Weak<Data>>, | ||
} | ||
|
||
// Maps changes to sequence number. | ||
#[derive(Default, Clone)] | ||
struct Index { | ||
pages: imbl::HashMap<PageId, u64>, | ||
values: imbl::OrdMap<KeyPath, u64>, | ||
|
||
// sorted ascending by seqn. | ||
pages_by_seqn: imbl::Vector<(u64, PageId)>, | ||
values_by_seqn: imbl::Vector<(u64, KeyPath)>, | ||
} | ||
|
||
impl Index { | ||
// Prune all items with a sequence number less than the minimum | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. missing "." at the end? |
||
// O(n) in number of pruned items. | ||
fn prune_below(&mut self, min: u64) { | ||
loop { | ||
match self.pages_by_seqn.pop_front() { | ||
None => break, | ||
Some((seqn, key)) if seqn >= min => { | ||
self.pages_by_seqn.push_front((seqn, key)); | ||
break; | ||
} | ||
Some((seqn, key)) => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggest to change |
||
if let Some(got_seqn) = self | ||
.pages | ||
.remove(&key) | ||
.filter(|&got_seqn| got_seqn != seqn && got_seqn >= min) | ||
{ | ||
// key has been updated since this point. reinsert. | ||
self.pages.insert(key, got_seqn); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above, probably related to a copy-paste being the same code as values_by_seqn pruning |
||
} | ||
} | ||
} | ||
} | ||
|
||
loop { | ||
match self.values_by_seqn.pop_front() { | ||
None => break, | ||
Some((seqn, key)) if seqn >= min => { | ||
self.values_by_seqn.push_front((seqn, key)); | ||
break; | ||
} | ||
Some((seqn, key)) => { | ||
if let Some(got_seqn) = self | ||
.values | ||
.remove(&key) | ||
.filter(|&got_seqn| got_seqn != seqn && got_seqn >= min) | ||
{ | ||
// key has been updated since this point. reinsert. | ||
self.values.insert(key, got_seqn); | ||
} | ||
} | ||
} | ||
} | ||
} | ||
|
||
/// Insert all the value keys in the iterator with the given sequence number. | ||
/// | ||
/// The sequence number is assumed to be greater than or equal to the maximum in the vector. | ||
fn insert_pages(&mut self, seqn: u64, page_ids: impl IntoIterator<Item = PageId>) { | ||
for page_id in page_ids { | ||
self.pages_by_seqn.push_back((seqn, page_id.clone())); | ||
self.pages.insert(page_id, seqn); | ||
} | ||
} | ||
|
||
/// Insert all the value keys in the iterator with the given sequence number. | ||
/// | ||
/// The sequence number is assumed to be greater than or equal to the maximum in the vector. | ||
fn insert_values(&mut self, seqn: u64, value_keys: impl IntoIterator<Item = KeyPath>) { | ||
for key in value_keys { | ||
self.values_by_seqn.push_back((seqn, key)); | ||
self.values.insert(key, seqn); | ||
} | ||
} | ||
} | ||
|
||
/// Data associated with a single overlay. | ||
struct Data { | ||
pages: HashMap<PageId, (Page, PageDiff)>, | ||
values: HashMap<KeyPath, ValueChange>, | ||
} | ||
|
||
/// An error type indicating that the ancestors provided did not match. | ||
#[derive(Debug, Clone, Copy, PartialEq)] | ||
pub struct InvalidAncestors; | ||
|
||
/// A live overlay which is being used as a parent. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. used as child maybe? |
||
pub(super) struct LiveOverlay { | ||
parent: Option<Arc<OverlayInner>>, | ||
ancestor_data: Vec<Arc<Data>>, | ||
min_seqn: u64, | ||
} | ||
|
||
impl LiveOverlay { | ||
/// Create a new live overlay based on this iterator of ancestors. | ||
pub(super) fn new<'a>( | ||
live_ancestors: impl IntoIterator<Item = &'a Overlay>, | ||
) -> Result<Self, InvalidAncestors> { | ||
let mut live_ancestors = live_ancestors.into_iter(); | ||
let Some(parent) = live_ancestors.next().map(|p| p.inner.clone()) else { | ||
return Ok(LiveOverlay { | ||
parent: None, | ||
ancestor_data: Vec::new(), | ||
min_seqn: 0, | ||
}); | ||
}; | ||
|
||
let mut ancestor_data = Vec::new(); | ||
for (supposed_ancestor, actual_ancestor) in live_ancestors.zip(parent.ancestor_data.iter()) | ||
{ | ||
let Some(actual_ancestor) = actual_ancestor.upgrade() else { | ||
return Err(InvalidAncestors); | ||
}; | ||
|
||
if !Arc::ptr_eq(&supposed_ancestor.inner.data, &actual_ancestor) { | ||
return Err(InvalidAncestors); | ||
} | ||
|
||
ancestor_data.push(actual_ancestor); | ||
} | ||
|
||
let min_seqn = parent.seqn - ancestor_data.len() as u64; | ||
|
||
Ok(LiveOverlay { | ||
parent: Some(parent), | ||
ancestor_data, | ||
min_seqn, | ||
}) | ||
} | ||
|
||
/// Get a page by ID. | ||
/// | ||
/// `None` indicates that the page is not present in the overlay, not that the page doesn't | ||
/// exist. | ||
pub(super) fn page(&self, page_id: &PageId) -> Option<(Page, PageDiff)> { | ||
self.parent | ||
.as_ref() | ||
.and_then(|parent| parent.index.pages.get(&page_id)) | ||
.and_then(|seqn| seqn.checked_sub(self.min_seqn)) | ||
.map(|seqn_diff| { | ||
if seqn_diff == 0 { | ||
self.parent | ||
.as_ref() | ||
.unwrap() // UNWRAP: parent existence checked above | ||
.data | ||
.pages | ||
.get(page_id) | ||
.unwrap() // UNWRAP: index indicates that data exists. | ||
} else { | ||
self.ancestor_data[seqn_diff as usize - 1] | ||
.pages | ||
.get(page_id) | ||
.unwrap() // UNWRAP: index indicates that data exists. | ||
} | ||
}) | ||
.cloned() | ||
} | ||
|
||
/// Get a value change by ID. | ||
/// | ||
/// `None` indicates that the value has not changed in the overlay, not that the value doesn't | ||
/// exist. | ||
pub(super) fn value(&self, key: &KeyPath) -> Option<ValueChange> { | ||
self.parent | ||
.as_ref() | ||
.and_then(|parent| parent.index.values.get(key)) | ||
.and_then(|seqn| seqn.checked_sub(self.min_seqn)) | ||
.map(|seqn_diff| { | ||
if seqn_diff == 0 { | ||
// UNWRAP: parent existence checked above | ||
// UNWRAP: index indicates that data exists. | ||
self.parent.as_ref().unwrap().data.values.get(key).unwrap() | ||
} else { | ||
// UNWRAP: index indicates that data exists. | ||
self.ancestor_data[seqn_diff as usize - 1] | ||
.values | ||
.get(key) | ||
.unwrap() | ||
} | ||
}) | ||
.cloned() | ||
} | ||
|
||
/// Finish this overlay and transform it into a frozen [`Overlay`]. | ||
pub(super) fn finish( | ||
self, | ||
page_changes: HashMap<PageId, (Page, PageDiff)>, | ||
value_changes: HashMap<KeyPath, ValueChange>, | ||
) -> Overlay { | ||
let new_seqn = self.parent.as_ref().map_or(0, |p| p.seqn + 1); | ||
|
||
// rebuild the index, including the new stuff, and excluding stuff from dead overlays. | ||
let mut index = self | ||
.parent | ||
.as_ref() | ||
.map_or_else(Default::default, |p| p.index.clone()); | ||
index.prune_below(self.min_seqn); | ||
|
||
index.insert_pages(new_seqn, page_changes.keys().cloned()); | ||
index.insert_values(new_seqn, value_changes.keys().cloned()); | ||
|
||
let ancestor_data = self | ||
.parent | ||
.map(|parent| { | ||
std::iter::once(Arc::downgrade(&parent.data)) | ||
.chain(self.ancestor_data.into_iter().map(|d| Arc::downgrade(&d))) | ||
.collect() | ||
}) | ||
.unwrap_or_default(); | ||
|
||
Overlay { | ||
inner: Arc::new(OverlayInner { | ||
index, | ||
data: Arc::new(Data { | ||
pages: page_changes, | ||
values: value_changes, | ||
}), | ||
seqn: new_seqn, | ||
ancestor_data, | ||
}), | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggle to see how a user can create a chain of
Overlay
with some missingOverlay
, leading to silently miss out on some data. I have a some ideas based on how you envisioned the usage in the follow-up pr.The user will be able to start a session and update NOMT with an
Overlay
as an output (first session created, so no overlays are available to be submitted). From now on, new overlays can be created again from the same previous state or on top of fresh new overlays. Each new overlay will contain a vector of weak references to all ancestors, making two things possible:LiveOverlay
, the provided Overlay iterator is checked against the vector of weak references contained in the parent of the new overlay.That is to say, the comment seems not strictly correct to me, the user will probably be able to not commit an overlay and thus may silently lose some data (but probably this could be solved in some way). However, they will not be able to silently create an overlay on top of a broken chain of overlays, because an error will be thrown as a result.