-
Notifications
You must be signed in to change notification settings - Fork 678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BeaconStateMachine
outline
#1373
Conversation
eth/beacon/db/schema.py
Outdated
# | ||
@staticmethod | ||
def make_slot_to_crystallized_state_lookup_key(slot: int) -> bytes: | ||
return b'beacon:slot-to-crystallized-state:%d' % slot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still probably want the v1
namespace to be present somewhere in this string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I just add v1
to all lookup keys?
eth/beacon/state_machines/base.py
Outdated
self._active_state = self.get_active_state_class().from_old_active_and_blocks( | ||
old_active_state, | ||
blocks, | ||
self.config.CYCLE_LENGTH * 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that this is the only value that is accessed in self.config
. Does it make sense to just have this be a class level variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, I think there will be more when more becon_chain state_transition.py functions are torn down into StateMachine.
@@ -268,11 +267,14 @@ def _get_shards_and_committees_for_shard_indices( | |||
|
|||
|
|||
@to_tuple | |||
def get_new_shuffling(seed: Hash32, | |||
def get_new_shuffling(*, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
I just deleted a comment suggesting that you adopt this exact pattern!
One thing to think about. Invoking functions using keyword arguments has a runtime cost. I suspect that for this function, that runtime cost is going to be trivial compared to the execution of the body of the functions, but for any function that needs to be highly performant, this should be taken into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move it into StateMachine, we can reduce some parameters by using self.config.CYCLE_LENGTH
, self.config.MIN_COMMITTEE_SIZE
, self.config.SHARD_COUNT
.
eth/beacon/state_machines/base.py
Outdated
@@ -74,7 +74,7 @@ def logger(self): | |||
@classmethod | |||
def get_block_class(cls) -> Type[BaseBeaconBlock]: | |||
""" | |||
Return the :class:`~eth.beacon.types.blocks.BeaconBlock` class that this | |||
Returns the :class:`~eth.beacon.types.blocks.BeaconBlock` class that this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the original Return
was the proper format we are using for docstrings (cc @cburgdorf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any preferences! 😆 Searched in eth
with this PR, there are 81 Returns
and 64 Return
now. I can change it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, we try to streamline docs to always be imperative present tense. I believe that is what most documentation uses but it is at least inline with what the official Python docs use.
Here's a rule of thumb. If this sounds correct, it is correct:
If this method is called it will: <doc-string>
Example:
If this method is called it will return the BeaconBlock
class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that this is the same for git commit messages which at least makes it easier for me to remember.
eth/beacon/state_machines/base.py
Outdated
def get_prev_blocks(cls, | ||
last_block_hash: Hash32, | ||
chaindb: BaseBeaconChainDB, | ||
searching_depth: int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick
I think search_depth
is a more natural name for this. And maybe even max_search_depth
since this seems to be an upper bound.
eth/beacon/state_machines/base.py
Outdated
Returns the previous blocks. | ||
|
||
Since the slot numbers might be not continuous. The searching depth is ``searching_depth`` | ||
and the target slot depth is ``target_slot_number``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested rephrasing.
Slot numbers are not guaranteed to be contiguous since it is possible for there
to be no block at a given slot. The search is bounded by two parameters.
- `max_search_depth` - The maximum number of slots below the slot of the block denoted by `last_block_hash` that we should search.
- `min_slot_depth` - The slot number for which we should not search any deeper if reached, though we do include the first block encountered below this slot number before terminating the search.
eth/beacon/state_machines/base.py
Outdated
last_block_hash: Hash32, | ||
chaindb: BaseBeaconChainDB, | ||
searching_depth: int, | ||
target_slot_number: int) -> Iterable[BaseBeaconBlock]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe min_slot_number
since minimum is somewhat of an optional constraint and target
is likely to be interpreted as a firm constraint. With the term target, my intuition would say that this search would continue until it hit this slot number.
eth/beacon/state_machines/base.py
Outdated
yield block | ||
try: | ||
block = chaindb.get_block_by_hash(block.parent_hash) | ||
except (IndexError, BlockNotFound): | ||
break | ||
if block.slot_number < start_slot: | ||
if block.slot_number < target_slot_number: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it interesting and not intuitive that this search will include blocks which have a slot number lower than target_slot_number
. I assume that is part of the spec, but it might be something worth leaving an inline comment to be sure future readers of this code know that behavior is intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it interesting and not intuitive that this search will include blocks which have a slot number lower than target_slot_number
Ohh, my goal is making this search return the blocks that are greater than or equal to target_slot_number
(min_slot_number
). It should return at least one block (the block that produced backup_active_state
).
Long story version:
- Startup: Setup empty states
- Apply the genesis block (
B0
) and get the genesis states. Persist these states (C0
andA0
)recent_block_hashes: [ZERO_HASH32] * (CYCLE_LENGTH * 2)
- Apply block
B1
, update active state (A1
)recent_block_hashes = get_new_recent_block_hashes( old_block_hashes=recent_block_hashes, parent_slot=B0.slot_number, current_slot=B1.slot_number, parent_hash=B0.hash )
recent_block_hashes: [ZERO_HASH32] * (CYCLE_LENGTH * 2 - 1) + [B0.hash]
- Apply block
B2
, update active state (A2
)recent_block_hashes = get_new_recent_block_hashes( old_block_hashes=recent_block_hashes, parent_slot=B1.slot_number, current_slot=B2.slot_number, parent_hash=B1.hash )
recent_block_hashes: [ZERO_HASH32] * (CYCLE_LENGTH * 2 - 2) + [B0.hash] + [B1.hash]
- Apply block
B3
, update active state (A3
)
… - Apply block
B64
, update active state (A64
) and crystallized state (C64
), and also persist states.
Assume that we have the backup active state A0
and want to create B3’
(!= B3
), so we have to reproduce A2
. It requires the blocks starting from the parent of “the first block after backup_active_state_slot
”. This parent block should be the block that produced A0
, so I set min_slot_number=backup_active_state_slot
.
Sorry, it's long. Do you think it makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that I was wrong in my first read of the code and yes, this does indeed only return down to block greater than or equal to target_slot_number
eth/beacon/state_machines/base.py
Outdated
searching_depth=searching_depth, | ||
target_slot_number=backup_active_state_slot | ||
) | ||
blocks = blocks[::-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to go check that this was reversing the blocks
. I think the idiomatic way to do this is to call reversed()
on blocks, or better yet, to do blocks = reversed(self.get_prev_blocks(...))
so that you don't have to define the blocks
variable twice.
pending_attestations = old_active_state.pending_attestations | ||
def from_backup_active_state_and_blocks(cls, | ||
backup_active_state: ActiveState, | ||
blocks: Sequence['BaseBeaconBlock']) -> ActiveState: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 to using Sequence
. I think we (and largely me) have been defining these as Tuple
or List
too often which isn't strictly necessary and doesn't provide much extra safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I also used Iterable
for input fields of other functions. Do you think I should change most input fields from Iterable
to Sequence
?
Once I was using value: Iterable[Any]
and doing len(value)
in this function, mypy returned that "len" has incompatible type "Iterable[Any]"; expected "Sized"
. Changing to Sequence
saved me in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When judging between Iterable
and Sequence
my preference is to always use the most flexible type that fulfills the goal.
For example, if you do not need to access len()
(or e.g index access), Iterable
will do it and is more flexible. For instance, it allows you to also pass in an iterator (yield "foo"
).
But if you need to get the length or need to access elements at a specific index, then one has to tighten the type to something like Sequence
because Iterable
simply can't give you that (e.g. it is perfectly valid for an Iterable
to be infinite, so not having an answer to len()
at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cburgdorf thank you for elaborating! I will keep the current Iterable
s.
@@ -3,12 +3,11 @@ | |||
from eth.beacon.state_machines.configs import BeaconConfig | |||
|
|||
|
|||
SENERITY_CONFIG = BeaconConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤣
# | ||
@abstractmethod | ||
def exists(self, key: bytes) -> bool: | ||
def get_crystallized_state_by_root(self, state_root: Hash32) -> CrystallizedState: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ntpick: I think we recently settled on a preference to just use pass
here instead of raising a NotImplementedError
because the ABC
module already takes care of not allowing these to be called, rendering the raise
obsolete.
# | ||
# Active State | ||
# | ||
def get_active_state_by_root(self, state_root: Hash32) -> ActiveState: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason these aren't @abstractmethod
?
eth/beacon/db/chain.py
Outdated
@staticmethod | ||
def _get_deletable_state_roots(db: BaseDB) -> Tuple[Hash32]: | ||
""" | ||
Returns deletable_state_roots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Returns
eth/beacon/db/chain.py
Outdated
db: BaseDB, | ||
crystallized_state: CrystallizedState) -> None: | ||
""" | ||
Sets a record in the database to allow looking up this block by its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Sets
MIN_COMMITTEE_SIZE=128, # validators | ||
MIN_DYNASTY_LENGTH=256, # slots | ||
SHARD_COUNT=1024, # shards | ||
SLOT_DURATION=8, # seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that rather be named SLOT_DURATION_SECONDS
and SQRT_E_DROP_SECONDS
? Alternatively, instead of assigning int
here, we could assign Second
which would be a NewType
and prevent arbitrary int
values to be assigned. /cc @pipermerriam @carver
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming: now these constants (configs) names follow the previous spec - and the spec is changing. Recently Justin proposed that using slot as the common time unit, except for SLOT_DURATION
. e.g., SQRT_E_DROP_TIME
in second became SQRT_E_DROP_TIME
in slot. I will clean up all these recent changes after porting old codebase.
For beacon chain, some other NewType
candidates: Slot
, ShardID
. And with the latest spec: ValidatorStatus
, SpecialRecordType
, ValidatorSetDeltaFlag
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
('MIN_COMMITTEE_SIZE', int), | ||
('MIN_DYNASTY_LENGTH', int), | ||
('SHARD_COUNT', int), | ||
('SLOT_DURATION', int), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted in another comment, the following two are good candidates to use a Second
NewType
rather than using int
.
@pipermerriam @cburgdorf thanks for the review! I fixed PR feedback except for the |
What was wrong?
#1339 - part 3: Introduce
eth.beacon.state_machines.base.BaseStateMachine(Configurable)
How was it fixed?
block.slot_number >= last_state_recalc + CYCLE_LENGTH
, and affects theCrystallizedState
andActiveState
ActiveState
onlyCrystallizedState
is recalculated every cycle (~17 minutes with the latest spec). Now my strategy is simply store it by:crystallized_state_root: Hash32
->crystallized_state: CystallizedState
last_state_recalc
slot number ->crystallized_state_root: Hash32
for canonical chaindeletable_state_roots
DB storage. Could be clean up periodically or when some blocks are finalized.ActiveState
is recalculated every block (~16 seconds).ActiveState
will be backed up whenCrystallizedState
is persisted.CYCLE_LENGTH
is 64 slots, with the backup ActiveState + recent blocks, it can be reproduced.ActiveState
grows to 4 fields now!) I think we can leave the storage optimization later.opcodes
in PoW chain, notconstants
. But now it’s pretty messy.TODO
BeaconConfig
s, needs to be merged.from_old_active_and_blocks
thing.Cute Animal Picture