Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor transition router #16145

Merged
merged 5 commits into from
Oct 1, 2024
Merged

Refactor transition router #16145

merged 5 commits into from
Oct 1, 2024

Conversation

anne-laure-s
Copy link
Member

This PR rebases the following commit of this PR :

Refactor Transition_router

  1. Simplify transition_router: remove unnecessary reader refs
  2. Do not ignore validation_callbacks for transition gossips received
    during bootstrap
  3. Add support for header gossip collection during boostrap

@anne-laure-s anne-laure-s self-assigned this Sep 25, 2024
@anne-laure-s anne-laure-s requested a review from a team as a code owner September 25, 2024 15:29
@anne-laure-s
Copy link
Member Author

!ci-build-me

@anne-laure-s anne-laure-s changed the title Al/transition router Refactor transition router Sep 26, 2024
| Bootstrap_controller.Transition_cache.Block b ->
Some (map x ~f:(const b), vc)
| _ ->
(* TODO: handle headers too *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to note that headers can't be actually received in this PR, so this TODO is safe to be left for future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this in the comment.

~persistent_frontier ~initial_root_transition
~best_seen_transition:(Some enveloped_transition)
~catchup_mode )
Fn.const ()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace Fn.const () with ignore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced

@@ -138,34 +138,34 @@ let start_bootstrap_controller ~context:(module Context : CONTEXT) ~trust_system
~pipe_name:name ~logger ?valid_cb )
()
in
transition_reader_ref := bootstrap_controller_reader ;
transition_writer_ref := bootstrap_controller_writer ;
let producer_transition_reader, producer_transition_writer =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note on producer_transition_writer_ref: type of this parameter is changed from _ Strict_pipe.Writer.t ref to _ Strict_pipe.Writer.t option ref.

This makes sense because no block production happens when bootstrap is running. This pipe was created just to substitute a value for the type and was never actually used. It could have been read only by the transition frontier controller, and it's not running when bootstrap controller is active.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this as a comment in the code for the future.

@@ -318,17 +325,8 @@ let run ~context:(module Context : CONTEXT) ~trust_system ~verifier ~network
Sync_ledger.Db.create temp_snarked_ledger ~logger ~trust_system
in
don't_wait_for
(sync_ledger t
~preferred:
( Option.to_list best_seen_transition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has been moved to start_bootstrap_controller to highlight that we don't really depend on best seen transition itself, but only on the knowledge of which peer shared it.

type element =
initial_valid_block_or_header Envelope.Incoming.t
* Mina_net2.Validation_callback.t option

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: big change in this module is that transition cache may hold not only blocks, but headers too.

Also validation callbacks are not discarded any more

then children
else new_child :: children )
let children', b =
List.fold children ~init:([], false) ~f:(fun (acc, b) child ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: instead of just checking whether a state hash already was among children, we'll merge an existing child with the new transition.

It's needed to maintain consistency about transitions that may come from gossip or other ways (not sure if there could be some, but interfaces seem to allow that from the first glance), we want to preserve some validation callback (and we don't expect two gossips for the same).

Also if we received a block after receiving a header (definitely possible when we'll simultaneously support both old and new gossip topics), we want to preserve a block because it's more than a header.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this as a comment in the code for the future.

List.filter_map collected_transitions ~f:(fun (x, vc) ->
let open Network_peer.Envelope.Incoming in
match data x with
| Bootstrap_controller.Transition_cache.Block b ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing the type containing Block constructor and just use:

[ `Block of Mina_block.t | `Header of Mina_block.Header.t ]

Then we'll no longer need dependency on the bootstrap controller here

Copy link
Member Author

@anne-laure-s anne-laure-s Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in this commit

producer_transition_writer_ref := None ;
let f block =
Strict_pipe.Writer.write bootstrap_controller_writer
(`Block block, `Valid_cb None) ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: code below moved here from bootstrap controller.

New function combines writing to boostrap controller writer (which was here already) and extracting preferred peers (freshly moved).

src/lib/transition_router/transition_router.ml Outdated Show resolved Hide resolved
@@ -93,34 +92,35 @@ let start_transition_frontier_controller ~context:(module Context : CONTEXT)
?valid_cb ~pipe_name:name ~logger )
()
in
transition_reader_ref := transition_frontier_controller_reader ;
transition_writer_ref := transition_frontier_controller_writer ;
let transition_writer_ref =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably change of semantics behind transition_writer_ref is the most non-trivial change of the PR.

I strongly advise to extract this change as a separate commit.

Also, I think there is a bug here.

Original intention seemed to have been to remove the need to pass transition_reader_ref everywhere (because where it's used what's inside is always a reader created locally, hence we don't need a ref to use the correct value). And also to remove redundant pipe creation to start initialization (a pipe that gets discarded immediately). For this reason the transition_writer_ref was made an optional argument (and no argument in case of initialize function).

So the code below does it right that it uses ref transition_frontier_controller_writer as the default value. It does it wrong in not setting ref to transition_frontier_controller_writer when transition_writer_ref is not None.

So the code I think should be here is Option.value_map ~f:(fun r -> r := transition_frontier_controller_writer; r) transition_writer_ref ~default:(ref transition_frontier_controller_writer)

And same in start_bootstrap_controller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't Option.value_map ~f:(fun r -> r := transition_frontier_controller_writer; r) transition_writer_ref ~default:(ref transition_frontier_controller_writer) the same as ref transition_frontier_controller_writer ?

I think I’m missing something because to me this makes the ?transition_writer_ref argument given to the start_transition_frontier_controller function useless, as it is just replaced with transition_frontier_controller_writer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very good question!

When ?transition_writer_ref:None is passed, following codes are identical by definition:

(* code 1 *)
Option.value_map ~f:(fun r -> r := transition_frontier_controller_writer; r) transition_writer_ref ~default:(ref transition_frontier_controller_writer)

(* code 2 *)
ref transition_frontier_controller_writer

However when ?transition_writer_ref:(Some myref) is passed, behavior changes. Code that calls start_transition_frontier_controller may use myref in some other thread. In particular we use it the following way:

don't_wait_for
@@ Strict_pipe.Reader.iter_without_pushback valid_transition_reader2
  ~f:(fun (`Block enveloped_transition, `Valid_cb vc) ->
   don't_wait_for @@
   let%map () = (* some code removed for clarity *) in
   Strict_pipe.Writer.write !transition_writer_ref
      (`Block enveloped_transition, `Valid_cb (Some vc)) ) )

We have an infinite loop of start_transition_frontier_controller transitioning to start_bootstrap_controller and then back, depending on state of sync. However the async thread above is started once with some ref. It's important that we write to that same variable instead of creating a new one as the code let transition_writer_ref = ref transition_frontier_controller_writer would do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this explaination, it makes more sense now.

I made a separate commit including your new version : 0cbd947

@@ -93,34 +92,35 @@ let start_transition_frontier_controller ~context:(module Context : CONTEXT)
?valid_cb ~pipe_name:name ~logger )
()
in
transition_reader_ref := transition_frontier_controller_reader ;
transition_writer_ref := transition_frontier_controller_writer ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this line removal wasn't part of original fbc8bf0

@@ -138,34 +138,34 @@ let start_bootstrap_controller ~context:(module Context : CONTEXT) ~trust_system
~pipe_name:name ~logger ?valid_cb )
()
in
transition_reader_ref := bootstrap_controller_reader ;
transition_writer_ref := bootstrap_controller_writer ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this line removal wasn't part of original fbc8bf0

Copy link
Member Author

@anne-laure-s anne-laure-s Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I removed it because I was confused that the new code was redundant :

  let transition_writer_ref =
    Option.value transition_writer_ref
      ~default:(ref bootstrap_controller_writer)
  in
  transition_writer_ref := bootstrap_controller_writer ;

Was it a mistake ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I had not read the previous comment about transition_writer_ref.

If I understood correctly, whatever the value in transition_writer_ref is when given as an argument, we want to replace it with ref transition_frontier_controller_writer. Is that correct ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll reply to https://github.com/MinaProtocol/mina/pull/16145/files#r1781433238, there you formulated your concern excellently!

@georgeee
Copy link
Member

Reviewing this PR was an interesting exercise. This code is more than two years old, and I only faintly remembered the "why", but not the "how" about writing it.

I'd prefer it was split to 6 separate commits, but it's okay. I asked to extract one change as a separate commit because it seems to contain a bug introduced during the rebase process.

@anne-laure-s anne-laure-s force-pushed the al/transition-router branch 3 times, most recently from a72c1e7 to cbb1a38 Compare September 30, 2024 16:35
@georgeee
Copy link
Member

georgeee commented Oct 1, 2024

!ci-build-me

@georgeee
Copy link
Member

georgeee commented Oct 1, 2024

!ci-nightly-me

@anne-laure-s
Copy link
Member Author

!ci-build-me

@coveralls
Copy link

Coverage Status

coverage: 49.792% (+16.4%) from 33.411%
when pulling 69eee47 on al/transition-router
into 3a00659 on compatible.

@svv232 svv232 merged commit 279fea3 into compatible Oct 1, 2024
46 checks passed
@svv232 svv232 deleted the al/transition-router branch October 1, 2024 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants