[Merged by Bors] - Move node key to config directory and enable loading of multiple identities #5592

40 commits
d426681
Restructure PostSupervisor to require NodeID on StartSession and Prep…
fasmat Feb 21, 2024
b283295
Load multiple identities during startup
fasmat Feb 21, 2024
6217b6a
Update Recovery for multi-smesher
fasmat Feb 22, 2024
4ffc167
Fix recovery for multi-smeshing
fasmat Feb 22, 2024
1987a82
Add tests
fasmat Feb 22, 2024
900d1d6
Extend checkpoint tests with multi-smeshing setups
fasmat Feb 23, 2024
b1df1b7
Update Node tests
fasmat Feb 24, 2024
7af1f17
Fix node tests
fasmat Feb 24, 2024
30b2db0
Fix failing tests
fasmat Feb 24, 2024
49b26cd
Update CHANGELOG
fasmat Feb 26, 2024
a814bc8
Make sure duplicate keys are detected
fasmat Feb 26, 2024
dae6819
Update post
fasmat Feb 27, 2024
ec8c89b
Review feedback
fasmat Feb 27, 2024
0d92a9a
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Feb 27, 2024
39607a8
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Feb 28, 2024
3047f22
Review feedback
fasmat Feb 28, 2024
23366a6
Use maps instead of slices for deduplication
fasmat Feb 28, 2024
f1ce011
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Feb 28, 2024
801e7b1
Fix failing tests
fasmat Feb 28, 2024
4db9f0e
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Feb 29, 2024
7d26955
Fix failing tests
fasmat Feb 29, 2024
ecaca50
Fix flaky test
fasmat Feb 29, 2024
246599b
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Feb 29, 2024
94d7a12
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Feb 29, 2024
f89edef
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Feb 29, 2024
dfed9a2
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Mar 1, 2024
d775a9c
Fix logging
fasmat Mar 1, 2024
9fdf414
Run test e2e test with random post size
fasmat Mar 1, 2024
cf73444
Update post dependency
fasmat Mar 1, 2024
fe73b3f
Add initial post verification to builder
fasmat Mar 1, 2024
949095c
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Mar 1, 2024
43eff2a
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Mar 1, 2024
62528e5
Fix failing tests
fasmat Mar 1, 2024
ee3cc19
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Mar 1, 2024
b802f65
Update systest makefile
fasmat Mar 1, 2024
e395c8f
Downgrade post-rs
fasmat Mar 1, 2024
c2f9543
Downgrade post
fasmat Mar 1, 2024
d265586
Update CHANGELOG.md
fasmat Mar 4, 2024
310f272
Update CHANGELOG with info that multi-smeshing is still in testing
fasmat Mar 4, 2024
aaf6d26
Merge remote-tracking branch 'origin/develop' into 5089-move-key-to-c…
fasmat Mar 4, 2024
77 changes: 72 additions & 5 deletions CHANGELOG.md
@@ -18,6 +18,12 @@ encrypted connection between the post service and the node over insecure connect

Smeshers using the default setup with a supervised post service do not need to make changes to their node configuration.

#### Fully migrated local state into `node_state.sql`

With this release the node has fully migrated its local state into `node_state.sql`. During the first start after the
upgrade the node will migrate the data from disk and store it in the database. This change also allows the PoST data
directory to be set to read only after the migration is complete, as the node will no longer write to it.

#### New poets configuration

Upgrading requires changes in config and in CLI flags (if not using the default).
@@ -92,17 +98,65 @@ configuration is as follows:
}
```

#### Extend go-spacemesh with option to manage multiple identities/PoST services

**NOTE:** This is a new feature, not yet supported by Smapp and possibly subject to change. Please use with caution.

A node can now manage multiple identities and their life cycle. This reduces the amount of data that needs to be
broadcast to and fetched from the network, and it reduces the amount of data stored locally, because a single
database serves all identities instead of one database per identity.

To ensure you are eligible for rewards for a given identity, the associated PoST service must be running and connected
to the node during the cyclegap set in the node's configuration. After the node has successfully broadcast the ATX and
registered at a PoET server, the PoST services can be stopped; only the node has to remain online.

This change moves the private key associated with an identity from the PoST data directory to the node's data directory,
into the folder `identities` (i.e. if `state.sql` is in the folder `data`, the keys are now stored in `data/identities`).
During the first startup the node automatically migrates the `key.bin` file from the PoST data directory by copying
it to the new location as `identity.key`. The content of the file stays unchanged (the hex-encoded private key of the identity).

##### Adding new identities/PoST services to a node

To add a new identity to a node, initialize PoST data with `postcli` and let it generate a new private key for you:

```shell
./postcli -provider=2 -numUnits=4 -datadir=/path/to/data \
-commitmentAtxId=c230c51669d1fcd35860131e438e234726b2bd5f9adbbd91bd88a718e7e98ecb
```

Make sure to replace `provider` with your provider of choice and `numUnits` with the number of PoST units you want to
initialize. The `commitmentAtxId` is the commitment ATX ID for the identity you want to initialize. For details on the
usage of `postcli` please refer to [postcli README](https://github.com/spacemeshos/post/cmd/postcli/README.md).

During initialization `postcli` will generate a new private key and store it in the PoST data directory as `key.bin`.
Copy this file to your `data/identities` directory and rename it to `xxx.key` where `xxx` is a unique identifier for
the identity. The node will automatically pick up the new identity and manage its lifecycle after a restart.

Set up the `post-service` [binary](https://github.com/spacemeshos/post-rs/releases) or
[docker image](https://hub.docker.com/r/spacemeshos/post-service/tags) with the data and configure it to connect to your
node. For details refer to the [post-service README](https://github.com/spacemeshos/post-rs/blob/main/service/README.md).

##### Migrating existing identities/PoST services to a node

If you have multiple nodes running and want to migrate to use only one node for all identities:

1. Stop all nodes.
2. Copy the `key.bin` files from the PoST data directories of all nodes to the data directory of the node you want to
use for all identities, into the folder `data/identities`. Rename the files to `xxx.key` where `xxx` is a unique
identifier for each identity.
3. Start the node managing the identities.
4. For every identity, set up a post service that uses the existing PoST data for that identity and connects to the node.
For details refer to the [post-service README](https://github.com/spacemeshos/post-rs/blob/main/service/README.md).

**WARNING:** DO NOT run multiple nodes with the same identity at the same time. This will result in an equivocation
and permanent ineligibility for rewards.

### Highlights

* [#5293](https://github.com/spacemeshos/go-spacemesh/pull/5293) change poet servers configuration
The config now takes the poet server address and its public key. See the [Upgrade Information](#new-poets-configuration)
for details.

* [#5219](https://github.com/spacemeshos/go-spacemesh/pull/5219) Migrate data from `nipost_builder_state.bin` to `node_state.sql`.

The node will automatically migrate the data from disk and store it in the database. The migration will take place at the
first startup after the upgrade.

* [#5390](https://github.com/spacemeshos/go-spacemesh/pull/5390)
Distributed PoST verification.

@@ -111,12 +165,25 @@ configuration is as follows:
If a node finds a proof invalid, it will report it to the network by
creating a malfeasance proof. The malicious node will then be blacklisted by the network.

* [#5592](https://github.com/spacemeshos/go-spacemesh/pull/5592)
Extend node with option to have multiple PoST services connect. This allows users to run multiple PoST services,
without the need to run multiple nodes. A node can now manage multiple identities and will manage the lifecycle of
those identities.
To collect rewards for every identity, the associated PoST service must be running and connected to the node during
the cyclegap set in the node's configuration.

### Features

### Improvements

* [#5219](https://github.com/spacemeshos/go-spacemesh/pull/5219) Migrate data from `nipost_builder_state.bin` to `node_state.sql`.

The node will automatically migrate the data from disk and store it in the database. The migration will take place at the
first startup after the upgrade.

* [#5418](https://github.com/spacemeshos/go-spacemesh/pull/5418) Add `grpc-post-listener` to separate post service from
`grpc-private-listener` and not require mTLS for the post service.

* [#5465](https://github.com/spacemeshos/go-spacemesh/pull/5465)
Add an option to cache SQL query results. This is useful for nodes with high peer counts.

72 changes: 50 additions & 22 deletions activation/activation.go
@@ -59,6 +59,7 @@
// Config defines configuration for Builder.
type Config struct {
GoldenATXID types.ATXID
LabelsPerUnit uint64
RegossipInterval time.Duration
}

@@ -68,8 +69,7 @@
type Builder struct {
accountLock sync.RWMutex
coinbaseAccount types.Address
goldenATXID types.ATXID
regossipInterval time.Duration
conf Config
cdb *datastore.CachedDB
localDB *localsql.Database
publisher pubsub.Publisher
@@ -143,8 +143,7 @@
b := &Builder{
parentCtx: context.Background(),
signers: make(map[types.NodeID]*signing.EdSigner),
goldenATXID: conf.GoldenATXID,
regossipInterval: conf.RegossipInterval,
conf: conf,
cdb: cdb,
localDB: localDB,
publisher: publisher,
@@ -165,11 +164,11 @@
b.smeshingMutex.Lock()
defer b.smeshingMutex.Unlock()
if _, exists := b.signers[sig.NodeID()]; exists {
b.log.Error("signing key already registered", zap.Stringer("id", sig.NodeID()))
b.log.Error("signing key already registered", log.ZShortStringer("id", sig.NodeID()))

return
}

b.log.Info("registered signing key", zap.Stringer("id", sig.NodeID()))
b.log.Info("registered signing key", log.ZShortStringer("id", sig.NodeID()))
b.signers[sig.NodeID()] = sig

if b.stop != nil {
@@ -213,11 +212,11 @@
b.run(ctx, sig)
return nil
})
if b.regossipInterval == 0 {
if b.conf.RegossipInterval == 0 {
return
}
b.eg.Go(func() error {
ticker := time.NewTicker(b.regossipInterval)
ticker := time.NewTicker(b.conf.RegossipInterval)
defer ticker.Stop()
for {
select {
@@ -253,7 +252,7 @@
var resetErr error
for _, sig := range b.signers {
if err := b.nipostBuilder.ResetState(sig.NodeID()); err != nil {
b.log.Error("failed to reset builder state", log.ZShortStringer("nodeId", sig.NodeID()), zap.Error(err))
b.log.Error("failed to reset builder state", log.ZShortStringer("id", sig.NodeID()), zap.Error(err))

err = fmt.Errorf("reset builder state for id %s: %w", sig.NodeID().ShortString(), err)
resetErr = errors.Join(resetErr, err)
continue
@@ -277,13 +276,13 @@
return maps.Keys(b.signers)
}

func (b *Builder) buildInitialPost(ctx context.Context, nodeId types.NodeID) error {
func (b *Builder) buildInitialPost(ctx context.Context, nodeID types.NodeID) error {
// Generate the initial POST if we don't have an ATX...
if _, err := b.cdb.GetLastAtx(nodeId); err == nil {
if _, err := b.cdb.GetLastAtx(nodeID); err == nil {
return nil
}
// ...and if we haven't stored an initial post yet.
_, err := nipost.InitialPost(b.localDB, nodeId)
_, err := nipost.InitialPost(b.localDB, nodeID)
switch {
case err == nil:
b.log.Info("load initial post from db")
@@ -296,14 +295,10 @@

// Create the initial post and save it.
startTime := time.Now()
post, postInfo, err := b.nipostBuilder.Proof(ctx, nodeId, shared.ZeroChallenge)
post, postInfo, err := b.nipostBuilder.Proof(ctx, nodeID, shared.ZeroChallenge)
if err != nil {
return fmt.Errorf("post execution: %w", err)
}
metrics.PostDuration.Set(float64(time.Since(startTime).Nanoseconds()))
public.PostSeconds.Set(float64(time.Since(startTime)))
b.log.Info("created the initial post")

initialPost := nipost.Post{
Nonce: post.Nonce,
Indices: post.Indices,
@@ -313,7 +308,23 @@
CommitmentATX: postInfo.CommitmentATX,
VRFNonce: *postInfo.Nonce,
}
return nipost.AddInitialPost(b.localDB, nodeId, initialPost)
err = b.validator.Post(ctx, nodeID, postInfo.CommitmentATX, post, &types.PostMetadata{
Challenge: shared.ZeroChallenge,
LabelsPerUnit: postInfo.LabelsPerUnit,
}, postInfo.NumUnits)
if err != nil {
b.log.Error("initial POST is invalid", log.ZShortStringer("smesherID", nodeID), zap.Error(err))
if err := nipost.RemoveInitialPost(b.localDB, nodeID); err != nil {
b.log.Fatal("failed to remove initial post", log.ZShortStringer("smesherID", nodeID), zap.Error(err))

}
return fmt.Errorf("initial POST is invalid: %w", err)

}

metrics.PostDuration.Set(float64(time.Since(startTime).Nanoseconds()))
public.PostSeconds.Set(float64(time.Since(startTime)))
b.log.Info("created the initial post")

return nipost.AddInitialPost(b.localDB, nodeID, initialPost)
}

func (b *Builder) run(ctx context.Context, sig *signing.EdSigner) {
@@ -379,7 +390,7 @@
}
}

func (b *Builder) buildNIPostChallenge(ctx context.Context, nodeID types.NodeID) (*types.NIPostChallenge, error) {
func (b *Builder) BuildNIPostChallenge(ctx context.Context, nodeID types.NodeID) (*types.NIPostChallenge, error) {
select {
case <-ctx.Done():
return nil, ctx.Err()
@@ -451,6 +462,23 @@
if err != nil {
return nil, fmt.Errorf("get initial post: %w", err)
}
b.log.Info("verifying the initial post")
initialPost := &types.Post{
Nonce: post.Nonce,
Indices: post.Indices,
Pow: post.Pow,
}
err = b.validator.Post(ctx, nodeID, post.CommitmentATX, initialPost, &types.PostMetadata{
Challenge: shared.ZeroChallenge,
LabelsPerUnit: b.conf.LabelsPerUnit,
}, post.NumUnits)
if err != nil {
b.log.Error("initial POST is invalid", log.ZShortStringer("smesherID", nodeID), zap.Error(err))
if err := nipost.RemoveInitialPost(b.localDB, nodeID); err != nil {
b.log.Fatal("failed to remove initial post", log.ZShortStringer("smesherID", nodeID), zap.Error(err))

}
return nil, fmt.Errorf("initial POST is invalid: %w", err)

}
challenge = &types.NIPostChallenge{
PublishEpoch: current + 1,
Sequence: 0,
@@ -498,7 +526,7 @@

// PublishActivationTx attempts to publish an atx, it returns an error if an atx cannot be created.
func (b *Builder) PublishActivationTx(ctx context.Context, sig *signing.EdSigner) error {
challenge, err := b.buildNIPostChallenge(ctx, sig.NodeID())
challenge, err := b.BuildNIPostChallenge(ctx, sig.NodeID())
if err != nil {
return err
}
@@ -630,7 +658,7 @@
ctx,
b.cdb,
nodeID,
b.goldenATXID,
b.conf.GoldenATXID,
b.validator,
b.log,
VerifyChainOpts.AssumeValidBefore(time.Now().Add(-b.postValidityDelay)),
@@ -639,7 +667,7 @@
)
if errors.Is(err, sql.ErrNotFound) {
b.log.Info("using golden atx as positioning atx")
return b.goldenATXID, nil
return b.conf.GoldenATXID, nil
}
return id, err
}
50 changes: 46 additions & 4 deletions activation/activation_multi_test.go
@@ -222,15 +222,33 @@ func TestRegossip(t *testing.T) {

func Test_Builder_Multi_InitialPost(t *testing.T) {
tab := newTestBuilder(t, 5, WithPoetConfig(PoetConfig{PhaseShift: layerDuration * 4}))

var eg errgroup.Group
for _, sig := range tab.signers {
sig := sig
eg.Go(func() error {
numUnits := uint32(12)

post := &types.Post{
Indices: types.RandomBytes(10),
Nonce: rand.Uint32(),
Pow: rand.Uint64(),
}
meta := &types.PostMetadata{
Challenge: shared.ZeroChallenge,
LabelsPerUnit: tab.conf.LabelsPerUnit,
}

commitmentATX := types.RandomATXID()
tab.mValidator.EXPECT().Post(gomock.Any(), sig.NodeID(), commitmentATX, post, meta, numUnits).Return(nil)
tab.mnipost.EXPECT().Proof(gomock.Any(), sig.NodeID(), shared.ZeroChallenge).Return(
&types.Post{Indices: make([]byte, 10)},
post,
&types.PostInfo{
CommitmentATX: types.RandomATXID(),
CommitmentATX: commitmentATX,
Nonce: new(types.VRFPostIndex),
NumUnits: numUnits,
NodeID: sig.NodeID(),
LabelsPerUnit: tab.conf.LabelsPerUnit,
},
nil,
)
@@ -249,7 +267,6 @@ func Test_Builder_Multi_InitialPost(t *testing.T) {
func Test_Builder_Multi_HappyPath(t *testing.T) {
layerDuration := 2 * time.Second
tab := newTestBuilder(t, 3, WithPoetConfig(PoetConfig{PhaseShift: layerDuration * 4, CycleGap: layerDuration}))
tab.regossipInterval = 0 // disable regossip for testing

// step 1: build initial posts
initialPostChan := make(chan struct{})
@@ -264,12 +281,23 @@ func Test_Builder_Multi_HappyPath(t *testing.T) {
Nonce: rand.Uint32(),
Pow: rand.Uint64(),

NumUnits: 4,
NumUnits: uint32(12),
CommitmentATX: types.RandomATXID(),
VRFNonce: types.VRFPostIndex(rand.Uint64()),
}
initialPost[sig.NodeID()] = &nipost

post := &types.Post{
Indices: nipost.Indices,
Nonce: nipost.Nonce,
Pow: nipost.Pow,
}
meta := &types.PostMetadata{
Challenge: shared.ZeroChallenge,
LabelsPerUnit: tab.conf.LabelsPerUnit,
}
tab.mValidator.EXPECT().Post(gomock.Any(), sig.NodeID(), nipost.CommitmentATX, post, meta, nipost.NumUnits).
Return(nil)
tab.mnipost.EXPECT().Proof(gomock.Any(), sig.NodeID(), shared.ZeroChallenge).DoAndReturn(
func(ctx context.Context, _ types.NodeID, _ []byte) (*types.Post, *types.PostInfo, error) {
<-initialPostChan
@@ -283,6 +311,7 @@ func Test_Builder_Multi_HappyPath(t *testing.T) {
NumUnits: nipost.NumUnits,
CommitmentATX: nipost.CommitmentATX,
Nonce: &nipost.VRFNonce,
LabelsPerUnit: tab.conf.LabelsPerUnit,
}

return post, postInfo, nil
@@ -315,6 +344,19 @@ func Test_Builder_Multi_HappyPath(t *testing.T) {
return postGenesisEpoch.FirstLayer() + 1
},
)

nipost := initialPost[sig.NodeID()]
post := &types.Post{
Indices: nipost.Indices,
Nonce: nipost.Nonce,
Pow: nipost.Pow,
}
meta := &types.PostMetadata{
Challenge: shared.ZeroChallenge,
LabelsPerUnit: tab.conf.LabelsPerUnit,
}
tab.mValidator.EXPECT().Post(gomock.Any(), sig.NodeID(), nipost.CommitmentATX, post, meta, nipost.NumUnits).
Return(nil)
}

// step 3: create ATX