exp/ingest: Ingest Session #1456

bartekn · 2019-06-26T19:05:02Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
[x] This PR avoids mixing refactoring changes with feature changes (split into two PRs otherwise).
This PR's title starts with name of package that is most changed in the PR, ex. services/friendbot

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md files, etc.) affected by this change. Take a look in the docs folder for a given service, like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if
needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.

Summary

This PR adds two Session implementations (LiveSession and SingleLedgerSession) and a simple horizon-demo tool.

Goal and scope

The goal of this PR is to implement the last missing component of exp/ingest, the one that connects all the existing packages together: Session. A Session connects to history archives and/or a ledger backend and passes data to one or two pipelines (state pipeline and ledger pipeline).

Each Session supports one of the ways developers can interact with the Stellar ledger. For example, LiveSession initializes the state and then follows new ledgers and processes their transactions (it runs indefinitely). In contrast, SingleLedgerSession processes the state of a single ledger and terminates. More sessions will be added in the future (ex. RangeSession that processes data between two ledgers).

The PR also contains a simple demo app called horizon-demo (go run ./exp/tools/horizon-demo) that uses LiveSession internally. horizon-demo reads data from history archives and the ledger backend and 1) updates accounts for signers, 2) inserts transactions into a database and 3) updates an in-memory orderbook graph.

Close #1310.

Summary of changes

Added Session implementations and horizon-demo app.
Added Reset() method to pipeline processors. It's required to reset internal state of processors when a pipeline is used again (ex. when new ledger is closed). I'm still not sure if this is the best approach - read in the next section.
Updated support/pipeline.Pipeline to be reusable (previously it was a single use only object) and added Shutdown() method.
Added a few common processors to ingest/processors (CSVPrinter is an interesting example because it's possible to use it in both: state and ledger pipelines - it implements both interfaces).
Updated support/pipeline with pre- and post-processing hooks that are useful for opening and committing database transactions but also for applying changes to other structures (like the orderbook graph). Maybe it's a solution for #1459 (Find a way for reporting status of pipeline and session)?

Known limitations & issues

I'm still not sure if the current approach to resetting pipeline is good. It's elegant but I'm afraid that developers may implement Reset() incorrectly when creating a new processor. The alternative is to create something like a PipelineFactory that would create a completely new pipeline with default struct values.
The adapters package will likely be removed. It doesn't add any value and acts as an unnecessary wrapper around readers.
No docs or tests yet. They will be added in another PR.
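A minimal sketch of the two approaches weighed above, reusing one processor via Reset() versus building a fresh one through a factory. The Processor interface, countingProcessor and PipelineFactory here are hypothetical stand-ins for illustration, not the types in support/pipeline.

```go
package main

import "fmt"

// Processor is a hypothetical, minimal processor interface.
type Processor interface {
	Process(entry string)
	Reset()
	Count() int
}

// countingProcessor keeps per-run state that must not leak between runs.
type countingProcessor struct {
	seen int
}

func (p *countingProcessor) Process(entry string) { p.seen++ }
func (p *countingProcessor) Reset()               { p.seen = 0 }
func (p *countingProcessor) Count() int           { return p.seen }

// PipelineFactory (hypothetical) returns a processor with default struct values.
type PipelineFactory func() Processor

// runWithReset reuses the same processor; correctness depends on Reset()
// clearing every field.
func runWithReset(p Processor, entries []string) int {
	p.Reset()
	for _, e := range entries {
		p.Process(e)
	}
	return p.Count()
}

// runWithFactory builds a new processor per run, so there is nothing to reset.
func runWithFactory(f PipelineFactory, entries []string) int {
	p := f()
	for _, e := range entries {
		p.Process(e)
	}
	return p.Count()
}

func main() {
	entries := []string{"a", "b"}
	shared := &countingProcessor{}
	factory := func() Processor { return &countingProcessor{} }

	fmt.Println(runWithReset(shared, entries), runWithReset(shared, entries))
	fmt.Println(runWithFactory(factory, entries), runWithFactory(factory, entries))
}
```

With the reset approach, a forgotten field in Reset() silently carries state into the next ledger; the factory avoids that at the cost of rebuilding the processors on every run.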

What shouldn't be reviewed

Please ignore changes to adapters package. It's clear to me right now that we will remove and refactor it (#1405).

bartekn · 2019-06-26T19:18:41Z

exp/ingest/io/ledger_read_closer.go

@@ -92,7 +91,7 @@ func (dblrc *DBLedgerReadCloser) init() error {
 		return errors.Wrap(err, "error reading ledger from backend")
 	}
 	if !exists {
-		return errors.Wrap(err, "ledger was not found")
+		return ErrNotFound
 	}


Interesting note: it was actually always returning nil before that change. err in line 95 was always nil, and when nil is passed to errors.Wrap it returns nil as well. I wonder if this is checked by staticcheck, because I found a very similar instance of this bug earlier this week (#1443 (comment)). If not, it would be great to add a rule that catches this.

oh wow, that's nasty. Very interesting

Added it here if you're interested: dominikh/go-tools#529

tamirms · 2019-06-27T16:33:58Z

exp/ingest/io/ledger_read_closer.go

 		sequence: sequence,
 		backend:  backend,
 	}
+
+	var err error
+	reader.initOnce.Do(func() { err = reader.init() })


Related to #1433. Because reader is created a few lines above, it seems strange to initialize it using initOnce.Do(). Fixing this might be outside the scope of this PR though.

Yes, we'll decide on #1433 next week and change all the instances in another PR.

tamirms · 2019-06-27T16:34:38Z

exp/ingest/live_session.go

+	"github.com/stellar/go/support/errors"
+)
+
+var _ Session = &LiveSession{}


why is this necessary?

It's to ensure interface implementation. It ensures that LiveSession implements all methods of Session interface. Helpful if new methods are added to the interface (compilation will fail).
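A minimal sketch of the compile-time assertion idiom explained above. The Session interface is trimmed to two methods and LiveSession is reduced to a single field for illustration; neither matches the real definitions in exp/ingest.

```go
package main

import "fmt"

// Session is a trimmed-down version of the interface, for illustration only.
type Session interface {
	Run() error
	GetLatestProcessedLedger() uint32
}

// LiveSession is a reduced stand-in for the real struct.
type LiveSession struct {
	latestProcessedLedger uint32
}

func (s *LiveSession) Run() error                       { return nil }
func (s *LiveSession) GetLatestProcessedLedger() uint32 { return s.latestProcessedLedger }

// Compile-time assertion: the assignment is type-checked but the value is
// discarded. If a method is later added to Session and not to *LiveSession,
// the package stops compiling at this line.
var _ Session = &LiveSession{}

func main() {
	var s Session = &LiveSession{latestProcessedLedger: 7}
	fmt.Println(s.GetLatestProcessedLedger())
}
```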

tamirms · 2019-06-27T16:36:33Z

exp/ingest/live_session.go

+		return errors.Wrap(err, "Error getting the latest ledger sequence")
+	}
+
+	fmt.Printf("Initializing state for ledger=%d\n", s.currentLedger)


when is it appropriate to use the log package vs fmt.Printf ?

You're right this should be removed. Created this issue: #1459

tamirms · 2019-06-27T16:38:13Z

exp/ingest/live_session.go

+
+	historyAdapter := adapters.MakeHistoryArchiveAdapter(s.Archive)
+
+	s.currentLedger, err = historyAdapter.GetLatestLedgerSequence()


can this be a local variable instead of a member variable of LiveSession?

tamirms · 2019-06-27T16:48:18Z

exp/ingest/live_session.go

+				continue
+			}
+
+			return errors.Wrap(err, "Error getting ledger")


it seems like once you run into an error the session is kind of destroyed. there's no way to retry from the last successfully processed ledger, right? I think there might be cases where we encounter transient errors and it would be nice to support different retry strategies.

I don't know if this issue should be addressed in this PR but just wanted to leave a note

Created an issue here: #1473.

tamirms · 2019-06-27T16:50:59Z

exp/ingest/live_session.go

+	return nil
+}
+
+func (s *LiveSession) SetStatePipeline(p *pipeline.StatePipeline) {


it seems like the state pipeline and ledger pipeline should be constructor / struct parameters since we cannot change the pipelines once the session has started running
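A sketch of the constructor-based alternative suggested here. NewLiveSession and the placeholder pipeline types are hypothetical, and requiring both pipelines is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder pipeline types standing in for pipeline.StatePipeline and
// pipeline.LedgerPipeline.
type StatePipeline struct{}
type LedgerPipeline struct{}

// LiveSession keeps its pipelines unexported, so they cannot be swapped
// after construction.
type LiveSession struct {
	statePipeline  *StatePipeline
	ledgerPipeline *LedgerPipeline
}

// NewLiveSession (hypothetical) takes the pipelines up front and rejects a
// session that could never run.
func NewLiveSession(sp *StatePipeline, lp *LedgerPipeline) (*LiveSession, error) {
	if sp == nil || lp == nil {
		return nil, errors.New("state and ledger pipelines are required")
	}
	return &LiveSession{statePipeline: sp, ledgerPipeline: lp}, nil
}

func main() {
	_, err := NewLiveSession(nil, nil)
	fmt.Println(err)

	s, err := NewLiveSession(&StatePipeline{}, &LedgerPipeline{})
	fmt.Println(s != nil, err)
}
```

Compared with SetStatePipeline, this removes the window in which a session exists without a pipeline, and there is no setter to call by mistake while the session is running.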

tamirms · 2019-06-27T16:53:24Z

exp/ingest/main.go

+	Archive       historyarchive.ArchiveInterface
+	LedgerBackend ledgerbackend.LedgerBackend
+
+	// mutex is used to make sure queries across many stores are persistent


I think these 2 lines can be deleted

tamirms · 2019-06-27T16:56:41Z

exp/ingest/main.go

+	Archive        *historyarchive.Archive
+	LedgerSequence uint32
+
+	statePipeline *pipeline.StatePipeline
 }

 type Session interface {


is this interface necessary? I don't think I've seen it used anywhere

It's not used internally but packages that are using ingest can make use of it.

The Golang maintainers have the following opinion on defining interfaces:

"Do not define interfaces before they are used: without a realistic example of usage, it is too difficult to see whether an interface is even necessary, let alone what methods it ought to contain."

From https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Also, the fact that SingleLedgerSession does not support Resume(ledgerSequence uint32) error is another reason which makes me think the interface is not necessary.

That being said, I will defer to your judgement on whether it makes sense to keep the interface

tamirms · 2019-06-27T17:30:56Z

exp/ingest/single_ledger_session.go

+	return nil
+}
+
+func (s *SingleLedgerSession) SetStatePipeline(p *pipeline.StatePipeline) {


I think statePipeline should be a struct / constructor parameter for the same reasons I mentioned in the LiveSession comment

ire-and-curses

Just minor doc comments. It looks good to me.

Regarding Reset: I agree it seems it could be implemented incorrectly if a developer is not considering the reuse scenario for the pipeline. Maybe stick with this for now since it's done, and we can think about it as we finish the demos? I quite like the idea of the factory since it guarantees a clean set-up on each pass.

ire-and-curses · 2019-06-27T16:12:57Z

exp/ingest/io/ledger_read_closer.go

@@ -92,7 +91,7 @@ func (dblrc *DBLedgerReadCloser) init() error {
 		return errors.Wrap(err, "error reading ledger from backend")
 	}
 	if !exists {
-		return errors.Wrap(err, "ledger was not found")
+		return ErrNotFound
 	}


oh wow, that's nasty. Very interesting

ire-and-curses · 2019-06-27T16:14:23Z

exp/ingest/io/main.go

 	"github.com/stellar/go/xdr"
 )

+var ErrNotFound = errors.New("Not found")


nit: should lowercase not found

ire-and-curses · 2019-06-27T16:23:48Z

exp/ingest/main.go

+}
+
+// LiveSession initializes the ledger state using `Archive` and `StatePipeline`,
+// then starts processing ledger data using `ledgerbackend`.


Comment might be clearer as:

// LiveSession initializes the ledger state using `Archive` and `statePipeline`,
// then starts processing ledger data using `LedgerBackend` and `ledgerPipeline`.

ire-and-curses · 2019-06-27T16:24:57Z

exp/ingest/main.go

+	LedgerBackend ledgerbackend.LedgerBackend
+
+	// mutex is used to make sure queries across many stores are persistent
+	// mutex         sync.RWMutex


ire-and-curses · 2019-06-27T16:26:26Z

exp/ingest/main.go

+	currentLedger uint32
+}
+
+// SingleLedgerSession initializes the ledger state using `Archive` and `StatePipeline`


Suggested change

// SingleLedgerSession initializes the ledger state using `Archive` and `StatePipeline`

// SingleLedgerSession initializes the ledger state using `Archive` and `statePipeline`

ire-and-curses · 2019-06-27T16:29:34Z

exp/ingest/main.go

+	Archive        *historyarchive.Archive
+	LedgerSequence uint32
+
+	statePipeline *pipeline.StatePipeline
 }

 type Session interface {


docstring for this interface would be helpful

exp/ingest/processors/doc.go

ire-and-curses · 2019-06-27T17:45:13Z

exp/ingest/processors/csv_printer.go

+	return os.Create(p.Filename)
+}
+
+func (p *CSVPrinter) ProcessState(ctx context.Context, store *pipeline.Store, r io.StateReadCloser, w io.StateWriteCloser) error {


Fine for now but we should add docstrings before general release

Co-Authored-By: Eric Saunders <[email protected]>

bartekn · 2019-07-03T16:10:47Z

@ire-and-curses @tamirms you can take a look again. Added a few improvements and horizon-demo is fully functional now.

tamirms · 2019-07-03T16:29:15Z

exp/ingest/io/ledger_transaction.go

+
+	for _, operationMeta := range t.Meta.OperationsMeta() {
+		ledgerEntryChanges := operationMeta.Changes
+		for i := 0; i < len(ledgerEntryChanges); i++ {


could you use a range for loop here?

for i, entryChange := range ledgerEntryChanges {

tamirms · 2019-07-03T16:32:54Z

exp/ingest/main.go

+	// Session user to determine what was the last ledger processed by a
+	// Session as it's stateless (or if Run() should be called first).
+	Resume(ledgerSequence uint32) error
+	GetLatestProcessedLedger() uint32


missing doc string here

tamirms · 2019-07-03T16:47:20Z

exp/tools/horizon-demo/pipelines.go

+		wg.Add(2)
+
+		go func() {
+			err = orderBookGraph.Apply()


why apply the updates here instead of applying the updates at the end of ProcessState() and ProcessLedger() ?

It's to make sure that state (db state and graph state) is consistent (at the same ledger). If we were applying and committing in the pipeline it's possible that for a short time data in both stores (db and memory) would represent state of 2 different ledgers.

I think a lot of this logic should be encapsulated in DatabaseProcessor and OrderbookProcessor. DatabaseProcessor could define Begin() and Commit() functions which are then called in the pre / post processing hooks. Similarly, OrderbookProcessor could define an Apply() function which would be called in the post processing hook.

In general I agree but I think the problem is that we have two DatabaseProcessor instances: one for inserting transactions, the other for updating signers. Which one would we use to call Begin() and Commit()? We can call Begin() only once, so it would be confusing if it were called on a single processor. That's why I decided to call it on Database directly.
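A sketch of the consistency argument in this thread: processors only stage their changes, and a post-processing hook commits the database and applies the graph together, so both stores move to the new ledger at the same point. Pipeline, store and ingestLedger are hypothetical simplifications; AddPostProcessingHook is assumed to mirror the AddPreProcessingHook signature shown in the PR.

```go
package main

import (
	"context"
	"fmt"
)

type hook func(context.Context) error

// Pipeline is a hypothetical, single-step pipeline with hooks.
type Pipeline struct {
	pre, post []hook
	process   func(context.Context) error
}

func (p *Pipeline) AddPreProcessingHook(h hook)  { p.pre = append(p.pre, h) }
func (p *Pipeline) AddPostProcessingHook(h hook) { p.post = append(p.post, h) }

// Run calls the pre hooks, then the processing step, then the post hooks,
// stopping at the first error.
func (p *Pipeline) Run(ctx context.Context) error {
	for _, h := range p.pre {
		if err := h(ctx); err != nil {
			return err
		}
	}
	if err := p.process(ctx); err != nil {
		return err
	}
	for _, h := range p.post {
		if err := h(ctx); err != nil {
			return err
		}
	}
	return nil
}

// store stands in for both the database and the in-memory graph: staged
// changes are invisible to readers until commit.
type store struct {
	committed uint32
	staged    uint32
}

func (s *store) begin()              { s.staged = s.committed }
func (s *store) stage(ledger uint32) { s.staged = ledger }
func (s *store) commit()             { s.committed = s.staged }

// ingestLedger stages changes during processing and commits both stores
// in a single post-processing hook.
func ingestLedger(db, graph *store, ledger uint32) error {
	p := &Pipeline{process: func(context.Context) error {
		db.stage(ledger)
		graph.stage(ledger)
		return nil
	}}
	p.AddPreProcessingHook(func(context.Context) error {
		db.begin()
		graph.begin()
		return nil
	})
	p.AddPostProcessingHook(func(context.Context) error {
		db.commit()
		graph.commit()
		return nil
	})
	return p.Run(context.Background())
}

func main() {
	db, graph := &store{}, &store{}
	if err := ingestLedger(db, graph, 100); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(db.committed, graph.committed)
}
```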

tamirms · 2019-07-03T16:52:30Z

exp/ingest/io/ledger_transaction.go

+					Post: &created.Data,
+				})
+			case xdr.LedgerEntryChangeTypeLedgerEntryUpdated:
+				state := ledgerEntryChanges[i-1].MustState()


I'm not familiar with the operations meta data format. Could it be possible that ledgerEntryChanges[0] has type xdr.LedgerEntryChangeTypeLedgerEntryUpdated or xdr.LedgerEntryChangeTypeLedgerEntryRemoved ? if that is a possibility then ledgerEntryChanges[i-1] would crash

It has specific format and I checked it in stellar-core. The algorithm is:

If there is an existing entry:

Insert STATE.

Insert UPDATED or REMOVED.

Otherwise insert CREATED.

However, I will confirm it with the core team to be sure.

does that mean LedgerEntryChanges is always constructed such that every UPDATED and REMOVED entry is always preceded with a STATE which represents the state prior to the update / removal?

Yes, it results from the algorithm above. However, I will confirm it with the core team next week.
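A sketch of how the pairing described in this thread could be done with a range loop and an explicit guard, so that a malformed list yields an error instead of an out-of-range panic. EntryChange, Change and pairChanges are simplified, hypothetical types and names, not the xdr types used in ledger_transaction.go.

```go
package main

import (
	"errors"
	"fmt"
)

type ChangeType int

const (
	Created ChangeType = iota
	Updated
	Removed
	State
)

// EntryChange is a simplified stand-in for an xdr ledger entry change.
type EntryChange struct {
	Type ChangeType
	Data string
}

// Change pairs the entry before (Pre) and after (Post) a modification.
// Pre is nil for created entries, Post is nil for removed ones.
type Change struct {
	Pre, Post *string
}

// pairChanges relies on the invariant that UPDATED and REMOVED are always
// preceded by STATE, and returns an error when the invariant does not hold.
func pairChanges(entries []EntryChange) ([]Change, error) {
	var out []Change
	for i, e := range entries {
		switch e.Type {
		case Created:
			post := e.Data
			out = append(out, Change{Post: &post})
		case Updated, Removed:
			if i == 0 || entries[i-1].Type != State {
				return nil, errors.New("UPDATED/REMOVED entry not preceded by STATE")
			}
			pre := entries[i-1].Data
			c := Change{Pre: &pre}
			if e.Type == Updated {
				post := e.Data
				c.Post = &post
			}
			out = append(out, c)
		case State:
			// Consumed by the UPDATED or REMOVED entry that follows.
		}
	}
	return out, nil
}

func main() {
	changes, err := pairChanges([]EntryChange{
		{State, "balance=10"}, {Updated, "balance=7"}, {Created, "offer=1"},
	})
	fmt.Println(len(changes), err)

	_, err = pairChanges([]EntryChange{{Updated, "balance=7"}})
	fmt.Println(err)
}
```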

tamirms · 2019-07-03T18:49:36Z

exp/tools/horizon-demo/orderbook_processor.go

+}
+
+func (p *OrderbookProcessor) IsConcurrent() bool {
+	return true


I don't think this processor should be concurrent because it is important that the offers be added to the graph in order. I think if the processor is concurrent then that means ledger and state entries could be processed out of order

Is it really important that offers must be added in order? For initial processing it doesn't matter. For ledger/transactions processing I think it also doesn't matter because graph is locked for reading during updates. Can you elaborate?

what if someone creates an offer and then later decides to remove the offer? or if someone creates an offer and then the offer is updated because the offer is partially consumed? it's important that those events are processed in order because if you try to remove an offer which doesn't exist the code will panic

OK, you're totally right!
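A small sketch of why ordering matters here. The graph and op types are hypothetical, and removing a missing offer returns an error rather than panicking, to keep the example runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// op is a hypothetical orderbook update: add or remove an offer by ID.
type op struct {
	remove  bool
	offerID int64
}

// graph is a toy stand-in for the orderbook graph.
type graph struct {
	offers map[int64]bool
}

// apply adds or removes an offer; removing an unknown offer is an error.
func (g *graph) apply(o op) error {
	if o.remove {
		if !g.offers[o.offerID] {
			return errors.New("offer does not exist")
		}
		delete(g.offers, o.offerID)
		return nil
	}
	g.offers[o.offerID] = true
	return nil
}

// applyAll applies ops in slice order to a fresh graph.
func applyAll(ops []op) error {
	g := &graph{offers: map[int64]bool{}}
	for _, o := range ops {
		if err := g.apply(o); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	create := op{offerID: 1}
	remove := op{remove: true, offerID: 1}

	fmt.Println(applyAll([]op{create, remove})) // ledger order
	fmt.Println(applyAll([]op{remove, create})) // reordered, as a concurrent processor might
}
```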

tamirms · 2019-07-03T19:01:32Z

exp/support/pipeline/main.go

+// in structs that embed Pipeline.
+type PipelineInterface interface {
+	SetRoot(rootProcessor *PipelineNode)
+	AddPreProcessingHook(hook func(context.Context) error)


could Reset() be implemented as a pre-processing hook? how are those two operations different?

You mean internal reset() function is always added to preProcessingHooks when pipeline is created? I think we can do it, but when we decide on #1433. If someone creates new pipeline not via constructor, then the reset hook won't be added.

bartekn · 2019-07-05T14:38:58Z

Added some updates connected to feedback. PTAL!

tamirms · 2019-07-05T15:20:57Z

@bartekn it looks good! the only remaining comments I have is that the mutex in orderBookBatchedUpdates is no longer necessary because the orderbook processor is not concurrent. Also, I think it would be better if updates could only be applied in one way. My suggestion is to either

get rid of the AddOffer, RemoveOffer, and Apply methods on OrderBookGraph. Instead, you can create the batch and apply it in the pipeline hooks similar to what you do with the database transaction for the DatabaseProcessor instances

Or

get rid of the BatchedUpdates interface entirely so that you cannot create multiple batches. The only way to apply updates would be to call the AddOffer, RemoveOffer, and Apply methods on OrderBookGraph.

tamirms · 2019-07-05T16:02:43Z

exp/orderbook/batch.go

-	}
-
+// removeOffer will queue an operation to remove the given offer from the order book
+func (tx *orderBookBatchedUpdates) RemoveOffer(offerID xdr.Int64) *orderBookBatchedUpdates {


should be removeOffer

bartekn force-pushed the ingest-session branch from 964f12b to 4b26044 Compare June 26, 2019 19:06

bartekn marked this pull request as ready for review June 26, 2019 19:06

bartekn requested review from tomquisel, ire-and-curses and tamirms June 26, 2019 19:14

bartekn commented Jun 26, 2019

tamirms reviewed Jun 27, 2019

ire-and-curses approved these changes Jun 27, 2019

bartekn and others added 11 commits July 3, 2019 13:50

Ingest session

fa32ff8

Updates

84bb499

Remove RequiresInput()

80f1bdb

Fix tests

cdfddf6

Update root check

3639528

Update exp/ingest/processors/doc.go

1eb61e3

Co-Authored-By: Eric Saunders <[email protected]>

Remove unused

7ce506c

Changes

d378406

Updates

b969d9c

Fix example code

c9e7fc7

Orderbook and bugfixes

2371386

bartekn force-pushed the ingest-session branch from a04b5bf to 2371386 Compare July 3, 2019 14:42

Fix bugs

5d98b52

bartekn mentioned this pull request Jul 3, 2019

Find a way for reporting status of pipeline and session #1459

Closed

bartekn mentioned this pull request Jul 3, 2019

Accounts for Signer endpoint in Horizon using new ingestion system #1472

Closed

11 tasks

tamirms reviewed Jul 3, 2019

Fixes and important pipeline tests

c06d856

tamirms reviewed Jul 3, 2019

Fix tests

ad63703

bartekn mentioned this pull request Jul 4, 2019

exp/ingest: avoid sync.once when a constructor would suffice #1433

Closed

Updates

10ac842

Changes to orderBookBatchedUpdates

709ad99

tamirms reviewed Jul 5, 2019

removeOffer

90440d4

tamirms approved these changes Jul 5, 2019

Fix test

6c7cb38

bartekn merged commit 8cd5bd9 into master Jul 5, 2019

bartekn deleted the ingest-session branch July 5, 2019 16:15

