
Investigate /v2/blocks high memory usage #1395

Closed
Eric-Warehime opened this issue Dec 28, 2022 · 8 comments
Labels: new-bug (Bug report that needs triage), Team Lamprey

Comments

Eric-Warehime (Contributor) commented Dec 28, 2022

The /v2/blocks endpoint (as of Indexer v2.15.0) currently returns an error when the total number of transactions in the returned block exceeds MaxTransactionsLimit. See here for the implementation.

This becomes a problem when there are a large number of inner transactions since the returned data recursively reconstructs transactions. By that I mean that a root txn will have inner txns nested inside of it, and an inner txn will have a copy of the root txn which also has inner txns.

The result is that not only does the memory usage scale with the number of transactions in a block, but also with the number of inner transactions.
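
To make the shape of the problem concrete, here is a minimal, hypothetical sketch (simplified types, not the Indexer's actual model) of a reconstruction in which every inner txn carries a copy of its fully expanded root:

package main

import "fmt"

// txn is a simplified stand-in for a reconstructed transaction.
type txn struct {
	ID        string
	InnerTxns []txn
	RootTxn   *txn // inner txns reference a copy of their root
}

// expand builds the rows produced for one root txn with numInner inner txns.
func expand(rootID string, numInner int) []txn {
	root := txn{ID: rootID}
	for i := 0; i < numInner; i++ {
		root.InnerTxns = append(root.InnerTxns, txn{ID: fmt.Sprintf("%s/%d", rootID, i)})
	}

	rows := []txn{root}
	for _, inner := range root.InnerTxns {
		// Each inner txn gets its own copy of the root, and that copy still
		// holds all numInner inner txns, so a single root materializes on the
		// order of numInner*numInner txn values.
		rootCopy := root
		rootCopy.InnerTxns = append([]txn(nil), root.InnerTxns...)
		inner.RootTxn = &rootCopy
		rows = append(rows, inner)
	}
	return rows
}

func main() {
	fmt.Println("rows for one root txn:", len(expand("root", 200)))
}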

Based on benchmarking, a MaxTransactionsLimit of 1000 will consume ~3GB of memory if the transactions in the block each have 200 inner transactions.

Desired Result:

Identify and implement a solution to limit memory consumption on blocks containing large numbers of inner transactions.

Eric-Warehime added the new-bug (Bug report that needs triage) and Team Lamprey labels on Dec 28, 2022
Eric-Warehime (Contributor, Author) commented:

Benchmark results

BenchmarkBlockTransactionsLimit//v2/blocks_txns:_10_innerTxns:_0-10         	     333	   3558479 ns/op	  132312 B/op	     380 allocs/op
BenchmarkBlockTransactionsLimit//v2/blocks_txns:_100_innerTxns:_0-10        	     153	   7806742 ns/op	 1277780 B/op	    3376 allocs/op
BenchmarkBlockTransactionsLimit//v2/blocks_txns:_1000_innerTxns:_0-10       	      24	  46342839 ns/op	12674837 B/op	   33306 allocs/op
BenchmarkBlockTransactionsLimit//v2/blocks_txns:_10_innerTxns:_10-10        	     139	   8603460 ns/op	 1688362 B/op	    1357 allocs/op
BenchmarkBlockTransactionsLimit//v2/blocks_txns:_100_innerTxns:_100-10      	       3	 412493597 ns/op	128043600 B/op	  103353 allocs/op
BenchmarkBlockTransactionsLimit//v2/blocks_txns:_1000_innerTxns:_200-10     	       1	8535142250 ns/op	2976411216 B/op	 2034852 allocs/op
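
For reference, output in this shape comes from go test with the -bench and -benchmem flags; a command along these lines (the package path is an assumption) would reproduce it:

go test -run='^$' -bench=BenchmarkBlockTransactionsLimit -benchmem ./api/...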

Eric-Warehime (Contributor, Author) commented:

WIP PR with the benchmarks used to generate the results above: #1396

jhawk28 commented Jan 6, 2023

Over the last 2 days, our memory usage has continued to increase even though the level of traffic has stayed consistent. This makes it feel like there is a memory leak. It would be useful to have a local pprof HTTP endpoint to be able to grab a dump of the heap.

Eric-Warehime (Contributor, Author) commented:

You can use the --cpuprofile flag to write a pprof CPU profile to disk via pprof.StartCPUProfile(profFile): https://github.com/algorand/indexer/blob/develop/cmd/algorand-indexer/daemon.go#L278

Are you saying you'd also like to be able to get a heap profile (https://pkg.go.dev/runtime/pprof#WriteHeapProfile) at any given time via an HTTP endpoint?
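
For context, a heap profile can be written to disk on demand with runtime/pprof; a minimal sketch (hypothetical helper, not existing Indexer functionality):

package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap profile to the given file.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Run a GC first so the profile reflects live objects rather than
	// garbage that has not yet been collected.
	runtime.GC()
	return pprof.WriteHeapProfile(f)
}

func main() {
	if err := writeHeapProfile("indexer.heap"); err != nil {
		log.Fatal(err)
	}
}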

Also, do you have more info on the memory usage problems you're seeing? The issues we've been investigating are related to queries using large amounts of memory when deserializing blocks from pgsql which have large numbers of inner txns. We're not currently aware of any places this would leak objects, so if you have any info on that we'd be interested to take a look.

Eric-Warehime (Contributor, Author) commented:

Also for reference, we've got a couple of fixes in the release pipeline for the memory usage in /v2/blocks/ (#1397) and /v2/transactions/ (#1402), in case anyone is curious about the fixes being made.

jhawk28 commented Jan 6, 2023

Yes, we are currently tracking the memory usage due to the /v2/blocks/ issue. Adding a runtime way to get a heap dump would allow us to see if there are any unknown issues over time.

This is what the code could look like to add a pprof endpoint:

// Assumes "net/http", "net/http/pprof", and "os" are imported, and that log
// is the indexer's logrus logger. DEBUG_PPROF_ADDR is a hypothetical name.
addr := os.Getenv("DEBUG_PPROF_ADDR") // e.g. 127.0.0.1:7777
if addr != "" {
	go func() {
		mux := http.NewServeMux()
		mux.HandleFunc("/debug/pprof/", pprof.Index)
		mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
		mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
		mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
		mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

		server := &http.Server{
			Addr:    addr,
			Handler: mux,
		}

		err := server.ListenAndServe()
		if err != nil {
			log.WithFields(log.Fields{
				"error": err,
			}).Error("running pprof server failed")
		}
	}()
}

The heap could then be retrieved with a command like this:

curl -o indexer.heap http://localhost:7777/debug/pprof/heap
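
The resulting dump can then be inspected with the standard Go tooling, for example:

go tool pprof -top indexer.heap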

algoanne (Contributor) commented May 5, 2023

@Eric-Warehime was this fixed by the PRs above, or is there still some concern in this area?

Eric-Warehime (Contributor, Author) commented:

This has been resolved by the PRs mentioned above.
