Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: merge the partition implementation of s3stream into new trunk branch of automq kafka #879

Merged
merged 285 commits into from
Mar 5, 2024

Conversation

daniel-y
Copy link
Contributor

@daniel-y daniel-y commented Mar 5, 2024

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

TheR1sing3un and others added 30 commits August 29, 2023 11:05
1. add commit wal object test

Signed-off-by: TheR1sing3un <[email protected]>
1. still commit valid stream in wal object

Signed-off-by: TheR1sing3un <[email protected]>
1. replay WalObjectRecord to advance range's end offset

Signed-off-by: TheR1sing3un <[email protected]>
feat(s3): advance range's end offset by WalObjectRecord
1. support new version stream open/close operations

Signed-off-by: TheR1sing3un <[email protected]>
1. pass checkstyle

Signed-off-by: TheR1sing3un <[email protected]>
feat(s3): support new version stream open/close operations
1. support get streams' offset range
2. remove useless
CompactObject
related request/response
3. refactor WalObjectRequest to
contain
compacted objects and stream objects
4. increase all stream
related request/response apiKey

Signed-off-by: TheR1sing3un <[email protected]>
feat(s3): support get streams' offset range
1. complete wal object commit process in controller

Signed-off-by: TheR1sing3un <[email protected]>
feat(s3): complete wal object commit process in controller
1. replay new added state in StreamImage

Signed-off-by: TheR1sing3un <[email protected]>
feat(s3): replay new added state in StreamImage
1. boostrap with controller-metadata-manager

Signed-off-by: TheR1sing3un <[email protected]>
* fix: fix checkstyle and add ci flow for checkstyle and spotbugs (#106)

ci(s3): add checkstyle and spotbugs in ci

1. add checkstyle and spotbugs in ci
2. fix to pass checkstyle

Signed-off-by: TheR1sing3un <[email protected]>

* fix: fix fetching problem brought by thread pool separating; fix style problems; add more thread for fetching and appending thread pool (#107)

* fix: fix fetching problem brought by thread pool separating; fix style problems; add more thread for fetching and appending thread pool

Signed-off-by: Curtis Wan <[email protected]>

* remove 'SLOW_FETCH_TIMEOUT_MILLIS - 1' case in quickFetch test and change 2 -> 'SLOW_FETCH_TIMEOUT_MILLIS / 2' in slowFetch test

Signed-off-by: Curtis Wan <[email protected]>

* refactor: add more comments to make logic more clear

Signed-off-by: Curtis Wan <[email protected]>

---------

Signed-off-by: Curtis Wan <[email protected]>

* refactor: close #105; add more threads for partition opening or closing (#109)

Signed-off-by: Curtis Wan <[email protected]>

* feat(es): client factory SPI (#112)

Signed-off-by: Robin Han <[email protected]>

* fix: close #108; make sure topics are all cleaned at the end of the test (#115)

Signed-off-by: Curtis Wan <[email protected]>

* fix: close #110; shutdown additional thread pools right now when broker is in shutdown (#111)

* fix: close #110; shutdown additional thread pools right now when broker is in shutdown

Signed-off-by: Curtis Wan <[email protected]>

* fix: move thread pools into AlwaysSuccessClient

Signed-off-by: Curtis Wan <[email protected]>

* fix: add more logs; fix partition leading or following

Signed-off-by: Curtis Wan <[email protected]>

---------

Signed-off-by: Curtis Wan <[email protected]>

* fix: close #117; fix return too early; add stack for log closing (#118)

Signed-off-by: Curtis Wan <[email protected]>

* fix: close #114; use position rather than offset as nextOffset for indexes; close metastream when log is closing (#116)

* wip: add more logs

Signed-off-by: Curtis Wan <[email protected]>

* fix: add ExceptionUtil; use position rather than offset as nextOffset for indexes

Signed-off-by: Curtis Wan <[email protected]>

* fix: fix style check

Signed-off-by: Curtis Wan <[email protected]>

---------

Signed-off-by: Curtis Wan <[email protected]>

* fix: close #119; skip deleting segments when metaStream is closed (#120)

Signed-off-by: Curtis Wan <[email protected]>

* feat: close #123; Handle no-retryable exceptions thrown by Elastic Stream SDK (#124)

* feat: close #123; Handle no-retryable exceptions thrown by Elastic Stream SDK. The corresponding partitions will be offline; Add context info to pass unit tests for
cases that fetching just follows appending.

Signed-off-by: Curtis Wan <[email protected]>

* feat: enhancement codes refered in discussions

Signed-off-by: Curtis Wan <[email protected]>

* fix: style problem

Signed-off-by: Curtis Wan <[email protected]>

* test: wait for more time for quick/slow fetch tests

Signed-off-by: Curtis Wan <[email protected]>

* test: start fetching after appending finished

Signed-off-by: Curtis Wan <[email protected]>

---------

Signed-off-by: Curtis Wan <[email protected]>

* test: add slowFetchDelay (#135)

Signed-off-by: Curtis Wan <[email protected]>

* feat(core): Add implementation of AutoBalancer components and unit tests  (#134)

* feat(core): Add customized MetricsReporter and partition-level metrics

- add partition level metrics for BytesInPerSec and BytesOutPerSec
- implement CruiseControlMetrics to monitor and report interested Yammer
metrics

Closes #78

Signed-off-by: sc.nieh <[email protected]>

* feat(core): AutoBalancerMetricsReporter optimization

- pre-aggregate for broker and partition level metrics
- fill with empty value for probably missing metrics

#78

Signed-off-by: sc.nieh <[email protected]>

* feat(core): Implement load retriever for auto balancer

#77

Signed-off-by: sc.nieh <[email protected]>

* feat(core): Implement AutoBalancerManager and AnomalyDetector

#77

Signed-off-by: sc.nieh <[email protected]>

* fix(core): Fix inconsistent in-flight request check

Closes #121

Signed-off-by: sc.nieh <[email protected]>

* feat(core): Create metrics reporter topic on controller

Signed-off-by: sc.nieh <[email protected]>

* fix(core): fix execute interval of ExecutionManager

Signed-off-by: sc.nieh <[email protected]>

* fix: add esUnit tag to unit tests of autobalancer

Signed-off-by: sc.nieh <[email protected]>

---------

Signed-off-by: sc.nieh <[email protected]>

---------

Signed-off-by: TheR1sing3un <[email protected]>
Signed-off-by: Curtis Wan <[email protected]>
Signed-off-by: Robin Han <[email protected]>
Signed-off-by: sc.nieh <[email protected]>
Co-authored-by: TheR1sing3un <[email protected]>
Co-authored-by: Robin Han <[email protected]>
Co-authored-by: Shichao Nie <[email protected]>
1. support controller-kv client
2. change some logs' level from info to
trace

Signed-off-by: TheR1sing3un <[email protected]>
1. replace object-id with order-id for WAL

Signed-off-by: TheR1sing3un <[email protected]>
1. return first object id in preparedObjectResponse

Signed-off-by: TheR1sing3un <[email protected]>
…ager` (#64)

* refactor(s3): remove inflight wal objects

1. remove inflight wal objects
2. delete redundant classes

Signed-off-by: TheR1sing3un <[email protected]>

* feat(s3): support blocking `getObjects` in `StreamMetadataManager`

1. support blocking `getObjects` in `StreamMetadataManager`
2. refactor
`getObjects`
3. change the unit of `s3.cache.size` from `MB`
to `B`

Signed-off-by: TheR1sing3un <[email protected]>

* feat(s3): more suitable log level

1. more suitable log level

Signed-off-by: TheR1sing3un <[email protected]>

* fix(s3): add more concurrent protection

1. add more concurrent protection

Signed-off-by: TheR1sing3un <[email protected]>

---------

Signed-off-by: TheR1sing3un <[email protected]>
1. handle stream objects commit in controller

Signed-off-by: TheR1sing3un <[email protected]>
1. handle invalid commit request

Signed-off-by: TheR1sing3un <[email protected]>
* feat: rename GetStreamsOffset to GetOpeningStreams

Signed-off-by: Robin Han <[email protected]>

* feat: get broker opening streams

Signed-off-by: Robin Han <[email protected]>

---------

Signed-off-by: Robin Han <[email protected]>
* feat(s3): unify S3 object metadata

1. unify S3 object metadata

Signed-off-by: TheR1sing3un <[email protected]>

* fix(s3): fix after rebasing

1. fix after rebasing

Signed-off-by: TheR1sing3un <[email protected]>

* feat(s3): log stream-objects' info when commit wal object

1. log stream-objects' info when commit wal object

Signed-off-by: TheR1sing3un <[email protected]>

---------

Signed-off-by: TheR1sing3un <[email protected]>
1. support retry to-controller request

Signed-off-by: TheR1sing3un <[email protected]>
* feat: get broker opening streams

Signed-off-by: Robin Han <[email protected]>

* feat: recover from crash (WAL buggy version)

Signed-off-by: Robin Han <[email protected]>

---------

Signed-off-by: Robin Han <[email protected]>
* fix(s3): fixed bugs on uploading WAL object during compaction

- eliminate race condition on uploading WAL object by using sequential
writing
- prevent blocking on multipart upload when part size is less
than MIN_PART_SIZE

Signed-off-by: Shichao Nie <[email protected]>

* fix(s3): used pooled buffer in DataBlockWriter; add newFixedThreadPool to Threads

Signed-off-by: Shichao Nie <[email protected]>

---------

Signed-off-by: Shichao Nie <[email protected]>
SCNieh and others added 21 commits February 26, 2024 18:12
* feat(metrics): metrics on fetch limiters

Signed-off-by: Ning Yu <[email protected]>

* feat(metrics): metrics on fetch executors' queue size

Signed-off-by: Ning Yu <[email protected]>

* style: fix check style

Signed-off-by: Ning Yu <[email protected]>

---------

Signed-off-by: Ning Yu <[email protected]>
- deduplication for consumer lag calculation on partition reassignment
- add get/put object avg latency panels

Signed-off-by: Shichao Nie <[email protected]>
* fix(telemetry): add missing percentile metrics

Signed-off-by: Shichao Nie <[email protected]>

* fix(telemetry): increase metric expiration time for collector

Signed-off-by: Shichao Nie <[email protected]>

---------

Signed-off-by: Shichao Nie <[email protected]>
@CLAassistant
Copy link

CLAassistant commented Mar 5, 2024

CLA assistant check
All committers have signed the CLA.

@daniel-y daniel-y force-pushed the s3stream-to-trunk branch from d044671 to ba76df0 Compare March 5, 2024 04:50
@superhx superhx merged commit bcfd816 into merge_trunk Mar 5, 2024
1 check passed
@superhx superhx deleted the s3stream-to-trunk branch March 5, 2024 04:51
@daniel-y daniel-y restored the s3stream-to-trunk branch March 14, 2024 08:50
daniel-y pushed a commit that referenced this pull request Mar 14, 2024
ShadowySpirits added a commit that referenced this pull request Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants