feat: Add a new disk-based WAL implementation for standalone deployment #1552

Merged on Aug 16, 2024 (10 commits)

@dracoooooo dracoooooo commented Aug 4, 2024

Rationale

#1279

Detailed Changes

  1. Add a struct Segment that reads and writes segment files and records the offset of each record.
  2. Add a struct SegmentManager that manages all segments, including:
    1. Reading all segments from the folder upon creation.
    2. Writing only to the segment with the largest ID.
    3. Maintaining a cache where segments not in the cache are closed, while segments in the cache have their files open and are memory-mapped using mmap.
  3. Implement the WalManager trait.
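
As an illustration of the first two points, here is a minimal sketch, not the code from this PR. It assumes a hypothetical `[len: u32 LE][payload]` record layout and uses a `Vec<u8>` in place of the memory-mapped file. The method names `append`, `read`, `write`, and `add_segment` are invented for the example, and the segment cache is left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a segment file; a `Vec<u8>` replaces the mmap-ed file.
struct Segment {
    data: Vec<u8>,
    /// Start offset of every record, in append order.
    record_offsets: Vec<usize>,
}

impl Segment {
    fn new() -> Self {
        Segment { data: Vec::new(), record_offsets: Vec::new() }
    }

    /// Appends one record as `[len: u32 LE][payload]` (assumed layout)
    /// and remembers where it starts.
    fn append(&mut self, payload: &[u8]) -> usize {
        let offset = self.data.len();
        self.data.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        self.data.extend_from_slice(payload);
        self.record_offsets.push(offset);
        offset
    }

    /// Reads the `index`-th record back through its recorded offset.
    fn read(&self, index: usize) -> Option<&[u8]> {
        let offset = *self.record_offsets.get(index)?;
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(self.data.get(offset..offset + 4)?);
        let len = u32::from_le_bytes(len_bytes) as usize;
        self.data.get(offset + 4..offset + 4 + len)
    }
}

/// Keeps segments ordered by ID so the writable one is always the last entry.
struct SegmentManager {
    segments: BTreeMap<u64, Segment>,
}

impl SegmentManager {
    fn new() -> Self {
        let mut segments = BTreeMap::new();
        segments.insert(0, Segment::new());
        SegmentManager { segments }
    }

    /// Only the segment with the largest ID accepts writes.
    /// Returns that segment's ID and the offset of the new record.
    fn write(&mut self, payload: &[u8]) -> (u64, usize) {
        let (id, segment) = self
            .segments
            .iter_mut()
            .next_back()
            .expect("the manager always holds at least one segment");
        (*id, segment.append(payload))
    }

    /// Opens a new segment whose ID is one past the current largest.
    fn add_segment(&mut self) -> u64 {
        let next_id = self.segments.keys().next_back().map_or(0, |id| id + 1);
        self.segments.insert(next_id, Segment::new());
        next_id
    }
}
```

Keeping the offsets in memory lets a read jump straight to a record instead of scanning the segment from the start.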

Test Plan

Unit tests.

Todos

  • Implement a disk-based WAL that can pass the existing unit tests.
    • write
    • read
    • scan
    • delete
    • multiple segments
  • Remove unwrap and handle errors.
  • Add unit tests for the new code.
  • Test on large-scale data.
  • Compare with the existing RocksDB WAL implementation and optimize performance.

@github-actions github-actions bot added the feature New feature or request label Aug 4, 2024
@jiacai2050 jiacai2050 self-requested a review August 7, 2024 02:49
@dracoooooo dracoooooo marked this pull request as ready for review August 9, 2024 07:02
@dracoooooo dracoooooo requested a review from jiacai2050 August 15, 2024 01:13
@dracoooooo dracoooooo requested a review from jiacai2050 August 15, 2024 13:59
@jiacai2050 jiacai2050 left a comment

LGTM

@jiacai2050 jiacai2050 merged commit 8a28840 into apache:main Aug 16, 2024
9 checks passed
jiacai2050 pushed a commit that referenced this pull request Sep 4, 2024
…sk (#1556)

## Rationale
Improving WAL based on local disk.

This is a follow-up task for #1552.

## Detailed Changes
1. Make MAX_FILE_SIZE configurable.
2. Allocate enough space when creating a segment to avoid remapping when
appending to the segment.
3. Add `MultiSegmentLogIterator` to enable cross-segment reading.
4. When writing, if the current segment has insufficient space, create a
new segment and write to the new segment.
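
The roll-over behaviour in points 2 and 4 can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `Wal` type, its `segment_size` field (standing in for the configurable MAX_FILE_SIZE), and the `scan` method (a much simplified stand-in for `MultiSegmentLogIterator`) are hypothetical, and a preallocated `Vec<u8>` replaces the memory-mapped file.

```rust
/// Hypothetical fixed-size segment. The buffer is allocated in full up front,
/// so appending never has to grow (or, with mmap, remap) it.
struct Segment {
    id: u64,
    buf: Vec<u8>,
    used: usize,
    /// (offset, length) of each record, in append order.
    records: Vec<(usize, usize)>,
}

impl Segment {
    fn new(id: u64, capacity: usize) -> Self {
        Segment { id, buf: vec![0; capacity], used: 0, records: Vec::new() }
    }

    fn remaining(&self) -> usize {
        self.buf.len() - self.used
    }

    /// Caller must ensure `payload.len() <= self.remaining()`.
    fn append(&mut self, payload: &[u8]) {
        let end = self.used + payload.len();
        self.buf[self.used..end].copy_from_slice(payload);
        self.records.push((self.used, payload.len()));
        self.used = end;
    }
}

struct Wal {
    /// Stand-in for the configurable maximum file size.
    segment_size: usize,
    segments: Vec<Segment>,
}

impl Wal {
    fn new(segment_size: usize) -> Self {
        Wal { segment_size, segments: vec![Segment::new(0, segment_size)] }
    }

    /// Writes to the current segment, rolling over to a fresh one when the
    /// payload does not fit. Returns the ID of the segment that received it.
    fn write(&mut self, payload: &[u8]) -> Result<u64, String> {
        if payload.len() > self.segment_size {
            return Err(format!(
                "record of {} bytes exceeds segment size {}",
                payload.len(),
                self.segment_size
            ));
        }
        let needs_new = self
            .segments
            .last()
            .map_or(true, |s| s.remaining() < payload.len());
        if needs_new {
            let next_id = self.segments.len() as u64;
            self.segments.push(Segment::new(next_id, self.segment_size));
        }
        let segment = self.segments.last_mut().expect("a writable segment exists");
        segment.append(payload);
        Ok(segment.id)
    }

    /// Reads every record across all segments in ID order.
    fn scan(&self) -> Vec<&[u8]> {
        self.segments
            .iter()
            .flat_map(|s| s.records.iter().map(move |&(off, len)| &s.buf[off..off + len]))
            .collect()
    }
}
```

In this sketch a record is never split across two segments, so the unused tail of a full segment is simply left empty.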

## Test Plan
Unit test.
jiacai2050 pushed a commit that referenced this pull request Sep 14, 2024
## Rationale

Currently the WAL based on the local disk does not support the delete
function. This PR implements that functionality.

This is a follow-up task of #1552 and #1556.

## Detailed Changes

1. For each `Segment`, add a hashmap to record the minimum and maximum
sequence numbers of all tables within that segment. During `delete` and
`write` operations, this hashmap will be updated. During read
operations, logs will be filtered based on this hashmap.

2. During the `delete` operation, based on the aforementioned hashmap,
if all logs of all tables in a read-only segment (a segment that is not
currently being written to) are marked as deleted, the segment file will
be physically deleted from the disk.
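
The bookkeeping described in these two points might look roughly like the sketch below. It assumes that `delete` marks every log of a table up to and including a given sequence number as deleted, and it models "the segment being written to" as the last element of a `Vec`. All names (`record_write`, `mark_deleted_up_to`, `is_visible`, `delete_up_to`) are hypothetical, and removing a segment from the vector stands in for deleting the file from disk.

```rust
use std::collections::HashMap;

type TableId = u64;
type SequenceNumber = u64;

struct Segment {
    id: u64,
    /// table -> (min, max) sequence numbers that are still live in this segment.
    table_ranges: HashMap<TableId, (SequenceNumber, SequenceNumber)>,
}

impl Segment {
    fn new(id: u64) -> Self {
        Segment { id, table_ranges: HashMap::new() }
    }

    /// Called on every write: widens the table's live range.
    fn record_write(&mut self, table: TableId, seq: SequenceNumber) {
        let range = self.table_ranges.entry(table).or_insert((seq, seq));
        range.0 = range.0.min(seq);
        range.1 = range.1.max(seq);
    }

    /// Called on delete: shrinks or drops the table's live range.
    fn mark_deleted_up_to(&mut self, table: TableId, seq: SequenceNumber) {
        if let Some(&(min, max)) = self.table_ranges.get(&table) {
            if seq >= max {
                self.table_ranges.remove(&table);
            } else if seq >= min {
                self.table_ranges.insert(table, (seq + 1, max));
            }
        }
    }

    /// Read-side filter: a log is returned only if it falls in the live range.
    fn is_visible(&self, table: TableId, seq: SequenceNumber) -> bool {
        self.table_ranges
            .get(&table)
            .map_or(false, |&(min, max)| min <= seq && seq <= max)
    }

    fn is_fully_deleted(&self) -> bool {
        self.table_ranges.is_empty()
    }
}

/// Applies a delete to all segments, then drops read-only segments that hold
/// no live logs. The last segment is treated as the writable one and is kept.
/// Returns the IDs of the removed segments.
fn delete_up_to(segments: &mut Vec<Segment>, table: TableId, seq: SequenceNumber) -> Vec<u64> {
    for segment in segments.iter_mut() {
        segment.mark_deleted_up_to(table, seq);
    }
    let writable_id = segments.last().map(|s| s.id);
    let mut removed = Vec::new();
    segments.retain(|s| {
        let keep = Some(s.id) == writable_id || !s.is_fully_deleted();
        if !keep {
            removed.push(s.id);
        }
        keep
    });
    removed
}
```

Tracking only a range per table keeps the map small, at the cost of not being able to express holes inside the range.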

## Test Plan

Unit test, TSBS and running a script locally that repeatedly inserts
data, forcibly kills, and restarts the database process to test
persistence.
LeslieKid pushed a commit to LeslieKid/horaedb that referenced this pull request Sep 25, 2024
…nt (apache#1552)

LeslieKid pushed a commit to LeslieKid/horaedb that referenced this pull request Sep 25, 2024
…sk (apache#1556)

LeslieKid pushed a commit to LeslieKid/horaedb that referenced this pull request Sep 25, 2024
…he#1566)

LeslieKid added a commit to LeslieKid/horaedb that referenced this pull request Sep 27, 2024
refactor: partitioned_lock's elaboration (apache#1540)

Extended the `try_new` interface while keeping the old one for
compatibility.

* Implemented the `try_new_suggest_cap` method, while changing the old
`try_new` method to `try_new_bit_len` to ensure compatibility.
* Modified structs and functions that call old interfaces.
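
One plausible relationship between a suggested capacity and a bit length is sketched below. This is an assumption for illustration only; the commit message does not say how `try_new_suggest_cap` derives its partition count. The helper names `bit_len_for_suggest_cap` and `partition_index` are hypothetical.

```rust
/// Hypothetical helper: the smallest bit length whose partition count
/// (2^bit_len) is at least `suggest_cap`.
fn bit_len_for_suggest_cap(suggest_cap: usize) -> u32 {
    suggest_cap.max(1).next_power_of_two().trailing_zeros()
}

/// Picks a partition by masking the low `bit_len` bits of a hash.
/// Assumes `bit_len < 64`.
fn partition_index(hash: u64, bit_len: u32) -> usize {
    (hash & ((1u64 << bit_len) - 1)) as usize
}
```

With a power-of-two partition count, selecting a partition is a single mask operation rather than a modulo.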

* Added new unit tests
* Passed CI test

---------

Co-authored-by: chunhao.ch <[email protected]>

feat: support INSERT INTO SELECT (apache#1536)

Close apache#557.

When generating the insert logical plan, also generate the select logical plan and store it in the insert plan. Then execute the select logical plan in the insert interpreter, convert the result records into a RowGroup, and insert it.

CI

refactor: insert select to stream mode (apache#1544)

Close apache#1542

Run the select and insert procedures in a streaming fashion.

CI test.

---------

Co-authored-by: jiacai2050 <[email protected]>

fix(comment): update error documentation comment for remote engine service (apache#1548)

An error comment in the code needs to be updated to reflect the correct
service name.

No need

refactor: manifest error code (apache#1546)

fix: sequence overflow when dropping a table using a message queue as WAL (apache#1550)

Fix the issue of sequence overflow when dropping a table using a message
queue as WAL.
close apache#1543

Check the maximum value of sequence to prevent overflow.
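
A guard of this kind could look like the sketch below. It is an assumption about the shape of the check, not the actual fix: the function name `exclusive_upper_bound` and the scenario (turning an inclusive sequence number into an exclusive bound by adding one) are hypothetical.

```rust
type SequenceNumber = u64;

const MAX_SEQUENCE_NUMBER: SequenceNumber = u64::MAX;

/// Returns `sequence + 1`, clamped so that the maximum value does not wrap
/// around (or panic in debug builds).
fn exclusive_upper_bound(sequence: SequenceNumber) -> SequenceNumber {
    if sequence == MAX_SEQUENCE_NUMBER {
        MAX_SEQUENCE_NUMBER
    } else {
        sequence + 1
    }
}
```

The same effect is available through the standard `u64::saturating_add`, and `u64::checked_add` suits callers that prefer to surface the overflow as an error.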

CI.

feat: Add a new disk-based WAL implementation for standalone deployment (apache#1552)

1. Add a struct `Segment` that reads and writes segment files and
records the offset of each record.
2. Add a struct `SegmentManager` that manages all segments, including:
	1. Reading all segments from the folder upon creation.
	2. Writing only to the segment with the largest ID.
	3. Maintaining a cache where segments not in the cache are closed,
	while segments in the cache have their files open and are
	memory-mapped using mmap.
3. Implement the `WalManager` trait.

Unit tests.

chore: upgrade object store version (apache#1541)

The object store version is upgraded to 0.10.1 to prepare for access to
opendal

- Impl AsyncWrite for ObjectStoreMultiUpload
- Impl MultipartUpload for ObkvMultiPartUpload
- Adapt new api on query writing path

- Existing tests

---------

Co-authored-by: jiacai2050 <[email protected]>

feat: use opendal to access  underlying storage (apache#1557)

Use opendal to access the object store, thus unifying the access method
of the underlying storage.

- use opendal to access s3/oss/local file

- Existing tests

feat: add metric engine rfc (apache#1558)

RFC for next metric engine.

No need.

chore: update link (apache#1561)

I noticed that the previous repository has been archived, so it would
be better to update to the new link.

chore(horaemeta): add building docs (apache#1562)

feat: Implementing cross-segment read/write for WAL based on local disk (apache#1556)

Improving WAL based on local disk.

This is a follow-up task for apache#1552.

1. Make MAX_FILE_SIZE configurable.
2. Allocate enough space when creating a segment to avoid remapping when
appending to the segment.
3. Add `MultiSegmentLogIterator` to enable cross-segment reading.
4. When writing, if the current segment has insufficient space, create a
new segment and write to the new segment.

Unit test.

chore: fix doc links (apache#1565)

fix: disable layered memtable in overwrite mode (apache#1533)

Layered memtable is only designed for append mode table now, and it
shouldn't be used in overwrite mode table.

- Make default values in config used.
- Add `enable` field to control layered memtable's on/off.
- Add check to prevent invalid options during table create/alter.
- Add related integration test cases.

Test manually.

Following cases are considered:

Check and intercept the invalid table options during table create/alter
- enable layered memtable but mutable switch threshold is 0
- enable layered memtable for overwrite mode table

Default value of the new table option `layered_enable` when it is not
found in the protobuf (pb):
- false, when `layered_memtable_options` does not exist at all
- false, when `layered_memtable_options` exists and
`mutable_segment_switch_threshold` == 0
- true, when `layered_memtable_options` exists and
`mutable_segment_switch_threshold` > 0
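
The rules listed above can be written down as a small sketch. The struct `LayeredMemtableOptionsPb`, the enum `UpdateMode`, and both function names are hypothetical stand-ins; only the field name `mutable_segment_switch_threshold` and the rules themselves come from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum UpdateMode {
    Overwrite,
    Append,
}

/// Hypothetical mirror of the optional protobuf message.
struct LayeredMemtableOptionsPb {
    mutable_segment_switch_threshold: u64,
}

/// Default for `layered_enable` when the flag itself is absent from the pb.
fn default_layered_enable(pb: Option<&LayeredMemtableOptionsPb>) -> bool {
    match pb {
        // The whole options message is missing.
        None => false,
        // Present: enabled only if the threshold is non-zero.
        Some(opts) => opts.mutable_segment_switch_threshold > 0,
    }
}

/// Rejects the two invalid combinations at table create/alter time.
fn validate_layered_options(
    enable: bool,
    mutable_segment_switch_threshold: u64,
    mode: UpdateMode,
) -> Result<(), String> {
    if !enable {
        return Ok(());
    }
    if mutable_segment_switch_threshold == 0 {
        return Err("layered memtable is enabled but the switch threshold is 0".to_string());
    }
    if mode == UpdateMode::Overwrite {
        return Err("layered memtable is not supported for overwrite mode tables".to_string());
    }
    Ok(())
}
```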

feat: init metric engine structure (apache#1554)

See apache#1558

Add a new subdirectory `horaedb`; all source code for the metric engine
lives under it.

Add a new ci.

feat: Implement delete operation for WAL based on local storage (apache#1566)

Currently the WAL based on the local disk does not support the delete
function. This PR implements that functionality.

This is a follow-up task of apache#1552 and apache#1556.

1. For each `Segment`, add a hashmap to record the minimum and maximum
sequence numbers of all tables within that segment. During `delete` and
`write` operations, this hashmap will be updated. During read
operations, logs will be filtered based on this hashmap.

2. During the `delete` operation, based on the aforementioned hashmap,
if all logs of all tables in a read-only segment (a segment that is not
currently being written to) are marked as deleted, the segment file will
be physically deleted from the disk.

Unit test, TSBS and running a script locally that repeatedly inserts
data, forcibly kills, and restarts the database process to test
persistence.

fix: support to compat the old layered memtable options (apache#1568)

We introduced an explicit flag to control whether layered memtable is
enabled, but it causes a compatibility problem when upgrading from an
old version.
This PR adds an option for compatibility with the old layered memtable
on/off control method.

Add an option for compatibility with the old layered memtable on/off
control method.

Manually.

chore: record replay cost in log (apache#1569)

1. Add the replay cost to the log.
2. Remove the verbose HTTP log.
3. Change the default recovery mode to shard-based, which is faster in
most WAL implementations.

fix: logs might be missed during RegionBased replay in the WAL based on local disk (apache#1570)

In RegionBased replay, a batch of logs is first scanned from the WAL,
and then replayed on various tables using multiple threads. This
approach works fine for WALs based on tables, as the logs for each table
are clustered together. However, in a WAL based on local disk, the logs
for each table may be scattered across different positions within the
batch. During multi-threaded replay, it is possible that for a given
table, log2 is replayed before log1, resulting in missed logs.

1. Modify `split_log_batch_by_table` function to aggregate all logs for
a table together.
2. Modify `tableBatch` struct to change a single range into a
`Vec<Range>`.
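
The aggregation in these two points can be sketched as follows. This is a simplified illustration, not the actual function: here the input is just the table ID of each log in the scanned batch, the struct is spelled `TableBatch` following Rust naming conventions, and its field names are assumptions.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

type TableId = u64;

/// All positions in the scanned batch that belong to one table.
#[derive(Debug, PartialEq)]
struct TableBatch {
    table_id: TableId,
    /// Index ranges into the batch, in ascending order.
    ranges: Vec<Range<usize>>,
}

/// Groups the logs of a batch by table, merging consecutive indexes into one
/// range. Each table ends up in exactly one `TableBatch`, so a single worker
/// replays its logs in their original order.
fn split_log_batch_by_table(log_table_ids: &[TableId]) -> Vec<TableBatch> {
    let mut by_table: BTreeMap<TableId, Vec<Range<usize>>> = BTreeMap::new();
    for (idx, &table_id) in log_table_ids.iter().enumerate() {
        let ranges = by_table.entry(table_id).or_insert_with(Vec::new);
        // Extend the last range if this log directly follows it.
        let extended = match ranges.last_mut() {
            Some(last) if last.end == idx => {
                last.end = idx + 1;
                true
            }
            _ => false,
        };
        if !extended {
            ranges.push(idx..idx + 1);
        }
    }
    by_table
        .into_iter()
        .map(|(table_id, ranges)| TableBatch { table_id, ranges })
        .collect()
}
```

For a batch whose logs belong to tables `[1, 1, 2, 1, 2, 2]`, table 1 gets the ranges `0..2` and `3..4`, and table 2 gets `2..3` and `4..6`.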

Manual testing.

fix format.