
[pull] main from apache:main #55

Open: wants to merge 90 commits into base: main


pull[bot] commented Apr 17, 2024

See Commits and Changes for more details.


Created by pull[bot]


chunshao90 and others added 2 commits April 17, 2024 14:12
## Rationale
Update protected_tags in .asf.yaml.

## Detailed Changes
Temporarily disable tag protection in order to remove the erroneous
`v2.0.0` tag.

## Test Plan
No need.
pull[bot] added the ⤵️ pull label Apr 18, 2024
chunshao90 and others added 27 commits April 18, 2024 17:21
## Rationale
Check whether the PR title is valid.

## Detailed Changes
A valid PR title starts with one of: `feat|fix|refactor|chore|docs`.
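A minimal sketch of such a title check, written in Rust for illustration only. The commit does not say how the prefix must be terminated, so this sketch assumes a Conventional-Commits style `type:` or `type(scope):` form; `is_valid_pr_title` is a hypothetical helper, not the workflow's actual script.

```rust
/// Allowed type prefixes, taken from the commit message above.
const ALLOWED_TYPES: [&str; 5] = ["feat", "fix", "refactor", "chore", "docs"];

/// Hypothetical check: the title must start with an allowed type followed by
/// `:` or by `(scope):` (assumed Conventional-Commits style).
fn is_valid_pr_title(title: &str) -> bool {
    ALLOWED_TYPES.iter().any(|ty| match title.strip_prefix(*ty) {
        Some(rest) => {
            if rest.starts_with(':') {
                true
            } else if let Some(scoped) = rest.strip_prefix('(') {
                // Require a closing `)` immediately followed by `:`.
                match scoped.find(')') {
                    Some(end) => scoped[end + 1..].starts_with(':'),
                    None => false,
                }
            } else {
                // e.g. "feature: ..." starts with "feat" but is not a valid type.
                false
            }
        }
        None => false,
    })
}
```

Requiring a delimiter after the type keeps titles such as `feature: ...` from slipping through on a bare prefix match.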

## Test Plan
Manual test.
## Rationale
Refer to
https://github.com/apache/incubator-horaedb/actions/runs/8779713550
```
thehanimo/[email protected] is not allowed to be used in apache/incubator-horaedb. Actions in this workflow must be: within a repository owned by apache, created by GitHub, verified in the GitHub Marketplace
```

## Detailed Changes
Use a custom script to run the PR title check instead of the third-party action.

## Test Plan
CI.
#1522)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.22.0 to
0.23.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/golang/net/commit/c48da131589f122489348be5dfbcb6457640046f"><code>c48da13</code></a>
http2: fix TestServerContinuationFlood flakes</li>
<li><a
href="https://github.com/golang/net/commit/762b58d1cf6e0779780decad89c6c1523386638d"><code>762b58d</code></a>
http2: fix tipos in comment</li>
<li><a
href="https://github.com/golang/net/commit/ba872109ef2dc8f1da778651bd1fd3792d0e4587"><code>ba87210</code></a>
http2: close connections when receiving too many headers</li>
<li><a
href="https://github.com/golang/net/commit/ebc8168ac8ac742194df729305175940790c55a2"><code>ebc8168</code></a>
all: fix some typos</li>
<li><a
href="https://github.com/golang/net/commit/3678185f8a652e52864c44049a9ea96b7bcc066a"><code>3678185</code></a>
http2: make TestCanonicalHeaderCacheGrowth faster</li>
<li><a
href="https://github.com/golang/net/commit/448c44f9287b6745f958d74aa2a17ec7761c2f13"><code>448c44f</code></a>
http2: remove clientTester</li>
<li><a
href="https://github.com/golang/net/commit/c7877ac4213b2f859831366f5a35b353e0dc9f66"><code>c7877ac</code></a>
http2: convert the remaining clientTester tests to testClientConn</li>
<li><a
href="https://github.com/golang/net/commit/d8870b0bf2f2426fc8d19a9332f652da5c25418f"><code>d8870b0</code></a>
http2: use synthetic time in TestIdleConnTimeout</li>
<li><a
href="https://github.com/golang/net/commit/d73acffdc9493532acb85777105bb4a351eea702"><code>d73acff</code></a>
http2: only set up deadline when Server.IdleTimeout is positive</li>
<li><a
href="https://github.com/golang/net/commit/89f602b7bbf237abe0467031a18b42fc742ced08"><code>89f602b</code></a>
http2: validate client/outgoing trailers</li>
<li>Additional commits viewable in <a
href="https://github.com/golang/net/compare/v0.22.0...v0.23.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=golang.org/x/net&package-manager=go_modules&previous-version=0.22.0&new-version=0.23.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/apache/incubator-horaedb/network/alerts).

</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ion_tests/sdk/go (#1521)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.17.0 to
0.23.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/golang/net/commit/c48da131589f122489348be5dfbcb6457640046f"><code>c48da13</code></a>
http2: fix TestServerContinuationFlood flakes</li>
<li><a
href="https://github.com/golang/net/commit/762b58d1cf6e0779780decad89c6c1523386638d"><code>762b58d</code></a>
http2: fix tipos in comment</li>
<li><a
href="https://github.com/golang/net/commit/ba872109ef2dc8f1da778651bd1fd3792d0e4587"><code>ba87210</code></a>
http2: close connections when receiving too many headers</li>
<li><a
href="https://github.com/golang/net/commit/ebc8168ac8ac742194df729305175940790c55a2"><code>ebc8168</code></a>
all: fix some typos</li>
<li><a
href="https://github.com/golang/net/commit/3678185f8a652e52864c44049a9ea96b7bcc066a"><code>3678185</code></a>
http2: make TestCanonicalHeaderCacheGrowth faster</li>
<li><a
href="https://github.com/golang/net/commit/448c44f9287b6745f958d74aa2a17ec7761c2f13"><code>448c44f</code></a>
http2: remove clientTester</li>
<li><a
href="https://github.com/golang/net/commit/c7877ac4213b2f859831366f5a35b353e0dc9f66"><code>c7877ac</code></a>
http2: convert the remaining clientTester tests to testClientConn</li>
<li><a
href="https://github.com/golang/net/commit/d8870b0bf2f2426fc8d19a9332f652da5c25418f"><code>d8870b0</code></a>
http2: use synthetic time in TestIdleConnTimeout</li>
<li><a
href="https://github.com/golang/net/commit/d73acffdc9493532acb85777105bb4a351eea702"><code>d73acff</code></a>
http2: only set up deadline when Server.IdleTimeout is positive</li>
<li><a
href="https://github.com/golang/net/commit/89f602b7bbf237abe0467031a18b42fc742ced08"><code>89f602b</code></a>
http2: validate client/outgoing trailers</li>
<li>Additional commits viewable in <a
href="https://github.com/golang/net/compare/v0.17.0...v0.23.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=golang.org/x/net&package-manager=go_modules&previous-version=0.17.0&new-version=0.23.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Rationale
Automatically label the PR based on its title.

## Detailed Changes
Automatically label the PR based on its title.
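A rough sketch of title-based labeling, assuming the label is derived from the type prefix before the first `:` or `(`. The label names below are hypothetical placeholders; the commit does not list the labels the workflow actually applies.

```rust
/// Hypothetical mapping from PR-title type to a repository label.
/// The real label names used by the workflow are not given above.
fn label_for_title(title: &str) -> Option<&'static str> {
    // Everything before the first `:` or `(` is treated as the type.
    let end = title.find(|c: char| c == ':' || c == '(')?;
    match title[..end].trim() {
        "feat" => Some("feature"),
        "fix" => Some("bug"),
        "refactor" => Some("refactor"),
        "chore" => Some("chore"),
        "docs" => Some("documentation"),
        // Unknown types get no label rather than a wrong one.
        _ => None,
    }
}
```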

## Test Plan
Manual test.
## Rationale
Currently, deleting a partitioned table through the HTTP API is not
supported: it fails with a "shard not found" error.

## Detailed Changes
Since a partitioned table is not assigned to any shard, delete its table
metadata directly.

## Test Plan
Manual test.
## Rationale
Part of #1513

## Detailed Changes
Replace the snafu-based `Error` with a thiserror-based one in the memtable module.

## Test Plan
CI

---------

Co-authored-by: chunshao.rcs <[email protected]>
## Rationale
Currently, disabling the WAL in standalone mode causes a panic. This
behavior is incorrect, because disabling the WAL in standalone mode is
sometimes needed.

## Detailed Changes
* Disabling the WAL in standalone mode no longer panics; a warning is
logged instead.
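A minimal sketch of the behavior change, assuming a startup check that sees whether the WAL is enabled. `check_standalone_wal` is hypothetical, and `eprintln!` stands in for the project's logger; the warning text is illustrative, not the actual log message.

```rust
/// Hypothetical standalone-mode startup check. Previously this situation
/// ended in `panic!`; now it only emits a warning and lets startup continue.
fn check_standalone_wal(wal_enabled: bool) -> Option<String> {
    if !wal_enabled {
        let msg = String::from(
            "WAL is disabled in standalone mode; unflushed writes may be lost on crash",
        );
        // Stand-in for a `warn!` call in the real logging setup.
        eprintln!("WARN: {}", msg);
        return Some(msg);
    }
    None
}
```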

## Test Plan
Pass CI.
## Rationale
Add a DingTalk group for better communication with Chinese-speaking users.

## Detailed Changes


## Test Plan
No need.
## Rationale
After donating CeresDB to the ASF, we should publish the Docker images
under the apache organization:
- https://hub.docker.com/r/apache/horaemeta-server
- https://hub.docker.com/r/apache/horaedb-server

## Detailed Changes
Use `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` to publish the images. See details:
https://issues.apache.org/jira/browse/INFRA-25736

## Test Plan
Manual test:
- https://github.com/jiacai2050/incubator-horaedb/actions/runs/9029607730

---------

Co-authored-by: chunshao.rcs <[email protected]>
## Rationale

Close #1285 

Install mysql-client and Grafana when building the Docker image.

## Detailed Changes

1. Install mysql-client and grafana in Dockerfile.
2. Add docker/datasource.yml as grafana default datasource.
3. Start grafana server in entrypoint.sh.

### mysql-client

In the docker container:

<img width="640" alt="image"
src="https://github.com/apache/incubator-horaedb/assets/55609330/5c59fd23-c54e-4761-8833-51355a81fada">

### grafana

Start the container and access http://<your_ip>:3000

<img width="1438" alt="image"
src="https://github.com/apache/incubator-horaedb/assets/55609330/34a2718e-3803-4e01-b384-39ca64dea7b7">

## Test Plan

None.
## Rationale
Close #929

## Detailed Changes
- Add file-based authentication
- Add authentication to the query and write paths

## Test Plan
- Existing tests
- Manual tests

---------

Co-authored-by: jiacai2050 <[email protected]>
## Rationale
We panic without logging anything when two tables with the same name
are found, which makes debugging extremely hard.

## Detailed Changes
Log the relevant table info before panicking when two tables with the
same name are found.

## Test Plan
Tested manually.
## Rationale
Extended the `try_new` interface while keeping the old one for
compatibility.

## Detailed Changes
* Implemented the `try_new_suggest_cap` method, while changing the old
`try_new` method to `try_new_bit_len` to ensure compatibility.
* Modified structs and functions that call old interfaces.

## Test Plan
* Added new unit tests
* Passed CI test

---------

Co-authored-by: chunhao.ch <[email protected]>
## Rationale

Close  #557.

## Detailed Changes

When generating the insert logical plan, also generate the select logical plan and store it in the insert plan. The insert interpreter then executes the select logical plan, converts the resulting records into a RowGroup, and inserts it.

## Test Plan

CI
## Rationale
Close #1542 

## Detailed Changes
Perform the select and insert procedure in a streaming way.
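A simplified sketch of the streaming idea: rows produced by the SELECT side are buffered into bounded row groups and handed to the INSERT side as soon as each group fills, instead of collecting the whole result first. Assumptions: a synchronous iterator stands in for the async record-batch stream, `Row` is a plain vector, and `insert_row_group` is a hypothetical callback representing one RowGroup write.

```rust
/// Hypothetical row type: one vector of cell values.
type Row = Vec<i64>;

/// Stream SELECT output into INSERT in chunks of at most `max_rows_per_group`.
/// Returns the total number of rows inserted.
fn insert_select_streaming<I, F>(batches: I, max_rows_per_group: usize, mut insert_row_group: F) -> usize
where
    I: IntoIterator<Item = Vec<Row>>,
    F: FnMut(Vec<Row>),
{
    assert!(max_rows_per_group > 0, "row group size must be positive");
    let mut buffer: Vec<Row> = Vec::with_capacity(max_rows_per_group);
    let mut total = 0;
    for batch in batches {
        for row in batch {
            buffer.push(row);
            total += 1;
            if buffer.len() == max_rows_per_group {
                // Flush a full row group and start a fresh buffer.
                insert_row_group(std::mem::take(&mut buffer));
            }
        }
    }
    // Flush the final, partially filled group.
    if !buffer.is_empty() {
        insert_row_group(buffer);
    }
    total
}
```

Memory use is bounded by one row group rather than by the size of the whole query result.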

## Test Plan
CI test.

---------

Co-authored-by: jiacai2050 <[email protected]>
…rvice (#1548)

## Rationale
Update an error comment in the code so that it refers to the correct
service name.

## Test Plan
No need
… WAL (#1550)

## Rationale
Fix a sequence overflow when dropping a table that uses a message
queue as its WAL.
Close #1543

## Detailed Changes
Check the sequence against its maximum value to prevent overflow.
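A minimal sketch of the guard, assuming sequence numbers are `u64` and that `u64::MAX` is the largest valid value; `MAX_SEQUENCE_NUMBER` and `next_sequence` are hypothetical names, not the code from the commit.

```rust
/// Assumed upper bound for sequence numbers.
const MAX_SEQUENCE_NUMBER: u64 = u64::MAX;

/// Compute the sequence that follows `seq` without overflowing.
fn next_sequence(seq: u64) -> u64 {
    if seq == MAX_SEQUENCE_NUMBER {
        // Already at the maximum: keep it instead of wrapping around to 0
        // (or panicking in debug builds).
        MAX_SEQUENCE_NUMBER
    } else {
        seq + 1
    }
}
```

For a plain `u64`, this is equivalent to `seq.saturating_add(1)`.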

## Test Plan
CI.
…nt (#1552)

## Rationale

#1279

## Detailed Changes

1. Add a struct `Segment` responsible for reading and writing segment
files; it records the offset of each record.
2. Add a struct `SegmentManager` responsible for managing all segments,
including:
	1.	Reading all segments from the folder upon creation.
	2.	Writing only to the segment with the largest ID.
	3.	Maintaining a cache: segments not in the cache are closed, while
segments in the cache keep their files open and memory-mapped using
mmap.
3. Implement the `WalManager` trait.
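An in-memory sketch of the segment layout described above, for illustration only. Assumptions: a `Vec<u8>` stands in for each segment file, there is no mmap or cache, and segments roll over at a hypothetical `max_segment_size`. It shows the two key ideas: each segment records per-record offsets, and writes only go to the segment with the largest ID.

```rust
use std::collections::BTreeMap;

/// Stand-in for a segment file: raw bytes plus the start offset of each record.
struct Segment {
    id: u64,
    data: Vec<u8>,
    record_offsets: Vec<usize>,
}

impl Segment {
    fn new(id: u64) -> Self {
        Segment { id, data: Vec::new(), record_offsets: Vec::new() }
    }

    /// Append a record and remember where it starts.
    fn append(&mut self, record: &[u8]) -> usize {
        let offset = self.data.len();
        self.data.extend_from_slice(record);
        self.record_offsets.push(offset);
        offset
    }
}

/// Manages all segments, ordered by id; only the newest one is writable.
struct SegmentManager {
    segments: BTreeMap<u64, Segment>,
    max_segment_size: usize,
}

impl SegmentManager {
    fn new(max_segment_size: usize) -> Self {
        let mut segments = BTreeMap::new();
        segments.insert(0, Segment::new(0));
        SegmentManager { segments, max_segment_size }
    }

    /// Append to the segment with the largest id, rolling over to a new segment
    /// when the record would not fit. Returns (segment id, offset in segment).
    fn write(&mut self, record: &[u8]) -> (u64, usize) {
        // The map is never empty, so `last_key_value` always returns Some.
        let (&last_id, last) = self.segments.last_key_value().unwrap();
        let needs_roll = !last.data.is_empty()
            && last.data.len() + record.len() > self.max_segment_size;
        if needs_roll {
            self.segments.insert(last_id + 1, Segment::new(last_id + 1));
        }
        let segment = self.segments.values_mut().next_back().unwrap();
        (segment.id, segment.append(record))
    }

    /// Read the `index`-th record of segment `segment_id` using the stored offsets.
    fn read(&self, segment_id: u64, index: usize) -> Option<&[u8]> {
        let seg = self.segments.get(&segment_id)?;
        let start = *seg.record_offsets.get(index)?;
        let end = seg.record_offsets.get(index + 1).copied().unwrap_or(seg.data.len());
        Some(&seg.data[start..end])
    }
}
```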

## Test Plan

Unit tests.
## Rationale
Upgrade the object store to version 0.10.1 to prepare for accessing
storage through OpenDAL.

## Detailed Changes
- Impl AsyncWrite for ObjectStoreMultiUpload
- Impl MultipartUpload for ObkvMultiPartUpload
- Adapt the query and write paths to the new API

## Test Plan
- Existing tests

---------

Co-authored-by: jiacai2050 <[email protected]>
## Rationale
Use opendal to access the object store, thus unifying the access method
of the underlying storage.

## Detailed Changes
- Use OpenDAL to access S3, OSS, and local files

## Test Plan
- Existing tests
## Rationale

RFC for the next metric engine.

## Detailed Changes


## Test Plan

No need.
## Rationale
I noticed that the previous repository has been archived, so it would
be better to update the link to the new one.

## Detailed Changes


## Test Plan
zealchen and others added 30 commits November 25, 2024 17:35
## Rationale
Close #1441 

## Detailed Changes
### TLDR
The performance issue with inlist queries is due to the extra overhead
from bloom-filter-like directory lookups when scanning each SST file for
rows. The solution is to create a separate predicate for each partition,
containing only the keys relevant to that partition. Since the current
partition filter only supports BinaryExpr(Column, operator, Literal) and
non-negated InList expressions, this solution will address only those
specific cases.

### Changes
1. During the scan building process, when identifying the partitions for
a query, we create a PartitionedFilterKeyIndex variable to store the
predicate key indices for each expression.
2. In the compute_partition_for_keys_group function, we use a
HashMap<partition_id, HashMap<filter_index, BTreeSet<key_index>>> to
record the indices of keys involved in partition computation for each
group.
3. In the partitioned_predicates function, we construct the final
predicates for each partition.
4. In resolve_partitioned_scan_internal, we generate separate requests
for each partition.

For example, given:
1. Table schema: `col_ts`, `col1`, `col2`, where `col1` and `col2` are both
keys, with two partitions.
2. SQL: `select * from table where col1 = '33' and col2 in ("aa", "bb", "cc", "dd")`

Partition expectation: two predicates are produced:
- p0: `col1 = '33' and col2 in ("aa", "bb", "cc")`
- p1: `col1 = '33' and col2 in ("dd")`
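The per-partition split in the example above can be sketched as follows. This is a simplification, not the actual `partitioned_predicates` code: `partition_of` is a hypothetical stand-in for the table's partition rule, and it sees only the IN-list value because the equality key (`col1 = '33'`) is the same for every row.

```rust
use std::collections::BTreeMap;

/// Group the literals of a non-negated `col IN (...)` filter by the partition
/// each key routes to, so every partition only receives the keys it can hold.
fn split_inlist_by_partition<'a, F>(values: &[&'a str], partition_of: F) -> BTreeMap<usize, Vec<&'a str>>
where
    F: Fn(&str) -> usize,
{
    let mut per_partition: BTreeMap<usize, Vec<&'a str>> = BTreeMap::new();
    for &v in values {
        per_partition.entry(partition_of(v)).or_default().push(v);
    }
    per_partition
}

/// Render one predicate per partition, keeping the shared equality filter.
fn partitioned_predicates(eq_filter: &str, column: &str, groups: &BTreeMap<usize, Vec<&str>>) -> Vec<(usize, String)> {
    groups
        .iter()
        .map(|(p, keys)| {
            let list = keys.iter().map(|k| format!("'{}'", k)).collect::<Vec<_>>().join(", ");
            (*p, format!("{} and {} in ({})", eq_filter, column, list))
        })
        .collect()
}
```

Partitions that receive no keys get no request at all, which is what avoids scanning every SST file with the full IN list.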

### Other issues discovered
When an IN list has fewer than three elements, the Expr is rewritten
into nested BinaryExprs, which bypass the FilterExtractor.

e.g.
SQL: select * from table where col1 in ("aa", "bb") and col2 in
(1,2,3,4,5...1000)
Since ("aa", "bb") has fewer than three elements, the col1 key filter is
not included in partition computation, which interrupts the partitioning
process in the get_candidate_partition_keys_groups function, as
contains_empty_filter is set to true.


## Test Plan
1. UT: test_partitioned_predicate
2. Manual test.

---------

Co-authored-by: jiacai2050 <[email protected]>
## Rationale
Set up the basic structure for compaction.

## Detailed Changes


## Test Plan
Existing CI; note that the scheduler has no tests yet.
## Rationale
Implement `pick_candidate` for compaction.

## Detailed Changes


## Test Plan
UT

---------

Co-authored-by: jiacai2050 <[email protected]>
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.21.0 to 0.31.0.
- [Commits](golang/crypto@v0.21.0...v0.31.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
…golang.org/x/crypto-0.31.0

chore(deps): bump golang.org/x/crypto from 0.21.0 to 0.31.0 in /horaemeta
## Rationale
Close #1600 

## Detailed Changes
Create a new data format to represent the manifest snapshot file.
```text
| magic(u32) | version(u8) |  flags(u8) | length(u64) | Record(N) ... |

The Magic field (u32) is used to ensure the validity of the data source.
The Flags field (u8) is reserved for future extensibility, such as enabling compression or supporting additional features.
The length field (u64) represents the total length of the subsequent records and serves as a straightforward method for verifying their integrity. (length = record_length * record_count)

# Record is a self-descriptive message
| id(u64) | time_range(i64*2)| size(u32) |  num_rows(u32)|
```
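A sketch of encoding and decoding the layout above, for illustration. Assumptions: little-endian byte order and the `MAGIC` value are hypothetical (neither is stated in the commit), and `time_range` is modeled as an `(i64, i64)` pair. With these field widths the header is 4 + 1 + 1 + 8 = 14 bytes and each record is 8 + 16 + 4 + 4 = 32 bytes.

```rust
use std::convert::TryInto;

/// Hypothetical magic value; the real constant is not given above.
const MAGIC: u32 = 0xCAFE_1234;
const HEADER_LEN: usize = 4 + 1 + 1 + 8; // magic + version + flags + length
const RECORD_LEN: usize = 8 + 8 * 2 + 4 + 4; // id + time_range + size + num_rows

#[derive(Debug, Clone, Copy, PartialEq)]
struct SnapshotHeader { magic: u32, version: u8, flags: u8, length: u64 }

#[derive(Debug, Clone, Copy, PartialEq)]
struct SnapshotRecordV1 { id: u64, time_range: (i64, i64), size: u32, num_rows: u32 }

impl SnapshotHeader {
    fn encode(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.magic.to_le_bytes());
        buf.push(self.version);
        buf.push(self.flags);
        buf.extend_from_slice(&self.length.to_le_bytes());
    }

    fn decode(b: &[u8]) -> Option<Self> {
        if b.len() < HEADER_LEN { return None; }
        let magic = u32::from_le_bytes(b[0..4].try_into().ok()?);
        // Reject data that does not come from a snapshot file.
        if magic != MAGIC { return None; }
        let length = u64::from_le_bytes(b[6..14].try_into().ok()?);
        Some(SnapshotHeader { magic, version: b[4], flags: b[5], length })
    }
}

impl SnapshotRecordV1 {
    fn encode(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.id.to_le_bytes());
        buf.extend_from_slice(&self.time_range.0.to_le_bytes());
        buf.extend_from_slice(&self.time_range.1.to_le_bytes());
        buf.extend_from_slice(&self.size.to_le_bytes());
        buf.extend_from_slice(&self.num_rows.to_le_bytes());
    }

    fn decode(b: &[u8]) -> Option<Self> {
        if b.len() < RECORD_LEN { return None; }
        Some(SnapshotRecordV1 {
            id: u64::from_le_bytes(b[0..8].try_into().ok()?),
            time_range: (
                i64::from_le_bytes(b[8..16].try_into().ok()?),
                i64::from_le_bytes(b[16..24].try_into().ok()?),
            ),
            size: u32::from_le_bytes(b[24..28].try_into().ok()?),
            num_rows: u32::from_le_bytes(b[28..32].try_into().ok()?),
        })
    }
}

/// Encode a full snapshot; `length` = RECORD_LEN * record count.
fn encode_snapshot(records: &[SnapshotRecordV1]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(HEADER_LEN + RECORD_LEN * records.len());
    let length = (RECORD_LEN * records.len()) as u64;
    SnapshotHeader { magic: MAGIC, version: 1, flags: 0, length }.encode(&mut buf);
    for r in records {
        r.encode(&mut buf);
    }
    buf
}
```

Because every record has the same fixed size, a reader can validate the record section by comparing `length` with the remaining byte count.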

In `do_merge`, the snapshot data is handled as follows:

```text
Old data flow in do_merge:
                                      delta_sstmetas
                                             | (extend vec)
                                             V                                
object_store -> org_bytes -> org_pb -> Vec<sstmeta> -> dst_pb -> dst_bytes -> object_store

New data flow in do_merge:
               delta_sstmetas -> bytes
                                  | (append)
                                  V                                
object_store -> org_bytes -> dst_bytes -> object_store
```

Specifically, I create `SnapshotHeader` and `SnapshotRecordV1` to
represent the corresponding data in the snapshot bytes. Before merging
the delta SST files into new bytes, we allocate a larger `Vec<u8>` and
copy each segment (header, old records, new records) into it.
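A sketch of the append step in the new `do_merge` flow, under the same assumptions as the format description (14-byte header with the `u64` length field at bytes 6..14, 32-byte records) plus an assumed little-endian encoding. `append_records` is a hypothetical helper: it never decodes the old records, it just copies bytes and patches the length.

```rust
use std::convert::TryInto;

/// Assumed layout: magic(u32) + version(u8) + flags(u8) precede `length` (u64).
const HEADER_LEN: usize = 14;
const LENGTH_RANGE: std::ops::Range<usize> = 6..14;
const RECORD_LEN: usize = 32;

/// Append already-encoded delta records to an existing snapshot.
fn append_records(old_snapshot: &[u8], new_records: &[u8]) -> Option<Vec<u8>> {
    if old_snapshot.len() < HEADER_LEN || new_records.len() % RECORD_LEN != 0 {
        return None;
    }
    let old_len = u64::from_le_bytes(old_snapshot[LENGTH_RANGE].try_into().ok()?) as usize;
    // The length field doubles as an integrity check on the record section.
    if old_snapshot.len() != HEADER_LEN + old_len {
        return None;
    }
    // Allocate once, then copy header + old records, followed by the new records.
    let mut out = Vec::with_capacity(old_snapshot.len() + new_records.len());
    out.extend_from_slice(old_snapshot);
    out.extend_from_slice(new_records);
    // Patch the header so `length` covers old and new records.
    let new_len = (old_len + new_records.len()) as u64;
    out[LENGTH_RANGE].copy_from_slice(&new_len.to_le_bytes());
    Some(out)
}
```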

This PR does NOT address the format upgrade logic, which can be resolved
in a separate PR. For the upgrade, we could define a new SnapshotRecord
format and perform data migration in `Manifest::try_new`.


## Test Plan
UT

---------

Co-authored-by: jiacai2050 <[email protected]>
## Rationale
Close #1583

For rows with the same primary key, we need to choose which value to
use; `MergeOperator` makes that choice.

## Detailed Changes
- Add the `MergeOperator` trait with two implementations.
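A hypothetical shape for such a trait, for illustration only: the commit does not show the signature or name the two implementations, so `merge`, `LastValue`, and `BytesConcat` below are assumptions. Each operator receives all values written for one primary key, oldest first, and returns the single value to keep.

```rust
/// Hypothetical merge operator over raw value bytes.
trait MergeOperator {
    /// `values` holds every value written for one primary key, oldest first.
    fn merge(&self, values: &[Vec<u8>]) -> Option<Vec<u8>>;
}

/// Keep only the most recently written value.
struct LastValue;

impl MergeOperator for LastValue {
    fn merge(&self, values: &[Vec<u8>]) -> Option<Vec<u8>> {
        values.last().cloned()
    }
}

/// Concatenate all values in write order.
struct BytesConcat;

impl MergeOperator for BytesConcat {
    fn merge(&self, values: &[Vec<u8>]) -> Option<Vec<u8>> {
        if values.is_empty() {
            return None;
        }
        Some(values.concat())
    }
}
```

Keeping the choice behind a trait lets a table pick its deduplication semantics without changing the read path.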

## Test Plan
CI
## Rationale

To maintain system stability and code clarity, it would be advisable to
relocate the legacy engine to a separate branch for dedicated
maintenance, preventing potential complications from mixing old and new
engine implementations.


## Detailed Changes


## Test Plan
CI
## Rationale
The compaction runner is responsible for compacting old SSTs and
deleting expired SSTs.
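A rough sketch of how a runner might separate the two kinds of work, assuming each SST carries an exclusive `[start, end)` time range and the table has a TTL. `SstFile` and `split_expired` are hypothetical; the commit does not describe the runner's internals.

```rust
/// Hypothetical SST metadata: id plus the [start, end) time range it covers.
#[derive(Debug, Clone, PartialEq)]
struct SstFile {
    id: u64,
    time_range: (i64, i64),
}

/// Split SSTs into (expired, live): an SST is expired when all of its data lies
/// before `now - ttl`; the live ones remain candidates for compaction.
fn split_expired(ssts: Vec<SstFile>, now: i64, ttl: i64) -> (Vec<SstFile>, Vec<SstFile>) {
    let expire_before = now.saturating_sub(ttl);
    // `end` is exclusive, so `end <= expire_before` means every row is older.
    ssts.into_iter().partition(|sst| sst.time_range.1 <= expire_before)
}
```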

## Detailed Changes


## Test Plan
CI
## Rationale
When compaction finishes, we need to delete the old input SSTs and the
expired SSTs.

## Detailed Changes
- Delta files use the `ManifestUpdate` struct.
- Refactor the compaction scheduler to make it more modular.

## Test Plan
CI
## Rationale
Follow-up to #1610, to make the manifest more modular.

## Detailed Changes


## Test Plan
CI
## Rationale


## Detailed Changes


## Test Plan
CI
## Rationale

#1600 


## Detailed Changes

Add benchmarks for this issue.

## Test Plan

Append 100 new delta SST files to a snapshot with 1000 records.

```text
Benchmarking bench_encoding/new_format_encoding/0: Collecting 10 samples in estimated 5.0005 s (494k iterations)
bench_encoding/new_format_encoding/0
                        time:   [10.232 µs 10.611 µs 10.900 µs]
                        change: [-1.5176% +2.1745% +5.9315%] (p = 0.30 > 0.05)
                        No change in performance detected.
```

---------

Co-authored-by: jiacai2050 <[email protected]>
## Rationale
To test `TimeMergeStorage`, I plan to generate random Arrow data and
trigger compaction via the HTTP API.

## Detailed Changes


## Test Plan
Not required.
## Rationale


## Detailed Changes
- The server can read its config from CLI args.
- Start 4 write workers to benchmark writes.

## Test Plan
CI
## Rationale
Fix bugs found in local write bench.

## Detailed Changes
- When the manifest starts up, the delta count may overflow because it is
initialized to 0.
- In the manifest's `merge_update`, the deltas are unsorted, so we may
try to delete a file that does not exist yet.
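One way to make the merge independent of delta order is to apply every addition before any deletion. This is a sketch under assumptions, not necessarily how the commit fixed it: the `to_adds` / `to_deletes` fields are hypothetical stand-ins for the contents of `ManifestUpdate`, and SST files are identified by `u64` ids.

```rust
use std::collections::BTreeSet;

/// Hypothetical delta record, loosely modeled on `ManifestUpdate`.
struct ManifestUpdate {
    to_adds: Vec<u64>,
    to_deletes: Vec<u64>,
}

/// Apply deltas in two passes so a delete never targets a file whose add
/// appears later in the (unsorted) delta list.
fn merge_updates(mut files: BTreeSet<u64>, deltas: &[ManifestUpdate]) -> Result<BTreeSet<u64>, String> {
    // Pass 1: apply every addition.
    for d in deltas {
        files.extend(d.to_adds.iter().copied());
    }
    // Pass 2: apply deletions; a missing file now indicates a real inconsistency.
    for d in deltas {
        for id in &d.to_deletes {
            if !files.remove(id) {
                return Err(format!("file {} to delete does not exist", id));
            }
        }
    }
    Ok(files)
}
```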

## Test Plan
CI
## Rationale
- Support scanning with a predicate.

## Detailed Changes


## Test Plan
CI
## Rationale


## Detailed Changes
- Refactor package metadata
- Add a `keep_sequence` argument to `ParquetReader`; set it to `true` for
compaction and `false` for queries.

## Test Plan

CI
```
cargo publish --dry-run --registry crates-io
```
This command runs successfully.
## Rationale


## Detailed Changes
- Add `Clone` and `PartialEq` to the config
- Use `ReadableDuration` and `ReadableSize`

## Test Plan
CI

---------

Co-authored-by: Jiacai Liu <[email protected]>