Implement FORMAT option in LOAD FROM clause. #4613

Merged: 6 commits, Dec 10, 2024

acquamarin (Collaborator) commented Dec 9, 2024

This PR implements the FORMAT option for the LOAD FROM clause, as described in #4597.
Resolves #4597.

Usage:

  1. If the format option is not given, Kuzu sniffs the file format from the file extension. For example, test.csv is treated as CSV because it ends with .csv.
  2. If the format option is given, Kuzu uses it instead of sniffing.
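
The two rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolveFileFormat` and `toLower` are hypothetical names, and lower-casing the result is an assumption made only so that `.CSV` and `.csv` resolve to the same format.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Kuzu): lower-cases an ASCII string.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper (not part of Kuzu): picks the format for a file.
// Rule 2: an explicit format option wins and no sniffing happens.
// Rule 1: otherwise the format is taken from the file extension.
std::string resolveFileFormat(const std::string& path,
    const std::optional<std::string>& formatOption) {
    if (formatOption.has_value()) {
        return toLower(*formatOption);
    }
    const auto slash = path.find_last_of("/\\");
    const auto dot = path.find_last_of('.');
    // No dot, a dot that belongs to a directory name, or a trailing dot
    // means there is no extension to sniff.
    const bool dotInDirectory =
        slash != std::string::npos && dot != std::string::npos && dot < slash;
    if (dot == std::string::npos || dotInDirectory || dot + 1 == path.size()) {
        throw std::invalid_argument("cannot infer file format from: " + path);
    }
    return toLower(path.substr(dot + 1));
}
```

With this sketch, `resolveFileFormat("test.csv", std::nullopt)` falls back to the extension, while passing an explicit format for a file such as `data.txt` skips the extension check entirely.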


codecov bot commented Dec 9, 2024

Codecov Report

Attention: Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 87.40%. Comparing base (a9cd504) to head (828b354).
Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/binder/bind/bind_file_scan.cpp | 83.33% | 2 Missing ⚠️ |
| src/binder/binder.cpp | 50.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4613      +/-   ##
==========================================
+ Coverage   87.38%   87.40%   +0.02%     
==========================================
  Files        1363     1363              
  Lines       57608    57614       +6     
  Branches     7184     7185       +1     
==========================================
+ Hits        50340    50357      +17     
+ Misses       7102     7091      -11     
  Partials      166      166              



github-actions bot commented Dec 9, 2024

Benchmark Result

Master commit hash: aa102a0a41877a13e882c1439b6f1a2c5ef7d247
Branch commit hash: bf2ba0a68c047a215926174cb5b79162a9bb1694

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 635.75 | 787.80 | -152.06 (-19.30%) |
| aggregation | q28 | 11936.58 | 12815.54 | -878.95 (-6.86%) |
| filter | q14 | 119.04 | 131.23 | -12.19 (-9.29%) |
| filter | q15 | 120.83 | 142.66 | -21.83 (-15.30%) |
| filter | q16 | 305.22 | 304.62 | 0.60 (0.20%) |
| filter | q17 | 435.89 | 492.50 | -56.61 (-11.50%) |
| filter | q18 | 1885.15 | 1976.33 | -91.18 (-4.61%) |
| filter | zonemap-node | 78.16 | 93.40 | -15.24 (-16.31%) |
| filter | zonemap-node-lhs-cast | 80.23 | 92.67 | -12.44 (-13.43%) |
| filter | zonemap-rel | 5461.25 | 5613.99 | -152.74 (-2.72%) |
| fixed_size_expr_evaluator | q07 | 563.11 | 617.17 | -54.06 (-8.76%) |
| fixed_size_expr_evaluator | q08 | 795.80 | 815.22 | -19.41 (-2.38%) |
| fixed_size_expr_evaluator | q09 | 793.66 | 864.29 | -70.64 (-8.17%) |
| fixed_size_expr_evaluator | q10 | 228.89 | 253.96 | -25.06 (-9.87%) |
| fixed_size_expr_evaluator | q11 | 219.96 | 246.73 | -26.77 (-10.85%) |
| fixed_size_expr_evaluator | q12 | 217.78 | 242.79 | -25.00 (-10.30%) |
| fixed_size_expr_evaluator | q13 | 1446.99 | 1473.64 | -26.65 (-1.81%) |
| fixed_size_seq_scan | q23 | 104.35 | 129.23 | -24.88 (-19.25%) |
| join | q29 | 577.80 | 628.61 | -50.81 (-8.08%) |
| join | q30 | 1496.13 | 1597.06 | -100.93 (-6.32%) |
| join | q31 | 4.63 | 4.57 | 0.05 (1.18%) |
| ldbc_snb_ic | q35 | 2664.24 | 2613.42 | 50.81 (1.94%) |
| ldbc_snb_ic | q36 | 527.17 | 513.30 | 13.87 (2.70%) |
| ldbc_snb_is | q32 | 5.68 | 5.30 | 0.38 (7.27%) |
| ldbc_snb_is | q33 | 12.59 | 10.99 | 1.61 (14.61%) |
| ldbc_snb_is | q34 | 1.24 | 0.99 | 0.25 (25.02%) |
| multi-rel | multi-rel-large-scan | 1293.34 | 1210.94 | 82.40 (6.80%) |
| multi-rel | multi-rel-lookup | 18.90 | 18.35 | 0.54 (2.96%) |
| multi-rel | multi-rel-small-scan | 82.32 | 94.84 | -12.52 (-13.21%) |
| order_by | q25 | 124.86 | 140.32 | -15.45 (-11.01%) |
| order_by | q26 | 440.69 | 458.84 | -18.15 (-3.96%) |
| order_by | q27 | 1447.42 | 1474.79 | -27.37 (-1.86%) |
| recursive_join | recursive-join-bidirection | 314.49 | 283.36 | 31.13 (10.99%) |
| recursive_join | recursive-join-dense | 7366.18 | 7407.54 | -41.36 (-0.56%) |
| recursive_join | recursive-join-path | 23804.13 | 23694.37 | 109.76 (0.46%) |
| recursive_join | recursive-join-sparse | 14408.97 | 14837.02 | -428.04 (-2.88%) |
| recursive_join | recursive-join-trail | 7329.62 | 7422.58 | -92.96 (-1.25%) |
| scan_after_filter | q01 | 162.33 | 179.56 | -17.23 (-9.60%) |
| scan_after_filter | q02 | 148.10 | 163.61 | -15.52 (-9.48%) |
| shortest_path_ldbc100 | q37 | 90.68 | 92.28 | -1.60 (-1.73%) |
| shortest_path_ldbc100 | q38 | 356.51 | 303.53 | 52.98 (17.45%) |
| shortest_path_ldbc100 | q39 | 58.32 | 67.89 | -9.57 (-14.10%) |
| shortest_path_ldbc100 | q40 | 410.56 | 391.46 | 19.10 (4.88%) |
| var_size_expr_evaluator | q03 | 2036.55 | 2112.28 | -75.73 (-3.59%) |
| var_size_expr_evaluator | q04 | 2188.82 | 2245.94 | -57.11 (-2.54%) |
| var_size_expr_evaluator | q05 | 2581.59 | 2588.09 | -6.50 (-0.25%) |
| var_size_expr_evaluator | q06 | 1342.27 | 1335.93 | 6.34 (0.47%) |
| var_size_seq_scan | q19 | 1426.45 | 1493.51 | -67.06 (-4.49%) |
| var_size_seq_scan | q20 | 2430.08 | 2786.99 | -356.90 (-12.81%) |
| var_size_seq_scan | q21 | 2287.46 | 2362.38 | -74.92 (-3.17%) |
| var_size_seq_scan | q22 | 122.27 | 126.07 | -3.79 (-3.01%) |

} catch (...) {
if (typeInfo.fileTypeStr == "") {
Contributor

Cover

Collaborator Author

This is already covered by the extension tests, which do not show up correctly in Codecov.

@acquamarin acquamarin merged commit 0a68e24 into master Dec 10, 2024
25 checks passed
@acquamarin acquamarin deleted the load-from-type branch December 10, 2024 03:43
ray6080 pushed a commit that referenced this pull request Dec 17, 2024
ray6080 pushed a commit that referenced this pull request Dec 18, 2024

Successfully merging this pull request may close these issues.

Feature: Extend LOAD FROM clause with TYPE option