Added Unicode \u and \U parsing to the cli #4492

Merged (4 commits) on Nov 19, 2024

MSebanc (Collaborator) commented Nov 9, 2024

Description

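Adds parsing of Unicode escape sequences to CLI input: `\uXXXX` (four hex digits) and `\UXXXXXXXX` (eight hex digits). A UTF-16 surrogate pair written as two consecutive `\u` escapes is combined into a single character, so `\uD83D\uDE01` and `\U0001F601` both yield 😁.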

Opening the database at path: test in read-write mode.
Enter ":help" for usage hints.
kuzu> CREATE NODE TABLE IF NOT EXISTS `B\u00fccher` (title STRING, price INT64, PRIMARY KEY (title));
┌────────────────────────────────┐
│ result                         │
│ STRING                         │
├────────────────────────────────┤
│ Table Bücher has been created. │
└────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.23ms (compiling), 14.05ms (executing)
kuzu> CREATE (n:`B\u00fccher` {title: 'Der Thron der Sieben Königreiche'}) SET n.price = 20;
(0 tuples)
(0 columns)
Time: 0.36ms (compiling), 4.00ms (executing)
kuzu> MATCH (n:Bücher) RETURN label(n);
┌─────────┐
│ Bücher  │
│ STRING  │
├─────────┤
│ Bücher  │
└─────────┘
(1 tuple)
(1 column)
Time: 0.27ms (compiling), 6.72ms (executing)
kuzu> return "\uD83D\uDE01";
┌────────┐
│ 😁     │
│ STRING │
├────────┤
│ 😁     │
└────────┘
(1 tuple)
(1 column)
Time: 0.07ms (compiling), 7.46ms (executing)
kuzu> return "\U0001F601";
┌────────┐
│ 😁     │
│ STRING │
├────────┤
│ 😁     │
└────────┘
(1 tuple)
(1 column)
Time: 0.07ms (compiling), 8.10ms (executing)
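
For readers who want to see the decoding rule behind the session above, here is a minimal sketch in Python. It is not the code from this PR (the CLI itself is not written in Python); the function names are hypothetical, and leaving an unpaired surrogate escape untouched is an assumption, since the session does not show how the CLI treats that case.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def read_escape(text, i):
    # Return (value, length) if a \uXXXX or \UXXXXXXXX escape starts at index i,
    # otherwise (None, 0).
    for marker, digits in (("\\u", 4), ("\\U", 8)):
        body = text[i + 2:i + 2 + digits]
        if (text.startswith(marker, i) and len(body) == digits
                and all(c in HEX_DIGITS for c in body)):
            return int(body, 16), 2 + digits
    return None, 0


def decode_unicode_escapes(text):
    # Replace Unicode escapes in text with the characters they name.
    out = []
    i = 0
    while i < len(text):
        value, length = read_escape(text, i)
        if value is None:
            out.append(text[i])  # ordinary character, copy as-is
            i += 1
            continue
        if 0xD800 <= value <= 0xDBFF:
            # High surrogate: try to combine it with a following low surrogate.
            low, low_length = read_escape(text, i + length)
            if low is not None and 0xDC00 <= low <= 0xDFFF:
                value = 0x10000 + ((value - 0xD800) << 10) + (low - 0xDC00)
                length += low_length
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            # Assumption: unpaired surrogates and out-of-range values are
            # left exactly as typed rather than rejected.
            out.append(text[i:i + length])
        else:
            out.append(chr(value))
        i += length
    return "".join(out)


print(decode_unicode_escapes(r"B\u00fccher"))   # Bücher
print(decode_unicode_escapes(r"\uD83D\uDE01"))  # 😁
print(decode_unicode_escapes(r"\U0001F601"))    # 😁
```

Note that `\u` consumes exactly four hex digits, which is why `B\u00fccher` decodes to `Bücher`: the `c` after `00fc` is a valid hex digit but belongs to the table name, not the escape.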

Fixes #4068

Contributor agreement

@MSebanc requested a review from andyfengHKU on November 9, 2024 02:34

github-actions bot commented Nov 9, 2024

Benchmark Result

Master commit hash: 5a91794ce893ac44acc50a1322a5b348a052b5d6
Branch commit hash: 4298cdce861f6ae49bf4933d770d164873408a89

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 636.60 | 636.06 | 0.54 (0.08%) |
| aggregation | q28 | 12182.38 | 11295.23 | 887.15 (7.85%) |
| filter | q14 | 118.66 | 117.30 | 1.37 (1.17%) |
| filter | q15 | 117.22 | 119.14 | -1.93 (-1.62%) |
| filter | q16 | 300.90 | 296.61 | 4.29 (1.45%) |
| filter | q17 | 439.80 | 439.60 | 0.19 (0.04%) |
| filter | q18 | 1914.69 | 1909.99 | 4.70 (0.25%) |
| filter | zonemap-node | 77.79 | 78.25 | -0.46 (-0.59%) |
| filter | zonemap-node-lhs-cast | 78.83 | 79.10 | -0.27 (-0.35%) |
| filter | zonemap-rel | 5398.79 | 5406.31 | -7.52 (-0.14%) |
| fixed_size_expr_evaluator | q07 | 535.91 | 536.86 | -0.95 (-0.18%) |
| fixed_size_expr_evaluator | q08 | 752.04 | 748.36 | 3.68 (0.49%) |
| fixed_size_expr_evaluator | q09 | 750.67 | 755.89 | -5.22 (-0.69%) |
| fixed_size_expr_evaluator | q10 | 232.24 | 229.91 | 2.33 (1.01%) |
| fixed_size_expr_evaluator | q11 | 228.57 | 225.39 | 3.18 (1.41%) |
| fixed_size_expr_evaluator | q12 | 224.24 | 226.67 | -2.43 (-1.07%) |
| fixed_size_expr_evaluator | q13 | 1469.71 | 1471.80 | -2.09 (-0.14%) |
| fixed_size_seq_scan | q23 | 105.94 | 108.88 | -2.94 (-2.70%) |
| join | q29 | 582.91 | 638.20 | -55.29 (-8.66%) |
| join | q30 | 1357.67 | 1436.32 | -78.64 (-5.48%) |
| join | q31 | 1.59 | 4.10 | -2.52 (-61.35%) |
| ldbc_snb_ic | q35 | 416.99 | 431.58 | -14.59 (-3.38%) |
| ldbc_snb_ic | q36 | 110.19 | 100.13 | 10.05 (10.04%) |
| ldbc_snb_is | q32 | 3.32 | 5.62 | -2.30 (-40.85%) |
| ldbc_snb_is | q33 | 10.44 | 10.80 | -0.36 (-3.35%) |
| ldbc_snb_is | q34 | 1.43 | 1.77 | -0.34 (-19.30%) |
| multi-rel | multi-rel-large-scan | 1619.45 | 1776.05 | -156.60 (-8.82%) |
| multi-rel | multi-rel-lookup | 31.66 | 41.42 | -9.76 (-23.56%) |
| multi-rel | multi-rel-small-scan | 88.56 | 98.14 | -9.58 (-9.76%) |
| order_by | q25 | 120.67 | 123.04 | -2.37 (-1.93%) |
| order_by | q26 | 444.04 | 442.00 | 2.04 (0.46%) |
| order_by | q27 | 1457.77 | 1444.33 | 13.43 (0.93%) |
| scan_after_filter | q01 | 162.08 | 161.23 | 0.85 (0.53%) |
| scan_after_filter | q02 | 147.40 | 150.37 | -2.97 (-1.98%) |
| shortest_path_ldbc100 | q37 | 83.62 | 77.47 | 6.14 (7.93%) |
| shortest_path_ldbc100 | q38 | 459.79 | 450.99 | 8.80 (1.95%) |
| shortest_path_ldbc100 | q39 | 59.97 | 66.36 | -6.38 (-9.62%) |
| shortest_path_ldbc100 | q40 | 527.45 | 536.40 | -8.96 (-1.67%) |
| var_size_expr_evaluator | q03 | 2051.06 | 2061.70 | -10.65 (-0.52%) |
| var_size_expr_evaluator | q04 | 2220.15 | 2225.12 | -4.96 (-0.22%) |
| var_size_expr_evaluator | q05 | 2608.90 | 2605.54 | 3.37 (0.13%) |
| var_size_expr_evaluator | q06 | 1330.17 | 1326.50 | 3.67 (0.28%) |
| var_size_seq_scan | q19 | 1437.88 | 1446.15 | -8.27 (-0.57%) |
| var_size_seq_scan | q20 | 2461.14 | 2476.03 | -14.89 (-0.60%) |
| var_size_seq_scan | q21 | 2271.08 | 2262.20 | 8.88 (0.39%) |
| var_size_seq_scan | q22 | 126.00 | 125.74 | 0.27 (0.21%) |
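
As a reading aid for the Diff column above, the sketch below recomputes it as commit time minus master time, with the percentage taken relative to master. The function name is hypothetical and this is not the benchmark bot's code. The bot presumably works from unrounded timings, so recomputing from the two-decimal values shown can differ in the last digit (for example, filter q15 comes out as -1.92 here instead of -1.93).

```python
def diff_column(commit_ms, master_ms):
    # Absolute difference in ms, then the change relative to master in percent.
    diff = commit_ms - master_ms
    percent = diff / master_ms * 100
    return f"{diff:.2f} ({percent:.2f}%)"


# aggregation q28 from the first benchmark run
print(diff_column(12182.38, 11295.23))  # 887.15 (7.85%)
```

A positive value means the branch was slower than master on that query; a negative value means it was faster.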


codecov bot commented Nov 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.36%. Comparing base (5a91794) to head (36a721f).
Report is 23 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4492      +/-   ##
==========================================
- Coverage   87.37%   87.36%   -0.01%     
==========================================
  Files        1347     1348       +1     
  Lines       56364    56490     +126     
  Branches     7086     7107      +21     
==========================================
+ Hits        49249    49355     +106     
- Misses       6947     6967      +20     
  Partials      168      168              
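
The percentages in the diff above are consistent with hits divided by total lines, rounded down to two decimals. That rounding mode is inferred from the numbers in this report rather than stated in it, and the helper name is hypothetical.

```python
def coverage_percent(hits, lines):
    # Integer arithmetic keeps the round-down step exact:
    # hundredths of a percent, floored, then scaled back to percent.
    return (hits * 10000 // lines) / 100


print(coverage_percent(49249, 56364))  # base (master): 87.37
print(coverage_percent(49355, 56490))  # head (#4492):  87.36
```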


Benchmark Result

Master commit hash: 01bcd31b3e618545c7c49211af7ebaf3a78957d3
Branch commit hash: 695b9277fe2142f608d2506f1d1f2fcce20185ab

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 641.44 | 642.58 | -1.14 (-0.18%) |
| aggregation | q28 | 11457.50 | 11971.42 | -513.92 (-4.29%) |
| filter | q14 | 126.71 | 125.77 | 0.93 (0.74%) |
| filter | q15 | 126.55 | 128.26 | -1.71 (-1.34%) |
| filter | q16 | 315.20 | 311.85 | 3.35 (1.07%) |
| filter | q17 | 445.25 | 445.27 | -0.02 (-0.00%) |
| filter | q18 | 1920.26 | 1929.92 | -9.66 (-0.50%) |
| filter | zonemap-node | 86.54 | 86.47 | 0.06 (0.07%) |
| filter | zonemap-node-lhs-cast | 87.13 | 88.31 | -1.18 (-1.34%) |
| filter | zonemap-rel | 5314.44 | 5345.59 | -31.15 (-0.58%) |
| fixed_size_expr_evaluator | q07 | 543.12 | 544.44 | -1.32 (-0.24%) |
| fixed_size_expr_evaluator | q08 | 758.01 | 758.10 | -0.09 (-0.01%) |
| fixed_size_expr_evaluator | q09 | 756.14 | 756.69 | -0.54 (-0.07%) |
| fixed_size_expr_evaluator | q10 | 239.22 | 240.01 | -0.79 (-0.33%) |
| fixed_size_expr_evaluator | q11 | 232.92 | 234.68 | -1.76 (-0.75%) |
| fixed_size_expr_evaluator | q12 | 235.23 | 232.71 | 2.52 (1.08%) |
| fixed_size_expr_evaluator | q13 | 1478.01 | 1469.16 | 8.86 (0.60%) |
| fixed_size_seq_scan | q23 | 118.30 | 116.04 | 2.26 (1.95%) |
| join | q29 | 649.54 | 592.00 | 57.55 (9.72%) |
| join | q30 | 1422.53 | 1390.70 | 31.82 (2.29%) |
| join | q31 | 3.66 | 2.44 | 1.22 (49.90%) |
| ldbc_snb_ic | q35 | 411.45 | 405.49 | 5.96 (1.47%) |
| ldbc_snb_ic | q36 | 127.18 | 130.35 | -3.17 (-2.43%) |
| ldbc_snb_is | q32 | 3.35 | 3.36 | -0.02 (-0.45%) |
| ldbc_snb_is | q33 | 10.32 | 10.76 | -0.44 (-4.09%) |
| ldbc_snb_is | q34 | 1.53 | 1.43 | 0.10 (6.79%) |
| multi-rel | multi-rel-large-scan | 2013.27 | 1996.29 | 16.98 (0.85%) |
| multi-rel | multi-rel-lookup | 11.43 | 31.89 | -20.46 (-64.17%) |
| multi-rel | multi-rel-small-scan | 91.99 | 79.91 | 12.07 (15.11%) |
| order_by | q25 | 138.00 | 132.92 | 5.08 (3.82%) |
| order_by | q26 | 447.98 | 446.59 | 1.39 (0.31%) |
| order_by | q27 | 1460.90 | 1465.47 | -4.57 (-0.31%) |
| scan_after_filter | q01 | 169.23 | 168.10 | 1.13 (0.67%) |
| scan_after_filter | q02 | 157.87 | 158.42 | -0.55 (-0.35%) |
| shortest_path_ldbc100 | q37 | 80.52 | 88.79 | -8.27 (-9.31%) |
| shortest_path_ldbc100 | q38 | 463.36 | 459.13 | 4.23 (0.92%) |
| shortest_path_ldbc100 | q39 | 63.46 | 60.27 | 3.18 (5.28%) |
| shortest_path_ldbc100 | q40 | 543.61 | 530.73 | 12.88 (2.43%) |
| var_size_expr_evaluator | q03 | 2065.83 | 2053.91 | 11.91 (0.58%) |
| var_size_expr_evaluator | q04 | 2240.08 | 2241.45 | -1.37 (-0.06%) |
| var_size_expr_evaluator | q05 | 2582.13 | 2590.59 | -8.47 (-0.33%) |
| var_size_expr_evaluator | q06 | 1322.97 | 1324.78 | -1.81 (-0.14%) |
| var_size_seq_scan | q19 | 1452.63 | 1454.31 | -1.68 (-0.12%) |
| var_size_seq_scan | q20 | 2365.80 | 2366.67 | -0.87 (-0.04%) |
| var_size_seq_scan | q21 | 2267.71 | 2269.72 | -2.00 (-0.09%) |
| var_size_seq_scan | q22 | 127.87 | 128.07 | -0.20 (-0.16%) |


Benchmark Result

Master commit hash: ad2f6ad754974ce509159afc57c765a2464e0634
Branch commit hash: 54d6091d3f6cfc118f31faa0c0004f38f328e303

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 643.23 | 644.93 | -1.70 (-0.26%) |
| aggregation | q28 | 11838.45 | 11477.81 | 360.65 (3.14%) |
| filter | q14 | 129.18 | 125.39 | 3.79 (3.02%) |
| filter | q15 | 127.35 | 127.01 | 0.35 (0.27%) |
| filter | q16 | 308.48 | 308.78 | -0.29 (-0.10%) |
| filter | q17 | 446.77 | 444.88 | 1.89 (0.43%) |
| filter | q18 | 1923.18 | 1930.87 | -7.69 (-0.40%) |
| filter | zonemap-node | 86.46 | 86.50 | -0.03 (-0.04%) |
| filter | zonemap-node-lhs-cast | 86.66 | 87.37 | -0.71 (-0.81%) |
| filter | zonemap-rel | 5496.48 | 5466.53 | 29.95 (0.55%) |
| fixed_size_expr_evaluator | q07 | 544.60 | 545.21 | -0.61 (-0.11%) |
| fixed_size_expr_evaluator | q08 | 761.73 | 761.74 | -0.01 (-0.00%) |
| fixed_size_expr_evaluator | q09 | 754.97 | 763.28 | -8.31 (-1.09%) |
| fixed_size_expr_evaluator | q10 | 238.70 | 239.60 | -0.90 (-0.38%) |
| fixed_size_expr_evaluator | q11 | 238.59 | 234.11 | 4.47 (1.91%) |
| fixed_size_expr_evaluator | q12 | 231.76 | 232.59 | -0.83 (-0.36%) |
| fixed_size_expr_evaluator | q13 | 1469.95 | 1466.02 | 3.93 (0.27%) |
| fixed_size_seq_scan | q23 | 116.80 | 113.67 | 3.13 (2.75%) |
| join | q29 | 663.61 | 636.02 | 27.60 (4.34%) |
| join | q30 | 1464.52 | 1405.18 | 59.34 (4.22%) |
| join | q31 | 3.64 | 5.00 | -1.36 (-27.27%) |
| ldbc_snb_ic | q35 | 406.49 | 543.69 | -137.20 (-25.24%) |
| ldbc_snb_ic | q36 | 128.57 | 126.15 | 2.42 (1.92%) |
| ldbc_snb_is | q32 | 4.34 | 5.77 | -1.43 (-24.72%) |
| ldbc_snb_is | q33 | 12.05 | 10.00 | 2.04 (20.45%) |
| ldbc_snb_is | q34 | 1.45 | 1.78 | -0.34 (-18.84%) |
| multi-rel | multi-rel-large-scan | 1830.19 | 1785.95 | 44.25 (2.48%) |
| multi-rel | multi-rel-lookup | 9.60 | 17.95 | -8.36 (-46.54%) |
| multi-rel | multi-rel-small-scan | 87.09 | 67.77 | 19.32 (28.51%) |
| order_by | q25 | 129.57 | 130.10 | -0.53 (-0.41%) |
| order_by | q26 | 447.20 | 449.31 | -2.11 (-0.47%) |
| order_by | q27 | 1481.06 | 1481.94 | -0.88 (-0.06%) |
| scan_after_filter | q01 | 169.14 | 169.66 | -0.52 (-0.31%) |
| scan_after_filter | q02 | 158.10 | 158.26 | -0.16 (-0.10%) |
| shortest_path_ldbc100 | q37 | 93.02 | 92.76 | 0.26 (0.28%) |
| shortest_path_ldbc100 | q38 | 473.04 | 432.85 | 40.19 (9.28%) |
| shortest_path_ldbc100 | q39 | 61.22 | 64.44 | -3.22 (-5.00%) |
| shortest_path_ldbc100 | q40 | 551.45 | 528.18 | 23.27 (4.41%) |
| var_size_expr_evaluator | q03 | 2054.15 | 2059.42 | -5.27 (-0.26%) |
| var_size_expr_evaluator | q04 | 2253.49 | 2239.92 | 13.57 (0.61%) |
| var_size_expr_evaluator | q05 | 2619.40 | 2618.91 | 0.49 (0.02%) |
| var_size_expr_evaluator | q06 | 1330.75 | 1324.85 | 5.89 (0.44%) |
| var_size_seq_scan | q19 | 1454.19 | 1456.36 | -2.17 (-0.15%) |
| var_size_seq_scan | q20 | 2528.19 | 2528.44 | -0.25 (-0.01%) |
| var_size_seq_scan | q21 | 2293.14 | 2268.98 | 24.16 (1.06%) |
| var_size_seq_scan | q22 | 127.79 | 128.07 | -0.28 (-0.22%) |

@andyfengHKU merged commit 9d5afd1 into master on Nov 19, 2024
25 checks passed
@andyfengHKU deleted the cli-unicode-input branch on November 19, 2024 13:56
ray6080 pushed a commit that referenced this pull request Dec 17, 2024
* Added unicode \u and \U parsing to the cli

* Added tests

* Minor test fixes

* Minor fixes
ray6080 pushed a commit that referenced this pull request Dec 18, 2024
* Added unicode \u and \U parsing to the cli

* Added tests

* Minor test fixes

* Minor fixes
Successfully merging this pull request may close these issues.

Bug: CLI parsing with unicode strings is erratic