This repository has been archived by the owner on Sep 27, 2019. It is now read-only.
default_database=# CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,R_NAME CHAR(25) NOT NULL,R_COMMENT VARCHAR(152));
CREATE TABLE
default_database=# copy region from '/Users/liuliuzhipeng/tpch-dbgen/region.tbl' with(delimiter '|', null '');
COPY 5
default_database=# select * from region;
r_regionkey | r_name | r_comment
-------------+-------------+---------------------------------------------------------------------------------------------------------------------
0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
2 | ASIA | ges. thinly even pinto beans ca
4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
1 | AMERICA | hs use ironic, even requests. s
3 | EUROPE | ly final courts cajole furiously final excuse
(5 rows)
default_database=# select * from region where r_name='ASIA';
r_regionkey | r_name | r_comment
-------------+--------+-----------
(0 rows)
default_database=# select * from region where r_regionkey=2;
r_regionkey | r_name | r_comment
-------------+--------+---------------------------------
2 | ASIA | ges. thinly even pinto beans ca
(1 row)
@happyjoblzp, I reproduced the issue you reported and analyzed Peloton's internals to find the cause.
The problem originates from how Peloton handles the end of a string value, i.e., a varchar column.
The basic mechanism for every char-type value is to append '\0' (00000000 as a bit string) to the end of the string value.
For instance, the binary string for the selection predicate r_name = 'ASIA' in a query becomes:
01000001 01010011 01001001 01000001 00000000 (ASIA'\0')
In addition, its length is increased by one, i.e., it is five rather than four.
Consequently, if we read a varchar value from the input file using this increased length, i.e., read five characters instead of four, the scanned value becomes the char value plus the delimiter that follows it.
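To make the byte layout concrete, here is a minimal sketch that renders a literal plus its appended terminator as 8-bit groups. `TerminatedBits` is a hypothetical helper written for illustration only; it is not a Peloton function.

```cpp
#include <bitset>
#include <string>
#include <vector>

// Hypothetical helper (not part of Peloton): render each byte of `literal`,
// followed by the appended '\0' terminator, as an 8-character bit string.
std::vector<std::string> TerminatedBits(const std::string &literal) {
  std::vector<std::string> bits;
  for (unsigned char c : literal) {
    bits.push_back(std::bitset<8>(c).to_string());
  }
  // The terminator accounts for the extra unit of length.
  bits.push_back(std::bitset<8>(0).to_string());
  return bits;
}
```

Calling `TerminatedBits("ASIA")` yields five entries: the four character bytes and a final all-zero byte, which is why the predicate's length is five.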
For instance, typical TPC-H and TPC-DS dataset generators use '|' as their column delimiter.
Thus, Peloton compares 'ASIA|' (obtained by scanning the column value of the table) to 'ASIA\0' (obtained by parsing the SQL query), so the string comparator always returns false. Please refer to the static inline int CompareStrings(const char* str1, int len1, const char* str2, int len2) { ... } function in https://github.com/cmu-db/peloton/blob/master/src/include/type/type_util.h.
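The mismatch can be sketched as follows. `CompareBytes` is a hypothetical stand-in for a length-aware comparator with the same parameter list; it is an assumption for illustration and does not reproduce the body of Peloton's `CompareStrings`. The sample line is likewise made up to resemble a '|'-delimited row.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a length-aware comparator (NOT Peloton's code):
// compare the common prefix byte by byte, then fall back to the lengths.
int CompareBytes(const char *str1, int len1, const char *str2, int len2) {
  int ret = std::memcmp(str1, str2,
                        static_cast<std::size_t>(std::min(len1, len2)));
  if (ret != 0) return ret;
  return len1 - len2;
}

// Illustrative '|'-delimited row; the r_name field starts at offset 2.
const char kScannedLine[] = "2|ASIA|ges. thinly even pinto beans ca|";
// Query literal: the array holds 'A','S','I','A','\0' (five bytes).
const char kLiteral[] = "ASIA";

// Reading five bytes picks up the trailing '|', so the values differ.
bool MatchesWithInflatedLength() {
  return CompareBytes(kScannedLine + 2, 5, kLiteral, 5) == 0;
}

// Reading only the four character bytes on both sides compares equal.
bool MatchesWithExactLength() {
  return CompareBytes(kScannedLine + 2, 4, kLiteral, 4) == 0;
}
```

Under these assumptions, `MatchesWithInflatedLength()` is false because the fifth byte is '|' on one side and '\0' on the other, while `MatchesWithExactLength()` is true.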
I created PR #1490 to address the issue; please refer to it for the simple solution.
As a simple test, I just replaced all delimiters in the CSV with '\0' while scanning it.
With that change, the query returns the correct result.
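The idea of that test can be sketched roughly as below. This is not the code from the PR; `NullOutDelimiters` is a hypothetical helper that only illustrates overwriting each delimiter with '\0' in a raw input line.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (an assumption, not the PR's code): return a copy of
// `line` in which every occurrence of `delimiter` is replaced by '\0'.
std::string NullOutDelimiters(std::string line, char delimiter = '|') {
  std::replace(line.begin(), line.end(), delimiter, '\0');
  return line;
}
```

After this step, reading five bytes for the value 'ASIA' yields 'ASIA\0', which matches the terminated literal from the query. Note that this simple replacement touches every occurrence of the delimiter character, so it assumes the character never appears inside a field value.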