When Kuzu scans from files, it uses the file extension to infer the file format.
For example, given LOAD FROM 'test.csv',
Kuzu uses the extension .csv to infer that test.csv is in CSV format.
Issue caused:
CSV files may end with an extension other than .csv (e.g. .tsv, .xlsx), in which case inference fails: LOAD FROM 'test.tsv' => Exception
Certain file formats (e.g. Delta Lake, Iceberg) store data in a directory rather than a single file, so there is no extension to infer the format from: LOAD FROM 'person-delta-lake'
Solution:
I propose that we extend our LOAD FROM grammar to accept an additional option: TYPE, which indicates the type of the file in the LOAD FROM clause.
Grammar:
LOAD FROM:
LOAD FROM <file_path> (type=<file_format>)
RETURN <columns>
COPY FROM:
COPY <table_name> FROM <file_path> (type=<file_format>)
Example:
LOAD FROM '/tmp/student' (type=delta)
RETURN name, age
COPY student FROM '/tmp/student' (type=delta)
We can make TYPE optional, so that Kuzu still tries to infer the file format from the extension when the TYPE option is not provided. So:
LOAD FROM '/tmp/university.csv'
RETURN name, age
still works. However, Kuzu should throw an exception when type inference is impossible and no TYPE is given:
LOAD FROM '/tmp/student'
RETURN name, age
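The resolution order proposed above (an explicit TYPE first, then extension-based inference, then an exception) can be sketched roughly as follows. This is only an illustration in Python, not Kuzu's implementation: the function name `resolve_file_format`, the `EXTENSION_TO_FORMAT` table, and the error message are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension-to-format table; the set of extensions Kuzu
# actually recognizes is not specified in this proposal.
EXTENSION_TO_FORMAT = {".csv": "csv", ".parquet": "parquet"}


def resolve_file_format(file_path: str, type_option: Optional[str] = None) -> str:
    """Pick the file format for a scan, preferring an explicit TYPE option."""
    if type_option is not None:
        # An explicit (type=...) option overrides whatever the path suggests.
        return type_option.lower()

    # No TYPE given: fall back to inference from the file extension.
    extension = PurePosixPath(file_path).suffix.lower()
    if extension in EXTENSION_TO_FORMAT:
        return EXTENSION_TO_FORMAT[extension]

    # Unknown or missing extension (e.g. a directory): inference is impossible.
    raise ValueError(
        f"Cannot infer the file format of '{file_path}'; "
        "specify it with (type=<file_format>)."
    )
```

Under these assumptions, `resolve_file_format('/tmp/student', 'delta')` succeeds because the option is given, while `resolve_file_format('/tmp/student')` raises because the path has no extension to infer from.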
API: C++