Feature: Extend LOAD FROM clause with TYPE option #4597

Closed
acquamarin opened this issue Dec 4, 2024 · 0 comments · Fixed by #4613
Labels: feature (New features or missing components of existing features), frontend (Frontend, i.e., binder, parser, query planning-related issues)

acquamarin (Collaborator) commented Dec 4, 2024

API

C++

Description

Motivation:

Current Implementation:

When Kuzu scans a file, it uses the file extension to infer the file format.
For example, given LOAD FROM 'test.csv',
Kuzu uses the .csv extension to infer that test.csv is a CSV file.
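
The extension-based inference described above can be pictured roughly as follows. This is a minimal Python sketch for illustration, not Kuzu's actual code: the function name `infer_format_from_extension` and the table `KNOWN_EXTENSIONS` are hypothetical, and the set of recognized extensions is an assumption.

```python
import os

# Hypothetical extension-to-format table; the formats Kuzu actually
# recognizes may differ.
KNOWN_EXTENSIONS = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
}


def infer_format_from_extension(path: str) -> str:
    """Return the format implied by the extension of `path`.

    Raises ValueError when the extension is missing or unknown.
    """
    _, ext = os.path.splitext(path)
    fmt = KNOWN_EXTENSIONS.get(ext.lower())
    if fmt is None:
        raise ValueError(f"Cannot infer file format from path: {path!r}")
    return fmt
```

With a lookup like this, a path such as 'test.csv' resolves to a format, while a path with an unrecognized extension or no extension at all can only fail.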

Problems:

  1. CSV files may use an extension other than .csv (e.g. .tsv, .xlsx).
    LOAD FROM 'test.tsv' => Exception
  2. Some file formats store data in a directory rather than in a single file (e.g. Delta Lake, Iceberg), which makes the format hard to infer from the path.
    LOAD FROM 'person-delta-lake'

Solution:

I propose extending the LOAD FROM and COPY FROM grammar with an additional option, TYPE, which specifies the format of the file being scanned.

Grammar:

  1. LOAD FROM
LOAD FROM <file_path> (type=<file_format>)
RETURN <expressions>
  2. COPY FROM
COPY <table_name> FROM <file_path> (type=<file_format>)

Example:

LOAD FROM '/tmp/student' (type=delta)
RETURN name, age

COPY student FROM '/tmp/student' (type=delta)
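
To make the shape of the proposed clause concrete, the toy parser below pulls the path and the parenthesized options out of a LOAD FROM clause. It is only a sketch under stated assumptions: Kuzu's real parser is grammar-based and is not shown in this issue, the name `parse_load_from` is hypothetical, and the handling of quotes and comma-separated options is a guess at reasonable behavior.

```python
import re
from typing import Dict, Tuple

# Matches: LOAD FROM '<path>' optionally followed by (key=value, ...).
# Illustrative only; this is not Kuzu's grammar.
_LOAD_FROM_RE = re.compile(
    r"^\s*LOAD\s+FROM\s+'(?P<path>[^']*)'\s*(?:\((?P<opts>[^)]*)\))?",
    re.IGNORECASE,
)


def parse_load_from(query: str) -> Tuple[str, Dict[str, str]]:
    """Return (file_path, options) for a LOAD FROM clause."""
    match = _LOAD_FROM_RE.match(query)
    if match is None:
        raise ValueError("Not a LOAD FROM clause")
    options: Dict[str, str] = {}
    for item in (match.group("opts") or "").split(","):
        if not item.strip():
            continue  # no options given, or a stray comma
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"Malformed option: {item.strip()!r}")
        # Assumption: option names are case-insensitive.
        options[key.strip().lower()] = value.strip().strip("'\"")
    return match.group("path"), options
```

Under these assumptions, the first example above yields the path '/tmp/student' with a single option, type, set to delta.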

TYPE can be optional: when it is not provided, Kuzu still tries to infer the file format from the path. So:

LOAD FROM '/tmp/university.csv'
RETURN name, age

still works. However, Kuzu should throw an exception when the format cannot be inferred:

LOAD FROM '/tmp/student'
RETURN name, age
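
The resolution order proposed here (explicit TYPE first, then extension-based inference, then an exception) can be sketched as below. This is a hedged Python illustration rather than the implementation: `resolve_format`, `EXTENSION_TO_FORMAT`, and `SUPPORTED_FORMATS` are hypothetical names, and the listed formats and the case-insensitive matching are assumptions.

```python
import os
from typing import Optional

# Hypothetical tables; the formats Kuzu actually supports may differ.
EXTENSION_TO_FORMAT = {".csv": "csv", ".parquet": "parquet"}
SUPPORTED_FORMATS = {"csv", "parquet", "delta", "iceberg"}


def resolve_format(path: str, type_option: Optional[str] = None) -> str:
    """Pick the file format for a scan.

    An explicit TYPE option wins; otherwise fall back to the file
    extension; raise ValueError when neither identifies a format.
    """
    if type_option is not None:
        fmt = type_option.lower()  # assumption: TYPE is case-insensitive
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported file type: {type_option!r}")
        return fmt
    _, ext = os.path.splitext(path)
    fmt = EXTENSION_TO_FORMAT.get(ext.lower())
    if fmt is None:
        raise ValueError(
            f"Cannot infer file format from {path!r}; specify the TYPE option"
        )
    return fmt
```

In this sketch the extension is never consulted once TYPE is given, so a directory such as '/tmp/student' or a file such as 'test.tsv' can be scanned as long as the caller names the format.
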
acquamarin added the feature label Dec 4, 2024
acquamarin self-assigned this Dec 4, 2024
acquamarin added the frontend label Dec 4, 2024