
Enable duckdb extension to build/run on CUDA-enabled pytorch #273

Merged: 9 commits from changhiskhan/duckdb-gpu into main on Nov 1, 2022

Conversation

changhiskhan (Contributor) commented on Oct 30, 2022:

create_pytorch('name', '/path/to/model.pth', 'cuda')
OR
create_pytorch('name', '/path/to/model.pth', 'cpu')

Inputs are automatically moved to the corresponding device.
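
For context, a minimal sketch of what the device handling amounts to in libtorch terms; the function names here are illustrative, not the PR's actual code:

#include <torch/script.h>

// Load a TorchScript model onto the requested device ("cpu" or "cuda").
torch::jit::script::Module LoadModel(const std::string& path,
                                     const std::string& device_str) {
  torch::Device device(device_str);
  auto module = torch::jit::load(path, device);
  module.eval();
  return module;
}

// Move each input to the model's device before calling forward().
torch::Tensor Predict(torch::jit::script::Module& module,
                      const torch::Tensor& input,
                      const torch::Device& device) {
  return module.forward({input.to(device)}).toTensor();
}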

changhiskhan requested a review from eddyxu on October 30, 2022 06:27
changhiskhan force-pushed the changhiskhan/duckdb-gpu branch from 1476687 to 92bff76 on October 30, 2022 06:28
integration/duckdb/CMakeLists.txt (outdated)
    list(APPEND available_contents Torch)
  endif()
endif()

FetchContent_MakeAvailable(${available_contents})

- set(CMAKE_CXX_STANDARD 20)
+ set(CMAKE_CXX_STANDARD 17)
Contributor:
why downgrade CXX standard here?

changhiskhan (Author):
nvcc doesn't support C++20

changhiskhan (Author):
so if this is set to 20, cmake complains about not supporting CUDA20

if(LANCE_BUILD_CUDA)
  FetchContent_Declare(
    Torch
    URL https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-1.13.0%2Bcu117.zip
Contributor:
is there a place to pin pytorch version? Or mention it in README or somewhere?

Btw, do we want to support more than 1 cuda version?

changhiskhan (Author):
we can pin pytorch in lance, but we can't control if they pip install something different. Since 1.13 just got released, we should be ok for a little while. We can also check the version when doing import lance and print out a warning or something?

changhiskhan (Author):
ok actually there's no good place to pin it. pytorch is not currently a dependency. i'll add a note in the README.
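
If a hard pin were ever wanted, one option is a build-time guard; a hypothetical sketch, assuming the TORCH_VERSION_* macros that libtorch ships in torch/version.h (the PR went with a README note instead):

#include <torch/version.h>

// Fail the build early if linked against an untested libtorch release.
static_assert(TORCH_VERSION_MAJOR == 1 && TORCH_VERSION_MINOR == 13,
              "lance duckdb extension is tested against libtorch 1.13");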

@@ -38,7 +38,8 @@ ModelCatalog* ModelCatalog::Get() {
}

bool ModelCatalog::Load(const std::string& name, const std::string& uri) {
-  if (models_.contains(name)) {
+  // nvcc doesn't support C++20 yet so can't use std::map::contains()
Contributor:
would this have any impact on including the lance codebase?

changhiskhan (Author):
are you thinking we're going to want to merge this with main lance codebase at some point?

Contributor:
i was thinking about whether we provide a native reader/writer ext from the duckdb extension, similar to the parquet extension.

But we can leave it to later discussion ofc.
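
For reference, the C++17-compatible membership check that the new comment in the diff alludes to; a sketch, with the map type as a stand-in for the real catalog:

#include <map>
#include <string>

std::map<std::string, std::string> models_;  // stand-in for the catalog's map

bool ContainsModel(const std::string& name) {
  // C++17 equivalent of models_.contains(name), which is C++20-only.
  return models_.find(name) != models_.end();
}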

changhiskhan changed the title from "builds on gpu" to "Enable duckdb extension to build/run on CUDA-enabled pytorch" on Oct 31, 2022
for (int i = 0; i < softmax.size(0); i++) {
  values.emplace_back(::duckdb::Value::FLOAT(*softmax[i].data_ptr<float>()));
  auto fmat = ReadImageFromDuckDBValue(args.data[1].GetValue(i));
  if(fmat.has_value()) {
Contributor:
this format seems off

changhiskhan (Author):
added a space after if and ran reformat code, no other changes were made by Gateway?

module_.eval();
auto params = module_.named_parameters();
for (const auto &item : params) {
Contributor:
what is this?

changhiskhan (Author):
copypasta

}

torch::Tensor RunInference(cv::Mat fmat);
Contributor:
should this be const cv::Mat& to avoid one copy, or do you expect to move a Mat here?

Should the interface be designed against torch::Tensor instead of cv::Mat?

changhiskhan (Author):
ah yes, good catch.

interface: this is a private member so i wasn't thinking about changing the interface. Maybe it makes sense to call this ImageToTensor or something and it'll do everything but the call to forward
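
A sketch of the refactor being discussed, assuming the Mat carries float32 HWC data; ImageToTensor is the hypothetical name suggested above, not code from the PR:

#include <opencv2/core.hpp>
#include <torch/torch.h>

// Take the image by const reference (no copy at the call boundary) and do
// everything up to, but not including, the call to forward().
torch::Tensor ImageToTensor(const cv::Mat& fmat) {
  // Wrap the OpenCV buffer without copying, then clone so the tensor owns
  // its memory, and reorder HWC -> NCHW for the model.
  auto tensor = torch::from_blob(
      fmat.data, {fmat.rows, fmat.cols, fmat.channels()}, torch::kFloat32);
  return tensor.clone().permute({2, 0, 1}).unsqueeze(0);
}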


def create_model(db, model_path, device):
    db.execute(f"CALL create_pytorch_model('resnet', '{str(model_path)}', '{device}');")
Contributor:
don't need to address in this PR, but in the long term, should the third parameter (conf / params) be a map<string, ANY>? Or use PRAGMA to customize behavior?

changhiskhan (Author):
i guess that depends on whether there's more model-specific settings.

Contributor:
a few settings in mind, e.g., batch size, num of multiprocesses, and other parameters that a DataLoader can support?
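
To make that concrete, one hypothetical shape for a richer third argument; none of these fields exist in the PR:

#include <cstdint>
#include <string>

// Options bag if the device string ever grows into a map<string, ANY>-style
// parameter, per the discussion above.
struct PytorchModelOptions {
  std::string device = "cpu";  // "cpu" or "cuda"
  int64_t batch_size = 1;      // DataLoader-style batching
  int64_t num_workers = 0;     // parallel preprocessing workers
};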

changhiskhan merged commit bb57d55 into main on Nov 1, 2022
changhiskhan deleted the changhiskhan/duckdb-gpu branch on November 1, 2022 02:55