Enable duckdb extension to build/run on CUDA-enabled pytorch #273
Conversation
  list(APPEND available_contents Torch)
  endif()
endif()

FetchContent_MakeAvailable(${available_contents})

-set(CMAKE_CXX_STANDARD 20)
+set(CMAKE_CXX_STANDARD 17)
why downgrade CXX standard here?
nvcc doesn't support C++20
So if this is set to 20, CMake complains that the CUDA toolchain doesn't support "CUDA20" (the C++20 dialect for CUDA).
if(LANCE_BUILD_CUDA)
  FetchContent_Declare(
    Torch
    URL https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-1.13.0%2Bcu117.zip
Is there a place to pin the pytorch version? Or should we mention it in the README or somewhere?
Btw, do we want to support more than one CUDA version?
We can pin pytorch in lance, but we can't control whether users pip install something different. Since 1.13 just got released, we should be OK for a little while. We could also check the version when doing import lance and print out a warning or something?
OK, actually there's no good place to pin it; pytorch is not currently a dependency. I'll add a note in the README.
@@ -38,7 +38,8 @@ ModelCatalog* ModelCatalog::Get() {
 }

 bool ModelCatalog::Load(const std::string& name, const std::string& uri) {
-  if (models_.contains(name)) {
+  // nvcc doesn't support C++20 yet so can't use std::map::contains()
Would it have any impact on including this in the lance codebase?
Are you thinking we're going to want to merge this with the main lance codebase at some point?
I was thinking we might provide a native reader/writer extension from the duckdb extension, similar to the parquet extension. But we can leave that for later discussion, of course.
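For reference, a minimal sketch of the C++17-compatible lookup implied by the diff above (the map's value type here is a placeholder, not the PR's actual type):

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for ModelCatalog's model map.
std::map<std::string, std::string> models_;

bool HasModel(const std::string& name) {
  // std::map::contains() is C++20-only; find() works in the C++17
  // mode that nvcc requires.
  return models_.find(name) != models_.end();
}
```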
for (int i = 0; i < softmax.size(0); i++) {
  values.emplace_back(::duckdb::Value::FLOAT(*softmax[i].data_ptr<float>()));
  auto fmat = ReadImageFromDuckDBValue(args.data[1].GetValue(i));
  if(fmat.has_value()) {
this format seems off
Added a space after the if and ran reformat code. No other changes were made by Gateway?
module_.eval();
auto params = module_.named_parameters();
for (const auto &item : params) {
what is this?
copypasta
}

torch::Tensor RunInference(cv::Mat fmat);
Should this be const cv::Mat& to avoid one copy, or do you expect to move a Mat here?
Also, should the interface be designed against torch::Tensor instead of cv::Mat?
Ah yes, good catch.
On the interface: this is a private member, so I wasn't thinking about changing the interface. Maybe it makes sense to call this ImageToTensor or something, and have it do everything except the call to forward.
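A hedged sketch of that refactor, assuming the image arrives as a float (CV_32F) cv::Mat; ImageToTensor's name and its preprocessing body are illustrative, not the PR's actual code:

```cpp
#include <opencv2/core.hpp>
#include <torch/script.h>

class ResnetModel {
 public:
  torch::Tensor RunInference(const cv::Mat& fmat) {  // const ref avoids a copy
    auto input = ImageToTensor(fmat);
    // forward() on a TorchScript module takes a vector of IValues.
    return module_.forward({input}).toTensor();
  }

 private:
  // Everything except the forward() call: wrap the pixel buffer and
  // convert NHWC -> NCHW. contiguous() materializes a copy, so the
  // tensor no longer aliases the Mat's buffer.
  torch::Tensor ImageToTensor(const cv::Mat& fmat) {
    auto tensor = torch::from_blob(
        fmat.data, {1, fmat.rows, fmat.cols, fmat.channels()},
        torch::kFloat32);
    return tensor.permute({0, 3, 1, 2}).contiguous();
  }

  torch::jit::script::Module module_;
};
```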
def create_model(db, model_path, device):
    db.execute(f"CALL create_pytorch_model('resnet', '{str(model_path)}', '{device}');")
Doesn't need to be addressed in this PR, but in the long term, should the third parameter (conf/params) be a map<string, ANY>?
Or should we use a PRAGMA to customize behavior?
I guess that depends on whether there are more model-specific settings.
A few settings come to mind, e.g., batch size, number of worker processes, and other parameters that a DataLoader can support?

create_pytorch('name', '/path/to/model.pth', 'cuda')
OR
create_pytorch('name', '/path/to/model.pth', 'cpu')

Inputs are automatically moved to the corresponding device.
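A minimal sketch of that device handling, assuming the third argument is parsed into a torch::Device (function names here are illustrative):

```cpp
#include <torch/script.h>

#include <string>

// "cpu" and "cuda" are both valid torch::Device strings.
torch::jit::script::Module LoadModel(const std::string& path,
                                     const std::string& device_str) {
  torch::Device device(device_str);
  auto module = torch::jit::load(path, device);  // load weights onto the device
  module.eval();
  return module;
}

torch::Tensor Predict(torch::jit::script::Module& module, torch::Tensor input,
                      const torch::Device& device) {
  // Move the input to the model's device so callers don't have to care,
  // then bring the result back to the CPU for duckdb.
  return module.forward({input.to(device)}).toTensor().to(torch::kCPU);
}
```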