activation_kernel.cu(21):error:__host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag #251

Closed
MSC19950601 opened this issue Apr 6, 2020 · 11 comments · Fixed by #305

@MSC19950601

activation_kernel.cu(21): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

What is the problem?

Originally posted by @zhuizhunew in #66 (comment)

same issue!
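
The error message names the flag nvcc is asking for, `--expt-extended-lambda`. As a rough sketch only (not the project's actual setup script), one way to get that flag onto the nvcc command line is to add it to the `nvcc` list in `extra_compile_args` of `torch.utils.cpp_extension.CUDAExtension`. The helper name `with_extended_lambda` below is hypothetical, and the starting flag list is just the `--expt-relaxed-constexpr` flag that appears on the posted nvcc command line.

```python
# Sketch: make sure nvcc receives --expt-extended-lambda when building a
# CUDA extension. Only the flag name comes from the error message; the
# helper name and the starting flag list are illustrative assumptions.

EXTENDED_LAMBDA_FLAG = "--expt-extended-lambda"


def with_extended_lambda(nvcc_flags):
    """Return a copy of nvcc_flags that contains the extended-lambda flag once."""
    flags = list(nvcc_flags)
    if EXTENDED_LAMBDA_FLAG not in flags:
        flags.append(EXTENDED_LAMBDA_FLAG)
    return flags


nvcc_args = with_extended_lambda(["--expt-relaxed-constexpr"])
print(nvcc_args)  # ['--expt-relaxed-constexpr', '--expt-extended-lambda']

# In a setup.py built on torch.utils.cpp_extension, the list would be passed as:
#   CUDAExtension(
#       name="enclib_gpu",
#       sources=[...],
#       extra_compile_args={"cxx": [...], "nvcc": nvcc_args},
#   )
```

Whether the flag belongs in a hand-written setup.py or is already handled by the repository's own build instructions depends on the version being installed, so treat this as a way to check the idea rather than a fix.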

@zhanghang1989
Owner

I am making a new version, which is in PR #256 with new setup instructions.

This new PR will be merged soon. Let me know if you still have the issue.

@MSC19950601
Author

Sorry, I still hit some errors. My environment is Ubuntu 18.04, torch 1.4.0, CUDA 10.1.
I installed torch-encoding from the GitHub source, and that install went fine. But when I install lib/gpu (enclib_gpu) manually, the build fails. Here is the log.

running install
running bdist_egg
running egg_info
writing enclib_gpu.egg-info/PKG-INFO
writing dependency_links to enclib_gpu.egg-info/dependency_links.txt
writing top-level names to enclib_gpu.egg-info/top_level.txt
reading manifest file 'enclib_gpu.egg-info/SOURCES.txt'
writing manifest file 'enclib_gpu.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'enclib_gpu' extension
gcc -pthread -B /home/kururu/anaconda3/envs/kururudev-torchdev/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/TH -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/include/python3.6m -c operator.cpp -o build/temp.linux-x86_64-3.6/operator.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=enclib_gpu -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/TH -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/include/python3.6m -c activation_kernel.cu -o build/temp.linux-x86_64-3.6/activation_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=enclib_gpu -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++11
/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(14): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(15): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(15): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(15): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(18): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(19): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(19): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(19): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(23): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(24): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(24): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(24): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/autograd/profiler.h(97): warning: attribute "visibility" does not apply here

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/autograd/profiler.h(112): warning: attribute "visibility" does not apply here

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/enum.h(179): warning: statement is unreachable

activation_kernel.cu(20): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

activation_kernel.cu(21): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

activation_kernel.cu(23): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

activation_kernel.cu(24): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->double", defined at activation_kernel.cu:20) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(100): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(214): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
activation_kernel.cu(21): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->__nv_bool", defined at activation_kernel.cu:21) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(100): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(214): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
activation_kernel.cu(21): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->double", defined at activation_kernel.cu:20) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(100): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(214): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
activation_kernel.cu(21): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->__nv_bool", defined at activation_kernel.cu:21) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(100): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(214): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
activation_kernel.cu(21): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->double", defined at activation_kernel.cu:23) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(80): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(188): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
activation_kernel.cu(24): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->__nv_bool", defined at activation_kernel.cu:24) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy<thrust::cuda_cub::tag>, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(80): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(188): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
activation_kernel.cu(24): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->double", defined at activation_kernel.cu:23) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy<thrust::cuda_cub::tag>, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(80): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(188): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
activation_kernel.cu(24): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->__nv_bool", defined at activation_kernel.cu:24) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy<thrust::cuda_cub::tag>, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(80): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(188): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]"
activation_kernel.cu(24): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->float", defined at activation_kernel.cu:20) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy<thrust::cuda_cub::tag>, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(100): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(214): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
activation_kernel.cu(21): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->__nv_bool", defined at activation_kernel.cu:21) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy<thrust::cuda_cub::tag>, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(100): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(214): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
activation_kernel.cu(21): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->float", defined at activation_kernel.cu:20) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy<thrust::cuda_cub::tag>, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(100): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(214): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
activation_kernel.cu(21): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->__nv_bool", defined at activation_kernel.cu:21) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(100): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(214): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
activation_kernel.cu(21): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->float", defined at activation_kernel.cu:23) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(80): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(188): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
activation_kernel.cu(24): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->__nv_bool", defined at activation_kernel.cu:24) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(80): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(188): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
activation_kernel.cu(24): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->float", defined at activation_kernel.cu:23) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(80): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(188): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
activation_kernel.cu(24): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]"
activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->__nv_bool", defined at activation_kernel.cu:24) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
detected during:
instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>
(926): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
(1077): here
instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::__parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
instantiation of "cudaError_t thrust::cuda_cub::__parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policythrust::cuda_cub::tag, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(80): here
instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
/usr/local/cuda/include/thrust/detail/transform.inl(188): here
instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]"
activation_kernel.cu(24): here
instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]"
activation_kernel.cu(36): here

20 errors detected in the compilation of "/tmp/tmpxft_00006a79_00000000-6_activation_kernel.cpp1.ii".
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
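
The errors above all point to the same cause: the lambdas defined around lines 21-24 of activation_kernel.cu are handed to thrust::transform_if, and nvcc accepts that only when `--expt-extended-lambda` is passed. A minimal sketch of adding the flag to the nvcc arguments in a setup.py could look like the following. The helper name `with_extended_lambda` and the example flag lists are hypothetical, and the sketch assumes the extension is built through `torch.utils.cpp_extension.CUDAExtension`, whose `extra_compile_args` accepts a dict with `cxx` and `nvcc` keys.

```python
# Sketch only: the helper name and flag lists are illustrative, not from the repo.
from typing import Dict, List, Optional

EXTENDED_LAMBDA_FLAG = "--expt-extended-lambda"


def with_extended_lambda(
    extra_compile_args: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, List[str]]:
    """Return a copy of extra_compile_args with the nvcc flag added exactly once."""
    # Copy the lists so the caller's dict is never mutated.
    args = {key: list(flags) for key, flags in (extra_compile_args or {}).items()}
    nvcc_flags = args.setdefault("nvcc", [])
    if EXTENDED_LAMBDA_FLAG not in nvcc_flags:
        nvcc_flags.append(EXTENDED_LAMBDA_FLAG)
    # Keep a 'cxx' entry so both compilers have an explicit flag list.
    args.setdefault("cxx", [])
    return args


if __name__ == "__main__":
    print(with_extended_lambda({"cxx": ["-O3"], "nvcc": ["-O3"]}))
```

In a setup.py, the returned dict would be passed as `extra_compile_args=with_extended_lambda(...)` when constructing the `CUDAExtension`; whether the repository's own build script is structured this way is not confirmed in this thread.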

@MSC19950601
Author

MSC19950601 commented Apr 20, 2020

Also, when I use only encoding.nn.SyncBatchNorm for a model test, my Python process hangs: GPU memory fills up, but there is no compute utilization.

@MSC19950601
Author

BTW, I installed ninja 1.8.2 only in my Python env, not system-wide. Does that matter?
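
One way to look into this is to check whether a `ninja` executable is visible on PATH from the same environment that runs the build. The sketch below uses only the standard library; the function names are hypothetical, and it assumes the build locates ninja through PATH, which is not confirmed in this thread.

```python
import shutil
import subprocess
from typing import Optional


def find_ninja() -> Optional[str]:
    """Return the path of the ninja executable visible on PATH, or None."""
    return shutil.which("ninja")


def ninja_version(path: str) -> Optional[str]:
    """Run `<path> --version`; return the version string, or None on failure."""
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_ninja()
    if path is None:
        print("ninja not found on PATH")
    else:
        print("ninja found at {}, version {}".format(path, ninja_version(path)))
```

If the reported path lies inside the active Python environment, a ninja installed only in that environment is at least reachable from processes started in it.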

@zhanghang1989
Owner

I am not an expert in system setup, and I haven't tried Ubuntu 18.04 or CUDA 10.1.

My setup is Ubuntu 16.04 and CUDA 10.0 with PyTorch 1.4.0.
With the same setup, you can follow the steps here:
https://hangzhang.org/PyTorch-Encoding/notes/compile.html

@MSC19950601
Author

Thank you for your patient explanation.
However, after following your installation steps, nothing changed. When I use only encoding.nn.SyncBatchNorm for a model test, my Python process still hangs: GPU memory fills up, but there is no compute utilization.

@zhanghang1989
Owner

That's weird. https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/nn/syncbn.py#L175-L176

The eval mode should use the standard BN forward.

@zhanghang1989
Owner

Are you using the most recent version of the code?

@zhanghang1989
Owner

Could you try

pip install torch-encoding --pre

which installs the most recent version
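
To confirm which version actually ended up installed, the distribution metadata can be queried from Python. This is a small sketch that assumes Python 3.8 or newer (for `importlib.metadata`); the function name `installed_version` is hypothetical, and the distribution name `torch-encoding` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("torch-encoding")
    print("torch-encoding:", found if found is not None else "not installed")
```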

@MSC19950601
Author

Thanks for your patient explanation. I am already using the most recent version.

@zhanghang1989
Owner

Is your issue related to PyCharm, like #260?
