Support set operations #11043

Merged
merged 298 commits into from
Jul 26, 2022
Changes from 250 commits
882a67a
Complete `StringKeyColumn` tests
ttnghia Jun 8, 2022
cdae2ac
Fix tests
ttnghia Jun 8, 2022
5b21d88
Fix tests
ttnghia Jun 8, 2022
3f18057
Rename function
ttnghia Jun 9, 2022
21456e7
Add `NonNullTable` tests
ttnghia Jun 9, 2022
8a17581
Add `SlicedNonNullTable` tests
ttnghia Jun 9, 2022
e05ad48
Add `InputWithNulls` tests
ttnghia Jun 9, 2022
03fb093
Change variable
ttnghia Jun 9, 2022
3c12942
Refactor
ttnghia Jun 9, 2022
6ab9673
Add `BasicList` tests
ttnghia Jun 9, 2022
37dfdcb
Add `NullableLists` tests
ttnghia Jun 9, 2022
b78cf5b
Add `ListsOfStructs` tests
ttnghia Jun 9, 2022
8de0948
Add `SlicedStructsOfLists` tests
ttnghia Jun 9, 2022
9e8c4a5
Misc
ttnghia Jun 9, 2022
7fa65ee
Add `ListsOfEmptyStructs` tests
ttnghia Jun 9, 2022
ff6e03e
Modify `EmptyDeepList` tests
ttnghia Jun 9, 2022
374545a
Add `StructsOfLists` tests
ttnghia Jun 9, 2022
9bf540a
Use `distinct` in Cython
ttnghia Jun 9, 2022
e1c3cd5
Merge branch 'branch-22.08' into refactor_stream_compaction
ttnghia Jun 9, 2022
70d3164
Fix Python style
ttnghia Jun 9, 2022
bba15c2
Revert "Fix Python style"
ttnghia Jun 9, 2022
d895f48
Revert "Use `distinct` in Cython"
ttnghia Jun 9, 2022
56e791c
Fix compiling errors due to merging
ttnghia Jun 10, 2022
fff65c1
Add doxygen group
ttnghia Jun 10, 2022
6ffc9b0
Fix doxygen
ttnghia Jun 10, 2022
dd8c845
Rewrite comment and rename variable
ttnghia Jun 10, 2022
0dcff06
Use customized cuco
ttnghia Jun 10, 2022
4d2ce5c
Cleanup
ttnghia Jun 10, 2022
c06f1b9
Merge branch 'branch-22.08' into set_operations
ttnghia Jun 10, 2022
361464f
Reimplement `set_overlap`
ttnghia Jun 10, 2022
f74c3c8
Reimplement `set_intersect`
ttnghia Jun 10, 2022
27aaa6e
Reimplement `set_difference`
ttnghia Jun 10, 2022
1a41c0d
Fix all compile errors
ttnghia Jun 11, 2022
dc2754f
Support `nan_equality` in `create_map`
ttnghia Jun 11, 2022
e7b3022
Support `nan_equality` in `check_contains`
ttnghia Jun 11, 2022
a0046f5
Drop duplicate from the results
ttnghia Jun 11, 2022
22f38c0
Support most functionalities
ttnghia Jun 12, 2022
1963a3c
Use `pair_contains`
ttnghia Jun 12, 2022
b4a5dc6
Unify function
ttnghia Jun 12, 2022
b77beb9
Add comments
ttnghia Jun 12, 2022
450f638
Reorganize code
ttnghia Jun 12, 2022
cb8119b
Fixing null mask
ttnghia Jun 12, 2022
e60c81c
Avoid inserting nulls if compare unequal
ttnghia Jun 12, 2022
79f7906
Remove added code
ttnghia Jun 12, 2022
326f8d4
Add member function interface
ttnghia Jun 13, 2022
26958d5
Fix stale comment
ttnghia Jun 13, 2022
5a22c2b
Initial implementation
ttnghia Jun 13, 2022
910e05f
Switch to use new implementation
ttnghia Jun 14, 2022
e4622b1
All test passed
ttnghia Jun 14, 2022
be85cd2
Add public and detail API
ttnghia Jun 14, 2022
f299f4f
Cleanup and add comments
ttnghia Jun 14, 2022
82dc340
Fix style
ttnghia Jun 14, 2022
15c8daf
Rename function and variables
ttnghia Jun 14, 2022
5ae3ef8
Fix a serious bug
ttnghia Jun 14, 2022
58fb9d7
Optimize null insertion
ttnghia Jun 14, 2022
933d650
Remove constructor
ttnghia Jun 14, 2022
ee77c27
Misc
ttnghia Jun 14, 2022
3599820
WIP
ttnghia Jun 14, 2022
edc7897
Fix a bug in accumulating nested columns
ttnghia Jun 14, 2022
cb28355
Fix error that makes tests fail
ttnghia Jun 15, 2022
d9c0ab9
Address review comments
ttnghia Jun 15, 2022
7770265
Remove one overload
ttnghia Jun 15, 2022
96a36c4
Fix benchmark
ttnghia Jun 16, 2022
0210228
Rename struct, and use CTAD
ttnghia Jun 16, 2022
65190cc
Add comment
ttnghia Jun 16, 2022
a4db720
Rename variable
ttnghia Jun 16, 2022
7e0315b
Remove added code
ttnghia Jun 16, 2022
126886b
Update cuco
ttnghia Jun 16, 2022
b5a7450
Reverse changes
ttnghia Jun 16, 2022
6c90c53
Add a parameter
ttnghia Jun 16, 2022
4cc2f2e
Fix compiling errors
ttnghia Jun 16, 2022
55895e7
WIP
ttnghia Jun 16, 2022
df05dc8
Misc
ttnghia Jun 16, 2022
ec48856
Merge branch 'refactor_stream_compaction' into distinct_with_nans_equ…
ttnghia Jun 16, 2022
5f7d778
WIP
ttnghia Jun 16, 2022
154645a
Rewrite doxygen
ttnghia Jun 16, 2022
8f04d50
Remove `keys` parameter from `get_distinct_indices`
ttnghia Jun 16, 2022
d806278
Rewrite doxygen
ttnghia Jun 16, 2022
3734344
Use another version of `gather`
ttnghia Jun 16, 2022
f731d35
Fix wrong doxygen
ttnghia Jun 16, 2022
e44c85d
Fix wrong doxygen again
ttnghia Jun 16, 2022
a74f71e
Misc
ttnghia Jun 16, 2022
f9de181
Update doxygen
ttnghia Jun 16, 2022
a339d83
Merge branch 'refactor_stream_compaction' into distinct_with_nans_equ…
ttnghia Jun 16, 2022
68652f4
Implementation is complete
ttnghia Jun 16, 2022
6cec1eb
Define hash_map and add todo
ttnghia Jun 16, 2022
661400a
Rename variable
ttnghia Jun 17, 2022
700e465
Fix doxygen
ttnghia Jun 17, 2022
1c783e8
Rename tests
ttnghia Jun 17, 2022
64e03f6
Merge branch 'refactor_stream_compaction' into distinct_with_nans_equ…
ttnghia Jun 17, 2022
13ad653
Add `NoNullsTableWithNans` test
ttnghia Jun 17, 2022
7811611
Add `InputWithNullsAndNaNs` tests
ttnghia Jun 17, 2022
47c5eec
Fix a bug when comparing nulls as unequal
ttnghia Jun 18, 2022
4db34db
Add `InputWithNullsUnequal` tests
ttnghia Jun 19, 2022
fab367b
Add `ListsWithNullsUnequal` tests
ttnghia Jun 19, 2022
9ec27af
Rewrite doxygen
ttnghia Jun 19, 2022
1359ee0
Rewrite doxygen for `duplicate_keep_option` and add back performance …
ttnghia Jun 19, 2022
aa0a4ed
Remove redundant docs
ttnghia Jun 19, 2022
01e03b6
Rename functor
ttnghia Jun 20, 2022
cdc3000
Modify comments
ttnghia Jun 20, 2022
45dec2a
Merge branch 'branch-22.08' into refactor_stream_compaction
ttnghia Jun 20, 2022
16ba20c
Add header
ttnghia Jun 20, 2022
37a23e4
Merge branch 'refactor_stream_compaction' into distinct_with_nans_equ…
ttnghia Jun 20, 2022
38603fc
Merge remote-tracking branch 'nghia/fix_compile_errors' into distinct…
ttnghia Jun 20, 2022
e32daf4
Add `InputWithNaNs*` tests
ttnghia Jun 20, 2022
cba4759
Merge branch 'branch-22.08' into refactor_stream_compaction
ttnghia Jun 20, 2022
7247101
Attempt to split files, not yet cleanup
ttnghia Jun 20, 2022
120377b
Cleanup
ttnghia Jun 20, 2022
68133d4
Change functor name
ttnghia Jun 20, 2022
aefdadf
Add doxygen
ttnghia Jun 20, 2022
faf6778
Reorganize code
ttnghia Jun 20, 2022
e839323
Fix headers
ttnghia Jun 20, 2022
f5646b3
Fix header
ttnghia Jun 20, 2022
538ff08
Fix `mr` usage, and rewrite some comments
ttnghia Jun 21, 2022
8201835
Reverse `join.hpp` files
ttnghia Jun 21, 2022
54d6e35
Write doxygen
ttnghia Jun 21, 2022
8149a08
Add new source file
ttnghia Jun 21, 2022
0cd9bd6
Complete implementation
ttnghia Jun 21, 2022
9254612
Cleanup headers
ttnghia Jun 21, 2022
e456b0b
Add headers
ttnghia Jun 21, 2022
bb703c6
Temporary use a cuco commit
ttnghia Jun 21, 2022
a755bea
Pass `std::shared_ptr` by value
ttnghia Jun 21, 2022
61df0ac
Rename lambda
ttnghia Jun 21, 2022
136b490
Merge branch 'branch-22.08' into refactor_semijoin
ttnghia Jun 21, 2022
9c2fb25
Draft for doxygen
ttnghia Jun 21, 2022
b8d43dc
Implement `check_compatibility`
ttnghia Jun 21, 2022
a2db48b
Using `pair_contains_if`
ttnghia Jun 21, 2022
d66a213
Update cuco
ttnghia Jun 21, 2022
adf8965
Fix null handling
ttnghia Jun 22, 2022
f0ee266
Fix doxygen and change function name
ttnghia Jun 22, 2022
0b35671
Update doxygen
ttnghia Jun 22, 2022
9320cf3
Fix nan handling
ttnghia Jun 22, 2022
29599b4
Merge branch 'branch-22.08' into refactor_semijoin
ttnghia Jun 22, 2022
db886ea
Merge branch 'refactor_stream_compaction' into distinct_with_nans_equ…
ttnghia Jun 22, 2022
1ac6501
Merge branch 'branch-22.08' into refactor_stream_compaction
ttnghia Jun 22, 2022
d0af0e6
Merge branch 'refactor_stream_compaction' into distinct_with_nans_equ…
ttnghia Jun 22, 2022
9ddcc93
Add column into benchmark
ttnghia Jun 22, 2022
f712db6
Set benchmark min time
ttnghia Jun 22, 2022
29d15d4
Don't check for nulls of the needles table
ttnghia Jun 22, 2022
a4d15d6
Use asterisk
ttnghia Jun 22, 2022
ec00f0a
Remove redundant variable
ttnghia Jun 22, 2022
be3b2fe
Merge branch 'branch-22.08' into distinct_with_nans_equality
ttnghia Jun 22, 2022
c121268
Remove redundant declaration
ttnghia Jun 22, 2022
2410c08
Change default behavior
ttnghia Jun 22, 2022
489060f
Merge branch 'branch-22.08' into set_operations
ttnghia Jun 22, 2022
7ee00ad
Rename function
ttnghia Jun 22, 2022
d3c404e
Merge branch 'distinct_with_nans_equality' into set_operations
ttnghia Jun 22, 2022
a3b2539
Fix compile errors
ttnghia Jun 22, 2022
c2c9a30
Remove temporary function
ttnghia Jun 22, 2022
fc40b55
Merge branch 'refactor_semijoin' into set_operations
ttnghia Jun 22, 2022
137d2ae
Remove all temporary functions
ttnghia Jun 22, 2022
58d36df
Rewrite `list_distinct`
ttnghia Jun 22, 2022
9a29248
Rewrite `list_overlap`
ttnghia Jun 22, 2022
84df3c5
Rewrite `set_intersect`
ttnghia Jun 22, 2022
c72b406
Rewrite `set_union`
ttnghia Jun 22, 2022
10e26b0
Rewrite all
ttnghia Jun 22, 2022
25d2635
Add detail header
ttnghia Jun 22, 2022
4fd850b
Change default value for nan comparison
ttnghia Jun 22, 2022
6436a54
Fix compile error
ttnghia Jun 22, 2022
f488969
Merge branch 'branch-22.08' into set_operations
ttnghia Jun 23, 2022
613e9ba
Update meta.yaml
ttnghia Jun 23, 2022
75e567d
Write more doxygen
ttnghia Jun 23, 2022
acf7bc9
Misc
ttnghia Jun 23, 2022
f5769ae
Rename file
ttnghia Jun 23, 2022
4354312
Add headers for test files
ttnghia Jun 23, 2022
f181ac2
Merge branch 'branch-22.08' into set_operations
ttnghia Jun 23, 2022
14d4f68
Add `TrivialTest` tests
ttnghia Jun 23, 2022
6925c02
Fix label generation
ttnghia Jun 23, 2022
541ebf9
Generate labels with nullmask
ttnghia Jun 23, 2022
da3f525
Revert "Generate labels with nullmask"
ttnghia Jun 23, 2022
6a28643
Fix validity check
ttnghia Jun 23, 2022
f901e39
Fix non-empty null lists
ttnghia Jun 23, 2022
814522d
All tests pass
ttnghia Jun 23, 2022
aecdf31
Merge branch 'branch-22.08' into set_operations
ttnghia Jun 23, 2022
aebf434
Add comments
ttnghia Jun 24, 2022
9266be4
Add doxygen
ttnghia Jun 24, 2022
4571c6d
Rewrite doxygen
ttnghia Jun 24, 2022
c0ac406
Rename function
ttnghia Jun 24, 2022
3a32498
Rename function
ttnghia Jun 24, 2022
f3a4e84
Add `utilities.*` files
ttnghia Jun 24, 2022
2b7386c
Extract `distinct`
ttnghia Jun 24, 2022
c8f3da0
Add test file for `cudf::lists::distinct`
ttnghia Jun 24, 2022
1d7e8e0
Add new implementation and test files
ttnghia Jun 24, 2022
51b80db
Fix compile error
ttnghia Jun 24, 2022
08a76ad
Rename function
ttnghia Jun 27, 2022
16101f7
Implement `cudf::detail::stable_distinct` and `lists::distinct`
ttnghia Jun 27, 2022
5ec13d6
Rewrite doxygen
ttnghia Jun 27, 2022
6c5b738
Rename variable
ttnghia Jun 27, 2022
5b70eee
Rewrite comment
ttnghia Jun 27, 2022
238248d
Rename files
ttnghia Jun 27, 2022
ba6bf6b
Implement float tests
ttnghia Jun 27, 2022
3845c95
Implement string tests
ttnghia Jun 27, 2022
507c82d
Implement tests for `ListDistinctTypedTest`
ttnghia Jun 28, 2022
2cb8347
Complete the remaining tests
ttnghia Jun 28, 2022
7efdea0
Merge branch 'branch-22.08' into add_lists_distinct
ttnghia Jun 28, 2022
4388637
Rewrite doxygen
ttnghia Jun 28, 2022
277c110
Merge branch 'add_lists_distinct' into set_operations
ttnghia Jun 28, 2022
1992561
Rewrite all
ttnghia Jun 28, 2022
818c85f
Fix compatibility check
ttnghia Jun 28, 2022
66e28ca
Misc
ttnghia Jun 28, 2022
279f04e
Remove files
ttnghia Jun 28, 2022
ee334c1
Implement floating point tests
ttnghia Jun 28, 2022
434d35c
Fix a bug
ttnghia Jun 28, 2022
99d526b
Add string tests
ttnghia Jun 28, 2022
66fce07
Implement typed tests
ttnghia Jun 28, 2022
8e2ff3d
Implement nested structs tests
ttnghia Jun 28, 2022
d4b7d6c
Cleanup
ttnghia Jun 28, 2022
669aa9e
Implement floating point tests and string tests
ttnghia Jun 29, 2022
f42a976
Implement typed tests
ttnghia Jun 29, 2022
e63b27d
Misc
ttnghia Jun 29, 2022
f0d839a
Implement nested structs tests
ttnghia Jun 29, 2022
21654db
Add blank lines
ttnghia Jun 29, 2022
ae36981
Implement `set_intersect_tests`
ttnghia Jun 29, 2022
b087720
Misc
ttnghia Jun 29, 2022
7a07e2c
Implement `set_union_tests`
ttnghia Jun 29, 2022
0f8f8e2
Remove files
ttnghia Jun 29, 2022
38b50bc
Rename files
ttnghia Jun 29, 2022
55fc6e5
Update default stream
ttnghia Jun 30, 2022
fa286a6
Add identity tests
ttnghia Jun 30, 2022
5764ae0
Merge branch 'branch-22.08' into set_operations
ttnghia Jun 30, 2022
0bb5798
Change default behavior for `list_overlap`
ttnghia Jul 5, 2022
2cc2220
Fix doxygen
ttnghia Jul 5, 2022
e93ae8a
Merge branch 'branch-22.08' into set_operations
ttnghia Jul 14, 2022
1b2251f
Reverse changes from merge conflict
ttnghia Jul 14, 2022
8c1f4a3
Fix merge conflict
ttnghia Jul 14, 2022
d75bb2c
Rename `list_overlap` into `have_overlap`
ttnghia Jul 14, 2022
e431fbb
Rewrite doxygen
ttnghia Jul 14, 2022
6a6b2b1
Misc
ttnghia Jul 14, 2022
b593f07
Rename functions
ttnghia Jul 18, 2022
2411173
Change headers
ttnghia Jul 18, 2022
a0018f0
Rewrite doxygen
ttnghia Jul 18, 2022
73de4fc
Fix typo
ttnghia Jul 18, 2022
5554f1f
Remove template disambiguator
ttnghia Jul 19, 2022
65e39ce
Store number of input lists into variable
ttnghia Jul 19, 2022
30a334c
Merge branch 'branch-22.08' into set_operations
ttnghia Jul 19, 2022
96ec9ec
Add `mr` parameter with default value
ttnghia Jul 21, 2022
3dd926f
Merge branch 'branch-22.08' into set_operations
ttnghia Jul 21, 2022
4e57a02
Rewrite doxygen
ttnghia Jul 22, 2022
81eda61
Add comments
ttnghia Jul 22, 2022
e3696a2
Rename test functions
ttnghia Jul 22, 2022
7e3ac0f
Merge branch 'branch-22.08' into set_operations
ttnghia Jul 22, 2022
5b0b6de
Change default `null_equality` value for `have_overlap`
ttnghia Jul 22, 2022
ecc1734
Merge branch 'branch-22.08' into set_operations
ttnghia Jul 22, 2022
17d8f95
Fix doxygen
ttnghia Jul 26, 2022
8565755
Fix header
ttnghia Jul 26, 2022
a6e1dd6
Change variable name
ttnghia Jul 26, 2022
c03dccd
Merge branch 'branch-22.08' into set_operations
ttnghia Jul 26, 2022
4d1ebb0
Add headers
ttnghia Jul 26, 2022
016edbc
Merge branch 'branch-22.08' into set_operations
ttnghia Jul 26, 2022
1349b83
Update cpp/src/lists/set_operations.cu
PointKernel Jul 26, 2022
2 changes: 2 additions & 0 deletions conda/recipes/libcudf/meta.yaml
@@ -169,6 +169,7 @@ outputs:
- test -f $PREFIX/include/cudf/lists/detail/extract.hpp
- test -f $PREFIX/include/cudf/lists/detail/interleave_columns.hpp
- test -f $PREFIX/include/cudf/lists/detail/scatter_helper.cuh
- test -f $PREFIX/include/cudf/lists/detail/set_operations.hpp
- test -f $PREFIX/include/cudf/lists/detail/sorting.hpp
- test -f $PREFIX/include/cudf/lists/detail/stream_compaction.hpp
- test -f $PREFIX/include/cudf/lists/explode.hpp
@@ -178,6 +179,7 @@ outputs:
- test -f $PREFIX/include/cudf/lists/list_view.hpp
- test -f $PREFIX/include/cudf/lists/lists_column_factories.hpp
- test -f $PREFIX/include/cudf/lists/lists_column_view.hpp
- test -f $PREFIX/include/cudf/lists/set_operations.hpp
- test -f $PREFIX/include/cudf/lists/sorting.hpp
- test -f $PREFIX/include/cudf/lists/stream_compaction.hpp
- test -f $PREFIX/include/cudf/merge.hpp
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -385,6 +385,7 @@ add_library(
src/lists/lists_column_view.cu
src/lists/segmented_sort.cu
src/lists/sequences.cu
src/lists/set_operations.cu
src/lists/stream_compaction/apply_boolean_mask.cu
src/lists/stream_compaction/distinct.cu
src/lists/utilities.cu
81 changes: 81 additions & 0 deletions cpp/include/cudf/lists/detail/set_operations.hpp
@@ -0,0 +1,81 @@
/*
* Copyright (c) 2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cudf/column/column.hpp>
#include <cudf/lists/lists_column_view.hpp>
#include <cudf/types.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>

namespace cudf::lists::detail {

/**
* @copydoc cudf::lists::have_overlap
*
* @param stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<column> have_overlap(
lists_column_view const& lhs,
lists_column_view const& rhs,
null_equality nulls_equal,
nan_equality nans_equal,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @copydoc cudf::lists::intersect_distinct
*
* @param stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<column> intersect_distinct(
lists_column_view const& lhs,
lists_column_view const& rhs,
null_equality nulls_equal,
nan_equality nans_equal,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @copydoc cudf::lists::union_distinct
*
* @param stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<column> union_distinct(
lists_column_view const& lhs,
lists_column_view const& rhs,
null_equality nulls_equal,
nan_equality nans_equal,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @copydoc cudf::lists::difference_distinct
*
* @param stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<column> difference_distinct(
lists_column_view const& lhs,
lists_column_view const& rhs,
null_equality nulls_equal,
nan_equality nans_equal,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

} // namespace cudf::lists::detail
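
Each declaration above takes a `nans_equal` flag. As a rough host-side illustration of what such a flag changes, the sketch below compares floating-point elements under two policies and drops duplicates accordingly. The names `nan_policy`, `elements_equal`, and `distinct` are hypothetical stand-ins written for this example; this is not the libcudf implementation, which runs on the device.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for a NaN equality flag; not the libcudf type.
enum class nan_policy { ALL_EQUAL, UNEQUAL };

// Compare two floating-point elements under the chosen NaN policy.
inline bool elements_equal(double a, double b, nan_policy policy)
{
  if (policy == nan_policy::ALL_EQUAL && std::isnan(a) && std::isnan(b)) { return true; }
  return a == b;  // under IEEE 754, NaN == NaN is false
}

// Keep the first occurrence of each distinct element (quadratic, for illustration only).
inline std::vector<double> distinct(std::vector<double> const& in, nan_policy policy)
{
  std::vector<double> out;
  for (double v : in) {
    bool seen = false;
    for (double u : out) {
      if (elements_equal(u, v, policy)) {
        seen = true;
        break;
      }
    }
    if (!seen) { out.push_back(v); }
  }
  return out;
}
```

With `ALL_EQUAL`, all NaNs in a list collapse into one element; with `UNEQUAL`, every NaN is kept because no NaN ever matches another.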
171 changes: 171 additions & 0 deletions cpp/include/cudf/lists/set_operations.hpp
@@ -0,0 +1,171 @@
/*
* Copyright (c) 2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cudf/column/column.hpp>
#include <cudf/lists/lists_column_view.hpp>
#include <cudf/types.hpp>

#include <rmm/mr/device/device_memory_resource.hpp>

namespace cudf::lists {
/**
* @addtogroup set_operations
* @{
* @file
*/

/**
* @brief Check if lists at each row of the given lists columns overlap.
*
* Given two input lists columns, each list row in one column is checked for elements in common
* with the corresponding row of the other column.
*
* A null input row in any of the input lists columns will result in a null output row.
*
* @throw cudf::logic_error if the input lists columns have different sizes.
* @throw cudf::logic_error if children of the input lists columns have different data types.
*
* Example:
* @code{.pseudo}
* lhs = { {0, 1, 2}, {1, 2, 3}, null, {4, null, 5} }
* rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} }
* result = { true, false, null, true }
* @endcode
*
* @param lhs The input lists column for one side
* @param rhs The input lists column for the other side
* @param nulls_equal Flag to specify whether null elements should be considered as equal; defaults
* to `EQUAL`, which means null elements are also checked for overlap
* @param nans_equal Flag to specify whether floating-point NaNs should be considered as equal
* @param mr Device memory resource used to allocate the returned object
* @return A column of type BOOL8 containing the check results
*/
std::unique_ptr<column> have_overlap(
lists_column_view const& lhs,
lists_column_view const& rhs,
null_equality nulls_equal = null_equality::EQUAL,
nan_equality nans_equal = nan_equality::ALL_EQUAL,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Create a lists column of distinct elements common to two input lists columns.
*
* Given two input lists columns `lhs` and `rhs`, the output lists column is built such that each
* row `i` contains the distinct elements found in both `lhs[i]` and `rhs[i]`.
*
* The order of distinct elements in the output rows is unspecified.
*
* A null input row in any of the input lists columns will result in a null output row.
*
* @throw cudf::logic_error if the input lists columns have different sizes.
* @throw cudf::logic_error if children of the input lists columns have different data types.
*
* Example:
* @code{.pseudo}
* lhs = { {2, 1, 2}, {1, 2, 3}, null, {4, null, 5} }
* rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} }
* result = { {1, 2}, {}, null, {null} }
* @endcode
*
* @param lhs The input lists column for one side
* @param rhs The input lists column for the other side
* @param nulls_equal Flag to specify whether null elements should be considered as equal
* @param nans_equal Flag to specify whether floating-point NaNs should be considered as equal
* @param mr Device memory resource used to allocate the returned object
* @return A lists column containing the intersection results
*/
std::unique_ptr<column> intersect_distinct(
lists_column_view const& lhs,
lists_column_view const& rhs,
null_equality nulls_equal = null_equality::EQUAL,
nan_equality nans_equal = nan_equality::ALL_EQUAL,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Create a lists column of distinct elements found in either of two input lists columns.
*
* Given two input lists columns `lhs` and `rhs`, the output lists column is built such that each
* row `i` contains the distinct elements found in either `lhs[i]` or `rhs[i]`.
*
* The order of distinct elements in the output rows is unspecified.
*
* A null input row in any of the input lists columns will result in a null output row.
*
* @throw cudf::logic_error if the input lists columns have different sizes.
* @throw cudf::logic_error if children of the input lists columns have different data types.
*
* Example:
* @code{.pseudo}
* lhs = { {2, 1, 2}, {1, 2, 3}, null, {4, null, 5} }
* rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} }
* result = { {1, 2, 3}, {1, 2, 3, 4, 5}, null, {4, null, 5} }
* @endcode
*
* @param lhs The input lists column for one side
* @param rhs The input lists column for the other side
* @param nulls_equal Flag to specify whether null elements should be considered as equal
* @param nans_equal Flag to specify whether floating-point NaNs should be considered as equal
* @param mr Device memory resource used to allocate the returned object
* @return A lists column containing the union results
*/
std::unique_ptr<column> union_distinct(
lists_column_view const& lhs,
lists_column_view const& rhs,
null_equality nulls_equal = null_equality::EQUAL,
nan_equality nans_equal = nan_equality::ALL_EQUAL,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Create a lists column of distinct elements found only in the left input column.
*
* Given two input lists columns `lhs` and `rhs`, the output lists column is built such that each
* row `i` contains the distinct elements found in `lhs[i]` but not in `rhs[i]`.
*
* The order of distinct elements in the output rows is unspecified.
*
* A null input row in any of the input lists columns will result in a null output row.
*
* @throw cudf::logic_error if the input lists columns have different sizes.
* @throw cudf::logic_error if children of the input lists columns have different data types.
*
* Example:
* @code{.pseudo}
* lhs = { {2, 1, 2}, {1, 2, 3}, null, {4, null, 5} }
* rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} }
* result = { {}, {1, 2, 3}, null, {4, 5} }
* @endcode
*
* @param lhs The input lists column of elements that may be included
* @param rhs The input lists column of elements to exclude
* @param nulls_equal Flag to specify whether null elements should be considered as equal
* @param nans_equal Flag to specify whether floating-point NaNs should be considered as equal
* @param mr Device memory resource used to allocate the returned object
* @return A lists column containing the difference results
*/
std::unique_ptr<column> difference_distinct(
lists_column_view const& lhs,
lists_column_view const& rhs,
null_equality nulls_equal = null_equality::EQUAL,
nan_equality nans_equal = nan_equality::ALL_EQUAL,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */ // end of group
} // namespace cudf::lists
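
The row-wise semantics documented in this header can be modelled on the host for a single pair of rows. The sketch below is a reference model only, assuming `std::optional` as a stand-in for nulls and a plain `bool` in place of `null_equality`; the aliases `element`, `list`, `row` and the helpers `same`, `contains`, `push_distinct` are hypothetical names introduced for this example. It keeps first-occurrence order for readability, whereas the header states that the output order is unspecified.

```cpp
#include <optional>
#include <vector>

using element = std::optional<int>;   // std::nullopt models a null element
using list    = std::vector<element>;
using row     = std::optional<list>;  // std::nullopt models a null list row

// Element equality: two nulls match only when nulls_equal is true.
inline bool same(element const& a, element const& b, bool nulls_equal)
{
  if (!a.has_value() || !b.has_value()) {
    return nulls_equal && !a.has_value() && !b.has_value();
  }
  return *a == *b;
}

inline bool contains(list const& l, element const& e, bool nulls_equal)
{
  for (auto const& x : l) {
    if (same(x, e, nulls_equal)) { return true; }
  }
  return false;
}

// Append `e` to `out` unless an equal element is already present.
inline void push_distinct(list& out, element const& e, bool nulls_equal)
{
  if (!contains(out, e, nulls_equal)) { out.push_back(e); }
}

// A null input row on either side yields a null result in every operation below.
inline std::optional<bool> have_overlap(row const& lhs, row const& rhs, bool nulls_equal = true)
{
  if (!lhs || !rhs) { return std::nullopt; }
  for (auto const& e : *lhs) {
    if (contains(*rhs, e, nulls_equal)) { return true; }
  }
  return false;
}

inline row intersect_distinct(row const& lhs, row const& rhs, bool nulls_equal = true)
{
  if (!lhs || !rhs) { return std::nullopt; }
  list out;
  for (auto const& e : *lhs) {
    if (contains(*rhs, e, nulls_equal)) { push_distinct(out, e, nulls_equal); }
  }
  return out;
}

inline row union_distinct(row const& lhs, row const& rhs, bool nulls_equal = true)
{
  if (!lhs || !rhs) { return std::nullopt; }
  list out;
  for (auto const& e : *lhs) { push_distinct(out, e, nulls_equal); }
  for (auto const& e : *rhs) { push_distinct(out, e, nulls_equal); }
  return out;
}

inline row difference_distinct(row const& lhs, row const& rhs, bool nulls_equal = true)
{
  if (!lhs || !rhs) { return std::nullopt; }
  list out;
  for (auto const& e : *lhs) {
    if (!contains(*rhs, e, nulls_equal)) { push_distinct(out, e, nulls_equal); }
  }
  return out;
}
```

Running the model on the last row of the doxygen examples, `{4, null, 5}` against `{null, null}`, gives an intersection of `{null}` and a difference of `{4, 5}` when nulls compare equal, which agrees with the documented results.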
1 change: 1 addition & 0 deletions cpp/include/doxygen_groups.h
@@ -152,6 +152,7 @@
* @defgroup lists_elements Counting
* @defgroup lists_drop_duplicates Filtering
* @defgroup lists_sort Sorting
* @defgroup set_operations Set Operations
* @}
* @defgroup nvtext_apis NVText
* @{