-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update C API graph creation function signatures #3982
Update C API graph creation function signatures #3982
Conversation
* @param [in] vertices Optional device array containing a list of vertex ids | ||
* (specify NULL if we should create vertex ids from the | ||
* unique contents of @p src and @p dst) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does NULL vertices behave with and w/o renumbering? What if edge data only contain unique ids of e.g. 1, 3, and 5? Are 0, 2, and 4 implied if renumber is False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If vertices
is NULL, this will provide the same behavior as today. That is:
- Renumber enabled will ignore vertices 0, 2, 4... they will not exist in the graph. We will create a graph with 3 vertices which will be internally numbered 0, 1, 2. Any function that is called will translate any vertex id inputs to the mapping 0, 1, 2 (e.g. if you call BFS and specify a seed of 1, it will be mapped to whichever id that vertex 1 was mapped to), and any vertex ids outputs will be renumbered from the internal mapping (0, 1, 2) to the external mapping (1, 3, 5)
- Renumber disabled will use the vertex ids exactly as specified, so vertices 0, 2 and 4 will be isolated vertices in the graph. However, if your graph also had a vertex 6... that vertex would be lost even with renumber disabled, since there is no mechanism to discover that vertex 6 exists in your graph with the current (and proposed API).
cpp/include/cugraph_c/graph.h
Outdated
* integer values from 0 to num_vertices. | ||
* @param [in] do_expensive_check If true, do expensive checks to validate the input data | ||
* is consistent with software assumptions. If false bypass these checks. | ||
* @param [in] properties Properties of the graph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
properties is defined twice for this function (and also twice for the function below).
* @param [in] edge_type_ids Device array containing the edge types for each edge. Optional | ||
argument that can be NULL if edge types are not used. | ||
* @param [in] store_transposed If true create the graph initially in transposed format | ||
* @param [in] renumber If true, renumber vertices to make an efficient data structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does renumber=True
work for CSR? Wouldn't offsets
already imply 0..N-1 vertex ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renumber=True
would still work for CSR input. Yes, offsets
does imply vertex ids in the range [0, N)
. But our renumbering logic adds some performance optimizations by putting the vertex ids in an order that makes the CUDA kernels more efficient. So there is value in specifying renumber=True
even if the vertex ids are densely packed.
I will be fixing the C API implementation of this function so that it will properly reflect the isolated vertices that can be implied in CSR format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should update the documentation to better reflect that renumbering is more than just packing integers and highly recommended in most practical use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated documentation for each of the renumber
parameters in this file in the next push.
cpp/include/cugraph_c/graph.h
Outdated
* @param [in] store_transposed If true create the graph initially in transposed format | ||
* @param [in] renumber If true, renumber vertices to make an efficient data structure. | ||
* If false, do not renumber. Renumbering is required if the vertices are not sequential | ||
* integer values from 0 to num_vertices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this documentation is a bit misleading. In cuGraph, renumbering is more than just mapping integers to consecutive integers starting from 0. cuGraph renumbers vertex IDs in a specific way for various optimizations. This documentation may be interpreted that renumbering is unnecessary if vertex IDs are consecutive integers starting from 0. AFAIK, renumber = false's only remaining use case is for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated documentation for each of the renumber
parameters in this file in the next push.
* @param [in] edge_type_ids Device array containing the edge types for each edge. Optional | ||
argument that can be NULL if edge types are not used. | ||
* @param [in] store_transposed If true create the graph initially in transposed format | ||
* @param [in] renumber If true, renumber vertices to make an efficient data structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should update the documentation to better reflect that renumbering is more than just packing integers and highly recommended in most practical use cases.
cpp/include/cugraph_c/graph.h
Outdated
* | ||
* @param [in] graph A pointer to the graph object to destroy | ||
*/ | ||
void cugraph_graph_free(cugraph_graph_t* graph); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this function works for both SG&MG graphs, right? Then, should we place this function after SG & MG graph creation functions? I was searching for a new graph deletion function for MG (replacing the deprecated function) and it took sometime to find that this function works for both SG & MG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved all of the free functions so that they are together in the file.
@@ -96,49 +135,27 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { | |||
edge_type_id_t>> | |||
new_edge_types{std::nullopt}; | |||
|
|||
rmm::device_uvector<vertex_t> edgelist_srcs(src_->size_, handle_.get_stream()); | |||
rmm::device_uvector<vertex_t> edgelist_dsts(dst_->size_, handle_.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something outside the scope of this PR, but are we planning to shuffle edges in the C-API in the future? If we perform shuffle in the C-API, should we still need to take multiple arrays from the python layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do shuffle the edges in the C API currently.
Perhaps @VibhuJawa can offer some insight into the need for multiple arrays, but I believe this is an artifact of dask and how the arrays are distributed on the workers. It is possible for there to be multiple device arrays on the same worker. This new mechanism allows the concatenation of these multiple arrays to occur in the C API rather than in dask/cudf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do it in dask-cudf
too once we remove the num_edges
signature (which this PR does).
cugraph/python/cugraph/cugraph/structure/graph_implementation/simpleDistributedGraph.py
Line 134 in 8c104a5
num_edges=num_edges, |
If we don't have to get num_edges
, we can just do this in cuDF/cuPY
one array at a time and the resulting overhead will be 2x(array_memory)
instead of 2x(dataframe_memory)
.
cpp/src/c_api/graph_mg.cpp
Outdated
|
||
// FIXME: Need to handle the case where a GPU gets a NULL pointer but other GPUs | ||
// Get values. Override vertex_type/weight_type/edge_id_type but don't | ||
// fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should handle this instead of leaving this as FIXME. This indeed happened in Louvain in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will address this as an update to this PR. I wanted to get the PR up for review rather than waiting for a little more validation.
… for graph creation (SG and MG)
cpp/src/c_api/graph_mg.cpp
Outdated
CAPI_EXPECTS( | ||
std::count_if(vertex_types.begin(), | ||
vertex_types.end(), | ||
[vertex_type](auto t) { return vertex_type != static_cast<int>(t); }) == 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor point, but is std::all_of
more specific for this? (https://en.cppreference.com/w/cpp/algorithm/all_any_none_of). I guess all_of may better document the intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address when I get to the FIXME, which might change this logic anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push.
cpp/src/c_api/graph_mg.cpp
Outdated
[weight_type](auto t) { return weight_type != static_cast<int>(t); }) == 0, | ||
CUGRAPH_INVALID_INPUT, | ||
"different weight type used on different GPUs", | ||
*error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor point, but is std::all_of
more specific for this? (https://en.cppreference.com/w/cpp/algorithm/all_any_none_of). I guess all_of may better document the intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
cpp/src/c_api/graph_sg.cpp
Outdated
if (drop_multi_edges_) { | ||
std::cout << "calling drop_multi_edges" << std::endl; | ||
raft::print_device_vector( | ||
" edgelist_srcs", edgelist_srcs.data(), edgelist_srcs.size(), std::cout); | ||
raft::print_device_vector( | ||
" edgelist_dsts", edgelist_dsts.data(), edgelist_dsts.size(), std::cout); | ||
|
||
std::tie( | ||
edgelist_srcs, edgelist_dsts, edgelist_weights, edgelist_edge_ids, edgelist_edge_types) = | ||
cugraph::sort_and_remove_multi_edges(handle_, | ||
std::move(edgelist_srcs), | ||
std::move(edgelist_dsts), | ||
std::move(edgelist_weights), | ||
std::move(edgelist_edge_ids), | ||
std::move(edgelist_edge_types)); | ||
|
||
std::cout << "after drop_multi_edges" << std::endl; | ||
raft::print_device_vector( | ||
" edgelist_srcs", edgelist_srcs.data(), edgelist_srcs.size(), std::cout); | ||
raft::print_device_vector( | ||
" edgelist_dsts", edgelist_dsts.data(), edgelist_dsts.size(), std::cout); | ||
} | ||
|
||
if (drop_self_loops_) { | ||
std::tie( | ||
edgelist_srcs, edgelist_dsts, edgelist_weights, edgelist_edge_ids, edgelist_edge_types) = | ||
cugraph::remove_self_loops(handle_, | ||
std::move(edgelist_srcs), | ||
std::move(edgelist_dsts), | ||
std::move(edgelist_weights), | ||
std::move(edgelist_edge_ids), | ||
std::move(edgelist_edge_types)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for nitpicking but if drop_self_loops
appear before drop_multi_edges
in the input parameters, should we better perform drop_self_loops() before drop_multi_edges()? This consistency may make the code easier to read and slightly easier to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address in next push
{ | ||
auto edge_first = | ||
thrust::make_zip_iterator(thrust::make_tuple(edgelist_srcs.begin(), edgelist_dsts.begin())); | ||
thrust::sort(handle.get_thrust_policy(), edge_first, edge_first + edgelist_srcs.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will require 2x the size of edge list and will increase the peak memory requirement. Our goal is to keep the peak memory footprint in graph creation to 1.5x edge list + per vertex data.
See the code in
We can call mem_frugal_partitoin (which requires 0.5x the input array) and process the resulting two partitions one by one if the input array size is larger than the certain threshold.
thrust::remove_if(handle.get_thrust_policy(), | ||
edge_first, | ||
edge_first + edgelist_srcs.size(), | ||
[] __device__(auto e) { return thrust::get<0>(e) == thrust::get<1>(e); })), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am 70% sure thrust::remove_if requires 2x the input array size memory. If this becomes a memory bottleneck (e.g. the input edge list size is large compared to the device memory size, see https://github.com/rapidsai/cugraph/blob/branch-23.12/cpp/src/structure/create_graph_from_edgelist_impl.cuh#L523C23-L523C23), we may create a bitmap storing whether to remove an edge or not. And based on this bitmap, we can call remove_if on srcs, dsts, weights, edge_ids, and edge_types, separately. Then, we can keep the peak memory requirement under our goal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would also eliminate the need for the ugliness you mention below. I'll refactor.
void remove_self_loops(raft::handle_t const& handle, | ||
rmm::device_uvector<vertex_t>& edgelist_srcs /* [INOUT] */, | ||
rmm::device_uvector<vertex_t>& edgelist_dsts /* [INOUT] */, | ||
rmm::device_uvector<A>& edgelist_a /* [INOUT] */) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we need three separate functions that takes 1, 2, or 3 additional variables in addition to edgelist_srcs & edgelist_dsts.
Then, we can set
edge_first = thrust::make_zip_iterator(edgelist_srcs.begin(), edgelist_dsts.begin(), get_dataframe_buffer_begin());
And the remaining code will be same. Same for the sort_and_remove_multi_edges function.
This will be more important if we bring additional code to limit peak memory footprint.
…memory footprint when graphs are large relative to GPU memory
// FIXME: This should probably just be cugraph_graph_free | ||
// but didn't want to confuse with original cugraph_free_graph | ||
void cugraph_sg_graph_free(cugraph_graph_t* graph); | ||
cugraph_error_code_t cugraph_graph_create_sg_from_csr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since cugraph_graph_create_sg
supports flags to drop both self loops and multi edges through the methods cugraph::remove_self_loops
and cugraph::sort_and_remove_multi_edges
, shouldn't cugraph_graph_create_sg_from_csr
also support those flags? In fact, cugraph_graph_create_sg_from_csr
extracts the COO
from the CSR
before calling cugraph::create_graph_from_edgelist
therefore, we can drop self loops and multi edges (if set to True) right before the graph question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add that feature. I was assuming that if you already have a CSR as input then you have already done whatever ETL steps are desired.
cpp/src/c_api/graph_mg.cpp
Outdated
"Invalid input arguments: all vertex types must match", | ||
*error); | ||
|
||
CAPI_EXPECTS(p_weights[i]->type_ == weight_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot to check if weights
was not a nullptr
otherwise we get a segfault
. I changed it locally to this
CAPI_EXPECTS((weights == nullptr) || (p_weights[i]->type_ == weight_type),
CUGRAPH_INVALID_INPUT,
"Invalid input arguments: all weight types must match",
*error);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
std::optional<rmm::device_uvector<weight_t>>, | ||
std::optional<rmm::device_uvector<edge_t>>, | ||
std::optional<rmm::device_uvector<edge_type_t>>> | ||
sort_and_remove_multi_edges(raft::handle_t const& handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized this but now it seems like "sort" here is a misnomer. We are not really globally sorting anymore, so I think we'd better rename this function to remove_multi_edges
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And may better document that this function is not stable (which means this does not preserve the order in contrast to std::remove or thrust::remove).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types); | ||
|
||
/** | ||
* @brief Sort the edges and remove all but one edge when a multi-edge exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is no longer correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
T const* array_; | ||
|
||
T operator() __host__ __device__(size_t index) { return array_[index]; } | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use
template <typename index_t, typename Iterator>
struct indirection_t {
Iterator first{};
__device__ typename thrust::iterator_traits<Iterator>::value_type operator()(index_t i) const
{
return *(first + i);
}
};
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
thrust::fill(handle.get_thrust_policy(), | ||
remove_flags.begin(), | ||
remove_flags.end(), | ||
cugraph::packed_bool_empty_mask()); | ||
|
||
size_t remove_count = thrust::count_if( | ||
handle.get_thrust_policy(), | ||
thrust::make_counting_iterator(size_t{0}), | ||
thrust::make_counting_iterator(src.size()), | ||
[comparison, d_remove_flags = remove_flags.data()] __device__(size_t i) { | ||
if (comparison(i)) { | ||
atomicOr(d_remove_flags + cugraph::packed_bool_offset(i), cugraph::packed_bool_mask(i)); | ||
return true; | ||
} | ||
return false; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we better call this remove_flags.size()
? We can avoid atomics and no need to call thrust::fill.
Something like
thrust::tabulate(
handle.get_thrust_policy(),
remove_flags.begin(),
remove_flags.end(),
[edge_first, comparison]__device__(auto i) {
auto word = packed_bool_empty_mask();
...
return word;
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can rework this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
template <typename T> | ||
rmm::device_uvector<T> remove_flagged_elements(raft::handle_t const& handle, | ||
rmm::device_uvector<T>&& vector, | ||
rmm::device_uvector<uint32_t> const& remove_flags, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use raft::device_span for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better remove sort_
from the file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better remove sort_
from the file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
// Tuning parameter to address high frequency multi-edges | ||
size_t num_hash_buckets{16}; | ||
|
||
auto hash_counts = compute_hash_sizes(pair_first, | ||
pair_first + edgelist_srcs.size(), | ||
num_hash_buckets, | ||
hash_src_dst_pair<vertex_t>{}, | ||
handle.get_stream()); | ||
|
||
auto pivot = | ||
static_cast<int32_t>(thrust::distance(hash_counts.begin(), | ||
thrust::lower_bound(handle.get_thrust_policy(), | ||
hash_counts.begin(), | ||
hash_counts.end(), | ||
edgelist_srcs.size() / 2))); | ||
|
||
auto second_first = detail::mem_frugal_partition(pair_first, | ||
pair_first + edgelist_srcs.size(), | ||
hash_src_dst_pair<vertex_t>{}, | ||
pivot, | ||
handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use https://github.com/rapidsai/cugraph/blob/branch-23.12/cpp/include/cugraph/utilities/shuffle_comm.cuh#L782 with num_groups = 2. Then you just need to provide a key_to_group_id
functor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern with setting num_groups = 2 was that we might get a pretty bad skew in the multi-edges. Consider an rmat graph. We're going to get certain edges that occur with great frequency, and other edges that occur with lesser frequency. I thought that by first grouping by some larger number and then finding the median break point we'll be closer to a 50/50 split at the pivot.
I certainly could use groupby_and_count
instead of recreating it here (my compute_hash_sizes
function is essentially the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean there can be millions (or billions) of edges between a unique (src, dst) pair? Yeah... this might be possible if there are two really big institutes and there are a huge number of transactions between the two institutes.
But I don't think this will happen in synthetic R-mat graphs (we are not partitioning based on just src or dst but src, dst pair, so high degree vertices won't have much impact on partitioning quality).
Yeah... and calling with num_groups = 2 vs some large number can be a compute vs solution robustness trade-offs. num_goups = 2 will be cheaper than than num_groups = 128 (2^1 vs 2^7). And in the theoretical worst case (i.e. all the edges are between the same src, dst pair), we will need 2x memory of the edge list vs 1.5x. And if there are very high multiplicity edge pairs, we may be sometimes able to combine those edges and create a weighted graph. Just a thought material. Not sure what's the right trade-off point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked in detail. My recollection of our RMAT generator is that we will get a higher multiplicity of edge pairs in certain quadrants of the graph. So there will be many more instances of (0,1), (0,2), (0,3)
created than vertex pairs with high vertex id values. I agree that after we scramble, the partitioning will take care of much of this. But we're still likely to end up with vertex pairs on each GPU that have high multiplicity and vertex pairs that have low multiplicity.
I think I will leave this code in there to support this issue, but I'll default the number of groups to 2 and just leave a FIXME for us to investigate it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
size_t num_hash_buckets{16}; | ||
|
||
auto hash_counts = compute_hash_sizes(pair_first, | ||
pair_first + edgelist_srcs.size(), | ||
num_hash_buckets, | ||
hash_src_dst_pair<vertex_t>{}, | ||
handle.get_stream()); | ||
|
||
auto pivot = | ||
static_cast<int32_t>(thrust::distance(hash_counts.begin(), | ||
thrust::lower_bound(handle.get_thrust_policy(), | ||
hash_counts.begin(), | ||
hash_counts.end(), | ||
edgelist_srcs.size() / 2))); | ||
|
||
auto second_first = detail::mem_frugal_partition(pair_first, | ||
pair_first + edgelist_srcs.size(), | ||
get_dataframe_buffer_begin(edgelist_values), | ||
hash_src_dst_pair<vertex_t>{}, | ||
pivot, | ||
handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push
thrust::unique(handle.get_thrust_policy(), | ||
pair_first, | ||
pair_first + edgelist_srcs.size(), | ||
[] __device__(auto lhs, auto rhs) { | ||
return (thrust::get<0>(lhs) == thrust::get<0>(rhs)) && | ||
(thrust::get<1>(lhs) == thrust::get<1>(rhs)); | ||
})), | ||
handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked thrust::unique does not use a temporary memory buffer (size = original data size). We can test this by creating an array with random data close to 70% of the total memory size and this unique call succeeds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I'll run that test. I have an idea to work around that. I'll work on it over the weekend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored this code to use the same technique as the remove self-loops logic: Create a bit mask of entries to delete. Then delete them one array at a time. Should use at most 1.55x memory footprint.
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/merge |
Updating the C API graph creation functions to support the following:
This PR implements these features. Some simple tests exist to check for isolate vertices (by running pagerank which generates a different result if the graph has isolated vertices). A simple test for multiple input arrays exists for the MG case.
Closes #3947
Closes #3974