[BUG]: PG creates string series instead of categorical series leading to memory overhead #2903

VibhuJawa · 2022-11-09T21:07:21Z

Version

22.12

Which installation method(s) does this occur on?

Conda

Describe the bug.

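In our PG implementation we first build a cuDF string series by repeating the type name once per row and only then cast it to a categorical. We should do it the other way around and build the categorical directly, because the intermediate string series adds memory overhead and also runs into cuDF's `size_type` limit (a 32-bit signed integer) on the total number of characters in a strings column.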
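To make the overhead concrete, here is a CPU-side sketch that uses NumPy and pandas as stand-ins for cuDF (an assumption; cuDF is not needed to run it). `np.repeat(type_name, n)` is the same call the PG code makes. NumPy stores its result as fixed-width `<U18`, i.e. 4 bytes per character, so each row costs 72 host bytes before the data is even handed to cuDF. A categorical built from integer codes needs one small code per row plus a single copy of the category. The row count is illustrative only.

```python
import numpy as np
import pandas as pd

type_name = "('_N', '_E', '_N')"
n_rows = 1_000_000  # illustrative size, not the 120M rows from the repro

# What the PG code builds first: one fixed-width copy of the type name per row.
# NumPy's '<U18' dtype uses 4 bytes per character, so 72 bytes per row.
repeated = np.repeat(type_name, n_rows)
print(repeated.dtype, repeated.nbytes)

# What is actually needed: every row points at the same single category.
cat_dtype = pd.CategoricalDtype([type_name])
codes = np.zeros(n_rows, dtype=np.int8)  # code 0 refers to type_name
as_cat = pd.Series(pd.Categorical.from_codes(codes, dtype=cat_dtype))
print(as_cat.memory_usage(index=False, deep=True))
```

On the GPU, cuDF stores strings differently (UTF-8 characters plus offsets), but the cost of the intermediate still grows with `len(type_name)` for every row, whereas the categorical codes do not.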

cugraph/python/cugraph/cugraph/structure/property_graph.py

Lines 434 to 436 in 7387fbc

tmp_df[TCN] = cudf.Series(
    np.repeat(type_name, len(tmp_df)), index=tmp_df.index, dtype=cat_dtype
)
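One way to do it the other way around is to look up the integer code of `type_name` in the categorical dtype and build the column from codes, so the string is never repeated. The sketch below uses pandas `Categorical.from_codes` as a CPU stand-in; `make_type_series` is a hypothetical helper and is not necessarily what the fix in #2924 does. In cuDF the same idea would mean constructing the categorical column from a codes column, which is not shown here.

```python
import numpy as np
import pandas as pd


def make_type_series(type_name, index, cat_dtype):
    """Hypothetical helper: a categorical Series holding `type_name` on every
    row, built from integer codes instead of a repeated string array."""
    # Integer position of type_name among the categories (KeyError if absent).
    code = cat_dtype.categories.get_loc(type_name)
    # One integer code per row; no per-row copy of the string is made.
    codes = np.full(len(index), code)
    return pd.Series(pd.Categorical.from_codes(codes, dtype=cat_dtype), index=index)


cat_dtype = pd.CategoricalDtype(["('_N', '_E', '_N')", "('_N', '_F', '_N')"])
index = pd.RangeIndex(5)
s = make_type_series("('_N', '_F', '_N')", index, cat_dtype)
print(s.cat.codes.tolist())  # [1, 1, 1, 1, 1]
```

Because every row shares one code, the per-row cost is a single integer no matter how long the type name is.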

CC: @eriknw

Minimum reproducible example

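import cupy as cp
import cudf
from cugraph.experimental import PropertyGraph

pg = PropertyGraph()
# 120M edges is large enough to trigger the failure shown below
n_rows = 120_000_000
src = cp.arange(n_rows)
dst = src-1
df = cudf.DataFrame({'src':src, 'dst':dst})
# every edge gets the same type name, which PG repeats once per row
pg.add_edge_data(df,['src','dst'],type_name="('_N', '_E', '_N')")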

Relevant log output

--------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [7], line 9
      7 dst = src-1
      8 df = cudf.DataFrame({'src':src, 'dst':dst})
----> 9 pg.add_edge_data(df,['src','dst'],type_name="('_N', '_E', '_N')")

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cugraph-22.12.0a0+78.g9633224fd.dirty-py3.9-linux-x86_64.egg/cugraph/structure/property_graph.py:696, in EXPERIMENTAL__PropertyGraph.add_edge_data(self, dataframe, vertex_col_names, edge_id_col_name, type_name, property_columns)
    690 cat_dtype = self.__update_categorical_dtype(
    691     self.__edge_prop_dataframe, TCN, type_name
    692 )
    694 if self.__series_type is cudf.Series:
    695     # cudf does not yet support initialization with a scalar
--> 696     tmp_df[TCN] = cudf.Series(
    697         np.repeat(type_name, len(tmp_df)), index=tmp_df.index, dtype=cat_dtype
    698     )
    699 else:
    700     # pandas is oddly slow if dtype is passed to the constructor here
    701     tmp_df[TCN] = pd.Series(type_name, index=tmp_df.index).astype(cat_dtype)

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/contextlib.py:79, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     76 @wraps(func)
     77 def inner(*args, **kwds):
     78     with self._recreate_cm():
---> 79         return func(*args, **kwds)

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cudf/core/series.py:536, in Series.__init__(self, data, index, dtype, name, nan_as_null)
    533         data = {}
    535 if not isinstance(data, ColumnBase):
--> 536     data = column.as_column(data, nan_as_null=nan_as_null, dtype=dtype)
    537 else:
    538     if dtype is not None:

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cudf/core/column/column.py:1958, in as_column(arbitrary, nan_as_null, dtype, length)
   1956         data = data.astype(dtype)
   1957 elif arb_dtype.kind in ("O", "U"):
-> 1958     data = as_column(
   1959         pa.Array.from_pandas(arbitrary), dtype=arbitrary.dtype
   1960     )
   1961     # There is no cast operation available for pa.Array from int to
   1962     # str, Hence instead of handling in pa.Array block, we
   1963     # will have to type-cast here.
   1964     if dtype is not None:

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cudf/core/column/column.py:1789, in as_column(arbitrary, nan_as_null, dtype, length)
   1783 if isinstance(arbitrary, pa.lib.HalfFloatArray):
   1784     raise NotImplementedError(
   1785         "Type casting from `float16` to `float32` is not "
   1786         "yet supported in pyarrow, see: "
   1787         "https://issues.apache.org/jira/browse/ARROW-3802"
   1788     )
-> 1789 col = ColumnBase.from_arrow(arbitrary)
   1791 if isinstance(arbitrary, pa.NullArray):
   1792     new_dtype = cudf.dtype(arbitrary.type.to_pandas_dtype())

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cudf/core/column/column.py:302, in ColumnBase.from_arrow(cls, array)
    299 elif isinstance(array.type, ArrowIntervalType):
    300     return cudf.core.column.IntervalColumn.from_arrow(array)
--> 302 result = libcudf.interop.from_arrow(data)[0]
    304 return result._with_type_metadata(cudf_dtype_from_pa_type(array.type))

File interop.pyx:178, in cudf._lib.interop.from_arrow()

RuntimeError: cuDF failure at: /workspace/.conda-bld/work/cpp/src/copying/concatenate.cu:402: Total number of concatenated chars exceeds size_type range
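The error comes from cuDF's `size_type`, a 32-bit signed integer, which caps the total number of characters in a single strings column at 2**31 - 1. A back-of-the-envelope check in plain Python (no cuDF needed) shows why the repro fails at 120M rows but not at 60M: the type name is 18 ASCII characters, repeated once per row. `repeated_chars` is a hypothetical helper for this arithmetic only.

```python
# Upper bound on characters in one cuDF strings column (int32 size_type).
SIZE_TYPE_MAX = 2**31 - 1


def repeated_chars(type_name: str, n_rows: int) -> int:
    """UTF-8 byte count of a strings column holding type_name on every row."""
    return len(type_name.encode("utf-8")) * n_rows


type_name = "('_N', '_E', '_N')"
for n_rows in (15_000_000, 30_000_000, 60_000_000, 120_000_000):
    total = repeated_chars(type_name, n_rows)
    status = "exceeds size_type range" if total > SIZE_TYPE_MAX else "fits"
    print(f"{n_rows:>11,} rows -> {total:>13,} chars: {status}")
```

Only the 120M-row case overflows (2,160,000,000 > 2,147,483,647), which matches the repro.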

Environment details

Other/Misc.

This limits our ability to scale to ogbn-products with our cugraph-store, which runs into this error:

from ogb.nodeproppred import DglNodePropPredDataset
from dgl.contrib.cugraph.convert import cugraph_storage_from_heterograph

dataset = DglNodePropPredDataset(name="ogbn-products", root='/datasets/vjawa/gnn/')
g, labels = dataset[0]
# converting to cugraph-store runs into the error above
g = cugraph_storage_from_heterograph(g)

Code of Conduct

I agree to follow cuGraph's Code of Conduct
I have searched the open bugs and have found no duplicates for this bug report


This PR fixes #2903. It reduces the memory footprint by 3.5x, speeds up `add_data` by 557x, and removes the limit on the number of edges we can save. Note that times are reported in seconds before the PR and in milliseconds after it.

Before PR:

```
Name (time in s, mem in bytes)    Mean             GPU mem                GPU Leaked mem   Rounds   GPU Rounds
--------------------------------------------------------------------------------------------------------------
bench_add_edge_data[15000000]     2.3044 (1.0)     2,160,000,064 (1.0)    0 (1.0)          1        1
bench_add_edge_data[30000000]     4.7941 (2.08)    4,320,000,064 (2.00)   0 (1.0)          1        1
bench_add_edge_data[60000000]     8.7235 (3.79)    8,640,000,064 (4.00)   0 (1.0)          1        1
bench_add_edge_data[120000000]    FAILED
--------------------------------------------------------------------------------------------------------------
```

After PR:

```
--------------------------------------------- benchmark: 4 tests ---------------------------------------------
Name (time in ms, mem in bytes)   Mean             GPU mem                GPU Leaked mem   Rounds   GPU Rounds
--------------------------------------------------------------------------------------------------------------
bench_add_edge_data[15000000]     16.3785 (1.0)    615,000,080 (1.0)      0 (1.0)          1        1
bench_add_edge_data[30000000]     17.3631 (1.06)   1,230,000,080 (2.00)   0 (1.0)          1        1
bench_add_edge_data[60000000]     22.2947 (1.36)   2,460,000,080 (4.00)   0 (1.0)          1        1
bench_add_edge_data[120000000]    26.9747 (1.65)   4,920,000,080 (8.00)   0 (1.0)          1        1
--------------------------------------------------------------------------------------------------------------
```

Authors:
- Vibhu Jawa (https://github.com/VibhuJawa)

Approvers:
- Joseph Nke (https://github.com/jnke2016)
- Brad Rees (https://github.com/BradReesWork)

URL: #2924
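Dividing the GPU memory figures in the PR's benchmark results by the edge count makes the scaling easier to read: memory per edge is flat in both runs (about 144 bytes before, about 41 after), so the saving is a constant factor of roughly 3.5x. The numbers are copied from the benchmark output; the script only does arithmetic on them.

```python
# GPU memory (bytes) from the bench_add_edge_data results, keyed by edge count.
before = {15_000_000: 2_160_000_064, 30_000_000: 4_320_000_064, 60_000_000: 8_640_000_064}
after = {
    15_000_000: 615_000_080,
    30_000_000: 1_230_000_080,
    60_000_000: 2_460_000_080,
    120_000_000: 4_920_000_080,
}

for n_edges, mem_after in sorted(after.items()):
    line = f"{n_edges:>11,} edges: after {mem_after / n_edges:6.1f} B/edge"
    if n_edges in before:
        mem_before = before[n_edges]
        line += f", before {mem_before / n_edges:6.1f} B/edge ({mem_before / mem_after:.2f}x)"
    else:
        line += ", before FAILED"
    print(line)
```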

VibhuJawa added the `bug` (Something isn't working) and `? - Needs Triage` (Need team to review and classify) labels Nov 9, 2022

VibhuJawa added this to the 22.12 milestone Nov 9, 2022

VibhuJawa removed the `? - Needs Triage` (Need team to review and classify) label Nov 9, 2022

VibhuJawa assigned eriknw and VibhuJawa Nov 9, 2022

VibhuJawa mentioned this issue Nov 15, 2022

[REVIEW] Optimize PG.add_data #2924

Merged

rapids-bot closed this as completed in #2924 Nov 15, 2022
