[BUG]: PG creates string series instead of categorical series leading to memory overhead #2903

Closed
2 tasks done
VibhuJawa opened this issue Nov 9, 2022 · 0 comments · Fixed by #2924
VibhuJawa commented Nov 9, 2022

Version

22.12

Which installation method(s) does this occur on?

Conda

Describe the bug.

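In our PG implementation we create a cudf string series filled with the type name and then convert it to categorical. We should do it the other way around and build the categorical directly, because the intermediate string column adds memory overhead and also runs into cudf's `size_type` limit (the total number of characters in a string column must fit in a signed 32-bit integer, not an int64).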

tmp_df[TCN] = cudf.Series(
    np.repeat(type_name, len(tmp_df)), index=tmp_df.index, dtype=cat_dtype
)
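
A minimal sketch of the "other way" described above, using pandas on the CPU as a stand-in for cudf so that it runs without a GPU. The helper name `constant_categorical` and the two-category dtype are hypothetical, and this is not the code from the eventual fix. It only illustrates building the column from integer codes instead of from a repeated string array.

```python
import numpy as np
import pandas as pd


def constant_categorical(type_name, categories, index):
    """Build a constant categorical Series from integer codes.

    No per-row string array is materialized; each row only stores
    the integer code of `type_name` within `categories`.
    """
    cat_dtype = pd.CategoricalDtype(categories=categories)
    # Position of the type name among the categories.
    code = cat_dtype.categories.get_loc(type_name)
    # One integer code per row instead of one string per row.
    codes = np.full(len(index), code, dtype=np.int32)
    values = pd.Categorical.from_codes(codes, dtype=cat_dtype)
    return pd.Series(values, index=index)


index = pd.RangeIndex(1_000)
type_name = "('_N', '_E', '_N')"
col = constant_categorical(type_name, [type_name, "other"], index)
print(col.dtype, len(col))
```

Whether cudf exposes an equivalent code-based constructor in this version is not something this sketch assumes; the point is only that the per-row cost becomes one small integer rather than an 18-character string.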

CC: @eriknw

Minimum reproducible example

import cupy as cp
import cudf
from cugraph.experimental import PropertyGraph
pg = PropertyGraph()
n_rows = 120_000_000
src = cp.arange(n_rows)
dst = src-1
df = cudf.DataFrame({'src':src, 'dst':dst})
pg.add_edge_data(df,['src','dst'],type_name="('_N', '_E', '_N')")

Relevant log output

--------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [7], line 9
      7 dst = src-1
      8 df = cudf.DataFrame({'src':src, 'dst':dst})
----> 9 pg.add_edge_data(df,['src','dst'],type_name="('_N', '_E', '_N')")

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cugraph-22.12.0a0+78.g9633224fd.dirty-py3.9-linux-x86_64.egg/cugraph/structure/property_graph.py:696, in EXPERIMENTAL__PropertyGraph.add_edge_data(self, dataframe, vertex_col_names, edge_id_col_name, type_name, property_columns)
    690 cat_dtype = self.__update_categorical_dtype(
    691     self.__edge_prop_dataframe, TCN, type_name
    692 )
    694 if self.__series_type is cudf.Series:
    695     # cudf does not yet support initialization with a scalar
--> 696     tmp_df[TCN] = cudf.Series(
    697         np.repeat(type_name, len(tmp_df)), index=tmp_df.index, dtype=cat_dtype
    698     )
    699 else:
    700     # pandas is oddly slow if dtype is passed to the constructor here
    701     tmp_df[TCN] = pd.Series(type_name, index=tmp_df.index).astype(cat_dtype)

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/contextlib.py:79, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     76 @wraps(func)
     77 def inner(*args, **kwds):
     78     with self._recreate_cm():
---> 79         return func(*args, **kwds)

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cudf/core/series.py:536, in Series.__init__(self, data, index, dtype, name, nan_as_null)
    533         data = {}
    535 if not isinstance(data, ColumnBase):
--> 536     data = column.as_column(data, nan_as_null=nan_as_null, dtype=dtype)
    537 else:
    538     if dtype is not None:

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cudf/core/column/column.py:1958, in as_column(arbitrary, nan_as_null, dtype, length)
   1956         data = data.astype(dtype)
   1957 elif arb_dtype.kind in ("O", "U"):
-> 1958     data = as_column(
   1959         pa.Array.from_pandas(arbitrary), dtype=arbitrary.dtype
   1960     )
   1961     # There is no cast operation available for pa.Array from int to
   1962     # str, Hence instead of handling in pa.Array block, we
   1963     # will have to type-cast here.
   1964     if dtype is not None:

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cudf/core/column/column.py:1789, in as_column(arbitrary, nan_as_null, dtype, length)
   1783 if isinstance(arbitrary, pa.lib.HalfFloatArray):
   1784     raise NotImplementedError(
   1785         "Type casting from `float16` to `float32` is not "
   1786         "yet supported in pyarrow, see: "
   1787         "https://issues.apache.org/jira/browse/ARROW-3802"
   1788     )
-> 1789 col = ColumnBase.from_arrow(arbitrary)
   1791 if isinstance(arbitrary, pa.NullArray):
   1792     new_dtype = cudf.dtype(arbitrary.type.to_pandas_dtype())

File /datasets/vjawa/miniconda3/envs/cugraph_dgl_22_12/lib/python3.9/site-packages/cudf/core/column/column.py:302, in ColumnBase.from_arrow(cls, array)
    299 elif isinstance(array.type, ArrowIntervalType):
    300     return cudf.core.column.IntervalColumn.from_arrow(array)
--> 302 result = libcudf.interop.from_arrow(data)[0]
    304 return result._with_type_metadata(cudf_dtype_from_pa_type(array.type))

File interop.pyx:178, in cudf._lib.interop.from_arrow()

RuntimeError: cuDF failure at: /workspace/.conda-bld/work/cpp/src/copying/concatenate.cu:402: Total number of concatenated chars exceeds size_type range
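
The error above is consistent with simple arithmetic on the reproducer. The sketch below assumes that cudf's `size_type` is a signed 32-bit integer and that the string column stores one copy of the type name per row; the variable names are illustrative only.

```python
# Assumed upper bound of cudf's size_type (signed 32-bit integer).
INT32_MAX = 2**31 - 1

type_name = "('_N', '_E', '_N')"  # type name from the reproducer
n_rows = 120_000_000              # row count from the reproducer

# The type name is ASCII, so characters and bytes coincide.
chars_per_row = len(type_name)
total_chars = chars_per_row * n_rows
exceeds_limit = total_chars > INT32_MAX

# Largest row count whose repeated type name still fits in the limit.
max_rows = INT32_MAX // chars_per_row

print(f"chars per row: {chars_per_row}")
print(f"total chars:   {total_chars:,}")
print(f"limit:         {INT32_MAX:,}")
print(f"exceeds limit: {exceeds_limit}")
print(f"max rows:      {max_rows:,}")
```

Under these assumptions, a string column for this type name overflows a little below 120 million rows, while a categorical built from integer codes is not subject to the character count at all.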

Environment details

Other/Misc.

This impacts our ability to scale on ogbn-products using our cugraph-store as that runs into this error.

from ogb.nodeproppred import DglNodePropPredDataset
from dgl.contrib.cugraph.convert import cugraph_storage_from_heterograph


dataset = DglNodePropPredDataset(name="ogbn-products", root='/datasets/vjawa/gnn/')
g, labels = dataset[0]        
g = cugraph_storage_from_heterograph(g)

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@VibhuJawa VibhuJawa added bug Something isn't working ? - Needs Triage Need team to review and classify labels Nov 9, 2022
@VibhuJawa VibhuJawa added this to the 22.12 milestone Nov 9, 2022
@VibhuJawa VibhuJawa removed the ? - Needs Triage Need team to review and classify label Nov 9, 2022
rapids-bot bot pushed a commit that referenced this issue Nov 15, 2022
This PR fixes #2903.

This PR reduces the memory footprint by `3.5x`, speeds up add_data by `557x`, and removes the limit on the number of edges we can store. (Note the units: times are in seconds before the PR and in milliseconds after it.)

Before PR:
```python3
Name (time in s, mem in bytes)       Mean                  GPU mem            GPU Leaked mem            Rounds            GPU Rounds          
----------------------------------------------------------------------------------------------------------------------------------------------
bench_add_edge_data[15000000]      2.3044 (1.0)      2,160,000,064 (1.0)                   0 (1.0)           1           1
bench_add_edge_data[30000000]      4.7941 (2.08)     4,320,000,064 (2.00)                  0 (1.0)           1           1
bench_add_edge_data[60000000]      8.7235 (3.79)     8,640,000,064 (4.00)                  0 (1.0)           1           1
bench_add_edge_data[120000000]  FAILED
----------------------------------------------------------------------------------------------------------------------------------------------
```


After PR:
```python
-------------------------------------------------------------- benchmark: 4 tests --------------------------------------------------------------
Name (time in ms, mem in bytes)        Mean                  GPU mem            GPU Leaked mem            Rounds            GPU Rounds          
------------------------------------------------------------------------------------------------------------------------------------------------
bench_add_edge_data[15000000]       16.3785 (1.0)        615,000,080 (1.0)                   0 (1.0)           1           1
bench_add_edge_data[30000000]       17.3631 (1.06)     1,230,000,080 (2.00)                  0 (1.0)           1           1
bench_add_edge_data[60000000]       22.2947 (1.36)     2,460,000,080 (4.00)                  0 (1.0)           1           1
bench_add_edge_data[120000000]      26.9747 (1.65)     4,920,000,080 (8.00)                  0 (1.0)           1           1
------------------------------------------------------------------------------------------------------------------------------------------------
```
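
As a quick check of the memory figure, the GPU memory columns from the two tables above can be divided directly. This is only arithmetic on the reported numbers; the dictionary names are illustrative, and "bytes per edge" here simply means reported GPU memory divided by the number of edges.

```python
# GPU memory in bytes, copied from the "Before PR" and "After PR" tables.
# The 120,000,000 row is omitted because it failed before the PR.
gpu_mem_before = {
    15_000_000: 2_160_000_064,
    30_000_000: 4_320_000_064,
    60_000_000: 8_640_000_064,
}
gpu_mem_after = {
    15_000_000: 615_000_080,
    30_000_000: 1_230_000_080,
    60_000_000: 2_460_000_080,
}

# Memory reduction factor for each benchmark size.
mem_ratio = {n: gpu_mem_before[n] / gpu_mem_after[n] for n in gpu_mem_before}

# Approximate GPU bytes used per edge before and after.
bytes_per_edge_before = {n: gpu_mem_before[n] / n for n in gpu_mem_before}
bytes_per_edge_after = {n: gpu_mem_after[n] / n for n in gpu_mem_after}

for n in gpu_mem_before:
    print(
        f"{n:>11,} edges: {mem_ratio[n]:.2f}x less memory, "
        f"{bytes_per_edge_before[n]:.0f} -> {bytes_per_edge_after[n]:.0f} bytes/edge"
    )
```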

Authors:
  - Vibhu Jawa (https://github.com/VibhuJawa)

Approvers:
  - Joseph Nke (https://github.com/jnke2016)
  - Brad Rees (https://github.com/BradReesWork)

URL: #2924