[QST] How to update Table.column value in libcudf Java API #17652

Open
hiroki-nishimoto-fixstars opened this issue Dec 23, 2024 · 2 comments
hiroki-nishimoto-fixstars commented Dec 23, 2024

What is your question?

I have two questions.

  1. How can I write the cudf (Python) code below using the libcudf Java API?
  2. Is there any introduction, documentation, or example code for the libcudf Java API?

Hi. I'm trying to rewrite cudf (Python) code using the libcudf Java API.

As I understand it, the two are built on different concepts, so I do not expect them to be fully compatible.

How can I write, in libcudf (Java), a process that corresponds to the following update of a Series in a cudf.DataFrame?

libcudf (Java) does not allow access to Table.columns and has no API for updating a column, e.g. setColumn(index, value).

import pandas as pd
df = pd.DataFrame({"a":[1,2,3], "b" : [4,5,6]})
df["a"] = [9,8,7]
print(df)


'''
output
  a b
0 9 4
1 8 5
2 7 6
'''

A naive implementation looks like the following, which does not seem very elegant.

Table df = ... // omission

// Replacement values for column "a" (index 0)
ColumnVector update_a = ColumnVector.fromInts(9, 8, 7);
int numCols = df.getNumberOfColumns();
ColumnVector[] cols = new ColumnVector[numCols];
for (int i = 0; i < numCols; i++) {
    if (i != 0) {
        // Reuse the existing column as-is
        cols[i] = df.getColumn(i);
    } else {
        // Swap in the new column
        cols[i] = update_a;
    }
}
df = new Table(cols);

Also, I know there is documentation for libcudf (C++), but I could not find any Java documentation or example code. Is there somewhere I can find it?

For now, I am working it out by reading the test code and the API implementation.

@hiroki-nishimoto-fixstars hiroki-nishimoto-fixstars added the question Further information is requested label Dec 23, 2024
@hiroki-nishimoto-fixstars hiroki-nishimoto-fixstars changed the title [QST] How to update Table.column value in libcudf JAVA API [QST] How to update Table.column value in libcudf Java API Dec 23, 2024
@GregoryKimball
Contributor

@ttnghia would you please share your thoughts on the best way to update a column using the libcudf Java API?

@ttnghia
Contributor

ttnghia commented Jan 6, 2025

libcudf(Java) prohibit to access Table.columns

No, cudf Java provides the function getColumn(int index), which gives access to the table's columns, and you already use it in your example. Your example above is almost correct. Look at the documentation of Table.getColumn:

   * .....If you want to keep a reference to
   * the column around past the life time of the table, you will need to increment the reference
   * count on the column yourself.

That means the following is needed when calling getColumn:

if (i != 0) {
    cols[i] = df.getColumn(i);
    cols[i].incRefCount();
}

Add that second line to your example. That's all you need. When the original table is destroyed, the reused columns are not destroyed with it, because their reference counts have been increased; they are instead managed by the new table.
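To make the reference-counting argument concrete, here is a minimal, self-contained sketch in plain Java. It does not use cudf: ToyColumn, ToyTable, and withColumnReplaced are hypothetical stand-ins invented for illustration. The toy table is assumed to take over one reference per column and release it on close, which may not match how the real Table constructor handles reference counts.

```java
public class RefCountSketch {
    /** Toy stand-in for a reference-counted column. NOT the cudf ColumnVector. */
    static final class ToyColumn implements AutoCloseable {
        final String name;
        private int refCount = 1; // the creator holds the first reference

        ToyColumn(String name) { this.name = name; }

        /** Adds one reference and returns this column for chaining. */
        ToyColumn incRefCount() {
            if (refCount <= 0) throw new IllegalStateException(name + " is already freed");
            refCount++;
            return this;
        }

        /** Drops one reference; at zero a real library would free the memory. */
        @Override
        public void close() {
            if (refCount <= 0) throw new IllegalStateException(name + " closed too many times");
            refCount--;
        }

        boolean isFreed() { return refCount == 0; }
    }

    /** Toy table. Assumption: it takes over one reference per column and drops it on close. */
    static final class ToyTable implements AutoCloseable {
        private final ToyColumn[] columns;

        ToyTable(ToyColumn... columns) { this.columns = columns.clone(); }

        int getNumberOfColumns() { return columns.length; }

        /** Returns a borrowed column; the reference count is unchanged. */
        ToyColumn getColumn(int i) { return columns[i]; }

        @Override
        public void close() { for (ToyColumn c : columns) c.close(); }
    }

    /** Builds a new table with column `index` replaced; the other columns are shared. */
    static ToyTable withColumnReplaced(ToyTable src, int index, ToyColumn replacement) {
        ToyColumn[] cols = new ToyColumn[src.getNumberOfColumns()];
        for (int i = 0; i < cols.length; i++) {
            // Shared columns need an extra reference so they outlive the source table.
            cols[i] = (i == index) ? replacement : src.getColumn(i).incRefCount();
        }
        return new ToyTable(cols);
    }

    public static void main(String[] args) {
        ToyColumn a = new ToyColumn("a");
        ToyColumn b = new ToyColumn("b");
        ToyTable df = new ToyTable(a, b);
        ToyTable updated = withColumnReplaced(df, 0, new ToyColumn("a2"));

        df.close(); // drops the only reference to "a" and one of two references to "b"
        System.out.println("a freed: " + a.isFreed()); // prints "a freed: true"
        System.out.println("b freed: " + b.isFreed()); // prints "b freed: false"

        updated.close(); // drops the last reference to "b"
        System.out.println("b freed after closing both tables: " + b.isFreed()); // prints "... true"
    }
}
```

In this model, without the incRefCount call the shared column "b" would reach a count of zero as soon as the original table is closed. Before relying on the same bookkeeping in real code, check the cudf Java source for how Table itself adjusts reference counts on construction and close.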
