Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFI: ArrowArray::try_from_raw shouldn't clone #1425

Closed
wangfenjin opened this issue Mar 11, 2022 · 19 comments · Fixed by #1449
Closed

FFI: ArrowArray::try_from_raw shouldn't clone #1425

wangfenjin opened this issue Mar 11, 2022 · 19 comments · Fixed by #1449
Labels
arrow Changes to the arrow crate bug

Comments

@wangfenjin
Copy link
Contributor

Guys, not sure if my understanding is right, but I think this commit will break the design and create memory leak.

If we clone the FFI struct, then it means we need to free the pointer by ourself, but if we free FFI_ArrowArray, then the data in this Array will also be free? Which means we can't free the pointer(until the data are used and ready to free, but in reality we can't hold this useless pointer in a big project for such a long time), which create memory leak.

As to the question @viirya raised in #1333 , when manage memory, the one who allocate it should free it, which means in our case, we need to alloc the struct in rust and pass the pointer to java and then also free the memory in rust.

I suggest we revert this commit. cc @alamb @sunchao

Originally posted by @wangfenjin in #1334 (comment)

@viirya
Copy link
Member

viirya commented Mar 11, 2022

As to the question @viirya raised in #1333 , when manage memory, the one who allocate it should free it, which means in our case, we need to alloc the struct in rust and pass the pointer to java and then also free the memory in rust.

Simply said, I don't think this is correct. That is the whole point of C data interface. The release callback is designed to be called by not only the one allocates the C data interface structure.

@wangfenjin
Copy link
Contributor Author

wangfenjin commented Mar 11, 2022

Let me put in another way.

Let's say in Java you create an ArrowArray and import in this lib using try_from_raw API, and this API clone the ArrowArray, which means there will be two struct point to the same data, one is in Jave another is in rust. Then

  1. If you free the pointer in Java(which by design will also free all the data it point to), it will be seg fault in rust if you try to access it;
  2. And you just can't hold the Java's pointer until it's used/freed by rust, because you don't when it will be used. So it's a memory leak.

I bet currently in your Java program there is memory leak because of these.

I think the way to fix your issue is to change the Java API design, to let other language like rust to pass in a FFI_ArrowArray::empty() struct pointer, and populate data into this struct, and let rust handle the memory itself. That's why I say "the one who allocate it should free it"

@jorgecarleitao
Copy link
Member

fwiw, that is also what we do in arrow2 - we never take ownership of the pointers passed to our exporter; we just write to them (as they may have been allocated by someone else), see e.g. the example in https://jorgecarleitao.github.io/arrow2/ffi.html

We did file a rust advisory on this last week, https://rustsec.org/advisories/RUSTSEC-2022-0012.html, as we incorrectly derived Clone to the FFI structs

@viirya
Copy link
Member

viirya commented Mar 11, 2022

Let me put in another way.

Let's say in Java you create an ArrowArray and import in this lib using try_from_raw API, and this API clone the ArrowArray, which means there will be two struct point to the same data, one is in Jave another is in rust. Then

1. If you free the pointer in Java(which by design will also free all the data it point to), it will be seg fault in rust if you try to access it;

2. And you just can't hold the Java's pointer until it's used/freed by rust, because you don't when it will be used. So it's a memory leak.

I bet currently in your Java program there is memory leak because of these.

I'm not sure if you really use this FFI between Java and Rust. We have real use case to use it in an internal project. We found this issue and the fix works very well after upgrading to latest version. And, no, I don't detect memory leak there. Let me explain more.

First, of course if you free the pointer in Java, you will release the data and Rust cannot access it anymore. If you really want to free it, make sure to clear up the release field in the struct. This is easy to do with Java FFI API. In our case, the struct in Java won't be released until the arrays are exhausted by Rust. So we don't even need to do that.

In contrast, I don't know what you describe will work better. Let's say we don't clone it, then in Java and Rust, there is only one struct. So, if you free the pointer in Java, it still releases the data, isn't? The only possible is to check the release field of the struct to know if the struct is released or not by Rust. You even cannot mark it released. I don't see there is any better point.

The only missing piece, here, is that we need to mark the source struct released in Rust side. This is required by "Moving an array" which talks about the case to clone the struct. I can do it later.

I think the way to fix your issue is to change the Java API design, to let other language like rust to pass in a FFI_ArrowArray::empty() struct pointer, and populate data into this struct, and let rust handle the memory itself. That's why I say "the one who allocate it should free it"

Although I think this will work but I don't think it is a good idea. By doing that, it means to add an additional JNI call to Rust side to get the empty struct pointer. We're likely to have less JNI call as possible. I agree that the struct should be released by who allocates it. Actually you cannot free Java struct in Rust, and vice versa. I don't think we did this or intend to do it.

@wangfenjin
Copy link
Contributor Author

wangfenjin commented Mar 12, 2022

I get your point. You are calling rust from Java, so it's right you alloc the memory in Java. The problem is rust assume this pointer is from Arc::into_raw, but acctually it's not it's just a raw pointer from JNI. In this case, our solution should be add a new API like:

    /// creates a new [ArrowArray] from two pointers. Used to import from the C Data Interface.
    /// # Safety
    /// See safety of [ArrowArray]
    /// # Error
    /// Errors if any of the pointers is null
    pub unsafe fn try_from_c(
        array: *const FFI_ArrowArray,
        schema: *const FFI_ArrowSchema,
    ) -> Result<Self> {
        if array.is_null() || schema.is_null() {
            return Err(ArrowError::MemoryError(
                "At least one of the pointers passed to `try_from_raw` is null"
                    .to_string(),
            ));
        };
        Ok(Self {
            array: Arc::new(*array), // notice we don't need to clone here, if will also solve your problem
            schema: Arc::new(*schema),
        })
    }

The idea is we don't need to Clone, why bother? As in your Java program you need to hold the pointer as long as you need it, and can only free it when the job is done.

There are several drawbacks in the current solution:

  1. It breaks current API convention. Anyone use the API like me will be detected memory leak because of this change.
  2. You can say your Java program don't leak memory because you always holding the pointer, but in some aspect we can say it's, because we alloc two struct point to the same data, which can be easily and should managed in one.
  3. Allow clone will make us have the same issue mentioned by @jorgecarleitao , I add a demo here Test segfault on drop #1431

About the link you mentioned Moving an array, please noted that it talks about consumer which is rust in this case, need to call release_call if it tries to move the data, seems have nothing to do with our discuess here.

If you really want to free it, make sure to clear up the release field in the struct. This is easy to do with Java FFI API.

In here you are talking about the producer side, but actually you can't just clear up the release field and forget about it. What if rust have any issue to release the data, Java as the producer, as the one alloc the memory, should still be responsible to release it. Again, "the one who allocate it should free it".

@wangfenjin
Copy link
Contributor Author

The issue is our current FFI API design assume the pointer always comes from Arc::into_raw(), we should solve this problem, but Clone is not the solution.

@viirya
Copy link
Member

viirya commented Mar 12, 2022

The idea is we don't need to Clone, why bother? As in your Java program you need to hold the pointer as long as you need it, and can only free it when the job is done.

Are you sure it works?

error[E0507]: cannot move out of `*array` which is behind a raw pointer
...
error[E0507]: cannot move out of `*schema` which is behind a raw pointer

@viirya
Copy link
Member

viirya commented Mar 12, 2022

  • It breaks current API convention. Anyone use the API like me will be detected memory leak because of this change.

  • You can say your Java program don't leak memory because you always holding the pointer, but in some aspect we can say it's, because we alloc two struct point to the same data, which can be easily and should managed in one.

  • Allow clone will make us have the same issue mentioned by @jorgecarleitao , I add a demo here

I don't know how do you use the API. If the memory leak is happened because you need to hold the pointer in Java side, I think we can mark the source structs released after cloning it. So you can free the struct in Java.

The issue you demo is also resolvable by marking the source struct released.

@wangfenjin
Copy link
Contributor Author

The idea is we don't need to Clone, why bother? As in your Java program you need to hold the pointer as long as you need it, and can only free it when the job is done.

Are you sure it works?

error[E0507]: cannot move out of `*array` which is behind a raw pointer
...
error[E0507]: cannot move out of `*schema` which is behind a raw pointer

How you run the code? From my side:

✘ 🍻   ~/github/arrow-rs/arrow   flight-sql-client ±  cargo test -- --nocapture test_clone_array_segfault_on_drop
   Compiling arrow v10.0.0 (/Users/wangfenjin/github/arrow-rs/arrow)
    Finished test [unoptimized + debuginfo] target(s) in 41.79s
     Running unittests (/Users/wangfenjin/github/arrow-rs/target/debug/deps/arrow-01a41b09de9e05da)

running 1 test
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/Users/wangfenjin/github/arrow-rs/target/debug/deps/arrow-01a41b09de9e05da --nocapture test_clone_array_segfault_on_drop` (signal: 11, SIGSEGV: invalid memory reference)

@wangfenjin
Copy link
Contributor Author

It should work otherwise the change you make should also break. they both do *array

         let ffi_array = (*array).clone();

@viirya
Copy link
Member

viirya commented Mar 12, 2022

How you run the code? From my side:

✘ 🍻   ~/github/arrow-rs/arrow   flight-sql-client ±  cargo test -- --nocapture test_clone_array_segfault_on_drop
   Compiling arrow v10.0.0 (/Users/wangfenjin/github/arrow-rs/arrow)
    Finished test [unoptimized + debuginfo] target(s) in 41.79s
     Running unittests (/Users/wangfenjin/github/arrow-rs/target/debug/deps/arrow-01a41b09de9e05da)

running 1 test
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/Users/wangfenjin/github/arrow-rs/target/debug/deps/arrow-01a41b09de9e05da --nocapture test_clone_array_segfault_on_drop` (signal: 11, SIGSEGV: invalid memory reference)

I mean the solution you suggested:

Ok(Self {
            array: Arc::new(*array), // notice we don't need to clone here, if will also solve your problem
            schema: Arc::new(*schema),
        })

@wangfenjin
Copy link
Contributor Author

Oh I haven't test the suggested solution yet, if it doesn't work maybe we can make it Box<> as the arrow2 API?

I'm not so familiar with this

@viirya
Copy link
Member

viirya commented Mar 12, 2022

It should work otherwise the change you make should also break. they both do *array

         let ffi_array = (*array).clone();

No...It is basically about Rust ownership rule. In short, Arc::new(*array) will try to own the data behind array raw pointer. So Rust complains it cannot move the data.

(*array).clone() doesn't move data but clone it.

@viirya
Copy link
Member

viirya commented Mar 12, 2022

Yeah, if you or others definitely insist we should not do clone (although I still don't agree with it), I can try to work on to get rid of it with Box or something other. I cannot agree you is that we don't think this is a real issue (that the API assumes the raw pointers are Arc pointers) and try to revert to original behavior.

About the link you mentioned Moving an array, please noted that it talks about consumer which is rust in this case, need to call release_call if it tries to move the data, seems have nothing to do with our discuess here.

I don't agree with you here too. It is obviously talking out you can copy (i.e., clone) these structs, only if you mark source structs as released (I agree this is missed now).

@viirya
Copy link
Member

viirya commented Mar 12, 2022

So, from my side, the previous behavior is definitely wrong (assuming the passed in raw pointers are always Arc pointers).

I can think of the options:

  • Mark source structs as released after cloning them. I think this follows the C data interface. This should allow the source structs to be freed at Java side and resolve the memory leak issue and double dropping issue.
  • Try other solution like Box.

WDYT?

@wangfenjin
Copy link
Contributor Author

I think Box might be a better solution as we can avoid clone. If you don't have time I can also help on this issue. Thanks

@viirya
Copy link
Member

viirya commented Mar 12, 2022

Okay, I can work on this in the weekend. See if Box can save us from clone.

Thanks for the discussion.

@alamb
Copy link
Contributor

alamb commented Mar 12, 2022

I haven't followed all the discussion on this ticket, but it seems like it is well in hand. Let me know if I can do anything else to help

@viirya
Copy link
Member

viirya commented Mar 12, 2022

Thanks @alamb

@alamb alamb added arrow Changes to the arrow crate bug labels Mar 31, 2022
@alamb alamb changed the title ArrowArray::try_from_raw shouldn't clone FFI: ArrowArray::try_from_raw shouldn't clone Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
4 participants