Change C Data Interface section to Sharing Arrow data and add info for IPC
AlenkaF committed May 23, 2024
1 parent 95f7b7f commit 7610041
Showing 1 changed file with 14 additions and 51 deletions.
65 changes: 14 additions & 51 deletions docs/source/format/Intro.rst
@@ -430,62 +430,25 @@ For this reason canonical extension types are defined in Arrow itself.

Community Extension Types
-------------------------
These are Arrow extension types that have been established as standards within specific domain areas.
These are Arrow extension types that have been established as standards within specific
domain areas.

Example:

* `GeoArrow`_: A collection of Arrow extension types for representing vector geometries

.. _GeoArrow: https://github.com/geoarrow/geoarrow

The Arrow C Data Interface
==========================

Arrow memory layout is meant to be a universal standard for tabular data, not tied to a specific
implementation.

While there are specifications to share Arrow data between processes or over the network (e.g. the
IPC messages), the Arrow C Data Interface is meant to actually zero-copy share the data between
different libraries within the same process (i.e. actually share the same buffers in memory).

The Arrow C Data Interface defines a set of small C structures:

.. code-block::

   struct ArrowSchema {
     const char* format;
     const char* name;
     const char* metadata;
     int64_t flags;
     int64_t n_children;
     struct ArrowSchema** children;
     struct ArrowSchema* dictionary;

     // Release callback
     void (*release)(struct ArrowSchema*);
     // Opaque producer-specific data
     void* private_data;
   };

   struct ArrowArray {
     int64_t length;
     int64_t null_count;
     int64_t offset;
     int64_t n_buffers;
     int64_t n_children;
     const void** buffers;
     struct ArrowArray** children;
     struct ArrowArray* dictionary;

     // Release callback
     void (*release)(struct ArrowArray*);
     // Opaque producer-specific data
     void* private_data;
   };

The C Data Interface passes Arrow data buffers through memory pointers. So, by construction, it allows
you to share data from one runtime to another without copying it. Since the data is in standard Arrow
in-memory format, its layout is well-defined and unambiguous.
Sharing Arrow data
==================

.. seealso::
   The :ref:`c-data-interface` documentation.
The Arrow memory layout is meant to be a universal standard for representing tabular data in
memory, not tied to a specific implementation. The next step is a well-defined and unambiguous
specification for sharing data in the Arrow format between applications.

* The protocol for sharing Arrow data between processes or over the network is called
  :ref:`format-ipc`. It is built on the IPC message format, which defines how the buffers of an
  Arrow array or record batch are stacked together so they can be serialized and deserialized.

* To share Arrow data within the same process, the :ref:`c-data-interface` is used. It lets
  different libraries share the same buffers in memory without copying them.
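
The in-process case can be illustrated with a minimal sketch in C. It assumes the
``ArrowArray`` struct layout from the C Data Interface specification. The functions
``export_int32_range``, ``release_int32_array`` and ``sum_int32_array`` are hypothetical names
invented for this example and are not part of any Arrow library. The producer fills the struct
with pointers to its own buffers, and the consumer reads the values through those pointers
instead of copying them:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Struct layout as given in the Arrow C Data Interface specification. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical producer-side release callback: frees what the export allocated. */
void release_int32_array(struct ArrowArray* array) {
  assert(array->release != NULL);
  free((void*)array->buffers[1]); /* data buffer */
  free((void*)array->buffers);    /* list of buffer pointers */
  array->release = NULL;          /* a released struct has a NULL release callback */
}

/* Hypothetical producer: exports the values 0..n-1 as a non-nullable int32 array.
   Returns 0 on success and -1 on invalid input or allocation failure. */
int export_int32_range(int32_t n, struct ArrowArray* out) {
  if (n <= 0) return -1;
  int32_t* data = malloc(sizeof(int32_t) * (size_t)n);
  const void** buffers = malloc(2 * sizeof(void*));
  if (data == NULL || buffers == NULL) {
    free(data);
    free(buffers);
    return -1;
  }
  for (int32_t i = 0; i < n; i++) data[i] = i;
  buffers[0] = NULL; /* validity bitmap may be omitted when null_count is 0 */
  buffers[1] = data; /* the values themselves */
  *out = (struct ArrowArray){
      .length = n,
      .null_count = 0,
      .offset = 0,
      .n_buffers = 2,
      .n_children = 0,
      .buffers = buffers,
      .children = NULL,
      .dictionary = NULL,
      .release = release_int32_array,
      .private_data = NULL,
  };
  return 0;
}

/* Hypothetical consumer: sums the values by reading the producer's buffer in place.
   Assumes an int32 array without nulls. */
int64_t sum_int32_array(const struct ArrowArray* array) {
  const int32_t* values = (const int32_t*)array->buffers[1] + array->offset;
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; i++) total += values[i];
  return total;
}
```

A caller would declare a ``struct ArrowArray``, pass its address to ``export_int32_range``,
hand it to ``sum_int32_array``, and finally invoke ``array.release(&array)`` so that the
producer can free its memory. A real exchange would also pass an ``ArrowSchema`` describing the
data type, which this sketch leaves out for brevity.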
