
Improve integration with Haskell's and other ecosystems #9

Open
YPares opened this issue Nov 16, 2018 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@YPares
Owner

YPares commented Nov 16, 2018

For now, porcupine only provides connectors for very simple datatypes (CSV and JSON). To improve integration with other tools, I think we should aim for the following goals:

  • Providing a one-function read of hmatrix data (via CSV)
  • Reading and writing frames (https://hackage.haskell.org/package/Frames)
  • Thinking about how Apache Arrow could be integrated into that ecosystem. Haskell support for Arrow is basically nonexistent, but given that one of porcupine's goals down the road is to provide easy external tasks (ideally, writing a Python/R script in place as a task), a standard in-memory representation that could be passed between different runtimes would really enhance interoperability with the Python/R ecosystems.
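As a rough illustration of the first goal, here is a minimal sketch of the parsing half of a one-function hmatrix read. The names (`parseRows`, `splitOn`) are hypothetical, not porcupine API; a real connector would use cassava for robust CSV parsing and would then apply hmatrix's `Numeric.LinearAlgebra.fromLists` to the resulting `[[Double]]` to obtain a `Matrix Double`:

```haskell
-- Sketch only: parse CSV text into rows of Doubles. A real connector
-- would use cassava, handle errors, and feed the rows to hmatrix's
-- fromLists to build a Matrix Double.
parseRows :: String -> [[Double]]
parseRows = map (map read . splitOn ',') . lines

-- Minimal field splitter (no quoting/escaping, unlike cassava).
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (field, _ : rest) -> field : splitOn c rest
  (field, [])       -> [field]

main :: IO ()
main = print (parseRows "1,2\n3,4")  -- prints [[1.0,2.0],[3.0,4.0]]
```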
@YPares YPares changed the title Improve integration with Haskell and other ecosystems Improve integration with Haskell's and other ecosystems Nov 16, 2018
@YPares YPares added the enhancement New feature or request label Nov 16, 2018
@tscholak

tscholak commented Oct 9, 2019

+1 on Apache Arrow support

@YPares
Owner Author

YPares commented Oct 9, 2019

@tscholak Some work has been started here: https://github.com/mrkkrp/hs-arrow/tree/mk/nix. It reuses the bindings generated by stephenpascoe via GObject Introspection (and the Arrow C GObject interface). I'd like to see whether we need to integrate that at the porcupine level or whether it can remain client code.

@tscholak

tscholak commented Oct 9, 2019

Ah, interesting!

@tscholak

tscholak commented Oct 9, 2019

Do you have an opinion on pipes or conduit connectors?

@YPares
Owner Author

YPares commented Oct 9, 2019

@tscholak Currently, loadDataStream can open several similar files and give you a Stream. writeDataStream is its counterpart.

Opening a stream out of one big file (or writing one) isn't easy for now: it would require you to write new serials, although #40 should improve that and simplify some code in the process. It shouldn't be too complicated. We use Streams internally, but for now they aren't exposed.

But converting a Stream to a Conduit or a Pipe afterwards is very easy (we do it internally the other way around in porcupine-s3 and http); see "§ 7. Interoperation with the streaming-io libraries" in the readme of http://hackage.haskell.org/package/streaming

We use streaming instead of pipes/conduit because the API is much simpler (notably because it lets you carry your knowledge of Data.List over to streams).
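To give a feel for that analogy, here is a small sketch using the streaming package's `Streaming.Prelude` (commonly imported as `S`); `S.each`, `S.filter`, `S.map`, and `S.toList_` line up directly with their Data.List counterparts:

```haskell
import qualified Streaming.Prelude as S

-- Streaming.Prelude mirrors Data.List: each/filter/map/toList_
-- correspond to the list functions you already know.
main :: IO ()
main = do
  xs <- S.toList_ $ S.map (* 2) $ S.filter even $ S.each [1 .. 10 :: Int]
  print xs  -- prints [4,8,12,16,20]
```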
