
Improve integration with Haskell's and other ecosystems #9

Open
YPares opened this issue Nov 16, 2018 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@YPares
Owner

YPares commented Nov 16, 2018

For now, porcupine only provides connectors for very simple datatypes (CSV and JSON). To improve integration with other tools, I think we should aim for the following goals:

  • Providing a one-function read of hmatrix data (via CSV)
  • Reading and writing frames (https://hackage.haskell.org/package/Frames)
  • Thinking about how Apache Arrow could be integrated into that ecosystem. Haskell support for Arrow is basically nonexistent, but given that one of porcupine's goals down the road is to provide easy external tasks (ideally, writing a Python/R script in place as a task), a standard in-memory representation that could be passed between different runtimes would really enhance interoperability with the Python/R ecosystems.
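As a rough illustration of the first goal, here is a minimal sketch of the parsing half of a one-function hmatrix read. The names (`parseRows`, `splitOn`) are hypothetical, not porcupine API; a real connector would use cassava for robust CSV parsing and would then apply hmatrix's `Numeric.LinearAlgebra.fromLists` to the resulting `[[Double]]` to obtain a `Matrix Double`:

```haskell
-- Sketch only: parse CSV text into rows of Doubles. A real connector
-- would use cassava, handle errors, and feed the rows to hmatrix's
-- fromLists to build a Matrix Double.
parseRows :: String -> [[Double]]
parseRows = map (map read . splitOn ',') . lines

-- Minimal field splitter (no quoting/escaping, unlike cassava).
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (field, _ : rest) -> field : splitOn c rest
  (field, [])       -> [field]

main :: IO ()
main = print (parseRows "1,2\n3,4")  -- prints [[1.0,2.0],[3.0,4.0]]
```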
@YPares YPares changed the title Improve integration with Haskell and other ecosystems Improve integration with Haskell's and other ecosystems Nov 16, 2018
@YPares YPares added the enhancement New feature or request label Nov 16, 2018
@tscholak

tscholak commented Oct 9, 2019

+1 on Apache Arrow support

@YPares
Owner Author

YPares commented Oct 9, 2019

@tscholak Some work has been started here: https://github.com/mrkkrp/hs-arrow/tree/mk/nix. It reuses the bindings generated by stephenpascoe via GObject Introspection (and the Arrow C GObject interface). I'd like to see whether we need to integrate that at the porcupine level or whether it can remain client code.

@tscholak

tscholak commented Oct 9, 2019

Ah, interesting!

@tscholak

tscholak commented Oct 9, 2019

Do you have an opinion on pipes or conduit connectors?

@YPares
Owner Author

YPares commented Oct 9, 2019

@tscholak Currently, loadDataStream can open several similar files and give you a Stream. writeDataStream is its counterpart.

Opening a stream out of one big file (or writing one) isn't easy for now: it would require you to write new serials, although #40 should improve that and simplify some code in the process. It shouldn't be too complicated. We use Streams internally, but for now they aren't exposed.

But converting a Stream to a Conduit or a Pipe afterwards is very easy (we do it internally the other way around in porcupine-s3 and http); see "§ 7. Interoperation with the streaming-io libraries" in the readme of http://hackage.haskell.org/package/streaming

We use streaming instead of pipes/conduit because the API is much simpler (notably because it lets you carry your knowledge of Data.List over to streams).
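To give a feel for that analogy, here is a small sketch using the streaming package's `Streaming.Prelude` (commonly imported as `S`); `S.each`, `S.filter`, `S.map`, and `S.toList_` line up directly with their Data.List counterparts:

```haskell
import qualified Streaming.Prelude as S

-- Streaming.Prelude mirrors Data.List: each/filter/map/toList_
-- correspond to the list functions you already know.
main :: IO ()
main = do
  xs <- S.toList_ $ S.map (* 2) $ S.filter even $ S.each [1 .. 10 :: Int]
  print xs  -- prints [4,8,12,16,20]
```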
