Hi @vishakh, I think nada-numpy is indeed what you are looking for here. You're spot on: we try to make storing and fetching arrays as similar as possible to the equivalent operations on integers, bools, etc. Permissioning still works in the same way. Migrating to nada-numpy should not cause any issues - let us know if it does, though! As for maximum data size, it depends heavily on the specific operations you want to use, so it is difficult to put a single number on it. Hope this helps!
-
Greetings,
We need help figuring out how to easily perform nontrivial operations on our data.
Users provide genomic data in FASTQ format files such as https://raw.githubusercontent.com/Cryptonomic/MonadicDNA/e1f71f4f7d28b5dae183a43f94b7f3bf4fa59453/services/nillion-interactor/testdata/hu278AF5_20210124151934.txt.
So far, we have been int-encoding the data row by row and storing each row as an individual secret on Nillion: https://github.com/Cryptonomic/MonadicDNA/blob/e1f71f4f7d28b5dae183a43f94b7f3bf4fa59453/services/nillion-interactor/app.py#L72C11-L100
This lets us perform trivial (but valuable) computations relating to disease risk, e.g. https://github.com/Cryptonomic/MonadicDNA/blob/e1f71f4f7d28b5dae183a43f94b7f3bf4fa59453/services/nillion-interactor/programs/thrombosis.py.
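As a rough illustration of the row-by-row int-encoding described above, here is a minimal sketch. It is not the scheme used in the linked `app.py`: the column layout (tab-separated rsid, chromosome, position, genotype), the `BASE_CODES` table, and the helper names `encode_genotype` and `encode_row` are all assumptions made for this example, and it makes no Nillion calls.

```python
from typing import Tuple

# Hypothetical digit code per genotype symbol (an assumption for this sketch,
# not the encoding used in the MonadicDNA repository).
BASE_CODES = {"-": 0, "A": 1, "C": 2, "G": 3, "T": 4, "D": 5, "I": 6}


def encode_genotype(genotype: str) -> int:
    """Pack a one- or two-letter genotype call into a small integer."""
    value = 0
    # Pad single-letter calls with "-" so every call yields two digits.
    for base in genotype.ljust(2, "-"):
        value = value * 10 + BASE_CODES[base]
    return value


def encode_row(line: str) -> Tuple[str, int]:
    """Turn one assumed tab-separated data row into (secret name, integer)."""
    rsid, _chromosome, _position, genotype = line.strip().split("\t")
    return rsid, encode_genotype(genotype)


# Made-up sample row, not taken from the linked file.
name, value = encode_row("rs0000001\t1\t12345\tCT\n")
```

Each `(name, value)` pair would then be stored as one integer secret, which is why every row ends up as a separate secret.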
Ideally, we would like to store large chunks of the input data and process them within Nada programs, as this would unlock many additional use cases related to ancestry, data analysis, machine learning, etc. The docs indicate this is not possible at the moment, and that even blobs are purely for storage and retrieval.
We haven't yet looked at the nada-numpy and nada-ai libraries in detail but their presence indicates some intention to operate on tabular or arrayed data.
Is there any way to accomplish what we want to do?
Thank you.
EDIT: I just had another quick look at nada-numpy. It seems to create abstractions over individual secrets to allow array operations. However, it's still not clear to me whether we can store and fetch secrets as arrays, just as we do integers and bools, in a way that is compatible with our use case, with everything like permissioning working the same way. It's also unclear what the performance tradeoffs and maximum data sizes are.
EDIT2: I see on https://github.com/NillionNetwork/nada-numpy/blob/main/examples/linear_regression/tests/linear_regression_256_2.yaml that we still need to store secrets cell by cell. I wonder whether this can be abstracted over using some syntactic sugar. The tests also don't appear to load much data, so I also wonder what the eventual goal is for maximum data size.
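To make the "syntactic sugar" idea concrete, here is a minimal plain-Python sketch of the kind of helper I have in mind. The `<name>_<i>_<j>` naming pattern and both function names are assumptions for illustration only. This is not a nada-numpy or Nillion client API; it only builds the per-cell name-to-value mapping that would then be stored secret by secret.

```python
from typing import Dict, List


def flatten_to_named_cells(name: str, rows: List[List[int]]) -> Dict[str, int]:
    """Map a 2D list of ints to per-cell names like '<name>_<i>_<j>'.

    The naming pattern is an assumption for this sketch.
    """
    cells: Dict[str, int] = {}
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            cells[f"{name}_{i}_{j}"] = value
    return cells


def rebuild_from_named_cells(
    name: str, cells: Dict[str, int], n_rows: int, n_cols: int
) -> List[List[int]]:
    """Inverse of flatten_to_named_cells for a known shape."""
    return [
        [cells[f"{name}_{i}_{j}"] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Made-up 2x2 table of already int-encoded values.
table = [[11, 13], [44, 0]]
cells = flatten_to_named_cells("genotypes", table)
restored = rebuild_from_named_cells("genotypes", cells, 2, 2)
```

With something like this on the client side, the caller would work with whole tables while the per-cell secrets stay an implementation detail.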