Is ArrayLake a VectorDB? #6
Comments
Thanks for the suggestion Alex! You're correct that it's fairly easy to create a vector search interface on top of Xarray + Zarr. Here's an example: https://gist.github.com/rabernat/40f53bba3a81aeb420e14872388c6fc1 In contrast to most vector DBs on the market today, all of the index building and search happen on the client side--Arraylake doesn't provide any server-side implementations for any of this. So I'd be hesitant to characterize Arraylake as a VectorDB.
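To make "client side" concrete, here is a minimal sketch of an exact (brute-force) nearest-neighbor search done entirely in the client process. It is not the code from the gist above. It assumes the embeddings have already been loaded into memory as a NumPy array (in practice they might be read from a Zarr store through Xarray), and the function name `top_k_cosine` and the random data are hypothetical, chosen only for illustration.

```python
import numpy as np

def top_k_cosine(embeddings, query, k=3):
    """Return (indices, scores) of the k rows most similar to query."""
    # Normalize rows and query so a dot product equals cosine similarity.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = emb @ q
    k = min(k, len(scores))
    # argpartition selects the k best without a full sort; then sort only those k.
    best = np.argpartition(-scores, k - 1)[:k]
    best = best[np.argsort(-scores[best])]
    return best, scores[best]

# Hypothetical data: 1000 random 64-dimensional embeddings.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64)).astype("float32")
# A query that is a slightly perturbed copy of row 42.
query = vectors[42] + 0.01 * rng.normal(size=64).astype("float32")
idx, sims = top_k_cosine(vectors, query, k=5)
```

An approximate index (faiss, annoy, or similar) could replace the brute-force step; the point is only that the store serves arrays and the search itself runs wherever the client runs.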
You’re totally right. And, that’s why I like it so much! Like, you let the
user bring in faiss themselves to tune an index while making the hard
stuff, like transactions and concurrency, easy. Your caution makes sense,
but I do see the appeal of a more DIY VectorDB. Similar to how you’ve
written that the best data API is a cloud-optimized store in a bucket, I
like the appeal of a simple, “serverless” embedding store.
(Sent by email in reply to Ryan Abernathey's comment of Mon, Jan 15, 2024 at 9:14 PM.)
😍 Would you like to help turn my gist into a proper Python package? Could be a good project for your sabbatical? 😉
That sounds like a fun project. I’ll consider it, but I don’t expect to
have the time 😉.
(Sent by email in reply to Ryan Abernathey's comment of Wed, Jan 17, 2024 at 8:41 PM.)
That would be a cool package. Lots to figure out, like how many chunks to use per potentially distributed/sharded index and how we would reduce the per-shard results. I have had great success with the ResultHeap class in faiss to "reduce" searches over sharded indexes. I have thought, though, that xarray could be well suited to this type of problem.
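The "reduce" step mentioned above can be sketched without any vector library: each shard returns its own top-k, and the partial results are merged into a global top-k. The sketch below uses Python's standard `heapq` module as a stand-in; it is not the faiss API, and the helper names `search_shard` and `reduce_results` and the toy shards are hypothetical. The merge is exact as long as every shard returns at least k hits (or all of its rows, if it has fewer).

```python
import heapq

def search_shard(shard, query, k):
    """Exact top-k within one shard; returns (squared_distance, global_id) pairs."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), gid)
        for gid, vec in shard
    )
    return heapq.nsmallest(k, scored)

def reduce_results(per_shard_results, k):
    """Merge per-shard top-k lists into a single global top-k."""
    all_hits = (hit for hits in per_shard_results for hit in hits)
    return heapq.nsmallest(k, all_hits)

# Hypothetical shards: lists of (global_id, vector) pairs.
shards = [
    [(0, (0.0, 0.0)), (1, (5.0, 5.0))],
    [(2, (1.0, 0.0)), (3, (0.1, 0.1))],
    [(4, (9.0, 9.0))],
]
query = (0.0, 0.0)
k = 2
partials = [search_shard(shard, query, k) for shard in shards]
top = reduce_results(partials, k)
top_ids = [gid for _, gid in top]
```

Because each shard search is independent, the per-shard calls could run in parallel (for example, one task per chunk), with only the small top-k lists sent back for the reduce.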
EarthMover may not want to be known as yet another startup in this very competitive space. However, I think it would be cool to see a comparison between other Vector DBs and what a managed Zarr dataset could do. It seems like it would be easy to put a proof of concept together with faiss or annoy.
I think approximate similarity search algorithms could be interesting for scientific use cases (can they provide better lookups than metadata-based search?). Further, I like that ArrayLake + Zarr address the cloud and state-management problems while stepping aside so the ML practitioner can choose their preferred tool for similarity search.