Add RAG template for Timescale Vector (langchain-ai#12651)

--------- Co-authored-by: Matvey Arye <[email protected]>
xieqihui · Nov 21, 2023 · 0becf5f
1 parent 97488d1

Showing 9 changed files with 1,774 additions and 2 deletions.
diff --git a/docs/docs/integrations/vectorstores/timescalevector.ipynb b/docs/docs/integrations/vectorstores/timescalevector.ipynb
@@ -10,7 +10,7 @@
     "This notebook shows how to use the Postgres vector database `Timescale Vector`. You'll learn how to use TimescaleVector for (1) semantic search, (2) time-based vector search, (3) self-querying, and (4) how to create indexes to speed up queries.\n",
     "\n",
     "## What is Timescale Vector?\n",
-    "**[Timescale Vector](https://www.timescale.com/ai) is PostgreSQL++ for AI applications.**\n",
+    "**[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) is PostgreSQL++ for AI applications.**\n",
     "\n",
     "Timescale Vector enables you to efficiently store and query millions of vector embeddings in `PostgreSQL`.\n",
     "- Enhances `pgvector` with faster and more accurate similarity search on 100M+ vectors via `DiskANN` inspired indexing algorithm.\n",
@@ -23,7 +23,7 @@
     "- Enables a worry-free experience with enterprise-grade security and compliance.\n",
     "\n",
     "## How to access Timescale Vector\n",
-    "Timescale Vector is available on [Timescale](https://www.timescale.com/ai), the cloud PostgreSQL platform. (There is no self-hosted version at this time.)\n",
+    "Timescale Vector is available on [Timescale](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral), the cloud PostgreSQL platform. (There is no self-hosted version at this time.)\n",
     "\n",
     "LangChain users get a 90-day free trial for Timescale Vector.\n",
     "- To get started, [signup](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) to Timescale, create a new database and follow this notebook!\n",

diff --git a/templates/rag-timescale-hybrid-search-time/LICENSE b/templates/rag-timescale-hybrid-search-time/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 LangChain, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/templates/rag-timescale-hybrid-search-time/README.md b/templates/rag-timescale-hybrid-search-time/README.md
@@ -0,0 +1,63 @@
+# RAG with Timescale Vector using hybrid search
+
+This template shows how to use Timescale Vector with the self-query retriever to perform hybrid search on similarity and time.
+This is useful any time your data has a strong time-based component. Some examples of such data are:
+- News articles (politics, business, etc.)
+- Blog posts, documentation, or other published material (public or private)
+- Social media posts
+- Changelogs of any kind
+- Messages
+
+Such items are often searched by both similarity and time. For example: "Show me all news about Toyota trucks from 2022."
+
+[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) provides superior performance when searching for embeddings within a particular
+timeframe by leveraging automatic table partitioning to isolate data for particular time ranges.
+
+LangChain's self-query retriever can deduce time ranges (as well as other search criteria) from the text of user queries.
+
+## What is Timescale Vector?
+**[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) is PostgreSQL++ for AI applications.**
+
+Timescale Vector enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`.
+- Enhances `pgvector` with faster and more accurate similarity search on 1B+ vectors via a DiskANN-inspired indexing algorithm.
+- Enables fast time-based vector search via automatic time-based partitioning and indexing.
+- Provides a familiar SQL interface for querying vector embeddings and relational data.
+
+Timescale Vector is cloud PostgreSQL for AI that scales with you from POC to production:
+- Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.
+- Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security.
+- Enables a worry-free experience with enterprise-grade security and compliance.
+
+### How to access Timescale Vector
+Timescale Vector is available on [Timescale](https://www.timescale.com/products?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral), the cloud PostgreSQL platform. (There is no self-hosted version at this time.)
+
+- LangChain users get a 90-day free trial for Timescale Vector.
+- To get started, [sign up](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) for Timescale, create a new database, and follow this template!
+- See the [installation instructions](https://github.com/timescale/python-vector) for more details on using Timescale Vector in Python.
+
+### Using Timescale Vector with this template
+
+This template uses TimescaleVector as a vectorstore and requires that `TIMESCALES_SERVICE_URL` is set.
+
+## LLM
+
+Be sure that `OPENAI_API_KEY` is set in order to use the OpenAI models.
+
+## Loading sample data
+
+We have provided a sample dataset you can use for demoing this template. It consists of the git history of the timescale project.
+
+To load this dataset, set the `LOAD_SAMPLE_DATA` environment variable.
+
+## Loading your own dataset
+
+To load your own dataset, you will have to modify the code in the `DATASET SPECIFIC CODE` section of `chain.py`.
+This code defines the name of the collection, how to load the data, and the natural-language description of both the
+contents of the collection and all of the metadata. The self-query retriever uses these descriptions
+to help the LLM convert the question into filters on the metadata when searching the data in Timescale Vector.
+
+## Using in your own applications
+
+This is a standard LangServe template. Instructions on how to use it with your LangServe applications are [here](https://github.com/langchain-ai/langchain/blob/master/templates/README.md).
+
+
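To illustrate what "hybrid search on similarity and time" in the README above means, here is a minimal, self-contained Python sketch. It is not Timescale Vector's implementation and not the template's `chain.py`, which according to the README rely on table partitioning, indexing, and the self-query retriever. The `Doc` class, the `hybrid_search` function, the toy two-dimensional embeddings, and the sample headlines are all hypothetical, chosen only to show a time-range filter followed by similarity ranking.

```python
from dataclasses import dataclass
from datetime import datetime
from math import sqrt
from typing import List


@dataclass
class Doc:
    """Hypothetical record: text, a creation time, and a toy embedding."""
    text: str
    created: datetime
    embedding: List[float]  # real embeddings would come from an embedding model


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(docs: List[Doc], query_embedding: List[float],
                  start: datetime, end: datetime, k: int = 2) -> List[Doc]:
    """Keep docs with start <= created < end, then rank them by similarity."""
    in_range = [d for d in docs if start <= d.created < end]
    in_range.sort(key=lambda d: cosine(d.embedding, query_embedding), reverse=True)
    return in_range[:k]


# Made-up sample data for the "Toyota trucks from 2022" style of question.
docs = [
    Doc("Toyota unveils new truck", datetime(2022, 3, 1), [0.9, 0.1]),
    Doc("Toyota truck recall", datetime(2021, 6, 1), [0.95, 0.05]),
    Doc("Central bank raises rates", datetime(2022, 5, 1), [0.1, 0.9]),
]

results = hybrid_search(docs, [1.0, 0.0],
                        datetime(2022, 1, 1), datetime(2023, 1, 1), k=1)
print([d.text for d in results])
```

In the template itself, the time range would be inferred from the user's question by the self-query retriever rather than passed in by hand. The brute-force filter here only stands in for the database-side search.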