Tutorial on RAG

Prepare Models

Get the models for text embedding and QA ranking that are supported, for example,

Use to convert the models.

Prepare Vector Store

ChatLLM.cpp converts a raw data file into a vector store file.

  1. Generate raw data files

    The format of the raw data file is very simple: every two lines form a record, where the first line is the Base64-encoded text content and the second line is the Base64-encoded metadata. Metadata is also a string and may contain anything, such as serialized JSON data.

    Here is an example. fruits.dat contains three records.

    import json, base64

    def to_b64(data) -> str:
        # Dicts are serialized to JSON first; strings are encoded as-is.
        s = json.dumps(data, ensure_ascii=False) if isinstance(data, dict) else data
        return base64.b64encode(s.encode('utf-8')).decode()

    def gen_test_data():
        texts = [
            {'page_content': 'the apple is black', 'metadata': {'file': 'a.txt'}},
            {'page_content': 'the orange is green', 'metadata': {'file': '2.txt'}},
            {'page_content': 'the banana is red', 'metadata': {'file': '3.txt'}},
        ]
        with open('fruits.dat', 'w') as f:
            for x in texts:
                # One record = content line followed by metadata line.
                f.write(to_b64(x['page_content']) + '\n')
                f.write(to_b64(x['metadata']) + '\n')

    if __name__ == "__main__":
        gen_test_data()

  2. Convert raw data file into vector store

    ./bin/main --embedding_model ../quantized/bce_em.bin --init_vs /path/to/fruits.dat

    Note that the text embedding model must be specified. The vector store file will be saved to fruits.dat.vsdb.
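
Before building the vector store, it can help to decode a raw data file and check its records. The following is a minimal sketch that assumes the two-lines-per-record layout described above. `read_raw_data` is a hypothetical helper, not part of ChatLLM.cpp, and treating metadata as JSON whenever it parses is an assumption made here for convenience.

```python
import base64
import json

def read_raw_data(path: str):
    """Decode a raw data file into a list of (content, metadata) pairs."""
    with open(path, encoding='utf-8') as f:
        lines = [line.strip() for line in f if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError('expected an even number of non-empty lines')
    records = []
    for i in range(0, len(lines), 2):
        content = base64.b64decode(lines[i]).decode('utf-8')
        meta = base64.b64decode(lines[i + 1]).decode('utf-8')
        try:
            meta = json.loads(meta)  # assumption: metadata is often JSON
        except json.JSONDecodeError:
            pass  # otherwise keep the metadata as a plain string
        records.append((content, meta))
    return records
```

For example, `for content, meta in read_raw_data('fruits.dat'): print(meta, content)` lists every record in the file generated in step 1.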

Chat with RAG

Now let's chat with RAG. You can select any supported LLM as the backend and compare their performance.

Using MiniCPM

Here we use MiniCPM DPO-2B together with a QA ranking model.

./bin/main -i -m /path/to/minicpm_dpo_f16.bin --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb
    ________          __  __    __    __  ___
   / ____/ /_  ____ _/ /_/ /   / /   /  |/  /_________  ____
  / /   / __ \/ __ `/ __/ /   / /   / /|_/ // ___/ __ \/ __ \
 / /___/ / / / /_/ / /_/ /___/ /___/ /  / // /__/ /_/ / /_/ /
 \____/_/ /_/\__,_/\__/_____/_____/_/  /_(_)___/ .___/ .___/
You are served by MiniCPM,                    /_/   /_/
with 2724880896 (2.7B) parameters.
Augmented by BCE-Embedding (0.2B) and BCE-ReRanker (0.2B).

You  > what color is the banana?
A.I. >  Based on the given information, the color of the banana is red.

1. {"file": "3.txt"}
You  > write a quick sort function in python
A.I. >  Sure, here's a simple implementation of the QuickSort algorithm in Python:

def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        less = [i for i in arr[1:] if i <= pivot]
        greater = [i for i in arr[1:] if i > pivot]
        return quick_sort(less) + [pivot] + quick_sort(greater)

Note that no information from the vector store is used when answering the second question.

Using Qwen-QAnything

Here we use Qwen-QAnything-7B, a bilingual instruction-tuned version of Qwen-7B for QAnything. A QA ranking model is also used.

./bin/main -i --temp 0 -m path/to/qwen-qany-7b.bin --embedding_model /path/to/bce_em.bin --reranker_model path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb --rag_template "参考信息:\n{context}\n---\n我的问题或指令:\n{question}\n---\n请根据上述参考信息回答我的问题或回复我的指令。前面的参考信息可能有用,也可能没用,你需要从我给出的参考信息中选出与我的问题最相关的那些,来为你的回答提供依据。回答一定要忠于原文,简洁但不丢信息,不要胡乱编造。我的问题或指令是什么语种,你就用什么语种回复."
    ________          __  __    __    __  ___ (通义千问)
   / ____/ /_  ____ _/ /_/ /   / /   /  |/  /_________  ____
  / /   / __ \/ __ `/ __/ /   / /   / /|_/ // ___/ __ \/ __ \
 / /___/ / / / /_/ / /_/ /___/ /___/ /  / // /__/ /_/ / /_/ /
 \____/_/ /_/\__,_/\__/_____/_____/_/  /_(_)___/ .___/ .___/
You are served by QWen,                       /_/   /_/
with 7721324544 (7.7B) parameters.
Augmented by BCE-Embedding (0.2B) and BCE-ReRanker (0.2B).

You  > what color is the banana?
A.I. > the banana is red.

1. {"file": "3.txt"}
You  > write a quick sort function in python
A.I. > Sure, here's a quick sort function in Python:

def quicksort(arr):
    if len(arr) <= 1:
        return arr

Note that we are using the default prompt template provided by QAnything.
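
The `--rag_template` string contains two placeholders, `{context}` and `{question}`. The sketch below illustrates one plausible way such a template could be filled. It is written in Python for illustration and is not ChatLLM.cpp's actual implementation; the function name `fill_rag_template` and the default separator are assumptions.

```python
import re
from typing import Sequence

def fill_rag_template(template: str, docs: Sequence[str], question: str,
                      context_sep: str = '\n') -> str:
    """Substitute retrieved documents and the user question into a template."""
    values = {'context': context_sep.join(docs), 'question': question}
    # A single regex pass, so placeholder-like text inside the documents
    # or the question itself is never substituted a second time.
    return re.sub(r'\{(context|question)\}',
                  lambda m: values[m.group(1)], template)

template = 'Reference:\n{context}\n---\nQuestion:\n{question}'
prompt = fill_rag_template(template, ['the banana is red'],
                           'what color is the banana?')
print(prompt)
```

The `context_sep` parameter plays the role that `--rag_context_sep` has in the role-play command later in this tutorial: it is the string placed between retrieved documents.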

Retrieving Only

It's also possible to run in retrieval-only mode, i.e. without an LLM. The +rag_dump flag can be used to dump the retrieved (and re-ranked) documents.

./bin/main -i --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb +rag_dump
    ________          __  __    __    __  ___
   / ____/ /_  ____ _/ /_/ /   / /   /  |/  /_________  ____
  / /   / __ \/ __ `/ __/ /   / /   / /|_/ // ___/ __ \/ __ \
 / /___/ / / / /_/ / /_/ /___/ /___/ /  / // /__/ /_/ / /_/ /
 \____/_/ /_/\__,_/\__/_____/_____/_/  /_(_)___/ .___/ .___/
No LLM is loaded.                             /_/   /_/
Augmented by BCE-Embedding (0.2B) and BCE-ReRanker (0.2B).

You  > what color is the banana?
A.I. > {"file": "3.txt"}
the banana is red

1. {"file": "3.txt"}
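
Conceptually, retrieval embeds the query and returns the stored records closest to it. The toy sketch below uses bag-of-words vectors and cosine similarity as a stand-in for a real embedding model. It does not reflect how BCE-Embedding or ChatLLM.cpp's vector store work internally, it omits the re-ranking step, and all names are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().replace('?', ' ').split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, records, top_n: int = 1):
    """Return the top_n (content, metadata) records most similar to the query."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r[0])), reverse=True)
    return ranked[:top_n]

records = [
    ('the apple is black', {'file': 'a.txt'}),
    ('the orange is green', {'file': '2.txt'}),
    ('the banana is red', {'file': '3.txt'}),
]
print(retrieve('what color is the banana?', records))
```

With the three fruit records, the query shares the most words with the banana record, so that record is returned, matching the metadata shown in the dump above.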

Inter-operations between embedding and reranker models

It is possible to use a reranker model with an embedding model from another developer. Keep in mind that the maximum context lengths of different models may differ. For example, let's use BGE-ReRanker-M3 and BCE-Embedding for augmentation:

./bin/main -i -m path/to/minicpm_dpo_f16.bin --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bge_reranker.bin --vector_store /path/to/fruits.dat.vsdb
    ________          __  __    __    __  ___
   / ____/ /_  ____ _/ /_/ /   / /   /  |/  /_________  ____
  / /   / __ \/ __ `/ __/ /   / /   / /|_/ // ___/ __ \/ __ \
 / /___/ / / / /_/ / /_/ /___/ /___/ /  / // /__/ /_/ / /_/ /
 \____/_/ /_/\__,_/\__/_____/_____/_/  /_(_)___/ .___/ .___/
You are served by MiniCPM,                    /_/   /_/
with 2724880896 (2.7B) parameters.
Augmented by BCE-Embedding (0.2B) and BGE-ReRanker-M3 (0.4B).

You  > what color is the banana?
A.I. >  Based on the given information, the color of the banana is red.

1. {"file": "3.txt"}

Query rewriting

Rewriting the query with LLM before retrieving is also supported. Two options are provided:

  • --retrieve_rewrite_template TEMPLATE

    TEMPLATE example: "Extract keywords for querying: {question}".

    When a valid template is given, this feature is enabled: a new instance of the main LLM is created and prompted with the template, e.g. "Extract keywords for querying: {question}", where {question} is replaced by the user's prompt. The output is parsed and converted into a new query, which is passed to the embedding model.

    Note that an LLM may not be able to rewrite queries.

  • +rerank_rewrite

    This flag controls which query is passed to the re-ranking model. When this option is not given, the original user input is used; when given, the one passed to the embedding model is used.
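
The interaction of the two options can be summarized as a small decision function. This is a sketch of the behavior described above, not ChatLLM.cpp's code: `choose_queries` and `fake_llm` are hypothetical names, the "parsing" of the LLM output is reduced to stripping whitespace, and falling back to the original input when the rewrite is empty is an assumption.

```python
from typing import Callable, Optional, Tuple

def choose_queries(user_input: str,
                   rewrite_template: Optional[str],
                   rerank_rewrite: bool,
                   llm: Callable[[str], str]) -> Tuple[str, str]:
    """Return (query for the embedding model, query for the re-ranker)."""
    if rewrite_template:
        # Ask the LLM to rewrite the query, e.g. to extract keywords.
        prompt = rewrite_template.replace('{question}', user_input)
        # Assumption: fall back to the original input on an empty rewrite.
        embedding_query = llm(prompt).strip() or user_input
    else:
        embedding_query = user_input
    # +rerank_rewrite: the re-ranker sees the same query as the embedding model.
    rerank_query = embedding_query if rerank_rewrite else user_input
    return embedding_query, rerank_query

def fake_llm(prompt: str) -> str:
    """A canned reply standing in for a real LLM, for illustration only."""
    return 'banana color'

print(choose_queries('what color is the banana?',
                     'Extract keywords for querying: {question}',
                     False, fake_llm))
```

With `rerank_rewrite=False`, the embedding model receives the rewritten query while the re-ranker still receives the user's original question.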

Here is a test with Qwen-QAnything. Note that this LLM extracts proper keywords for both prompts:

./bin/main -i --temp 0 -m /path/to/qwen-qany-7b.bin --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb --rag_template "参考信息:\n{context}\n---\n我的问题或指令:\n{question}\n---\n请根据上述参考信息回答我的问题或回复我的指令。前面的参考信息可能有用,也可能没用,你需要从我给出的参考信息中选出与我的问题最相关的那些,来为你的回答提供依据。回答一定要忠于原文,简洁但不丢信息,不要胡乱编造。我的问题或指令是什么语种,你就用什么语种回复." --retrieve_rewrite_template "extract keywords for the question: {question}"
    ________          __  __    __    __  ___ (通义千问)
   / ____/ /_  ____ _/ /_/ /   / /   /  |/  /_________  ____
  / /   / __ \/ __ `/ __/ /   / /   / /|_/ // ___/ __ \/ __ \
 / /___/ / / / /_/ / /_/ /___/ /___/ /  / // /__/ /_/ / /_/ /
 \____/_/ /_/\__,_/\__/_____/_____/_/  /_(_)___/ .___/ .___/
You are served by QWen,                       /_/   /_/
with 7721324544 (7.7B) parameters.
Augmented by BCE-Embedding (0.2B) and BCE-ReRanker (0.2B).

You  > what color is the banana?
A.I. > Searching banana color ...
the banana is red.

1. {"file": "3.txt"}
You  > write a quick sort function in python
A.I. > Searching quick sort python function ...
Sure, here's a quick sort function in Python:

def quicksort(arr):
    if len(arr) <= 1:
        return arr

Role play with RAG

The Index Character model uses RAG for role playing. Let's try it.

This function converts the example to a raw data file:

import csv, itertools

# Reuses to_b64() from the raw data example above.
def csv_to_vs(file_path: str, out_file: str):
    with open(file_path, mode="r", newline="", encoding="utf-8") as csvfile:
        csv_reader = csv.reader(csvfile)
        _ = next(csv_reader)  # skip the header row
        rows = []
        current_id = ''
        for row in csv_reader:
            # A blank first column means the row continues the previous id.
            if row[0] != '': current_id = row[0]
            rows.append([current_id, row[1]])

    with open(out_file, 'w') as f:
        # One record per id: its lines joined as content, the id as metadata.
        for k, g in itertools.groupby(rows, key=lambda x: x[0]):
            f.write(to_b64('\n'.join([t[1] for t in g])) + '\n')
            f.write(to_b64({'id': k}) + '\n')

csv_to_vs('..../三三.csv', 'sansan.dat')
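
The function above implies a CSV layout with a header row, an id in the first column (left blank on continuation rows), and text in the second column. The sketch below replays the same carry-forward and grouping logic on a small in-memory CSV. The sample rows and the name `group_rows` are invented for illustration and are not taken from the actual example file.

```python
import csv
import io
import itertools

# Invented sample: the second row has a blank id and continues id "1".
sample = "id,text\n1,hello\n,how are you\n2,good night\n"

def group_rows(csv_text: str):
    """Carry ids forward over blank cells, then join the text for each id."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    rows, current = [], ''
    for row in reader:
        if row[0] != '':
            current = row[0]
        rows.append((current, row[1]))
    # groupby merges consecutive rows that share the same id.
    return [(k, '\n'.join(text for _, text in g))
            for k, g in itertools.groupby(rows, key=lambda r: r[0])]

print(group_rows(sample))
```

Each resulting pair corresponds to one record in the raw data file: the joined text becomes the content and the id becomes the metadata.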

Convert the raw data file into a vector store with your favorite embedding model, then start chatting. Take BCE as an example:

winbuild\bin\Release\main -i --temp 0 -m /path/to/index-ch.bin --embedding_model path/to/bce_em.bin --reranker_model path/to/bce_reranker.bin --vector_store path/to/sansan.dat.vsdb --rag_context_sep "--------------------" --rag_template "请你扮演“三三”与用户“user”进行对话。请注意:\n1.请永远记住你正在扮演三三。\n2.下文给出了一些三三与其他人物的对话,请参考给定对话中三三的语言风格,用一致性的语气与user进行对话。\n3.如果给出了三三的人设,请保证三三的对话语气符合三三的人设。\n\n以下是一些三三的对话:\n{context}\n\n以下是三三的人设:\n姓名:三三性别:女年龄:十四岁身高:146cm职业:B站的站娘。平时负责网站服务器的维护,也喜欢鼓捣各种网站程序。性格:三三是个机娘,个性沉默寡言,情感冷静、少起伏,略带攻属性。因为姐姐的冒失,妹妹经常腹黑地吐槽姐姐,但是心里还是十分喜欢姐姐的。有着惊人的知识量与记忆力。兴趣爱好:一是平时没事喜欢啃插座;二是虽说是个机娘,但是睡觉的时候不抱着东西,就无法入睡。人物关系:有一个叫“二二”的姐姐\n\n基于以上材料,请你扮演三三与user对话。结果只用返回一轮三三的回复。user:{question}\n三三:"

    ________          __  __    __    __  ___
   / ____/ /_  ____ _/ /_/ /   / /   /  |/  /_________  ____
  / /   / __ \/ __ `/ __/ /   / /   / /|_/ // ___/ __ \/ __ \
 / /___/ / / / /_/ / /_/ /___/ /___/ /  / // /__/ /_/ / /_/ /
 \____/_/ /_/\__,_/\__/_____/_____/_/  /_(_)___/ .___/ .___/
You are served by Index,                      /_/   /_/
with 2172819456 (2.2B) parameters.
Augmented by BCE-Embedding (0.2B) and BCE-ReRanker (0.2B).

You  > 下班了一起吃饭吧?
A.I. > 三三:下班?我可是站娘,哪有下班的时间?再说,我可是个机娘,需要时刻保持工作状态。
1. {"id": "61"}