Poor robustness in handling `<links>` output. #534

OverrideTuring · 2024-12-26T15:06:49Z

Describe the bug

I frequently encounter the error message: "Error at generating documents from links: Invalid URL" in the console. Although it doesn't happen every time, it occurs frequently enough. After debugging the relevant source code, I traced the issue to how the system handles LLM outputs when rephrasing questions.

To Reproduce

I called the search API for my task. My LLM is a fine-tuned Llama-70B, and my prompt is:

Briefly introduce the publication named Pattern Recognition within one paragraph, indicating whether it is a journal or conference, its organizing body or publisher, its primary focus or fields of research, and its commonly used abbreviation (if any).

Steps to reproduce the issue:

Insert debug code in search.metaSearchAgent.ts, as shown here:
Fill in the above prompt in the request body.
Send a POST request to "http://localhost:3001/api/search" (i.e., the server API).
Check the console for errors. Occasionally, you will see something like this:

Expected behavior

The system should process the query correctly, send it to SearXNG, and return the desired result. Instead, the issue occurs because the output parser mistakenly interprets explanatory text, such as "no <links> block included," as the beginning of a <links> tag. This leads to invalid parsing and, ultimately, failure in URL validation.
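To make the failure mode concrete, here is a minimal sketch of how a lenient tag extractor can go wrong. It is not the project's actual parser: `extractLenient`, `isValidUrl`, and the sample LLM output are hypothetical. It assumes only the behaviour described in this issue, namely that parsing is skipped only when both tags are missing.

```typescript
// Hypothetical sketch of a lenient tag extractor (illustration only).
function extractLenient(text: string, key: string): string[] {
  const open = `<${key}>`;
  const close = `</${key}>`;
  const startKeyIndex = text.indexOf(open);
  const endKeyIndex = text.indexOf(close);

  // Bails out only when BOTH tags are missing, so a lone opening tag
  // inside explanatory prose is still treated as the start of a block.
  if (startKeyIndex === -1 && endKeyIndex === -1) {
    return [];
  }

  const from = startKeyIndex === -1 ? 0 : startKeyIndex + open.length;
  const to = endKeyIndex === -1 ? text.length : endKeyIndex;
  return text
    .slice(from, to)
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
}

// Returns true only if the WHATWG URL constructor accepts the string.
function isValidUrl(candidate: string): boolean {
  try {
    new URL(candidate);
    return true;
  } catch {
    return false;
  }
}

// Made-up LLM output: the model mentions the tag in prose and never closes it.
const llmOutput =
  "<question>\nPattern Recognition journal\n</question>\n" +
  "(no <links> block included, since the question has no URLs)";

const candidates = extractLenient(llmOutput, "links");
console.log(candidates);
console.log(candidates.map(isValidUrl));
```

In this sketch the text after the stray `<links>` mention is returned as if it were a link, and it then fails URL validation, which matches the "Invalid URL" symptom described above.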

Additional context

Suggestions for Improvements:

Replace the logical operator && in the condition (startKeyIndex === -1 && endKeyIndex === -1) with ||, so that the parser only accepts a block when both the opening and the closing tag are present. Alternatively, you could add post-checks on the validity of the generated <links> block and its content.
I also recommend modifying some details of the webSearchRetrieverPrompt:

<question> of the first example should align with others: "What is the Capital of france?", rather than a simple "Capital of France".
The format of the second example should align with others: add "Follow up question: " before "Hi, how are you?".
Add a question mark "?" after every real question, for example: "What is Docker?" instead of "What is Docker" in the third example.

In search.metaSearchAgent.ts, I suggest switching the order of these two tags:

            <query>
            ${question}
            </query>

            <text>
            ${doc.pageContent}
            </text>

Put the <text> before <query> as other examples do.
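The first suggestion above (requiring both tags, plus a post-check on the extracted links) could be sketched roughly as follows. This is an illustration under assumptions, not a patch for the real parser: `extractStrictLinks` and `isHttpUrl` are hypothetical names, and restricting links to http/https is my own assumption.

```typescript
// Hypothetical post-check: accept only strings that parse as http(s) URLs.
function isHttpUrl(candidate: string): boolean {
  try {
    const { protocol } = new URL(candidate);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false;
  }
}

// Hypothetical stricter extractor: both tags must be present and in order.
function extractStrictLinks(text: string, key: string = "links"): string[] {
  const open = `<${key}>`;
  const close = `</${key}>`;
  const startKeyIndex = text.indexOf(open);
  const endKeyIndex = text.indexOf(close);

  // `||` instead of `&&`: a missing opening OR closing tag yields no links.
  if (startKeyIndex === -1 || endKeyIndex === -1 || endKeyIndex < startKeyIndex) {
    return [];
  }

  return text
    .slice(startKeyIndex + open.length, endKeyIndex)
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .filter(isHttpUrl); // drop anything that is not a usable URL
}

// Explanatory prose that merely mentions the tag no longer produces links.
console.log(extractStrictLinks("(no <links> block included)"));

// A well-formed block keeps only the entries that pass the URL check.
console.log(
  extractStrictLinks("<links>\nhttps://example.com/a\nnot a url\n</links>")
);
```

The two checks are complementary: the `||` condition rejects unclosed tags early, while the URL filter catches malformed entries inside an otherwise well-formed block.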


ItzCrazyKns · 2024-12-26T15:21:01Z

Hi, I've been working on the prompt, and from what I've seen so far, Perplexica is able to work correctly with a 3B model as well. This prompt will be released pretty soon after some final touches. Stay tuned for it!

OverrideTuring · 2024-12-26T15:47:43Z

> Hi, I've been working on the prompt, and from what I've seen so far, Perplexica is able to work correctly with a 3B model as well. This prompt will be released pretty soon after some final touches. Stay tuned for it!

Try this complete prompt:

Briefly introduce the publication named Pattern Recognition within one paragraph, indicating whether it is a journal or conference, its organizing body or publisher, its primary focus or fields of research, and its commonly used abbreviation (if any). Return the answer in the following JSON format:
```json
{"introduction": "<concise description of the publication>"}
```

For example, the introduction of IEEE Conference on Computer Vision and Pattern Recognition should be:
```json
{"introduction": "A premier annual conference in the field of computer vision and pattern recognition. It is organized by the IEEE and the Computer Vision Foundation (CVF). It is widely known as CVPR."}
```

Don't add any reference information (e.g. [1], [2], etc.) in the JSON-format answer. You can do it after the answer.

Here I tell the LLM to return a JSON-format answer, which triggers the problem more often. That said, my fine-tuned LLM is likely a contributing factor, too.

thefux · 2025-01-01T20:44:48Z

Hi there, any news on this? I'm facing the same issue as in #533, and quite often.

OverrideTuring added the bug label on Dec 26, 2024

ItzCrazyKns mentioned this issue Dec 26, 2024

Connection Issues in Recent Docker Images #533

Closed
