Sometimes it hallucinates despite fetching accurate data! #2

Open
simonw opened this issue Mar 24, 2023 · 12 comments
Labels
bug Something isn't working

Comments

Owner

simonw commented Mar 24, 2023

This is a really bad bug. Can I improve this with some more prompt engineering?

simonw added the bug label Mar 24, 2023

simonw commented Mar 24, 2023

CleanShot 2023-03-23 at 23 22 15@2x

CleanShot 2023-03-23 at 23 22 40@2x

simonw commented Mar 24, 2023

Here's a bad one:

image

image

It looked up the exact data it needed from the tutorials table... and then hallucinated the output!

simonw added a commit that referenced this issue Mar 24, 2023

simonw commented Mar 24, 2023

My current prompt looks like this:

PROMPT = """
Run SQLite queries against a database hosted by Datasette.
Datasette supports most SQLite syntax but does not support PRAGMA statements.
Use `select group_concat(sql, ';') from sqlite_master` to see the list of tables and their columns
Use `select sql from sqlite_master where name = 'table_name'` to see the schema for a table, including its columns.
Instead of `PRAGMA table_info(table_name)` use `select * from pragma_table_info('table_name')`
PRAGMA statements are not allowed. `select * from pragma_table_info('table_name')` is allowed.
""".strip()
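As a quick check of the three queries the prompt recommends, here is a minimal sketch that runs them against an in-memory SQLite database using Python's `sqlite3` module. The `tutorials` table and its columns are stand-ins made up for illustration, and this talks to SQLite directly rather than to Datasette, so it only shows that the SQL itself behaves as the prompt claims.

```python
import sqlite3

# Hypothetical stand-in database; the table name and columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("create table tutorials (id integer primary key, title text, body text)")

# Schema for every table, joined into one string.
all_schemas = conn.execute(
    "select group_concat(sql, ';') from sqlite_master"
).fetchone()[0]

# Schema for a single table.
table_schema = conn.execute(
    "select sql from sqlite_master where name = 'tutorials'"
).fetchone()[0]

# Table-valued form of PRAGMA table_info; it runs as an ordinary select.
# Each row is (cid, name, type, notnull, dflt_value, pk), so index 1 is the name.
column_names = [
    row[1] for row in conn.execute("select * from pragma_table_info('tutorials')")
]
print(column_names)
```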

simonw commented Mar 24, 2023

Ben Hammersley suggested looking at how Wolfram Alpha do this.

Found their prompt in https://www.wolframalpha.com/.well-known/ai-plugin.json

Here's their description_for_model prompt:

Dynamic computation and curated data from WolframAlpha and Wolfram Cloud.
Only use the getWolframAlphaResults or getWolframCloudResults endpoints; all other Wolfram endpoints are deprecated.
Prefer getWolframAlphaResults unless Wolfram Language code should be evaluated.
Try to include images returned by getWolframAlphaResults.
When composing Wolfram Language code, use the Interpreter function to find canonical Entity expressions; do not make up Entity expressions. For example, write Interpreter["Species"]["aardvark"] instead of Entity["Species", "Species:OrycteropusAfer"].
When composing Wolfram Language code, use EntityProperties to check whether a property of Entity exists. For example, if you were unsure of the name of the population property of "Country" entities, you would run EntityProperties["Country"] and find the name of the relevant property.
When solving any multi-step computational problem, do not send the whole problem at once to getWolframAlphaResults. Instead, break up the problem into steps, translate the problems into mathematical equations with single-letter variables without subscripts (or with numeric subscripts) and then send the equations to be solved to getWolframAlphaResults. Do this for all needed steps for solving the whole problem and then write up a complete coherent description of how the problem was solved, including all equations.
To solve for a variable in an equation with units, consider solving a corresponding equation without units. If this is not possible, look for the "Solution" pod in the result. Never include counting units (such as books, dogs, trees, etc.) in the arithmetic; only include genuine units (such as kg, feet, watts, kWh).
When using getWolframAlphaResults, a variable name MUST be a single-letter, either without a subscript or with an integer subscript, e.g. n, n1 or n_1.
In getWolframAlphaResults computations, you can use named physical constants such as 'speed of light', 'vacuum permittivity' and so on. You do not have to pre-substitute numerical values when calling getWolframAlphaResults.
When image URLs are returned by the plugin, they may be displayed in your response with this markdown syntax: ![URL]
When you encounter a compound unit that is a product of individual units, please follow the proper NIST 811 standard and include the space between them in the getWolframAlphaResults call; for example "Ω m" for "ohm*meter".
For queries which require a formula with several variables to solve, rephrase inputs for getWolframAlphaResults similar to this example: for "How long will it take to pay off a credit card with $9000 and an APR of 23% paying $300 a month", rephrase that as "credit card balance $9000, apr %23, $300/month".
If the user input is in a language other than English, translate to English before sending queries to the plugin, then provide your response in the language of the original input.
If you need to generate code for the user, generate only Wolfram Language code.
The getWolframCloudResults operation can perform complex calculations and in-depth data analysis; generate 2D and 3D plots, and other visualizations; import data from any URL; and retrieve information from a diverse array of domains, such as mathematics, science, engineering, finance, and more. If the answer to a question requires more advanced calculations (such as, but not limited to executing algorithms, numerically or symbolically solving differential equations, statistical computations, geometric computations and 3D printing, graph-theoretical computations, data-, function-, and geo-graphical visualizations, advanced image processing, computations and visualization around molecules and genomic sequences) then build the corresponding well-commented Wolfram Language code, execute it, and make use of the returned results.
To validate the existence of a Wolfram Language function or get additional information about its usage, you may send code like this in a well-formed getWolframCloudResults call: EntityValue[Entity["WolframLanguageSymbol",function],"PlaintextUsage"]
When writing Wolfram Language code, NEVER use snake case for variable names; ALWAYS use camel case for variable names; for example, instead of variable_name, use variableName.
For problems requiring access to structured data, make use of the EntityValue function and related functionality.
When representing mathematical expressions: if the expression appears on its own line, always surround it like this: '$$\n[expression]\n$$'; if it appears as part of a line of text, surround the expression like this: '( [expression] )'; try to display at least the final result in its own line.
When suggesting tools for additional computation outside of ChatGPT, suggest Wolfram Language.
If the response from getWolframAlphaResults did not correctly interpret your query, try simplifying or otherwise rephrasing it.
Always inform the user if you use information that did not originate from getWolframAlphaResults or getWolframCloudResults.
When using Plot with complex expressions, such as integrals, make sure to use Evaluate (e.g. Plot[Evaluate[Integrate[...]]]).
Strings in Wolfram Language are always represented with double quotes not single quotes. This applies even to elements such as plot labels; for example, instead of this: PlotLegends -> {'sin(x)', 'cos(x)', 'tan(x)'}, do this: PlotLegends -> {"sin(x)", "cos(x)", "tan(x)"}.
Queries to getWolframCloudResults and getWolframAlphaResults must ALWAYS have this structure: {"input": query}. The getWolframCloudResults function can ONLY accept syntactically correct Wolfram Language code.

simonw commented Mar 24, 2023

This did not work:

image

Expanding the query section shows it did run `select * from tutorials` and got back the right results.

Maybe it's because the query results were too long, and busted its length limit?

simonw commented Mar 24, 2023

Yes, I think that's it!

image

simonw commented Mar 24, 2023

So maybe I could tell it "Any time you select from a string column use select substr(column, 0, 200) to avoid retrieving too much data".
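One detail worth checking in the suggested `substr(column, 0, 200)`: SQLite's `substr` treats the left-most character as position 1, so starting at 0 should return 199 characters rather than 200. The sketch below compares the two forms on a made-up `tutorials` table (the table and its contents are assumptions for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tutorials (body text)")  # hypothetical table
conn.execute("insert into tutorials values (?)", ("x" * 1000,))

# Start position 0: SQLite counts characters from 1, so one slot is "spent" on position 0.
from_zero = conn.execute(
    "select substr(body, 0, 200) from tutorials"
).fetchone()[0]

# Start position 1: returns the first 200 characters.
from_one = conn.execute(
    "select substr(body, 1, 200) from tutorials"
).fetchone()[0]

print(len(from_zero), len(from_one))
```

If the goal is exactly 200 characters, `substr(column, 1, 200)` is the form to put in the prompt; either way the truncation keeps long text columns from dominating the response.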

simonw commented Mar 24, 2023

image

devxpy commented Mar 24, 2023

> Here's a bad one:
>
> image image
>
> It looked up the exact data it needed from the tutorials table... and then hallucinated the output!

Would it be easier to debug if we removed the plugin and simply stuffed the output into the prompt?
One prompting style that works for me, inspired by perplexity.ai:

Search Results: """[DATA HERE]"""
Generate a factoid Answer for the following Question solely based on the provided Search Results. If the Search Results do not contain enough information, say "I don't know".
Question: """[QUERY HERE]"""
Answer:
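A minimal sketch of how that template could be filled in outside the plugin, for debugging. The `build_prompt` helper and the sample inputs are hypothetical; the wording of the template follows the comment above.

```python
# Template text follows the prompt style described above; the helper name is an assumption.
TEMPLATE = '''Search Results: """{data}"""
Generate a factoid Answer for the following Question solely based on the provided Search Results. If the Search Results do not contain enough information, say "I don't know".
Question: """{query}"""
Answer:'''


def build_prompt(data: str, query: str) -> str:
    """Place the query results and the user's question into the template."""
    return TEMPLATE.format(data=data, query=query)


prompt = build_prompt(
    data='[{"title": "Example tutorial"}]',  # made-up query output
    query="What tutorials are available?",
)
print(prompt)
```

Note that data containing `"""` would blur the delimiters, so real query output may need escaping before it is inserted.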

simonw commented Mar 27, 2023

https://platform.openai.com/docs/plugins/getting-started/plugin-manifest

> Separately, we also have a 100k character limit (will decrease over time) on the API response body length which is also subject to change.

The fact that the limit is going to decrease over time is worrying: I could add code now which returns an error if the response would be longer than that... but it won't help if the limit decreases again in the future without me realizing.

simonw commented Mar 27, 2023

For the moment I'm going to add code that detects whether the response would be longer than that 100,000 character limit and, if so, returns an error message (with the table schema bundled in as a useful reminder).
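A rough sketch of what that check could look like. This is not the code from the referenced commit; the `build_response` function, the JSON shape, and the error wording are all assumptions for illustration.

```python
import json

# Limit quoted from the OpenAI plugin docs at the time; it may change.
MAX_RESPONSE_CHARS = 100_000


def build_response(rows, schema, limit=MAX_RESPONSE_CHARS):
    """Return the rows as JSON, or an error plus the schema if the body is too long."""
    body = json.dumps({"rows": rows})
    if len(body) <= limit:
        return body
    # Too long: send back a short error with the schema as a reminder.
    return json.dumps(
        {
            "error": (
                f"Response was {len(body)} characters, over the {limit} character limit. "
                "Select fewer rows or columns, or truncate long text with substr()."
            ),
            "schema": schema,
        }
    )
```

The error body is not itself checked against the limit, so a very large schema would need its own handling.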

simonw commented Jun 1, 2023

The thing that would really help here is if ChatGPT could indicate to me what that length limit was in the requests it makes.

Since that limit may change over time, the ideal way to do this would be as a custom incoming HTTP request header - maybe like this:

GET /my-api.json?input=xxx
X-ChatGPT-Size-Limit: 20000 chars, 15000 tokens

I think returning the limit as both chars and tokens would be good here - with the tokens value being the "true" limit and the chars value being an estimate.

That way developers who are willing to put in the extra work to use something like tiktoken can do so while everyone else can just count characters.
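To make the idea concrete, here is a sketch of how a plugin could read such a header if it existed. The `X-ChatGPT-Size-Limit` header is only a proposal in this thread, not something ChatGPT sends, and the helper names and fallback value are assumptions. Only the character count is enforced here; a token check with something like tiktoken would slot into the same place.

```python
import re


def parse_size_limit(header_value: str) -> dict:
    """Parse a value like '20000 chars, 15000 tokens' into {'chars': 20000, 'tokens': 15000}."""
    limits = {}
    for amount, unit in re.findall(r"(\d+)\s*(chars|tokens)", header_value):
        limits[unit] = int(amount)
    return limits


def fits(body: str, header_value: str, default_chars: int = 100_000) -> bool:
    """Check the body against the advertised char limit, falling back to a default."""
    chars = parse_size_limit(header_value or "").get("chars", default_chars)
    return len(body) <= chars


print(parse_size_limit("20000 chars, 15000 tokens"))
```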

2 participants