Poor content extraction from certain domains #4
Comments
Sorry, may I get more context on:
That warning only appears if you request via the REST API without specifying an output format. Since we are using ollama-python instead of cURL here, it does not affect the returned message. If the final result is poor in terms of "effectiveness", I suspect your context length is too small. With planning on top of 5 iterations at 4 searches per query, you can end up with 80 searches; if 50% of those are useful, the final writing instruction can easily go upward of 100K tokens. Yes, we can do some pre-summarization to reduce the token count (and we already do it on a per-result basis), but if we do it at the per-iteration level, small models will hallucinate hard without the full context. That is also why I cannot test this personally on my setup: I don't have a GPU that can fit all of this, so on my RX7800XT I run mistral-small with a max context of 32K and only use 2 iterations with 3 searches and no planning. The results I get seem decent, and the citations are proper without much hallucination. |
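For reference, here is a minimal sketch of what the per-result pre-summarization mentioned above looks like in principle. This is not the project's actual code; the model name, prompt, and helper name are placeholders.

```python
import asyncio
from ollama import AsyncClient

# Hypothetical helper (not the project's code): condense one fetched page
# before it is appended to the shared research context.
async def summarize_result(client: AsyncClient, query: str, page_text: str) -> str:
    response = await client.chat(
        model="mistral-small:22b-instruct-2409-q5_K_M",  # placeholder model
        messages=[
            {"role": "system",
             "content": "Summarize the page for the research query; keep only facts relevant to the query."},
            {"role": "user", "content": f"Query: {query}\n\nPage:\n{page_text}"},
        ],
    )
    return response["message"]["content"]

async def main():
    client = AsyncClient()
    print(await summarize_result(client, "example query", "raw page text here"))

asyncio.run(main())
```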
Well, maybe this is also a good reminder for me to add RAG support :) I will do that after I add tool-calling support. |
DEFAULT_MODEL = "mistral-small:22b-instruct-2409-q5_K_M"
async def call_ollama_async(session, messages, model=DEFAULT_MODEL, max_tokens=20000) |
Looks like it is during parsing? I added some verbose output and tested it myself, and the parsing results seem reasonable to me. Of course, results may degrade on some ad-packed websites, but it shouldn't lose much context. Also, what you showed me is not the context length: num_predict only caps the output length, which is why I set it to 1.25× the input, because the output should not be much longer than the input HTML source code. For Ollama, you either set num_ctx when making the request or set a default in the Modelfile when importing the model; otherwise it defaults to 2K. If you have never changed num_ctx, that might be the problem. I will also add an option to change it on a per-model basis over the weekend. |
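To make the distinction concrete, here is a hedged sketch of setting both knobs per request with ollama-python; the function, model name, and values are illustrative, not how the project wires it.

```python
from ollama import AsyncClient

async def ask(client: AsyncClient, messages, num_ctx=32768, num_predict=2048):
    # num_ctx is the context window (input + output); Ollama defaults it to 2048
    # if it is never set in the request or the Modelfile.
    # num_predict only caps how many tokens the model is allowed to generate.
    return await client.chat(
        model="mistral-small:22b-instruct-2409-q5_K_M",  # placeholder model
        messages=messages,
        options={"num_ctx": num_ctx, "num_predict": num_predict},
    )
```

The same default can instead be baked in at import time with a `PARAMETER num_ctx 32768` line in the Modelfile.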
I set num_ctx to 25000. |
I see; then I guess RAG over the gathered context will be necessary. I will think a bit about how to implement that, but it would result in a product more like Perplexica (also open source on GitHub), since that takes a more traditional approach. I guess local models are still not great at agentic, long-context work. A rough sketch of the idea follows this comment.
Btw, what fields are you mainly searching in? I only tested some research topics in my own field, and it did find the relevant papers I expected, with proper summarization into a final report. That's why I think I may not be seeing the full picture of the problem you are describing.
|
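A rough sketch of what RAG over the gathered context could look like. The embedding model, chunking, and top-k here are assumptions for illustration, not the project's implementation.

```python
import ollama
import numpy as np

def embed(text: str) -> np.ndarray:
    # "nomic-embed-text" is only an example of an embedding model served by Ollama.
    return np.array(ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"])

def top_k_chunks(question: str, chunks: list[str], k: int = 8) -> list[str]:
    q = embed(question)
    scored = []
    for chunk in chunks:
        c = embed(chunk)
        scored.append((float(q @ c / (np.linalg.norm(q) * np.linalg.norm(c))), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Only the retrieved chunks (instead of every search result) would go into
    # the final writing prompt, keeping it within the model's num_ctx.
    return [chunk for _, chunk in scored[:k]]
```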
I am working on a rewrite that uses DSPy for structured output and RAG to reduce token usage and improve data-flow efficiency. I think that will improve the current situation and make better use of Ollama's capabilities. I hope to have it delivered before next weekend. |
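For illustration only, a minimal DSPy sketch of the kind of structured-output module such a rewrite might use; the signature fields, model name, and endpoint are placeholders, not the final design.

```python
import dspy

# Point DSPy at a local Ollama model (placeholder model name and default port).
lm = dspy.LM("ollama_chat/mistral-small:22b-instruct-2409-q5_K_M",
             api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm)

class ExtractFindings(dspy.Signature):
    """Pull the findings relevant to a research question out of one source."""
    question: str = dspy.InputField()
    source_text: str = dspy.InputField()
    findings: list[str] = dspy.OutputField(desc="short, citation-worthy facts")

extract = dspy.Predict(ExtractFindings)
result = extract(question="example question", source_text="page text here")
print(result.findings)
```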
@wwjCMP I think I found the root cause. Currently the code tries to sanitize the HTML before handing it to reader-lm to parse (otherwise it takes even longer to extract useful info), and the current implementation loses context if the website embeds its content in an iframe or in the same div as advertisements, or uses a tricky style sheet. I think the ultimate solution is an agent that picks a parsing method based on the website: screenshotting the page and running OCR (if it hides content from the HTML code), jina html2md, newspaper3k, or bs4. That may take some extra time to implement. There is a quick fix if you want to try it: set BROWSE_LITE=true and strip all the HTML tags with a regex (a rough sketch is below), but I will try to implement DSPy and the dynamic parsing method described above before falling back to that last resort. |
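For anyone who wants to try the quick fix before that lands, here is a rough sketch of the regex-based tag stripping. BROWSE_LITE=true is the existing flag mentioned above; the helper itself is only illustrative and will keep ad and boilerplate text too.

```python
import re
from html import unescape

def strip_html(raw_html: str) -> str:
    # Drop script/style blocks entirely, then remove every remaining tag.
    text = re.sub(r"(?is)<(script|style).*?>.*?</\1>", " ", raw_html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Decode HTML entities and collapse whitespace.
    return re.sub(r"\s+", " ", unescape(text)).strip()
```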
Ollama log
level=WARN source=server.go:762 msg="format is neither a schema or "json"" format=""""
Will this affect the final result? In my actual usage, the quality of the responses is very poor.