additions to eval section
leon-rgb committed Jul 14, 2024
1 parent 15e5f8d commit b06e952
Showing 1 changed file (content/evaluation.tex) with 19 additions and 1 deletion.
\section{Model Performance}
\label{sec:modelperform}


\subsection{Hardware and Language Models Used}
The hardware setup was the same as described in \cref{subsec:modelcust}, since it is fast enough for the model sizes we wanted to test, as explained in that section.
For measuring model performance, no setup of smart home devices was needed: when the \gls{json} output of the model is correct, the corresponding action is triggered in the underlying system.
The language models we selected for evaluation are likewise described in \cref{subsec:modelcust}.
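Since correctness is judged on the model's \gls{json} output alone, checking an evaluation entry reduces to comparing parsed \gls{json} objects. A minimal sketch of such a check, assuming hypothetical field names that are illustrative rather than the actual schema:

```python
import json

def action_matches(model_output: str, expected: dict) -> bool:
    """Return True if the model's JSON output encodes the expected action."""
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return False  # malformed JSON counts as a failure
    # Comparing parsed objects ignores key order and whitespace differences.
    return parsed == expected

# Hypothetical example entry: a light-control action.
expected = {"device": "light.living_room", "action": "turn_on"}
print(action_matches('{"action": "turn_on", "device": "light.living_room"}', expected))  # prints True
```

Comparing parsed objects rather than raw strings keeps the check robust against formatting differences in the model's output.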

\subsection{Evaluation Dataset}
To assess the performance and capabilities of our smart home chatbot, we developed a comprehensive evaluation dataset with a total of 80 entries. This dataset is designed to simulate realistic user interactions and test the chatbot's ability to understand context, control devices, and provide informative responses.
\end{itemize}
\end{enumerate}

Based on the chat history, a request can be built with a Python script leveraging LangChain, which provides convenient access to Ollama since Ollama is supported by LangChain.
This way, we do not have to construct API calls ourselves; we only parse the message history into a LangChain function call that handles everything in the background.
We can create a \texttt{ChatOllama} object like the following:
\begin{verbatim}
# Initialize language model
llm = ChatOllama(
    base_url="http://127.0.0.1:5000",
    model="sh-llama3-instruct",
    keep_alive=-1
)
\end{verbatim}

This makes it easy to switch between all models we want to test. Setting the \texttt{keep\_alive} option to $-1$ keeps the model loaded in memory indefinitely.
After parsing the messages of one CSV entry, we can use the following code to create the prompt and invoke the language model:

\begin{verbatim}
prompt = ChatPromptTemplate.from_messages(messages)
result = invoke_language_model(llm, prompt)
\end{verbatim}
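Before the prompt can be built, the stored chat history of a CSV entry must be turned into a message list. A minimal sketch of such a parser, assuming a hypothetical serialization where turns are separated by \texttt{||} and each turn is stored as \texttt{role: content} (not the actual dataset format):

```python
def parse_message_history(raw: str) -> list[tuple[str, str]]:
    """Split a serialized chat history into (role, content) pairs.

    Assumes each turn is stored as 'role: content', with turns
    separated by '||'. Both assumptions are illustrative.
    """
    messages = []
    for turn in raw.split("||"):
        role, _, content = turn.partition(":")
        messages.append((role.strip(), content.strip()))
    return messages

history = "system: You control a smart home.||human: Turn on the kitchen light."
print(parse_message_history(history))
```

The resulting list of role/content pairs is exactly the shape that \texttt{ChatPromptTemplate.from\_messages} accepts.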
To create a diverse and representative dataset, we used 10 example device lists as the basis for our scenarios. These lists were carefully crafted to cover various smart home setups:

\begin{itemize}
