Added Aim callback handler (#3045)

Co-authored-by: Logan Markewich <[email protected]>
3 people authored May 14, 2023
1 parent ee7e35d commit 2d02ef9
Showing 6 changed files with 366 additions and 1 deletion.
150 changes: 150 additions & 0 deletions docs/examples/callbacks/AimCallback.ipynb
@@ -0,0 +1,150 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "fedcd46b",
"metadata": {},
"source": [
"# AimCallback Demo\n",
"\n",
"Aim is an easy-to-use, supercharged open-source AI metadata tracker: it logs all your AI metadata (experiments, prompts, etc.), provides a UI to compare and observe them, and an SDK to query them programmatically. For more details, please see the [GitHub page](https://github.com/aimhubio/aim).\n",
"\n",
"In this demo, we show the capabilities of Aim for logging events while running queries within LlamaIndex. We use the AimCallback to store the outputs and show how to explore them using the Aim Text Explorer.\n",
"\n",
"\n",
"**NOTE**: This is a beta feature. The usage within different classes and the API interface for the CallbackManager and AimCallback may change!"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3e0c9e60",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e94187d",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.callbacks import CallbackManager, AimCallback\n",
"from llama_index import GPTListIndex, ServiceContext, SimpleDirectoryReader"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "17d1763e",
"metadata": {},
"source": [
"Let's read the documents from 'examples/data/paul_graham' using `SimpleDirectoryReader`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "02e1e606",
"metadata": {},
"outputs": [],
"source": [
"docs = SimpleDirectoryReader(\"../../data/paul_graham\").load_data()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ee34d08b",
"metadata": {},
"source": [
"Now let's initialize an `AimCallback` instance and add it to the list of callback handlers."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c667d70b",
"metadata": {},
"outputs": [],
"source": [
"aim_callback = AimCallback(repo=\"./\")\n",
"callback_manager = CallbackManager([aim_callback])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "25851e27",
"metadata": {},
"source": [
"In this snippet, we initialize a service context by providing the callback manager.\n",
"Next, we create an instance of the `GPTListIndex` class by passing in the documents and the service context. We then create a query engine, which we will use to run queries on the index and retrieve relevant results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "32fac47b",
"metadata": {},
"outputs": [],
"source": [
"service_context = ServiceContext.from_defaults(callback_manager=callback_manager)\n",
"index = GPTListIndex.from_documents(docs, service_context=service_context)\n",
"query_engine = index.as_query_engine()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "44f96768",
"metadata": {},
"source": [
"Finally, let's ask the LLM a question based on our provided document."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11d4840b",
"metadata": {},
"outputs": [],
"source": [
"response = query_engine.query(\"What did the author do growing up?\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4e69b186",
"metadata": {},
"source": [
"The callback manager will log `CBEventType.LLM` events as `Aim.Text`, and we can explore the prompts given to the LLM and the corresponding outputs in the Text Explorer. First run `aim up`, then navigate to the URL it provides."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
File renamed without changes.
36 changes: 36 additions & 0 deletions docs/how_to/callbacks.rst
@@ -0,0 +1,36 @@
📞 Callbacks
==============================

LlamaIndex provides callbacks to help debug, track, and trace the inner workings of the library.
Using the callback manager, as many callbacks as needed can be added.

In addition to logging data related to events, you can also track the duration and number of occurrences
of each event.

While each callback may not leverage each event type, the following events are available to be tracked:

- CHUNKING -> Logs for the before and after of text splitting.
- NODE_PARSING -> Logs for the documents and the nodes that they are parsed into.
- EMBEDDING -> Logs for the number of texts embedded.
- LLM -> Logs for the template and response of LLM calls.
- QUERY -> Keeps track of the start and end of each query.
- RETRIEVE -> Logs for the nodes retrieved for a query.
- SYNTHESIZE -> Logs for the result for synthesize calls.
- TREE -> Logs for the summary and level of summaries generated.

You can implement your own callback to track these events, or use an existing callback.
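Concretely, a custom callback implements `on_event_start`/`on_event_end` hooks. The sketch below mimics that interface with a standalone class (no LlamaIndex import; the class name, payload handling, and string event types are illustrative only, not the library's API) to show how occurrence counts and durations can be tracked:

```python
import time
from collections import defaultdict
from typing import Any, Dict, Optional


class EventCountingHandler:
    """Illustrative handler: counts occurrences and total duration per event type.

    Mirrors the on_event_start/on_event_end shape of a callback handler, but is
    a self-contained stub so it runs without LlamaIndex installed.
    """

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)
        self.durations: Dict[str, float] = defaultdict(float)
        self._starts: Dict[str, float] = {}

    def on_event_start(
        self, event_type: str, payload: Optional[Dict[str, Any]] = None, event_id: str = ""
    ) -> str:
        # Remember when this event started so its duration can be computed later.
        self._starts[event_id] = time.monotonic()
        return event_id

    def on_event_end(
        self, event_type: str, payload: Optional[Dict[str, Any]] = None, event_id: str = ""
    ) -> None:
        # Count the occurrence and accumulate elapsed time for this event type.
        self.counts[event_type] += 1
        start = self._starts.pop(event_id, None)
        if start is not None:
            self.durations[event_type] += time.monotonic() - start


handler = EventCountingHandler()
handler.on_event_start("llm", event_id="e1")
handler.on_event_end("llm", event_id="e1")
print(handler.counts["llm"])  # 1
```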

Complete examples can be found in the notebooks below:

- `LlamaDebugHandler <../examples/callbacks/LlamaDebugHandler.ipynb>`_
- `AimCallback <../examples/callbacks/AimCallback.ipynb>`_

And the API reference can be found `here <../../reference/callbacks.rst>`_.

.. toctree::
   :maxdepth: 1
   :caption: Callbacks

   ../examples/callbacks/LlamaDebugHandler.ipynb
   ../examples/callbacks/AimCallback.ipynb
   ../../reference/callbacks.rst
1 change: 1 addition & 0 deletions docs/index.rst
@@ -90,6 +90,7 @@ That's where the **LlamaIndex** comes in. LlamaIndex is a simple, flexible inter
   how_to/output_parsing.md
   how_to/evaluation/evaluation.md
   how_to/integrations.rst
   how_to/callbacks.rst
   how_to/storage.rst


9 changes: 8 additions & 1 deletion llama_index/callbacks/__init__.py
@@ -1,5 +1,12 @@
from .base import CallbackManager
from .llama_debug import LlamaDebugHandler
from .aim import AimCallback
from .schema import CBEvent, CBEventType

__all__ = ["CallbackManager", "CBEvent", "CBEventType", "LlamaDebugHandler"]
__all__ = [
    "CallbackManager",
    "CBEvent",
    "CBEventType",
    "LlamaDebugHandler",
    "AimCallback",
]
171 changes: 171 additions & 0 deletions llama_index/callbacks/aim.py
@@ -0,0 +1,171 @@
import logging
from typing import Any, Dict, List, Optional

try:
    from aim import Run, Text
except ModuleNotFoundError:
    Run, Text = None, None

from llama_index.callbacks.base import BaseCallbackHandler
from llama_index.callbacks.schema import CBEventType

logger = logging.getLogger(__name__)
logger.setLevel(logging.WARNING)


class AimCallback(BaseCallbackHandler):
    """
    AimCallback callback class.

    Args:
        repo (:obj:`str`, optional):
            Aim repository path or Repo object to which Run object is bound.
            If skipped, default Repo is used.
        experiment_name (:obj:`str`, optional):
            Sets Run's `experiment` property. 'default' if not specified.
            Can be used later to query runs/sequences.
        system_tracking_interval (:obj:`int`, optional):
            Sets the tracking interval in seconds for system usage
            metrics (CPU, Memory, etc.). Set to `None` to disable
            system metrics tracking.
        log_system_params (:obj:`bool`, optional):
            Enable/Disable logging of system params such as installed packages,
            git info, environment variables, etc.
        capture_terminal_logs (:obj:`bool`, optional):
            Enable/Disable terminal stdout logging.
        event_starts_to_ignore (Optional[List[CBEventType]]):
            List of event types to ignore when tracking event starts.
        event_ends_to_ignore (Optional[List[CBEventType]]):
            List of event types to ignore when tracking event ends.
    """

    def __init__(
        self,
        repo: Optional[str] = None,
        experiment_name: Optional[str] = None,
        system_tracking_interval: Optional[int] = 1,
        log_system_params: Optional[bool] = True,
        capture_terminal_logs: Optional[bool] = True,
        event_starts_to_ignore: Optional[List[CBEventType]] = None,
        event_ends_to_ignore: Optional[List[CBEventType]] = None,
        run_params: Optional[Dict[str, Any]] = None,
    ) -> None:
        if Run is None:
            raise ModuleNotFoundError(
                "Please install aim to use the AimCallback: 'pip install aim'"
            )

        event_starts_to_ignore = (
            event_starts_to_ignore if event_starts_to_ignore else []
        )
        event_ends_to_ignore = event_ends_to_ignore if event_ends_to_ignore else []
        super().__init__(
            event_starts_to_ignore=event_starts_to_ignore,
            event_ends_to_ignore=event_ends_to_ignore,
        )

        self.repo = repo
        self.experiment_name = experiment_name
        self.system_tracking_interval = system_tracking_interval
        self.log_system_params = log_system_params
        self.capture_terminal_logs = capture_terminal_logs
        self._run: Optional[Any] = None
        self._run_hash = None

        self._llm_response_step = 0

        self.setup(run_params)

    def on_event_start(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> str:
        """
        Args:
            event_type (CBEventType): event type to store.
            payload (Optional[Dict[str, Any]]): payload to store.
            event_id (str): event id to store.
        """
        return ""

    def on_event_end(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> None:
        """
        Args:
            event_type (CBEventType): event type to store.
            payload (Optional[Dict[str, Any]]): payload to store.
            event_id (str): event id to store.
        """
        if not self._run:
            raise ValueError("AimCallback failed to init properly.")

        if event_type is CBEventType.LLM and payload:
            self._run.track(
                Text(payload["formatted_prompt"]),
                name="prompt",
                step=self._llm_response_step,
                context={"event_id": event_id},
            )

            self._run.track(
                Text(payload["response"]),
                name="response",
                step=self._llm_response_step,
                context={"event_id": event_id},
            )

            self._llm_response_step += 1
        elif event_type is CBEventType.CHUNKING and payload:
            for chunk_id, chunk in enumerate(payload["chunks"]):
                self._run.track(
                    Text(chunk),
                    name="chunk",
                    step=self._llm_response_step,
                    context={"chunk_id": chunk_id, "event_id": event_id},
                )

    @property
    def experiment(self) -> Run:
        if not self._run:
            self.setup()
        return self._run

    def setup(self, args: Optional[Dict[str, Any]] = None) -> None:
        if not self._run:
            if self._run_hash:
                self._run = Run(
                    self._run_hash,
                    repo=self.repo,
                    system_tracking_interval=self.system_tracking_interval,
                    log_system_params=self.log_system_params,
                    capture_terminal_logs=self.capture_terminal_logs,
                )
            else:
                self._run = Run(
                    repo=self.repo,
                    experiment=self.experiment_name,
                    system_tracking_interval=self.system_tracking_interval,
                    log_system_params=self.log_system_params,
                    capture_terminal_logs=self.capture_terminal_logs,
                )
            self._run_hash = self._run.hash

        # Log config parameters
        if args:
            try:
                for key in args:
                    self._run.set(key, args[key], strict=False)
            except Exception as e:
                logger.warning(f"Aim could not log config parameters -> {e}")

    def __del__(self) -> None:
        if self._run and self._run.active:
            self._run.close()
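The `setup` method above lazily creates a run on first use and caches its hash so later calls can resume the same run rather than start a fresh one. A standalone sketch of that pattern (using a stand-in `FakeRun` class, not aim's actual `Run` API):

```python
import itertools
from typing import Optional

_counter = itertools.count(1)


class FakeRun:
    """Stand-in for a run object: a new run gets a fresh hash; passing a hash resumes it."""

    def __init__(self, run_hash: Optional[str] = None) -> None:
        self.hash = run_hash or f"run-{next(_counter)}"


class LazyRunHolder:
    """Create the run on first use, remember its hash, and resume it afterwards."""

    def __init__(self) -> None:
        self._run: Optional[FakeRun] = None
        self._run_hash: Optional[str] = None

    def setup(self) -> FakeRun:
        if not self._run:
            # Resume an existing run if we already have its hash; otherwise start a new one.
            self._run = FakeRun(self._run_hash) if self._run_hash else FakeRun()
            self._run_hash = self._run.hash
        return self._run


holder = LazyRunHolder()
first = holder.setup()
holder._run = None          # simulate the run object being dropped
resumed = holder.setup()
print(first.hash == resumed.hash)  # True
```

The cached hash is what lets a single `AimCallback` instance keep appending to one logical run across repeated `setup` calls.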
