Added Aim callback handler (#3045)

Co-authored-by: Logan Markewich <[email protected]>
3 people authored May 14, 2023
1 parent ee7e35d commit 2d02ef9
Showing 6 changed files with 366 additions and 1 deletion.
150 changes: 150 additions & 0 deletions docs/examples/callbacks/AimCallback.ipynb
@@ -0,0 +1,150 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "fedcd46b",
"metadata": {},
"source": [
"# AimCallback Demo\n",
"\n",
"Aim is an easy-to-use, supercharged open-source AI metadata tracker: it logs all your AI metadata (experiments, prompts, etc.), provides a UI to compare and observe them, and an SDK to query them programmatically. For more details, please see the [GitHub page](https://github.com/aimhubio/aim).\n",
"\n",
"In this demo, we show the capabilities of Aim for logging events while running queries within LlamaIndex. We use the AimCallback to store the outputs and show how to explore them using the Aim Text Explorer.\n",
"\n",
"\n",
"**NOTE**: This is a beta feature. The usage within different classes and the API interface for the CallbackManager and AimCallback may change!"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3e0c9e60",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e94187d",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.callbacks import CallbackManager, AimCallback\n",
"from llama_index import GPTListIndex, ServiceContext, SimpleDirectoryReader"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "17d1763e",
"metadata": {},
"source": [
"Let's read the documents from 'examples/data/paul_graham' using `SimpleDirectoryReader`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "02e1e606",
"metadata": {},
"outputs": [],
"source": [
"docs = SimpleDirectoryReader(\"../../data/paul_graham\").load_data()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ee34d08b",
"metadata": {},
"source": [
"Now let's initialize an `AimCallback` instance and add it to the list of callback handlers."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c667d70b",
"metadata": {},
"outputs": [],
"source": [
"aim_callback = AimCallback(repo=\"./\")\n",
"callback_manager = CallbackManager([aim_callback])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "25851e27",
"metadata": {},
"source": [
"In this snippet, we initialize a service context by providing the callback manager.\n",
"Next, we create an instance of the `GPTListIndex` class by passing in the documents and the service context. We then create a query engine, which we will use to run queries on the index and retrieve relevant results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "32fac47b",
"metadata": {},
"outputs": [],
"source": [
"service_context = ServiceContext.from_defaults(callback_manager=callback_manager)\n",
"index = GPTListIndex.from_documents(docs, service_context=service_context)\n",
"query_engine = index.as_query_engine()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "44f96768",
"metadata": {},
"source": [
"Finally, let's ask the LLM a question based on our provided document."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11d4840b",
"metadata": {},
"outputs": [],
"source": [
"response = query_engine.query(\"What did the author do growing up?\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4e69b186",
"metadata": {},
"source": [
"The callback manager will log `CBEventType.LLM` events as `Aim.Text`, and we can explore the prompts given to the LLM and the corresponding outputs in the Text Explorer. First run `aim up`, then navigate to the URL it provides."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
File renamed without changes.
36 changes: 36 additions & 0 deletions docs/how_to/callbacks.rst
@@ -0,0 +1,36 @@
📞 Callbacks
==============================

LlamaIndex provides callbacks to help debug, track, and trace the inner workings of the library.
Using the callback manager, as many callbacks as needed can be added.

In addition to logging data related to events, you can also track the duration and number of occurrences
of each event.

While each callback may not leverage each event type, the following events are available to be tracked:

- CHUNKING -> Logs for the before and after of text splitting.
- NODE_PARSING -> Logs for the documents and the nodes that they are parsed into.
- EMBEDDING -> Logs for the number of texts embedded.
- LLM -> Logs for the template and response of LLM calls.
- QUERY -> Keeps track of the start and end of each query.
- RETRIEVE -> Logs for the nodes retrieved for a query.
- SYNTHESIZE -> Logs for the result for synthesize calls.
- TREE -> Logs for the summary and level of summaries generated.

You can implement your own callback to track these events, or use an existing callback.
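Concretely, a custom callback implements `on_event_start`/`on_event_end` hooks. The sketch below mimics that interface with a standalone class (no LlamaIndex import; the class name, payload handling, and string event types are illustrative only, not the library's API) to show how occurrence counts and durations can be tracked:

```python
import time
from collections import defaultdict
from typing import Any, Dict, Optional


class EventCountingHandler:
    """Illustrative handler: counts occurrences and total duration per event type.

    Mirrors the on_event_start/on_event_end shape of a callback handler, but is
    a self-contained stub so it runs without LlamaIndex installed.
    """

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)
        self.durations: Dict[str, float] = defaultdict(float)
        self._starts: Dict[str, float] = {}

    def on_event_start(
        self, event_type: str, payload: Optional[Dict[str, Any]] = None, event_id: str = ""
    ) -> str:
        # Remember when this event started so its duration can be computed later.
        self._starts[event_id] = time.monotonic()
        return event_id

    def on_event_end(
        self, event_type: str, payload: Optional[Dict[str, Any]] = None, event_id: str = ""
    ) -> None:
        # Count the occurrence and accumulate elapsed time for this event type.
        self.counts[event_type] += 1
        start = self._starts.pop(event_id, None)
        if start is not None:
            self.durations[event_type] += time.monotonic() - start


handler = EventCountingHandler()
handler.on_event_start("llm", event_id="e1")
handler.on_event_end("llm", event_id="e1")
print(handler.counts["llm"])  # 1
```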

Complete examples can be found in the notebooks below:

- `LlamaDebugHandler <../examples/callbacks/LlamaDebugHandler.ipynb>`_
- `AimCallback <../examples/callbacks/AimCallback.ipynb>`_

And the API reference can be found `here <../../reference/callbacks.rst>`_.

.. toctree::
   :maxdepth: 1
   :caption: Callbacks

   ../examples/callbacks/LlamaDebugHandler.ipynb
   ../examples/callbacks/AimCallback.ipynb
   ../../reference/callbacks.rst
1 change: 1 addition & 0 deletions docs/index.rst
@@ -90,6 +90,7 @@ That's where the **LlamaIndex** comes in. LlamaIndex is a simple, flexible inter
   how_to/output_parsing.md
   how_to/evaluation/evaluation.md
   how_to/integrations.rst
   how_to/callbacks.rst
   how_to/storage.rst


9 changes: 8 additions & 1 deletion llama_index/callbacks/__init__.py
@@ -1,5 +1,12 @@
from .base import CallbackManager
from .llama_debug import LlamaDebugHandler
from .aim import AimCallback
from .schema import CBEvent, CBEventType

__all__ = ["CallbackManager", "CBEvent", "CBEventType", "LlamaDebugHandler"]
__all__ = [
    "CallbackManager",
    "CBEvent",
    "CBEventType",
    "LlamaDebugHandler",
    "AimCallback",
]
171 changes: 171 additions & 0 deletions llama_index/callbacks/aim.py
@@ -0,0 +1,171 @@
import logging
from typing import Any, Dict, List, Optional

try:
    from aim import Run, Text
except ModuleNotFoundError:
    Run, Text = None, None

from llama_index.callbacks.base import BaseCallbackHandler
from llama_index.callbacks.schema import CBEventType

logger = logging.getLogger(__name__)
logger.setLevel(logging.WARNING)


class AimCallback(BaseCallbackHandler):
    """
    AimCallback callback class.

    Args:
        repo (:obj:`str`, optional):
            Aim repository path or Repo object to which Run object is bound.
            If skipped, default Repo is used.
        experiment_name (:obj:`str`, optional):
            Sets Run's `experiment` property. 'default' if not specified.
            Can be used later to query runs/sequences.
        system_tracking_interval (:obj:`int`, optional):
            Sets the tracking interval in seconds for system usage
            metrics (CPU, Memory, etc.). Set to `None` to disable
            system metrics tracking.
        log_system_params (:obj:`bool`, optional):
            Enable/Disable logging of system params such as installed packages,
            git info, environment variables, etc.
        capture_terminal_logs (:obj:`bool`, optional):
            Enable/Disable terminal stdout logging.
        event_starts_to_ignore (Optional[List[CBEventType]]):
            List of event types to ignore when tracking event starts.
        event_ends_to_ignore (Optional[List[CBEventType]]):
            List of event types to ignore when tracking event ends.
    """

    def __init__(
        self,
        repo: Optional[str] = None,
        experiment_name: Optional[str] = None,
        system_tracking_interval: Optional[int] = 1,
        log_system_params: Optional[bool] = True,
        capture_terminal_logs: Optional[bool] = True,
        event_starts_to_ignore: Optional[List[CBEventType]] = None,
        event_ends_to_ignore: Optional[List[CBEventType]] = None,
        run_params: Optional[Dict[str, Any]] = None,
    ) -> None:
        if Run is None:
            raise ModuleNotFoundError(
                "Please install aim to use the AimCallback: 'pip install aim'"
            )

        event_starts_to_ignore = (
            event_starts_to_ignore if event_starts_to_ignore else []
        )
        event_ends_to_ignore = event_ends_to_ignore if event_ends_to_ignore else []
        super().__init__(
            event_starts_to_ignore=event_starts_to_ignore,
            event_ends_to_ignore=event_ends_to_ignore,
        )

        self.repo = repo
        self.experiment_name = experiment_name
        self.system_tracking_interval = system_tracking_interval
        self.log_system_params = log_system_params
        self.capture_terminal_logs = capture_terminal_logs
        self._run: Optional[Any] = None
        self._run_hash = None

        self._llm_response_step = 0

        self.setup(run_params)

    def on_event_start(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> str:
        """
        Args:
            event_type (CBEventType): event type to store.
            payload (Optional[Dict[str, Any]]): payload to store.
            event_id (str): event id to store.
        """
        return ""

    def on_event_end(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> None:
        """
        Args:
            event_type (CBEventType): event type to store.
            payload (Optional[Dict[str, Any]]): payload to store.
            event_id (str): event id to store.
        """
        if not self._run:
            raise ValueError("AimCallback failed to init properly.")

        if event_type is CBEventType.LLM and payload:
            self._run.track(
                Text(payload["formatted_prompt"]),
                name="prompt",
                step=self._llm_response_step,
                context={"event_id": event_id},
            )

            self._run.track(
                Text(payload["response"]),
                name="response",
                step=self._llm_response_step,
                context={"event_id": event_id},
            )

            self._llm_response_step += 1
        elif event_type is CBEventType.CHUNKING and payload:
            for chunk_id, chunk in enumerate(payload["chunks"]):
                self._run.track(
                    Text(chunk),
                    name="chunk",
                    step=self._llm_response_step,
                    context={"chunk_id": chunk_id, "event_id": event_id},
                )

    @property
    def experiment(self) -> Run:
        if not self._run:
            self.setup()
        return self._run

    def setup(self, args: Optional[Dict[str, Any]] = None) -> None:
        if not self._run:
            if self._run_hash:
                self._run = Run(
                    self._run_hash,
                    repo=self.repo,
                    system_tracking_interval=self.system_tracking_interval,
                    log_system_params=self.log_system_params,
                    capture_terminal_logs=self.capture_terminal_logs,
                )
            else:
                self._run = Run(
                    repo=self.repo,
                    experiment=self.experiment_name,
                    system_tracking_interval=self.system_tracking_interval,
                    log_system_params=self.log_system_params,
                    capture_terminal_logs=self.capture_terminal_logs,
                )
            self._run_hash = self._run.hash

        # Log config parameters
        if args:
            try:
                for key in args:
                    self._run.set(key, args[key], strict=False)
            except Exception as e:
                logger.warning(f"Aim could not log config parameters -> {e}")

    def __del__(self) -> None:
        if self._run and self._run.active:
            self._run.close()
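The `setup` method above lazily creates a run on first use and caches its hash so later calls can resume the same run rather than start a fresh one. A standalone sketch of that pattern (using a stand-in `FakeRun` class, not aim's actual `Run` API):

```python
import itertools
from typing import Optional

_counter = itertools.count(1)


class FakeRun:
    """Stand-in for a run object: a new run gets a fresh hash; passing a hash resumes it."""

    def __init__(self, run_hash: Optional[str] = None) -> None:
        self.hash = run_hash or f"run-{next(_counter)}"


class LazyRunHolder:
    """Create the run on first use, remember its hash, and resume it afterwards."""

    def __init__(self) -> None:
        self._run: Optional[FakeRun] = None
        self._run_hash: Optional[str] = None

    def setup(self) -> FakeRun:
        if not self._run:
            # Resume an existing run if we already have its hash; otherwise start a new one.
            self._run = FakeRun(self._run_hash) if self._run_hash else FakeRun()
            self._run_hash = self._run.hash
        return self._run


holder = LazyRunHolder()
first = holder.setup()
holder._run = None          # simulate the run object being dropped
resumed = holder.setup()
print(first.hash == resumed.hash)  # True
```

The cached hash is what lets a single `AimCallback` instance keep appending to one logical run across repeated `setup` calls.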
