This project demonstrates how to create an LLM-powered Android automation agent using uiautomator2 and LangChain.
- models.py: Contains Pydantic models describing the possible UI actions (tap, input_text, press_key) and the JSON schema for the LLM’s output.
- manager.py: Provides UIAutomatorManager, which encapsulates device-specific operations like dumping the UI hierarchy, tapping, inputting text, and pressing keys.
- runnables.py: Defines LangChain Runnable classes that orchestrate actions:
  - HierarchyRunnable to dump and parse the UI hierarchy.
  - DecideActionRunnable to query an LLM for the next action.
  - PerformActionRunnable to execute that action on the device.
  - LoopRunnable to repeat the cycle until completion.
- agent.py: Defines an AndroidAgent class, which ties everything together. It instantiates the runnables and provides a single run(user_goal) method that loops until the LLM decides it’s done.
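
The cycle described above (dump the hierarchy, ask the LLM for the next action, perform it, repeat) can be sketched without any dependencies. This is an illustration only, not the repository's code: it uses stdlib dataclasses in place of the project's Pydantic models and plain callables in place of the LangChain Runnable classes. The Action fields, the "done" action kind, parse_action, and run_loop are all hypothetical names chosen for this sketch.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """One step chosen by the LLM (hypothetical shape, not the project's schema)."""
    kind: str                     # "tap", "input_text", "press_key", or "done"
    x: Optional[int] = None       # tap coordinates
    y: Optional[int] = None
    text: Optional[str] = None    # text to type for "input_text"
    key: Optional[str] = None     # key name for "press_key", e.g. "home"
    answer: Optional[str] = None  # final answer when kind == "done"


# Assumption: "done" is how the model signals completion in this sketch.
ALLOWED_KINDS = {"tap", "input_text", "press_key", "done"}


def parse_action(raw: str) -> Action:
    """Parse the LLM's JSON reply and reject unknown action kinds."""
    data = json.loads(raw)
    if data.get("kind") not in ALLOWED_KINDS:
        raise ValueError(f"unknown action kind: {data.get('kind')!r}")
    return Action(**data)


def run_loop(
    goal: str,
    dump_hierarchy: Callable[[], str],
    decide: Callable[[str, str], str],
    perform: Callable[[Action], None],
    max_loops: int = 30,
) -> Optional[str]:
    """Repeat dump -> decide -> perform until the model reports it is done."""
    for _ in range(max_loops):
        hierarchy = dump_hierarchy()
        action = parse_action(decide(goal, hierarchy))
        if action.kind == "done":
            return action.answer
        perform(action)
    return None  # gave up after max_loops iterations


# Demo with canned replies standing in for the LLM and a list standing in for the device.
canned = iter([
    '{"kind": "tap", "x": 100, "y": 200}',
    '{"kind": "done", "answer": "finished"}',
])
performed = []
outcome = run_loop(
    "demo goal",
    dump_hierarchy=lambda: "<hierarchy/>",
    decide=lambda goal, hierarchy: next(canned),
    perform=performed.append,
)
print(outcome, performed)
```

Passing the three steps in as callables keeps the loop testable without a device or an API key. In the project itself, those roles are filled by HierarchyRunnable, DecideActionRunnable, and PerformActionRunnable.
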
- Clone or copy this repository.
- Install dependencies:
  poetry install
- Connect an Android device:
  - Enable USB debugging on your Android phone or use an emulator.
  - Ensure you have adb installed and your device is recognized via adb devices.
- Set your OpenAI API key:
  - Either store it in an environment variable:
    export OPENAI_API_KEY="your-secret-key"
  - Or provide it directly when creating the agent.
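
Before running the agent, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, assuming only the standard library; parse_adb_devices, adb_devices_output, and preflight are hypothetical helper names and are not part of this repository. It reads the output of adb devices and checks that OPENAI_API_KEY is set.

```python
import os
import subprocess
from typing import List, Mapping


def parse_adb_devices(output: str) -> List[str]:
    """Return serials that `adb devices` lists in the "device" state."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device rows look like "<serial>\t<state>"; the header and blank lines do not.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def adb_devices_output() -> str:
    """Run `adb devices` and return its stdout (requires adb on PATH)."""
    return subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    ).stdout


def preflight(adb_output: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of human-readable setup problems; empty means ready."""
    problems = []
    if not parse_adb_devices(adb_output):
        problems.append("no device in the 'device' state; check USB debugging")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

A typical call would be preflight(adb_devices_output()). Devices that adb reports as unauthorized or offline are deliberately not counted as ready.
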
Then run the agent from Python:
import logging
import os
from agent.agent import AndroidAgent
GOAL = """
Open YouTube, search for podcast, and play the first video.
Find the name of the video creator, then Google it (use Android's native search, not Chrome).
Respond with a list of links from the search results.
"""
# Configure logging
logging.basicConfig(level=logging.INFO, format="[%(name)s] (%(levelname)s) %(message)s")
if __name__ == "__main__":
    # Read the OpenAI API key from the environment
    openai_api_key = os.getenv("OPENAI_API_KEY")
    device_id = None  # or "emulator-5554", etc.
    agent = AndroidAgent(openai_api_key, device_id=device_id, max_loops=30)

    # Close all apps
    agent.ui_manager.reset()

    # Execute the agent
    result = agent.run(GOAL)
    print("Result:", result)