Implement --single_prompt mode to use dir-assistant as part of the workflow #20

Merged

Conversation

iSevenDays
Collaborator

Implement --single_prompt mode to use dir-assistant as part of the workflow.

PRINT_CGRAG=0 OPENAI_API_BASE=http://192.168.0.130:1234/v1 LM_STUDIO_API_KEY="ollama" dir-assistant start --single-prompt "what is dir-assistant?"

/usr/local/anaconda3/envs/py312/lib/python3.12/site-packages/pydantic/_internal/_config.py:345: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
  warnings.warn(message, UserWarning)
Based on the `setup.py` file, it seems that "dir-assistant" is a software application or library designed to interact with files in a user's current directory through conversational prompts. Here are some key points about what dir-assistant appears to do:

1. **Interaction with Files**: The app allows users to chat with the files contained in their current directory. It implies that it can process and respond to textual queries or commands related to these files.

2. **Local or API LLM (Large Language Model)**: It uses either a local or an API-based Large Language Model, which suggests that it relies on advanced natural language processing techniques to understand and generate responses.

3. **Python-Based Application**: The app is written in Python, using several libraries to handle various functionalities such as file interaction, text processing, and possibly cloud services or machine learning models.

Here's a breakdown of the components mentioned in the `setup.py` file:

- **llama-cpp-python**: A library that provides an interface for LLMs (like Llama 2), potentially used for local language model processing.
  
- **faiss-cpu**: A library for efficient similarity search, which could be used to index and search through files in the directory.

- **litellm**: A Python library designed to work with various language models, possibly used as an API client to external LLM services.

- **colorama**: A utility for colored terminal text, which might be used for enhancing the user interface by changing text color or style.

- **sqlitedict**: A simple way of storing dictionary data in SQLite database files, which could be used for caching or storing session state information.

- **prompt-toolkit**: A library that provides rich text editing experiences using a command-line interface, possibly used to implement a conversational user interface.

- **watchdog**: A Python module to monitor file system events, useful for keeping track of changes in the directory.

- **google-generativeai** and **openai**: Libraries that provide interfaces to Google's and OpenAI's respective AI services, which are likely used to leverage external LLMs through their APIs.

- **boto3** and other AWS-related libraries: These might be used for interacting with AWS services or cloud storage if dir-assistant is designed to handle files stored in the cloud.

In summary, "dir-assistant" appears to be a Python-based application that uses conversational AI to interact with the files in a user's directory. It leverages various libraries and potentially external APIs to process and respond to textual queries regarding these files.

Please test on your system as well. I plan to use dir-assistant as part of a workflow, i.e. as a tool rather than a standalone app.
This feature makes it possible to get output from dir-assistant while hiding debug information that should not be visible.
Because I had to add dynamic configuration options, a workflow can now specify properties such as PRINT_CGRAG=0 and others like context length.
Some questions about small repos don't require the full context, but if a question spans multiple repos, the context can be changed dynamically.
I haven't tested it yet, but it should also be possible to change the LLM by passing the LITELLM_MODEL environment variable; a rough sketch of a workflow step is below.
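For illustration, here's roughly how a workflow step could call the new mode from Python; the model name and context size are placeholders I'd adapt per workflow, not values the project prescribes:

# Sketch of a workflow step invoking the new --single-prompt mode.
# The model name and context size are placeholders, not project defaults.
import os
import subprocess

env = {
    **os.environ,
    "PRINT_CGRAG": "0",                                # hide CGRAG debug output
    "OPENAI_API_BASE": "http://192.168.0.130:1234/v1",
    "LITELLM_MODEL": "openai/hermes-3-llama-3.1-8b",   # placeholder model
    "LITELLM_CONTEXT_SIZE": "32768",                   # smaller context for small repos
}

result = subprocess.run(
    ["dir-assistant", "start", "--single-prompt", "what is dir-assistant?"],
    env=env,
    capture_output=True,
    text=True,
    timeout=600,
)
print(result.stdout)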

@curvedinf
Owner

curvedinf commented Jan 16, 2025

Awesome work. I would like to expand on this a bit before merging it, however. The main problem with this methodology is that the index startup and model load need to happen for every run. I'd like to instead have a client/server approach where you run `dir-assistant server` in the directory you want indexed as a blocking process, and then in a separate shell you can use `dir-assistant client "prompt here"`. This will eliminate the startup cost. Furthermore, I'd like the server to be a basic OpenAI-compliant API, initially with just MVP features. This will allow dir-assistant to be used as the backend for a variety of better clients in addition to the use case you're adding this for.

I'll leave this PR open, as this is good work you did. Would you be interested in working on the features I mentioned above?

The way I'd do the server is a Flask HTTP server, multiprocessing for a separate Python instance running dir-assistant's main thread, and input/output queues between the processes.
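Roughly, something of this shape (nothing here exists in dir-assistant yet; run_assistant_loop is a made-up placeholder for dir-assistant's main thread):

# Sketch of the proposed client/server split: a Flask endpoint forwards prompts
# over multiprocessing queues to a long-lived worker that keeps the index and
# model loaded. run_assistant_loop is hypothetical; nothing below is implemented.
from multiprocessing import Process, Queue
from flask import Flask, jsonify, request

prompt_queue = Queue()
response_queue = Queue()

def run_assistant_loop(prompts, responses):
    # Hypothetical worker: load the index/model once, then answer prompts forever.
    while True:
        prompt = prompts.get()
        responses.put(f"(answer to: {prompt})")  # placeholder for real generation

app = Flask(__name__)

@app.post("/v1/chat/completions")
def chat_completions():
    prompt = request.json["messages"][-1]["content"]
    prompt_queue.put(prompt)
    answer = response_queue.get()  # block until the worker responds
    return jsonify({"choices": [{"message": {"role": "assistant", "content": answer}}]})

if __name__ == "__main__":
    Process(target=run_assistant_loop, args=(prompt_queue, response_queue), daemon=True).start()
    app.run(port=8000)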

@iSevenDays
Collaborator Author

iSevenDays commented Jan 16, 2025

@curvedinf I can explain my use case and why I opted out of having a server inside dir-assistant.

I use LM Studio or Ollama; they manage my LLM models. I have multiple computers running OpenAI-compatible servers.
I define them as simply as
OPENAI_API_BASE=http://192.168.0.120:1234/v1
OPENAI_API_BASE=http://192.168.0.140:1234/v1
which eliminates the need for an LLM server inside dir-assistant.
The time spent loading embeddings is very low compared to the actual generation: embeddings load in under 10 seconds, whereas generation may take up to 10 minutes (5 min CGRAG + 5 min LLM response).

The most important part is that I don't want to expose dir-assistant to the network. Instead, I log in to the machine that has dir-assistant installed using a workflow SSH node and execute dir-assistant --single-prompt "explain to me the project" -d project1,project2.
The call above is configured in an n8n workflow, e.g. if a user asks about project1, I classify the request as project1-related and dir-assistant only executes in the project1 folder.

I don't want dir-assistant to gather context information across 20 of my projects; that would take more than 900,000 tokens and 30 minutes to respond.
I want to classify a user's request and only execute that request against a specific folder.

The current implementation already does what it has to do - I send a prompt and a list of folders and I receive a response back.
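For context, the routing step is roughly this (the keyword matching and project names are simplified placeholders; n8n does the real classification):

# Toy illustration of the routing idea: classify the question, then run
# dir-assistant only against the matching project folder. Keywords and
# project names are placeholders; in practice n8n classifies the request.
import subprocess

PROJECT_KEYWORDS = {
    "project1": ["billing", "invoices"],
    "project2": ["auth", "login"],
}

def pick_project(question):
    for project, keywords in PROJECT_KEYWORDS.items():
        if any(word in question.lower() for word in keywords):
            return project
    return "project1"  # fallback

def ask(question):
    project = pick_project(question)
    result = subprocess.run(
        ["dir-assistant", "start", "-d", project, "--single-prompt", question],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout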

Here are the execution timing logs.

2025-01-16 19:13:23 [DEBUG] 
Received request: POST to /v1/embeddings with body  {
  "input": "what is dir-assistant?",
  "model": "bge-m3",
  "encoding_format": "base64"
}
2025-01-16 19:13:23  [INFO] 
Received request to embed: what is dir-assistant?
2025-01-16 19:13:26 [DEBUG] 
[INFO] [LlamaEmbeddingEngine] All parsed chunks succesfully embedded!
2025-01-16 19:13:26  [INFO] 
Returning embeddings (not shown in logs)
2025-01-16 19:13:26 [DEBUG] 
Received request: POST to /v1/chat/completions with body 

and client side

dir-assistant % time PYTHONWARNINGS="ignore" PRINT_CGRAG=0 OPENAI_API_BASE=http://192.168.0.120:1234/v1 OPENAI_API_KEY=empty LM_STUDIO_API_KEY="ollama" dir-assistant start --single-prompt "what is dir-assistant?"

Dir-assistant is an open-source Python application designed to interact with the files in the current working directory using a Local or API-based Large Language Model (LLM). It allows users to chat with their computer's file system, making it easier to navigate and manipulate files. Here are some key points about Dir-assistant based on the provided information:

- **Features**: The application likely offers a command-line interface where users can engage in conversations with the computer's file system, leveraging AI capabilities to perform tasks such as searching for specific files, extracting information from text files, or organizing and managing directories.
  
- **Technology**: It uses various Python libraries and external packages (like llama-cpp-python, faiss-cpu, litellm, colorama, etc.) to implement these features. These packages are likely used for tasks like natural language processing, indexing, and interacting with external APIs (such as Google Generative AI or OpenAI).

- **Structure**: The codebase includes several modules and files, such as `dir_assistant/__init__.py`, `dir_assistant/main.py`, and various assistant and CLI (Command Line Interface) submodules. This suggests that the application is modular and has a structured design.
  
- **Usage**: The package also defines an entry point for a console script called `dir-assistant`, which means it can be run directly from the command line.

- **License**: The project is licensed under the MIT License, a widely used open-source license that allows users to freely use and modify the code as long as they follow certain guidelines.

- **Documentation**: A README file is provided, which contains more detailed information about the application. The `setup.py` file also includes a long description of the program, which gives users an overview of its capabilities and usage.

In summary, Dir-assistant appears to be a project that brings AI-powered conversation features to the command line interface for interacting with files in your computer's directory structure.
PYTHONWARNINGS="ignore" PRINT_CGRAG=0 OPENAI_API_BASE= OPENAI_API_KEY=empty =  3,30s user 0,45s system 6% cpu 56,181 total

server

lama_perf_context_print:        eval time =    9515.76 ms /   138 runs   (   68.95 ms per token,    14.50 tokens per second)
llama_perf_context_print:       total time =   17880.86 ms /  1089 tokens
2025-01-16 19:13:44  [INFO] 
Finished streaming response
2025-01-16 19:13:44  [INFO] 
[LM STUDIO SERVER] Client disconnected. Stopping generation..

dir-assistant's start-up time is not an issue; LLM response time for long contexts is.

@iSevenDays
Collaborator Author

iSevenDays commented Jan 16, 2025

Here is an additional log of the same request above, showing only the server-side requests and how much time they take on the server.
Then we can calculate how much time the client spent versus how much time the server took to respond.

First request CGRAG

2025-01-16 **19:13:26** [DEBUG] 
Received request: POST to /v1/chat/completions with body  {
  "messages": [
    {
      "role": "user",
      "content": "You are a helpful AI assistant.",
      "tokens": 14
    },
    {
      "role": "assistant",
      "content": "

Request finished

2025-01-16 **19:13:44**  [INFO] 
Finished streaming response

Second request - embeddings (3 seconds)
Third request

2025-01-16 **19:13:47** [DEBUG] 
Received request: POST to /v1/chat/completions with body  {

Finished

2025-01-16 19:14:17  [INFO] 
Finished streaming response

From the first to the last request: 51 seconds.
Let's compare that with the total time:
PYTHONWARNINGS="ignore" PRINT_CGRAG=0 OPENAI_API_BASE= OPENAI_API_KEY=empty = 3,30s user 0,45s system 6% cpu **56,181 total**, i.e. 56 seconds.
So the client spent about 5 seconds on loading the local cache, network requests, and latency.

In my case I already have the embeddings cached for all of my projects, and LLM generation time is around 99% of the total time for context >32-128k.

@curvedinf
Copy link
Owner

I can see the need for a non-IP based solution (security and other reasons). Let me hash it over a bit. It may be best to have this and the client/server version. Regarding startup not being a concern, I have run larger models on huge filesets and startup can get pretty long, so it is a concern for those instances. Also APIs have low time to first token because they are parallelized, and you can opt to use smaller API models. In those instances, generation time can be short even on large contexts, which means you can end up spending most of your time in startup in certain situations.

BTW, in your hypothetical case with the server, I was suggesting that you use dir-assistant both to provide an API and to consume one.

@iSevenDays
Collaborator Author

Just a small note regarding long startup time.

I only have a long startup time if embeddings haven't been created for a specific file before. That's not my case, as my projects don't change that often; I have to update maybe 10 files per day.
If you have tens of thousands of files, that could be an issue, but it can only be solved with some kind of daemon server.
I try to keep PRs small, so maybe that can be implemented in a separate PR.

The current PR provides a solution so that dir-assistant can be integrated as part of a larger workflow, rather than being a standalone tool you have to launch on your machine.
I could even integrate it into a chat UI with this PR, so that's a win for me for sure. Thanks for the great project! I found that only this project works for large codebases. 🙂

@curvedinf curvedinf changed the base branch from main to single-prompt January 29, 2025 00:38
@curvedinf
Owner

I'm pulling this into the 'single-prompt' branch to test and polish. If you have additional changes, make a PR to that branch.

@curvedinf curvedinf merged commit 188e560 into curvedinf:single-prompt Jan 29, 2025
@iSevenDays
Collaborator Author

@curvedinf thank you very much! I appreciate that!

I've tried to fix as many bugs as I could for my workflow. If you see any bugs, please let me know and I'll try to fix them in my free time.
I'm currently integrating it as part of an n8n workflow where dir-assistant is executed on a remote machine via a simple server.

I'll share a snippet with you below in case you find it useful. I'll be testing it this week.

import os
import subprocess

from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.gzip import GZipMiddleware
from slowapi import Limiter
from slowapi.util import get_remote_address

# CommandRequest (pydantic request model), verify_access, middleware,
# ALLOWED_BASE_PATHS, PROCESS_GAUGE and REQUESTS_COUNTER (prometheus metrics)
# are defined elsewhere in the service and omitted here.

app = FastAPI(middleware=middleware)
app.add_middleware(GZipMiddleware, minimum_size=1000)

# Rate limiter (the Limiter/get_remote_address pattern here matches slowapi)
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter

@app.post("/execute")
@limiter.limit("10/minute")
async def execute_command(request: Request, command: CommandRequest):
    # Verify access before processing
    await verify_access(request)
    
    PROCESS_GAUGE.inc()
    try:
        # Working directory validation disabled for debugging
        
        # Temporarily disabled path validation for debugging
        # Kubernetes mounts should enforce directory access
        
        # Prepare command with proper argument order
        cmd = ["dir-assistant", "start"]
    
        # Validate and convert directory paths
        if command.directories:
            for dir_path in command.directories:
                # Resolve relative paths against working_dir first
                full_path = os.path.normpath(os.path.join(command.working_dir, dir_path))
                
                # Verify the resolved path is within allowed workspace
                if not any(full_path.startswith(base) for base in ALLOWED_BASE_PATHS):
                    raise HTTPException(
                        status_code=400,
                        detail=f"Path {full_path} is not in allowed directories"
                    )
                
                # Verify directory exists in container
                if not os.path.exists(full_path):
                    raise HTTPException(
                        status_code=400,
                        detail=f"Directory {full_path} does not exist in container"
                    )
                    
                # Add each directory with its own -d flag
                cmd.extend(["-d", dir_path])
    
        # Disable color and add single prompt argument
        cmd.extend(["--no-color", "--single-prompt", command.prompt])
        
        # Prepare environment variables
        env = {
            **os.environ,
            "PYTHONUNBUFFERED": "1",
            "WORKING_DIR": command.working_dir,
            "ACTIVE_MODEL_IS_LOCAL": "false",
            "ACTIVE_EMBED_IS_LOCAL": "false",
            "PYTHONWARNINGS": "ignore",
            "XDG_CONFIG_HOME": "/home/appuser/.config",
            "XDG_CACHE_HOME": "/home/appuser/.cache"
        }
        
        # Map input parameter variations to standardized environment variables
        env_vars_mapping = {
            # OpenAI API base variations
            "openai_api_base": "OPENAI_API_BASE",
            "OPENAI_API_BASE": "OPENAI_API_BASE", 
            "api_base": "OPENAI_API_BASE",
            "base_url": "OPENAI_API_BASE",
            "litellm_api_base": "OPENAI_API_BASE",
            "LITELLM_API_BASE": "OPENAI_API_BASE",
            
            # API keys
            "openai_api_key": "OPENAI_API_KEY",
            "OPENAI_API_KEY": "OPENAI_API_KEY",
            "lm_studio_api_key": "LM_STUDIO_API_KEY", 
            "LM_STUDIO_API_KEY": "LM_STUDIO_API_KEY",
            
            # Model configurations
            "litellm_model": "LITELLM_MODEL",
            "LITELLM_MODEL": "LITELLM_MODEL",
            "litellm_embed_model": "LITELLM_EMBED_MODEL",
            "LITELLM_EMBED_MODEL": "LITELLM_EMBED_MODEL",
            "litellm_context_size": "LITELLM_CONTEXT_SIZE",
            "LITELLM_CONTEXT_SIZE": "LITELLM_CONTEXT_SIZE",
            
            # Other settings
            "global_ignores": "GLOBAL_IGNORES",
            "GLOBAL_IGNORES": "GLOBAL_IGNORES",
            "print_cgrag": "PRINT_CGRAG",
            "PRINT_CGRAG": "PRINT_CGRAG"
        }
        
        # Validate required API parameters
        required_params = {
            "OPENAI_API_BASE": "http://169.254.240.175:1234/v1",
            "LITELLM_MODEL": "openai/hermes-3-llama-3.1-8b",
            "LITELLM_EMBED_MODEL": "openai/bge-m3"
        }
        
        # Get actual values from request or use defaults
        for key, default in required_params.items():
            # Check if param exists in additional_params (case-insensitive)
            param_value = next((v for k,v in command.additional_params.items() if k.upper() == key), None)
            
            if param_value:
                env[key] = str(param_value)
            elif key not in env:
                env[key] = default
                print(f"Warning: Using default value for {key} = {default}")
                
            if not env[key]:
                raise HTTPException(
                    status_code=400,
                    detail=f"{key} must be specified in additional_params"
                )

        # Validate API base URL format
        api_base = env["OPENAI_API_BASE"]
        if not api_base.startswith(("http://", "https://")):
            raise HTTPException(
                status_code=400,
                detail="OPENAI_API_BASE must start with http:// or https://"
            )
        env["OPENAI_API_BASE"] = api_base.rstrip('/')
        
        # Validate API keys exist (even if empty)
        for key in ["OPENAI_API_KEY", "LM_STUDIO_API_KEY"]:
            env.setdefault(key, "empty")
        
        # Ensure required config flags are always false for cloud mode
        env["ACTIVE_MODEL_IS_LOCAL"] = "false"
        env["ACTIVE_EMBED_IS_LOCAL"] = "false"
        if env.get("VERBOSE", "0") == "1":  # env values are strings
            cmd.extend(["--verbose"])
        
        # Map all additional params to env vars, handling case variations and value types
        for param_key, param_value in command.additional_params.items():
            # Handle both exact match and case-insensitive match
            env_key = env_vars_mapping.get(param_key) or env_vars_mapping.get(param_key.lower())
            
            if env_key:
                # Convert list values to comma-separated strings
                if isinstance(param_value, list):
                    env[env_key] = ",".join(param_value)
                else:
                    env[env_key] = str(param_value)
        
        # Validate required API base
        if not env.get("OPENAI_API_BASE"):
            raise HTTPException(
                status_code=400, 
                detail="OPENAI_API_BASE must be specified in additional_params"
            )
    
        # Execute command with timeout and better error handling
        try:
            print(f"Executing command: {' '.join(cmd)}")  # Debug logging
            print(f"With environment: { {k: v for k, v in env.items() if 'KEY' not in k} }")  # Redact secrets
            print(f"Full working directory: {command.working_dir}")  # Debug path
            # return {
            #     "status": "success",
            #     "working_dir": command.working_dir,
            #     "directories": command.directories,
            #     "env": {f"{k}: {v}" for k, v in env.items()},
            #     "cmd": cmd
            # }
            result = subprocess.run(
                cmd,
                cwd=command.working_dir,
                capture_output=True,
                text=True,
                env=env,
                timeout=300,  # 5 minute timeout
                check=True  # Raise exception on non-zero exit
            )
            
            # Update metrics
            REQUESTS_COUNTER.labels(
                method="POST",
                endpoint="/execute",
                http_status=200 if result.returncode == 0 else 500
            ).inc()
            
            return {
                "status": "success" if result.returncode == 0 else "error",
                "exit_code": result.returncode,
                "working_dir": command.working_dir,
                "directories": command.directories,
                "log_entries": len(result.stdout.splitlines()) + len(result.stderr.splitlines()),
                "output": result.stdout if result.returncode == 0 else result.stderr
            }
            
        except subprocess.CalledProcessError as e:
            print(f"Command failed: {e}\nStderr: {e.stderr[:500]}")  # Truncate long outputs
            error_detail = f"Command failed with exit code {e.returncode}: {e.stderr[:1000]}"
            print(f"Command error: {error_detail}")  # Log full error
            raise HTTPException(
                status_code=500,
                detail=error_detail
            )
        except Exception as e:
            import traceback
            tb = traceback.format_exc()
            print(f"Unexpected error: {str(e)}\n{tb}")
            raise HTTPException(
                status_code=500, 
                detail=f"Internal server error: {str(e)}"
            )
            
    finally:
        PROCESS_GAUGE.dec()
