Add OpenAI Integration #1

Open
mrtomlin21 opened this issue May 16, 2023 · 3 comments

Comments

@mrtomlin21

Get descriptions of Python files from OpenAI and add them to the Markdown output.


Maralai commented Feb 1, 2025

1. SummarizeGPT's primary value proposition is providing comprehensive codebase context for LLM pair-programming interactions.
2. While large context windows (like Gemini's) reduce token-efficiency concerns, there is still value in optimizing for:
   - Zero/few-shot effectiveness
   - Human interpretability
   - Machine-friendly semantic signatures

The interesting part about using a model to distill tokens while maintaining human readability reminds me of the concept of "dual-purpose encoding", where the same content serves both human and machine needs effectively. Using a local model would also be more cost-effective than calling a hosted API.

Perhaps we could explore a format that includes:

````
# [filename]
@semantictags: [local-model-generated-tokens]
@type: [file-type]
@path: [relative-path]
@summary: [local-model-generated-brief]

```content```
````

This could allow:

  1. Traditional human reading patterns
  2. Quick machine parsing of semantic metadata
  3. Preservation of original content when needed
  4. Hierarchical context through path information
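As a rough illustration of the header format above, a generator could look something like this. This is a sketch, not existing SummarizeGPT code: `make_header`, and the `tags` and `summary` inputs (which stand in for local-model output), are all hypothetical.

```python
from pathlib import Path

def make_header(path: Path, root: Path, tags: list[str], summary: str) -> str:
    """Build the dual-purpose header block for one file.

    `tags` and `summary` stand in for output from a local model;
    they are hypothetical inputs, not part of SummarizeGPT today.
    """
    rel = path.relative_to(root)
    content = path.read_text(encoding="utf-8")
    return (
        f"# {path.name}\n"
        f"@semantictags: {', '.join(tags)}\n"
        f"@type: {path.suffix.lstrip('.') or 'unknown'}\n"
        f"@path: {rel}\n"
        f"@summary: {summary}\n\n"
        f"```{content}```\n"
    )
```

The `@path` line carries the hierarchical context, while the fenced original content preserves traditional reading patterns.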


Maralai commented Feb 1, 2025

Building on these observations about SummarizeGPT's role in LLM pair programming and the potential for dual-purpose encoding, I've been considering a systematic approach through a flexible configuration system.

Rather than jumping straight to implementing semantic enhancements, I believe we should first establish a robust configuration framework that can support various levels of enhancement while maintaining the tool's simplicity. Here's what I'm envisioning:

```yaml
# Example config structure (config.yaml)
summarize_gpt:
  default:
    encoding: cl100k_base
    max_lines: null
    semantic:
      enabled: false
      model: "sentence-transformers/all-MiniLM-L6-v2"
      token_limit: 100
  semantic:
    enabled: true
    model: ${SEMANTIC_MODEL_PATH}  # from env
    api_key: ${OPENAI_API_KEY}     # from env
```
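One practical detail: YAML itself does not expand `${VAR}` references, so the loader would need a small interpolation pass. A hedged sketch using only the standard library (the `expand_env` helper is illustrative, not an existing SummarizeGPT feature):

```python
import os
import re

_ENV_PATTERN = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value):
    """Recursively replace ${VAR} placeholders with environment values.

    Unset variables are left as-is so a missing secret is easy to spot
    rather than silently becoming an empty string.
    """
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    if isinstance(value, str):
        return _ENV_PATTERN.sub(
            lambda m: os.environ.get(m.group(1), m.group(0)), value
        )
    return value
```

This would run over the dict produced by `yaml.safe_load` before the config is used.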

With a clear configuration precedence (highest priority first):

  1. CLI arguments
  2. Environment variables
  3. Local config (.summarizegpt.yaml in current dir)
  4. User config (~/.config/summarizegpt/config.yaml)
  5. Default config (packaged with tool)

This configuration-first approach would provide the foundation needed to support both the current functionality and future semantic enhancements while giving users fine-grained control over how they want to use the tool.

An additional benefit of this YAML-based approach is that it enables effortless re-summarization of projects. By storing the configuration details in the project's directory, teams can maintain consistent summarization settings across multiple runs and between different team members. This is particularly valuable when working with large codebases or when you need to regenerate summaries after code changes while maintaining the same semantic analysis parameters and exclusion rules.

The local .summarizegpt.yaml effectively serves as both a configuration cache and a project-level standard, ensuring that everyone working with the codebase gets the same context representation when using SummarizeGPT. This consistency is crucial for maintaining effective LLM pair programming practices across a team.

Thoughts?


Maralai commented Feb 1, 2025

I think I will try to get #4 looked at. I see the benefit in my own workflows when implementing a configuration-specific setup. We can use this as a basis to give SummarizeGPT the ability to execute code summarization, making it a more effective dual-encoded tool. Thanks again for using this tool!
