Embed File Type Filter #326

Open
jdspugh opened this issue Sep 22, 2024 · 5 comments

@jdspugh

jdspugh commented Sep 22, 2024

Is your feature request related to a problem? Please describe.
Some binary files are being embedded which I don't want embedded.

Describe the solution you'd like
A filter for accepting only certain file types (e.g. .mjs, .js, .ts, .html).

Describe alternatives you've considered
A filter for rejecting certain file types.
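
To make the request concrete, here is a minimal sketch of an extension filter that supports both an accept list and a reject list. It is only an illustration: `ExtensionFilter`, `extensionOf` and `shouldEmbed` are hypothetical names and are not taken from Twinny's code.

```typescript
// Hypothetical sketch: none of these names come from Twinny's codebase.
interface ExtensionFilter {
  allow?: string[]; // if non-empty, only these extensions are embedded
  deny?: string[]; // these extensions are never embedded
}

// Return the lowercase extension including the dot, or "" if there is none.
// Dotfiles such as ".gitignore" are treated as having no extension.
function extensionOf(filePath: string): string {
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

// Case-insensitive membership test for an extension list.
function listHas(list: string[] | undefined, ext: string): boolean {
  return (list ?? []).some((item) => item.toLowerCase() === ext);
}

// The reject list wins over the accept list; an empty accept list accepts everything.
function shouldEmbed(filePath: string, filter: ExtensionFilter): boolean {
  const ext = extensionOf(filePath);
  if (listHas(filter.deny, ext)) return false;
  if (filter.allow && filter.allow.length > 0) return listHas(filter.allow, ext);
  return true;
}
```

With `allow: [".mjs", ".js", ".ts", ".html"]`, a file such as `src/app.ts` would be embedded and a font or image file would be skipped.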

@rjmacarthy
Collaborator

Hey, thanks for the interest. Is the .gitignore file not usable for your use case? Should there be an additional option?

@jdspugh
Author

jdspugh commented Oct 14, 2024

I have some media files in my projects (in this specific case, .wof font files), and it seems they were being processed and added to the embeddings. I want to see the files in my VS Code project, but I don't want them processed or added to the embeddings. That's why I thought a file extension filter would be useful. What do you think?

@fishshi
Collaborator

fishshi commented Nov 8, 2024

Hello, this issue should be resolved. Now, Twinny not only uses .gitignore to ignore files during embedding, but you can also add custom ignore items in the global settings.
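
As a rough illustration of how .gitignore entries and custom ignore items could be merged into one check, here is a simplified sketch. It is not Twinny's actual implementation: `parseIgnoreText`, `globToRegExp` and `buildIgnoreMatcher` are hypothetical names, and the matcher only understands `*` and `**`. Negation (`!pattern`), leading-slash anchoring and the rest of the real .gitignore rules are deliberately left out.

```typescript
// Hypothetical sketch: a simplified matcher, not full .gitignore semantics.

// Split ignore-file text into patterns, skipping blank lines and comments.
function parseIgnoreText(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"));
}

// Convert a simplified glob to a RegExp.
// "*" matches within one path segment, "**" matches across segments.
// Every other character (including "?") is matched literally.
function globToRegExp(glob: string): RegExp {
  const body = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000") // protect "**" before handling "*"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  // Match the pattern as a whole path segment (or run of segments).
  return new RegExp(`(^|/)${body}(/|$)`);
}

// Merge .gitignore patterns with custom items into a single predicate.
function buildIgnoreMatcher(
  gitignoreText: string,
  customItems: string[]
): (relativePath: string) => boolean {
  const patterns = [...parseIgnoreText(gitignoreText), ...customItems].map(
    (pattern) => globToRegExp(pattern.replace(/\/$/, "")) // "dist/" behaves like "dist"
  );
  return (relativePath) => {
    const normalized = relativePath.replace(/\\/g, "/");
    return patterns.some((re) => re.test(normalized));
  };
}
```

For example, a matcher built from a .gitignore containing `node_modules/` plus the custom item `*.wof` would skip both dependency folders and the font files mentioned above, while still embedding `src/app.ts`.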

@mmorys

mmorys commented Dec 10, 2024

I had the same question and found this issue. The above solution works and this issue can probably be closed.

@psyray

psyray commented Jan 25, 2025

Hello, this issue should be resolved. Now, Twinny not only uses .gitignore to ignore files during embedding, but you can also add custom ignore items in the global settings.

Hi, thanks for the modification. It works, but why not use a dedicated file for Twinny, such as .twinnyignore?
For example, with Cursor.ai we can set up a .cursorignore file to prevent specific files from being indexed.

I ask because, when building embeddings with the current options:

  • .gitignore is project-specific, but we cannot add files to it that we want to track with git. We could add them anyway, but then we would need to remove them again at commit time whenever those ignored files have changed.
  • the global settings are workspace-specific, and the same workspace can hold more than one project, git-tracked or not, with different tech stacks, so the ignore list could be too wide.

A dedicated ignore file for Twinny would be a more granular approach and would give a lot of flexibility for building a relevant embedding context.
We could place such a file in each base folder of our workspace and ignore files according to the tech stack used there.
Or we could put one at the root of the workspace and define precise rules for each subfolder.

That file could itself be listed in .gitignore so that it is not tracked.
You could also let the user choose between the .gitignore file and the .twinnyignore file.
The global settings would remain for files we are sure we never want to index.

What do you think about this?
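
A rough sketch of how the per-folder lookup could work, to make the proposal easier to discuss. Everything here is hypothetical: `.twinnyignore` is only the file name suggested in this comment, not an existing Twinny feature, and `candidateIgnoreFiles`, `collectScopedPatterns` and `ScopedPattern` are made-up names. The ignore files are passed in as an in-memory map so the example stays self-contained.

```typescript
// Hypothetical: ".twinnyignore" is the file name proposed in this thread.
const IGNORE_FILE = ".twinnyignore";

// A pattern together with the folder whose ignore file declared it.
interface ScopedPattern {
  baseDir: string; // "" means the workspace root or the global settings
  pattern: string;
}

// List ignore-file paths that could apply to a file, from the workspace root
// down to the folder that contains the file.
function candidateIgnoreFiles(relativePath: string): string[] {
  const segments = relativePath.replace(/\\/g, "/").split("/").filter(Boolean);
  segments.pop(); // drop the file name itself
  const candidates = [IGNORE_FILE];
  let dir = "";
  for (const segment of segments) {
    dir = dir ? `${dir}/${segment}` : segment;
    candidates.push(`${dir}/${IGNORE_FILE}`);
  }
  return candidates;
}

// Collect global items first, then the patterns of every applicable ignore file.
// `ignoreFiles` maps an ignore-file path to its text content.
function collectScopedPatterns(
  relativePath: string,
  ignoreFiles: Map<string, string>,
  globalItems: string[]
): ScopedPattern[] {
  const scoped: ScopedPattern[] = globalItems.map((pattern) => ({ baseDir: "", pattern }));
  for (const file of candidateIgnoreFiles(relativePath)) {
    const text = ignoreFiles.get(file);
    if (text === undefined) continue; // no ignore file in this folder
    const baseDir = file.slice(0, -IGNORE_FILE.length).replace(/\/$/, "");
    for (const line of text.split(/\r?\n/)) {
      const pattern = line.trim();
      if (pattern !== "" && !pattern.startsWith("#")) scoped.push({ baseDir, pattern });
    }
  }
  return scoped;
}
```

With this shape, a `frontend/.twinnyignore` containing `*.wof` would only affect files under `frontend/`, while a sibling `backend/` project in the same workspace would be untouched. How each scoped pattern is then matched against the path is left open here.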
