Code search with tree-sitter + semantic search

skrider/softgrep

softgrep

softgrep is a semantic code search tool. It uses tree-sitter to parse files quickly in a language-aware manner, generates embeddings on a remote server over gRPC, and uses those embeddings to perform semantic search. File parsing and tokenization of semantic chunks run in parallel. Because parsing goes through tree-sitter, the tool is language agnostic. It is git-aware and caches results for better performance. Think of it as ripgrep, but for semantic search.

At a high level, the flow looks like:

cli -> walk directory -> chunk version-controlled files with tree-sitter -> tokenize via Hugging Face fast tokenizer -> generate embeddings remotely via gRPC client -> cosine-distance nearest-neighbor search on flat index
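
The last stage of that flow, cosine-distance nearest-neighbor search on a flat index, can be illustrated with a minimal Python sketch. This is not softgrep's actual code: the `FlatIndex` class, its `add` and `search` methods, and the three-dimensional toy vectors are all hypothetical names and data chosen for illustration. It assumes a "flat index" means a brute-force scan over every stored embedding.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


class FlatIndex:
    """Hypothetical brute-force index: every query scans all stored vectors."""

    def __init__(self) -> None:
        self._vectors: List[List[float]] = []
        self._labels: List[str] = []

    def add(self, label: str, embedding: Sequence[float]) -> None:
        # Store unit-length vectors so cosine similarity is a plain dot product.
        self._vectors.append(normalize(embedding))
        self._labels.append(label)

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k labels with the smallest cosine distance to the query."""
        q = normalize(query)
        scored = (
            (1.0 - sum(a * b for a, b in zip(q, v)), label)
            for v, label in zip(self._vectors, self._labels)
        )
        return [(label, dist) for dist, label in heapq.nsmallest(k, scored)]


# Toy embeddings standing in for vectors returned by the embedding service.
index = FlatIndex()
index.add("parse_file", [1.0, 0.0, 0.0])
index.add("walk_dir", [0.0, 1.0, 0.0])
index.add("read_file", [0.9, 0.1, 0.0])

results = index.search([1.0, 0.05, 0.0], k=2)
for label, dist in results:
    print(f"{label}: cosine distance {dist:.4f}")
```

A linear scan like this costs one dot product per stored chunk for every query, which is simple and exact; whether that remains acceptable depends on how many chunks a repository produces.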

The embedding service runs remotely on Triton. It currently uses the microsoft/codebert-base model.
