Pringled/README.md

Hi there 👋

I'm Thomas van Dongen. I currently work as head of AI engineering at Springer Nature, and I am one of the founding members of The Minish Lab, where we develop open-source machine learning packages.

My research interests include:

  • 🚤 Small, fast models: Making CPU-friendly models.
  • 🧩 Embeddings: Focusing on static embeddings to balance performance and resource usage.
  • Efficient Nearest Neighbors: Optimizing ANN/KNN methods for high-speed search and scalable similarity comparisons.
  • 🔍 Recommenders: Developing smarter systems to improve recommendations and information retrieval, with a focus on the scientific publishing space.

I'm currently working on:

  • model2vec: a library for creating state-of-the-art static embeddings by distilling sentence transformers.
  • vicinity: a library for fast and lightweight nearest neighbors, with flexible indexing backends.
  • semhash: a library for lightweight text deduplication.
  • tokenlearn: a library for pre-training static embeddings.

Pinned

  1. MinishLab/model2vec (Public)

    The Fastest State-of-the-Art Static Embeddings in the World

    Python · 933 stars · 43 forks

  2. MinishLab/semhash (Public)

    Fast Semantic Text Deduplication

    Python · 449 stars · 19 forks

  3. MinishLab/vicinity (Public)

    Lightweight Nearest Neighbors with Flexible Backends

    Python · 229 stars · 6 forks

  4. MinishLab/tokenlearn (Public)

    Pre-train Static Word Embeddings

    Python · 41 stars · 2 forks