Unlocking 70% Faster Response Times Through Token Pooling
The magic of hierarchical clustering token pooling
Dec 2, 2024 · 8 min read

An experiment on vision models and RAG

RAG (Retrieval Augmented Generation) is a powerful technique that lets us enhance the output of large language models (LLMs) with private documents and proprietary knowledge that is not available elsewhere. For example, a company's internal documents o...
ColiVara stores, searches, and retrieves documents based on their visual embedding.