What Are Embeddings?
Embeddings — Numerical representations of concepts or words that allow AI to understand semantic relationships.
Embeddings convert text, images, or other data into dense numerical vectors where similar items are positioned close together in mathematical space. They are the foundation of semantic search, recommendation systems, and retrieval-augmented generation (RAG), enabling computers to match meaning rather than just keywords.
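"Close together in mathematical space" is usually measured with cosine similarity. The sketch below uses made-up 4-dimensional vectors (real models output hundreds of dimensions) just to show that related concepts score higher than unrelated ones:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, not output from a real model
cat    = [0.9, 0.8, 0.1, 0.0]
kitten = [0.8, 0.9, 0.2, 0.1]
car    = [0.1, 0.0, 0.9, 0.8]

print(cosine_similarity(cat, kitten))  # high: semantically related
print(cosine_similarity(cat, car))     # low: unrelated
```

In a semantic search system, the same comparison runs between a query's embedding and every stored document embedding, returning the highest-scoring matches.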
Frequently Asked Questions
How are embeddings created?
Specialized embedding models process your data and output a vector of numbers (typically 768-1536 dimensions). Popular embedding models include OpenAI’s text-embedding-3 and open-source alternatives like BGE.
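The key property of that output is its fixed length: whatever the input, the model returns the same number of dimensions. The toy function below is not a real embedding model; it hashes words into a fixed-length vector purely to illustrate the output shape, where a real model would run a trained neural network:

```python
import hashlib

DIMENSIONS = 768  # a common embedding size; real models typically use 768-1536

def toy_embed(text, dims=DIMENSIONS):
    """NOT a real embedding model: hashes words into a fixed-length
    vector just to show that the output is always `dims` floats,
    regardless of input length."""
    vec = [0.0] * dims
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    # L2-normalize so vectors can be compared with cosine similarity
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]

vector = toy_embed("embeddings map meaning to numbers")
print(len(vector))  # 768, no matter how long the input text is
```

A production system would replace `toy_embed` with a call to a real model (for example, OpenAI's embeddings endpoint or a local BGE model) and store the resulting vectors in a vector database for retrieval.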
What is the difference between embeddings and tokens?
Tokens are how models break up text for processing. Embeddings are the numerical representations of that text’s meaning. Tokenization is a preprocessing step; embeddings capture semantic content.
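That two-step relationship can be sketched as a pipeline: tokenize first, then embed each token. The whitespace tokenizer and 3-dimensional vectors below are simplified stand-ins; real systems use subword tokenizers (such as BPE) and learned high-dimensional vectors:

```python
def tokenize(text):
    """Step 1: split text into tokens (here: a naive whitespace split)."""
    return text.lower().split()

# Step 2: a tiny made-up lookup table mapping each token to a vector.
# In a real model this table is learned during training.
embedding_table = {
    "cats": [0.9, 0.1, 0.0],
    "purr": [0.7, 0.3, 0.2],
}

tokens = tokenize("Cats purr")
vectors = [embedding_table[t] for t in tokens]

print(tokens)   # units of text the model processes
print(vectors)  # numerical representations of each unit's meaning
```

The tokens are just identifiers for chunks of text; only the vectors carry the semantic content that similarity comparisons operate on.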
Do embeddings work for non-text data?
Yes. There are embedding models for images (CLIP), audio, code, and even structured data. Multi-modal embeddings can place text and images in the same vector space.