pgvector: Vector Search Inside Postgres (No Extra DB)
Why pgvector lets Postgres do vector search itself - a vector column type, distance operators, and IVFFlat vs HNSW indexes - so most projects never need a separate vector database.
Watch (20:08)
Full transcript (from the video)
If you have ever wanted to search by meaning instead of by keyword, you have probably been told you need a dedicated vector database, a second system to deploy, to secure, to keep in sync with your real data, and to back up on its own schedule. This video is about why, for a very large number of projects, you do not need that second system at all. pgvector is an open-source extension that teaches Postgres how to store and search vectors. It adds a new column type for embeddings, a handful of distance operators for measuring similarity, and optional indexes that make search fast at scale.
Everything happens inside the database you already run. Over the next 20 minutes, we will go from the very first idea, what a vector even is, all the way to choosing and tuning an index for production. We will write real SQL, look at the project's own documentation together, and finish by placing pgvector inside a retrieval pipeline. No prior vector search experience required.
Start with the problem that vectors solve. Traditional search matches tokens. If a user types the word car and your documents only ever say automobile, a keyword index finds nothing because the letters do not line up. The two words mean the same thing, but the database has no idea because it is comparing spelling, not meaning.
Semantic search takes a different path. Instead of comparing the raw text, it first converts each piece of content into a list of numbers that captures what the content is about. Two passages that mean similar things produce similar lists of numbers, even when they share no words at all. Once meaning is represented as numbers, finding related content becomes a geometry problem.
You are simply looking for the numbers that sit closest together. That single shift from matching tokens to measuring distance between numbers is the entire foundation of vector search. Everything else is detail about how to store those numbers and how to find the nearest ones quickly. The thing that turns content into numbers is called an embedding model.
You hand it a sentence, an image or an audio clip, and it returns a fixed-length list of numbers. That list is the embedding, and we call it a vector. The length of the list is fixed for a given model. Common sizes are 768 numbers or 1,536 numbers.
Every item you embed with the same model produces a list of exactly that length, which is what lets you compare them. Here is the property that makes the whole idea work. The model is trained so that content with similar meaning produces vectors that sit close together and content with different meaning produces vectors that sit far apart. You can picture each vector as a point in a space with hundreds of dimensions.
We cannot draw hundreds of dimensions. So on screen we flatten it to a simple scatter. But the intuition holds. Similar things cluster together while different things spread far apart.
Search becomes the act of finding the nearest points. So what does pgvector actually add to Postgres? Three things and they map cleanly onto the three pieces of any search system. First, storage.
pgvector adds a new column type, called vector. You declare how many numbers each vector holds, and from then on that column stores an embedding in every row, right next to your title, your timestamp, your foreign keys. The embedding is not off in some other system. It is a column in your table.
Second, comparison. pgvector adds distance operators that measure how far apart two vectors are. These are the tools that answer the question, how similar are these two things? Third, speed.
pgvector adds two kinds of approximate index with the names IVFFlat and HNSW. We will spend real time on both later because choosing between them is the most important decision you will make. For now, just hold the shape in your head. A type to store, operators to compare, and indexes to go fast.
That is the whole extension. Let us make this concrete with the smallest possible working example. Three statements take you from nothing to a table holding real vectors. The first statement enables the extension.
You run create extension vector one time per database and from that point Postgres understands the new type and operators. You need the extension installed on the server which most managed providers already offer. The second statement creates an ordinary table. The only new thing is the embedding column declared as a vector with a fixed length.
In this toy example, each vector holds three numbers because three is easy to read on a slide. In a real project, that length matches your embedding model. So, it would be 768 or 1,536. The third statement inserts two rows.
You write each embedding as a list of numbers wrapped in square brackets and passed as text. In production, you would not type these by hand. Your application calls an embedding model, gets back the list of numbers, and passes it straight into the insert. That is all it takes to start storing vectors.
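As a rough sketch of those three statements and of the bracketed text form, the Python below holds the SQL as plain strings and formats lists of numbers as vector literals. The table name items, its columns, and the helper to_vector_literal are assumptions made for illustration, and nothing here connects to a database.

```python
# Hypothetical table and column names; the SQL is shown as text, not executed.
ENABLE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"
CREATE_TABLE = "CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));"


def to_vector_literal(values):
    """Format numbers as the bracketed text form, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# In an application these lists would come from an embedding model.
rows = [[1, 2, 3], [4, 5, 6]]
literals = [to_vector_literal(r) for r in rows]

# Built by string formatting only to keep the sketch short; real code should
# pass the literal as a query parameter through its database driver.
insert_sql = (
    "INSERT INTO items (embedding) VALUES "
    + ", ".join(f"('{lit}')" for lit in literals)
    + ";"
)

for statement in (ENABLE_EXTENSION, CREATE_TABLE, insert_sql):
    print(statement)
```

The vector(3) length matches the toy example; with a real model it would be that model's output size.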
To find the nearest vectors, you need a way to measure distance. And pgvector gives you four. They look like small arrow symbols in SQL, but each one answers the same question in a different way. The first is Euclidean distance, the everyday straight line distance between two points.
The second is the inner product, which measures how much two vectors point in the same direction and how long they are. The third is cosine distance, which ignores length entirely and looks only at the angle between two vectors, which is to say only at direction. The fourth is taxicab distance, also called L1, which adds up the differences along each dimension as if you could only travel along a grid. Which one should you use?
The answer is not about taste. It is about how your embedding model was trained. Most modern text embedding models are trained so that the angle between vectors carries the meaning which makes cosine distance the right default. When in doubt, check your model's documentation and match the operator to what it expects.
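To make the four measures concrete, here is a small pure-Python sketch of what each one computes. The function names are ours, not pgvector's. In SQL the corresponding operators are <-> for Euclidean distance, <#> for inner product (which returns the negative inner product, so that smaller still means closer), <=> for cosine distance, and <+> for L1.

```python
import math


def l2_distance(a, b):
    """Straight-line (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_inner_product(a, b):
    """Negated so that a smaller value means a better match."""
    return -inner_product(a, b)


def cosine_distance(a, b):
    """1 minus the cosine of the angle; vector length is ignored."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def l1_distance(a, b):
    """Taxicab distance: sum of per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice as long
c = [0.0, 1.0]  # perpendicular to a

print(l2_distance(a, b), cosine_distance(a, b))  # length differs, angle does not
print(l2_distance(a, c), cosine_distance(a, c), l1_distance(a, c))
```

Note how a and b are one unit apart by Euclidean distance but have a cosine distance of zero, because they point the same way.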
Here is the moment where vector search stops feeling exotic. To find the most similar rows, you order by distance and take the top few. That is it. There is no special search function to learn.
No separate query language. You write order by the embedding column, a distance operator, and the vector you are searching for. Then limit to however many neighbors you want. Look closely at this query because it shows the real advantage of keeping vectors in Postgres.
There is a WHERE clause filtering to rows from the last 30 days sitting right next to the vector ordering. The similarity search and the ordinary filter run together in one query against one consistent snapshot of your data. In a separate vector database, that filter is a real problem. You would search vectors in one system, fetch identifiers, then go back to Postgres to filter and join and stitch the results together in application code.
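As a sketch of what that single filtered query computes, the Python below applies the 30-day filter and the distance ordering in one pass over in-memory rows. The row layout, the column names, and the SQL in the comment are assumptions for illustration; in Postgres the database does this work, not application code.

```python
import math
from datetime import datetime, timedelta

# Roughly the query being imitated (hypothetical table and columns):
#   SELECT id FROM items
#   WHERE created_at > now() - interval '30 days'
#   ORDER BY embedding <=> '[1,0.1]'
#   LIMIT 2;


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(rows, query, now, max_age_days=30, k=5):
    """Filter by age, order by cosine distance, keep the top k."""
    cutoff = now - timedelta(days=max_age_days)
    recent = [r for r in rows if r["created_at"] > cutoff]
    recent.sort(key=lambda r: cosine_distance(r["embedding"], query))
    return recent[:k]


now = datetime(2024, 6, 1)
rows = [
    {"id": 1, "created_at": now - timedelta(days=2), "embedding": [1.0, 0.0]},
    {"id": 2, "created_at": now - timedelta(days=90), "embedding": [1.0, 0.05]},  # close but too old
    {"id": 3, "created_at": now - timedelta(days=5), "embedding": [0.0, 1.0]},
    {"id": 4, "created_at": now - timedelta(days=10), "embedding": [0.8, 0.6]},
]

result = [r["id"] for r in nearest(rows, [1.0, 0.1], now, k=2)]
print(result)
```

Row 2 is a close match to the query but falls outside the 30-day window, so it never reaches the ordering step.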
Here, the planner handles it for you. Vectors are just another thing your database knows how to sort by. Before we go deeper into indexes, let us look at where all of this is documented. Because the project's own readme is genuinely excellent and you will come back to it constantly.
We are looking at the pgvector repository on GitHub. Right at the top is the one-line description: open-source vector similarity search for Postgres. Scroll down and the first thing you reach is installation, with instructions for compiling from source and for the major package managers and cloud providers.
Keep scrolling and you reach getting started. Notice that it is exactly the three statements we just wrote. Create the extension. Create a table with a vector column.
Insert some rows. Below that, the storing and querying section lays out the distance operators we just covered, each with its symbol and meaning. Now we scroll into the part we are about to study in depth: indexing. Here are the two create index statements.
One for HNSW and one for IVFFlat with all of their options. When you build something real, this page is the reference you keep open in a tab. So far, every query we have written is exact with no index. Postgres compares your search vector against every single row, computes the true distance to each one, and returns the genuine closest matches.
The results are perfectly correct. The problem is the word every. When your table holds a few thousand rows, scanning all of them takes no time at all, and you should not add an index. But as the table grows to millions of rows, that full scan grows right along with it.
Cost rises in a straight line with the number of rows, and eventually each query is simply too slow to serve in real time. This is where approximate nearest neighbor search comes in. The keyword is approximate. Instead of guaranteeing the exact closest matches, an approximate index examines only a clever fraction of the data and returns matches that are almost always the true neighbors.
You give up a tiny measurable amount of accuracy and in exchange you get search that stays fast no matter how large the table grows. That trade is the heart of every vector index and pgvector gives you two ways to make it. pgvector offers two index types and choosing between them is the most consequential decision in this whole topic. Let us see how each one thinks.
The first is IVFFlat. The idea is to divide all your vectors into a number of groups called lists where each group holds vectors that are near each other. At query time, instead of scanning every vector, the index figures out which few groups your search vector is closest to and only searches inside those. Fewer groups examined means a faster query.
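A toy version of that idea fits in a few lines of Python. This is only a sketch: the centroids are fixed by hand rather than learned from the data, and the function names are ours.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_lists(vectors, centroids):
    """Put each vector into the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        lists[nearest].append(v)
    return lists


def search(lists, centroids, query, probes, k):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in order[:probes] for v in lists[i]]
    return sorted(candidates, key=lambda v: l2(query, v))[:k]


centroids = [(0.0, 0.0), (10.0, 0.0)]  # hand-picked for the example
vectors = [(1.0, 0.0), (4.0, 0.0), (6.0, 0.0), (9.0, 0.0)]
lists = build_lists(vectors, centroids)
query = (5.2, 0.0)

print(search(lists, centroids, query, probes=1, k=2))  # one group searched
print(search(lists, centroids, query, probes=2, k=2))  # both groups searched
```

With one group searched, the result contains (9.0, 0.0) even though (4.0, 0.0) is closer to the query, because (4.0, 0.0) landed in the group that was skipped. Searching both groups recovers it.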
IVFFlat builds quickly and uses relatively little memory. But it has one catch we will come back to. It needs to look at your existing data to decide where the group boundaries go. The second is HNSW, which builds a layered graph connecting each vector to its neighbors.
A search starts at the top sparse layer and hops from vector to vector, always moving closer to the target, dropping into denser layers as it homes in. HNSW gives the best balance of speed and accuracy of anything in pgvector. The price is that it builds more slowly and uses more memory. Let us tune IVFFlat because it has exactly two knobs and they are easy to reason about.
The first knob is set at build time when you create the index. It is the number of lists, the number of groups the vectors get divided into. The project recommends a simple rule of thumb for tables up to about a million rows. Set lists to your row count divided by 1,000.
Past a million rows, use the square root of the row count instead. More lists means each group is smaller. So, a query that searches a fixed number of groups touches fewer vectors and runs faster. But it can also miss neighbors that fell into a group you did not search.
The second knob is set at query time and it is called probes. It controls how many of those groups each query actually searches. With probes set to one, you search only the single closest group, which is fast but can miss. Raise probes and you search more groups, which improves recall at the cost of speed.
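The rule of thumb for lists is easy to write down. The sketch below computes a starting value and formats the two statements involved, assuming a hypothetical items table with an embedding column and cosine distance queries. The floor of 1 for very small tables is our assumption, not part of the rule.

```python
import math


def suggested_lists(row_count):
    """rows / 1000 up to a million rows, square root of rows beyond that."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)  # floor of 1 is an assumption
    return math.isqrt(row_count)


def ivfflat_statements(row_count, probes):
    """Return the build-time and query-time statements as text (not executed)."""
    lists = suggested_lists(row_count)
    create_index = (
        "CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) "
        f"WITH (lists = {lists});"
    )
    set_probes = f"SET ivfflat.probes = {probes};"
    return create_index, set_probes


for statement in ivfflat_statements(row_count=500_000, probes=10):
    print(statement)
print(suggested_lists(4_000_000))
```

The probes value of 10 is only a placeholder; the right number is whatever your own recall measurements support.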
Notice also that the index references cosine operations, which ties the index to the distance operator your queries use. The index and the operator must agree. Now, HNSW, which has three knobs. Two are set at build time and one at query time.
At build time, you set m, the number of connections each vector keeps to its neighbors in the graph. The default is 16. More connections make the graph richer and improve recall, but they cost memory and slow the build. You also set ef_construction, the size of the candidate list the builder considers as it wires up each vector.
The default is 64. A larger value builds a higher-quality graph, again at the cost of build time. At query time, you set ef_search, the size of the candidate list the search keeps as it walks the graph.
This is your live recall versus speed dial. Raise it and the search considers more candidates, finds better neighbors, and runs a little slower. HNSW has one quiet but important advantage over IVFFlat. It does not need to study your data before building because the graph is built incrementally as rows arrive.
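For reference, here is a sketch that formats the HNSW statements with the three knobs in place. The items table and embedding column are hypothetical, the default of 40 for the query-time setting comes from the pgvector documentation rather than from this video, and the check that the build-time candidate list is at least twice m reflects our understanding of a constraint pgvector enforces, so treat it as an assumption.

```python
def hnsw_statements(m=16, ef_construction=64, ef_search=40):
    """Return the build-time and query-time statements as text (not executed)."""
    # Assumption: pgvector rejects an ef_construction smaller than 2 * m.
    if ef_construction < 2 * m:
        raise ValueError("ef_construction should be at least 2 * m")
    create_index = (
        "CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
    set_search = f"SET hnsw.ef_search = {ef_search};"
    return create_index, set_search


for statement in hnsw_statements():
    print(statement)

# Raising only the query-time knob leaves the index definition unchanged.
print(hnsw_statements(ef_search=100)[1])
```

Only the SET statement changes when you trade latency for recall; the index itself does not need to be rebuilt.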
That means you can create the index on an empty table and insert afterward, which fits naturally into a normal application. Here is the good news about tuning. Once you have built an index, day-to-day tuning collapses to a single question. How much recall do you need and how much latency will you pay for it?
Both index types give you one query time knob for this. For HNSW, it is ef_search. For IVFFlat, it is probes. In both cases, the behavior is the same. Turn the knob up and each query examines more candidates.
So you find more of the true nearest neighbors and the query takes a little longer. Turn it down and queries get faster but start to miss. Your job is to raise it just until recall is good enough for your users and then stop, because every step beyond that point is latency you are paying for accuracy nobody will notice.
But good enough compared to what? This is the step people skip. To know your real recall, you compare your index's results against the exact answer. Run the same query with no index on a sample of your data.
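The comparison itself is simple arithmetic. Here is a minimal sketch, assuming you have already collected, for each sample query, the ids returned by exact search and the ids returned through the index. One way to get the exact side, as an assumption about your setup, is to turn off the planner setting enable_indexscan for that session. The function names and the sample ids are made up for illustration.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k ids that the index also returned."""
    truth = set(exact_ids)
    if not truth:
        raise ValueError("exact_ids must not be empty")
    return len(truth & set(approx_ids)) / len(truth)


def mean_recall(pairs):
    """Average recall over (exact_ids, approx_ids) pairs, one per sample query."""
    return sum(recall_at_k(exact, approx) for exact, approx in pairs) / len(pairs)


# Illustrative ids only: two sample queries with k = 5.
pairs = [
    ([1, 2, 3, 4, 5], [1, 2, 3, 4, 9]),      # index missed one true neighbor
    ([7, 8, 9, 10, 11], [7, 8, 9, 10, 11]),  # index found all five
]

print(mean_recall(pairs))
```

Recompute this number after each change to the query-time knob, and stop raising the knob once it clears the target you have chosen.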
Treat that as ground truth and measure how many of those true neighbors your index actually returned. Tune against that number, not against a guess. Float vectors are the default, but they are not your only option, and the alternatives matter once your tables get large. The standard vector type stores each number as a full-size 32-bit float. That is accurate, but it is also the most expensive choice for both storage and memory, and it can be indexed up to 2,000 dimensions.
The halfvec type stores each number as a 16-bit float instead. You lose a little numerical precision, which for similarity search almost never matters. And in exchange, you cut storage and memory roughly in half. It also doubles the dimensions you can put under an index up to 4,000, which is exactly what you need for the largest modern embedding models.
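A quick back-of-the-envelope calculation shows the size of that saving. The sketch assumes the per-value sizes pgvector documents, 4 bytes per dimension plus 8 bytes of overhead for vector and 2 bytes per dimension plus 8 for halfvec, and it counts raw embedding data only, not row overhead or indexes. The row count and dimension are example inputs.

```python
def vector_bytes(dims):
    """Approximate size of one vector value: 32-bit floats plus a small header."""
    return 4 * dims + 8


def halfvec_bytes(dims):
    """Approximate size of one halfvec value: 16-bit floats plus a small header."""
    return 2 * dims + 8


rows = 1_000_000  # example table size
dims = 1536       # example embedding length

vector_gb = rows * vector_bytes(dims) / 1e9
halfvec_gb = rows * halfvec_bytes(dims) / 1e9

print(f"vector:  {vector_gb:.3f} GB")
print(f"halfvec: {halfvec_gb:.3f} GB")
```

At a million rows of 1,536 dimensions that is roughly 6 GB against roughly 3 GB of embedding data.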
There are two more specialized types. The bit type stores binary vectors, useful for compact fingerprints compared with Hamming distance. The sparsevec type stores vectors that are mostly zeros, keeping only the nonzero entries, which is ideal for very high-dimensional sparse representations. The guiding rule is simple.
Pick the smallest type that still captures your meaning and your tables and indexes stay lean. pgvector is actively developed and two recent additions are worth knowing because they fix real production pain. The first is iterative index scans. Remember that approximate indexes return a fixed-sized batch of candidates.
If you also have a selective WHERE clause, a filter on date or category or tenant, many of those candidates can be filtered away and you can end up with fewer results than you ask for, even though good matches exist deeper in the index. Iterative scans fix this. The index keeps scanning more of itself in batches until enough rows survive your filter or the index is exhausted.
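The difference is easy to see in a small simulation. This is a sketch of the idea only, not of pgvector's internals: the candidate list stands in for rows already ordered by distance, and the batch size and tenant filter are made up. In pgvector the behavior is switched on with a setting such as hnsw.iterative_scan, set to strict_order or relaxed_order.

```python
def fixed_batch_search(ordered_candidates, passes_filter, k, batch_size):
    """Take one batch of candidates, then filter: may return fewer than k."""
    batch = ordered_candidates[:batch_size]
    return [c for c in batch if passes_filter(c)][:k]


def iterative_search(ordered_candidates, passes_filter, k, batch_size):
    """Keep pulling batches until k rows survive the filter or nothing is left."""
    results = []
    for start in range(0, len(ordered_candidates), batch_size):
        for c in ordered_candidates[start:start + batch_size]:
            if passes_filter(c):
                results.append(c)
                if len(results) == k:
                    return results
    return results


def only_tenant_a(candidate):
    return candidate["tenant"] == "a"


# Candidates already sorted by distance; every fourth one belongs to tenant "a".
candidates = [{"id": i, "tenant": "a" if i % 4 == 0 else "b"} for i in range(1, 21)]

fixed = [c["id"] for c in fixed_batch_search(candidates, only_tenant_a, k=3, batch_size=5)]
iterative = [c["id"] for c in iterative_search(candidates, only_tenant_a, k=3, batch_size=5)]
print(fixed, iterative)
```

The single batch yields one matching row even though three were requested, while the iterative version keeps going and returns all three.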
You turn it on with a single setting and the strict and relaxed variants let you choose whether results must stay in exact distance order. The second is parallel index builds for HNSW. Building a graph over millions of vectors used to be a long single-threaded wait. Recent versions spread that work across multiple worker processes, cutting build time substantially on a multi-core machine.
Builds are fastest when the graph fits in maintenance memory. So giving the build room to work pays off directly. Let us place pgvector inside the application most people are building today.
Retrieval-augmented generation, or RAG. RAG has two phases. The first is ingestion, and it happens ahead of time. You take your documents, split them into chunks and run each chunk through an embedding model to get a vector. Then you store everything in Postgres: the chunk text, its vector and any metadata like source or author.
Because pgvector lives inside Postgres, all of that sits in one row in one table under one transaction. The second phase happens live when a user asks a question. You embed the question with the same model.
Then run the nearest neighbor query we learned earlier to pull the handful of chunks most similar to the question. Those chunks become the context you hand to a large language model which writes an answer grounded in your actual content. Notice what the database is doing here. It is the retrieval layer, the memory of the whole system.
And because it is just Postgres, your permission filters, your joins and your transactions all still apply to that retrieval for free. Let us bring it home. The promise at the start of this video was that for many projects, you do not need a separate vector database. And now you have seen exactly why. pgvector keeps your embeddings in the same Postgres that already holds your rows, enforces your constraints, and runs your transactions.
That means one system to operate and back up instead of two. The same permissions and the same consistent snapshot apply to every query. Your similarity search composes with your ordinary filters and joins because it is all just SQL. The path forward is short.
Enable the extension with create extension vector. Add a vector column sized to your embedding model. Store your embeddings and query the nearest neighbors with order by and a distance operator. While your table is small, run exact search and add no index at all.
When it grows and queries slow down, add an HNSW index, then turn a single query time knob up until recall is good enough, measured against exact search.