PostgreSQL extensions

pg_vector

There is no specific code to create an extension inside the Drizzle schema. We assume that if you are using vector types, indexes, and queries, you have a PostgreSQL database with the pg_vector extension installed.

pg_vector is open-source vector similarity search for Postgres

Store your vectors with the rest of your data. Supports:

Column Types

vector

Store your vectors with the rest of your data

For more info please refer to the official pg_vector docs docs.

const table = pgTable('table', {
	embedding: vector({ dimensions: 3 }),
	embedding2: vector({ dimensions: 3 }).default([0, -2, 3])
})
CREATE TABLE "table" (
	"embedding" vector(3),
	"embedding2" vector(3) DEFAULT '[0,-2,3]'
);

halfvec

Half-precision vector type, stores dense vectors of half-precision float values

For more info please refer to the official pg_vector docs docs.

export const table = pgTable('table', {
	halfvec: halfvec({ dimensions: 3 }),
	halfvec2: halfvec({ dimensions: 3 }).default([0, -2, 3])
})
CREATE TABLE "table" (
	"halfvec" halfvec(3),
	"halfvec2" halfvec(3) DEFAULT '[0,-2,3]'
);

sparsevec

Sparse vector type, stores vectors with mostly zero values efficiently

For more info please refer to the official pg_vector docs docs.

export const table = pgTable('table', {
	sparsevec: sparsevec({ dimensions: 3 }),
	sparsevec2: sparsevec({ dimensions: 5 }).default(`{1:-1,3:2,5:3}/5`)
})
CREATE TABLE "table" (
	"sparsevec" sparsevec(3),
	"sparsevec2" sparsevec(5) DEFAULT '{1:-1,3:2,5:3}/5'
);

bit

Bit string type, used for binary vectors in pgvector

For more info please refer to the official pg_vector docs.

export const table = pgTable('table', {
	bit: bit({ dimensions: 5 }),
	bit2: bit({ dimensions: 3 }).default(`101`)
})
CREATE TABLE "table" (
	"bit" bit(5),
	"bit2" bit(3) DEFAULT '101'
);

Indexes

You can now specify indexes for pg_vector and utilize pg_vector functions for querying, ordering, etc.

Let’s take a few examples of pg_vector indexes from the pg_vector docs and translate them to Drizzle

L2 distance, Inner product and Cosine distance for vector

// CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

export const table = pgTable('items', {
    embedding: vector({ dimensions: 3 })
}, (table) => [
  index('l2_index').using('hnsw', table.embedding.op('vector_l2_ops')),
  index('ip_index').using('hnsw', table.embedding.op('vector_ip_ops')),
  index('cosine_index').using('hnsw', table.embedding.op('vector_cosine_ops'))
])

L2 distance, Inner product and Cosine distance for halfvec

// CREATE INDEX ON items USING hnsw (embedding halfvec_l2_ops);
// CREATE INDEX ON items USING hnsw (embedding halfvec_ip_ops);
// CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);

export const table = pgTable('items', {
	embedding: halfvec({ dimensions: 3 })
}, (table) => [
  index('l2_index').using('hnsw', table.embedding.op('halfvec_l2_ops')),
  index('ip_index').using('hnsw', table.embedding.op('halfvec_ip_ops')),
  index('cosine_index').using('hnsw', table.embedding.op('halfvec_cosine_ops')),
])

L2 distance, Inner product and Cosine distance for sparsevec

// CREATE INDEX ON items USING hnsw (embedding sparsevec_l2_ops);
// CREATE INDEX ON items USING hnsw (embedding sparsevec_ip_ops);
// CREATE INDEX ON items USING hnsw (embedding sparsevec_cosine_ops);

export const table = pgTable('items', {
	embedding: sparsevec({ dimensions: 3 })
}, (table) => [
  index('l2_index').using('hnsw', table.embedding.op('sparsevec_l2_ops')),
  index('ip_index').using('hnsw', table.embedding.op('sparsevec_ip_ops')),
  index('cosine_index').using('hnsw', table.embedding.op('sparsevec_cosine_ops')),
])

Hamming distance and Jaccard distance for bit

// CREATE INDEX "hamming_index" ON "items" USING hnsw ("embedding" bit_hamming_ops);
// CREATE INDEX "bit_jaccard_index" ON "items" USING hnsw ("embedding" bit_jaccard_ops);

export const table = pgTable("items", {
    embedding: bit({ dimensions: 3 }),
}, (table) => [
  index("hamming_index").using("hnsw", table.embedding.op("bit_hamming_ops")), 
  index("bit_jaccard_index").using("hnsw", table.embedding.op("bit_jaccard_ops"))
]);

Helper Functions

For queries, you can use predefined functions for vectors or create custom ones using the SQL template operator.

You can also use the following helpers:

import { l2Distance, l1Distance, innerProduct, 
          cosineDistance, hammingDistance, jaccardDistance } from 'drizzle-orm'

l2Distance(table.column, [3, 1, 2]) // table.column <-> '[3, 1, 2]'
l1Distance(table.column, [3, 1, 2]) // table.column <+> '[3, 1, 2]'

innerProduct(table.column, [3, 1, 2]) // table.column <#> '[3, 1, 2]'
cosineDistance(table.column, [3, 1, 2]) // table.column <=> '[3, 1, 2]'

hammingDistance(table.column, '101') // table.column <~> '101'
jaccardDistance(table.column, '101') // table.column <%> '101'

If pg_vector has some other functions to use, you can replicate implementation from existing one we have. Here is how it can be done

export function l2Distance(
  column: SQLWrapper | AnyColumn,
  value: number[] | string[] | TypedQueryBuilder<any> | string,
): SQL {
  if (is(value, TypedQueryBuilder<any>) || typeof value === 'string') {
    return sql`${column} <-> ${value}`;
  }
  return sql`${column} <-> ${JSON.stringify(value)}`;
}

Name it as you wish and change the operator. This example allows for a numbers array, strings array, string, or even a select query. Feel free to create any other type you want or even contribute and submit a PR

Examples

Let’s take a few examples of pg_vector queries from the pg_vector docs and translate them to Drizzle

import { l2Distance } from 'drizzle-orm';

// SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
db.select().from(items).orderBy(l2Distance(items.embedding, [3,1,2]))

// SELECT embedding <-> '[3,1,2]' AS distance FROM items;
db.select({ distance: l2Distance(items.embedding, [3,1,2]) })

// SELECT * FROM items ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
const subquery = db.select({ embedding: items.embedding }).from(items).where(eq(items.id, 1));
db.select().from(items).orderBy(l2Distance(items.embedding, subquery)).limit(5)

// SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
db.select({ innerProduct: sql`(${innerProduct(items.embedding, [3,1,2])}) * -1` }).from(items)

// and more!

postgis

There is no specific code to create an extension inside the Drizzle schema. We assume that if you are using postgis types, indexes, and queries, you have a PostgreSQL database with the postgis extension installed.

As PostGIS website mentions:

PostGIS extends the capabilities of the PostgreSQL relational database by adding support for storing, indexing, and querying geospatial data.

If you are using the introspect or push commands with the PostGIS extension and don’t want PostGIS tables to be included, you can use extensionsFilters to ignore all the PostGIS tables

Column Types

geometry

Store your geometry data with the rest of your data

For more info please refer to the official PostGIS docs docs.

const items = pgTable('items', {
  geo: geometry('geo', { type: 'point' }),
  geoObj: geometry('geo_obj', { type: 'point', mode: 'xy' }),
  geoSrid: geometry('geo_options', { type: 'point', mode: 'xy', srid: 4000 }),
});

mode

Type geometry has 2 modes for mappings from the database: tuple and xy.

type

The current release has a predefined type: point, which is the geometry(Point) type in the PostgreSQL PostGIS extension. You can specify any string there if you want to use some other type

Indexes

With the available Drizzle indexes API, you should be able to write any indexes for PostGIS

Examples

// CREATE INDEX custom_idx ON table USING GIST (geom);

const table = pgTable('table', {
  	geo: geometry({ type: 'point' }),
}, (table) => [
  index('custom_idx').using('gist', table.geo)
])