When Catalyst PSA hit 10 million time entries and queries started taking 30+ seconds, we had to get serious about database optimization. After 27 years of database work optimizing queries across PostgreSQL, SQL Server, MySQL, and MongoDB, here are the techniques that made the biggest impact.
Start with EXPLAIN: Understand Before Optimizing
Never optimize blind. Always start with query execution plans.
PostgreSQL EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT p.id, p.name, COUNT(te.id) as entry_count
FROM projects p
LEFT JOIN time_entries te ON te.project_id = p.id
WHERE p.status = 'active'
GROUP BY p.id, p.name;
Output shows:
GroupAggregate  (cost=125.45..130.12 rows=150 width=520) (actual time=2543.234..2544.012 rows=148 loops=1)
  ->  Hash Join  (cost=12.50..100.00 rows=500 width=512) (actual time=0.234..2500.123 rows=500000 loops=1)
        Hash Cond: (te.project_id = p.id)
        ->  Seq Scan on time_entries te  (cost=0.00..50000.00 rows=10000000 width=16) (actual time=0.012..1500.456 rows=10000000 loops=1)
        ->  Hash  (cost=10.00..10.00 rows=150 width=512) (actual time=0.123..0.123 rows=148 loops=1)
              ->  Seq Scan on projects p  (cost=0.00..10.00 rows=150 width=512) (actual time=0.001..0.050 rows=148 loops=1)
                    Filter: (status = 'active'::text)
Planning Time: 0.234 ms
Execution Time: 2544.123 ms
Key insights:
- Seq Scan: Full table scan (bad for large tables)
- actual time: Real execution time in milliseconds (startup..total)
- rows: Estimated vs. actual row counts; a large mismatch usually means stale statistics
- loops: How many times the operation ran
The sequential scan over 10 million time_entries rows is what's killing performance.
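The first fix is an index on the column the join hits. A minimal sketch, assuming the schema above (the index name matches the one created later in the JOIN section); for a selective lookup such as a single project's entries, the plan flips from a Seq Scan to an index scan:
-- Index the join/filter column
CREATE INDEX idx_time_entries_project_id ON time_entries(project_id);
-- Re-check: a selective query now avoids scanning all 10M rows
EXPLAIN ANALYZE
SELECT COUNT(*), SUM(hours)
FROM time_entries
WHERE project_id = '123';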
Indexing: The Most Important Optimization
Indexes are to databases what a table of contents is to a book.
Basic Single-Column Index
-- Problem: Slow filtering by status
SELECT * FROM projects WHERE status = 'active';
-- Solution: Index on status
CREATE INDEX idx_projects_status ON projects(status);
-- Result: 2500ms -> 15ms
Composite Indexes
Order matters! Most selective column first:
-- Query filtering by tenant and status
SELECT * FROM projects
WHERE tenant_id = '123' AND status = 'active';
-- Good: tenant_id first (more selective)
CREATE INDEX idx_projects_tenant_status ON projects(tenant_id, status);
-- Bad: status first (less selective)
CREATE INDEX idx_projects_status_tenant ON projects(status, tenant_id);
With the good index: 0.5ms. With the bad index: 50ms.
Covering Indexes
Include all columns needed by query:
-- Query needs id, name, status
SELECT id, name, status FROM projects
WHERE tenant_id = '123' AND status = 'active';
-- Covering index - no table lookup needed
CREATE INDEX idx_projects_covering ON projects(tenant_id, status) INCLUDE (id, name);
-- Result: 50ms -> 2ms
PostgreSQL fetches everything from the index without touching the table.
Partial Indexes
Index only relevant rows:
-- Only query active projects
SELECT * FROM projects WHERE status = 'active';
-- Partial index - smaller, faster
CREATE INDEX idx_projects_active ON projects(tenant_id)
WHERE status = 'active';
-- Index is 80% smaller, queries are faster
Index on Computed Columns
-- Query on lowercase email
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Index on expression
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Now query uses index
Query Optimization Techniques
Avoid SELECT *
Select only needed columns:
-- Bad: Fetches all columns (wasteful)
SELECT * FROM projects;
-- Good: Fetch only what you need
SELECT id, name, status FROM projects;
-- Result: 500ms -> 100ms (5x faster)
With 50 columns and millions of rows, this matters.
Use LIMIT
Always limit results:
-- Bad: Returns all 10 million records
SELECT * FROM time_entries ORDER BY created_at DESC;
-- Good: Return first page
SELECT * FROM time_entries ORDER BY created_at DESC LIMIT 20 OFFSET 0;
-- For pagination, use cursor-based:
SELECT * FROM time_entries
WHERE created_at < '2024-01-01T00:00:00Z'
ORDER BY created_at DESC
LIMIT 20;
Cursor-based pagination is faster than OFFSET for large datasets.
Avoid N+1 Queries
Classic ORM problem:
// Bad: N+1 queries (1 + 100 queries)
// (assumes: import { eq } from 'drizzle-orm')
const projectRows = await db.select().from(projects);
for (const project of projectRows) {
  const [client] = await db.select().from(clients).where(eq(clients.id, project.clientId));
  project.client = client;
}
// Good: Single query with join
const projectsWithClients = await db
  .select({
    project: projects,
    client: clients,
  })
  .from(projects)
  .leftJoin(clients, eq(projects.clientId, clients.id));
Result: 100 queries -> 1 query. 5000ms -> 50ms.
Batch Operations
Insert/update in batches:
-- Bad: 1000 separate inserts
INSERT INTO time_entries (project_id, hours) VALUES ('123', 8);
INSERT INTO time_entries (project_id, hours) VALUES ('124', 6);
-- ... 998 more
-- Good: Single batch insert
INSERT INTO time_entries (project_id, hours)
VALUES
('123', 8),
('124', 6),
('125', 7),
-- ... all 1000 rows
;
-- Result: 30 seconds -> 0.5 seconds
With ORMs:
// Drizzle batch insert
await db.insert(timeEntries).values(
  entries.map(e => ({
    projectId: e.projectId,
    hours: e.hours,
  }))
);
Avoid Subqueries When Possible
-- Bad: Correlated subquery (runs for each row)
SELECT p.id, p.name,
(SELECT COUNT(*) FROM time_entries WHERE project_id = p.id) as entry_count
FROM projects p;
-- Good: Join and aggregate
SELECT p.id, p.name, COUNT(te.id) as entry_count
FROM projects p
LEFT JOIN time_entries te ON te.project_id = p.id
GROUP BY p.id, p.name;
-- Result: 8000ms -> 200ms
JOIN Optimization
Choose the Right JOIN Type
-- INNER JOIN: Only matching rows (fastest)
SELECT * FROM projects p
INNER JOIN clients c ON p.client_id = c.id;
-- LEFT JOIN: All left rows + matching right (slower)
SELECT * FROM projects p
LEFT JOIN clients c ON p.client_id = c.id;
-- Use INNER JOIN when possible
Join Order Matters
Smaller table first:
-- Bad: 10M rows joined with 1K rows
SELECT * FROM time_entries te
LEFT JOIN projects p ON te.project_id = p.id;
-- Good: 1K rows joined with 10M rows
SELECT * FROM projects p
LEFT JOIN time_entries te ON p.id = te.project_id;
Modern query planners usually reorder joins on their own, but starting from the smaller table keeps the intent clear. (Strictly speaking, swapping the sides of a LEFT JOIN also changes which unmatched rows are kept, so check the results or use INNER JOIN when reordering.)
Index Foreign Keys
Always index columns used in joins:
CREATE INDEX idx_time_entries_project_id ON time_entries(project_id);
CREATE INDEX idx_time_entries_user_id ON time_entries(user_id);
Without these indexes, joins do full table scans.
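To verify the planner actually uses them, check the plan for a typical join; a quick sanity check, assuming the indexes above:
-- Expect an Index Scan (or Bitmap Index Scan) using idx_time_entries_project_id,
-- not a Seq Scan over time_entries
EXPLAIN
SELECT p.name, te.hours
FROM projects p
JOIN time_entries te ON te.project_id = p.id
WHERE p.id = '123';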
Aggregation Optimization
Use Indexed Columns in GROUP BY
-- Query
SELECT client_id, COUNT(*) FROM projects GROUP BY client_id;
-- Index helps grouping
CREATE INDEX idx_projects_client_id ON projects(client_id);
-- Result: 5000ms -> 50ms
Materialized Views for Complex Aggregations
Pre-compute expensive aggregations:
-- Create materialized view
CREATE MATERIALIZED VIEW project_stats AS
SELECT
p.id,
p.name,
COUNT(DISTINCT te.id) as total_entries,
SUM(te.hours) as total_hours,
COUNT(DISTINCT te.user_id) as team_size
FROM projects p
LEFT JOIN time_entries te ON te.project_id = p.id
GROUP BY p.id, p.name;
-- A unique index is required for REFRESH ... CONCURRENTLY below
CREATE UNIQUE INDEX idx_project_stats_id ON project_stats(id);
-- Query materialized view (instant)
SELECT * FROM project_stats WHERE id = '123';
-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY project_stats;
Complex aggregation: 30s -> 0.01s.
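How the refresh gets scheduled is up to you; one option is doing it inside the database with the pg_cron extension (an assumption on my part, a cron job or an application-level scheduler works just as well):
-- Assumes the pg_cron extension is available
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- Refresh every 15 minutes (interval is illustrative)
SELECT cron.schedule(
  'refresh-project-stats',
  '*/15 * * * *',
  'REFRESH MATERIALIZED VIEW CONCURRENTLY project_stats'
);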
Date Range Queries
Use Date Indexes
-- Query by date range
SELECT * FROM time_entries
WHERE entry_date BETWEEN '2024-01-01' AND '2024-12-31';
-- Index on date column
CREATE INDEX idx_time_entries_date ON time_entries(entry_date);
-- For timestamp queries, use BRIN index for large tables
CREATE INDEX idx_time_entries_created_brin ON time_entries USING BRIN (created_at);
BRIN indexes are tiny and fast for chronological data.
Partition by Date
For massive tables, partition by date:
CREATE TABLE time_entries (
id UUID,
project_id UUID,
entry_date DATE,
hours DECIMAL
) PARTITION BY RANGE (entry_date);
CREATE TABLE time_entries_2024 PARTITION OF time_entries
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE time_entries_2023 PARTITION OF time_entries
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
Queries only scan relevant partitions. 10M row scan -> 1M row scan.
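Two small additions are worth it, both sketches assuming the layout above: a default partition so rows outside the defined ranges still insert, and an EXPLAIN to confirm pruning is actually happening:
-- Catch-all partition for out-of-range dates
CREATE TABLE time_entries_default PARTITION OF time_entries DEFAULT;
-- The plan should list only time_entries_2024, not every partition
EXPLAIN
SELECT * FROM time_entries
WHERE entry_date BETWEEN '2024-01-01' AND '2024-03-31';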
Full-Text Search
Use PostgreSQL's Built-in FTS
-- Add tsvector column
ALTER TABLE projects ADD COLUMN search_vector tsvector;
-- Generate search vector (coalesce so a NULL description doesn't wipe out the vector)
UPDATE projects SET search_vector = to_tsvector('english', coalesce(name, '') || ' ' || coalesce(description, ''));
-- Index it
CREATE INDEX idx_projects_search ON projects USING GIN(search_vector);
-- Query
SELECT * FROM projects
WHERE search_vector @@ to_tsquery('english', 'website & redesign');
-- Automatic updates
CREATE TRIGGER projects_search_update
BEFORE INSERT OR UPDATE ON projects
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', name, description);
Full-text search: 5000ms -> 20ms.
Connection Pooling
Don't create new connections for each query:
// Bad: New connection per query
import { Client } from 'pg';

export async function query(sql: string) {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const result = await client.query(sql);
  await client.end();
  return result;
}
// Good: Connection pool
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // max connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

export async function query(sql: string) {
  return await pool.query(sql);
}
With Drizzle:
import { drizzle } from 'drizzle-orm/node-postgres';
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 20 });
const db = drizzle(pool);
Analyze and Vacuum
PostgreSQL needs maintenance:
-- Update statistics for query planner
ANALYZE projects;
ANALYZE time_entries;
-- Or all tables
ANALYZE;
-- Clean up dead rows
VACUUM time_entries;
-- Full vacuum (locks table)
VACUUM FULL time_entries;
-- Enable auto-vacuum
ALTER TABLE time_entries SET (autovacuum_enabled = true);
We run ANALYZE after bulk imports. Without it, the query planner makes bad decisions.
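A typical bulk-import flow looks like this (the file path and column list are illustrative):
-- Bulk load, then refresh planner statistics immediately
COPY time_entries (project_id, hours, entry_date)
FROM '/tmp/time_entries.csv' WITH (FORMAT csv, HEADER true);
ANALYZE time_entries;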
Monitoring and Profiling
Find Slow Queries
-- Enable slow query log (postgresql.conf)
log_min_duration_statement = 1000 # Log queries > 1s
-- Or use pg_stat_statements extension
CREATE EXTENSION pg_stat_statements;
-- Find slowest queries
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
This showed us that a single report query was taking 80% of database time.
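The same view can show each query's share of total database time, which is where a number like "80% of database time" comes from; a sketch:
-- Share of total execution time per query
SELECT
  query,
  total_exec_time,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 1) AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;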
Monitor Index Usage
-- Find unused indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
We found 15 unused indexes consuming 2GB. Dropped them.
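Dropping one is a single statement; CONCURRENTLY avoids blocking writes while it happens (the index name here is made up for illustration):
-- Check the size first, then drop without locking writes
SELECT pg_size_pretty(pg_relation_size('idx_projects_legacy_lookup'));
DROP INDEX CONCURRENTLY idx_projects_legacy_lookup;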
Results
After applying these optimizations to Catalyst PSA:
- Dashboard load: 30s -> 1.5s (20x faster)
- Report generation: 45s -> 3s (15x faster)
- API average response time: 500ms -> 50ms (10x faster)
- Database size: Reduced 20% by removing unused indexes
- Query throughput: 500 qps -> 2000 qps
Lessons Learned
- Index everything used in WHERE/JOIN: Most important optimization
- EXPLAIN is your best friend: Always check execution plans
- Avoid SELECT *: Select only needed columns
- Batch operations: Never insert/update one row at a time
- Monitor in production: Find slow queries with logging
- Partition large tables: Especially time-series data
- Connection pooling is essential: Never create connections per query
Conclusion
Database optimization is about understanding how databases work: indexes, query plans, joins, and aggregations. With the right indexes and query patterns, even databases with hundreds of millions of rows can be fast.
After 27 years of database work, I've learned that most performance problems come from missing indexes or N+1 queries. Fix those, and you're 90% of the way there.

Jason Cochran
Software Engineer | Cloud Consultant | Founder at Strataga
27 years of experience building enterprise software for oil & gas operators and startups. Specializing in SCADA systems, field data solutions, and AI-powered rapid development. Based in Midland, TX serving the Permian Basin.