What The Link — How I AI-fied My WhatsApp

How I built a WhatsApp-powered bookmark manager with semantic search using Baileys, TanStack Start, Hono, Drizzle, SQLite, and Gemini embeddings.

Have you ever used WhatsApp as a bookmarking tool?

I did. I even have a group with just me where I dump thoughts and links. Blogs. Tweets. Tools. Products. Ideas.

The problem? Finding them again. So I built What The Link.

The Problem

I'm someone who always sends links — products, blogs, brainstorming notes, ideas — everything gets dumped into my WhatsApp personal group for quick access.

But over time it gets flooded. Messy. Hard to search.

You need to remember what you're searching for. There's no semantic search in WhatsApp — and that's the real problem. You have to remember the exact words you used, not what the link was actually about.

So I thought — why not give WhatsApp that ability? AI-ify it. Make it actually smart.

Finding the Right Approach

When the idea first clicked, I thought about piping chats from WhatsApp to Notion. But only paid bot integrations existed. Same story with the WhatsApp Business API — also paid. Not what I needed.

Then I got curious about how OpenClaw handles their WhatsApp integration. That's how I found Baileys — an unofficial WhatsApp socket package where you connect your WhatsApp via QR code and keep the session active. For free.

That was exactly what I wanted.

I didn't go through every file in their big monorepo to understand the codebase. What I did instead was prepend their GitHub link with DeepWiki, which had already indexed the whole repo. I just chatted with it and got the info I needed.

deepwiki.com/[github-repo-url]

Around the same time I got the Claude Max plan for 6 months through Anthropic's open source program. What else do you want, right? I started building, planning, and figuring out the scope.

The Stack

I picked the T3 stack builder to bootstrap the project. The full stack:

  • TanStack Start — full-stack React framework, file-based routing, server functions built in
  • Hono — lightweight backend API layer
  • Drizzle ORM — type-safe, SQL-first. Pairs really well with SQLite
  • SQLite — simple, file-based, no infra overhead for a v1
  • shadcn/ui — for the UI components

The whole thing is type-safe end to end. Drizzle handles the schema and queries, Hono handles the API routes, TanStack Start ties the frontend and backend together. For a solo project it's a really clean setup — not over-engineered, but also not something you'll regret later.

Building It — V1 First

I didn't try to build everything at once.

V1 was just the CRUD and the backend. Baileys listens for incoming WhatsApp messages, extracts any links, and saves them to SQLite via Drizzle. That's it. No AI yet. Just — capture the link, store it, show it in the web UI.

Get the plumbing right first. The smart stuff comes after.

WhatsApp message
   ↓
Baileys socket picks it up
   ↓
Extract the link
   ↓
Save to SQLite via Drizzle
   ↓
Show in the web UI

Once that was working cleanly, I moved to the part I actually wanted to build.

How the WhatsApp Part Actually Works

Baileys gives you a raw WebSocket connection to WhatsApp Web. You scan a QR code once, and the session gets persisted to disk — so on restart, it just reconnects. No QR again.

But it's not plug-and-play. WhatsApp rejects outdated client versions with a 405. So the server fetches the latest WhatsApp Web version from a public repo on startup. If that fails, it falls back to a hardcoded version. Small detail, but without it — the whole thing breaks silently.

Connection state machine

connection === "open"   → good, listening
connection === "close"  → check why
  └─ 405 → outdated version → refetch, retry in 10s
  └─ 440 → another device took over → logout, clear auth, wait
  └─ anything else → auto-reconnect after 5s
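The close-handling above boils down to a small decision function. Here's a sketch of that policy as I'd write it (my own illustration, not the project's actual code — the status codes and delays are the ones described in the post):

```typescript
// Hypothetical sketch of the reconnect policy described above.
// Maps a Baileys close status code to the action the server takes.
type ReconnectAction =
  | { kind: "refetch-version"; delayMs: number } // 405: outdated WA Web version
  | { kind: "logout-and-wait" }                  // 440: another device took over
  | { kind: "reconnect"; delayMs: number };      // anything else

function onConnectionClose(statusCode: number): ReconnectAction {
  switch (statusCode) {
    case 405:
      // Outdated client version: refetch the latest WA Web version, retry in 10s
      return { kind: "refetch-version", delayMs: 10_000 };
    case 440:
      // Session conflict: another device took over, so clear auth and wait
      return { kind: "logout-and-wait" };
    default:
      // Transient drop: plain auto-reconnect after 5s
      return { kind: "reconnect", delayMs: 5_000 };
  }
}
```

Keeping this as a pure function makes the state machine easy to test without a live socket.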

Message pipeline

When a message comes in, it goes through a pipeline:

  1. Ignore your own messages
  2. Filter by allowed group (optional — you can lock it to one group)
  3. Check if it's a command (?help, ?search query)
  4. If not — extract links and save

Link extraction itself is a three-layer fallback:

Layer 1: Regex — fast, free. Grabs any http/https URL from the text.
Layer 2: AI text extraction — if regex finds nothing, ask Gemini to parse the message.
Layer 3: AI vision — if the message has an image (a screenshot, a photo of a screen,
         a QR code), download it and run it through Gemini's vision model.
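Layer 1 is the only one that costs nothing. A rough sketch of what that regex pass could look like (my own illustration — the project's actual pattern may differ):

```typescript
// Layer 1 of the extraction pipeline: grab http/https URLs straight
// from the message text. An empty result is the signal to fall through
// to the AI text layer, and then to the vision layer.
function extractUrls(text: string): string[] {
  const pattern = /https?:\/\/[^\s<>"')]+/g;
  return text.match(pattern) ?? [];
}
```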

Most messages hit Layer 1 and move on. But the vision layer is surprisingly useful — people screenshot tweets and product pages all the time. That would've been lost without it.

URLs also get cleaned up before saving. Tracking parameters like utm_source, fbclid, gclid — 21 of them — get stripped. Trailing slashes removed. URLs normalized. No junk in the database.
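The cleanup step can be sketched with the WHATWG URL API. The post says 21 tracking params get stripped; only the three it names are listed here:

```typescript
// Sketch of the URL cleanup step (illustrative — the real list has 21 params).
const TRACKING_PARAMS = ["utm_source", "fbclid", "gclid"];

function cleanUrl(raw: string): string {
  const url = new URL(raw);
  // Strip known tracking parameters
  for (const p of TRACKING_PARAMS) url.searchParams.delete(p);
  // Normalize: drop a trailing slash on the path (but keep the root "/")
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}
```

Normalizing before insert is what lets the `UNIQUE` constraint on the url column catch duplicates reliably.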

Reactions

When a link is saved, the bot reacts to the WhatsApp message:

  • 🔖 — Saved
  • ⚠️ — Duplicate
  • 📝 — Saved as a note (no URL found, just text)
  • No reaction — nothing useful in the message

Small touch but it makes it feel alive.

The Database — Keeping It Simple

The whole thing runs on one SQLite file. Bookmarks, tags, settings, embeddings — all in one place.

Bookmarks table

CREATE TABLE bookmarks (
  url           TEXT UNIQUE,
  title         TEXT,
  description   TEXT,
  image         TEXT,
  favicon       TEXT,
  domain        TEXT,
  tags          TEXT,      -- JSON array, denormalized
  source        TEXT,      -- 'whatsapp' | 'manual' | 'import'
  whatsapp_message_id TEXT,
  summary       TEXT,      -- AI-generated
  embedding     BLOB,      -- serialized Float32Array
  metadata_status   TEXT,
  summary_status    TEXT,
  embedding_status  TEXT,
  is_archived   INTEGER DEFAULT 0,
  created_at    TEXT,
  updated_at    TEXT
);

Each bookmark has three independent processing pipelines — metadata, summary, and embedding — each with its own status and retry count. They run independently, so a failed summary doesn't block the embedding, and vice versa.

Tags are stored as a JSON array directly on the bookmark. No join table. For this scale it's the right call — simpler queries, simpler code, and SQLite's json_each() handles filtering just fine.

There's also a separate tags table that tracks tag names and usage counts — basically a tag cloud index. And an app_settings key-value table for things like which WhatsApp group to listen to and digest preferences.

No foreign keys. No complex relationships. One file, easy to back up, easy to move.

Then I AI-fied It

After V1 was solid, I added OpenRouter using the OpenAI SDK — so I'm hitting any model I want through one unified API without being locked into one provider.

The AI layer is a singleton client — it only initializes if you set the OPENROUTER_API_KEY. No key? No AI. The app still works, you just get basic keyword search instead of semantic search. Graceful degradation.

When a bookmark is saved, three things happen asynchronously — fire-and-forget style. The user sees the bookmark immediately. The AI stuff runs in the background:

1. Metadata fetch

Cheerio crawls the page — grabs the title, description, OG image, favicon. If the crawl gets blocked (some sites do), it falls back to AI inference from just the URL structure. Like instagram.com/p/XYZ — even without crawling, you know it's an Instagram post.

2. Summary + Tags

The page content gets sent to Gemini via OpenRouter. Two parallel prompts:

  • Summary: "Summarize in 4-5 concise lines." Capped at 2000 characters.
  • Tags: "Suggest 2-4 short lowercase tags." Returns a JSON array.

Tags only auto-generate if the user didn't include hashtags in the original message. If you send #design #tools https://some-link.com, those hashtags become the tags and the AI doesn't override them.
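That precedence rule is simple enough to sketch. Both function names here are hypothetical, not the project's actual helpers:

```typescript
// Sketch of the rule: user hashtags win, AI tags only fill the gap.
function extractHashtags(message: string): string[] {
  const matches = message.match(/#([a-z0-9_-]+)/gi) ?? [];
  return matches.map((m) => m.slice(1).toLowerCase());
}

function resolveTags(message: string, aiTags: string[]): string[] {
  const userTags = extractHashtags(message);
  // Only fall back to AI-suggested tags when the user supplied none
  return userTags.length > 0 ? userTags : aiTags;
}
```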

3. Embedding

Once the summary exists, the app builds an embedding text by concatenating:

Title | Tags: tag1, tag2 | Full summary | Description (truncated to 500 chars)

That combined text gets sent to gemini-embedding-001 via OpenRouter. The embedding comes back as a Float32Array, gets serialized to a binary blob, and stored directly in the SQLite embedding column.
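The serialization round-trip is a few lines in Node. A minimal sketch (assuming the vector is stored raw, little-endian, as the post's "serialized Float32Array" implies):

```typescript
// Serialize a Float32Array into a Buffer for the SQLite BLOB column.
// byteOffset/byteLength matter when the array is a view into a larger buffer.
function embeddingToBlob(vec: Float32Array): Buffer {
  return Buffer.from(vec.buffer, vec.byteOffset, vec.byteLength);
}

// Read the BLOB back into a Float32Array.
function blobToEmbedding(blob: Buffer): Float32Array {
  // Copy into a fresh ArrayBuffer so 4-byte alignment is guaranteed.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}
```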

No separate vector database. No Pinecone. No Qdrant. Just a BLOB column in the same SQLite file.

The Retry Machine

AI APIs fail. Rate limits hit. Pages block crawlers. So every pipeline has a retry system.

Three cron jobs run every 5 minutes:

  • Summary retry — picks up pending or failed summaries (batch of 50), generates summary + auto-tags, chains to the embedding job on success
  • Embedding retry — picks up pending or failed embeddings (batch of 50), generates and caches the embedding
  • Metadata retry — retries failed page crawls

Each bookmark gets 3 retries max per pipeline. After that it's marked failed and left alone — you can manually retry from the UI.

Rate limits get special treatment. When the app hits a 429 from OpenRouter, it pauses the entire batch for 60 seconds, then resumes. No aggressive retries hammering the API.

On startup, it also runs an immediate pass — processes all pending items before the cron kicks in. So after a deploy or restart, everything catches up fast.
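The retry policy itself fits in one function. This is a simplified sketch — the real project runs this per batch from the cron jobs, and the injected `sleep` is my own device to keep the policy testable:

```typescript
// Sketch of the retry policy: up to 3 attempts per pipeline, plus a
// 60-second pause whenever the API answers 429.
type Task<T> = () => Promise<T>;

async function withRetries<T>(
  task: Task<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
): Promise<T | "failed"> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err: any) {
      // Rate limit: pause instead of hammering the API
      if (err?.status === 429) await sleep(60_000);
      // Out of attempts: mark failed and leave it for manual retry in the UI
      if (attempt === maxAttempts) return "failed";
    }
  }
  return "failed";
}
```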

How Semantic Search Actually Works

This is the part that makes the whole thing worth building.

The approach: in-memory embedding cache. On startup, the server loads every bookmark's embedding into memory as Float32Array objects. When you search, it:

  1. Generates an embedding for your query (one API call)
  2. Computes cosine similarity against every cached embedding
  3. Filters results above a 0.55 threshold
  4. Returns top 50, sorted by score

Cosine similarity is just dot product divided by the product of magnitudes. Three lines of math on Float32Array. For a few thousand bookmarks, it runs in about 2ms. No need for a vector index.
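Those three lines of math, written out (a standard cosine-similarity loop over Float32Arrays, assuming equal-length vectors):

```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}
```

Search is then just mapping this over the in-memory cache, filtering scores above 0.55, and taking the top 50.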

If AI isn't configured or the cache is empty, search falls back to SQL LIKE queries across title, description, URL, summary, and even inside the JSON tags array using json_each(). Not as smart, but it works.

Search from WhatsApp

You can search directly from WhatsApp too:

  • ?indie hackers — semantic search
  • ?#design — tag filter
  • ?recent 10 — last 10 bookmarks

Results come back as WhatsApp messages with clickable links. So you don't even need to open the web UI.
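A command parser for that syntax could look like this (a hypothetical sketch, not the project's actual parsing — e.g. it treats ?help as a plain search):

```typescript
// Hypothetical parser for the "?" commands shown above.
type Command =
  | { kind: "tag"; tag: string }
  | { kind: "recent"; limit: number }
  | { kind: "search"; query: string }
  | null;

function parseCommand(text: string): Command {
  if (!text.startsWith("?")) return null; // not a command: treat as a link/note
  const body = text.slice(1).trim();
  if (body.startsWith("#")) return { kind: "tag", tag: body.slice(1) };
  const recent = body.match(/^recent\s+(\d+)$/);
  if (recent) return { kind: "recent", limit: Number(recent[1]) };
  return { kind: "search", query: body }; // default: semantic search
}
```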

The Web UI

In January I came across Zaid Alam's bookmarking tool. He built it for his own use case. I loved the minimal UI so I took some inspiration — and a screenshot — and prompted it straight into Claude Code.

It's not open source so I couldn't clone it. So I just fired more tokens and built the web UI myself using shadcn components.

The frontend is a TanStack Start app — SSR on first load, then SPA navigation. Two routes:

  • Home — the bookmark library. Search bar, tag filters, pagination (25 or 50 per page), data grid with title, domain, date, and actions. Keyboard shortcuts for everything: Cmd+K to focus search, arrow keys to navigate, Delete to remove.
  • Settings — WhatsApp QR code display, group selection dropdown, daily digest toggle with hour picker. Connect, disconnect, reconnect — all from the browser.

Auth is simple. Single password. Cookie-based session that lasts 30 days. Rate limited to 10 failed attempts per IP per 15 minutes. No OAuth, no user management. It's a personal tool.

Deployment

The whole thing ships as a Docker image. Multi-stage build — builder compiles everything, runner is Alpine with just the compiled output.

docker-compose up -d

One volume mount for /data — that's where the SQLite database and the WhatsApp auth session live. Survives restarts, easy to back up, easy to migrate.
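A compose file matching that description might look like the following — illustrative only; the service name, port, and image details are assumptions:

```yaml
services:
  whatthelink:
    build: .
    restart: unless-stopped            # auto-restart unless explicitly stopped
    ports:
      - "3000:3000"                    # assumed app port
    volumes:
      - ./data:/data                   # SQLite DB + WhatsApp auth session
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}   # optional: no key, no AI
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 30s
```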

There's also a one-liner install script for VPS setups. It checks for Docker, asks for your password, builds the image, and starts the container. I'm running mine on Oracle Cloud's Always Free ARM instance — 24GB RAM for a SQLite app is hilariously overkill but hey, it's free.

Health check pings /health every 30 seconds. Container auto-restarts unless you explicitly stop it.

Architecture at a Glance

┌──────────────────────────────────────────────────────────────────┐
│                          WhatsApp                                │
│                     (Baileys Socket)                             │
│    QR Auth ←→ Session Persistence ←→ Message Listener           │
└──────────────┬───────────────────────────────────────────────────┘
               │ messages.upsert

┌──────────────────────────────────────────────────────────────────┐
│                    Link Extraction Pipeline                      │
│         Regex → AI Text → AI Vision (3-layer fallback)          │
└──────────────┬───────────────────────────────────────────────────┘


┌──────────────────────────────────────────────────────────────────┐
│                     SQLite (via Drizzle)                         │
│              bookmarks │ tags │ app_settings                     │
│                    Single .db file                                │
└──────────┬────────────────────────────────┬──────────────────────┘
           │                                │
           ▼                                ▼
┌─────────────────────────┐   ┌─────────────────────────────────┐
│   AI Processing (Async) │   │        Hono API Server          │
│  ┌───────────────────┐  │   │  /api/bookmarks  (CRUD+search)  │
│  │ Metadata (Cheerio) │  │   │  /api/whatsapp   (QR+status)   │
│  │ Summary (Gemini)  │  │   │  /api/settings   (config)       │
│  │ Tags (Gemini)     │  │   │  /api/login      (auth)         │
│  │ Embedding (Gemini)│  │   └──────────┬──────────────────────┘
│  └───────────────────┘  │              │
│  Retry cron: */5 * * *  │              ▼
└─────────────────────────┘   ┌─────────────────────────────────┐
                              │     TanStack Start (React)       │
           ┌──────────────────│  SSR + SPA │ shadcn/ui           │
           │                  │  Search │ Settings │ Auth         │
           ▼                  └─────────────────────────────────┘
┌─────────────────────────┐
│  In-Memory Embedding    │
│  Cache (Float32Array)   │
│  Cosine Similarity      │
│  ~2ms search time       │
└─────────────────────────┘

That's the story. Built out of frustration, unlocked by one good insight, and shipped with a lot of tokens.