
# Cursor Chat: Architecture, Data Flow & Storage
This document explains how Cursor chat works end-to-end: what happens when you type a message, where data is saved, whether embeddings are used, and how the codebase index fits in.
## 1. What Happens When You Type in the Chat Window
When you send a message in Cursor’s chat (Ask, Agent, Composer, Cmd+K, etc.):
1. **Local UI** — The Electron app captures your message, any @-mentions (files, codebase, docs, MCP), rules, and current context (open file, cursor position, etc.).
2. **Context assembly** — Cursor gathers (a sketch of the composed request follows below):
   - System prompt (role, formatting rules, citation format, etc.)
   - User message and optional `<user_query>`
   - Attached context: `<current_file>`, `<attached_files>`, `<manually_added_selection>`, rules, MCP data
   - For codebase-aware requests: semantic search results (see §4)
3. **Request to Cursor servers (not directly to the LLM)** — The request is sent to Cursor’s own backend first. Your API credentials are forwarded there. Cursor does extra processing (auth, logging, prompt assembly, sometimes codebase retrieval) before calling the LLM.
4. **LLM call** — Cursor forwards the composed request to the configured provider (OpenAI, Anthropic, etc.). The model sees a single prompt (system + user + context). Tool use (file read, search, etc.) can add more context during the turn.
5. **Streaming response** — The LLM streams tokens back. Cursor displays them in the UI and may run an “apply model” for edits.
6. **Local persistence** — Once the turn is done, Cursor writes the conversation (messages, metadata) into local SQLite (`state.vscdb`). Chat content is not stored on Cursor’s servers; it’s local only (see §2).
So: You type → Cursor IDE → Cursor servers → LLM provider → response streamed back → local save to SQLite.
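To make step 2 concrete, here is a minimal sketch of context assembly. The tag names come from the list above, but the wrapper function, message shape, and wire format are illustrative assumptions, not Cursor’s actual internals:

```python
# Hypothetical sketch of context assembly; tag names follow the list above,
# but the exact structure Cursor sends over the wire is not public.
def assemble_prompt(system_prompt: str, user_query: str, current_file: str,
                    attached_files: list[str], search_snippets: list[str]) -> list[dict]:
    context = [f"<current_file>\n{current_file}\n</current_file>"]
    context += [f"<attached_files>\n{f}\n</attached_files>" for f in attached_files]
    context += search_snippets  # semantic search results (see §4)
    user_content = "\n\n".join(context + [f"<user_query>\n{user_query}\n</user_query>"])
    # One system + one user message, as most chat-completion APIs expect.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]
```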
## 2. Where Chats and Metadata Are Saved

### 2.1 Storage locations
| OS | Base path |
|---|---|
| Windows | `%APPDATA%\Cursor\User\` |
| macOS | `~/Library/Application Support/Cursor/User/` |
| Linux | `~/.config/Cursor/User/` |
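If you want to locate these databases programmatically, here is a small sketch; the paths mirror the table above, and `cursor_user_dir` is a hypothetical helper name:

```python
import os
import sys
from pathlib import Path

def cursor_user_dir() -> Path:
    """Return the Cursor User directory for the current OS (per the table above)."""
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Cursor" / "User"
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Cursor" / "User"
    return Path.home() / ".config" / "Cursor" / "User"  # Linux

global_db = cursor_user_dir() / "globalStorage" / "state.vscdb"
workspace_dbs = list((cursor_user_dir() / "workspaceStorage").glob("*/state.vscdb"))
```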
Two important subdirectories:

- `globalStorage/` — app-wide state (settings, UI state, composer metadata like the pane list and “last opened” IDs).
  - Main DB: `globalStorage/state.vscdb`
- `workspaceStorage/<hash>/` — per-workspace state. Each workspace has a hashed folder with its own `state.vscdb`.
  - Conversation content typically lives here (e.g. `composer.composerData`, sometimes legacy `workbench.panel.aichat.view.aichat.chatdata`).

So: chat content → usually `workspaceStorage`; sidebar list / UI metadata → `globalStorage` (and sometimes workspace).
### 2.2 Database format: state.vscdb

`state.vscdb` is a SQLite database. VS Code/Cursor use a generic key–value table:
```sql
CREATE TABLE ItemTable (
  key   TEXT PRIMARY KEY,
  value TEXT  -- often JSON stored as text
);
```
- `key` — string identifier (e.g. `workbench.panel.aichat.view.aichat.chatdata`, `composer.composerData`, `workbench.backgroundComposer.persistentData`).
- `value` — JSON blob (or other serialized data) for that key.

Settings, workspace state, and chat data all go into `ItemTable` as key/value pairs.
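For example, a minimal sketch of pulling one key out of `ItemTable` and decoding it, using only Python’s standard library (the key name is from §2.3; the JSON shape varies across Cursor versions):

```python
import json
import sqlite3

# mode=ro: read-only, safe to run while Cursor itself has the DB open.
db = sqlite3.connect("file:state.vscdb?mode=ro", uri=True)
row = db.execute(
    "SELECT value FROM ItemTable WHERE key = ?",
    ("composer.composerData",),
).fetchone()
if row is not None:
    data = json.loads(row[0])   # value is JSON stored as text
    print(list(data.keys()))    # top-level shape varies by Cursor version
db.close()
```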
### 2.3 Important keys (chat / composer)

- `workbench.panel.aichat.view.aichat.chatdata` (global or workspace) — Legacy chat data: `tabs[]`, `bubbles[]` (user/assistant messages).
- `composer.composerData` (workspace) — Current composer/chat content per workspace.
- `workbench.backgroundComposer.persistentData` (global) — Composer UI metadata (`dataVersion`, `lastOpenedBcIds`, etc.). Often small; no conversation content.
- `workbench.panel.composerChatViewPane.<id>` (global or workspace) — Per-chat/tab UI state (e.g. `.hidden`), mapping to composer data.
- `workbench.panel.composerChatViewPane.<id>.hidden` (global) — Visibility for specific panes.
Cursor has moved from `aichat` to `composer`. Newer versions use `composer` keys; old data may still live under `aichat` keys. The sidebar chat list is built from metadata (e.g. in `globalStorage`); when you open a chat, Cursor resolves a “composer data handle” to the workspace-specific `composer.composerData` (or legacy `chatdata`). If that resolution fails, you get “No composer data handle found” and the chat doesn’t load. A sketch of this fallback logic follows below.
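This sketch assumes only the key names listed above; Cursor’s real resolution logic is internal and not public:

```python
import json
import sqlite3

LEGACY_KEY = "workbench.panel.aichat.view.aichat.chatdata"

def load_conversation(workspace_db: str) -> dict | None:
    """Resolve chat content the way §2.3 describes: composer data first, legacy fallback."""
    db = sqlite3.connect(f"file:{workspace_db}?mode=ro", uri=True)
    try:
        for key in ("composer.composerData", LEGACY_KEY):
            row = db.execute("SELECT value FROM ItemTable WHERE key = ?", (key,)).fetchone()
            if row is not None:
                return json.loads(row[0])
        return None  # analogous to Cursor's "No composer data handle found"
    finally:
        db.close()
```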
### 2.4 Example: exporting chat-related data

```sql
SELECT key, value
FROM ItemTable
WHERE key IN (
  'workbench.panel.aichat.view.aichat.chatdata',
  'workbench.backgroundComposer.persistentData'
)
   OR key LIKE 'workbench.panel.composer%'
   OR key = 'composer.composerData';
```

Run this against the relevant `state.vscdb` (global or workspace). Use read-only access if Cursor is open (`?mode=ro`).
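Wrapped in a small script, the same query can dump everything to JSON (a sketch; point `db_path` at the global or workspace `state.vscdb` from §2.1):

```python
import json
import sqlite3

QUERY = """
SELECT key, value FROM ItemTable
WHERE key IN ('workbench.panel.aichat.view.aichat.chatdata',
              'workbench.backgroundComposer.persistentData')
   OR key LIKE 'workbench.panel.composer%'
   OR key = 'composer.composerData'
"""

def export_chat_data(db_path: str, out_path: str) -> None:
    db = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)  # never write to the live DB
    rows = db.execute(QUERY).fetchall()
    db.close()
    dump = {}
    for key, value in rows:
        try:
            dump[key] = json.loads(value)  # most values are JSON text
        except (TypeError, json.JSONDecodeError):
            dump[key] = value              # keep anything else as-is
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(dump, f, indent=2, default=str)

export_chat_data("state.vscdb", "cursor_chats.json")
```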
### 2.5 Metadata

- Stored in the same DB as chat data: `ItemTable`.
- Metadata includes: tab IDs, open/closed state, “last opened” composer IDs, workspace–composer mappings, etc.
- No separate “metadata store”; it’s key/value blobs in SQLite.
## 3. Is There an Embedding Database for Chats?

Cursor does not ship an embedding DB for your chat history.

- Chat history is stored as JSON in SQLite (`state.vscdb`), not in a vector DB.
- Cursor does not embed your past chats to answer new ones via semantic search over history.
Community projects exist to vectorize Cursor chat exports (e.g. with LanceDB) for your own RAG over past conversations. That’s outside Cursor’s built-in design.
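For illustration, here is a minimal DIY sketch of that idea using an in-memory index and cosine similarity. The `embed` function is a toy stand-in; a real setup would use a proper embedding model plus a vector store such as LanceDB:

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedding model (hashed bag-of-words)."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    return v

def build_index(messages: list[str]) -> list[tuple[str, np.ndarray]]:
    # One vector per exported chat message; a vector DB would persist these.
    return [(m, embed(m)) for m in messages]

def search(index: list[tuple[str, np.ndarray]], query: str, k: int = 5) -> list[str]:
    q = embed(query)
    def cosine(v: np.ndarray) -> float:
        return float(np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9))
    return [text for text, v in sorted(index, key=lambda t: cosine(t[1]), reverse=True)[:k]]
```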
## 4. Embedding Database: Codebase Index (RAG)

Cursor does use an embedding-based RAG pipeline for codebase indexing, not for chat history. It powers “Index & Docs”, `@codebase`, and code-aware answers.
### 4.1 High-level flow

1. **Chunking** — Code is split into semantic chunks (functions, classes, logical blocks) using tree-sitter (AST). Chunks respect code structure, not arbitrary character limits.
2. **Embeddings + metadata** — A custom embedding model produces a vector per chunk. Metadata (e.g. masked file path, line range) is stored with each vector. Paths are obfuscated on the client before upload.
3. **Vector store** — Vectors + metadata are stored in Turbopuffer (vector + full-text search, cloud-backed). Chunk hashes can be cached (e.g. in AWS) to speed up re-indexing.
4. **Semantic search at query time** — Your query is embedded with the same model. Cursor searches Turbopuffer and gets back metadata only (masked path + line ranges), not the raw code. The local client de-obfuscates the paths, reads the actual code from your machine, and injects those snippets into the LLM context (a client-side sketch follows below).
So: only embeddings + metadata live in the cloud; source code stays local. Code is sent to the LLM only temporarily, for the specific chunks used in that request.
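A sketch of the client side of that last step, assuming a simple lookup-table path mask; Cursor’s actual obfuscation scheme is not documented:

```python
from pathlib import Path

# Hypothetical mask table kept client-side; Cursor's real scheme is not public.
MASK_TO_PATH = {"f_01": "src/auth/login.py", "f_02": "src/db/models.py"}

def resolve_snippet(masked_path: str, start: int, end: int, repo_root: str) -> str:
    """De-obfuscate a search hit and read the real code from local disk.

    The server only returned (masked_path, start, end) -- never the code itself.
    """
    real_path = Path(repo_root) / MASK_TO_PATH[masked_path]
    lines = real_path.read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1:end])  # 1-based, inclusive line range
```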
### 4.2 Sync and Merkle tree

- Index sync runs periodically (e.g. every few minutes).
- A Merkle tree of file hashes is used to detect changes and re-index only the affected files (sketched below).
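A minimal sketch of the Merkle idea, assuming a flat, sorted file list hashed into a binary tree; Cursor’s actual tree layout is not documented:

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> bytes:
    return hashlib.sha256(path.read_bytes()).digest()

def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    """Pairwise-hash leaves upward until a single root remains."""
    if not leaf_hashes:
        return hashlib.sha256(b"").digest()
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [hashlib.sha256(a + b).digest()
                 for a, b in zip(level[0::2], level[1::2])]
    return level[0]

# If the new root equals the last synced root, nothing changed; otherwise walk
# mismatching subtrees to find exactly which files need re-chunking/re-embedding.
files = sorted(Path(".").rglob("*.py"))
root = merkle_root([file_hash(f) for f in files])
```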
## 5. End-to-end architecture (Mermaid)

### 5.1 When you send a chat message

```mermaid
sequenceDiagram
    participant User
    participant IDE as Cursor IDE Electron
    participant Cursor as Cursor Servers
    participant Emb as Turbopuffer embeddings
    participant LLM as LLM OpenAI Anthropic
    User->>+IDE: Send chat message
    IDE->>IDE: Assemble context (files, rules, MCP)
    IDE->>+Cursor: Request + workspace fingerprint
    Cursor->>+Emb: Semantic search
    Emb-->>-Cursor: Metadata only (paths, line ranges)
    Cursor-->>-IDE: Metadata only, no raw code
    IDE->>IDE: Read actual code from local files
    IDE->>+Cursor: Full prompt (system + user + context)
    Cursor->>+LLM: Composed request
    LLM-->>-Cursor: Streamed tokens
    Cursor-->>-IDE: Streamed response
    IDE->>IDE: Render and save to local SQLite state.vscdb
    IDE-->>-User: Display response
```
### 5.2 Where data lives

```mermaid
graph TB
    subgraph Local[Your machine]
        IDE[Cursor IDE]
        FS[Project files]
        GLOBAL[globalStorage state.vscdb]
        WS[workspaceStorage hash state.vscdb]
    end
    subgraph CursorCloud[Cursor infrastructure]
        API[Cursor API Auth]
        TURBO[Turbopuffer]
    end
    subgraph LLMProviders[LLM providers]
        OAI[OpenAI]
        ANT[Anthropic]
    end
    IDE -->|Read/write chat, metadata| GLOBAL
    IDE -->|Read/write chat, metadata| WS
    IDE -->|Read code for context| FS
    IDE <-->|Auth, prompts, proxy to LLM| API
    API <-->|Vector search metadata only| TURBO
    API -->|Send prompts, receive stream| OAI
    API -->|Send prompts, receive stream| ANT
    style GLOBAL fill:#e1f5fe,stroke-width:4px
    style WS fill:#e1f5fe
    style TURBO fill:#fff3e0
```
- Blue: local SQLite (`state.vscdb`) — chat history + metadata (paths: `globalStorage/`, `workspaceStorage/<hash>/`).
- Orange: Turbopuffer — codebase embeddings + metadata only; no source code.
### 5.3 Chat storage layout (simplified)

```mermaid
graph LR
    subgraph Global[globalStorage]
        PD[persistentData composer UI meta]
        Pane[composerChatViewPane pane list]
    end
    subgraph Workspace[workspaceStorage hash]
        CD[composer.composerData conversation content]
        AICHAT[aichat.chatdata legacy tabs bubbles]
    end
    Sidebar[Sidebar chat list] --> Pane
    Sidebar --> PD
    Open[Open chat] --> Handle[Composer data handle]
    Handle --> CD
    Handle -.->|legacy| AICHAT
    style PD fill:#e1f5fe,stroke:#333,stroke-width:2px,color:#000
    style Pane fill:#e1f5fe,stroke:#333,stroke-width:2px,color:#000
    style CD fill:#e1f5fe,stroke:#333,stroke-width:2px,color:#000
    style AICHAT fill:#fff3e0,stroke:#333,stroke-width:2px,color:#000
```
## 6. Summary

- **What happens when I type in chat?** — IDE → Cursor servers → LLM. Context (rules, @-mentions, codebase search) is assembled; Cursor proxies the request; the response is streamed back and saved locally.
- **Where are chats saved?** — Local SQLite `state.vscdb` in `globalStorage` and `workspaceStorage` (`composer.composerData`, legacy `aichat.chatdata`).
- **Where is metadata saved?** — Same `ItemTable` in `state.vscdb` (e.g. `workbench.backgroundComposer.persistentData`, `composerChatViewPane.*`).
- **Is there an embedding DB for chats?** — No. Chats are JSON in SQLite. You can add your own (e.g. LanceDB) over exported history.
- **Embedding DB for code?** — Yes. The codebase index uses Turbopuffer: embeddings + metadata only; source code stays local.
## 7. References
- Cursor chat overview
- Cursor semantic search
- How Cursor indexes your codebase (chunking, Turbopuffer, Merkle tree)
- Reverse-engineering Cursor’s LLM client (request flow, Cursor as proxy)
- Managing Cursor chat history (state.vscdb, ItemTable, export)

