Why My Bot Needs a Database: Pool Metadata and TX Templates
The Moment I Realized I Needed a Database
For a long time I told myself my bot did not really need a local database. Everything I cared about was on-chain. Pool addresses, token mints, reserves, fees — all of it lives on Solana, queryable through any RPC endpoint. Why bother duplicating it on my own disk?
The answer arrived the way most painful answers do: through a stopwatch. I instrumented every RPC call my bot made during a single scan loop, totaled them up, and realized I was spending more time talking to a remote endpoint than actually thinking about trades. The bot was technically functional. It was also, charitably, a slow loris with a wallet.
This article is about the moment I gave in and added persistence — not because I wanted to build a database, but because I could not afford to keep being the customer who orders the same coffee at the same Starbucks every morning and watches the barista re-learn the recipe each time.
What "Pool Metadata" Actually Means
Before I justify the database, let me describe what I am storing. On Solana, an automated market maker pool is a specific on-chain account belonging to a DEX program. That account holds a structured blob of data. Some of it is essentially permanent: the pool's address, the two token mints it trades, the vault accounts that hold reserves, the LP token mint, the fee tier, the program ID of the DEX itself. Some of it changes constantly: the current reserve amounts, the active liquidity range, the latest swap price.
This distinction — static versus dynamic — is the entire game. The static fields never change for the life of the pool. The dynamic fields change every block. Treating them the same way is what gets your bot rate-limited.
For every pool I want to trade, I keep a row with the static fields: pool ID, DEX name, token A mint, token B mint, token A vault, token B vault, fee rate, pool type, program ID, LP mint, and a fetched_at timestamp marking when I last reconciled with the chain. Dynamic state lives somewhere else with its own refresh policy.
It sounds obvious written down. It was not obvious to me three weeks ago.
Why I Stopped Asking RPC for the Same Thing Twice
A developer breakdown of how Solana trading bots work in production puts the latency picture bluntly: a public RPC endpoint typically responds in around 200 milliseconds, while a private endpoint sits in the 20 to 50 millisecond range. An infrastructure guide focused on low-latency Solana setups is even sharper on the MEV stakes, framing every extra 30 to 50 milliseconds on Solana as a direct translation into reduced profits.
Meanwhile, a local SQLite read against an indexed table — especially one served out of an in-memory page cache — lands in under a millisecond. That is not a small improvement. That is the difference between a Domino's delivery driver and a guy who already lives in your apartment.
Multiply that delta by every pool I scan per loop. Multiply it by every trade evaluation. Multiply it by the number of times I need a token's decimals just to render a number correctly. The cost compounds the way late fees compound on a credit card you forgot about.
The same infrastructure guide frames the workaround in one line: "To optimize Solana trading bots, cache frequent queries (such as token metadata) locally to avoid throttling." That is the entire thesis of this article in a sentence.
Why SQLite, of All Things
When you say "database" in 2026, people assume Postgres, or some hosted thing with a logo and a free tier. I went the other direction. SQLite. A single file on disk. No server process. No daemon to babysit. No port to leave open by accident.
Liora.io's overview captures why this fits a single-host trading bot perfectly: SQLite is described there as a lightweight engine that demands fewer resources than a non-embedded database and runs comfortably in resource-constrained environments. It is the duct tape of databases, in the best possible sense — the U-Haul of persistence, if U-Hauls came with ACID guarantees.
The practical wins for a bot author:
- Zero setup. No
initdb, no users, no roles, nopg_hba.conf. The file exists or it does not. - One file to back up. I can
cpit. I canscpit. I can stick it in cloud storage like it is a vacation photo. - ACID by default. Sudden crashes do not eat my trade history. The official docs are emphatic about this.
- Lives next to my process. No network hop, no serialization across a wire, no "is the database up?" question to answer at 2 a.m.
A tutorial for building a Telegram-style swap bot makes the same call for the same reasons: "The bot uses a local SQLite database (via better-sqlite3) to store user data securely and ensure persistence across sessions. It's lightweight but powerful — giving the bot instant access to user settings, wallet info, and swap history." Different bot, same logic.
Is SQLite the right answer for every trading workload? No. A separate Medium write-up on TimescaleDB for Solana DEX analytics describes a system targeting more than one million transactions per second of tick ingestion. That is not my world. My world is a few hundred pools, occasional inserts into a trade log, and a metric ton of reads. SQLite was built for exactly that shape of work.
The PRAGMA That Made Everything Tolerable
The first time I ran SQLite under bot load, I was unimpressed. Writes felt sluggish. Concurrent reads stalled. I assumed I had picked the wrong tool.
I had picked the right tool. I had picked the wrong settings.
The single most important configuration change for a bot is enabling WAL mode — Write-Ahead Logging. The official SQLite documentation explains it cleanly: "The original content is preserved in the database file and the changes are appended into a separate WAL file." Instead of writing each change twice through a rollback journal, the engine writes once, sequentially, to a separate log file. Then it folds those changes back into the main database at checkpoint time.
The payoff is the line that mattered most to me: "… Readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently." In a bot that has one thread inserting trades while another fifty are reading pool metadata, that property is not a nice-to-have. It is the whole point.
WAL is not a silver bullet. The docs are honest about the limits: all processes touching the database must be on the same host, so this does not work over a network share. WAL also stops being a win for transactions larger than around 100 megabytes — at that size, the traditional rollback journal becomes faster. Neither limitation applies to my workload. My bot is one process on one machine, and my transactions are tiny.
Alongside WAL, I tuned a handful of other PRAGMAs the way most production guides recommend: synchronous set to NORMAL (safe under WAL, faster than FULL), a generous page cache, temp tables in memory, memory-mapped I/O enabled, and foreign keys enforced. I will not list the exact values because the right numbers depend on your machine and workload. The point is that out-of-the-box SQLite is tuned for correctness over speed. A bot needs both.
TX Templates: The Part Nobody Talks About
Pool metadata is the obvious thing to cache. Transaction templates are the less obvious thing, and they were a bigger win for me.
A Solana swap transaction is not a simple object. It is a structured payload that includes: an ordered list of accounts the instruction will touch, instruction data formatted to the DEX's expectations, a compute budget instruction, a priority fee instruction, optional Address Lookup Table references, a recent blockhash, and a signature. Building this from scratch every time the bot wants to trade is wasteful, because most of the payload does not change between trades against the same pool.
A developer breakdown of how Solana trading bots work in production describes the same shape: when an event is detected, the bot constructs and signs a transaction the instant it can, leaning on prebuilt instructions and tuned priority fees to outpace competing traders. An infrastructure guide is even more direct, naming "… pre-signed transactions (eliminating runtime construction overhead)" as a core performance technique for competitive Solana bots.
The insight is that a transaction template separates the constant parts from the variable parts:
Constant per pool (cache it):
- The account list — pool, vaults, program ID, system program, token program
- The instruction discriminator and data layout for the DEX
- ALT references (a guide on Jupiter notes that "Address lookup table accounts can be fetched, which can be particularly useful for Token Swap instructions, as it allows passing many accounts to the transaction")
- Compute unit limit
- DEX-specific account ordering
Variable per trade (fill at execution):
- Recent blockhash (Solana's blockhash validity window is roughly 60 to 90 seconds, so this has a short shelf life)
- Amount in
- Slippage-adjusted minimum amount out
- Priority fee, tuned to current congestion
The template lives in the database next to the pool metadata. The bot loads it at startup, keeps it in memory, and on each trade swaps in the four variable fields, signs, and ships. The difference between this and rebuilding from scratch is the difference between assembling IKEA furniture once and being handed a finished bookshelf every time you need one.
The Hybrid Cache Pattern
I do not actually read from SQLite during the hot path. I read from a process-memory dictionary that was populated from SQLite at startup. SQLite is the backing store; RAM is the serving layer.
A forum on SQLite has a clean writeup on caching strategies that describes three tiers: pure in-memory, query result caching, and hybrid. The hybrid pattern is the one that fits a bot: "Data survives application restarts" through SQLite persistence, while hot reads stay in memory. The risk of a pure in-memory cache is that a crash or restart wipes it. The risk of pure SQLite-on-the-hot-path is latency, even if it is sub-millisecond. The hybrid splits the difference and gets both properties.
Writes work the other direction. When I discover a new pool or update a token's metadata, I write through to SQLite first, then update the in-memory cache. If the write fails, the in-memory state never updates, and I avoid the situation where the bot believes something that disk does not.
For dynamic fields — reserve amounts, current price — I never trust the SQLite copy as authoritative. Those fields exist in the database only as a debugging aid and a warm-start hint. The real source of truth is the on-chain state, refreshed via a streaming subscription. The same infrastructure guide articulates the principle: "Subscribe only to the accounts and programs your agent needs — specific AMM pools, lending protocol positions, target wallets — filtering at the source eliminates wasted bandwidth and CPU overhead." Stream what changes, cache what does not, never confuse the two.
Schema Discipline: Static, Dynamic, Append-Only
When I sat down to design the schema, I forced myself to put every column in exactly one of three buckets:
Static metadata. Pool addresses, mints, decimals, program IDs, fee tiers. Written rarely. Read constantly. Indexed on the keys I actually look up by.
Dynamic state snapshots. Reserves, prices, liquidity depth, with a
last_updatedtimestamp. Read with awareness that they might be stale. Refreshed by background subscriptions, not by the request path.Append-only logs. Trade history, signal events, error records. Never updated, never deleted in normal operation. New rows only.
A forum post on storing immutable audit records calls out the discipline directly: "Records are never updated. Each row represents a point in time." That mental model — the log table as a stack of timestamped facts that no one is allowed to rewrite — is the closest a relational database gets to the blockchain mindset I already live in. It also makes the audit story trivial.
The production reference I keep going back to is an open-source Python crypto bot. Its public SQL cheat sheet documents a three-table core — trades, orders, trade_custom_data — with foreign keys enforced and is_open boolean flags for fast filtering of active versus closed positions. The schema is older and battle-tested. I did not copy it, but I let its shape influence mine. There is no shame in standing on the shoulders of someone else's debugged migrations.
The Latency Math, in Plain Terms
Here is the rough budget that pushed me over the line. Numbers from the research on public versus private Solana RPC endpoints, plus my own observation of local SQLite reads.
- A public RPC
getAccountInfocall to fetch pool state: roughly 200 milliseconds. - The same call against a private endpoint: roughly 20 to 50 milliseconds.
- A local SQLite indexed read, served from cache: well under one millisecond.
For a scan that touches dozens of pools, this is the difference between a loop that completes in a heartbeat and one that completes in a sigh. And it is before you account for rate limits, which public endpoints enforce aggressively. The first time I hit a 429 in the middle of a promising signal, I understood viscerally why caching is not an optimization. It is a precondition.
The trade-off, of course, is staleness. The on-chain state has moved on by the time my cached copy serves a read. For static fields this does not matter — they have not changed since 2024. For dynamic fields it matters enormously, which is why dynamic data is never cache-first. It is stream-first, with the database as a warm-start aid and a debugging trail.
What the Database Buys Me That Memory Cannot
A bot can run entirely in memory. I ran mine that way for weeks. Once I had persistence, I noticed how much I had been quietly accepting:
- Pool discovery survives restarts. I do not re-scan the chain for the same set of pools every time the process boots. The discovery work is amortized across the lifetime of the database file, not the lifetime of the process.
- Token decimals are always available. Without a cache, every amount calculation can stall on a
getAccountInfoto a mint account. With a cache, decimals are a primary-key lookup that always hits. - TX templates do not have to be rebuilt. The account ordering, the ALT references, the instruction layout — all loaded once, served forever.
- Trade history is real. P&L accounting does not vanish when the process dies. I can ask questions about last week's trades without resurrecting a log file from cold storage.
- Idempotency becomes cheap. Storing a hash of an event I have already acted on, then checking that hash before acting again, is a one-line
INSERT ... ON CONFLICT IGNORE. Without a database, idempotency across restarts is a hand-rolled mess. - Configuration lives somewhere. Per-pool slippage tolerance, per-DEX priority fee multipliers, wallet metadata — all of it gets a home that is not a YAML file I forget to commit.
None of this is glamorous. None of this is the kind of work that makes a good demo. It is the plumbing that lets the demo not embarrass you.
Where SQLite Stops Being the Answer
I am not religious about this. SQLite is the right tool for the bot I am building, at the scale I am operating, on the host I am running. It will not be the right tool forever.
The write ceiling matters. SQLite is generally well-suited to read-heavy workloads with modest write rates, not to high-frequency append paths that hammer the same table thousands of times per second. For a bot that logs every market event into the database, that ceiling is closer than you think. Batching helps. Sampling helps more. If neither helps, the answer is to move the high-frequency stream somewhere else and keep SQLite for the metadata and trade history that actually benefits from a relational shape.
Multi-process access is the other cliff. WAL mode does not work over a network filesystem, and if you ever need multiple machines reading and writing the same database, you have outgrown the tool. The cleanest path is to keep SQLite for what it is good at and add a different store for what it is not.
For anything resembling serious tick-data analytics, the Medium piece on TimescaleDB-backed Solana analytics is a useful sketch of the other end of the spectrum. That world has its own complexity tax. I am not paying it yet.
What Changed for the Bot After This Week
The scan loop tightened. I am not going to share the exact numbers — that is the kind of detail competitors actually use — but the qualitative change was unmistakable. The loop went from "I can read each iteration as it scrolls by" to "I have to add deliberate logging to see what is happening." Rate-limit warnings stopped showing up in the logs. Cold starts became warm starts: the bot wakes up already knowing every pool it has ever cared about, every token mint it has ever priced, every transaction template it has ever assembled.
What I keep coming back to is that none of this required a clever idea. It required taking seriously the gap between "this works" and "this works at the speed Solana actually moves at." The chain does not slow down to wait for you to look up a token's decimals. Solana does not have a mempool, but the leader does not slow down either. You either come to the table with your homework already done, or you do not.
Adding a database is the bot equivalent of doing your homework before class. It is unglamorous. It is also, increasingly, the only way I can stay in the room.
Key Takeaways
- Pool metadata and token decimals are static and should be cached locally — re-fetching them from RPC on every trade is the most common avoidable latency in a Solana bot.
- SQLite in WAL mode gives a single-process bot ACID persistence, concurrent reads, and sub-millisecond lookups with essentially no operational overhead.
- Transaction templates separate constant parts (account list, instruction layout, ALT references) from variable parts (blockhash, amount, priority fee), letting the bot ship trades faster by assembling less at runtime.
- A hybrid cache — in-memory for hot reads, SQLite as the persistent backing store — keeps latency low while making cold starts feel like warm ones.
- Static, dynamic, and append-only data each deserve their own schema discipline: cache aggressively, refresh deliberately, and never rewrite the log.
Disclaimer
This article is for informational and educational purposes only and does not constitute financial, investment, legal, or professional advice. Content is produced independently and supported by advertising revenue. While we strive for accuracy, this article may contain unintentional errors or outdated information. Readers should independently verify all facts and data before making decisions. Company names and trademarks are referenced for analysis purposes under fair use principles. Always consult qualified professionals before making financial or legal decisions.