The announcement went out on a Friday afternoon in June.
No blockbuster deal price. No press conference. No ticker tape. Just a blog post on OpenAI's website — clean, spare, almost clinical — announcing that it had acquired Rockset, a database startup that had raised $105 million from Sequoia, Greylock, and Redpoint, spent seven years building something genuinely new in the database world, and served customers like Walmart, Cisco, Klarna, and athenahealth.
The blog post was short. It said OpenAI would use Rockset to power its retrieval infrastructure — the systems that let its products find and serve relevant information at scale. The Rockset team would join OpenAI. The Rockset product would be sunset. Existing customers were advised to begin migrating off.
In the database world, the community held its breath for a moment, then exhaled into a long, complicated conversation. Most of the technology press treated it as a footnote — a mid-sized acqui-hire, a talent grab, a deal in the shadow of billion-dollar rounds and GPT-5 speculation. In the AI world, where ten things happen before breakfast, it passed by in about a news cycle.
But buried inside that announcement was something worth understanding carefully: the most important AI company in the world had decided that its retrieval infrastructure problem — the problem of making AI agents find the right information at the right moment, at massive scale, in milliseconds — was too important to solve with off-the-shelf tools. It had decided this problem was worth acquiring an entire company to fix. A company that had spent seven years solving, from first principles, the exact thing AI now urgently needed.
This is the story of that company. Of the gap it saw, the architecture it built to close it, and the strange, fitting ending it found.
The Scale Nobody Talks About
Before there was Rockset, there was a server room in Menlo Park and a man named Venkat Venkataramani trying to make sure a billion people could see their news feed in under a second.
Venkat joined Facebook in the mid-2000s, during the years when the platform was making the terrifying leap from college social network to global infrastructure. He worked on the infrastructure powering Facebook's ads systems and feed ranking — the plumbing that decided, thousands of times per second, what content each user should see, which ads were relevant, which signals to weight, which data to act on.
This was not casual engineering. Facebook's infrastructure in those years was the subject of engineering papers, conference talks, entire careers. The systems Venkat worked on had to process real-time signals — clicks, impressions, engagement — and combine them with historical data — user profiles, behavioral patterns, content metadata — and do this millions of times per second, with sub-second latency, for one of the most visited properties on the internet.
Running infrastructure at that scale teaches you things that cannot be learned in a classroom or a startup. You learn that databases lie. They promise you they can do everything — transactions, analytics, search — and what they actually give you is trade-offs. You learn that every database design choice is, at its core, a decision about which questions you want to answer fast. You learn, eventually, that the industry has built two completely separate worlds to handle two completely separate types of questions — and that there is a dangerous, expensive gap between them.
The Gap
The two worlds have names. OLTP: Online Transaction Processing. OLAP: Online Analytical Processing. The names are dry, but the distinction is real and enormous.
OLTP systems — MySQL, PostgreSQL, the operational databases — are built for writes. They handle transactions. They're optimized for the kind of work that happens when a user clicks a button: update this row, insert that record, read back this small slice of data. They are fast at individual operations. They are terrible at questions that span millions of rows.
OLAP systems — data warehouses like Snowflake, BigQuery, Redshift — are built for reads. Huge, sweeping analytical queries. They're optimized for the kind of work that happens when a business intelligence team runs a report: scan all transactions from the last six months, aggregate by region and product, compute percentages. They are brilliant at bulk analysis. They are terrible at freshness.
Data warehouses are, almost by design, stale. Data arrives in batches — hourly, daily. If you run a query at 9 AM, you might be looking at data from 8 PM the previous night. For traditional business intelligence, that was fine. For operational decisions that needed to happen in real time — fraud detection, recommendation systems, financial risk scoring, real-time dashboards — it was a fundamental limitation.
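The contrast between the two worlds can be sketched with a pair of queries against a hypothetical orders table (using SQLite here purely for illustration): the point lookup is the OLTP shape, and the full-table aggregation is the OLAP shape.

```python
import sqlite3

# Hypothetical orders table, used only to contrast the two access patterns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "west" if i % 2 else "east", float(i)) for i in range(1000)],
)

# OLTP-style work: touch one row by key. Row-oriented stores answer this
# quickly because the primary-key index points straight at the record.
row = conn.execute("SELECT amount FROM orders WHERE id = ?", (42,)).fetchone()

# OLAP-style work: sweep every row and aggregate. Column-oriented stores
# excel here because they read only the columns the query actually needs.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

The same engine can run both queries, but at production scale each shape wants a different physical layout — which is exactly the split the OLTP and OLAP worlds formalized.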
The gap between OLTP and OLAP was not new. Engineers had known about it for years. They'd built workarounds: pre-compute queries, maintain caches, run complex ETL pipelines, cobble together Redis and Kafka and Spark in intricate Rube Goldberg arrangements that required teams of engineers to maintain and still, somehow, failed to give them what they actually needed: fresh data, queryable at low latency, at the scale of production traffic.
Venkat had seen this problem from the inside. He had lived it. At Facebook's scale, the consequences of that gap were measured in revenue, in user experience, in infrastructure costs in the tens of millions. He had watched engineers build elaborate workarounds. He had watched the workarounds break. He had watched teams rebuild them.
And he had started to believe that no one had built the right solution because everyone was building in the wrong place.
The Founding Thesis
Venkat Venkataramani left Facebook and founded Rockset in 2016. The company's headquarters went up in San Mateo, California — the same valley corridor where Snowflake had launched a few years earlier to do for batch analytics what Rockset intended to do for real-time.
The founding thesis was precise: there is a category of workloads that neither OLTP nor OLAP serves well. They need fresh data — data updated in seconds, not hours. They need analytical queries — not just point lookups but aggregations, joins, filters across large datasets. They need low latency — response times measured in milliseconds, suitable for serving end users or powering application features. And they need to handle the scale of production traffic — not just a few queries per minute from a BI team but hundreds or thousands per second from applications.
This category had a name: real-time operational analytics. And no existing database system was architected specifically for it.
The Converged Index
The solution Venkat and his team designed was called the converged index. It is the technical core of everything Rockset built, and it is worth explaining carefully because it is genuinely unusual.
Most databases make a choice when they store data. They organize it in rows — great for lookups, terrible for scanning columns across millions of records. Or they organize it in columns — great for analytics, terrible for random access. Some systems have tried to maintain both, but doing so adds complexity and cost, and the tradeoffs remain.
Rockset did something different. When data arrived — from a database stream, a message queue, an object store, a REST API — Rockset automatically indexed it in three ways simultaneously. Columnar indexing, for analytical scan performance. Row indexing, for low-latency point lookups. And inverted indexing, the technique search engines use to make arbitrary text queries fast.
The result: whatever query you wrote, Rockset had an index built for it. No pre-optimization required. No query planning gymnastics. No decisions the developer had to make upfront about what questions they'd want to ask. You put data in, and Rockset made it queryable from every angle.
This had a practical consequence that differentiated Rockset sharply from everything else on the market: you didn't have to know your query patterns in advance. Traditional data warehouses require you to think carefully about partitioning, clustering, materialized views — all techniques that essentially mean "I know I'll ask this kind of question, so I'll pre-arrange my data to answer it." Rockset's converged index meant you could be exploratory. You could ask questions you didn't know you'd need to ask.
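A toy sketch of the idea follows, with hypothetical names and none of Rockset's actual implementation details: each ingested document is indexed three ways at once, so point lookups, aggregations, and text search each get a purpose-built path without any upfront schema decisions.

```python
from collections import defaultdict

class ConvergedIndex:
    """Toy sketch of the converged-index idea: every document is indexed
    three ways on ingest, so any query shape has a fast path.
    Illustrative only — not Rockset's actual implementation."""

    def __init__(self):
        self.rows = {}                    # row index: id -> full document
        self.columns = defaultdict(dict)  # columnar index: field -> {id: value}
        self.inverted = defaultdict(set)  # inverted index: token -> {ids}

    def ingest(self, doc_id, doc):
        # One write updates all three indexes simultaneously.
        self.rows[doc_id] = doc
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            if isinstance(value, str):
                for token in value.lower().split():
                    self.inverted[token].add(doc_id)

    def point_lookup(self, doc_id):       # served by the row index
        return self.rows.get(doc_id)

    def aggregate(self, field):           # served by the columnar index
        return sum(self.columns[field].values())

    def search(self, token):              # served by the inverted index
        return self.inverted.get(token.lower(), set())

idx = ConvergedIndex()
idx.ingest(1, {"title": "fraud alert", "amount": 120.0})
idx.ingest(2, {"title": "routine payment", "amount": 30.0})
```

The storage cost is paid once at write time; in exchange, no query arrives that the system has not already indexed for.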
The Real-Time Pipeline
The second architectural decision was about freshness. Rockset built native connectors into the streaming data sources that mattered: Kafka, DynamoDB streams, MongoDB change streams, MySQL binlogs, S3, and more. Data flowing through these systems was ingested into Rockset continuously — not in batch jobs that ran hourly, but in near real-time. The lag between a write in the source system and the data being queryable in Rockset was measured in seconds, sometimes sub-second.
For a fraud detection team at a fintech, this meant they could query transactions that happened five seconds ago. For an e-commerce company, it meant inventory levels and pricing data were never more than seconds out of date. For a healthcare company building clinical dashboards, it meant patient data was fresh without requiring a team of data engineers to maintain pipeline infrastructure.
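The difference from batch ETL can be sketched with an in-memory stand-in for a change stream (all names here are illustrative, not Rockset's connector API): each event becomes queryable moments after it is written, rather than waiting for an hourly job.

```python
import queue
import threading

# Minimal sketch of continuous (streaming) ingestion versus hourly batches.
events = queue.Queue()   # stands in for a Kafka topic or a change stream
queryable = {}           # stands in for the indexed, queryable store

def ingest_worker():
    # Consume change events continuously; a None sentinel shuts the loop down.
    while True:
        evt = events.get()
        if evt is None:
            break
        queryable[evt["id"]] = evt  # indexed as soon as it arrives

worker = threading.Thread(target=ingest_worker)
worker.start()

events.put({"id": "txn-1", "amount": 42.0})  # a write in the source system
events.put(None)
worker.join()

# The record is queryable with lag measured in milliseconds, not hours.
```

A batch pipeline would instead accumulate events into files and load them on a schedule — which is precisely the hours-stale behavior the streaming path avoids.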
Funding and Traction
Sequoia Capital led Rockset's Series A. Greylock and Redpoint joined subsequent rounds. Across those rounds, total funding reached $105 million — substantial backing for a database infrastructure company, and a signal of how seriously sophisticated investors took the real-time analytics gap.
The customer list that developed over the years reflected the breadth of the real-time analytics use case. Walmart used Rockset to power real-time personalization and inventory intelligence at retail scale. Cisco used it for network analytics, feeding real-time telemetry data into operational dashboards. Klarna, the buy-now-pay-later giant, used it for fraud and risk scoring — exactly the millisecond-latency use case that traditional batch analytics could never serve. athenahealth used it to power clinical data queries for healthcare providers.
These were not small experiments. These were production workloads at companies with billions in revenue, where the alternative — slower data, more engineering toil, less reliable decisions — carried real costs.
The New Problem
In 2023 and into 2024, OpenAI was dealing with a database problem that had no name yet, because the workload that created it was less than two years old.
ChatGPT and the broader suite of OpenAI products had crossed a threshold from demos to critical infrastructure. Millions of users. Enterprise customers with compliance requirements. Products that needed to remember things — user context, conversation history, relevant facts — and retrieve them at inference time, in milliseconds, without hallucinating, and at a scale that made any existing retrieval system look like a prototype.
The retrieval problem in AI is deceptively complex. At its surface, it looks like search: find relevant context and inject it into the model's prompt before it generates a response. This is Retrieval-Augmented Generation, or RAG, and it had become one of the defining architectural patterns of the post-ChatGPT era.
But at OpenAI's scale, the RAG problem had dimensions that ordinary search infrastructure couldn't handle. You needed vector search — finding semantically similar content using embedding vectors, the AI-native form of lookup. But you also needed SQL — structured queries over structured metadata, filtering by date, user, source type, access permissions. And you needed full-text search — keyword matching for cases where semantic similarity misses exact phrases. You needed all three, simultaneously, on the same data, at latencies measured in tens of milliseconds, while handling the concurrent query volume of one of the most-used software products in history.
And you needed the data to be fresh. User context updates in real time. New documents get uploaded. Information changes. An AI agent that can only see data from an hour ago is an agent that will confidently tell users outdated things — and at OpenAI's scale, "outdated things" is not an acceptable product behavior.
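The three retrieval modes can be sketched together in a toy hybrid search function (illustrative names and data only; this is not OpenAI's or Rockset's system): metadata filtering stands in for the SQL layer, exact-term matching for full-text search, and embedding similarity for vector search.

```python
import math

# Hypothetical documents with metadata and (tiny) embedding vectors.
docs = [
    {"id": 1, "text": "quarterly fraud report", "source": "finance",
     "embedding": [0.9, 0.1]},
    {"id": 2, "text": "fraud detection playbook", "source": "security",
     "embedding": [0.8, 0.2]},
    {"id": 3, "text": "holiday party planning", "source": "hr",
     "embedding": [0.1, 0.9]},
]

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_search(query_vec, keyword, allowed_sources, k=2):
    # Structured filter (the SQL part): restrict by metadata first.
    pool = [d for d in docs if d["source"] in allowed_sources]
    # Keyword filter (the full-text part): require an exact-term match.
    pool = [d for d in pool if keyword in d["text"]]
    # Vector ranking (the semantic part): order by embedding similarity.
    pool.sort(key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return [d["id"] for d in pool[:k]]

hybrid_search([1.0, 0.0], "fraud", {"finance", "security"})  # → [1, 2]
```

Production systems run all three stages against purpose-built indexes on the same data — which is the shape of query the converged index was built to serve.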
Why Rockset
This is where the fit becomes clear, almost uncomfortably so.
Rockset had spent seven years building a database that indexed data in three ways simultaneously — columnar, row, and inverted. The converged index was not designed for AI. It was designed for operational analytics at companies like Walmart and Klarna. But the thing it produced — a database capable of handling vector search, SQL queries, and full-text search on fresh data at low latency and high concurrency — was precisely the thing OpenAI needed to build next-generation memory and retrieval infrastructure.
The Rockset team understood something fundamental about data retrieval that most database engineers don't: that the question "which index should I use?" is the wrong question. The right architecture makes every index available simultaneously and lets the query planner decide. At the scale of AI inference — where millions of different retrieval queries arrive per second, from millions of different users, asking for millions of different types of context — the flexibility of the converged index was not just nice-to-have. It was the difference between a retrieval system that works and one that requires constant engineering intervention.
There was also the team. Venkat's engineers had spent years operating a multi-tenant cloud database at scale — the kind of scale that exposes every edge case, every failure mode, every subtle performance cliff. Building that operational expertise takes years. It cannot be hired off the open engineering talent market. It lives in people who have run the system through incidents, through growth, through the messy unpredictability of real production traffic.
OpenAI needed that team. Not just the technology. The team.
The Migration
For Rockset's existing customers, the acquisition was an operational disruption dressed in corporate announcement language. The blog post was clear: Rockset as a standalone product was ending. Customers had a migration window — months, not weeks — to move their workloads off the platform and onto alternatives.
For some customers, this was a manageable engineering project. For others, those who had built deep integrations, who had taken advantage of specific Rockset capabilities that had no obvious equivalent elsewhere, it was the kind of forced migration that database teams dread. Real-time operational analytics was not a problem with abundant good solutions. The alternatives — standing up Elasticsearch for search, Druid or Pinot for real-time analytics, maintaining separate vector databases for embeddings — required more engineering, more infrastructure cost, and more operational complexity than the unified platform Rockset had provided.
The market noticed. In the months after the announcement, the data infrastructure space began a conversation about what it meant when a key piece of the real-time analytics stack was no longer independently available. ClickHouse saw increased interest. Apache Druid had its moment in analyst write-ups. A handful of newer players offering similar "converged" database capabilities started getting calls from companies that had been comfortable on Rockset and were now shopping.
The Team at OpenAI
Venkat and the Rockset engineering team joined OpenAI and got to work. The public details of what they've built are sparse — OpenAI does not publish architecture papers about its retrieval infrastructure — but the implications are visible in how OpenAI products have evolved.
ChatGPT's memory capabilities, which allow the system to remember facts about users across sessions, require exactly the kind of infrastructure Rockset was designed to provide: real-time writes, fresh reads, low-latency retrieval, at scale. OpenAI's enterprise features, which increasingly involve grounding model responses in customer-specific data — company documents, CRM records, knowledge bases — require retrieval infrastructure that can handle structured and unstructured data simultaneously, with access controls, at production speed.
The Rockset team is likely building something that will never carry the Rockset brand — an internal platform that powers a generation of AI products without a public name, without a company page, without a Crunchbase profile. That is the fate of infrastructure that gets absorbed into larger machines. The thing they built does not disappear. It becomes the thing that makes something bigger possible.
The Irony
There is a certain poetic fitness to where Rockset ended up that is worth pausing on.
Venkat Venkataramani left Facebook's infrastructure — a system designed to retrieve the right content for the right user at the right millisecond, at planetary scale — to build a company that solved the retrieval problem for operational analytics. He had seen, at the most demanding scale in the world, what good data retrieval looked like when the stakes were real. He built a company around that vision. He spent seven years proving it worked.
And then the most consequential computing shift since the web — the large language model revolution — created a retrieval problem that was structurally identical to the one he had spent his career solving, only now the stakes were higher, the scale was larger, and the customer was not Walmart but the company trying to build artificial general intelligence.
There is no clean bow to tie around that. Rockset did not become a public company. Its customers had to migrate. The product is gone. But the architecture it pioneered — the idea that fresh data should be indexable from every angle simultaneously, that retrieval should not require foreknowledge of query patterns, that operational latency and analytical depth are not opposites — is now embedded in the infrastructure of the most important AI company in the world.
That is a strange kind of success. But it is success.
1. The Facebook infrastructure Venkat built was the blueprint, not the inspiration.
Most founding stories frame the prior job as "inspiration" — I saw a problem, I left to solve it. Venkat's story is different. At Facebook, he didn't just see the real-time analytics gap. He built workarounds for it at the most demanding scale on Earth. The Facebook ads and feed infrastructure required fresh data, analytical queries, and millisecond latency simultaneously — and solving that forced him and his team to invent, piecemeal, many of the architectural ideas that Rockset would later systematize. Rockset wasn't inspired by Facebook. It was distilled from Facebook. The converged index was, in a sense, the formal version of things Venkat's team had done informally at planetary scale.
2. The converged index solves a problem that most engineers never consciously identify.
Most engineers are trained to think about indexes as choices: you decide what questions you'll ask, then you index your data to answer those questions efficiently. This seems reasonable until you operate at scale and discover that query patterns change, that new requirements emerge, and that re-indexing production data is expensive, slow, and risky. Rockset's converged index turned this assumption inside out: index everything, in every useful way, by default. The insight is almost obvious in retrospect, but it required an unusual combination of storage engineering depth and practical database operating experience to turn into a production system. Most database teams would have considered the storage overhead of maintaining three indexes simultaneously prohibitive. Rockset bet that cloud storage economics had changed the calculus — and they were right.
3. The OpenAI acquisition was not about replacing a database — it was about hiring a team that had already solved the right problem.
Acqui-hires are common in tech, but they are usually about talent in a domain the acquirer needs to enter. The Rockset acquisition was different in a specific way: the team hadn't just demonstrated capability in a relevant area. They had spent seven years running, at scale, in production, under real customer SLAs, the exact system that OpenAI needed to build. The Rockset engineers had operated multi-tenant cloud database infrastructure with real-time ingestion pipelines, vector search, and SQL support under genuine enterprise workloads. That operational scar tissue — the knowledge that comes from midnight incidents, from performance regressions, from customer escalations — is not a thing you can train into engineers who've never built it. OpenAI wasn't buying potential. It was buying seven years of proven infrastructure experience in the specific domain that AI's memory problem lives in.
4. Rockset's customers were the most dangerous proof of the real-time analytics gap.
The companies that chose Rockset were not startups running experiments. They were Walmart, Cisco, Klarna — organizations with existing data warehouse investments, database engineering teams, and strong reasons to use whatever tool was cheapest or most familiar. The fact that they chose a $105M startup over established players like Elasticsearch, Druid, or BigQuery was not a vote for novelty. It was a vote for a capability those systems could not provide. When you understand what those companies actually used Rockset for — real-time fraud scoring, network analytics at millisecond latency, operational personalization at retail scale — you understand why no amount of BigQuery optimization was going to get them there. The customers were proof that the gap was real, not theoretical.
5. The migration notice Rockset sent its customers was an accidental map of the still-unsolved problem.
When OpenAI announced the acquisition and Rockset told its customers to migrate, the implicit message was: go find an alternative. But there was no obvious alternative. There was no single system that offered fresh real-time ingestion plus SQL analytics plus full-text search plus vector search in a managed cloud database. The migration notice exposed the fact that Rockset had been filling a gap that the rest of the database market had not yet converged on. In 2024, as the notice went out, teams began rebuilding the stacks that Rockset had replaced: Kafka for ingestion, ClickHouse, Druid, or Pinot for analytics, Elasticsearch for search, Weaviate or Pinecone for vectors. Four systems, stitched together with pipelines, doing what Rockset did as one. The market scramble after the acquisition was the most honest acknowledgment the industry gave that what Rockset built had been genuinely useful — and genuinely irreplaceable.
Rockset was founded in 2016. It raised $105 million from Sequoia Capital, Greylock Partners, and Redpoint Ventures. It was acquired by OpenAI in June 2024. The Rockset product was sunset following the acquisition. Venkat Venkataramani and the Rockset team joined OpenAI to build retrieval infrastructure for AI products.