Named After a Real Duck: The Origin Story of DuckDB

I. THE HOOK: The Database Nobody Asked For

Amsterdam. 2018.

At a research institute on the eastern edge of the city — a quiet, brick-and-glass campus called Science Park — two men are building a database in their spare time.

Nobody commissioned it. Nobody funded it. There is no product roadmap, no sales deck, no pitch competition. There is a researcher named Hannes Mühleisen, who lives on a boat in Amsterdam's canals and keeps a duck named Wilbur, and his PhD student Mark Raasveldt, a soft-spoken programmer who builds video game engines as a hobby. Together, they have a simple, stubborn conviction: that the way data scientists interact with databases is fundamentally broken, and that nobody with the power to fix it is bothering to try.

So they will fix it themselves.

They will name it after the duck.

What happens next is one of the strangest stories in modern software: an academic side project, built evenings and weekends inside a publicly funded research institute, becomes the most talked-about analytical database in the world. No marketing budget. No sales team. No enterprise playbook.

Just code, and a name that makes people smile when they say it.


II. THE BACKSTORY: The Institute Where Python Was Born

To understand DuckDB, you have to understand CWI.

Centrum Wiskunde & Informatica — the Centre for Mathematics and Computer Science — was founded in Amsterdam in 1946. It is the Netherlands' national institute for mathematics and computer science, funded by the government, staffed by researchers who measure success not in quarterly earnings but in ideas that outlast them.

CWI has a habit of producing things that reshape the world.

In 1952, they built the Netherlands' first computer. In 1959, a CWI researcher named Edsger Dijkstra published the shortest path algorithm that still routes traffic, ships packages, and navigates aircraft today. In 1986, CWI registered one of the first country-level internet domains — .nl. In 1988, they established the first European civil internet connection.

And in 1989, during the Christmas holidays, a Dutch programmer named Guido van Rossum sat down at his desk at CWI and started writing a scripting language. He named it after Monty Python's Flying Circus. He called it Python.

This is the institution where Hannes Mühleisen built his career.

The Database Architectures Group

Within CWI, there is a research group dedicated entirely to database systems. It is not a large group. But it has a lineage that any database company in the world would kill for.

In the 1990s and early 2000s, CWI researchers built MonetDB — a column-oriented database that proved, empirically, that storing data by column rather than by row was dramatically faster for analytical queries. MonetDB became the intellectual ancestor of nearly every modern data warehouse.

Then CWI researchers published a paper on vectorized query execution — processing data in batches rather than row by row, matching modern CPU architectures. That paper became the foundation for Vectorwise, which was later commercialized as Actian Vector. The primary author of that paper, Marcin Żukowski, went on to co-found Snowflake.

When Hannes Mühleisen joined this group, he joined a tradition.

The Problem He Couldn't Ignore

Mühleisen was studying the intersection of databases and data science. He wanted to understand how analysts and researchers actually worked with data.

What he found bothered him.

He went to talk to the R community — serious practitioners, statisticians, researchers who worked with data for a living. He expected to find people who loved databases, or at least used them. Instead, he found people who had given up on databases entirely.

They were living in CSV files. They were building homemade data frame engines. They were re-implementing, from scratch, solutions that database researchers had figured out decades ago.

When he asked why, they told him: databases are a pain. You need a server. You need credentials. You need to transfer data in and out. Code that works on your laptop fails when you share it because your colleague has a different database setup. The overhead of the client-server model — designed in the 1980s for a world of mainframes and terminals — was so high that for many workloads, it was faster to just load everything into memory and use Pandas.

There was a gap in the market that nobody had noticed.

SQLite existed — the embedded database used in every iPhone, every Android device, every browser. But SQLite was built for transactions: reading and writing individual rows, one at a time. It was not built for analytics: scanning millions of rows, aggregating, joining across tables. Ask SQLite to compute the average of a billion numbers and it will grind through them one by one, the way a clerk with an abacus might.

There was no SQLite for analytics. There was no database you could embed in a Python or R process, transfer data to at memory speed, and use to run real analytical queries without spinning up a server.
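The embedded model itself was already proven for transactions: Python ships SQLite in its standard library. The sketch below (table and data invented for illustration) shows that in-process pattern, the same one DuckDB would later apply to analytical workloads: open a file or in-memory database, run SQL, get results back, all without a server.

```python
# The embedded model: the database is a library inside your process.
# Python's stdlib sqlite3 shows the pattern; DuckDB applies the same
# pattern to analytics. The table and data here are invented.
import sqlite3

con = sqlite3.connect(":memory:")          # no server, no credentials
con.execute("CREATE TABLE nums (x REAL)")
con.executemany("INSERT INTO nums VALUES (?)",
                [(float(i),) for i in range(1000)])

# SQLite grinds through this row by row; DuckDB's engine scans the
# same kind of query in columnar batches.
(avg,) = con.execute("SELECT AVG(x) FROM nums").fetchone()
print(avg)  # 499.5
```

The point of the sketch is the deployment model, not the speed: everything above happens inside one process, with nothing to install or configure.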

Nobody had built it because, from the outside, the need was not obvious. Data scientists were coping. They were using workarounds. They were not filing bug reports about a database that did not exist.

But Mühleisen could see the gap.


III. THE GRIND: Building in the Dark

The False Start: MonetDBLite

Before DuckDB, there was MonetDBLite.

The logic seemed sound: CWI already had MonetDB, a powerful analytical database. Why not make an embedded version of it?

Mühleisen and Raasveldt tried. They embedded MonetDB into R and Python packages. It worked, sort of. Thousands of people downloaded it every month.

But MonetDB had not been designed to live inside another process. The original codebase assumed it was in charge. It could crash freely. It could modify global state. It had no concept of multiple databases running simultaneously, or being cleanly shut down and restarted, or fitting inside a single file.

Getting MonetDB to behave as a polite embedded library required rewriting it in ways that made the codebase unmaintainable — a fork that drifted further from the original with every commit, accruing debt that would eventually become unpayable.

As Raasveldt later explained on Hacker News: they could either keep struggling with an increasingly unmaintainable fork, or build a new system from scratch — one designed from the ground up to live inside other processes.

The decision to start over was not made lightly.

2018: The Real Beginning

They started building DuckDB in 2018, evenings and weekends alongside their day jobs as researchers.

The design philosophy came first, and it was deliberately radical in its simplicity:

No server. Ever. The database runs inside your process, not beside it.

No installation. The entire system compiles into two files: a header and an implementation.

No external dependencies. You do not need to install anything to use DuckDB.

Column-oriented storage, vectorized execution — processing data in batches of thousands of values at a time, aligned with how modern CPUs actually work.

Full SQL support. Not a subset. Not a dialect. Standard SQL, including window functions, CTEs, complex joins.
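Of the items on that list, vectorized execution is the one worth a sketch. The real engine does this in C++ over columnar batches of roughly two thousand values; the toy Python below shows only the shape of the idea (the speedup in practice comes from cache locality and tight compiled loops, which plain Python cannot reproduce):

```python
# Row-at-a-time vs. vectorized (batched) processing, as a toy sketch.
def sum_row_at_a_time(values):
    total = 0
    for v in values:                  # one unit of work per value
        total += v
    return total

def sum_vectorized(values, batch_size=2048):
    total = 0
    for i in range(0, len(values), batch_size):
        # One tight inner loop per batch of ~2048 values -- the shape
        # of DuckDB's execution model, minus the compiled-code speed.
        total += sum(values[i:i + batch_size])
    return total

data = list(range(10_000))
assert sum_row_at_a_time(data) == sum_vectorized(data)
```

Both functions compute the same answer; the batched version simply amortizes per-value overhead across each batch, which is where the design pays off on modern CPUs.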

The goal was to make the correct behavior effortless, requiring no configuration, and the wrong behavior nearly impossible to reach by accident.

This is harder than it sounds. SQL is a specification that has grown for decades. Window functions alone took months. Raasveldt, working through a doctoral thesis that would eventually be titled Integrating Analytics with Relational Databases, implemented features one by one, running millions of test queries to verify correctness. At every point in the project's history, passing the test suite was non-negotiable.

The Academic Paper That Nobody Noticed at First

In 2019, Mühleisen and Raasveldt published their first DuckDB paper at SIGMOD — the flagship database research conference. The paper was titled, plainly: DuckDB: An Embeddable Analytical Database.

The contribution it described was precise: a system that matched or exceeded the performance of production analytical databases, while running embedded in a Python or R process with no server, no configuration, and no data transfer overhead.

The paper landed among database researchers with the kind of quiet acknowledgment that serious papers get. It was cited. It was discussed. It did not go viral.

Meanwhile, they made DuckDB open-source on GitHub, under the MIT license. The R community — the same community that had told Mühleisen they hated databases — began testing it.

What happened next surprised even the people who built it.

"Barely Working" Software and the R Community

Mühleisen later reflected on the early testers with visible gratitude.

The R community, he said, was willing to pick up software that was barely working and give honest feedback. This was unusual. Most users of production software would simply abandon a tool that crashed or behaved unexpectedly. R practitioners — many of them statisticians and researchers who understood the difference between a prototype and a product — would file issues, describe the exact conditions that caused failures, and wait for fixes.

"My definition of success," Mühleisen has said, "is not to write papers. It's to have impact."

For a researcher at a publicly funded institute, that is a quietly radical statement.

2020: The PhD Defense

In 2020, Mark Raasveldt defended his thesis at Leiden University. The panel reviewed years of work on database-analytics integration. DuckDB was the capstone — the system that incorporated every lesson, every failed prototype, every insight from MonetDBLite's near-miss.

At the time of his defense, DuckDB was a promising open-source project with growing adoption in the R ecosystem.

Within three years, it would be downloaded ten million times a month.


IV. THE BREAKTHROUGH: When the Internet Discovered the Duck

Hacker News, 2020

The post title was simple: DuckDB: SQLite for Analytics.

It landed on Hacker News in June 2020 and climbed the front page. The comments were not the usual skeptical banter that greets database announcements. They were something rarer: genuine excitement.

People had been waiting for this without knowing they were waiting for it.

The pattern of discovery was consistent across hundreds of comments and later blog posts: a data scientist tries DuckDB for the first time. They have a query that takes 45 seconds in Pandas or requires spinning up a Spark cluster. They run it in DuckDB. It finishes in 2 seconds. On their laptop.

The absence of servers was as important as the speed.

When you use a server-based database, a query is a round trip: your code serializes the request, sends it over a socket, the server executes it, serializes the result, sends it back. For small queries, the overhead is trivial. For analytical queries — the kind that scan millions of rows and return a summary — the overhead of getting data in and out dominates. You optimize the query engine and then lose the gains to serialization.

DuckDB had no serialization overhead. The data lived in the same process. Moving a result from DuckDB into a Pandas DataFrame was not a network operation. It was a pointer.
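A minimal way to see the difference, using only the standard library: a client-server hop must serialize and deserialize the result, producing a copy, while an in-process handoff can return the very same object. The functions below are illustrative stand-ins, not DuckDB's API; `pickle` plays the role of the wire protocol.

```python
# Serialization copies; an in-process handoff shares.
import pickle

result = list(range(100_000))   # pretend this is a query result

def fetch_in_process(data):
    # Embedded model: the caller gets a reference to the same memory.
    return data

def fetch_over_socket(data):
    # Client-server model: the result is serialized, shipped, rebuilt.
    wire_bytes = pickle.dumps(data)   # stands in for the socket write
    return pickle.loads(wire_bytes)  # ...and the client-side parse

same = fetch_in_process(result)
copy = fetch_over_socket(result)
assert same is result                          # literally the same object
assert copy == result and copy is not result   # equal, but a full copy
```

The copy is invisible for a ten-row result and dominant for a ten-million-row one, which is exactly the asymmetry the passage above describes.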

The Data Scientist's New Best Friend

The Python community followed the R community, and then grew far beyond it.

DuckDB released its Python package. The install command was, predictably: pip install duckdb.

Data scientists discovered they could query Parquet files directly from S3, without loading them into memory first. They could run SQL on top of Pandas DataFrames without copying the data. They could replace complex Pandas chains — the kind where you chain .groupby().agg().reset_index().merge() until nobody can read it — with a single SQL query that ran faster.
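That chain maps onto a single SQL statement. A sketch of the translation, with stdlib `sqlite3` standing in for DuckDB so it runs anywhere (tables, columns, and data invented for illustration):

```python
# A groupby-agg-merge chain expressed as one SQL query.
# sqlite3 stands in for DuckDB here; the schema is invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.5)])
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Ada"), (2, "Bram")])

# Roughly: orders.groupby("customer_id").agg(...).reset_index()
#                .merge(customers, ...), as one readable statement.
rows = con.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 15.0), ('Bram', 7.5)]
```

With DuckDB the same query can run directly against a DataFrame or a Parquet file rather than inserted rows, which is what made the translation so attractive in practice.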

The benchmarks circulated freely. DuckDB completing in 3.84 seconds what Pandas needed 19.57 seconds for. DuckDB running analytical queries on 100GB datasets on a single machine that would have required a Spark cluster before.

At 100,000 downloads per week in 2021, Mühleisen still sounded slightly stunned: "In a world where most successful software has been developed in the corporate sector in the USA, it is remarkable that software coming out of the publicly funded research institute CWI is gaining such traction."

He wasn't performing modesty. He genuinely didn't see it coming.


V. THE COMPANY FORMS

July 2021: DuckDB Labs

With download numbers accelerating and enterprises asking about support contracts, Mühleisen and Raasveldt did something their academic peers rarely do: they started a company.

DuckDB Labs B.V. became CWI's 28th spin-off in 2021. Mühleisen as CEO. Raasveldt as CTO.

The governance structure they designed was deliberate and unusual. The commercial company — DuckDB Labs — would earn revenue through support and consulting. But the intellectual property of DuckDB itself would live in the DuckDB Foundation, a non-profit entity whose statutes guarantee, in perpetuity, that DuckDB remains open-source under the MIT license.

They had seen what happened to open-source databases that got acquired or pivoted. They were not going to let that happen to this one.

The team grew slowly and carefully. By 2023, DuckDB Labs had approximately 20 employees — a deliberately small number for a project with this much usage. The philosophy was: do not over-hire, do not add complexity, do not become a company that needs to sell enterprise features to justify its own payroll.


VI. THE STRANGER ARRIVES

April 2022: A Call from Seattle

Jordan Tigani was not someone you'd expect to champion a "scrappy single-node system."

He had spent years at Google as a founding engineer on BigQuery — the hyperscale, distributed query engine that processes petabytes for the world's largest companies. He had built a machine that could throw 10,000 servers at a single query. He knew, better than almost anyone alive, what distributed data infrastructure looked like from the inside.

After Google, he became Chief Product Officer at SingleStore, another enterprise database company. And from that vantage point, he started noticing something that quietly unsettled him.

The vast majority of his customers didn't have big data. They had medium data. Data that fit, comfortably, on a single modern machine. They were running billion-dollar infrastructure for workloads that a laptop could handle if the right software existed.

Then DuckDB appeared on a competitive benchmark. It was not beating SingleStore — not quite — but it was competing. An academic side project with no marketing budget was giving a well-funded commercial database "a run for its money," as Tigani later put it.

He pulled on the thread.

What he found was a system unlike anything he had encountered in the commercial database world. No distribution. No cluster management. No pricing tiers. Just a library, doing analytical SQL faster than systems a hundred times more complex, because it had eliminated all the overhead those systems accepted as necessary.

The insight hit him the way useful insights tend to — obvious in retrospect, invisible until it arrives:

Someone should build a serverless cloud layer around DuckDB.

He rerouted a vacation through Amsterdam.

The Amsterdam Meeting

Tigani met Mühleisen and Raasveldt in person. Five hours later, they were still talking.

Mühleisen was not naive about what it meant when a senior executive from the big data world showed up interested in DuckDB. He had received other offers. He had declined them all, because they wanted DuckDB to become something specific — a tool for one application area, one buyer profile, one market.

"We didn't want to do that," Mühleisen explained later. "We wanted to be more open, more flexible."

But Tigani was different. He wasn't asking DuckDB to become a different thing. He was asking to build a cloud layer on top of what it already was, with DuckDB Labs holding equity in the new company, the open-source project remaining independent, and the two teams maintaining a partnership rather than a merger.

"What made Jordan stand out to me," Mühleisen said, "was his background at SingleStore and BigQuery. It was a big shock that somebody who had this kind of background would consider our scrappy single-node system for something serious."

Tigani named the new company MotherDuck — a name suggested by Lloyd Tabb, founder of Looker, who simply said: "Trust me, it is a good name. And the domain is available."

MotherDuck raised $47.5 million before it was even publicly announced. By September 2023, it raised another $52.5 million at a $400 million valuation — Series B, led by Felicis, with a16z, Redpoint, Amplify, Altimeter, and Zero Prime participating. Total raised: over $100 million.

The man who built BigQuery had bet against his own life's work. And investors agreed with him.


VII. THE AFTERMATH: 10 Million Downloads a Month

2023: The Tipping Point

In May 2023, DuckDB reached 10,000 stars on GitHub.

The team published a blog post to mark the milestone. The tone was almost disbelieving: "When we started working on DuckDB back in 2018, we would have never dreamt of getting this kind of adoption in such a short time."

In June 2024, version 1.0.0 shipped under the codename Snow Duck, a nod to the deliberately frozen, stability-focused state of the codebase. The commit history showed 14,585 commits in 2023 alone, up from 1,621 in 2019. The developer community was not just using DuckDB; they were building on it, extending it, writing tutorials, recording videos, building entire companies on top of it.

By 2025, DuckDB was being downloaded over 10 million times a month. Fortune 500 companies were running it in production. MotherDuck had over 2,000 paying users. DuckDB Labs had grown to a team of two dozen engineers, still based primarily in Amsterdam.

In 2025, Hannes Mühleisen won the Dutch Prize for ICT Research — the Netherlands' highest recognition for applied computer science. The judges cited him for "bridging the gap between data science and database development, enabling new insights through powerful data processing tools, and changing the way data is used and analyzed."

The citation noted DuckDB was in active production use at Fortune 500 companies. It noted DuckDB was downloaded ten million times monthly. It noted that DuckDB had "formed the foundation for innovative startups globally."

Mühleisen accepted the award and went back to Amsterdam. There was more code to write.


VIII. WHY IT WORKED: The Paradox at the Heart of DuckDB

The standard story about successful databases goes like this: a well-funded company hires world-class engineers, builds a product for enterprise buyers, scales a sales team, goes public, becomes infrastructure.

DuckDB is the opposite of this story in almost every dimension.

No funding, until the project was already successful. No sales team. No enterprise roadmap. A research institute as the birthplace. A duck named Wilbur as the mascot.

And yet.

The answer is somewhere in what Mühleisen said about the R community — about practitioners who were willing to use barely-working software and give honest feedback. DuckDB succeeded because it solved a real problem that real people had, in a way that required no convincing. You didn't have to be sold on DuckDB. You had to run one query.

When a data scientist runs a query in Pandas for 45 seconds and then runs the same query in DuckDB for 2 seconds, no sales motion is necessary. The product has made its argument.

There is also something important in the institutional context. CWI, funded by Dutch taxpayers, produced software under the MIT license and gave it to the world. The researchers did not optimize for intellectual property. They optimized for impact — the word Mühleisen returns to constantly when asked about his goals.

The CWI that gave the world Python in 1989 gave the world DuckDB in 2019. Same institution. Same philosophy. Same quiet conviction that publicly funded research should produce publicly owned results.

A duck, it turns out, is a perfect mascot for a system like this. It does not announce itself. It does not perform. It simply arrives, looks comfortable, and turns out to be remarkably fast in the water.


Key Facts for LinkedIn Content

Founded (first code): 2018
Open-source release: 2019
DuckDB Labs founded: 2021
GitHub stars, 10,000: May 2023
GitHub stars, 20,000: June 2024
GitHub stars, 25,000: December 2024
Version 1.0.0 released: June 3, 2024 (codename: Snow Duck)
Monthly downloads (2025): 10 million+
License: MIT (permanent, governed by the DuckDB Foundation)
Founders: Hannes Mühleisen (CEO) and Mark Raasveldt (CTO)
Institutional birthplace: CWI Amsterdam (birthplace of Python and of the first European internet connection)
MotherDuck total funding: $100M+
MotherDuck valuation: $400M (Series B, 2023)
Dutch Prize for ICT Research: Hannes Mühleisen, 2025

Narrative Angles for LinkedIn Posts

The Duck Named Wilbur — Hannes lived on a boat in Amsterdam's canals. He got a duck because it fit his lifestyle. He named the database after the duck. The database got downloaded 10 million times a month. The duck moved on.

The R Community Paradox — Data scientists with PhDs were living in CSV files because databases were too complicated. This is the insight that launched DuckDB: not that databases were slow, but that they were unreachable.

CWI: The Quiet Factory — The same institute that gave the world Python quietly gave the world DuckDB. No press release. No VC funding. Just researchers who thought the gap should be filled.

The BigQuery Founder's Confession — Jordan Tigani built one of the world's largest distributed query engines. Then he said it was overkill for most use cases. Then he bet $100M on the opposite approach.

2 Seconds vs. 45 Seconds — The entire adoption story of DuckDB lives in this comparison. A tool's quality is demonstrated, not pitched.

The Governance Decision Nobody Talks About — Most database companies eventually change their license. DuckDB's founders transferred IP to a non-profit foundation before it mattered. That decision is why 10 million developers trust it.

The MonetDBLite Ghost — They tried to embed the old system. It fought back. So they started over. The willingness to scrap a working-but-wrong solution is the hidden precondition for every elegant tool.
