Engineering Philosophy: Fei-Fei Li, Data Is the Foundation

Fei-Fei Li, creator of ImageNet and pioneer of human-centered AI

Key Takeaways

  • The dataset was the breakthrough, not the model. For a decade the field tuned algorithms on the assumption that the model was the bottleneck. Fei-Fei Li made the opposite bet: that the missing ingredient was data at scale. She conceived and led ImageNet – roughly 14 million hand-labeled images across more than 20,000 categories, organized on the WordNet hierarchy and annotated by tens of thousands of Amazon Mechanical Turk workers.3,4
  • AlexNet won on her data. When a deep convolutional network crushed the 2012 ImageNet challenge – 15.3% top-5 error, more than ten points ahead of the runner-up – it validated the data-centric thesis. The algorithm had existed for years; what changed was that it finally had enough of the right data to learn from.4,5
  • AI must be human-centered. Li’s second principle is that there is “nothing artificial” about AI: it is built by humans, behaves toward humans, and impacts human lives. She co-founded Stanford HAI (2019) and AI4ALL (2017) to make that conviction institutional.6,7,8
  • From immigrant to founder. Born in Beijing in 1976, she arrived in New Jersey at sixteen, helped run her family’s dry-cleaning shop while at Princeton, earned a physics BA there and a PhD at Caltech, and now leads World Labs, a spatial-intelligence startup building models that understand the 3D world.1,2,10

The Principle

“Our hypothesis of A.I. needs to be data-driven, and data-centric was the right hypothesis.” – Fei-Fei Li, on the bet behind ImageNet9

For most of the 2000s, the dominant instinct in machine learning was to improve the model: a cleverer architecture, a better optimizer, a sharper feature extractor. The data was treated as a fixed, modest backdrop against which algorithms competed. Fei-Fei Li’s central move was to invert that hierarchy. She argued that the algorithms were not really the bottleneck – the data was. The way to advance machine perception was not to keep polishing the model on a few thousand examples, but to give it orders of magnitude more of the right examples and let it learn the world the way a child does: by encountering enough of it.9

The analogy was not decorative; it was the argument. No one teaches a child to see by enumerating rules. A child learns by being immersed in a torrent of visual experience – millions of glimpses of objects, scenes, faces – until the structure of the visual world settles into place. Li’s wager was that a learning algorithm needed the same thing: not a better teacher, but a vastly larger and richer stream of examples. So she did the unglamorous, enormous thing the field had avoided. She built the stream.3

That is the principle in one line: data is the foundation, and AI must be human-centered. The first half is an engineering claim – intelligence emerges from learning on enough of the right data, and whoever supplies that data shapes what the field can do. The second half is a claim about purpose. Li insists that AI is not some alien force; it is a human artifact, and its only justification is that it benefits people. “There is nothing artificial about it,” she tells her students: AI is made by humans, intended to behave by humans, and ultimately impacts human lives and human society.6 She gave the field its eyes by building a dataset, not a model – and then spent the next decade insisting those eyes stay pointed at human good.

Context

Fei-Fei Li was born in Beijing in 1976.1 Her father emigrated to Parsippany, New Jersey when she was twelve; she and her mother followed when she was sixteen, arriving with almost no English.1,2 The family’s American foothold was a dry-cleaning shop, and Li worked it – weekends through high school, and most weekends home from college, helping run the business and keeping the books while she studied.1,2 It is a detail worth holding onto: the person who would later marshal one of the largest labeling operations in the history of computing learned operational discipline at a counter, processing other people’s laundry.

She went to Princeton on scholarship and took a Bachelor of Arts in physics in 1999, running the family business remotely for part of that time.1,2 Physics, not computer science – a training in looking for the simple law beneath messy phenomena, which is precisely the instinct she would later apply to vision. She moved to Caltech for graduate school, earning a master’s in electrical engineering in 2001 and a PhD in 2005, working at the intersection of neuroscience and computer vision.1

By the time she joined Princeton’s faculty and then Stanford (2009), she had absorbed a conviction that ran against the grain of her field: that the path forward was not a better model on a small dataset but a radically larger dataset that no one had been willing to build.1,4 Everything that follows – ImageNet, the challenge, the human-centered turn, the startup – is the working-out of that one contrarian bet.

The Work

ImageNet: giving the field its eyes (2009)

The cleanest way to feel Li’s bet is to watch what happens when you hold a model fixed and grow its training data. The classifier does not get cleverer – the algorithm is unchanged – but its picture of the world sharpens with every labeled example you feed it. Start with a handful of points and the model guesses a crude, wrong boundary; add more and more labeled data and the same model traces the true shape, its accuracy climbing toward the ceiling. This is the ImageNet thesis in miniature.
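The fixed-model, growing-data experiment can be sketched in a few lines of Python. This is an illustration only, under loud assumptions: the "world" is a made-up circle in a two-dimensional square, the model is a plain 1-nearest-neighbor classifier, and every name here (true_label, learning_curve, the 0.6 radius) is hypothetical. Nothing in it comes from ImageNet itself.

```python
import random

def true_label(x, y):
    # Hypothetical "world": points inside a circle of radius 0.6 are class 1.
    return 1 if x * x + y * y < 0.36 else 0

def sample(n, rng):
    # Draw n labeled points uniformly from the square [-1, 1] x [-1, 1].
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(p, true_label(*p)) for p in pts]

def predict_1nn(train, point):
    # The model never changes: copy the label of the closest training point.
    px, py = point
    nearest = min(train, key=lambda ex: (ex[0][0] - px) ** 2 + (ex[0][1] - py) ** 2)
    return nearest[1]

def accuracy(train, test):
    hits = sum(predict_1nn(train, p) == label for p, label in test)
    return hits / len(test)

def learning_curve(sizes, seed=0, test_size=300):
    # Same algorithm and same test set every time; only the amount of data grows.
    rng = random.Random(seed)
    test = sample(test_size, rng)
    pool = sample(max(sizes), rng)
    return {n: accuracy(pool[:n], test) for n in sizes}

if __name__ == "__main__":
    for n, acc in learning_curve([10, 100, 1000]).items():
        print(f"{n:>5} labeled examples -> accuracy {acc:.2f}")
```

The design choice that matters is in learning_curve: the prediction rule and the held-out test set are frozen, so any gain in accuracy can only come from the extra labeled examples.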

The history is concrete, and the attribution matters. Li conceived the project and began work on the idea around 2006-2007, while at Princeton, collaborating with WordNet co-creator Christiane Fellbaum to use WordNet’s hierarchy of concepts as the dataset’s organizing backbone.4 ImageNet was emphatically a team effort that she led: the landmark paper, “ImageNet: A Large-Scale Hierarchical Image Database,” presented at CVPR 2009, was authored by Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li (listed last, as the senior author).3 The version of the dataset that reshaped the field grew to roughly 14 million hand-annotated images across more than 20,000 categories – the full ImageNet-21K release comprises 14,197,122 images in 21,841 classes.4
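What it means to organize a dataset on a concept hierarchy is easier to see in a toy example. The sketch below assumes a tiny hand-written is-a tree; the entries and the helper names (PARENT, ancestors, is_a) are hypothetical stand-ins, not real WordNet data or any ImageNet API.

```python
# Toy is-a hierarchy, child -> parent (hypothetical entries, not real WordNet data).
PARENT = {
    "husky": "working dog",
    "working dog": "dog",
    "dog": "canine",
    "canine": "mammal",
    "tabby": "cat",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}

def ancestors(concept):
    # Walk up the tree from a concept to the root, collecting each parent.
    path = []
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path

def is_a(concept, category):
    # An image labeled with a leaf concept also counts for every broader category.
    return concept == category or category in ancestors(concept)

if __name__ == "__main__":
    print(ancestors("husky"))
    print(is_a("tabby", "mammal"), is_a("tabby", "dog"))
```

The payoff of borrowing a hierarchy is that one fine-grained label does the work of many: an image tagged "husky" is also, for free, an example of "dog" and "mammal".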

The engineering problem was not the model; it was the labeling. Hand-annotating fourteen million images was beyond any research group, so Li’s team turned to Amazon Mechanical Turk, distributing the work across roughly 49,000 workers in 167 countries between July 2008 and April 2010, filtering more than 160 million candidate images and labeling each retained image multiple times for quality.4 That logistical feat – crowdsourcing perception at planetary scale – was the contribution. Anyone could have proposed “use more data.” Li built the apparatus that made it real.
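Labeling each image several times only helps if the votes are then reconciled. A minimal sketch of that step follows, assuming a plain majority vote with fixed thresholds. This is a stand-in technique chosen for clarity, not the ImageNet team's actual quality-control procedure, and the function names, thresholds, and image IDs are all hypothetical.

```python
from collections import Counter

def consensus_label(votes, min_votes=3, min_agreement=0.66):
    """Return the majority label, or None if the votes are too few or too split."""
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

def filter_candidates(candidates):
    """Keep only the candidate images whose worker votes reach consensus."""
    kept = {}
    for image_id, votes in candidates.items():
        label = consensus_label(votes)
        if label is not None:
            kept[image_id] = label
    return kept

if __name__ == "__main__":
    candidates = {
        "img_001": ["husky", "husky", "malamute"],  # two of three agree: kept
        "img_002": ["husky", "malamute", "wolf"],   # no agreement: dropped
        "img_003": ["tabby", "tabby"],              # too few votes: dropped
    }
    print(filter_candidates(candidates))
```

Even this crude version shows why most candidates never make it into the dataset: an image survives only when independent annotators converge on the same answer.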

ILSVRC and the 2012 AlexNet validation

A dataset alone proves nothing; you have to make the field use it. So from 2010 through 2017, Li’s group ran the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) – an annual competition on a standardized 1,000-category subset with about 1.28 million training images, 50,000 validation images, and 100,000 test images.4 The challenge turned ImageNet into a shared benchmark and a leaderboard, the gravitational center of computer-vision research for most of a decade.

The validation arrived on September 30, 2012. A deep convolutional neural network – AlexNet, from Geoffrey Hinton’s lab in Toronto – won the challenge with a 15.3% top-5 error rate, more than 10.8 percentage points ahead of the runner-up.4,5 That margin is the hinge of the modern AI era. But the often-missed point is why it was possible: the convolutional architecture was not new, and gradient-based training was not new. What was new was that, for the first time, there was a dataset large and rich enough for a deep network to learn from without simply memorizing. Li had supplied the missing ingredient. As her own student would later quantify, even human accuracy on ImageNet is hard-won – Andrej Karpathy, who did his PhD under Li and ran the challenge’s human-benchmark experiments, estimated a top-5 human error around 5.1% only with concentrated effort.5 The machines were now closing on a bar that taxes people.
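For readers unfamiliar with the metric: a prediction counts as correct under top-5 scoring if the true label appears anywhere among the model's five highest-ranked guesses, and the error rate is the fraction of images where it does not. A minimal sketch, assuming each prediction is already a list of labels sorted from most to least confident (top_k_error and the toy labels are hypothetical, not the challenge's evaluation code):

```python
def top_k_error(ranked_predictions, true_labels, k=5):
    """Fraction of examples whose true label is missing from the top-k guesses."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must be the same length")
    misses = sum(
        truth not in guesses[:k]
        for guesses, truth in zip(ranked_predictions, true_labels)
    )
    return misses / len(true_labels)

if __name__ == "__main__":
    # Each inner list is one image's guesses, most confident first (toy data).
    predictions = [
        ["cat", "dog", "fox", "owl", "bee", "ant"],
        ["dog", "cat", "fox", "owl", "bee", "ant"],
        ["ant", "bee", "owl", "fox", "dog", "cat"],
        ["fox", "owl", "bee", "ant", "dog", "cat"],
    ]
    labels = ["dog", "ant", "cat", "fox"]
    print("top-1 error:", top_k_error(predictions, labels, k=1))
    print("top-5 error:", top_k_error(predictions, labels, k=5))
```

Top-5 is the more forgiving measure, which is why it suits a 1,000-category task where many classes are near-synonyms of one another.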

This is the cleanest illustration of the data-centric thesis in the whole series. The algorithm that won belonged to Geoffrey Hinton’s lineage, and the convolutional architecture descends from Yann LeCun’s LeNet. But neither would have won in 2012 without Li’s data. The model and the data are two halves of one breakthrough – and for a decade the field had been holding the wrong half fixed.

Human-centered AI and Stanford HAI

Having handed the field a more powerful eye, Li spent the next phase of her career worrying about where it was pointed. Her framing is deceptively simple: AI is not an external force that happens to humanity; it is a human creation whose purpose is human benefit. “I often tell my students not to be misled by the name ‘artificial intelligence’ – there is nothing artificial about it,” she has written. “A.I. is made by humans, intended to behave by humans and, ultimately, to impact humans’ lives and human society.”6

In March 2019 she made that conviction institutional, co-founding the Stanford Institute for Human-Centered Artificial Intelligence (HAI) with philosopher and former Stanford provost John Etchemendy, serving as its founding co-director.7 HAI’s mission – to advance AI research, education, policy, and practice to improve the human condition – is the deliberate counterweight to a field that often optimizes capability in isolation. It is the same instinct that drives me to treat taste as a technical system rather than a vibe: the question of what the work is for is not soft, and it does not come after the engineering. It is part of the engineering.

AI4ALL, Google Cloud, The Worlds I See, and World Labs

Li’s career is unusually broad for a researcher of her stature, and each chapter expresses the same two principles. In 2017 she co-founded AI4ALL, a nonprofit working to increase diversity in AI – a direct application of “human-centered” to the question of who builds the systems.8 On sabbatical from Stanford between January 2017 and the fall of 2018, she served as Chief Scientist of AI/ML and Vice President at Google Cloud, carrying the research-to-product translation into industry.1

In 2023 she published a memoir, “The Worlds I See,” which Barack Obama named to his reading list and the Financial Times listed among its best books of the year – “half memoir, half science,” the story of the immigrant scientist and the data-centric bet in one volume.9 And in September 2024 she co-founded World Labs, a spatial-intelligence startup – with Justin Johnson, Christoph Lassner, and Ben Mildenhall – building foundation “world models” that perceive, generate, and reason about the 3D world, backed by roughly $1 billion in funding.10 The throughline is exact: ImageNet gave machines static vision; World Labs is her bet on giving them spatial understanding – the next, harder kind of seeing.

The Method

Li’s method is consistent from the dry-cleaning counter to the spatial-intelligence lab: find the constraint everyone is ignoring, build the unglamorous thing that removes it, and keep the work pointed at people.

Attack the data, not just the model. When a field is stuck, ask whether the bottleneck is really the algorithm or whether it is the data the algorithm has to learn from. Li’s defining move was to suspect the data – and then to build, at enormous logistical cost, the dataset that proved it.3,4

Borrow structure from a domain that already solved it. ImageNet did not invent its own taxonomy; it stood on WordNet’s hierarchy of human concepts. When a known structure fits the problem, encode it rather than reinvent it – the same instinct behind LeCun’s baking translation invariance into convolution.3,4

Make the field use your work. A dataset in a drawer changes nothing. The ILSVRC challenge turned ImageNet into a shared benchmark with a leaderboard, which is what actually moved the research community.4

Crowdsource at the scale the problem demands. Labeling fourteen million images was impossible for one lab, so Li built an annotation pipeline across tens of thousands of workers. The operational solution was the scientific contribution.4

Keep asking what it is for. Capability without a human purpose is, in Li’s view, incomplete engineering. HAI and AI4ALL are not philanthropy bolted onto the research; they are the method extended – the Steve test of whether the work deserves to exist, applied to a whole field.6,7,8

Influence Chain

Who Shaped Her

Christiane Fellbaum and WordNet. ImageNet’s organizing backbone is WordNet’s hierarchy of concepts, and Li built the dataset in direct collaboration with WordNet co-creator Christiane Fellbaum. The taxonomy of human language became the skeleton of machine vision. (Direct influence)

The cognitive-science view of learning. Li’s central analogy – that a model should learn vision the way a child does, by exposure to enough of the world – comes from her training at the seam of neuroscience and computer vision at Caltech. The bet on data over rules is a bet on how biological perception actually develops. (Formative influence)

A physicist’s instinct. Her undergraduate training was in physics, not CS – a discipline of finding the simple structure beneath messy phenomena. The willingness to believe that “more of the right data” was a law of the field, not a brute-force hack, is a physicist’s kind of confidence. (Formative influence)

Who She Shaped

Modern computer vision. Every vision system trained or pre-trained on ImageNet – which is, effectively, all of them for a decade – inherits Li’s data. She did not just contribute to the field; she supplied the substrate it learned on.

The deep-learning era itself. AlexNet’s 2012 win, the event most often cited as the start of the modern AI boom, ran on her dataset and inside her challenge. The data-centric half of that breakthrough is hers.

A generation of researchers. Through her Stanford lab she advised students who became central figures in their own right, including Andrej Karpathy, and through AI4ALL she has worked to widen who gets to build the field at all.

The Throughline

Li is the data root of this series’ deep-learning branch, and the connection to her neighbors is unusually literal. Geoffrey Hinton’s lab built the algorithm that won ImageNet in 2012, and Yann LeCun designed the convolutional architecture that algorithm descends from – but AlexNet ran on Li’s data. The model and the dataset are two halves of one event, and for years the field had been polishing the model while the dataset was the thing that didn’t exist yet. The forward line runs to Andrej Karpathy, her own PhD student, who ran the ILSVRC human-accuracy benchmark and later coined “Software 2.0” – the idea of a network as a program compiled from data, which is the natural generalization of Li’s wager that the data, not the code, is where the intelligence comes from. LeCun says learn the world; Hinton says the learning machine works; Li says: here is the world to learn from, now go. (Series bridge)

What I Take From This

The lesson I keep from Li is that the unglamorous foundation is often the actual breakthrough. The field spent a decade competing on models because models are where the cleverness feels like it lives – and the person who moved the field furthest did so by building a dataset, an act of operational endurance more than algorithmic insight. That reorders my instincts. When something is stuck, I now ask first whether the bottleneck is the clever part I want to work on or the boring part I am avoiding – the data, the labeling, the foundation no one wants to build. It is the same discipline as the evidence gate: not “what is the most interesting thing to optimize,” but “what is actually the constraint.”

The second lesson is harder and quieter. Li gave the field a genuinely more powerful capability and then spent the next decade insisting it stay accountable to people – founding HAI, founding AI4ALL, repeating that there is nothing artificial about a thing humans make for humans. That is not a coda to the engineering; it is the engineering’s reason for existing, which is exactly why I hold that quality is the only variable and that the Steve test – does this deserve to exist? – is a question you ask of capability itself, not just of polish. Li built the eyes and then made sure they stayed pointed at human good. The foundation, and what the foundation is for.

FAQ

What is Fei-Fei Li’s engineering philosophy?

Data is the foundation, and AI must be human-centered. Li’s defining bet was that the bottleneck in machine perception was not the model but the data – that intelligence emerges from learning on enough of the right examples, the way a child learns to see by experience. She acted on it by conceiving and leading ImageNet, a roughly 14-million-image labeled dataset.3,4 Her second principle is that AI is a human artifact for human benefit – “there is nothing artificial about it” – which she institutionalized through Stanford HAI and AI4ALL.6,7,8

What is ImageNet, and did Fei-Fei Li build it alone?

ImageNet is a large-scale labeled image database that became the foundational training set for modern computer vision – roughly 14 million hand-annotated images across more than 20,000 categories, organized on the WordNet concept hierarchy and labeled via Amazon Mechanical Turk.4 Li conceived and led the project, but it was a team effort: the 2009 CVPR paper “ImageNet: A Large-Scale Hierarchical Image Database” was authored by Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li (as senior author), and the WordNet backbone came from a collaboration with WordNet co-creator Christiane Fellbaum.3,4

How did ImageNet lead to the deep-learning boom?

From 2010 to 2017, Li’s group ran the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an annual competition on a 1,000-category subset.4 On September 30, 2012, the deep convolutional network AlexNet won with a 15.3% top-5 error rate – more than 10.8 points ahead of the runner-up – an event widely treated as the start of the modern AI era.4,5 The decisive point is that the convolutional algorithm already existed; what was new was a dataset large enough for it to learn from. The data, not the model, was the missing ingredient.4

What is Fei-Fei Li doing now?

After co-founding Stanford HAI (2019) and AI4ALL (2017), serving as Chief Scientist of AI/ML at Google Cloud (2017-2018), and publishing her 2023 memoir “The Worlds I See,” Li co-founded World Labs in September 2024.1,7,8,9 World Labs is a spatial-intelligence startup building foundation “world models” that perceive, generate, and reason about the 3D world – her bet on the next frontier of machine perception, backed by about $1 billion in funding.10


Sources


  1. “Fei-Fei Li,” Wikipedia. Born July 3, 1976, in Beijing, China; father immigrated to Parsippany, New Jersey when she was 12, she joined at 16; family ran a dry-cleaning business; Princeton BA in physics (1999); Caltech MS in electrical engineering (2001) and PhD (2005); Stanford professor from 2009; director of the Stanford AI Lab (2013-2018); Chief Scientist of AI/ML and Vice President at Google Cloud on sabbatical from January 2017 to fall 2018; founding co-director of Stanford HAI; co-founder of AI4ALL (2017); founded World Labs (2024). 

  2. Jane Thier, “She ran her parents’ dry-cleaning business at 18. Today, the ‘godmother of AI’ is advising world leaders and running a billion-dollar startup,” Fortune, November 24, 2025. Details Li’s immigration to New Jersey, her work in the family dry-cleaning shop through high school and college, and her trajectory from Princeton to World Labs. 

  3. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, June 20-25, 2009, doi:10.1109/CVPR.2009.5206848. The paper introducing ImageNet, built on the WordNet hierarchy and populated using Amazon Mechanical Turk; Fei-Fei Li is the senior (last) author. Author list and publication details also documented at Scientific Research Publishing

  4. “ImageNet,” Wikipedia. ImageNet contains more than 14 million hand-annotated images across more than 20,000 categories; the full ImageNet-21K release comprises 14,197,122 images in 21,841 classes. Fei-Fei Li began the idea in 2006 and in 2007 collaborated with WordNet co-creator Christiane Fellbaum. Labeling ran July 2008 to April 2010 via Amazon Mechanical Turk, with roughly 49,000 workers in 167 countries filtering over 160 million candidate images. The ILSVRC ran annually 2010-2017 on a 1,000-category subset (1,281,167 training, 50,000 validation, 100,000 test images). On September 30, 2012, AlexNet won with a 15.3% top-5 error, more than 10.8 points ahead of the runner-up. 

  5. “AlexNet,” Wikipedia, on AlexNet’s 2012 ILSVRC win and the convergence of large-scale labeled data, GPU computing, and improved training methods. On the human benchmark, see Andrej Karpathy’s role as a Stanford PhD student under Fei-Fei Li running the ILSVRC human-accuracy experiments (estimating a ~5.1% top-5 human error with concentrated effort), documented at “Andrej Karpathy,” AI Wiki, and his Stanford page, cs.stanford.edu/people/karpathy

  6. Fei-Fei Li, post on X, February 2018: “I often tell my students not to be misled by the name ‘artificial intelligence’ – there is nothing artificial about it. A.I. is made by humans, intended to behave by humans and, ultimately, to impact humans’ lives and human society.” (X requires authentication for automated retrieval; the quotation is widely reproduced, including in coverage of her human-centered-AI advocacy. See also her remarks at the Axios AI+ Summit, “AI pioneer Fei-Fei Li: Give scientists more access to advanced AI models,” Axios, November 9, 2023, that there is “nothing artificial” about AI.) 

  7. “Stanford University launches the Institute for Human-Centered Artificial Intelligence,” Stanford Report, March 18, 2019. Stanford HAI launched in March 2019, co-directed by Fei-Fei Li (professor of computer science, former director of the Stanford AI Lab) and John Etchemendy (philosopher and former provost), with a mission to advance AI research, education, policy, and practice to improve the human condition. 

  8. On AI4ALL: “Fei-Fei Li,” Wikipedia. In 2017 Li co-founded AI4ALL, a nonprofit working to increase diversity and inclusion in the field of artificial intelligence. 

  9. Fei-Fei Li, The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI (Flatiron Books, 2023). On the data-centric thesis, see also “‘Godmother of A.I.’ Fei-Fei Li on technology development,” CBS News, and the NPR coverage, “Fei-Fei Li’s memoir ponders artificial intelligence ethics,” NPR, November 10, 2023. Li’s framing: “Our hypothesis of A.I. needs to be data-driven, and data-centric was the right hypothesis.” The book was named to Barack Obama’s reading list and a Financial Times best book of 2023. 

  10. “About,” World Labs. World Labs, founded in 2024 by Fei-Fei Li with Justin Johnson, Christoph Lassner, and Ben Mildenhall, is a spatial-intelligence company building foundation world models that perceive, generate, reason, and interact with the 3D world. On its ~$1 billion in funding, see “Fei-Fei Li’s World Labs raises $1bn to advance spatial intelligence,” Silicon Republic.
