Engineering Philosophy: Yann LeCun, Learning the World, Openly


Key Takeaways

  • Learning the world, mostly on its own. LeCun’s core bet is that intelligence comes from a machine learning how the world works by observation – self-supervised learning – with human labels as a thin garnish on top. The famous “cake” analogy makes the proportions literal: the bulk is self-supervised, the icing is supervised, the cherry is reinforcement learning.
  • Convolution made vision learnable. At Bell Labs he applied backpropagation to handwritten digits (1989) and built the LeNet family of convolutional networks; LeNet-5 (1998) became the architecture that, deployed commercially, was reading roughly 10% of all checks in the United States by 2001.
  • Open research is a method, not a slogan. As founding director of Facebook/Meta’s FAIR (2013) and Chief AI Scientist, he made publishing and open-sourcing the default – I-JEPA’s code and checkpoints shipped with the paper – on the conviction that open science compounds faster than secrecy.
  • The optimist who is skeptical of LLMs. A 2018 Turing co-laureate with Hinton and Bengio, LeCun argues autoregressive language models are “an off-ramp” on the road to human-level AI; the road, he says, runs through world models like JEPA. In November 2025 he left Meta to build exactly that.

The Principle

“We are not going to get to human-level AI just by scaling LLMs” – they “simply predict text rather than truly understand the world.” – Yann LeCun, on leaving Meta, 2025 [10]

The principle underneath that sentence is older than the LLM era, and LeCun has held it consistently for forty years: a machine should learn how the world works mostly on its own, by predicting what it observes, rather than by being spoon-fed human labels – and the science of how to do that should be done in the open. Most of what any animal knows, it learns without a teacher. A baby learns that unsupported objects fall, that occluded things still exist, that the world has stable structure, long before anyone names a single thing for it. LeCun’s wager is that this kind of learning – absorbing the structure of the world from raw, unlabeled, high-bandwidth sensory input – is the bulk of intelligence, and that supervised learning and reinforcement learning are comparatively thin layers on top of it.

That is the content of his most-quoted line, the cake: “If intelligence is a cake, the bulk of the cake is self-supervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning.”[5] He originally said unsupervised in 2016 and deliberately corrected it to self-supervised by 2019 – a precision that matters, because it names the mechanism: the data supplies its own labels. Mask part of the input, predict it from the rest; the world is its own teacher.[5] The proportions in the analogy are the whole argument. If the cake is mostly self-supervised, then a field pouring its effort into ever-larger systems trained to predict the next human-written token is, in his view, optimizing the icing.

The second half of the principle is openness. LeCun built his career and his lab on the belief that AI research advances fastest when it is published, reproducible, and open-sourced – a conviction that runs against the industry’s instinct to hoard frontier work. The two halves are connected: if the road to real intelligence is long and uncertain, no single lab is going to walk it alone, and the only way to find out which architectures actually learn the world is to put them in the open and let everyone test them. Learning the world, openly – prediction as the engine, openness as the method.

Context

Yann André Le Cun was born July 8, 1960, in Soisy-sous-Montmorency, near Paris.[1] He earned his PhD from the Université Pierre et Marie Curie (now part of Sorbonne University) in 1987, and his thesis already contained an early form of backpropagation – the idea that would define the field.[1] He was a connectionist before connectionism was respectable, working on brain-like learning networks during the same stretch when the rest of AI had written them off.

In 1988 he joined AT&T Bell Laboratories, the institution where his most consequential engineering happened.[1] Bell Labs gave him real data and a reason to make networks work, not just theorize about them: handwritten digits from the US Postal Service, then the amounts written on bank checks. The constraint was unforgiving – a check-reader that hallucinates a digit costs money – and it pushed him toward an architecture that took the structure of images seriously rather than treating a picture as an undifferentiated bag of pixels.

After Bell Labs he moved to NYU in 2003 as a professor of computer science.[1] Then, in December 2013, Mark Zuckerberg hired him to build and lead Facebook AI Research (FAIR), and he went on to serve as the company’s Chief AI Scientist – 12 years at the company in all, during which he remained a part-time professor at NYU.[1][10] FAIR became one of the most prolific open industrial research labs of the era, a direct expression of his belief that the science should be shared. That tenure ended in November 2025, when he left to found his own world-models startup – the clearest possible statement of which side of the LLM bet he is on.[10]

The Work

Convolutional networks and LeNet: reading the world’s checks (1989-1998)

The central technical idea of LeCun’s career is the convolutional neural network, and the cleanest way to feel why it matters is to watch one work. A naive network connecting every pixel to every neuron is both enormous and blind to structure: it has no notion that a stroke in the top-left of an image is the same kind of thing as the identical stroke in the bottom-right. LeCun’s insight – building on Kunihiko Fukushima’s Neocognitron and grounded in the brain’s visual cortex – was to slide a small filter (a kernel) across the image, applying the same handful of weights at every location, so the network learns a feature once and detects it everywhere. Picture an edge-detecting kernel sweeping a digit: the “feature map” lights up wherever the pattern it matches appears.
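To make the sliding-kernel operation concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not LeNet: the 6x6 "digit", the `conv2d_valid` helper, and the hand-picked `vertical_edge` kernel are all hypothetical, and a real network would learn its kernels by backpropagation rather than have them written by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Like most deep-learning libraries, this computes cross-correlation:
    the kernel is applied as written, without flipping.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # The same kh * kw weights are reused at every (r, c) location.
            total = sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 6x6 "digit": a single vertical stroke in column 2 (illustrative data).
image = [[1 if c == 2 else 0 for c in range(6)] for _ in range(6)]

# Hypothetical hand-picked kernel: positive where brightness rises left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

Each printed row is `[0, 2, -2, 0, 0]`: the map responds positively on the left edge of the stroke, negatively on the right edge, and stays at zero everywhere else. Four weights cover the whole image, which is the point of the design.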

The history is concrete. In 1989, at Bell Labs, LeCun and colleagues were the first to apply the backpropagation algorithm to a practical problem – recognizing handwritten zip codes from US Postal Service mail – producing the prototype that became LeNet-1.[2] Nearly a decade of refinement led to the landmark 1998 paper, “Gradient-Based Learning Applied to Document Recognition,” by LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, which described LeNet-5 and made the case that gradient-based learning could replace hand-engineered feature extractors for document recognition.[3]

This was not a benchmark curiosity. NCR deployed LeNet-based check-readers commercially starting in June 1996, and by 2001 the system was estimated to read about 20 million checks a day – roughly 10% of all the checks in the United States.[2] At a moment when neural networks were still dismissed by much of the field, LeCun had one quietly reading a tenth of a nation’s checks. The architecture he designed for that problem – convolution, pooling, learned features stacked into a hierarchy – is, structurally, the same idea that powers modern computer vision.


Open research and FAIR

When LeCun built FAIR in 2013, he made a choice that was not obvious for a corporate lab: the work would be open. Publish the papers, release the code, share the models.[1][10] The bet was that an open lab attracts the best researchers (who want their work seen and cited), moves the whole field faster, and – not incidentally – lets the world audit and improve what you build.

This is the philosophical cousin of why open source is not a security boundary: openness is not a guarantee of correctness, but it is the mechanism by which correctness gets found. LeCun’s version applies that to science. You cannot know which architectures actually learn the structure of the world by reasoning about them in private; you publish them, others reproduce or refute them, and the truth survives the scrutiny. Meta’s release of I-JEPA in 2023 is the pattern in miniature – the training code and model checkpoints shipped alongside the paper, not months later, not never.[8] In an era when frontier labs increasingly treat their best work as a trade secret, LeCun’s open stance is a deliberate, contrarian position about how knowledge compounds.

Self-supervised learning: the cake

The deepest of LeCun’s ideas is also the one with the catchiest packaging. He has argued for years that the field has the proportions of intelligence backwards, and he made the point with food. At his NYU Future of AI talk in early 2016 and his NeurIPS keynote that year, he put up a cake: “If intelligence is a cake, the bulk of the cake is unsupervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning.”[5] By 2019 he had deliberately revised “unsupervised” to “self-supervised” – a variant where the data provides its own supervision, by hiding part of the input and training the model to predict it from the rest.[5]

The revision is not cosmetic; it names the engine. “Unsupervised” describes the absence of labels. “Self-supervised” describes a positive mechanism: the world, observed, becomes its own training signal. This is, he argues, how humans and animals acquire the overwhelming majority of what they know – the common-sense physics and structure that no one labels for us. His phrase for it is the “dark matter of intelligence”: the vast, unlabeled mass of learning that supervised and reinforcement methods only decorate.[5] If he is right about the proportions, then the most important research problem in AI is not bigger labeled datasets or more reward signals, but better self-supervised objectives – learning the world from watching it.
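The "hide part of the input, predict it from the rest" mechanism is easy to sketch. The example below is a toy under loud assumptions: the straight-line sequences, the `make_masked_examples` helper, and the two-weight linear predictor are all hypothetical stand-ins I chose for illustration, nothing like a production self-supervised system. What it shows is the part that matters: no human ever supplies a label, because the hidden value is the label.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def make_masked_examples(sequences):
    """Turn unlabeled sequences into (context, target) pairs by hiding one value."""
    examples = []
    for seq in sequences:
        for t in range(1, len(seq) - 1):
            context = (seq[t - 1], seq[t + 1])  # what the model is allowed to see
            target = seq[t]                     # the masked value it must predict
            examples.append((context, target))
    return examples


# Unlabeled "observations": straight-line ramps with random start and slope.
sequences = []
for _ in range(50):
    start, slope = random.uniform(-1, 1), random.uniform(-1, 1)
    sequences.append([start + slope * t for t in range(5)])

examples = make_masked_examples(sequences)

# Hypothetical predictor: masked value ~ w_left * left + w_right * right.
w_left, w_right = 0.0, 0.0
learning_rate = 0.02

for _ in range(100):  # passes over the data
    for (left, right), target in examples:
        error = w_left * left + w_right * right - target
        # Gradient step on the squared prediction error.
        w_left -= learning_rate * error * left
        w_right -= learning_rate * error * right

print(round(w_left, 3), round(w_right, 3))
```

On straight-line data the middle point is the average of its neighbors, so the weights should settle near 0.5 each. The model has recovered a regularity of its little world purely by predicting what was hidden from it.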


World models and JEPA: the LLM-skeptic stance

LeCun’s most public and most contested position follows directly from the cake. If real intelligence is mostly self-supervised learning of the world’s structure, then a system trained purely to predict the next human-written token is learning a word model, not a world model – and will inherit the limits of that substrate. He has said so bluntly: autoregressive LLMs are “an off-ramp” on the road to human-level AI, useful but not the path; “we are not going to get to human-level AI just by scaling LLMs.”[9][10]

His proposed alternative is the Joint Embedding Predictive Architecture (JEPA), introduced in his 2022 position paper “A Path Towards Autonomous Machine Intelligence.”[6] The key move is where the prediction happens. A generative model tries to reconstruct missing input pixel-by-pixel, and so wastes capacity modeling unpredictable detail – which is why generative models famously struggle with things like the exact number of fingers on a hand.[8] A JEPA predicts in an abstract representation space instead, free to ignore details that are not predictable and concentrate on the structural, low-entropy regularities of a scene.[6][8] Meta’s I-JEPA (2023) was the first concrete image model built on this vision, and it shipped open.[8]
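A toy calculation helps show why the location of the loss matters. This is not I-JEPA and not anything from the paper: the block-averaging `encode` function is a hypothetical fixed encoder I made up, whereas a real JEPA learns its encoder and predictor and has to guard against the representation collapsing. The sketch only illustrates the accounting: a pixel-space loss charges the predictor for noise it could never have known, while a loss in a coarser representation mostly does not, yet still punishes getting the structure wrong.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def encode(patch, block=8):
    """Hypothetical stand-in encoder: average each block of `block` values.

    Averaging keeps coarse structure and washes out fine, unpredictable detail.
    """
    return [sum(patch[i:i + block]) / block for i in range(0, len(patch), block)]


# A hidden "patch": predictable structure plus unpredictable fine detail.
structure = [1.0 if (i // 16) % 2 == 0 else -1.0 for i in range(64)]
target = [s + random.gauss(0.0, 0.5) for s in structure]

# An idealized predictor that knows the structure but cannot know the noise,
# and a bad one that gets the structure backwards.
good_prediction = structure
bad_prediction = [-s for s in structure]

pixel_loss = mse(good_prediction, target)                    # pays for the noise
latent_loss = mse(encode(good_prediction), encode(target))   # mostly does not
latent_loss_bad = mse(encode(bad_prediction), encode(target))

print(pixel_loss, latent_loss, latent_loss_bad)
```

The good prediction scores a smaller loss in representation space than in pixel space, and the structurally wrong prediction is still clearly penalized there. That is the discipline in miniature: spend the error budget on what is predictable.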

The contrast with his fellow “godfather” is the sharpest in this series. Geoffrey Hinton shared the same Turing Award and the same lonely decades of conviction, then turned in 2023 to warn that the technology may be dangerous. LeCun is the optimist of the trio – skeptical not of AI’s safety but of the current architecture’s ceiling – and in November 2025 he backed that conviction with his career, leaving Meta to found Advanced Machine Intelligence (AMI) Labs in Paris, a startup built expressly around world models rather than larger language models.[10] Two godfathers, one award, opposite messages: Hinton says slow down because it might work too well; LeCun says this particular road doesn’t go where everyone thinks, and points at a different one.

The Method

The method is consistent from the check-readers to the world-model startup: build the architecture the structure of the problem demands, learn as much as possible without labels, and do it in the open.

Encode the structure into the architecture. Convolution works because it bakes a true fact about images – translation invariance – directly into the network’s wiring, instead of forcing the model to learn it from scratch. The lesson generalizes: when you know something about the problem, build it into the architecture rather than hoping the data teaches it.[2][3]
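Here is a small sketch of what "baking it into the wiring" buys, under illustrative assumptions: the 28x28 layer size, the 5x5 kernel size, and the helper names are mine, chosen only to make the arithmetic visible. Strictly speaking, convolution alone gives translation equivariance (move the input and the feature map moves with it); pooling over the map is what turns that into invariance (the pooled answer does not change). The check below holds because the shifted stroke stays inside the image.

```python
def conv2d_valid(image, kernel):
    """Cross-correlate `kernel` with `image` (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


def shift_right(grid, k):
    """Shift every row right by k cells, filling the vacated cells with zeros."""
    return [[0] * k + row[: len(row) - k] for row in grid]


def global_max_pool(feature_map):
    """Collapse a feature map to one number: the strongest response anywhere."""
    return max(max(row) for row in feature_map)


# Weight sharing, with illustrative sizes: one layer over a 28x28 image.
pixels = 28 * 28
dense_weights = pixels * pixels  # every pixel wired to every unit of a same-size layer
conv_weights = 5 * 5             # one 5x5 kernel reused at every location
print(dense_weights, conv_weights)

# The same vertical stroke at two positions in a 6x6 image.
stroke = [[1 if c == 1 else 0 for c in range(6)] for _ in range(6)]
moved = shift_right(stroke, 2)
edge_kernel = [[-1, 1], [-1, 1]]  # hypothetical hand-picked edge detector

fmap_stroke = conv2d_valid(stroke, edge_kernel)
fmap_moved = conv2d_valid(moved, edge_kernel)

# Equivariance: shifting the input shifts the feature map by the same amount.
equivariant = fmap_moved == shift_right(fmap_stroke, 2)
# Invariance after pooling: the strongest response is the same in both cases.
invariant = global_max_pool(fmap_moved) == global_max_pool(fmap_stroke)
print(equivariant, invariant)
```

The dense layer needs 614,656 weights to the convolution's 25, and it would still have to learn separately that a stroke on the left and a stroke on the right are the same thing. The convolution gets that for free because the architecture already says so.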

Make the data its own teacher. Labels are scarce and expensive; raw observation is abundant. The self-supervised stance – mask part of the input, predict it from the rest – is how LeCun proposes to learn the bulk of what a system needs to know. Reach for human supervision last, not first.[5]

Predict in representation space, not pixel space. Don’t spend capacity modeling detail you can’t predict. JEPA’s central engineering choice is to predict abstract representations and deliberately discard the unpredictable – a discipline about what is worth modeling at all.[6][8]

Publish it. Open papers, open code, open models. LeCun’s conviction is that science compounds in the open and stalls in secret, and he ran one of the industry’s largest labs on that principle for a decade.[1][8][10]

Hold the unfashionable position when the evidence supports it. He believed in neural networks through the winters, and he now holds that LLMs are a detour while the entire industry pours capital into them. The willingness to be the loud skeptic at the peak of a hype cycle is the same muscle that kept him on convolution when no one cared.[9][10]

Influence Chain

Who Shaped Him

Kunihiko Fukushima. Fukushima’s Neocognitron (1980) – a layered, shift-tolerant visual network inspired by the cortex – was the direct ancestor of the convolutional network. LeCun added end-to-end learning by backpropagation, turning a hand-tuned architecture into one that learns its own filters. (Direct influence)

David Hubel and Torsten Wiesel. Their Nobel-winning neuroscience of the visual cortex – simple cells detecting local features, complex cells pooling over position – is the biological blueprint that convolution and pooling formalize. LeCun, like Hinton, reasoned from how brains actually see. (Formative influence)

The connectionist backpropagation lineage. LeCun developed an early form of backprop in his 1987 thesis, converging on the same engine that the Rumelhart-Hinton-Williams work made famous in 1986. He inherited and extended the connectionist program when it was deeply out of fashion. (Direct influence)

Who He Shaped

Modern computer vision. Every convolutional vision system – the ones reading medical scans, driving perception stacks, and powering phone cameras – descends structurally from LeNet. He didn’t just contribute to the field; he supplied its foundational architecture.

The self-supervised turn. The industry-wide pivot toward learning from unlabeled data – masked pretraining, contrastive methods, joint-embedding objectives – runs straight through LeCun’s “cake” framing and his decade of insisting that this is where the bulk of intelligence lives.

A generation of FAIR researchers. By running one of the largest open industrial labs, LeCun shaped how a generation publishes and shares work, and seeded much of the open-model ecosystem that now exists outside the most secretive frontier labs.

The Throughline

LeCun is the computer-vision root of this series’ deep-learning branch, and the cleanest line runs forward to Andrej Karpathy, whose work lives squarely in the field of learned vision that convolution opened – and whose “Software 2.0” reframe (a network as a program compiled from data) is the natural generalization of LeCun’s move from hand-engineered check-readers to learned ones. The sharpest contrast is with his Turing co-laureate Geoffrey Hinton: the two “godfathers” share the award, the lonely decades, and the bet on brain-like learning, but split on the present moment. Hinton, the worried one, left Google in 2023 to warn the thing might be too powerful; LeCun, the optimist, left Meta in 2025 to argue the dominant architecture isn’t powerful enough and to build a different one. Where Hinton fears the result, LeCun disputes the route. Two routes, one mountain: Hinton trusts the danger is real; LeCun trusts that the world, learned openly, is the road. (Series bridge)

What I Take From This

The lesson I keep from LeCun is that the architecture should carry the part of the problem you actually understand. Convolution works because it builds translation invariance into the network instead of hoping a billion examples teach it – and that is exactly the move I reach for when I design systems: when I know something is true about the domain, I encode it in the structure rather than leaving it to chance. It is the same instinct as treating taste as a technical system you can defend, not a vibe you hope emerges – put the known constraint into the design where it can be checked, not into the prayer that the output behaves.

The harder lesson is the LLM skepticism. LeCun is standing at the absolute peak of a hype cycle – the entire industry, the capital, the attention all pointed at scaling language models – and saying, on the record, that it is an off-ramp. He may be wrong; the point is that he is willing to be the dissenting voice when dissent is expensive, anchored to a specific technical argument about prediction in representation space rather than to contrarian instinct. That is the evidence gate pointed at the consensus itself: not “everyone’s excited, so it must be the path,” but “what is this architecture actually learning, and is that the thing we want?” And the openness is the part I take most directly – the conviction that you find out who is right by publishing the work and letting it be tested, which is why I treat quality as the only variable and the Steve test of whether the work deserves to exist as something you submit to scrutiny, not something you assert. LeCun bet his whole career, twice, on learning the world rather than memorizing its labels – and on doing it where everyone can see.

FAQ

What is Yann LeCun’s engineering philosophy?

Learning the world, openly. LeCun argues that intelligence is mostly self-supervised – a machine learning the structure of the world by predicting what it observes, with human labels and reward signals as thin layers on top, captured in his “cake” analogy where self-supervised learning is the bulk, supervised learning the icing, and reinforcement learning the cherry.[5] He pairs this with a commitment to open research: publishing papers, code, and models so the science can be reproduced and tested.[1][8] His engineering signature is encoding known structure directly into the architecture – as convolution bakes translation invariance into a vision network.[2][3]

What did Yann LeCun invent, and how was it used commercially?

He is the principal inventor of the modern convolutional neural network. At Bell Labs in 1989 he was among the first to apply backpropagation to a practical task – recognizing handwritten zip codes – producing the LeNet prototype, and the 1998 paper “Gradient-Based Learning Applied to Document Recognition” (with Bottou, Bengio, and Haffner) described LeNet-5.[2][3] Commercially, NCR deployed LeNet-based check-readers starting in 1996, and by 2001 the system was estimated to read about 20 million checks a day – roughly 10% of all checks in the United States.[2]

Why is Yann LeCun skeptical of large language models?

Because, in his view, autoregressive LLMs learn a model of text rather than a model of the world, and so cannot reach human-level intelligence by scaling alone – he calls them “an off-ramp” on the road to human-level AI.[9][10] His proposed alternative is the Joint Embedding Predictive Architecture (JEPA), from his 2022 paper “A Path Towards Autonomous Machine Intelligence,” which predicts in abstract representation space and ignores unpredictable detail, rather than generating every pixel or token.[6][8] In November 2025 he left Meta to found AMI Labs in Paris to pursue world models directly.[10]

Did Yann LeCun win the Turing Award?

Yes. He shared the 2018 ACM A.M. Turing Award with Geoffrey Hinton and Yoshua Bengio – the three “godfathers of deep learning” – “for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.”[4] LeCun’s recognized contributions center on convolutional networks and his broader work making deep learning practical. He differs publicly from Hinton on the present: where Hinton (who left Google in 2023) warns of AI’s dangers, LeCun is the optimist who argues today’s dominant LLM architecture is not the route to human-level intelligence.[10]


Sources


  1. “Yann LeCun,” Wikipedia. Yann André Le Cun, born July 8, 1960, in Soisy-sous-Montmorency, France; PhD from the Université Pierre et Marie Curie (now Sorbonne University), 1987; joined AT&T Bell Laboratories in 1988; professor at New York University from 2003; joined Facebook in December 2013 as founding director of Facebook AI Research (FAIR) and Chief AI Scientist; 2018 ACM Turing Award shared with Geoffrey Hinton and Yoshua Bengio. 

  2. “LeNet,” Wikipedia. LeNet is a series of convolutional neural network architectures developed at AT&T Bell Laboratories (c. 1988-1998) centered on Yann LeCun; in 1989 LeCun et al. were the first to apply backpropagation to a practical task, recognizing handwritten US Postal Service zip codes (the LeNet-1 prototype). NCR deployed LeNet-based bank check readers starting in June 1996; by 2001 the system was estimated to read about 20 million checks a day, or 10% of all checks in the US. 

  3. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE 86, no. 11 (1998): 2278-2324, doi:10.1109/5.726791. The paper describing LeNet-5 and arguing that gradient-based learning can replace hand-engineered feature extractors for document recognition. Citation and significance also documented at “LeNet,” Wikipedia. 

  4. 2018 ACM A.M. Turing Award citation for Yoshua Bengio, Geoffrey Hinton, and Yann LeCun: “for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.” The official ACM page (awards.acm.org) blocks automated requests; citation wording is documented verbatim at “Turing Award,” Wikipedia, and “Yann LeCun,” Wikipedia. 

  5. On the “cake” analogy and the unsupervised-to-self-supervised revision: “Yann LeCun Cake Analogy 2.0,” Synced, February 22, 2019. The cake first appeared at LeCun’s NYU Future of AI Symposium talk in early 2016 and his NIPS 2016 keynote, originally as “the bulk of the cake is unsupervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning”; LeCun revised “unsupervised” to “self-supervised” by the 2019 ISSCC conference. On self-supervised learning as the “dark matter of intelligence,” see also LeCun’s discussion at “Self-supervised learning: The plan to make deep learning data-efficient,” TechTalks, March 23, 2020. 

  6. Yann LeCun, “A Path Towards Autonomous Machine Intelligence,” OpenReview, version 0.9.2 (June 27, 2022). Position paper proposing a configurable predictive world model, intrinsic-motivation-driven behavior, and hierarchical Joint Embedding Predictive Architectures (JEPA) trained with self-supervised learning, predicting in representation space rather than reconstructing inputs. 

  7. On convolution and its biological/architectural roots (Fukushima’s Neocognitron; Hubel and Wiesel’s visual-cortex neuroscience) and pooling: “Convolutional neural network,” Wikipedia, and “LeNet,” Wikipedia. LeCun added end-to-end learning by backpropagation to a Neocognitron-style architecture, so the network learns its own filters rather than having them hand-designed. 

  8. “I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI,” Meta AI, June 13, 2023. I-JEPA learns by predicting abstract representations of unseen image regions rather than reconstructing pixels; Meta open-sourced the training code and model checkpoints with the announcement. The post contrasts generative methods, which “try to fill-in every bit of missing information, even though the world is inherently unpredictable,” with JEPA’s prediction at “a high level of abstraction rather than predicting pixel values directly.” 

  9. Yann LeCun, “LLMs are useful, but they are an off ramp on the road to human-level AI,” post on X, June 1, 2024: “LLMs are useful, but they are an off ramp on the road to human-level AI. If you are a PhD student, don’t work on LLMs. Try to discover methods that would lift the limitations of LLMs.” (X requires authentication for automated retrieval; the quotation is widely reproduced, including in coverage of LeCun’s 2025 Meta departure cited below.) 

  10. Jeremy Kahn / Beatrice Nolan, “Yann LeCun is targeting a $3.5 billion valuation for his new startup,” Fortune, December 19, 2025. LeCun left Meta on November 18, 2025, after 12 years (five as founding director of FAIR, seven as Chief AI Scientist) to found Advanced Machine Intelligence (AMI) Labs, headquartered in Paris, focused on “world models” – systems that understand physics, maintain persistent memory, and plan complex actions. LeCun: “We are not going to get to human-level AI just by scaling LLMs,” which “simply predict text rather than truly understand the world.” Departure and startup also reported by “Meta chief AI scientist Yann LeCun is leaving the company to create his own startup,” CNBC, November 19, 2025.
