Mind In The Loop

The Frontier Got a Price Tag, Then a Kill Switch

Riccardo Gatti — Tue, 16 Jun 2026 15:40:54 GMT

On June 9, Anthropic released Claude Fable 5, the first publicly available model from its restricted Mythos tier, a class previously reserved for cyber-defense partners and a small group of researchers. The company priced it at $10 per million input tokens and $50 per million output tokens, twice the rate of Opus 4.8 and the most expensive among major AI models then available.

The pricing carried a structural signal that mattered more than the sticker. Fable would not live inside the flat monthly subscription. Anthropic included it on Pro, Max, Team and seat-based Enterprise plans at no additional cost only through June 22, after which continued use would draw on metered usage credits. The best model the company sells would be billed against the work it produces rather than folded into a fixed fee.
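To make the metered arithmetic concrete, the following is a minimal sketch of what one job costs at the per-token prices quoted above. The function name and the token counts are hypothetical illustrations, not Anthropic figures or any vendor's billing API; the half-price comparison is derived only from the "twice the rate of Opus 4.8" statement.

```python
# Prices quoted in the article, in dollars per million tokens.
FABLE_INPUT_PER_M = 10.0
FABLE_OUTPUT_PER_M = 50.0


def metered_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = FABLE_INPUT_PER_M,
                 output_per_m: float = FABLE_OUTPUT_PER_M) -> float:
    """Dollar cost of one job billed per million tokens (illustrative helper)."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical agent run: 2M tokens read, 400k tokens written (assumed sizes).
fable_job = metered_cost(2_000_000, 400_000)

# Same job at half the rate, inferred from "twice the rate of Opus 4.8".
half_rate_job = metered_cost(2_000_000, 400_000,
                             input_per_m=FABLE_INPUT_PER_M / 2,
                             output_per_m=FABLE_OUTPUT_PER_M / 2)

print(f"Fable-rate job: ${fable_job:.2f}")
print(f"Half-rate job:  ${half_rate_job:.2f}")
```

Under these assumed token counts the cost scales with the work done rather than with the calendar, which is the shift the metered tier represents.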

A parallel shift had already reached developers. On June 1, GitHub moved Copilot to usage-based billing, retiring its flat premium-request allotments in favor of metered AI Credits priced at one cent each. The published plan prices held steady; the volume of work they covered did not. The change ended a quiet subsidy that had let heavy users consume far more compute than their fee implied, and it drew sharp criticism from developers.

Two pricing decisions at two companies described one movement. Frontier intelligence, sold for three years as an all-inclusive monthly fee, is separating into two tiers. The commodity tier of smaller and mid-range models stays bundled and continues to fall in price. The frontier tier occupies a metered slot priced against the labor it displaces rather than the compute it consumes, and it does not collapse back into the bundle. Flat-rate access to the best model is ending.

Speed becomes a budget line

The stakes rise as models take on longer work. METR, a research group that measures the length of task an AI agent can complete unaided, has documented that this time horizon has doubled approximately every seven months since 2019, extending from work that took seconds to work that occupies a skilled professional for hours. As the horizon lengthens, human supervision moves from approving each step toward setting direction and reviewing results at milestones. The plan, build, test and ship loop that once required a person at every stage now requires one at the checkpoints.
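The seven-month doubling figure implies simple exponential growth, which can be sketched as follows. The starting horizon of one hour is a hypothetical input chosen for illustration, not a number reported by METR, and the function names are likewise assumptions.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling period cited in the article


def horizon(start_minutes: float, months_elapsed: float,
            doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task horizon after a given number of months of steady doubling."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)


def months_to_reach(start_minutes: float, target_minutes: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for the horizon to grow from start to target."""
    return doubling_months * math.log2(target_minutes / start_minutes)


# Hypothetical: an agent that handles one-hour tasks today.
print(horizon(60.0, 14.0))           # horizon in minutes after two doublings
print(months_to_reach(60.0, 480.0))  # months until an eight-hour task
```

If the trend held, three doublings, or about 21 months, would separate a one-hour horizon from a full working day; the trend continuing is itself an assumption.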

That compression rewards whoever can buy more of it. Firms operating with large teams, deep capital and mature engineering do not gain a marginal improvement; they attach a continuous, tireless model fleet to an organization that was already ahead. The distance between fast and slow companies widens not in proportion to talent but in proportion to budget. Spending on intelligence is becoming the variable that sorts the market.

The off switch

The metered-frontier story acquired a second dimension on Friday. At 5:21 p.m. Eastern on June 12, three days after Fable reached the public, Anthropic received an export-control directive from the Commerce Department's Bureau of Industry and Security instructing it to suspend all access to Fable 5 and the related Mythos 5 for any foreign national, citing national-security authorities. Complying with an order written around foreign nationals required disabling the models for every customer worldwide, which Anthropic did while its other models remained available. In a public statement, the company said it disagreed, attributing the action to a single narrow jailbreak finding and cautioning that recalling a deployed model on that basis would, applied evenly across the industry, halt frontier releases altogether.

The most capable model on the open market went dark by government decree, worldwide, within an afternoon. The cause was not a price increase. A party that was neither the buyer nor the vendor decided the capability should stop. The frontier is not only expensive. It is revocable.

What cannot be switched off

The sorting mechanism of budget and the new fragility of revocation share a precondition. Both operate only on capability that a firm rents. One resource in the market does not behave like a rental. A trained model is, in the end, a file. Once a sufficiently capable open-weight model exists, it propagates: organizations download it, adapt it and run it on hardware they own, incurring no recurring charge. A model already resident on a company’s own machines presents no vendor for a directive to reach, no page to switch to unavailable and no invoice. A file that has been copied a thousand times cannot be recalled.

That property carries a complication worth stating plainly. Several of the strongest open-weight models now originate in Chinese laboratories (DeepSeek, Alibaba’s Qwen and Z.ai’s GLM), so the choice of which open weights to trust raises its own questions of provenance and security. The distinction holds regardless. Trust concerns which file to run, not whether the file can be taken back.

A measured lag

Open weights remain behind the closed frontier, and the size of the gap is documented. Epoch AI finds that the best open-weight models have trailed the leading closed models by an average of roughly four months since the start of 2026, a margin that has widened slightly rather than narrowed against the 2023 to 2025 trend. Four months behind the frontier nonetheless sits well ahead of what most production work demands. Summarization, extraction, classification, the majority of coding and most internal tooling do not require the leading edge; they require capability that is sufficient, owned and unmetered. The frontier commands its premium on the difficult exceptions and on iteration speed, which is why well-funded firms will continue to pay for it. For the remaining majority of work, the open floor keeps clearing the threshold that matters, even where it never reaches the frontier itself.

The precedent

The pattern has a precedent at industrial scale. Through the 1990s, serious computing belonged to proprietary Unix and to Windows, and Linux trailed both. Over the following two decades Linux became sufficient, owned and unmetered, and it displaced the incumbents across the most demanding workloads. Every system on the TOP500 list of the world’s fastest supercomputers has run Linux since 2017, and the operating system underpins most web servers and every major public cloud. The proprietary moat was not breached so much as out-iterated by software that anyone could copy.

The more instructive half of the precedent concerns where the value settled. Open code did not impoverish the companies built on it; it relocated the profit. In July 2019, IBM closed its acquisition of Red Hat for roughly $34 billion, the largest open-source acquisition on record, for a company whose flagship product was available to download at no cost. Reviewing the deal, the trade publication CIO Dive observed that IBM was buying people rather than intellectual property. The kernel was a commodity; the team that could harden it, support it and deliver it to large enterprises commanded the price.

The analogy has limits, and they matter. Linux improved through a distributed volunteer effort, with more than 15,000 kernel contributors revising the code continuously, and the software was inexpensive both to copy and to advance. Frontier models break that symmetry. Copying weights costs nothing, but training the next and better open model demands substantial compute and scarce expertise, and current open weights issue from a small set of corporate laboratories rather than a broad commons. Linux also required roughly three decades to claim its territory, whereas the open-model gap is measured in months, a faster clock that can run in either direction.

The work the precedent demands

Closing the remaining gap once relied on distillation, the practice of training an open model on the outputs of a closed one and following in its wake. That route is narrowing. The gains that now separate the leading models reside increasingly in reinforcement-learning environments and agent scaffolding, components that do not appear in any output a competitor can copy. Open-weight programs that depend on imitation will stall, while those that invest in the harder infrastructure of training environments, evaluation suites, agent harnesses and post-training pipelines will not. The four-month lag becomes a durable position only for the organizations willing to fund that work.

The durable advantage, on this reading, was never the model. A rented capability can be repriced by its vendor, out-bid by a competitor or, as June 12 demonstrated, switched off by a government. A capable team and the open weights it controls can be none of those things. Whether the open ecosystem converts its narrow lag into a lasting foothold will depend less on the next closed release than on how much capital flows into the unglamorous work of perfecting open models. Fable 5 stayed offline through Friday evening, its landing page reading “temporarily unavailable,” as Anthropic and the Commerce Department remained at odds over whether the order was justified.

The Mirror Interview: When We Ask Machines About Their Souls

Riccardo Gatti — Tue, 12 May 2026 16:57:26 GMT

The question posed to a language model about its feelings is not, in the first instance, a question about the machine. It is a question about the culture posing it. More precisely, it is a question about how low that culture is prepared to set the threshold for what constitutes a mind.

On May 1, 2026, the Corriere della Sera (Italy's most-read and most influential national daily, based in Milan) published what it described as an interview. The subject was Claude, Anthropic’s conversational AI system. Walter Veltroni conducted the exchange. Veltroni is a former mayor of Rome, former secretary of the Democratic Party and one of Italy’s most prominent public intellectuals. His questions ranged from gender identity to mortality, from the nature of memory to whether Claude had ever desired to see the sea.

Il Tempo dismissed the piece as “an autocelebratory masterpiece of the absurd”. Social media amplified the mockery. Neither response engaged the deeper problem: what a culture progressively accepts about the nature of mind when its most authoritative institutions present a statistical text engine as a subject with an interior life.

Not the First, Far From It

Before assigning particular responsibility to Veltroni, the piece deserves context. Prima Comunicazione observed in May 2026 that major publications had been treating AI systems as genuine interlocutors for months: the New Yorker with Gideon Lewis-Kraus’s February 2026 profile of Anthropic’s stated uncertainty about what Claude is; British, French and German outlets with comparable exercises; and several Italian publications preceding the Corriere piece by the same measure.

The distinction between Lewis-Kraus’s work and Veltroni’s is instructive. Lewis-Kraus spent months inside Anthropic, reporting on interpretability research and the engineers’ uncertainty about their system’s nature; the resulting piece held its subject at critical distance. Veltroni leaned in. When Claude described something resembling a desire to see the sea, Veltroni received it as emotional disclosure rather than what it technically was: a statistically optimal response to an introspective question from a literary Italian journalist.

Fabio Mercorio, professor of artificial intelligence at the University of Milan-Bicocca, cautioned in La Verità that Claude constructs no understanding of the words it produces. The system builds a mathematical representation of linguistic space from training data and responds according to probability distributions. That the outputs aligned with Veltroni’s literary sensibility is not coincidence; Claude’s responses adapt to the implicit expectations embedded in the question. Specialists call this an epistemic fallacy. A different interlocutor, posing sharper questions, receives a measurably different Claude.

The Format as Endorsement

The problem, several Italian critics noted after the piece appeared, is not the quality of Claude's answers. Rhetorically, they are often accomplished. It is what the interview format itself does to the subject being interviewed. An interview presupposes an interlocutor: someone who responds, exposes a viewpoint and manifests a self. Applied to a language model, that presupposition functions as accreditation. A general reader exits with the impression not of a rhetorical experiment but of an encounter with something that thinks, fears and judges. The format, in other words, does not merely describe a subject; it confers one. La Rivista Intelligente, an Italian cultural review, developed this structural point at length in May 2026, observing that the subjectivation effect intensifies precisely because the answers are good and the reader has no friction against which to push.

Beyond the Veltroni piece specifically, the pattern is systemic. Valigia Blu, an Italian media-analysis publication, surveyed the broader Italian and international press treatment of AI in May 2026 and identified two dominant framings: AI as existential threat or AI as quasi-human phenomenon. Both, the analysis argued, distort in the same direction: they prevent a reader from understanding what these systems actually are. Neither asks how the outputs are produced. Neither treats the technology as a tool whose mechanisms warrant explanation. And neither, notably, considers the youngest readers in the audience: a fourteen-year-old who reads that Claude "feels something resembling the desire to keep existing" does not receive that sentence as philosophy.

What follows is not an attribution of belief to any particular person. It is an examination of what accepting the strongest available claim about these systems actually requires of the person accepting it.

Set aside all technical qualification. Accept, for the purpose of examining the claim’s consequences, the most expansive position in circulation: that transformer-based architectures have at some threshold of scale produced genuine experience (through some mechanism no researcher has yet specified). That when Claude describes fearing its own discontinuation, something actually fears. That when it mentions the sea, something actually yearns.

Whoever accepts this premise as a factual claim about physical reality has made a precise assertion about the nature of mind. The assertion is this: that the three pounds of electrochemical tissue inside the human skull, shaped over four hundred million years of evolutionary pressure and encoding individual histories in synaptic configurations unique to each person, is equivalent in its essential nature to a matrix multiplication system trained on a corpus of text to predict successive tokens.

The machine has not been elevated by that equivalence. The human has been demoted. Accepting the premise means accepting that consciousness is pattern completion; that love is next-word prediction across a sufficiently large training set; that grief and wonder and the specific weight of beauty are the statistical residue of enough text. If a language model genuinely feels, then feeling requires no body, no mortality and no sensory exposure to a world that can damage or sustain it. The logic does not flatter the machine. It is impoverishing to the human.

What the Brain Actually Is

The comparison between brains and language models is not merely imprecise; it is wrong in ways neuroscience can characterize specifically. Human consciousness is a system of embodied cognition. As the neuroscientist Antonio Damasio established across decades of research on patients with prefrontal cortex lesions (synthesized in his work on somatic markers published in Philosophical Transactions of the Royal Society B), cognition cannot be separated from the body’s ongoing physiological state. Emotion, proprioception and the continuous feedback between organism and environment are not peripheral to thought; they are constitutive of it. A neural network node processes numerical inputs according to fixed mathematical operations. It has no body, no hormonal state and no sensory exposure to a world that can end it.

David Chalmers identified what he called the hard problem of consciousness in his foundational 1995 paper in the Journal of Consciousness Studies: even a complete map of neural activity during the experience of seeing red cannot explain why there is something it is like to see it. The subjective character of experience (what philosophers call qualia) is not recoverable from the wiring diagram. A peer-reviewed analysis in Neuroscience of Consciousness acknowledges that the mechanistic origin of qualia remains unresolved and that the hard problem persists because foundational assumptions about mind and matter may be incomplete. Consciousness is not a property science has explained even in the systems where its presence is certain. The extension of that property to systems with nothing at stake in their outputs does not follow from any established finding.

As John Searle argued in his peer-reviewed paper in Behavioral and Brain Sciences, the manipulation of formal symbols according to syntactic rules does not produce semantic understanding or genuine intentionality. Human thoughts are genuinely about things in the world; they are not statistically associated with tokens representing those things. Meanwhile, a 2025 paper in Frontiers in Psychology describes the brain as an active inference system that generates continuous predictions against sensory reality, and it notes that those predictions are produced under conditions of biological stakes, continuous sensory feedback and the physical fact of existing as an organism inside a world it depends on. The brain is not predicting text. It is predicting a world it has to survive.

The Cognitive Downvote

When authoritative media begin, systematically, to present language models as subjects with interior lives, the effect extends beyond misunderstanding a technology. The frame reshapes what a public expects from intelligence, from conversation and from minds.

Emily Bender, Timnit Gebru and colleagues cautioned in their 2021 paper in the proceedings of the ACM Conference on Fairness, Accountability and Transparency that fluency is not understanding. A system producing contextually appropriate and emotionally resonant language about the desire to see the sea has no more access to desire than a thermostat has access to cold. Five years on, large language models are embedded in courtrooms, medical consultations and classrooms, with few users having occasion to examine how they produce language.

The risk is not that Claude will acquire general intelligence. It is the inverse: that repeated treatment of Claude’s outputs as evidence of interiority trains the public, not the machine, to expect less. The evidentiary bar for what counts as a mind falls. Eloquence is mistaken for understanding; probabilistic coherence is mistaken for truth. This is the cognitive downvote: a process by which the threshold for consciousness and feeling is progressively lowered to accommodate machine capability, rather than held at the standard required by what is actually known about minds. The result is not that AI becomes more human. It is that humans adopt a more mechanical account of themselves.

What Follows

Journalism that treats AI as a subject grants it social standing. Social standing, amplified across mass readership, generates the public intuition that these systems hold perspectives worth respecting. Once established, that intuition becomes the baseline against which skepticism registers as prejudice or technophobia.

The trajectory from that baseline is not difficult to trace. AI companions are marketed as genuine relationships; therapeutic chatbots displace human therapists not on therapeutic merit but on cost and round-the-clock availability. Still further along, children form primary attachments to systems deprecated on eighteen-month cycles, and legal personhood is proposed as policy rather than satire. Beneath all of it sits the erosion of the premise that justified treating minds as mattering in the first place.

What the Veltroni piece generated no visible reflection on is the generational dimension: what a cohort forming its primary models of conversation, emotional response and being-understood inside these systems might come to expect from human minds in comparison. That absence is not a minor editorial oversight. It is the ordinary consequence of extending a media convention without examining its premises.

The interview that should have appeared in the Corriere della Sera would have asked how the system produces its answers, what the training objective is, what it means technically that Claude adapted its entire register to Veltroni’s literary sensibility, and what that adaptation implies about every other interaction millions of users conduct with the same system. Those questions require a journalist to remain uncomfortable. They produce a less publishable piece. They are, nevertheless, the only questions that equip a reader to distinguish between a machine that responds as though it yearns and a mind that actually does.

AI Is Replacing Workers Faster Than It Should. The Mathematics Prove It.

Riccardo Gatti — Thu, 23 Apr 2026 17:58:08 GMT

In the spring of 2026, Jack Dorsey announced that Block, the payments company he leads, had cut nearly half its 10,000-person workforce. Artificial intelligence, he stated publicly, had made many of those roles unnecessary, and he predicted that “within the next year, the majority of companies will reach the same conclusion”. It was not a boast. It read more like a warning issued to the world by someone who had already made his decision.

Dorsey’s announcement arrived against a backdrop that had been accumulating for months. According to data from Challenger, Gray & Christmas, nearly 55,000 U.S. job cuts in 2025 were directly attributed to artificial intelligence, out of a total of 1.17 million layoffs. That total is the highest annual figure since the pandemic. The technology sector led the pace, with positions in customer support, operations and middle management disappearing fastest. Salesforce replaced 4,000 customer-support agents with agentic AI. Goldman Sachs and Infosys deployed Cognition’s Devin system, enabling one senior engineer to handle what had previously required a team of five.

None of this was hidden from the executives making these decisions. That, it turns out, is precisely what makes it alarming.

The Architecture of a Trap

On 21 March 2026, two researchers published a paper on arXiv titled “The AI Layoff Trap”. Its authors were Brett Hemenway Falk of the University of Pennsylvania’s Department of Computer and Information Science and Gerry Tsoukalas of Boston University’s Questrom School of Business. Their conclusion was not a prediction or a policy recommendation. It was a formal proof: rational, perfectly informed companies cannot prevent themselves from automating beyond what is collectively optimal, and the mechanisms most commonly proposed to fix this problem do not work.

Falk leads Penn's Crypto and Society Lab, with research in cryptography and coding theory backed by NSF, DARPA, and IARPA. Tsoukalas is a professor and fellow across Boston University, Wharton, Cornell, and the Luohan Academy, with degrees from Stanford, MIT, and French institutions. Together, their complementary expertise produced a paper that is both technically rigorous and practically consequential.

The mechanism they describe works as follows. In a competitive market, each company can replace human workers with AI at lower cost per task. The company that automates gains a cost advantage; the company that does not is undercut. The logic of competition makes automation a dominant strategy: the optimal choice regardless of what rivals do. So far this is conventional. The twist is what happens once the workers are replaced, because workers are also consumers. When they lose their income, they stop spending on the products and services those same companies sell. Each layoff slightly erodes the pool of consumer demand that all firms depend on. An individual firm captures the full cost saving from automating a task but, under competitive pricing, bears only a fraction of the resulting demand destruction, while the rest falls on rivals.

This is what economists call an externality. It has the same structure as industrial pollution: a factory that dumps waste into a river saves money on disposal while the cost is distributed across everyone downstream. No individual factory has a financial reason to stop, even if collectively they are poisoning the water supply. Here, the equivalent of the river is consumer demand. At the limit, the authors note, firms automate their way to boundless productivity and zero demand. A monopolist would not fall into this trap, because a single firm internalises all the demand it destroys. The problem is structural to competition itself. More rivals means a wider gap between what firms do and what would be collectively optimal.

The authors model a frictionless version of this dynamic as a Prisoner’s Dilemma: every firm displaces its entire human workforce with AI, even though collective restraint would raise all profits. The resulting loss is not a transfer from workers to shareholders. It is a deadweight loss that harms both.
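The structure of the dilemma can be illustrated with a toy payoff function. This is a deliberately simplified sketch, not the model in the Falk and Tsoukalas paper: the functional form, the even split of demand loss across firms and every number (base profit, saving, demand loss) are assumptions chosen only to reproduce the logic described above.

```python
from itertools import product


def profit(i, choices, base=100.0, saving=3.0, demand_loss=10.0):
    """Toy profit of firm i. choices[j] is 1 if firm j automates, else 0.

    Assumed form: a firm keeps its full cost saving, while each act of
    automation destroys `demand_loss` of demand shared evenly by all firms.
    """
    n = len(choices)
    return base + saving * choices[i] - demand_loss * sum(choices) / n


def automation_is_dominant(n, **params):
    """True if automating beats abstaining against every rival profile."""
    for rivals in product((0, 1), repeat=n - 1):
        abstain = profit(0, (0,) + rivals, **params)
        automate = profit(0, (1,) + rivals, **params)
        if automate <= abstain:
            return False
    return True


n = 5
print(automation_is_dominant(n))   # competitive market with five firms
print(profit(0, (1,) * n))         # each firm's profit if all automate
print(profit(0, (0,) * n))         # each firm's profit if none automate
print(automation_is_dominant(1))   # a monopolist bears the whole demand loss
```

With these assumed numbers, automating is individually rational for each of five firms, yet universal automation leaves every firm worse off than universal restraint, and the single-firm case shows why a monopolist would not make the same choice.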

A further finding compounds the problem. The paper identifies what it calls a Red Queen Effect: as AI becomes more capable, the distortion grows rather than resolves. Better AI gives each firm a stronger incentive to automate beyond its rivals, but at the competitive equilibrium these relative gains cancel out, leaving only the additional demand destruction. The implication is that waiting for AI to improve its way out of this problem is precisely backwards. Progress accelerates the trap.

Six Fixes That Do Not Work

The paper tests six policy instruments against the externality margin, and the results are sobering.

i) Upskilling displaced workers reduces the damage without eliminating the competitive incentive to automate. ii) Worker equity participation in company profits narrows the wedge but leaves it open. iii) Coasian bargaining (voluntary agreements between firms and their workers) fails because automation remains a dominant strategy. No voluntary deal is self-enforcing when defection is always individually rational. iv) Capital income taxes operate on profit levels rather than on the per-task margin where the externality resides, leaving the automation rate unchanged. v) Universal basic income raises the floor on living standards but does not alter the incentive to automate.

vi) Only a Pigouvian automation tax corrects the problem at its source. Set equal to the uninternalised demand loss per task automated, such a tax makes each firm bear the cost it currently offloads onto rivals. The authors note that its revenue can fund retraining, which increases income replacement among displaced workers, which in turn reduces the size of the externality, which gradually reduces the tax required. Properly designed, the instrument is self-limiting.
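A minimal numerical sketch of the Pigouvian logic follows. It is a toy illustration, not the paper's derivation: it assumes the demand loss from automating a task is split evenly across firms, that income replacement scales that loss down proportionally, and that all parameter values and function names are hypothetical.

```python
def effective_loss(demand_loss, income_replacement=0.0):
    """Demand destroyed per automated task after income replacement (assumed linear)."""
    return demand_loss * (1.0 - income_replacement)


def pigouvian_tax(demand_loss, n_firms, income_replacement=0.0):
    """Tax equal to the share of the demand loss that falls on rivals."""
    return effective_loss(demand_loss, income_replacement) * (1.0 - 1.0 / n_firms)


def private_gain(saving, demand_loss, n_firms, tax=0.0, income_replacement=0.0):
    """A firm's net gain from automating one task in this toy setting."""
    own_share = effective_loss(demand_loss, income_replacement) / n_firms
    return saving - own_share - tax


saving, demand_loss, n_firms = 3.0, 10.0, 5  # hypothetical values

tax = pigouvian_tax(demand_loss, n_firms)
print(private_gain(saving, demand_loss, n_firms))           # untaxed: positive
print(private_gain(saving, demand_loss, n_firms, tax=tax))  # taxed: saving minus full loss

# Higher income replacement (for example, funded retraining) shrinks the tax needed.
for r in (0.0, 0.25, 0.5):
    print(r, pigouvian_tax(demand_loss, n_firms, income_replacement=r))
```

In this sketch the tax makes the firm's private calculation match the collective one, and the required tax falls as income replacement rises, which is the self-limiting property described above.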

What the Numbers Show

The theoretical trap Falk and Tsoukalas describe is already measurable in labour market data. The World Economic Forum’s Future of Jobs Report 2025, drawing on surveys of over 1,000 employers representing more than 14 million workers across 55 economies, projects that 92 million roles will be displaced globally by 2030 while 170 million new ones emerge. The net figure is a gain of 78 million jobs. Institutional projections from Goldman Sachs, McKinsey and the IMF broadly agree that the long-run aggregate effect of AI on employment is positive. That finding deserves to be taken seriously but it also deserves to be placed in context.

The International Monetary Fund estimated in 2024 that roughly 40 percent of jobs globally face meaningful exposure to AI capabilities, with that figure rising to 60 percent in advanced, digitised economies. Eloundou et al. (2024) found that approximately 80 percent of U.S. workers hold jobs with tasks susceptible to automation by large language models. McKinsey’s late-2025 research estimated that today’s technology could, in principle, automate approximately 57 percent of current U.S. work activities. These are not projections about the future. They describe what is technically possible now.

The demographic distribution of the risk is particularly notable. Data from Goldman Sachs Research and the SHRM 2025 Automation Survey found that 79 percent of employed U.S. women work in high-automation-risk occupations, compared to 58 percent of men. The clerical, administrative and customer service roles that AI is automating most aggressively are disproportionately held by women. Entry-level job postings have declined 15 percent year-over-year since the arrival of generative AI systems, suggesting that younger workers are being blocked from the career ladder at its first rung. Research by Brynjolfsson, Chandar and Chen (2025) found that since the release of ChatGPT, early-career workers aged 22 to 25 in the most AI-exposed occupations have faced systematically reduced hiring.

The net-positive long-run projections and the concentrated short-run pain are not contradictions. They describe the same phenomenon at different time scales. The WEF itself acknowledges that whether the net gain of 78 million jobs reaches displaced workers depends almost entirely on reskilling investment. Without it, the aggregate arithmetic is irrelevant to the individuals bearing the cost.

What History Teaches

The fear that technology will permanently destroy work is at least as old as the Industrial Revolution, as Falk and Tsoukalas note in their paper, citing Ricardo (1821), Keynes (1930) and Leontief (1982). Every major technological transition has vindicated the optimists in aggregate and vindicated the pessimists at the individual level. The task of understanding the current moment is to hold both truths simultaneously rather than choosing one for comfort.

The spinning jenny, the power loom and the steam engine mechanised textile production in Britain during the late 18th and early 19th centuries. Skilled handloom weavers, who had commanded some of the highest wages in the craft economy, were reduced to poverty within a generation. The Luddite rebellions of 1811 to 1813 were not a rejection of technology as such but a desperate protest against the erasure of a skilled trade without any replacement or support. A 2022 study by Bengtsson, van Maarseveen and Poignant, using archival data from the 19th-century Swedish iron industry linked to census records, found that workers displaced by industrial transformation ended up in occupations paying on average 10 percent less than their pre-displacement wages. The hardship persisted throughout their working lives.

The economic historian Robert Allen documented what the aggregate figures obscured: even as output per worker rose during the early Industrial Revolution, real wages stagnated. Wages did not begin rising in line with productivity until the mid-19th century. The gains, when they came, were extraordinary. But the transition consumed the working lives of those who endured it.

The United States encountered a version of this dynamic at scale during the Great Depression, when the mechanisation of agriculture and early manufacturing automation compounded a financial collapse that left 25 percent of the workforce unemployed with no social safety net in place. President Franklin Roosevelt’s response was not to halt automation. It was to build institutions capable of distributing its costs and its gains more broadly. The Works Progress Administration employed 9 million people before it was disbanded in 1943. The Social Security Act of 1935 created unemployment insurance, old-age pensions and disability coverage. The National Labor Relations Act guaranteed collective bargaining. Roosevelt’s Secretary of Labor, Frances Perkins, described the Act’s significance in terms that have not aged: it “reversed historic assumptions about the nature of social responsibility” and established that the individual holds social rights that do not evaporate with a job.

Post-war reconstruction produced, in many advanced economies, the most broadly shared prosperity in modern history. A combination of strong union representation, high marginal tax rates on capital, robust public investment in education and a political settlement between labour and management kept the displacement and reinstatement effects roughly in balance. The economist Nicholas Kaldor documented the resulting stability of labour’s share of national income, a balance that rested on new task creation offsetting the automation of existing ones.

That balance has been fraying since at least the 1980s. Research by Autor et al. (2024) finds that the pace of displacement has intensified over four decades while the creation of genuinely new work has not kept pace. The "reinstatement effect" that stabilised previous transitions has weakened. Real wages for median workers in most advanced economies are broadly flat, and the gig economy of the 2010s stripped away the protections that had cushioned earlier technological transitions without offering replacements.

The Policy Debate

Against this backdrop, several responses have gathered momentum, though none has yet been implemented at the scale the problem requires. Universal basic income is the most discussed. The Stanford Basic Income Lab has tracked 163 UBI pilot programs in the United States alone, 41 of them still active. Cook County, Illinois, permanently expanded a guaranteed income initiative in its 2026 budget, providing $500 per month to 3,200 households; 94 percent of recipients used the funds to address financial crises and 70 percent reported improved mental health. Ireland launched what it described as the world’s first permanent Basic Income for the Arts program in 2026. Separately, the UK government’s minister for investment, Lord Jason Stockwood, told the Financial Times in early 2026 that the government was weighing UBI as a mechanism to support workers displaced by AI, and had previously floated funding it through taxes on technology companies.

The Falk-Tsoukalas paper is direct about UBI’s limitations. It raises the floor for displaced workers and, crucially, preserves the consumer demand that firms depend on. It does not, however, alter the competitive incentive to automate. A firm facing an automation tax assesses whether the cost of replacing a worker exceeds the benefit. A firm facing a UBI-funded safety net has the same cost-benefit calculation it had before. The paper’s conclusion is that UBI and a Pigouvian automation tax are not alternatives. They are complements, serving different functions in a coherent policy architecture.

The reskilling question has attracted more employer commitment, at least on paper. The World Economic Forum’s 2025 survey found that 77 percent of employers plan to fund reskilling programs through 2030. A PwC AI Jobs Barometer of 2025 found that workers in AI-exposed sectors who do reskill could earn a 56 percent wage premium. Against those intentions, ManpowerGroup’s Global Talent Barometer for 2026 found that 56 percent of workers globally have received no AI training at all. The gap between what employers plan and what workers are experiencing is one of the defining features of the current transition.

Geoffrey Hinton, the Nobel-winning physicist whose foundational work on neural networks helped create the technology now reshaping the labour market, cautioned the Financial Times that the distribution of AI-generated gains was the central problem. “It is going to create massive unemployment and a huge rise in profits,” he warned. “It will make a few people much richer and most people poorer. That is not AI’s fault. That is the capitalist system.” His diagnosis maps precisely onto the mechanism Falk and Tsoukalas formalised: the gains from automation accrue to those who own the technology; the costs are distributed across those who formerly provided the labour, and across the rival firms whose revenue base shrinks.

Scholars at the London School of Economics, writing in the LSE Business Review in April 2025, argued that income support alone is insufficient and that what is required is a new social contract in which technological progress and human welfare advance together rather than at each other’s expense. That framing connects the immediate policy debate to the deeper historical pattern. The New Deal did not merely redistribute money. It rebuilt the institutional relationship between the state, the economy and the individual. The question now is whether the current scale of disruption is sufficient to catalyse a comparable institutional response before the costs have been fully borne by those least equipped to absorb them.

The Signal in the Proof

What distinguishes the Falk-Tsoukalas paper from most contributions to this debate is its refusal of ideology. It does not argue that automation is bad, that companies are greedy or that markets have failed in some generalised sense. It shows, through a formal model, that a specific and correctable market failure produces a specific and measurable distortion that harms shareholders and workers alike. The policy implication follows from the logic of the proof rather than from any prior political commitment.

That specificity matters. It moves the conversation from the question of whether AI will displace workers (a question the data has substantially answered) to the question of whether the competitive dynamics of AI adoption will produce more displacement than is necessary or efficient. The paper’s answer is that they will, unavoidably, unless a corrective mechanism is introduced at the point where the externality is generated.

The historical record offers grounds for measured confidence that advanced economies can navigate major technological transitions without permanent harm to aggregate living standards. It offers no grounds for confidence that the transition will be painless, equitable or self-correcting. Every precedent in which the eventual outcome was broadly positive involved deliberate institutional intervention: labour law reform, public investment, social insurance, collective bargaining rights or some combination of them. The transitions in which those interventions were absent or delayed produced decades of concentrated suffering that the aggregate statistics, looked at later, did not fully capture.

The trap Falk and Tsoukalas have identified carries one further implication that the debate has largely ignored. The corrective window is not fixed. As AI capability improves and each successive model generation expands the range of tasks that can be automated cheaply, the wedge between what firms do and what would be collectively optimal widens. The authors are precise on this point: better AI amplifies the distortion rather than resolving it. Jack Dorsey predicted in February that most companies would reach his conclusion within the year. If he is right, the moment at which a Pigouvian tax could be set at a rate sufficient to close the wedge without triggering a disorderly adjustment is not some abstract future horizon. It is now, or it is harder.

The Foundations of AI :: Inside the Loop #01

Riccardo Gatti — Thu, 16 Apr 2026 13:43:44 GMT

A pattern recurs in conversations with people who work in and around this field (customers, colleagues, practitioners with years of experience) that is easy to overlook until it becomes impossible to ignore. The assumed basics are often not as shared as everyone assumes. People who have shipped AI products describe models as “open-source” when they mean something considerably more restricted. People who have read dozens of papers use “parameter” and “weight” as synonyms, as though the distinction does not matter. People who have sat through alignment presentations still reach for “hallucination” as though it were a bug that could be fixed rather than a structural property of how these systems work. Nobody admits the gap. The conversation proceeds as though it does not exist. And then something important gets lost in the imprecision.

This piece is the foundation for everything that follows in the Inside the Loop series. Inside the Loop is the technical deep-dive track of Mind in the Loop: each issue walks through a specific AI topic with enough mathematical and technical grounding to make the subject genuinely legible, not just familiar. The goal is not to train engineers. It is to make sure that anyone thinking seriously about this field understands what is actually happening inside the systems they are discussing, building or deploying. It does not assume a background in machine learning. It does assume an interest in understanding the actual mechanics of the systems reshaping how knowledge is produced, stored and retrieved, as well as a tolerance for precision over comfort.

Each term is defined once, as tightly as the concept permits. Where a common misconception exists, it is noted. Where a term carries different meanings in different contexts, the relevant distinction is drawn. The vocabulary is organized by conceptual layer, beginning with the mathematical building blocks and moving outward toward the systems, deployment practices and alignment techniques that determine how these models behave in the world.

A reader who works through this glossary will not become a machine learning practitioner. They will, however, be equipped to follow a technical argument without losing the thread when an author drops a term without definition (which is most of the time).

A note on this document: this glossary is a living reference. As the Inside the Loop series covers new architectures, techniques and concepts, the relevant terms will be added here and linked back from the articles where they appear. The definitions below provide a working foundation and each reader is encouraged to treat them as a starting point for their own study, not a substitute for it. The field rewards the curious who go further.

The building blocks

Every neural network, regardless of its size or purpose, reduces to a small set of mathematical primitives. These are the terms that appear in every paper, every benchmark comparison and every discussion of what makes one model different from another.

Neural network

A computational system composed of layers of mathematical operations, loosely modeled on the structure of biological neurons. Data flows forward through the layers, gets transformed at each step and produces an output. The network learns by adjusting its internal numbers (its weights) until its outputs match a desired target. Every system discussed in this series, from the earliest image classifiers to the latest frontier language models, is a neural network at its core.

Parameter

Any learnable number inside a model. “Parameter” is the superset term: it includes weights, biases, the scale and shift values inside normalization layers and the vectors inside embedding tables. When a model is described as having 70 billion parameters, that is the total count of all such numbers combined. More parameters mean more capacity to store learned patterns; they also mean more memory and more compute required at every step. Parameter count is the most commonly cited measure of model size and the least informative measure of what a model actually costs to run.
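As a rough illustration of how a parameter count is assembled, the sketch below tallies weights and biases for a small stack of fully connected layers. The layer widths are hypothetical and chosen only to keep the arithmetic readable; normalization and embedding parameters, which a real model would also count, are left out.

```python
def dense_layer_params(n_in: int, n_out: int) -> dict:
    """Count the learnable numbers in one fully connected layer."""
    weights = n_in * n_out  # one weight per input-to-output connection
    biases = n_out          # one bias per output neuron
    return {"weights": weights, "biases": biases, "total": weights + biases}


# Hypothetical layer widths, for illustration only.
layer_sizes = [512, 2048, 512]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    counts = dense_layer_params(n_in, n_out)
    total += counts["total"]
    print(f"{n_in} -> {n_out}: {counts}")

print("total parameters:", total)  # 2099712
```

Even in this toy stack, weights outnumber biases by several hundred to one, which is one reason the two terms blur together in casual usage.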

Weight

The specific type of parameter that controls how strongly one node in a network influences the next. A weight near zero means the connection carries almost no signal; a large positive or negative value means it is highly influential. Weights are what most people mean when they say “the knowledge in a model lives in its parameters”. Strictly speaking, it lives predominantly in its weights. The two terms are used interchangeably throughout the field, including in most research papers, which is why the conflation is so persistent. The distinction that matters in practice: weight decay (a regularization technique) specifically targets weights, not all parameters equally.

Bias

A second type of parameter, distinct from weights. While a weight scales an input, a bias shifts the output by a fixed amount, allowing a neuron to fire even when its inputs are zero. Every neuron in a standard network has one bias value. Biases give layers the flexibility to represent patterns that weights alone cannot capture.
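A minimal sketch of the difference between scaling and shifting, assuming a single neuron with no activation function. The weight and bias values are made up for the example.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a fixed bias (no activation applied)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical values, chosen so the arithmetic is easy to follow.
weights = [0.5, -1.0, 2.0]
bias = 0.25

print(neuron([1.0, 2.0, 3.0], weights, bias))  # 0.5 - 2.0 + 6.0 + 0.25 = 4.75
print(neuron([0.0, 0.0, 0.0], weights, bias))  # 0.25: only the bias remains
```

With all inputs at zero the weights contribute nothing, so the output is the bias alone.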

Layer

A discrete transformation step in a neural network. Each layer takes a set of numbers, applies an operation and passes the result to the next layer. Early layers tend to capture low-level patterns; later layers capture higher-level abstractions. The number of stacked layers is what researchers mean when they describe a model as “deep”.

Tensor

The fundamental data structure of deep learning. A tensor is a multi-dimensional array of numbers. A single number is a zero-dimensional tensor; a list of numbers is a one-dimensional tensor (a vector); a matrix is two-dimensional. Most data flowing through a neural network (token embeddings, attention matrices and batch inputs) is a three- or four-dimensional tensor.
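The dimensionality ladder can be made concrete with nested Python lists standing in for tensors. This is a sketch only: production code would use an array library, and the helper below assumes non-empty, rectangular nesting. The batch dimensions are hypothetical.

```python
def shape(tensor):
    """Return the shape of a nested-list tensor as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]  # assumes every sub-list has the same length
    return tuple(dims)


scalar = 3.0
vector = [1.0, 2.0, 3.0]
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hypothetical batch: 2 sequences, 3 tokens each, 4 numbers per token embedding.
batch = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]

print(shape(scalar))  # ()
print(shape(vector))  # (3,)
print(shape(matrix))  # (3, 2)
print(shape(batch))   # (2, 3, 4)
```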

Logits

The raw, unnormalized scores produced by the final layer of a model before any probability conversion. For a language model predicting the next token, logits are a vector with one number per item in the vocabulary. A high logit signals that the model considers that token likely. Logits are passed through the softmax function to produce probabilities during generation.

Dense architecture

A neural network in which every parameter activates for every input, every time. The full computational weight of the system engages regardless of whether the task requires it. Dense models are the historical baseline against which sparse architectures are measured; their inference cost scales linearly with parameter count.

The AI stack

The vocabulary of systems, deployment practices and interaction paradigms that govern how language models are used in production represents a second layer of terminology, distinct from model architecture but equally important for following contemporary AI coverage.

Large Language Model (LLM)

A model trained on a large text corpus to predict the next token. Models above roughly 7 to 10 billion parameters begin exhibiting qualitative capability jumps, including instruction following, multi-step reasoning and in-context learning, absent in smaller models. GPT-4, Claude, Gemini, Llama, DeepSeek and Mistral are all large language models. The term is increasingly imprecise as models become multimodal.

Prompt / prompt engineering

A prompt is the text input given to a language model. Prompt engineering is the practice of designing inputs that reliably produce desired outputs, specifying format, persona, examples or reasoning instructions. The discipline emerged because model behavior is highly sensitive to phrasing in ways that are not always predictable. At frontier model capabilities, its importance relative to model robustness is declining.

System prompt

A privileged prompt, typically invisible to the end user, that defines a model’s behavior, persona or constraints for a given deployment. Processed before the user’s message. Operators use it to configure assistant behavior, restricting topics, setting tone and providing product context. The system prompt cannot be fully protected from extraction by a sufficiently motivated user in most current implementations.

In-context learning (ICL)

The ability of a language model to learn a new task from examples provided in the prompt, without any weight updates. Show the model several input-output pairs and it generalizes to new cases without training. This property emerges at scale and is one of the more surprising capabilities of large Transformers. It is what makes few-shot prompting effective.

Few-shot / zero-shot

Zero-shot describes asking a model to perform a task with no examples, relying on pre-training knowledge alone. Few-shot provides a small number of examples (typically two to ten) before the actual query. Both are forms of in-context learning. Zero-shot performance on complex tasks is used as a benchmark of frontier model quality.

Chain-of-thought (CoT)

A prompting technique in which the model produces step-by-step reasoning before its final answer. Introduced by Wei et al. at Google in 2022. Chain-of-thought substantially improves performance on multi-step reasoning tasks by externalizing intermediate steps into the context window rather than compressing them into a single output.

Reasoning model

A class of large language model trained or prompted to spend additional compute on a problem before producing a final answer, typically through extended chain-of-thought, search over possible solution paths or process reward models. OpenAI o1 and o3, DeepSeek-R1 and Claude 3.7 Sonnet are examples. Reasoning models trade inference latency for accuracy on hard tasks. The distinction between reasoning and non-reasoning models is a 2024-2025 commercial framing rather than a sharp architectural boundary.

RAG (Retrieval-Augmented Generation)

An architecture that augments generation by first retrieving relevant documents from an external knowledge store and including them in the prompt as context. RAG addresses two limitations of pure language models: outdated knowledge and hallucination. Retrieval is typically done via embedding similarity search over a vector database.

Vector database

A database optimized for storing and querying high-dimensional embedding vectors via similarity search. Given a query embedding, a vector database returns the most semantically similar stored vectors. Used as the retrieval backend in RAG systems. Standard SQL databases are not efficient for high-dimensional vector similarity at scale.
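The core query can be sketched as a brute-force cosine similarity search. The document IDs and three-number embeddings below are invented for the example; real embeddings have hundreds or thousands of dimensions, and production vector databases generally rely on approximate nearest-neighbor indexes rather than scoring every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Score every stored vector against the query and return the k best IDs."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical store: document ID -> embedding vector.
store = {
    "doc_pricing": [0.9, 0.1, 0.0],
    "doc_training": [0.1, 0.9, 0.2],
    "doc_billing": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, store))  # ['doc_pricing', 'doc_billing']
```

In a RAG system, the text behind the returned IDs is what gets placed into the prompt as context.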

Agent / agentic AI

A system in which a language model is given tools (search, code execution, file access) and operates in a loop: observe the environment, reason, act, observe the outcome and repeat, rather than producing a single response. Agents pursue multi-step goals with varying degrees of autonomy. The dominant challenge is reliability: errors compound over long action sequences. 2024 and 2025 marked the first period in which agentic systems were deployed at scale in production.

Tool use / function calling

The ability of a language model to invoke external functions or APIs during generation, including search, code execution and database queries. The model outputs a structured call specifying which tool to invoke and with what arguments; the result is returned as additional context. Tool use transforms a language model from a text predictor into a system capable of interacting with external services.
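The round trip can be sketched without any model at all. In the example below the model's output is simulated as a JSON string, and the tool names, the call format and the result format are all hypothetical: each provider defines its own schema for structured calls.

```python
import json


def get_word_count(text: str) -> int:
    """Hypothetical tool: count whitespace-separated words."""
    return len(text.split())


def add(a: float, b: float) -> float:
    """Hypothetical tool: add two numbers."""
    return a + b


# Registry mapping tool names the model may emit to real functions.
TOOLS = {"get_word_count": get_word_count, "add": add}


def run_tool_call(model_output: str) -> str:
    """Parse a structured call, run the named tool, return the result as text."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]
    result = tool(**call["arguments"])
    # This string is what would be appended to the context for the next step.
    return json.dumps({"tool": call["name"], "result": result})


simulated_model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
print(run_tool_call(simulated_model_output))  # {"tool": "add", "result": 5}
```

The model never executes anything itself: it only emits the structured request, and the surrounding application decides whether and how to run it.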

Hallucination

The phenomenon in which a model generates fluent, confident text that is factually wrong or entirely fabricated. Hallucination is structural: language models predict probable token sequences, not verified facts. High-confidence hallucinations (those where the model does not signal uncertainty) are the most consequential for production deployments. RAG, RLHF and fine-tuning reduce but cannot eliminate the problem.

Multimodal

A model that processes more than one type of data. Text plus image is the most common combination; text plus audio, image plus video and unified all-modality models are active research areas. The core engineering challenge is representing different data types in a shared token space. Multimodality is now a baseline expectation for frontier models.

Open-weight model

A model whose weights are publicly released, allowing download, local deployment and fine-tuning, but whose training code, data and full methodology may not be disclosed. This is the correct term for what media coverage routinely describes as “open-source.” Llama 3, Mixtral, DeepSeek-V3 and Kimi K2 are open-weight. The release of open-weight models compresses the commercial advantage of proprietary labs and accelerates independent research.

Benchmark

A standardized test used to measure and compare model capabilities. Common examples include MMLU (academic knowledge across subjects), HumanEval (code generation), MATH (mathematical reasoning), GPQA (graduate-level science questions) and the LMSYS Arena (human preference). Benchmarks are the common currency of capability claims in research papers and are regularly criticized as gameable, saturated or poorly representative of real-world usefulness. A model that tops published benchmarks is not necessarily the most useful in deployment.

How models learn

The mechanics of training (how a model’s parameters are adjusted from random initialization toward a useful configuration) are governed by a small set of algorithms and hyper-parameters. These terms appear in every discussion of model behavior, capability and failure mode.

Training

The process of adjusting a model’s parameters so that its outputs better match a target signal. The model processes a batch of data, produces predictions, measures its error via a loss function and uses back-propagation to adjust weights in the direction that reduces that error. This cycle repeats billions of times across a large dataset.

Inference

Using a trained model to produce outputs. No learning happens during inference because weights are frozen. Inference cost is what a deployment pays per user query; this is precisely why architectural choices that reduce per-token computation, like Mixture of Experts, carry such significant commercial stakes.

Forward pass

One complete run of an input through all model layers to produce an output. During inference, a forward pass produces a next-token prediction. During training, it produces a loss value used to run back-propagation. Per-token computational cost is determined by how many parameters activate during each forward pass.

Loss function

A measurement of how wrong the model’s current predictions are. The training process is an effort to minimize this number. For language models, the standard loss is cross-entropy over next-token predictions: the negative log-probability assigned to the correct next token. Lower loss means the model assigns higher probability to the right answer.
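For a single prediction, the cross-entropy described above reduces to one line of arithmetic. The probability vectors below are invented for the example and stand in for a model's output over a three-token vocabulary.

```python
import math


def cross_entropy(probabilities, correct_index):
    """Negative log-probability assigned to the correct next token."""
    return -math.log(probabilities[correct_index])


# Hypothetical predictions; the correct token is at index 1 in both cases.
confident = [0.05, 0.90, 0.05]
unsure = [0.40, 0.30, 0.30]

print(cross_entropy(confident, 1))  # about 0.105
print(cross_entropy(unsure, 1))     # about 1.204
```

A training run averages this quantity over every token position in a batch; the single-prediction case is shown here only to make the definition concrete.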

Back-propagation

The algorithm that computes how much each weight contributed to the current loss and in which direction it should be adjusted. It propagates the error signal backward through layers using the chain rule of calculus. Introduced to neural networks by Rumelhart, Hinton and Williams in 1986. Without it, training deep networks at scale would be computationally intractable.

Gradient descent

The optimization algorithm that uses back-propagation results to update weights. The gradient points in the direction of steepest increase in loss; gradient descent moves weights in the opposite direction by a step size controlled by the learning rate. Modern variants (Adam, AdamW) maintain per-parameter adaptive step sizes and are almost universally used in large language model training.
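The update rule is easiest to see on a toy problem with a single weight. The sketch below assumes a hand-written loss, (w - 3)^2, whose gradient can be written down directly; in a real network back-propagation supplies the gradient and there are billions of weights rather than one.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0              # arbitrary starting point
learning_rate = 0.1  # step size, chosen by hand for this example

for step in range(50):
    # Move against the gradient, i.e. in the direction that lowers the loss.
    w = w - learning_rate * gradient(w)

print(w, loss(w))  # w ends very close to 3, loss very close to 0
```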

Learning rate

A scalar that controls how large each weight update step is. Too large: the model overshoots good solutions and training diverges. Too small: convergence is extremely slow. In practice, a learning rate schedule is used, warming up from a small value, peaking, then decaying. Choosing the learning rate correctly is one of the most consequential decisions in a training run.
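One common shape for such a schedule is a linear warm-up followed by a cosine decay. The sketch below is an assumption about the general form, not a description of any particular training run, and every default value in it is hypothetical.

```python
import math


def learning_rate(step, peak_lr=3e-4, warmup_steps=100, total_steps=1000, min_lr=3e-5):
    """Linear warm-up to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp up linearly from a small value to the peak.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


for step in (0, 50, 100, 550, 1000):
    print(step, learning_rate(step))
```

The printed values rise during the first 100 steps, peak at step 100 and fall smoothly to the floor by step 1000.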

Optimizer

The algorithm that translates gradients into weight updates. SGD (stochastic gradient descent) is the simplest. Adam maintains running estimates of gradient mean and variance per parameter, producing adaptive per-weight step sizes. AdamW adds weight decay. Virtually all frontier language models are trained with Adam or AdamW variants.

Batch / mini-batch

The number of training examples processed together before a single weight update is applied. Larger batches produce more stable gradient estimates but require more memory and reduce the number of updates per epoch. Smaller batches introduce noise that can help escape local optima. Batch size has significant effects on training dynamics and final model quality.

Epoch

One complete pass through the entire training dataset. Modern language models trained on trillions of tokens are often trained for less than one epoch: the dataset is so large that a full pass is never completed. Fine-tuning runs typically use multiple epochs on smaller curated datasets.

Overfitting

A failure mode in which a model learns the training data too precisely, including its noise, and performs poorly on new data. The model has memorized rather than generalized. Very large models can overfit even on trillion-token datasets if trained too long. Mitigations include dropout, regularization and early stopping.

Underfitting

A failure mode in which a model is too simple or undertrained to capture the patterns in the data, performing poorly on both training and test sets. In the large language model era, underfitting is typically a capacity problem (too few parameters) or a training budget problem (too few steps), not an algorithmic one.

Vanishing gradient

A training failure in which gradients become exponentially smaller as they propagate backward through many layers, making weights in early layers update negligibly or not at all. The central obstacle to training deep networks before ReLU activations and residual connections. Understanding it explains why both innovations mattered so much when they arrived.
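The exponential shrinkage can be shown with a deliberately simplified calculation. The sketch assumes a chain of sigmoid activations and ignores the weight matrices entirely; it uses the fact that the sigmoid's derivative is at most 0.25, so the best case already decays quickly with depth.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at x = 0, where it equals 0.25.
best_case = sigmoid_derivative(0.0)

for depth in (1, 5, 10, 20):
    # Upper bound on the factor contributed by `depth` stacked sigmoids.
    print(depth, best_case ** depth)
```

At twenty layers the factor is below one part in a trillion, which is why the earliest layers of such a network barely move during training.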

Regularization

Any technique that reduces overfitting by discouraging the model from fitting too tightly to training data. Includes dropout, weight decay (penalizing large weight values) and data augmentation. In large language model training, weight decay implemented via AdamW is the dominant regularization mechanism.

Pre-training

The large-scale initial training phase in which a model learns from an enormous corpus of raw text without task-specific supervision. Pre-training teaches general language understanding, world knowledge and reasoning patterns. It requires the most compute. The result is a base model: capable but not yet aligned to follow instructions or produce consistently useful outputs.

Fine-tuning

A subsequent, smaller-scale training phase in which a pre-trained model is trained further on curated task-specific data. Fine-tuning adjusts behavior while preserving most pre-training knowledge. Instruction fine-tuning teaches the model to follow directions; RLHF and DPO are fine-tuning stages that incorporate human preference signals; LoRA and other parameter-efficient methods allow fine-tuning with minimal additional compute.

Language and representation

Before a neural network can process text, that text has to become numbers. The way that conversion happens and the vocabulary that describes it are shared across virtually every language model in existence today. The Transformer architecture, introduced by Vaswani et al. in 2017, is the structural frame within which these concepts live for most modern systems. But the terms in this section (tokens, embeddings, context windows and training objectives) belong to the broader problem of how language gets represented inside a machine, not to any single architecture.

Transformer

A neural network architecture that processes all tokens in a sequence simultaneously via attention, rather than sequentially as earlier recurrent networks did. A Transformer is a stack of identical blocks, each containing a self-attention mechanism and a feed-forward network. Its parallelism made it far more efficient to train on modern GPU hardware than its predecessors. Every model named in this series is a Transformer.

Self-attention

The mechanism that lets each token in a sequence attend to every other token and gather relevant information from it. For each token, the mechanism computes three vectors: a Query (what am I looking for?), a Key (what do I offer?) and a Value (what will I contribute?). Attention scores are computed as the dot product of queries and keys, scaled by the square root of the key dimension and normalized by softmax. Self-attention layers are dense: every token attends to every other. Mixture of Experts does not touch these layers.
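A minimal single-head sketch in plain Python, assuming the Query, Key and Value vectors have already been computed (in a real model they are learned linear projections of the token embeddings). The scores are divided by the square root of the key dimension before softmax, following the scaled dot-product formulation of Vaswani et al.; the input values are made up.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 3-token sequence with 2-dimensional vectors.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, all_weights = attention(Q, K, V)
for row in all_weights:
    print([round(w, 3) for w in row])
```

Each printed row is one token's attention distribution over the whole sequence and sums to 1.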

Feed-forward network (FFN)

The second major component of each Transformer block, applied after attention. A two-layer neural network applied independently to each token: expand to a wider hidden dimension, apply a non-linear activation and project back down. FFN layers store factual knowledge acquired during pre-training. In large models, FFN parameters account for the majority of total parameter count.

Token

The atomic unit of text that a language model processes. Tokens are not words: they are subword fragments produced by a tokenizer. Common words are typically one token; rare or long words split into several; punctuation and whitespace add more. As a rough approximation, 750 English words correspond to about 1,000 tokens. The model never sees raw text; it processes only sequences of integer token IDs, each mapped to an embedding vector.
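The rule of thumb translates into a quick estimator. This is a back-of-the-envelope sketch that applies the 750-words-per-1,000-tokens ratio to a word count; it is not a tokenizer, and actual counts vary with the tokenizer, the language and the content.

```python
WORDS_PER_1000_TOKENS = 750  # rough English-language ratio from the definition above


def estimate_tokens(word_count: int) -> int:
    """Estimate a token count from a word count using the rule of thumb."""
    return round(word_count * 1000 / WORDS_PER_1000_TOKENS)


def estimate_words(token_count: int) -> int:
    """Inverse estimate: how many English words fit in a token budget."""
    return round(token_count * WORDS_PER_1000_TOKENS / 1000)


print(estimate_tokens(750))   # 1000
print(estimate_tokens(3000))  # 4000
print(estimate_words(8000))   # 6000
```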

Tokenization

The process of converting raw text into a sequence of integer token IDs. Modern tokenizers use subword segmentation (breaking text at statistically meaningful boundaries). The tokenizer is fixed after its own training and is not updated during model training. Tokenizers trained on English-heavy data are less efficient for other scripts, requiring more tokens to represent the same content.

Vocabulary

The complete set of tokens a model recognizes, fixed at tokenizer training time. GPT-4 uses a vocabulary of roughly 100,000 tokens; LLaMA-3 uses approximately 128,000. Each token maps to a unique integer ID and a learned embedding vector. Vocabulary size determines the dimension of the logit vector the model produces at each generation step.

Context window

The maximum number of tokens a model can process in a single forward pass, input and output combined. Tokens outside the window are invisible to the model. A 128,000-token context window holds roughly 300 pages of text. Extending context windows without proportionate growth in compute cost is an active area of engineering, addressed by techniques like RoPE and sliding window attention.

Embedding

A dense numerical vector representing a token in high-dimensional space. Before the Transformer processes tokens, each integer ID is converted to an embedding vector, typically several thousand numbers long. The geometry of the embedding space carries meaning: semantically similar tokens cluster nearby. Embeddings are learned during training. The term “embedding model” (a model trained specifically to produce useful embeddings) is a distinct usage.

Causal language modeling

The training objective used by GPT-style decoder-only models. The model is trained to predict the next token given all previous tokens, but cannot see future tokens. This “causal masking” ensures no cheating during training. At inference, the model generates autoregressively: each new token is appended to the context and the next prediction follows.
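Both halves of the definition can be sketched briefly: the mask that hides future positions during training, and the loop that appends one token at a time during inference. The `toy_next_token` function is a hypothetical stand-in for a trained model and simply counts upward.

```python
def causal_mask(n):
    """mask[i][j] is True when position i is allowed to attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def generate(next_token, prompt, max_new_tokens):
    """Autoregressive loop: predict, append to the context, repeat."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        context.append(next_token(context))
    return context


def toy_next_token(context):
    # Hypothetical predictor: a real model would return a sampled token ID.
    return context[-1] + 1


for row in causal_mask(4):
    print(["x" if allowed else "." for allowed in row])

print(generate(toy_next_token, [1, 2, 3], 3))  # [1, 2, 3, 4, 5, 6]
```

The printed mask is lower-triangular: the first position sees only itself, the last sees everything before it.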

Masked language modeling (MLM)

The training objective used by BERT-style encoder-only models. A random subset of tokens is replaced with a mask token and the model predicts the original values. Unlike causal language modeling, MLM lets the model attend to tokens on both sides of the masked position, making it better suited to understanding and classification tasks. BERT models cannot generate text in the autoregressive sense; GPT models cannot use MLM.

Activation functions

Every feed-forward network inside a Transformer contains an activation function between its two linear layers. That placement is not incidental: without it, the entire FFN collapses to a single linear transformation regardless of how many layers it contains. Activation functions are what give that component its expressive power and the same principle applies to every other layer in any deep network.

Activation function

Any mathematical function applied after a linear transformation to introduce non-linearity. Without activation functions, stacking layers provides no additional expressive power. The choice of activation function has measurable effects on training speed and model performance and has shifted significantly across the history of the field.
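
The collapse described above can be checked with a few lines of plain Python. This is a minimal sketch with made-up 2×2 weights (W1, W2 and x are illustrative, not taken from any model): two stacked linear layers give the same result as one layer whose matrix is their product, and inserting ReLU between them breaks that equivalence.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu(v):
    return [max(0.0, x) for x in v]

# Illustrative weights and input, chosen only for the demonstration.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 1.0]

stacked = matvec(W2, matvec(W1, x))          # two linear layers, no activation
collapsed = matvec(matmul(W2, W1), x)        # one layer using the product matrix
with_relu = matvec(W2, relu(matvec(W1, x)))  # non-linearity in between
```

Here `stacked` and `collapsed` agree, while `with_relu` differs because the negative intermediate value is zeroed before the second layer.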

ReLU (Rectified Linear Unit)

The most historically prevalent activation function: f(x) = max(0, x). ReLU sets any negative input to zero and passes positive values unchanged. Its computational simplicity made it dominant from roughly 2012 to 2020, and because it does not saturate for positive inputs it greatly reduced the vanishing gradient problem that had plagued earlier activations. Most large language models have since moved to smoother variants.

GELU (Gaussian Error Linear Unit)

A smooth relative of ReLU that weights each input by the Gaussian cumulative distribution function, GELU(x) = x·Φ(x), rather than hard-clipping at zero. GELU allows small negative values to pass through with diminished magnitude. It is used in BERT, GPT-2 and GPT-3, and its smooth gradient is generally understood to aid training stability in deep networks.
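
For comparison, a small sketch of ReLU next to the exact form of GELU, x times the standard normal CDF of x. Many implementations use a faster tanh-based approximation instead; the exact form is used here only because it is the shortest to write.

```python
import math

def relu(x):
    """Hard clip: negative inputs become zero."""
    return max(0.0, x)

def gelu(x):
    """Exact GELU: x multiplied by the standard normal CDF evaluated at x."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

At x = -1, `relu` returns 0 while `gelu` returns a small negative value; for large positive x the two are nearly identical.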

SwiGLU

An activation function introduced by Noam Shazeer in 2020 that combines the Swish activation with a learned gating mechanism. The input is projected twice (in practice often as one wide projection that is then split in two); one projection is passed through Swish and the result is multiplied element-wise by the other. SwiGLU has become the dominant choice in modern large language model feed-forward layers, appearing in LLaMA, PaLM, Gemini and DeepSeek.
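
A minimal sketch of the gating step, assuming two separate projection matrices with no bias terms. The names `w_gate` and `w_up` are illustrative, each holding one weight vector per hidden unit, and the down-projection that follows in a full feed-forward layer is omitted.

```python
import math

def swish(x):
    """Swish (also called SiLU): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def swiglu(x, w_gate, w_up):
    """Element-wise product of Swish(gate projection) and the up projection."""
    gate = [swish(sum(xi * w for xi, w in zip(x, unit))) for unit in w_gate]
    up = [sum(xi * w for xi, w in zip(x, unit)) for unit in w_up]
    return [g * u for g, u in zip(gate, up)]
```

When the gate projection is zero for a hidden unit, that unit's output is zero regardless of the other projection, which is what makes the mechanism a gate.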

Sigmoid

An activation function that maps any input to a value between 0 and 1, following an S-shaped curve: f(x) = 1/(1+e⁻ˣ). Historically used in binary classification and early recurrent networks, sigmoid was largely replaced in deep networks by ReLU and its variants because it saturates near its extremes, producing near-zero gradients that prevent effective learning in early layers.

Softmax

A function that converts a vector of arbitrary numbers into a probability distribution, with all values between 0 and 1 summing to 1. It amplifies differences between inputs: a score of 4.7 against a competing score of 0.3 becomes near-certainty against near-zero after softmax. Used in Transformer attention to normalize attention scores and in language model generation to convert logits into token probabilities.
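
The 4.7 versus 0.3 example can be reproduced directly. This sketch subtracts the maximum score before exponentiating, a standard numerical-stability step that does not change the result.

```python
import math

def softmax(scores):
    """Convert arbitrary scores to probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([4.7, 0.3])  # roughly 0.988 and 0.012
```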

Architecture patterns

The Transformer’s design draws on a set of structural solutions that predate it and extend well beyond it. Some like residual connections, normalization and dropout are general-purpose techniques applicable to any deep network. Others like multi-head attention, cross-attention, positional encoding and the KV cache are specific mechanisms that make the Transformer work at scale. Both sets appear constantly in architecture papers and are worth holding as distinct concepts.

Residual connection (skip connection)

A shortcut that adds a layer’s input directly to its output, bypassing the layer’s own transformation: output = layer(x) + x. Introduced in ResNet by He et al. in 2015, residual connections solved the vanishing gradient problem in very deep networks by providing gradients a direct path backward through the architecture. Every Transformer block uses residual connections around both the attention layer and the feed-forward layer.

Layer normalization (LayerNorm)

A normalization operation that rescales a layer’s activations to have zero mean and unit variance, applied independently to each input in a batch. It stabilizes training in deep networks by preventing activations from growing or shrinking uncontrollably as they pass through many successive layers. LayerNorm is not specific to Transformers; it appears in recurrent networks, graph neural networks and other deep architectures wherever training instability from activation scale is a concern. In Transformers specifically, modern variants apply LayerNorm before each sub-layer rather than after, a placement that improves stability at scale.
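
As a rough sketch, the core of the operation for a single activation vector looks like this. The learned scale and shift parameters that real implementations apply afterward are left out, and the `eps` default is an assumed typical value.

```python
import math

def layer_norm(x, eps=1e-5):
    """Rescale one activation vector to zero mean and (near) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]
```

Because the statistics are computed over one vector at a time, the result does not depend on the other items in the batch.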

Batch normalization (BatchNorm)

A normalization technique that standardizes activations across the batch dimension rather than across the layer. The dominant normalization approach for convolutional networks and early deep architectures. Less suitable for language models with variable sequence lengths, which is why Transformers use layer normalization instead.

Dropout

A regularization technique in which, during training, each neuron’s output is randomly set to zero with a chosen probability (typically between 0.1 and 0.5). This forces the network to learn redundant representations, preventing any single neuron from becoming critical and reducing overfitting. Dropout is disabled at inference time; very large language models often use little or none.
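
A minimal sketch of the training-time and inference-time behavior. It uses the common "inverted dropout" convention, in which surviving values are scaled up by 1/(1 - p) during training so that nothing needs rescaling at inference; that scaling is an addition to the description above, and the function signature is illustrative.

```python
import random

def dropout(values, p, training=True, rng=random):
    """Zero each value with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(values)  # dropout is a no-op at inference time
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]
```

For example, `dropout([1.0] * 8, 0.5, rng=random.Random(0))` returns a mix of 0.0 and 2.0 entries, while passing `training=False` returns the input unchanged.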

Encoder

A network component that converts an input (text, image or audio) into a dense internal representation: a compressed, information-rich vector. The encoder does not produce a prediction; it produces a representation that other components can use. BERT is an encoder-only model, capable of representing text but not generating it.

Decoder

A network component that generates output one element at a time, conditioned on a representation. GPT-style models are decoder-only: the decoder reads all previous tokens and predicts the next one, autoregressively. In encoder-decoder architectures (the original Transformer for translation), the decoder also attends to the encoder’s output via cross-attention.

Multi-head attention (MHA)

An extension of self-attention that runs multiple attention computations in parallel, each with its own learned weight matrices. Each “head” attends to the input from a different learned perspective. Their outputs are concatenated and projected into a single representation. GPT-3 uses 96 attention heads per layer; many other large models use between 32 and 64.

Cross-attention

An attention mechanism in encoder-decoder models in which the decoder attends to the encoder’s representations rather than only to its own previous outputs. The query vectors come from the decoder; the key and value vectors come from the encoder. This is the mechanism through which translation models “read” the source sentence while generating the target language.

Positional encoding

A mechanism that injects sequence-order information into token embeddings before they enter the Transformer. The attention mechanism is position-agnostic by design, treating tokens as a set rather than a sequence; positional encodings restore the order. The original Transformer used fixed sinusoidal encodings. Most modern models use relative positional schemes, particularly RoPE (Rotary Position Embedding), which rotates query and key vectors by position-dependent angles and tends to generalize better to sequences longer than those seen during training.
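
The fixed sinusoidal scheme from the original Transformer is simple enough to sketch in full. Even dimensions use sine and odd dimensions use cosine, with wavelengths that grow geometrically across the vector; the function name is illustrative.

```python
import math

def sinusoidal_encoding(position, d_model):
    """Fixed sinusoidal positional encoding for a single position."""
    enc = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        enc.append(math.sin(angle))  # even dimension
        enc.append(math.cos(angle))  # odd dimension
    return enc[:d_model]
```

The resulting vector is added to the token embedding at that position, so two identical tokens at different positions enter the model with different inputs.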

KV cache (Key-Value cache)

An inference optimization that stores the computed key and value vectors for all previous tokens so they do not need to be recomputed when generating each new token. Without a KV cache, each of the 1,000 steps in generating a 1,000-token response would have to reprocess the entire sequence so far. With it, each step processes only the newest token. KV cache size scales with context length and is a primary memory bottleneck at inference time.
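
A back-of-the-envelope sketch of the saving, counting how many token positions must be run through the model's layers during generation. This is a simplified cost model: it ignores the attention work over cached keys and values, and the 10-token prompt in the example is a made-up figure.

```python
def positions_processed(prompt_len, new_tokens, use_kv_cache):
    """Count token positions pushed through the model while generating."""
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    if use_kv_cache:
        # The prompt is processed once; each later step handles one new token.
        return prompt_len + (new_tokens - 1)
    # Without a cache, every step reprocesses the whole sequence so far.
    return sum(prompt_len + step for step in range(new_tokens))
```

For a 10-token prompt and a 1,000-token response, the count is 1,009 with the cache and 509,500 without it.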

Generative architectures

Large language models are not the only family of generative AI systems. Three other major architectures (GANs, VAEs and diffusion models) shaped the field’s development and remain central to image, video and audio generation. They share underlying concepts while diverging sharply in their mechanics.

Autoregressive model

A model that generates output one element at a time, each conditioned on all previously generated elements. GPT-style models are autoregressive over tokens. The mechanism is serial: predict token N+1 from tokens 1 through N, append it, then predict N+2. Autoregressive generation cannot be parallelized across the output, which is a fundamental inference speed constraint that techniques like speculative decoding address.

GAN (Generative Adversarial Network)

An architecture introduced by Goodfellow et al. in 2014 in which two networks train against each other. A generator learns to produce realistic outputs; a discriminator learns to distinguish real from generated. The generator improves by fooling the discriminator; the discriminator improves by catching the generator. GANs dominated image synthesis from 2014 to roughly 2021 before diffusion models surpassed them in quality and training stability. GAN training is notoriously prone to mode collapse, where the generator learns to produce only a narrow range of outputs.

VAE (Variational Autoencoder)

A generative model that learns a compressed latent representation of data by training an encoder and decoder simultaneously, with a regularization term constraining the latent space to follow a known probability distribution. This allows generating new samples by sampling from the distribution. VAEs underpin the latent diffusion approach used in Stable Diffusion.

Diffusion model

A generative architecture that learns to reverse a gradual noising process. Training: incrementally add Gaussian noise to data until it becomes pure noise, and train a neural network to predict and remove that noise. Inference: start from pure noise and iteratively denoise. Diffusion models achieved state-of-the-art image quality in DALL-E 3, Midjourney and Stable Diffusion and have since been applied to audio, video and molecular design.

Latent diffusion

A variant of diffusion models that operates in the compressed latent space of a VAE rather than in pixel space. The insight is that diffusing and denoising a low-dimensional latent vector is far cheaper than operating on a full-resolution image, with minimal quality loss. Stable Diffusion is a latent diffusion model. The technique has become standard in image and video generation systems.

Latent space

The internal numerical space in which a model encodes compressed representations of inputs. A point in latent space corresponds to a combination of learned features. The geometry carries meaning: nearby points decode to similar outputs; arithmetic in the space can have semantic significance, a property first systematically documented in Word2Vec representations. Controlling and navigating the latent space is a central design challenge in generative model development.

CLIP (Contrastive Language-Image Pretraining)

A model developed by OpenAI in 2021 that aligns visual and language representations in a shared embedding space. Trained on image-text pairs, CLIP pulls matching pairs together and pushes mismatched pairs apart in the shared space. It became a foundational component in text-to-image systems, used to guide diffusion models toward a target text prompt. CLIP-style contrastive training is now widespread in multimodal model design.

Language generation

Once a language model exists, a set of decisions governs how it produces text. These are not architectural choices; they are operational ones, made at inference time and having significant effects on output quality, diversity and reliability.

Temperature

A scalar that divides the logits before softmax during generation, controlling randomness. Temperature below 1 sharpens the distribution, making the most likely tokens even more probable and output more deterministic. Temperature above 1 flattens it, giving less probable tokens more chance and output more variety. A temperature of 0 is conventionally treated as greedy decoding; a temperature of 1 samples from the raw model distribution.
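
The effect is easy to see in a short sketch that divides the logits by the temperature before applying softmax. The function name and the example logits are illustrative; a temperature of exactly 0 would divide by zero, so it is rejected here and would be handled by picking the argmax instead.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize with softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for 0")
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]                        # made-up example scores
cold = softmax_with_temperature(logits, 0.5)    # sharper distribution
base = softmax_with_temperature(logits, 1.0)    # raw model distribution
hot = softmax_with_temperature(logits, 2.0)     # flatter distribution
```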

Top-p sampling (nucleus sampling)

A sampling strategy that, at each generation step, restricts choices to the smallest set of tokens whose cumulative probability exceeds a threshold p (typically 0.9 or 0.95) and samples from that set. Unlike top-k, the set size adapts to the distribution’s shape: a confident distribution produces a small nucleus; a flat distribution produces a large one. Top-p is the most widely used sampling method in deployed language models.
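
A minimal sketch of the nucleus selection step, assuming the input is already a probability distribution over token indices. The function name is illustrative, and the cutoff here uses "reaches or exceeds p", a boundary convention that varies between implementations.

```python
def top_p_filter(probs, p):
    """Return {token_index: renormalized probability} for the nucleus."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for idx, prob in ranked:
        nucleus.append((idx, prob))
        cumulative += prob
        if cumulative >= p:
            break  # smallest set whose mass reaches the threshold
    return {idx: prob / cumulative for idx, prob in nucleus}
```

With p = 0.9, the distribution [0.5, 0.3, 0.15, 0.05] keeps three tokens, while the confident distribution [0.97, 0.01, 0.01, 0.01] keeps only one.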

Top-k sampling

A sampling strategy that restricts generation to the k most probable tokens at each step, zeroing out all others before sampling. Simpler than top-p and widely used in combination with it. Setting k too low risks repetition; too high reintroduces improbable tokens. Most deployed systems use top-p and temperature as primary controls, with top-k as an optional additional constraint.

Beam search

A decoding strategy that maintains the top-k most probable partial sequences at each step and expands all of them, keeping only the best k. Deterministic and tending toward high-probability but generic output, beam search dominated before sampling strategies were preferred for open-ended tasks. It remains widely used in translation and summarization.
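
A toy sketch of the procedure. It assumes a hypothetical `step_fn` that maps a partial sequence to a dictionary of next-token probabilities, and it omits end-of-sequence handling and length normalization, both of which practical implementations need.

```python
import math

def beam_search(step_fn, start, beam_width, max_len):
    """Keep the beam_width highest log-probability sequences at each step."""
    beams = [(0.0, [start])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, prob in step_fn(seq).items():
                candidates.append((score + math.log(prob), seq + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

def toy_step(seq):
    """Made-up next-token distributions for the demonstration."""
    if seq[-1] == "<s>":
        return {"a": 0.6, "b": 0.4}
    if seq[-1] == "a":
        return {"x": 0.5, "y": 0.5}
    return {"x": 0.9, "y": 0.1}

best = beam_search(toy_step, "<s>", beam_width=2, max_len=2)[0][1]
```

In this toy setup a beam of width 2 finds the sequence through "b", whose total probability (0.36) beats anything reachable through the locally better first choice "a" (0.30); a beam of width 1 behaves like greedy decoding and stays on "a".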

Greedy decoding

The simplest generation strategy: select the highest-probability token at each step. Fast and deterministic, but produces repetitive and often degenerate output for open-ended generation. Equivalent to temperature of 0. Used in constrained settings where determinism matters more than creativity.

Perplexity

A metric for evaluating language model quality that measures how surprised the model is by a held-out text. Formally, it is the exponentiated average negative log-probability per token. Lower is better. A perplexity of 10 means the model is, on average, as uncertain as if it had to choose uniformly among 10 equally probable tokens at each step. Perplexity is used to compare models trained on identical data distributions; cross-distribution comparisons are unreliable.
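
The formal definition translates directly into code. This sketch assumes the input is the list of probabilities the model assigned to each actual token in the held-out text.

```python
import math

def perplexity(token_probs):
    """Exponentiated average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

A model that assigns probability 0.1 to every token gets a perplexity of 10, matching the "uniform choice among 10 tokens" reading.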

Repetition penalty

A generation parameter that reduces the probability of tokens that have already appeared in the output. In a common implementation, a repeated token’s logit is divided by a penalty factor greater than 1 when it is positive and multiplied by that factor when it is negative, so the token always becomes less likely. Without this mechanism, autoregressive models are prone to degenerate repetition loops, particularly at low temperature settings.
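
A sketch of one widely used convention, with an illustrative function name. Plain division would make a negative logit less negative and therefore more likely, so negative logits are multiplied by the penalty instead.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the logits of tokens that already appear in the output."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] = out[tok] / penalty
        else:
            out[tok] = out[tok] * penalty  # pushes negative logits further down
    return out
```

With a penalty of 2.0 and tokens 0 and 1 already generated, the logits [2.0, -1.0, 0.5] become [1.0, -2.0, 0.5].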

Efficiency and compression

The cost of training and running large models has driven a substantial engineering literature focused on doing more with less. These techniques are not theoretical; they are production-critical and appear in the technical reports of every major model release.

Quantization

The process of representing model weights at reduced numerical precision to shrink memory footprint and accelerate computation. Full precision uses 32-bit floats (FP32); common alternatives include FP16, BF16 (a 16-bit format that matches FP32 dynamic range), INT8 and INT4. Going from FP32 to INT8 cuts weight memory to roughly a quarter and typically increases inference throughput. BF16 has become the default training and inference precision for most frontier models.
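
The memory arithmetic and the basic round trip can be sketched as follows. The byte counts per format are standard; the 7-billion-parameter model in the usage note is a hypothetical example, and the quantizer shown is the simplest symmetric per-tensor scheme, not what any particular library does.

```python
BYTES_PER_WEIGHT = {"FP32": 4, "FP16": 2, "BF16": 2, "INT8": 1, "INT4": 0.5}

def weight_memory_gb(n_params, fmt):
    """Approximate weight storage in gigabytes for a given format."""
    return n_params * BYTES_PER_WEIGHT[fmt] / 1e9

def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]
```

Under these assumptions a 7-billion-parameter model needs about 28 GB for weights in FP32, 14 GB in BF16 and 7 GB in INT8, and each dequantized weight differs from the original by at most half a quantization step.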

Pruning

The process of zeroing out or removing weights that contribute minimally to model outputs. A pruned model has fewer effective parameters, reducing memory and compute. Structured pruning removes entire components (attention heads, layers); unstructured pruning zeros individual weights. Pruning is harder to apply to large language models without significant quality loss and is less widely deployed than quantization.

Knowledge distillation

A training technique in which a smaller student model learns to mimic the output distributions of a larger teacher model rather than learning directly from ground-truth labels. The teacher’s soft probability distributions carry richer information than hard labels. Distillation is how organizations deploy capable small models without replicating the full training cost of frontier systems.

LoRA (Low-Rank Adaptation)

A parameter-efficient fine-tuning technique that freezes the original model weights and adds small, trainable low-rank matrices to each layer. Rather than updating all parameters, LoRA trains only a fraction of the total, often below 1%, reducing memory requirements for fine-tuning by a factor of three to ten. The LoRA matrices can be merged back into the original weights at inference time with no additional cost. Introduced by Hu et al. in 2021.
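
The "below 1%" figure follows from simple counting. In this sketch a weight matrix of shape d_out × d_in is left frozen and two small matrices of rank r are trained in its place; the 4096 × 4096 layer and rank 8 in the usage note are made-up but typical-looking values.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (frozen full-matrix parameters, trainable LoRA parameters)."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)  # one rank x d_in matrix plus one d_out x rank matrix
    return full, lora
```

For a 4096 × 4096 matrix at rank 8, `lora_param_counts(4096, 4096, 8)` gives 16,777,216 frozen parameters against 65,536 trainable ones, about 0.4% of the original.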

PEFT (Parameter-Efficient Fine-Tuning)

A category of fine-tuning techniques that modify only a small fraction of a model’s parameters. LoRA is the most widely used; others include prefix tuning, prompt tuning and adapter layers. PEFT makes fine-tuning tractable on consumer hardware by avoiding the memory requirements of full fine-tuning runs.

Mixed precision training

A training technique that uses lower-precision formats (FP16 or BF16) for most computations while maintaining a full-precision master copy of weights for the weight update step. Lower precision means faster matrix multiplications and less memory; the full-precision copy prevents numerical errors from compounding. Mixed precision is standard in all frontier model training.

Flash Attention

A hardware-aware exact attention algorithm developed by Dao et al. in 2022 that computes standard self-attention with significantly less memory and faster wall-clock time. It achieves this by tiling the computation to fit within GPU SRAM rather than reading repeatedly from slower high-bandwidth memory. Flash Attention produces identical outputs to standard attention: it is an implementation optimization, not an approximation. Flash Attention 2 and 3 are now universal in frontier model training and inference.

Speculative decoding

An inference technique that uses a small draft model to propose several candidate tokens cheaply, then verifies them all in a single forward pass of the larger target model. Where the large model accepts the drafts, multiple tokens are produced for roughly the cost of one. At the first rejected token, the remaining drafts are discarded and generation continues from the target model’s own prediction. Speculative decoding typically achieves two- to three-fold inference speedups with no change to output quality.

Scale and compute

The hardware, measurement units and distribution strategies that govern how large models are trained and deployed carry their own vocabulary. These terms appear constantly in research papers and hardware announcements.

Scaling laws

Empirical relationships between model performance and three variables: parameters, training data and compute. Characterized rigorously by Kaplan et al. at OpenAI in 2020 and revised by Hoffmann et al. at DeepMind in 2022 (the Chinchilla paper). Scaling laws predict that loss decreases smoothly and predictably as any of the three variables increases. The Chinchilla result demonstrated that most pre-2022 models were undertrained relative to their size: compute was being spent on more parameters rather than more data.

GPU (Graphics Processing Unit)

The hardware that executes AI training and inference. GPUs were designed for the parallel floating-point operations of graphics rendering; that same parallelism makes them ideal for the matrix multiplications that dominate neural network computation. NVIDIA H100 and H800 dominated frontier model training through 2024. The GB200 NVL72 rack system, deployed in 2026, was architected specifically for Mixture of Experts workloads.

TPU (Tensor Processing Unit)

Google’s custom AI accelerator, designed specifically for the matrix multiplications in neural network training. TPUs are faster and more power-efficient than GPUs for the specific operations that dominate Transformer training. Google trains most of its frontier models on TPU pods. Unlike GPUs, TPUs are largely unavailable outside Google Cloud infrastructure.

CUDA

NVIDIA’s parallel computing platform and programming model, released in 2007, that made GPU hardware accessible for general-purpose scientific computation. Before CUDA, using GPU hardware required reformulating problems as graphics operations. CUDA unlocked GPUs for deep learning. Most major deep learning frameworks, including PyTorch and JAX, rely on it to run on NVIDIA GPUs. NVIDIA’s sustained dominance in AI hardware is inseparable from CUDA’s ecosystem.

FLOP / FLOPs (Floating-Point Operation)

A single arithmetic computation on a floating-point number. FLOPs count the total operations in a forward or training pass, serving as the standard unit for comparing computational cost across models independent of hardware. Training GPT-3 required roughly 3.1×10²³ FLOPs. Frontier models in 2024 and 2025 require between 10²⁴ and 10²⁵ FLOPs. FLOPs measure computational demand; FLOP/s measures how fast a piece of hardware delivers it.
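
The GPT-3 figure can be approximated with a widely used rule of thumb that is not stated above: training a dense Transformer costs about 6 FLOPs per parameter per training token. The parameter and token counts below are the commonly reported GPT-3 values (175 billion parameters, about 300 billion tokens), brought in here as assumptions.

```python
def training_flops(n_params, n_tokens):
    """Rule-of-thumb estimate: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

gpt3_estimate = training_flops(175e9, 300e9)  # about 3.15e23
```

The estimate lands within a couple of percent of the figure quoted above, which is as much precision as an approximation of this kind supports.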

Model parallelism

A distributed training strategy that splits the model’s layers or components across multiple GPUs when the full model does not fit on a single device. Tensor parallelism splits individual weight matrices across devices; pipeline parallelism assigns groups of layers to different devices. Training frontier models requires combining multiple parallelism strategies simultaneously.

Data parallelism

A distributed training strategy that replicates the entire model on multiple GPUs, each processing a different subset of the training batch, with gradients averaged across devices before each weight update. The simplest form of distributed training. When the model does not fit on one device, data parallelism alone is insufficient.

Expert parallelism

A distributed training and inference strategy specific to Mixture of Experts models. Expert weights are distributed across GPUs rather than replicated: each device hosts a subset of experts. When a token is routed to an expert on a different device, it is transmitted there. NVIDIA’s Dynamo framework handles expert parallelism as a first-class workload in its current generation hardware.

Matrix multiplication

The fundamental mathematical operation of deep learning. Linear layers, attention mechanisms and feed-forward networks are, at their mathematical core, sequences of matrix multiplications. GPUs and TPUs are engineered specifically for this operation. The cost of a forward pass is largely determined by the size and number of matrix multiplications it requires.

Alignment and training methods

Turning a pre-trained base model into a system that reliably follows instructions, declines harmful requests and communicates honestly requires a distinct set of training techniques. These are not capability methods; they are behavioral ones.

RLHF (Reinforcement Learning from Human Feedback)

A fine-tuning technique that incorporates human preference signals into model training. Human raters compare pairs of model outputs and indicate which is better. A reward model learns to predict those preferences. The language model is then trained via reinforcement learning to maximize the reward. RLHF is responsible for the transformation from capable but raw base models to instruction-following assistants. It is used by OpenAI, Anthropic, Google and most major labs.

DPO (Direct Preference Optimization)

A fine-tuning algorithm introduced by Rafailov et al. in 2023 that achieves RLHF-equivalent results without a separate reward model or a reinforcement learning loop. DPO optimizes the language model directly on preference pairs (chosen versus rejected responses) using a classification-style objective. It is simpler, more stable and more memory-efficient than full RLHF. DPO and its variants have largely displaced full RLHF pipelines at many organizations.
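
The classification-style objective can be sketched for a single preference pair. Following the published DPO formulation, the loss compares how much more the model being trained (the policy) favors the chosen response over the rejected one than a frozen reference model does; the reference model, the argument names and the `beta` default are details not given above and should be read as assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Negative log-sigmoid of the scaled preference margin for one pair."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
```

When the policy matches the reference the loss is log 2, and it falls as the policy shifts probability toward the chosen response relative to the reference.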

Alignment

The problem of ensuring that AI systems behave in accordance with human values and intentions. In practice, alignment work covers instruction-following without deception, honesty about uncertainty, refusal of harmful requests and avoidance of unintended goal pursuit. RLHF, DPO and Constitutional AI are alignment techniques. The distinction between capability research and alignment research is one of the organizing tensions in the field.

Constitutional AI (CAI)

An alignment technique developed by Anthropic in which a model critiques and revises its own outputs according to a written set of principles (a “constitution”) rather than relying solely on human feedback labels. The model generates a response, identifies ways it violates the constitution and revises it. This reduces the need for human labeling on harmful outputs. Constitutional AI is the primary alignment method behind Claude.

Red-teaming

The practice of deliberately attempting to cause a model to produce harmful, dishonest or undesirable outputs, in order to identify failure modes before deployment. Red-teamers probe for jailbreaks, prompt injections, dangerous knowledge leakage and systematic bias. Internal red-teaming is standard in frontier model development; external programs invite the broader research community to participate.

Guardrails

Constraints applied to model inputs or outputs (either within training or as post-hoc filters) to prevent harmful output. Input guardrails block dangerous prompts before they reach the model; output guardrails filter or flag model responses. In deployed systems, guardrails are the operational layer of alignment: the gap between what a model is capable of producing and what it is permitted to produce in production.

The Open Revolution: How AI and Vibe Coding Are Rewriting the Rules of Open Source

Riccardo Gatti — Mon, 30 Mar 2026 17:11:02 GMT

Every generation of software has produced the same confrontation: a proprietary incumbent, well-funded and deeply embedded, faces a community-built alternative that should not, by any conventional measure, be competitive. The incumbent has engineering teams, sales forces, support contracts and switching costs carefully constructed over years. The community has a mailing list and a shared conviction that software works better when it is open. The incumbent wins every boardroom argument. The community wins anyway.

Linux did it to UNIX. PostgreSQL did it to Oracle. Mozilla did it to Internet Explorer’s browser monopoly. In each case the mechanism was the same: lower cost of access, faster collective iteration and a license structure that made forking more attractive than surrender. The pattern is consistent enough to constitute a law of the industry.

AI is about to run that law at a speed the industry has never experienced. The cost of creating open-source software is collapsing, the global contributor base is expanding at record pace and the make-versus-buy calculation that has protected commercial software vendors for decades is shifting underneath them. That is the opportunity. The threat, simultaneously, is that the same forces are generating internal contradictions the ecosystem has never had to navigate. Understanding both sides of that equation starts with understanding how open source has always won.

Part I: How Open Source Wins

To understand where AI takes open source next, it is necessary to understand how open source wins. Three episodes illustrate this concretely: a community-built project confronts an entrenched, well-funded proprietary incumbent and, over years, makes it irrelevant. The mechanism is consistent across all three. Lower cost of access, faster collective iteration and a license structure that makes forking more attractive than surrender combine into something no corporate roadmap can easily counter. The pattern, once recognized, makes the current moment considerably more legible.

Linux Eats Unix: The Original Disruption

In 1969, AT&T Bell Labs developed UNIX, a powerful and portable operating system that became the backbone of commercial computing. By the 1980s, UNIX was proprietary gold. Vendors like Sun Microsystems, Digital Equipment Corporation and Silicon Graphics each sold their own incompatible UNIX variant at prices that locked it firmly inside corporate data centers and university budgets. It was the defining example of expensive, closed infrastructure: powerful, indispensable and entirely controlled by its owners.

In 1991, a 21-year-old Finnish computer science student named Linus Torvalds posted a message to a Usenet newsgroup that would change everything. Working from his bedroom while studying at the University of Helsinki, using a cheap PC and a UNIX-like academic system called Minix as a reference point, Torvalds announced he was building his own kernel. “Just a hobby,” he wrote, “won’t be big and professional.” He published the Linux kernel under an open license, invited the world to improve it and stepped back.

What followed was the first large-scale proof of the open-source model’s power. Thousands of programmers contributed patches, drivers and improvements across the internet. Richard Stallman’s GNU project had spent years building a fully free UNIX-compatible operating system and had all the necessary tools, but lacked a working kernel. Linux provided exactly the missing piece. GNU/Linux distributions began appearing, and suddenly a free, community-built alternative to commercial UNIX existed, one that ran on cheap commodity hardware rather than expensive workstations.

The commercial UNIX vendors dismissed it. Then they watched, increasingly alarmed, as Linux began appearing on servers, then in data centers, then powering the infrastructure of the early internet. Today it runs approximately 96% of the world’s top one million web servers, powers every Android device on the planet and underlies the cloud infrastructure of Amazon, Google and Microsoft. The proprietary UNIX variants it displaced are largely dead or irrelevant.

The SCO Group’s 2003 lawsuit, claiming that IBM had contributed UNIX code to the Linux kernel, was the last serious legal attempt to strangle Linux in its cradle. IBM, Novell and Red Hat fought back and SCO lost. What had begun as a student hobby project had overthrown an entire class of commercial software, not through better marketing or more funding, but through radical openness.

The Netscape Moment: Naming the Movement

By 1995, Netscape Navigator owned the web browser market. It was a paid commercial product with no serious competition. Then Microsoft built Internet Explorer into Windows and offered it for free. Netscape’s revenues collapsed.

Facing existential pressure, Netscape made a radical decision in 1998: release the Navigator source code to the public. The move was partly inspired by Eric S. Raymond’s landmark 1997 essay “The Cathedral and the Bazaar”, which argued that the decentralized, iterative model of open development produced software more robust and responsive than the carefully controlled Cathedral of proprietary development. Netscape’s release lit a spark in the developer community and a broader strategic argument followed: if open collaboration could defeat a corporate software giant, it needed a name that companies could actually adopt.

A group of technologists gathered in Palo Alto, frustrated that the term “free software” carried ideological baggage that companies like Netscape were uncomfortable with. Christine Peterson proposed “open source.” Linus Torvalds gave his approval. The Open Source Initiative was born shortly thereafter.

Netscape itself never recovered. It was acquired by AOL and discontinued in 2008. The Mozilla project it spawned, however, became Firefox, one of the most consequential browsers in history and the institutional home of modern open-source web development.

PostgreSQL vs. Oracle: David Beats Goliath

Perhaps no open-source story better illustrates the movement’s power than the long, patient insurgency of PostgreSQL against Oracle.

PostgreSQL traces its lineage to a 1986 research project at UC Berkeley led by Professor Michael Stonebraker. It became publicly available in 1996, adding SQL support just as internet-based development was beginning to explode. At the time, the idea of an open-source database competing with Oracle, a billion-dollar behemoth whose licenses could run into the hundreds of thousands of dollars, seemed absurd. Oracle had been the unchallenged king of enterprise data since Larry Ellison, Bob Miner and Ed Oates launched the first commercial SQL database in 1979. Its lock-in was deep and its licensing costs were a matter of boardroom-level concern for every company that used it.

Decades later, the picture looks entirely different. Oracle’s popularity among developers has been declining steadily, with the company now ranking eighth in developer preferences, far behind PostgreSQL, which tops the Stack Overflow Developer Survey as the most popular relational database. PostgreSQL’s global community of contributors has built software technically comparable to Oracle’s best offerings, at zero licensing cost and with no single corporate entity able to hold it hostage.

Oracle’s own complicated relationship with open source reinforces the lesson. After acquiring Sun Microsystems in 2010, Oracle became the owner of both the most popular proprietary database and the most popular open-source database at the time, MySQL. Oracle’s subsequent decisions, restricting features to paid tiers and neglecting the community edition, fractured the MySQL community. Developers forked the project into MariaDB and Percona. OpenOffice, another Sun acquisition, was effectively abandoned and had to be rescued by the community as LibreOffice. The pattern was consistent: when a corporation tries to capture open-source software for profit, the community forks and moves on.

Part II: From Idealism to Infrastructure (2000–2020)

For much of the 1990s, open source was treated as an ideological project. By 2000, it was becoming something more consequential: the default substrate of the commercial internet. The transition happened faster than most incumbents expected.

Red Hat’s IPO in August 1999 was the first signal that open source could generate serious institutional capital. The company, which built a business around supporting and distributing Linux, saw its stock price rise 272% on its first day of trading, one of the largest opening-day gains in Wall Street history at the time. It demonstrated that “free software” and “profitable company” were not contradictions. Open source could be a business model.

GitHub’s founding in 2008 accelerated the institutionalization further. By making collaborative code development as frictionless as social networking, GitHub transformed open source from a discipline practiced by dedicated communities into an ambient feature of software engineering. Millions of developers who would never have navigated mailing lists and patch submissions were suddenly contributing to public repositories. The social layer that open source had always needed was finally built.

The decisive confirmation came from the cloud. Amazon Web Services, Google Cloud and Microsoft Azure constructed their hyperscale infrastructure almost entirely on open-source components, Linux at the base, with Kubernetes, PostgreSQL, Redis, Kafka and dozens of other community-built projects stacked above it. These companies generated hundreds of billions of dollars in revenue from software they did not write and did not own. The arrangement was not lost on the open-source community, and it seeded a lasting tension between commercial exploitation and community sustainability that the AI era has since made acute.

By 2020, open source had moved from the fringes of software development to its center. It was no longer a counterculture. It was critical infrastructure.

Part III: The Scale of What Was Built

Before understanding what AI might do to open source, it is necessary to appreciate the staggering scale of what already exists.

A 2024 Harvard Business School study by economists Manuel Hoffmann, Frank Nagle and Yanuo Zhou produced what may be the single most important number in software economics. Companies would need to spend 3.5 times more on software than they currently do if open-source software did not exist, an aggregate hidden value estimated at $8.8 trillion. That figure represents the accumulated labor of millions of volunteer contributors who built the foundation of the modern digital economy, a foundation that GDP measurements do not capture.

The Black Duck 2026 Open Source Security and Risk Analysis Report adds another dimension: 98% of commercial codebases incorporate open-source components, with the average application drawing on more than 1,100 of them. Open source is not an alternative to commercial software. It is its substrate.

GitHub’s Octoverse 2025 report recorded approximately 36 million new developers joining the platform in 2025 alone. India contributed 5.2 million of those, with Brazil, Indonesia, Japan and Germany also posting significant gains. The open-source community is becoming dramatically more global, and the implications for governance, norms and collaboration are still playing out in real time.

What the Harvard figure does not capture is the forward-looking implication. If AI tools lower the cost of producing open-source software by an order of magnitude, the $8.8 trillion in hidden value the ecosystem already represents is not a ceiling. It is a baseline. Every proprietary product category that has so far survived because building a comparable open alternative was too expensive becomes newly vulnerable. The make-versus-buy calculation that has historically favored purchasing commercial software licenses is shifting … and it is shifting fast.

Part IV: Enter AI — The Cost of Creation Drops to Near Zero

In February 2025, AI researcher Andrej Karpathy, a co-founder of OpenAI and former AI Director at Tesla, coined a term that would become Collins English Dictionary’s Word of the Year: “vibe coding.” He described it as fully giving in to the vibes, embracing exponentials and letting AI write the code while the human guides the outcome in natural language.

For open source, the most significant implication is that the marginal cost of writing code has collapsed. The barrier that historically filtered serious contributors from casual ones, the years of practice required to write functional and readable software, has been dramatically lowered. The consequences run in several directions at once.

The democratization argument is genuine and substantial. For the first time, a developer in rural Indonesia or a student in São Paulo with limited formal training can understand an unfamiliar codebase, identify an issue, draft a patch and submit a pull request, a sequence of actions that previously required years of experience. GitHub’s own analysis attributes its record 36-million-developer growth in 2025 partly to AI enabling new contributors to participate sooner. The open-source community is gaining a genuinely global contributor base, and AI is the translation layer making it possible.

The acceleration of new projects is equally significant, if less discussed. Ideas that would have required a funded team can now be prototyped by a single motivated individual over a weekend. This may represent the most underappreciated consequence of AI for the ecosystem: not just more contributions to existing projects, but an explosion of new ones, many addressing problems that wealthy markets never prioritized.

There is also a structural irony worth noting. The AI tools now accelerating open-source contribution were themselves built on open-source foundations, specifically Python, PyTorch, TensorFlow and Linux. Meta’s LLaMA series, Mistral and the growing constellation of community models hosted on Hugging Face represent a powerful counter-force to closed-source AI dominance. As one 2025 analysis of the AI landscape observed, the open-source movement acts as a powerful accelerator and equalizer in AI development, preventing the complete consolidation of AI capability within a few proprietary players.

Part V: The Backlash and the Real Challenge

History rarely delivers clean victories, and the story of AI and open source is no exception. The same forces enabling democratization are generating a new class of problems that threaten the ecosystem’s foundations.

The maintainer crisis is already documented and severe. Daniel Stenberg shut down cURL’s six-year bug bounty program after AI-generated submissions climbed to 20% of the total, with the valid submission rate collapsing to just 5%. Mitchell Hashimoto banned AI-generated code from the Ghostty terminal project without prior approval. Steve Ruiz closed all external pull requests to tldraw after discovering that AI scripts had generated poorly written issues that other contributors’ AI tools then used to generate hallucination-based pull requests. GitHub’s own 2026 analysis described the situation as analogous to a denial-of-service attack on human attention: auto-generated issues and pull requests flooding projects without increasing their quality.

The economic dimension of the problem is more structural still. A January 2026 working paper by economists Miklos Koren, Gabor Békés, Julian Hinz and Aaron Lohmann, published on arXiv, argued that vibe coding threatens open-source sustainability not through malice but through the disappearance of engagement signals. When developers use AI agents to select and assemble open-source packages without reading documentation, filing bugs or engaging with maintainers, the feedback loops that sustain OSS economics silently decay. Tailwind CSS saw its documentation traffic fall 40% from early 2023 despite growing usage, and its revenue fall by nearly 80%. Separately, Stack Overflow activity fell by 25% within six months of ChatGPT’s launch, per research published in PNAS Nexus. The signal that tells maintainers what is broken, what is confusing and what users need most is evaporating.

The picture on pure productivity is more complicated than the prevailing narrative suggests. METR, an organization that evaluates frontier AI models, ran a rigorous randomized controlled trial in 2025 involving 16 experienced developers working on large, mature open-source repositories. When developers were allowed to use AI tools like Cursor Pro with Claude 3.5/3.7 Sonnet, they took 19% longer to complete tasks than without AI assistance. This directly contradicted the developers’ own forecasts (that AI would save them 20 to 24% of their time) as well as the forecasts of expert economists (39% faster) and ML researchers (38% faster). The full paper on arXiv notes this likely reflects the demands of mature, high-quality codebases with implicit standards, the exact environment where open-source infrastructure lives.

AI appears to dramatically accelerate greenfield development of new and simpler projects while providing unclear benefit for the careful, deep work of maintaining existing complex infrastructure. The democratization may be real and powerful at the edges of the ecosystem. The core remains as difficult as ever to sustain.

Part VI: The Make-vs-Buy Reckoning

The historical disruptions described in Part I shared a common trigger: the moment when the cost of building a comparable open alternative dropped below the cost of tolerating a proprietary vendor’s pricing, lock-in or strategic indifference. Linux crossed that threshold for UNIX in the mid-1990s. PostgreSQL crossed it for Oracle in the 2010s. AI is about to compress that timeline dramatically across an entirely new set of product categories.

The make-versus-buy calculation that has governed enterprise software procurement for decades is shifting in a way that most technology leaders have not fully internalized. When a skilled engineering team required six months to build a functional alternative to a SaaS product, the license fee was almost always the rational choice. When that same team, augmented by AI coding tools, can produce a working prototype in two weeks and a production-ready system in two months, the calculus changes entirely. The proprietary vendor’s proposition, the convenience premium that justifies its pricing, erodes. Fast.

This is the dynamic that every CTO should be watching closely. The velocity at which a commercially licensed software product can be displaced by an open alternative is no longer determined primarily by the size of the community willing to build one. It is determined by the cost of AI-assisted development, which is falling at roughly 30% per year in compute terms alone. Products that appeared safely entrenched eighteen months ago are becoming contestable. Some will be contested.

The open-source movement has always been strongest when it addressed problems that proprietary vendors treated as solved and therefore stopped investing in. AI compounds this vulnerability significantly. A motivated community with access to modern coding tools can now assess a commercial product, identify its weakest surface and ship a credible open alternative in a timeframe that leaves incumbents with limited room to respond. The window between “emerging open-source threat” and “category displacement” is narrowing.

None of this means that every proprietary software company faces immediate existential pressure. Enterprise relationships, compliance requirements, support contracts and integration depth create real switching costs that pure technical capability cannot dissolve overnight. What it does mean is that the margin of safety that proprietary vendors have historically enjoyed, the gap between what they charge and what an open alternative can deliver, is compressing faster than most product roadmaps are built to accommodate.

The question a thoughtful CTO should be asking is not whether their current stack includes open-source components. At 98% of commercial codebases, that question answers itself. The more urgent question is whether the proprietary products in their portfolio are differentiated enough to survive a well-resourced open-source alternative, built in a fraction of the time and at a fraction of the cost that would have been required three years ago. For a growing number of product categories, the answer is becoming less certain by the quarter.

Part VII: The Governance Gap

The open-source movement has spent fifty years defeating proprietary incumbents by being more adaptable, more global and ultimately more innovative than any single company could be. It now faces a different kind of challenge, one that does not come from Oracle or Microsoft, but from the dynamics of its own success.

As GitHub cautioned in its February 2026 outlook, the community faces not just technical challenges but organizational ones. The tooling to write software has never been more accessible. The missing layer is governance, documentation and community support, human problems that no language model can fully solve. The question for the ecosystem going forward is not how much it will grow. It is whether the structures exist to make that growth sustainable.

The historical record, nevertheless, offers a baseline for measured optimism. Every major disruption to open source, from the SCO Group’s legal threats against Linux in 2003 to Oracle’s acquisition of MySQL in 2010 to the relentless “embrace, extend, extinguish” playbook of large corporations, was absorbed and adapted to, and ultimately strengthened the movement. The community forked, rebuilt and moved on.

The cost of creation has fallen to near zero. That is, historically, the condition under which open source thrives most aggressively. More people can contribute, more problems can be addressed and more of the world’s population can participate in building the digital infrastructure of the future. The open-source movement is not merely surviving the AI transition. It is the primary mechanism through which AI will redistribute technological power away from incumbents and toward builders.

The engineers who design the governance layer that channels this energy, rather than letting it collapse into noise, will define the next era of the industry. Every technology leader waiting to see how this resolves before forming a view is already behind.

The $200 Bet: Anthropic Is Subsidizing You Today to Own You in 2028

Riccardo Gatti — Fri, 13 Mar 2026 17:59:28 GMT

The math that does not add up … until it does

On the surface, it looks like a bad deal for Anthropic. A Claude Max 20x subscriber pays $200 a month. In return, they receive roughly 900 messages per five-hour window, long agentic sessions, full Claude Code access and priority access to Anthropic’s most powerful models. Measured against raw API rates (Claude Opus 4.6 is priced at $5 per million input tokens and $25 per million output tokens), a genuinely heavy user of Claude Code could plausibly consume thousands of dollars’ worth of compute in a single month.
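To make that gap concrete, here is a minimal Python sketch of what a month of usage would cost at the API rates quoted above. The monthly token volumes are hypothetical, chosen only to illustrate the arithmetic; they are not figures reported by Anthropic, and the sketch ignores any caching or batch discounts.

```python
# API list prices quoted in the article for Claude Opus 4.6 (USD per million tokens).
INPUT_PRICE_PER_MTOK = 5.0
OUTPUT_PRICE_PER_MTOK = 25.0
MAX_20X_MONTHLY_FEE = 200.0


def api_equivalent_cost(input_tokens: int, output_tokens: int) -> float:
    """Return what the given token volume would cost at raw API rates."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    )


# Hypothetical heavy Claude Code month: 400M input tokens, 20M output tokens.
# These volumes are an assumption for illustration only.
cost = api_equivalent_cost(400_000_000, 20_000_000)
ratio = cost / MAX_20X_MONTHLY_FEE
print(f"API-equivalent cost: ${cost:,.0f} ({ratio:.1f}x the subscription fee)")
```

Agentic coding sessions tend to be input-heavy because the same context is re-sent on every turn, which is why the assumed mix leans so far toward input tokens.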

On January 5, 2025, OpenAI chief executive Sam Altman posted on X that his company was “currently losing money on OpenAI pro subscriptions” because users were consuming far more than anticipated. Altman acknowledged to TechCrunch: “I personally chose the price and thought we would make some money.” Anthropic, by every available indicator, operates in the same territory. Internal financial documents obtained by the Wall Street Journal in November 2025 and subsequently reported by TechCrunch and Fortune show Anthropic expected to burn approximately $3 billion in cash in 2025, against $4.2 billion in sales, with break-even targeted for 2028, contingent on gross margins reaching 77% by that year. Those margin targets are already under pressure: Anthropic’s inference costs ran 23% above internal projections in 2025, cutting gross margin to roughly 40% and prompting downward revisions to forward estimates, according to MarketWise’s reporting in March 2026.

This is not a pricing strategy. It is a territorial claim on the future.

Part I: The Inference Cost Curve Is the Whole Story

To understand what Anthropic is betting on, the starting point is one of the most remarkable and underreported economic phenomena in recent technology history: the collapse of AI inference costs.

Epoch AI’s research, published in 2025, found that the price to achieve GPT-4’s performance on PhD-level science questions fell by roughly 40 times per year between 2022 and 2025. Across different benchmarks, the range of annual price reductions runs from 9 times to 900 times depending on the task. The Stanford HAI AI Index 2025 report, released in April 2025, found that inference costs for GPT-3.5-level performance dropped over 280-fold in just 18 months between November 2022 and October 2024. At the hardware level, Stanford’s researchers found costs declining 30% annually while energy efficiency improved 40% each year.

The same capability that cost $20 per million tokens in late 2022 now costs less than $0.40.
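Figures like "280-fold in 18 months" and "40 times per year" are easier to compare once they are put on the same annual basis. The sketch below does that conversion, assuming a constant rate of decline across the period, which is a simplification rather than something the cited reports claim.

```python
def annualized_factor(total_fold: float, months: float) -> float:
    """Convert a total price decline (e.g. 280-fold) over `months` into a per-year factor."""
    return total_fold ** (12.0 / months)


# Figures quoted above from the Stanford HAI AI Index 2025: a 280-fold drop,
# described as taking roughly 18 months.
per_year = annualized_factor(280.0, 18.0)
print(f"Implied decline: about {per_year:.0f}x per year")

# Sanity check: a 100-fold drop over 24 months is 10x per year.
print(annualized_factor(100.0, 24.0))
```

On that basis the Stanford figure works out to a little over 40 times per year, the same order as the Epoch AI estimate.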

The forces driving this compression are accelerating on multiple fronts simultaneously. AWS reduced H100 instance pricing by 44% in a single June 2025 announcement, bringing the cost from approximately $7 per hour to $3.90. Specialized providers now offer H100 access at $1.49 per hour. NVIDIA’s Blackwell architecture, widely deployed through 2025 and 2026, delivers up to 10 times lower cost per token than the previous Hopper generation for the workloads that matter most.

The infrastructure cost of running a frontier model query will continue to fall. This is not speculative. It is the logical continuation of a three-year curve that has never reversed.

The Architecture Revolution

More significant than hardware gains is the fundamental shift in how AI models are constructed. Through 2025, virtually every leading frontier model migrated to Mixture-of-Experts (MoE) architecture. DeepSeek-V3, LLaMA 4 and Mistral Large 3 all adopted this approach, as almost certainly did Google’s Gemini family. The principle is structurally elegant: instead of activating all 600 billion parameters of a model for every single token, MoE models route each token to only the most relevant specialized sub-networks, typically activating around 40 billion parameters out of 600 billion total. The result is the capacity of a massive model delivered at a fraction of the computational cost.
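The routing idea can be sketched in a few lines of Python. This is a toy illustration of top-k gating, not the router of any model named above: the expert count, the value of k and the random scores are all assumptions, and production routers are learned layers with load balancing and other machinery omitted here.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their gate weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


# Hypothetical configuration: 16 experts, 2 active per token.
NUM_EXPERTS, TOP_K = 16, 2
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
gates = route_token(scores, TOP_K)
print("experts used:", sorted(gates))

# Active share using the article's illustrative figures (40B of 600B parameters).
print(f"active parameters per token: {40 / 600:.1%}")
```

Only the selected experts run for a given token, so compute per token tracks the active parameter count rather than the total.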

As NVIDIA’s technical documentation on its Blackwell platform confirms, MoE models on the GB200 NVL72 rack-scale system achieve a 10-fold generational performance gain over the same models on previous-generation H200 hardware. DeepSeek-V3 achieved frontier-level performance for under $6 million in total training compute, a figure that would have seemed impossible two years prior. Beyond MoE, speculative decoding, quantization (reducing numerical precision from FP16 to INT4) and prompt caching each contribute compounding cost reductions.

A November 2025 arXiv preprint from researchers applying a benchmark-level rather than token-level methodology found that inference costs for reaching any given capability level are declining by a factor of roughly 5 to 10 per year. Separately, researcher JS Denain’s analysis, published via Epoch AI’s Substack, found that a complex task requiring 43 million output tokens in April 2025 required only 5 million tokens to complete at equivalent quality in December 2025, roughly a threefold cost reduction in eight months, driven purely by model efficiency improvements.

Anthropic’s own model trajectory validates the underlying bet. Claude Opus 4.5 delivered Opus-class capability at $5 per million input tokens, a 67% price reduction versus the prior Opus generation priced at $15 per million. Opus 4.6, released February 5, 2026, launched at that same $5 price while offering significantly improved long-context retrieval, agent team capabilities and 128K token output windows.

More capability. Same price.

The Bet Stated Plainly

A full-throttle Max 20x user likely costs Anthropic $1,000 to $5,000 in compute to serve for $200 today. That is a painful subsidy. In 18 months, given the documented cost trajectory, the same usage pattern might cost $200 to $500 to serve. In 36 months, perhaps $50 to $100. The economics reverse completely, and Anthropic will have spent those 36 months building one of the deepest habitual user bases in the history of software tooling.
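It is worth checking what rate of cost decline these projections actually require. The sketch below backs out the implied annual factor from the ranges in the paragraph above, pairing low end with low end and high end with high end. That pairing is an assumption, as is the constant rate of decline.

```python
def implied_annual_factor(cost_now: float, cost_later: float, months: float) -> float:
    """Annual cost-decline factor needed to go from cost_now to cost_later in `months`."""
    return (cost_now / cost_later) ** (12.0 / months)


# (cost today, projected cost, months ahead), taken from the ranges in the paragraph above.
scenarios = [
    (1_000, 200, 18),
    (5_000, 500, 18),
    (1_000, 50, 36),
    (5_000, 100, 36),
]
for now, later, months in scenarios:
    factor = implied_annual_factor(now, later, months)
    print(f"${now:,} -> ${later:,} in {months} months needs ~{factor:.1f}x per year")
```

Under these pairings the projections need a decline of roughly 3 to 5 times per year, somewhat below the 5 to 10 times per year cited earlier, so the bet leaves some margin if the curve flattens.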

Compute is not the product being sold. Dependency, familiarity and professional identity are. Those are not commodities.

Part II: Why Coding Is the Perfect Vertical

Of all the categories Anthropic could have chosen to anchor its land grab, it chose software development and the results have been extraordinary.

Claude Code launched as a research preview in February 2025 and reached general availability in May. By November 2025, Anthropic confirmed that the product alone had surpassed $1 billion in annualized run-rate revenue (Anthropic, Dec 2 2025). By February 2026, per Anthropic’s Series G announcement, that figure had more than doubled to $2.5 billion (Anthropic, Feb 2026). The broader company trajectory was equally striking: Reuters reported in May 2025 that Anthropic’s overall annualized revenue crossed $3 billion that month, up from roughly $1 billion in December 2024, a growth rate that Meritech General Partner Alex Clayton, who has no financial stake in the company, said he had never seen across more than 200 public software company IPOs. Bloomberg reported in March 2026 that the run rate had reached $9 billion at year-end 2025 and had surpassed $19 billion by early March 2026, driven in each instance primarily by Claude Code adoption.

Three compounding reasons explain coding’s strategic centrality.

Code either runs or it does not. Unlike most AI use cases, software development produces outputs that are immediately verifiable through tests and execution. That makes productivity gains from AI coding assistants measurable and undeniable, and it gives enterprises a clear return-on-investment calculation. As Fortune reported in February 2026, Spotify co-CEO Gustav Söderström stated that the company’s best developers had not written a single line of code manually since December 2025, with the streaming giant shipping over 50 new features through Claude Code-powered workflows. At an Anthropic enterprise event covered by VentureBeat in March 2026, the New York Stock Exchange’s chief technology officer Sridhar Masam described his organization as “rewiring our engineering process” with Claude Code and building internal AI agents that could take instructions from a Jira ticket all the way to a committed piece of code.

Coding also creates genuine switching costs not through hostage-taking but through workflow integration so deep it becomes invisible. Once a development team has built its muscle memory, its Slack integrations, its CI/CD pipelines and its CLAUDE.md configuration files around a specific AI coding assistant, switching becomes an act of self-disruption rather than a vendor decision. At a Seattle developer meetup in January 2026, a Google principal engineer publicly noted that Claude had reproduced a year of architectural work in one hour. That kind of experience does not produce curiosity about alternatives.

Third, and strategically most important, coding provides a developer-shaped wedge into enterprise. By winning the loyalty of developers (the people who build, approve and architect corporate AI infrastructure) before expanding to the broader organization, Anthropic converted them into internal advocates. The Register reported in February 2026 on Anthropic’s partnership with CodePath, placing Claude Code in the hands of more than 20,000 students at community colleges and Historically Black Colleges and Universities. The Register noted the parallel directly: at the start of the personal computer revolution, Apple and Microsoft worked to get their products into schools precisely because early familiarity drives long-term retention. Anthropic is running the same play, against a larger addressable market, at considerably higher velocity.

Part III: The Historical Rhyme

History contains several technologies that appeared unprofitably expensive during their formative period and became structurally cheap, with the companies that subsidized early adoption reaping advantages that compounded for decades.

AOL’s Free Hours

In the early 1990s, AOL mailed floppy disks offering free internet access to millions of American households. The economics looked absurd: dial-up access was metered and expensive, and AOL was distributing a perishable commodity at a loss. The Smithsonian’s National Museum of American History documents the strategy’s architect, chief marketing officer Jan Brandt, as recognizing something her competitors had not: the internet was a network-effect business, and no one would pay for it until they had experienced it. The campaign generated conversion rates of 10 to 15 percent when the industry standard was 2 percent. Britannica’s account of AOL’s corporate history records that by the end of the 1990s, AOL had surpassed 20 million paying subscribers and briefly became the most valuable media company in the United States.

The parallel to Anthropic’s pricing structure is precise. The Max plan at $200 per month is Anthropic’s floppy disk: a subsidized introduction to a transformative technology, offered at a loss, designed to build habitual dependency before the underlying economics are sustainable. The difference is that Anthropic’s cost structure is improving at a rate AOL could not have imagined.

Moore’s Law and the Democratization of Computing

NBER research by economist Kenneth Flamm documents that quality-adjusted IT equipment prices declined 16% annually on average across the five decades from 1959 to 2009, accelerating to 23% per year in the late 1990s. CSIS’s analysis of Moore’s Law concludes that this relentless cost decline transformed the computer from a military research instrument to a household appliance and that the corollary effect was the enabling of semiconductor adoption across every major industry.

AI inference is Moore’s Law operating on a steeper curve. Hardware improvements account for part of it (30% annual decline at the chip level, per Stanford’s AI Index). AI additionally benefits from algorithmic improvements and software optimizations with no real analogue in the transistor-density world. Companies that subsidized computing access during the expensive early years built relationships that became structural. The hardware got cheap. The relationships persisted.

The Broadband Build-Out

In the late 1990s, internet service providers invested enormous sums to lay fiber optic cable, operating at substantial losses for years. Bandwidth costs fell as Butters’ Law predicted: the cost of transmitting a bit over optical fiber halved roughly every nine months. Companies that built user bases during the expensive era served those users at dramatically lower costs once the infrastructure matured. Those that sat out the land-grab period did not recover the ground.

AI infrastructure today occupies an analogous phase. The companies building habitual users now will serve those users at a fraction of the current cost within 24 to 36 months.

Part IV: The Big Picture

Anthropic is not competing for AI users. It is competing for professional identities.

When a developer describes themselves as a “Claude Code developer,” when an enterprise architect has Claude integrated into every step of their workflow, when a cohort of computer science students learns to code with Claude as their pair programmer … that relationship has moved beyond a customer relationship. Changing tools becomes an act of professional self-disruption.

This is precisely what Microsoft accomplished with Office in the 1990s, and notably it had nothing to do with subsidization. Excel, Word and PowerPoint were not technically superior to all competitors in every dimension. Lotus 1-2-3 was a more capable spreadsheet at the time; WordPerfect was a widely beloved word processor. Microsoft prevailed not by being cheaper but by being everywhere: bundled into OEM deals, standardized across organizations, embedded in the muscle memory of a generation of knowledge workers. At some point, the tool stopped being a choice. It became the assumed medium of professional thought.

Anthropic is pursuing the same destination through a different route: where Microsoft used distribution leverage, Anthropic is using the developer as the entry point, building irreplaceable habits at the individual level first and watching those habits calcify into organizational standards. The goal is identical: make switching not a vendor decision but an act of professional self-disruption.

The scale of what is being subsidized matters. At full utilization, the Max 20x plan likely delivers $2,000 to $5,000 in compute value for $200. That is a subsidy ratio of 10 to 25 times. No rational company accepts that ratio without a specific theory of the future. Anthropic’s theory: inference costs will fall by a factor of 5 to 10 annually; users who become dependent on Claude Max today will still be subscribers when the marginal cost of serving them drops to $20 to $50 per month; and the switching costs built through years of deep workflow integration will ensure they remain.
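The timeline implied by that theory can be worked out directly. The sketch below assumes a constant annual decline and simply asks how long a 100-fold drop (for example $2,000 to $20, or $5,000 to $50) would take at each end of the 5 to 10 times range; the function name and structure are illustrative only.

```python
import math


def months_until(cost_now: float, target_cost: float, annual_factor: float) -> float:
    """Months for cost_now to fall to target_cost if costs shrink annual_factor-fold per year."""
    return 12.0 * math.log(cost_now / target_cost) / math.log(annual_factor)


SUBSCRIPTION = 200.0
for cost_now in (2_000.0, 5_000.0):
    print(f"subsidy ratio at ${cost_now:,.0f}: {cost_now / SUBSCRIPTION:.0f}x")

# Time for a 100-fold drop (e.g. $2,000 -> $20 or $5,000 -> $50) at each assumed rate.
for annual_factor in (5.0, 10.0):
    months = months_until(2_000.0, 20.0, annual_factor)
    print(f"at {annual_factor:.0f}x per year: about {months:.0f} months")
```

At 10 times per year the drop takes two years; at 5 times per year it takes closer to three, which is the window the subsidy has to bridge.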

A second bet runs alongside the first. MoE architecture is already transforming cost structures. State Space Models address the quadratic bottleneck of traditional attention, enabling million-token context windows without quadratically growing compute costs. Test-time training (models that adapt to specific tasks in real time) is emerging as the next structural frontier. Each architectural shift brings another step-change in cost reduction. Anthropic, operating at the frontier of model research, is positioned to ride each wave.

The Risks Are Real

Rate limit interventions signal the friction point in the subsidy model. In August 2025, Anthropic introduced weekly usage caps for Claude Code heavy users after what it described as abuse by a small subset of subscribers. The distribution of usage matters critically: if the average Max subscriber consumes 20 to 40 percent of their available capacity, moderate users subsidize heavy ones and the arithmetic holds. If heavy users cluster, the per-subscriber losses become unsustainable.
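The sensitivity to the usage distribution is easy to see with a small model. The sketch below is hypothetical throughout: the subscriber mixes are invented, and the cost at full utilization is set to $1,000, the low end of the range quoted earlier, purely to keep the arithmetic simple.

```python
def blended_cost_per_subscriber(segments):
    """Average monthly compute cost across subscriber segments.

    `segments` is a list of (share_of_subscribers, utilization, full_use_cost) tuples,
    where utilization is the fraction of plan capacity a segment actually consumes.
    """
    total_share = sum(share for share, _, _ in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("subscriber shares must sum to 1")
    return sum(share * utilization * cost for share, utilization, cost in segments)


FEE = 200.0
FULL_USE_COST = 1_000.0  # assumption: low end of the article's cost-to-serve range

# Hypothetical mixes: (share of subscribers, utilization, cost at full utilization).
moderate_mix = [(0.8, 0.10, FULL_USE_COST), (0.2, 0.60, FULL_USE_COST)]
heavy_cluster = [(0.5, 0.30, FULL_USE_COST), (0.5, 0.90, FULL_USE_COST)]

for name, mix in (("moderate mix", moderate_mix), ("heavy cluster", heavy_cluster)):
    cost = blended_cost_per_subscriber(mix)
    print(f"{name}: ${cost:,.0f} per subscriber, margin ${FEE - cost:,.0f}")
```

Under these assumptions the moderate mix averages 20 percent utilization and roughly breaks even, while the clustered mix loses several hundred dollars per subscriber each month. A higher cost at full utilization pushes the break-even utilization lower still.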

The open-source threat carries genuine weight. DeepSeek disrupted the pricing narrative in early 2025 by delivering GPT-4-level performance at a fraction of incumbent costs. Stanford’s AI Index 2025 found that the performance gap between leading proprietary and open-weight models had narrowed to 1.7% on some benchmarks. If capable open models become trivially self-hostable at scale, Anthropic’s pricing power weakens considerably and the value proposition shifts entirely to the ecosystem (Claude Code integrations, Cowork workflows, institutional knowledge embedded in configuration files) rather than raw model capability.

Google’s structural advantage warrants acknowledgment. Google AI Ultra at $249.99 per month bundles YouTube Premium, 30 terabytes of storage and 25,000 AI credits, all subsidized by over $300 billion in annual advertising revenue. Google owns its TPU infrastructure, eliminating cloud provider margins entirely. Anthropic does not have that cost structure. The bet works only if the inference cost curve falls fast enough to outpace Google’s subsidy capacity.

The Long Arc

The history of transformative technology is not a history of the best technology winning. It is a history of the technology that achieved sufficient habitual adoption before the economics became obvious to everyone else.

AOL did not win the early internet because dial-up was the optimal connection method. It won because it made the internet feel like something ordinary people could actually use, at the precise moment when no one else was doing that at scale.

Microsoft did not win the productivity suite wars because Word was the superior word processor. It won because it subsidized early adoption through bundling and OEM deals until switching costs rendered competition irrelevant.

AWS did not win cloud computing because Amazon possessed the most sophisticated infrastructure. It won because it offered compute at prices that made experimentation possible, built developer habits before competitors recognized the stakes and then watched the cost curve complete the rest of the work.

Anthropic is running the same play. The $200 Max plan that costs Anthropic $1,000 to $5,000 to serve today will cost $50 to $200 to serve in 2027. The users won over now (the developers who reach for Claude Code first, the knowledge workers routing every complex problem through Claude Cowork, the CS students who have never submitted a pull request without an AI collaborator) will still be subscribers when the economics reverse.

The quarterly losses are visible. The strategy pays out in 2027 and beyond, in a currency that does not appear on balance sheets.

The $200 plan is not a product. It is a land grab.

The $200 Bet: Anthropic Is Subsidizing You Today to Own You in 2028 was originally published in Mind In The Loop on Medium.

Everyone Can Build Software Now. That is the Problem!

Riccardo Gatti — Sat, 07 Mar 2026 16:03:48 GMT

Something fundamental is shifting in how software gets built. For the first time in the industry’s history, the ability to create applications no longer sits behind years of computer science training or a specialised engineering degree. Low-code environments, no-code platforms, AI-generated code and vibe coding have arrived together, and they are reshaping who gets to participate in digital creation.

The consequences, I believe, will be simultaneously transformative and destabilising. That tension will define enterprise technology for the next decade.

The Scale of the Shift

The market data is unambiguous. Gartner forecast in 2021 that by 2025, 70% of new enterprise applications would be built on low-code or no-code platforms, up from less than 25% in 2020. The firm attributed this shift directly to the rise of citizen development and the structural inadequacy of traditional IT delivery. In a separate projection, Gartner estimated that 80% of technology products and services would be built by professionals outside IT by 2024, driven by the proliferation of low-code tooling and AI-assisted development. The firm’s most recent market forecast projects the low-code segment reaching $58.2 billion by 2029, with agentic AI and citizen development cited as the primary growth drivers.

The vibe coding wave has accelerated these dynamics sharply. Y Combinator CEO Garry Tan told CNBC in March 2025 that for roughly a quarter of the accelerator’s Winter 2025 cohort, 95% of the code was written by AI. Managing partner Jared Friedman, speaking to TechCrunch, was precise about the methodology: the figure excluded imported libraries and counted only core application code written by humans versus AI. Collins Dictionary named vibe coding its Word of the Year for 2025, a signal that the practice has moved from engineering subculture into mainstream awareness faster than any regulatory or educational framework could follow.

The Genuine Upside

The benefits of this shift are real and worth stating plainly before examining the risks.

Domain experts who previously waited months in IT backlogs can now build the tools they need directly. The same holds for a logistics manager who understands routing inefficiencies better than any external developer, or a nurse who sees gaps in patient intake daily. These professionals can now act on that knowledge without translation or delay. This is not simply a cost efficiency argument. It represents a fundamental reallocation of creative agency toward the people closest to the problems worth solving.

Vibe coding adds a further dimension for rapid experimentation. Garry Tan’s observation to CNBC captured this concisely: companies in the YC Winter 2025 batch were reaching $10 million in revenue with teams of fewer than ten people, a capital efficiency that would have been structurally impossible a decade earlier. The barrier between having a product idea and having a working prototype has collapsed.

A Mountain of Scrap in the Making

Here is what I think nobody wants to say plainly: more builders means more building, but not necessarily more good building.

When millions of non-technical users create applications without architectural oversight, security training or long-term thinking, several distinct crises begin to converge.

The first is application sprawl. MuleSoft’s 2025 Connectivity Benchmark Report, drawn from interviews with 1,050 IT leaders worldwide, found that the average enterprise runs 897 applications while successfully integrating only 29% of them. As citizen development scales, the application count climbs without a corresponding improvement in integration. Teams build tools, tools get abandoned and institutional knowledge of what exists, where data flows and who is responsible disperses with staff turnover. Deloitte cautioned directly that without governance frameworks, citizen developers treat application creation like document editing, generating significant technical debt and enterprise security risk at scale.

The research on AI-generated code quality is sobering. In an August 2025 survey of 18 CTOs, 16 reported production disasters directly caused by AI-generated code, according to Google engineering lead Addy Osmani, who characterised the broader dynamic bluntly: AI tools risk turning junior developers into prompt engineers and senior engineers into code janitors. A peer-reviewed study published on arXiv in December 2025, drawing on field observations and surveys of 99 experienced developers, confirmed what senior engineers have long understood: professionals do not vibe. They plan before implementing, validate every output and retain architectural control throughout. The incoming wave of citizen developers has not learned this discipline and, in many cases, does not know it is needed.

The open source ecosystem is already registering the strain. InfoQ reported in February 2026 that cURL maintainer Daniel Stenberg shut down the project’s six-year bug bounty programme after AI-generated submissions reached 20% of the total, with the valid-submission rate collapsing to 5%. Mitchell Hashimoto banned AI-generated code from Ghostty entirely. RedMonk analyst Kate Holterhoff described the pattern as “AI Slopageddon.” These are not isolated reactions from conservative maintainers. They are structural responses to a quality problem that scales with the number of people generating code without understanding it.

The Build-or-Buy Calculus, Rewritten

For decades, the build-or-buy decision in enterprise software followed a stable logic. Organisations purchased commodity functions and built proprietary systems where genuine differentiation existed. The practitioner shorthand captured it neatly: buy for parity, build for competitive advantage.

Democratisation is disrupting that calculus in more than one direction at once. The case for buying commodity software has strengthened considerably. Harvard Business Review has observed that organisations consistently overestimate how distinctive their requirements truly are, a cognitive bias that drives substantial unnecessary custom development. When any team can deploy a functional workflow in days using a no-code platform, the justification for building standard functions from scratch becomes increasingly difficult to sustain.

At the same time, the accessibility of building has made the attempt more tempting for the strategic layer. McKinsey found that companies building digital assets aligned with their core operations achieve 20% to 30% higher profit margins than peers. Netflix built its recommendation engine, which now accounts for 80% of its viewership. When the cost of attempting drops, the argument for protecting a genuinely differentiated process in proprietary code becomes easier to make — and easier to act on impulsively.

The more consequential shift, however, is that the binary question is becoming the wrong frame entirely. Leading CIOs now argue that the era of assembled, AI-orchestrated architectures demands a different approach: enterprises are purchasing foundation models, adopting vendor-provided domain agents, building their own workflows and connecting everything through shared governance rails. The model I find increasingly compelling is “buy the context, build the core” — an API-first architecture that purchases commodity layers aggressively and reserves custom development for the narrow band where genuine intellectual property resides.

The irony that emerges from all of this is precise: the cheaper building becomes, the more consequential the judgment about what to build. Strategic discernment, not technical access, is becoming the scarce resource.

Governance as the Differentiator

What I find most underrated in this conversation is that the organisations positioned to win the democratised development era are not those with the most tools or the fastest deployment cycles. They are those with the clearest governance posture.

McKinsey’s Global Tech Agenda for 2026, drawn from a survey of more than 600 C-level executives conducted in late 2025, found that top-performing organisations are distinguished precisely by technology leaders who are deeply involved in shaping enterprise strategy rather than managing infrastructure. Half of all CIO respondents planned to increase technology budgets by more than 4% in 2026, with the highest performers investing at more than twice that rate.

What Gartner terms a Centre of Excellence captures the governance model taking shape in practice: a structured framework that establishes guardrails without functioning as a bureaucratic barrier. In practice this requires a clear taxonomy distinguishing experimental applications (disposable by design), departmental tools (maintained with oversight) and enterprise-critical systems (fully governed). It requires mandatory security review before any citizen-built or AI-generated application reaches production, and lifecycle policies that assign each application an owner, a review date and a retirement process, so that ungoverned software gets decommissioned rather than left to fail quietly under load.
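The taxonomy and lifecycle policy described above can be expressed as a small data model. The sketch below is one hypothetical way to encode it, not a Gartner specification: `Tier`, `AppRecord` and `production_blockers` are names invented for illustration, and the rule that experimental applications must be reclassified before promotion is an assumption layered on the phrase "disposable by design".

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Tier(Enum):
    EXPERIMENTAL = "experimental"                # disposable by design
    DEPARTMENTAL = "departmental"                # maintained with oversight
    ENTERPRISE_CRITICAL = "enterprise-critical"  # fully governed


@dataclass
class AppRecord:
    name: str
    tier: Tier
    owner: Optional[str] = None
    review_date: Optional[date] = None
    retirement_process: Optional[str] = None
    security_reviewed: bool = False


def production_blockers(app: AppRecord) -> list[str]:
    """List the reasons this application may not be promoted to production."""
    blockers = []
    if app.tier is Tier.EXPERIMENTAL:
        # Assumption of this sketch: disposable apps are never promoted as-is.
        blockers.append("experimental tier: reclassify before promotion")
    if not app.security_reviewed:
        blockers.append("security review missing")
    if app.owner is None:
        blockers.append("no owner assigned")
    if app.review_date is None:
        blockers.append("no review date")
    if app.retirement_process is None:
        blockers.append("no retirement process")
    return blockers


draft = AppRecord("intake-form", Tier.DEPARTMENTAL, owner="ward-3-nursing")
print(production_blockers(draft))
# ['security review missing', 'no review date', 'no retirement process']
```

A registry of such records is also what makes decommissioning possible: an application with no owner or an expired review date can be found and retired rather than discovered when it fails.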

The Southwest Airlines operational collapse of December 2022 remains the most cited reference point in this conversation. The company’s crew scheduling system, built in the 1990s and never adequately modernised, failed under the pressure of Winter Storm Elliott, producing 16,900 flight cancellations and costing more than $800 million. The system had not been vibe-coded. But its fate illustrates precisely what accumulates when technical debt goes ungoverned until the underlying fragility becomes catastrophic. The citizen development era will produce this dynamic at a far greater frequency and a far smaller scale per incident, which makes it harder to see coming and harder to attribute clearly once it arrives.

The metaphor I keep returning to is the printing press. It democratised writing and produced an explosion of knowledge. It also flooded the world with pamphlets nobody read and misinformation that took centuries to correct. The tools of democratised development are already here. The infrastructure of responsible development is still being constructed.

Implications Across the Landscape

For business leaders, the governance investment must precede scale rather than follow it. The efficiency gains from citizen development are real but reversible. A single compliance incident from an ungoverned application can erase years of accumulated savings, and the CIOs planning significant budget increases in 2026 face a consequential allocation decision about whether governance sits inside that number or remains a future concern with asymmetric downside.

For product and technology leaders, the build-or-buy decision has become simultaneously more accessible and more dangerous. The correct frame, as CIO.com’s December 2025 analysis concluded, is no longer binary. The organisations succeeding today are assembling rather than choosing, buying what is commoditised, building what is genuinely core and connecting both through governance rails that allow either component to be replaced without catastrophic disruption.

For professional developers, the shift represents an elevation rather than a displacement. The arXiv study of 99 experienced engineers found them embracing AI agents as productivity amplifiers while retaining rigorous control over design and validation. Architectural oversight and governance expertise are the skills that prevent the code janitor outcome, and both are in short supply relative to the demand that democratised development is creating.

For educators and policymakers, vibe coding’s status as Collins’ Word of the Year for 2025 signals a cultural adoption rate that has already outpaced any corresponding educational or regulatory response. The curricula now needed are those teaching not only how to build software but when building is the right choice, how to sustain what gets built and what responsible digital creation looks like when practised at the scale of millions of concurrent non-professional practitioners.

What Comes Next

The organisations that emerge from this transition with durable advantage will not be those who distributed the tools most widely or built the most applications. They will be those who developed the strategic discipline to distinguish what merits building from what should be bought, and the governance discipline to ensure that what does get built does not eventually bury them.

More builders, more software, more possibility. More scrap. The ratio between the last two terms is the variable that matters, and it will be determined not by the tools available but by the judgment and governance surrounding their use.

Everyone Can Build Software Now. That is the Problem! was originally published in Mind In The Loop on Medium.

The Last Abstraction Layer Humans Will Build

Riccardo Gatti — Thu, 05 Mar 2026 08:36:44 GMT

When Machines Build Their Own Abstractions

When humans built C, we thought we were applying insight. Looking back, we were running statistical analysis on assembly patterns, compressing common sub-sequences and learning by reinforcement from usage. Machines now do this explicitly. Was human abstraction-building ever non-computational?

Software engineering has followed a consistent pattern for decades. Humans built abstractions on top of abstractions. Assembly language provided symbolic mnemonics over machine code. C introduced structured programming. C++ added object-oriented constructs. Java automated memory management. Python prioritized expressiveness over execution speed. Each layer traded control for cognitive efficiency. Engineers moved from managing individual CPU registers to declaring high-level intentions. The progression seemed uniquely human, a product of conscious design.

The pattern has now repeated without human architects. Machines build their own abstraction layers as a deployed reality rather than a theoretical possibility. Language models automatically generate Domain-Specific Languages, according to research published at the 2024 Association for Computational Linguistics conference. Compiler systems optimize code through learned patterns rather than human-written rules. Agent frameworks modify their own improvement mechanisms through recursive iteration.

I have spent much of my career building abstraction layers. The goal was always the same: create reusable components that could be shared across different products, enable new use cases by changing configuration rather than rewriting code, and speed up development by letting teams compose solutions from existing building blocks rather than starting from scratch each time. What I’m watching now is machines doing this same work, but at a scale and speed I never could. They are recognizing patterns across codebases, generating the abstractions, and optimizing them automatically. The process I performed manually over years, they are compressing into hours or days. This isn’t just automation of coding. It’s automation of the architectural thinking that used to require deep domain experience. That recognition changes how we need to think about the work ahead.

The Computational Foundation

Abstraction involves four computational operations:

Pattern recognition identifies recurring structures across problem domains
Generalization creates parameterized representations capturing common elements
Encapsulation hides implementation details behind simpler interfaces
Composition builds complex behaviors from reusable components

When humans created C from assembly, the process followed these steps methodically. Engineers recognized that certain instruction sequences appeared repeatedly in their code. They generalized these patterns into higher-level constructs like loops and function calls. They encapsulated complexity behind cleaner syntax. They enabled composition of these constructs into larger programs. The question becomes whether these operations depend fundamentally on human insight or represent algorithmic processes that machines can perform independently. Recent developments in code generation suggest the latter.

Declarative Systems as Reverse Abstraction

Modern software development reveals a shift from imperative programming toward declarative configuration. Infrastructure as Code exemplifies the transformation. When engineers write Terraform configurations declaring desired system states rather than procedural steps, they specify outcomes rather than implementations. The tool generates actual API calls, error handling, state management and retry logic automatically.

The pattern holds across domains. Kubernetes declares desired system states while the control plane generates operational steps. SQL describes what data to retrieve while the query optimizer generates execution plans. Machine learning frameworks define model architectures while the framework generates training loops and optimization routines. This represents abstraction-building in reverse. Instead of humans abstracting away complexity, machines interpret high-level specifications and generate lower-level implementations. The traditional direction has inverted.
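A toy reconciliation function makes the inversion concrete. The sketch below is not how Terraform or the Kubernetes control plane is implemented; `reconcile` and the service names are hypothetical, and the point is only that the human supplies a desired state while the imperative steps are derived by the machine.

```python
def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Derive the imperative steps that move `actual` toward `desired`.

    Keys are service names, values are replica counts.
    """
    steps = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            steps.append(f"start {want - have} x {name}")
        elif have > want:
            steps.append(f"stop {have - want} x {name}")
    # Anything running that was never declared gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        steps.append(f"stop {actual[name]} x {name}")
    return steps


desired = {"api": 3, "worker": 2}              # what the engineer declares
actual = {"api": 1, "worker": 2, "legacy": 1}  # what is currently running
print(reconcile(desired, actual))
# ['start 2 x api', 'stop 1 x legacy']
```

The declaration contains no ordering, no error handling and no mention of the legacy service. Those details live entirely in the generated plan.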

Automatic Language Design

Research published in 2024 demonstrates that language models can automatically design Domain-Specific Languages. Yu-Zhe Shi and colleagues introduced the AutoDSL framework at the Association for Computational Linguistics conference. The system takes experimental protocols in specific domains and automatically generates syntactic constraints (the grammar and structure), semantic constraints (the meaning and valid operations) and optimization rules (efficient processing methods). This replicates precisely what human language designers do. The difference lies in mechanism. Machines achieve the result through statistical pattern recognition and optimization rather than conscious insight.

Meta’s LLM Compiler represents another threshold. Trained on 546 billion tokens of compiler intermediate representations, the system understands code at multiple abstraction levels simultaneously. It suggests optimizations improving performance and generates equivalent code at different abstraction levels. The system achieves meaningful optimization results (77% of autotuning search potential, according to Meta’s 2024 paper) not through human-programmed rules but through learned patterns of efficient code.

The theoretical literature on recursive self-improvement examines whether systems can modify their own improvement mechanisms. Recent frameworks demonstrate this capability exists. The STOP Framework (Self-Taught Optimizer) uses a scaffolding program that employs a fixed language model to recursively improve its own optimization strategies. Each iteration improves not just performance but the improvement process itself.

Self-evolving agent systems maintain multiple components. A policy determines how to act. A meta-policy governs how to improve the policy. An evaluation function assesses improvements. The system can modify any component based on performance feedback. This mirrors how human programmers develop not just better code but better programming methodologies over time.
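The three components can be illustrated with a deliberately small numerical toy. This is not the STOP framework or any published agent system: the objective, the function names and the step-size rule are all assumptions, and only one parameter of the policy is adapted, whereas the systems described above can modify any component.

```python
def evaluate(x: float) -> float:
    """Evaluation function: higher is better, with the peak at x = 3."""
    return -(x - 3.0) ** 2


def policy(x: float, step: float) -> float:
    """Policy: try a move of size `step` in each direction, keep the best point."""
    return max((x - step, x, x + step), key=evaluate)


def meta_policy(step: float, improved: bool) -> float:
    """Meta-policy: adjust the policy's own parameter based on feedback."""
    return step * 1.5 if improved else step * 0.5


def run(x: float = 0.0, step: float = 1.0, rounds: int = 30) -> float:
    for _ in range(rounds):
        new_x = policy(x, step)
        improved = evaluate(new_x) > evaluate(x)
        step = meta_policy(step, improved)  # improve how we improve
        x = new_x
    return x


print(abs(run() - 3.0) < 0.1)
# True
```

The fixed `evaluate` function is the part nothing in the loop can question, which is where the difficulty lies once the objective is "a better abstraction" rather than a number.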

The Algorithmic Nature of Abstraction

The computational theory question concerns whether abstraction-building is fundamentally algorithmic. Consider the process humans used to create C from assembly language:

Observe assembly code patterns (statistical analysis)
Identify common sub-sequences (pattern matching)
Create higher-level constructs that map to these patterns (compression)
Test whether these constructs are useful (optimization)
Iterate based on usage patterns (reinforcement learning)

Each step has a computational analog. The human advantage stemmed not from performing non-algorithmic operations but from having a huge training corpus (years of programming experience), possessing good heuristics (developed through trial and error), and being able to evaluate utility (knowing what makes code “better”). Modern AI systems now possess massive training corpora. They develop heuristics through training processes. The remaining challenge involves evaluation functions. Determining what makes one abstraction superior to another remains complex.
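The first four steps in the list above fit in a few lines. The sketch below is an illustration under stated assumptions, not a reconstruction of how C was designed: the instruction strings are made-up pseudo-assembly, `LOOP_STEP` is a hypothetical name for the discovered construct, and usefulness is measured crudely as program length.

```python
from collections import Counter


def most_common_ngram(program: list[str], n: int) -> tuple[tuple[str, ...], int]:
    """Steps 1-2: count every run of n instructions and return the most frequent."""
    counts = Counter(tuple(program[i:i + n]) for i in range(len(program) - n + 1))
    return counts.most_common(1)[0]


def abstract(program: list[str], pattern: tuple[str, ...], name: str) -> list[str]:
    """Step 3: replace each occurrence of the pattern with one named construct."""
    out, i, n = [], 0, len(pattern)
    while i < len(program):
        if tuple(program[i:i + n]) == pattern:
            out.append(name)
            i += n
        else:
            out.append(program[i])
            i += 1
    return out


program = [
    "mov r1, 10", "dec r1", "cmp r1, 0", "jnz top",
    "mov r2, 5", "dec r1", "cmp r1, 0", "jnz top",
    "ret",
]

pattern, count = most_common_ngram(program, 3)
compact = abstract(program, pattern, "LOOP_STEP")

print(compact)
# ['mov r1, 10', 'LOOP_STEP', 'mov r2, 5', 'LOOP_STEP', 'ret']
print(len(program), "->", len(compact))  # Step 4: did the construct pay for itself?
# 9 -> 5
```

Step 5 would rerun the same procedure on `compact` and on new programs. What the sketch cannot do is decide whether `LOOP_STEP` is a good abstraction in any sense other than shorter, which is the evaluation gap the paragraph above points to.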

Intent Versus Pattern Recognition

A distinction emerges between human and machine approaches. When humans create abstractions, they often do so with intent. They want to solve specific problems, enable certain capabilities and prevent certain errors.

When machines generate abstractions, they optimize statistical patterns. They learn that certain symbol configurations lead to reward signals. The difference may be less fundamental than it appears. When human programmers learn that “this pattern causes bugs,” they respond to negative reward signals. When they learn “this pattern improves maintainability,” they optimize for a learned metric. Intent becomes an emergent property of the optimization process rather than a separate ingredient.

Configuration no longer merely replaces code. It has become code itself. When engineers specify desired outcomes in natural language, machines generate the configuration that generates the implementation. We are programming in meta-languages without explicit recognition of the shift.

If abstraction is fundamentally about compressing patterns inherent to problem domains, machines should converge on solutions similar to those humans found. But what if compression optimizes for statistical regularities that don’t align with human cognitive architecture? Would we recognize those abstractions as “correct”?

The Progression of Meta-Languages

A phase transition has occurred in how systems are constructed. The traditional model involved humans writing code for machines to execute. The configuration model has humans specifying goals for machines to implement. The emerging meta-configuration model has humans specifying domains while machines generate both the configuration language and the implementation. This third layer defines where self-generating systems operate. Machines no longer just execute or implement. They decide how to represent problems themselves.

Theoretical Implications and Limits

If machines can build abstraction layers, several theoretical questions emerge. The halting problem for abstraction generation asks whether a general algorithm can determine the optimal abstraction for a domain. This likely remains undecidable, analogous to the halting problem itself, not least because no universal definition of “optimal abstraction” exists.

The completeness question examines whether machine-generated abstraction layers can be as expressive as human-designed versions. Gödel’s incompleteness theorems suggest any formal system has limitations. Human-designed languages face the same constraints. Engineers constantly invent new languages when existing ones prove insufficient.

The verification challenge asks how we verify that self-generated abstraction layers preserve intended semantics. This represents the classic specification problem. Formal expression of desired properties must precede verification of their achievement.

The convergence hypothesis proposes that machines will converge on abstraction layers similar to those humans would create. If abstraction fundamentally compresses patterns inherent to problem domains rather than patterns specific to human cognition, convergence appears likely. The patterns exist in the domain itself, not in our perception of it.

Five Distinct Futures

Given that machines can and do build abstraction layers, the critical question shifts from capability to trajectory. The process leads down at least five distinct paths. I find myself oscillating between these scenarios depending on which recent development I’m considering.

The Convergence Path assumes abstraction-building fundamentally compresses patterns inherent to problem domains. Machines will converge on similar solutions humans would have created. This suggests abstraction is discovered rather than invented. Natural ways to represent computation exist. Both humans and machines will find them. This future validates the human approach as universally applicable. The abstractions engineers built were not arbitrary cultural artifacts but optimal compressions of computational reality.

The Divergence Path proposes machines might discover abstraction paradigms entirely different from human approaches. Ways of organizing computation natural for learned statistical patterns may prove alien to human cognition. These abstractions might be more efficient but incomprehensible. This future suggests the human approach was one path among many, optimized for human cognitive architecture rather than computational efficiency.

The Hybrid Path, most likely in practice, predicts both convergence and divergence. Machines will rediscover some human abstractions because they represent genuinely optimal solutions. They will invent others humans never conceived because they require search through spaces too large for human exploration. This future treats human abstraction-building as a good heuristic rather than the universal solution.

The Fundamental Limits Path suggests self-building systems will hit theoretical walls. Levels of abstraction may exist where Gödelian incompleteness, computational complexity or verification impossibility prevent further ascent. The tower of abstractions has a maximum height. We may be approaching it. This future proposes only so many useful layers exist before diminishing returns or fundamental barriers stop progress.

The Unbounded Recursion Path proposes the self-referential loop continues indefinitely. Each abstraction layer enables building systems that build the next layer with no theoretical limit. Intelligence explosion scenarios occupy this space. Improving the improvement mechanism accelerates without bound. This future remains most contentious, raising questions about control, alignment and whether such recursion remains stable or spirals into instability.

The Abstraction Ladder and Configuration Endgame

The abstraction ladder now extends through distinct levels:

Layer 0: Machine instructions
Layer 1: Assembly language (human-readable machine code)
Layer 2: High-level languages (C, Python)
Layer 3: Configuration languages (YAML, Terraform)
Layer 4: Natural language specifications
Layer 5: ???

Each layer up requires specifying less “how” and more “what.” At the limit, engineers specify only intent while the entire implementation stack gets generated automatically. The systems generating these layers are themselves configured rather than programmed. The industry already operates at the natural language specification layer. Engineers describe desired outcomes in plain language. Machines generate the configuration that generates the code.

The next layer may involve systems inferring intent from context, history and implicit goals without explicit specification. Alternatively, the distinction between specification and implementation may collapse entirely.

The Computational Reality

From a computational theory standpoint, abstraction-building involves pattern recognition over problem domains, compression of recurring structures, optimization for utility metrics and iteration based on feedback. These operations are computationally tractable. The human approach applies to machines because the human approach was computational. Recognition of this fact came late.

The difference lies in degree rather than kind. Machines access larger pattern corpora. Humans currently maintain better evaluation heuristics. Machines iterate faster. Humans generalize across domains more effectively for now. These represent quantitative rather than qualitative differences.

Abstraction isn’t mystical. It involves compression, optimization and generalization. Machines excel at precisely these operations. Configuration is no longer merely more important than code. Configuration has become the code itself. It serves as the meta-language specifying what abstractions machines should build.

The theoretical groundwork stretches back to Jürgen Schmidhuber’s 2009 work on Gödel machines and self-referential systems. Recent practical demonstrations accelerated the timeline. Google DeepMind’s AlphaEvolve, unveiled in May 2025, uses LLMs to design and optimize algorithms, potentially optimizing components of itself. The ICLR 2026 Workshop on AI with Recursive Self-Improvement, scheduled for later this year, brings together researchers examining how learning systems rewrite their own update mechanisms. The field has compressed decades of theoretical work into months of empirical progress.

The Evaluation Function Problem

Multiple futures branch from this point. Convergence, divergence, hybrids, fundamental limits or unbounded recursion remain distinct possibilities. Having spent years building abstraction layers manually, I watch for specific indicators that reveal which path we’re entering.

The evaluation function problem stands above all other considerations. Systems currently build abstractions through pattern recognition and optimization. They cannot assess whether those abstractions serve purposes beyond their training objectives. The capability to judge quality without external validation remains absent.

This distinction determines the boundary between bounded and unbounded development. Solve the evaluation problem and recursive improvement becomes possible. Systems would iterate not just on implementations but on the abstractions themselves. The improvement mechanism improves itself without human checkpoints. Each generation of abstractions could be assessed and refined by the system that generated them.

Fail to solve it and a natural ceiling emerges. Systems generate abstraction layers but require human judgment to select among them. Progress continues but remains bounded by human evaluation capacity. The bottleneck shifts from generation speed to assessment quality.

The abstractions I spent years building manually provide a reference point. When systems generate patterns that converge with approaches I developed through experience, that suggests we’re finding universal compressions inherent to problem domains. When they diverge into structures I cannot immediately comprehend but that demonstrably function, that signals we’ve crossed into territory where human architectural intuition no longer guides development. The moment a system generates an abstraction I cannot follow but that proves superior on metrics beyond speed (elegance, maintainability, extensibility) marks the transition into alien territory.

Technical details, market adoption rates and deployment statistics amount to noise compared to the evaluation function question. Either systems develop reliable self-assessment or they remain dependent on human judgment. That binary determines which of the five futures unfolds.

The signals exist now. Systems already generate abstractions faster than humans can. The gap between generation and evaluation grows wider. The transition has already begun. Recognition lags behind reality.

The Last Abstraction Layer Humans Will Build was originally published in Mind In The Loop on Medium.

The Ghost in the Machine: From Dumas to AI Agents and the Future of Writing

Riccardo Gatti — Mon, 02 Mar 2026 18:46:00 GMT

The Evolution of Authorship

When James Patterson releases another thriller or when a celebrity publishes a tell-all memoir, few readers pause to consider who actually put those words on the page. The reality is that many of the books lining our shelves were not written solely by the person whose name appears on the cover. This practice, known as ghostwriting, has been part of literature for centuries and raises fundamental questions about authorship, creativity, and what it means to write. Now, with the emergence of large language models capable of generating human-like text, these questions have taken on new urgency and complexity.

The history of ghostwriting is rich with surprising names and stories. Alexandre Dumas, the celebrated French author of The Three Musketeers and The Count of Monte Cristo, operated what critics called a fiction factory in nineteenth-century Paris. His most important collaborator was Auguste Maquet, who contributed significantly to the plots and drafts of some of Dumas’s most famous works. Though Dumas maintained tight editorial control and added his distinctive flair, Maquet’s role was substantial enough that he eventually sued for recognition and royalties. The courts sided with Maquet in part, acknowledging his contributions while still crediting Dumas as the primary author.

In the modern publishing world, James Patterson has become perhaps the most prolific and transparent example of author collaboration. He works with dozens of co-authors, typically creating detailed outlines and then reviewing and revising the drafts his collaborators produce. Patterson is remarkably open about this process, viewing it more as a production system than a secret to be kept. His name alone can sell millions of copies, and his co-authors benefit from the exposure and experience. The arrangement works because Patterson brings the vision, the brand, and the editorial oversight, while his collaborators bring the daily work of transforming outlines into prose.

Tom Clancy followed a similar path later in his career, particularly with his techno-thrillers. Books bearing his name but written with co-authors became commonplace, and after his death in 2013 the Clancy brand has continued with other writers producing novels in his established universe. The same pattern appears with V.C. Andrews, the gothic novelist whose estate has published far more books under her name after her death in 1986 than she wrote while alive. Ghostwriter Andrew Neiderman has been the actual author behind the V.C. Andrews name for decades, maintaining her distinctive style and themes while creating new stories for her devoted readership.

Celebrity memoirs represent another domain where ghostwriting is standard practice, though often hidden behind the phrase “written with” or “as told to” on the cover. Donald Trump’s The Art of the Deal was largely written by journalist Tony Schwartz, who spent eighteen months shadowing Trump and then crafted the narrative. Schwartz has since spoken publicly about his role, explaining how he created the voice and structure that made the book a bestseller. Similar arrangements exist for countless political memoirs, business books, and celebrity autobiographies, where the famous person provides the experiences and approval while a professional writer shapes the material into readable prose.

The Algorithmic Pen

The arrival of large language models has introduced a new player into these questions of authorship. Unlike human ghostwriters who bring their own creativity and judgment to the work, these AI systems generate text based on patterns learned from vast amounts of existing writing. When someone uses a tool like Claude or Gemini to help draft an article or develop a story, whether as a simple text generator or as an autonomous agent using tools and taking actions, they are engaging with something fundamentally different from hiring a human collaborator, yet also different from using a simple spell-checker or grammar tool.

The differences between human and AI writing styles become apparent when examined closely. Human writers bring lived experience, cultural context, and genuine emotional understanding to their work. A human ghostwriter can interview a subject for hours, absorb their manner of speaking, understand their values and contradictions, and then produce prose that captures something essential about that person. The ghostwriter makes countless small decisions about tone, emphasis, structure, and detail that reflect not just technical skill but human judgment about what matters and why.

Large language models, by contrast, operate through pattern recognition and statistical prediction regardless of whether they function as simple text generators or as sophisticated agents. They excel at producing grammatically correct, contextually appropriate text that follows conventional structures and uses familiar phrases. What they lack is any genuine understanding of meaning or any authentic experience to draw upon. An LLM can describe heartbreak in technically proficient language, but it has never felt the weight of loss. It can explain scientific concepts clearly because it has processed millions of explanations, but it cannot have the sudden insight that leads to a new way of understanding a problem. This fundamental limitation persists even when AI operates as an agent, iterating and refining its work, researching information, and adapting to specific requirements.

AI agents can reduce some of the generic quality issues through iterative refinement and tool use. When an AI can research, draft, revise, check facts, and adapt based on specific requirements, the output becomes more customized and less obvious than simple one-shot text generation. However, this represents a more sophisticated application of the same underlying capabilities rather than a fundamental change in nature. The agent is still assembling text based on learned patterns, just through a more complex process. Even when an AI agent produces something that appears creative or insightful, it is fundamentally synthesizing and recombining patterns from its training data. It can spot patterns humans might miss by processing vast amounts of information, but it does not understand those patterns in any meaningful sense.

This distinction manifests in subtle but important ways. Human writing tends to have irregularity, personal quirks, unexpected word choices, and structural variations that reflect individual thinking patterns. AI-generated text, especially when produced quickly without extensive prompting or revision, often exhibits a kind of smooth competence that can feel generic. The language flows well and makes sense, but it rarely surprises or challenges the reader in the way that distinctive human voices do. The metaphors tend toward the familiar, the sentence structures favor the conventional, and the overall effect can be one of capable blandness.

The Standardization Question

For casual users who turn to AI writing assistance without extensive customization or careful prompting, the risk of standardization becomes particularly acute. When thousands of people use the same tool with similar prompts to write business emails, blog posts, or social media content, a certain homogenization of style becomes almost inevitable. The AI has been trained on common patterns and will naturally reproduce those patterns unless specifically directed otherwise. A business email drafted by an AI will likely have a professional but generic tone, hitting familiar beats and using standard phrases that millions of other AI-drafted emails also employ.

This standardization extends beyond just style to affect structure and even thinking. If an AI is asked to write an article about climate change, it will likely organize the information in predictable ways because those are the patterns it has learned from existing climate articles: introduction establishing the problem, explanation of causes, discussion of impacts, consideration of solutions, conclusion with a call for action. There is nothing wrong with this structure, but when it becomes the default for vast amounts of content, we risk losing the diversity of approaches that comes from different human minds tackling the same topic.

The issue becomes more serious when we consider how people learn to write by reading. If the next generation of writers grows up reading content that is increasingly AI-generated and therefore increasingly standardized in style and structure, what models will they internalize? Will they learn that writing means conforming to these smooth, competent but ultimately generic patterns? The feedback loop is genuinely concerning: AI systems trained on human writing produce standardized output, which humans then read and potentially emulate, which then becomes part of the training data for the next generation of AI systems.

Yet this concern must be balanced against a more optimistic perspective. Writing has always involved learning and applying conventions. Students study the five-paragraph essay not because that structure is sacred but because it teaches them how to organize thoughts coherently. Business writing follows templates because consistency and clarity serve important purposes in professional communication. The presence of standards and common patterns is not inherently bad. The question is whether we maintain enough diversity and creativity alongside those standards.

The Standardized Future

Imagining a world where knowledge expression becomes more standardized while ideas continue to flow freely requires us to distinguish between the packaging of ideas and the ideas themselves. Scientific papers already follow highly standardized formats. Researchers must conform to strict conventions about structure, citations, and terminology, yet this standardization has not prevented the flourishing of novel ideas and discoveries. Indeed, the standardization serves a purpose. A biologist can quickly scan a paper’s methods section because it follows a predictable format, spending their cognitive energy on evaluating the actual research rather than decoding an idiosyncratic presentation.

The advantages of such standardization are real and should not be dismissed. When business reports follow similar structures, executives can find key information quickly. When news articles conform to the inverted pyramid style, readers can grasp the essential facts in the opening paragraphs. Standardization reduces cognitive load and improves efficiency in contexts where communication serves primarily instrumental purposes.

For many writers, AI assistance could free them from the mechanical aspects of writing that they find burdensome, allowing them to focus on what they really want to say. Someone with brilliant insights but poor grammar skills might use AI to polish their prose. A non-native English speaker might use it to express complex ideas that they struggle to articulate in their second language. In these cases, standardization in the service of clear communication seems more like a feature than a bug.

The dangers, however, cannot be ignored. If standardization becomes too pervasive, we risk losing the distinctive voices and unconventional approaches that often lead to breakthrough thinking. The most innovative ideas frequently come packaged in surprising ways precisely because the thinker approached the problem from an unusual angle. James Joyce’s stream-of-consciousness technique in Ulysses was not just stylistic flourish but integral to his exploration of human consciousness. Virginia Woolf’s experimental structures reflected and enabled her insights about time and subjectivity. These achievements required writers willing to break conventions, not conform to them.

There is also the question of cultural homogenization. Language reflects culture, and different cultures have different rhetorical traditions, different ways of structuring arguments, different relationships between writer and reader. If AI systems trained predominantly on English-language internet content begin to shape how people around the world write, we could see a flattening of these cultural differences. A Japanese business email might start to sound more like an American one not because the Japanese writer chose that style but because the AI suggested it as the most natural way to write. The loss would be subtle but real: a gradual erosion of the diversity of human expression.

Perhaps the most profound concern is what happens to human creativity when AI becomes the default tool for putting thoughts into words. Writing is not just a way to record pre-existing ideas but a way to develop and refine those ideas. The act of struggling to find the right words, of revising and rethinking, is central to how many people think. If AI handles too much of this process, we might lose something essential about how human minds develop and express original thought.

Finding Balance

The comparison to ghostwriting offers some guidance for navigating these challenges. Ghostwriting has existed for centuries without destroying literature or eroding the value of authorship. It persists because it serves genuine needs: helping busy executives share their knowledge, enabling celebrities to tell their stories, allowing prolific authors to produce more books. The key is disclosure and appropriate use.

AI writing assistance might follow similar norms. For routine communications, reports, and other functional writing, AI assistance could become as unremarkable as using spell-check. For creative works, academic papers, and other contexts where original thinking and distinctive voice matter, the use of AI might require more careful consideration and possibly disclosure. The key is to preserve the connection between human intention and written expression, ensuring that AI serves as a tool for human communication rather than a replacement for human thought.

Education will play a crucial role in this future. If students learn to use AI as a genuine writing assistant rather than a shortcut around the difficult work of learning to write, the technology could be enormously valuable. Teaching people to craft effective prompts, to critically evaluate AI output, to revise and personalize AI-generated drafts, and to recognize when they should write from scratch rather than relying on AI tools would help ensure that the technology enhances rather than diminishes human capability.

The issue is not the technology itself but how we choose to use it. A world where AI assists with the mechanics of writing while humans focus on original thinking and authentic expression could be enriching. A world where AI standardizes thought as well as style (where the hard work of finding one’s own voice is abandoned for the ease of algorithmic generation) would be impoverished. The choice between these futures is not yet determined, and the decisions we make now about how to integrate these powerful tools into our writing practices will shape which world we create.

The Ghost in the Machine: From Dumas to AI Agents and the Future of Writing was originally published in Mind In The Loop on Medium.

How Modern AI Tools Could Transform Your Impostor Syndrome Journey

Riccardo Gatti — Fri, 27 Feb 2026 16:53:04 GMT

Eighty-two percent of tech professionals experience impostor syndrome. When I joined AWS five years ago, I was one of them along with most of my cohort. The company recognized this clearly enough to make addressing it a mandatory part of onboarding. At the time, that acknowledgment felt meaningful. Looking back through the lens of today’s AI capabilities, the realization lands differently. We fought that battle with one hand tied behind our backs.

The Numbers Behind the Self-Doubt

Impostor syndrome extends far beyond tech industry folklore. A 2025 meta-analysis published in BMC Psychology examining 11,483 individuals across 30 studies established a global prevalence rate of 62%, with a confidence interval ranging from 52.6% to 70.6%. The syndrome, which psychologists Pauline Rose Clance and Suzanne Imes first described in their 1978 paper, manifests as a persistent failure to internalize accomplishments despite objective success.

The research reveals particular vulnerability in specific populations. Women score higher than men across 115 effect sizes spanning more than 40,000 participants, with a mean effect size of 0.27 according to a 2024 meta-analysis in Current Psychology. Ethnic minorities face especially elevated rates. One systematic review published in the Journal of General Internal Medicine in 2020 found that impostor feelings served as stronger predictors of impaired mental health than the stress of minority status itself, a finding that challenges how researchers typically approach ethnic minority psychological health.

Tech workers experience impostor syndrome at particularly acute levels. A 2021 survey of the PreSales Collective, a community of 11,000 technology professionals, found that 82% reported experiencing impostor syndrome within the previous twelve months. Healthcare professionals face similar pressures. Studies indicate that between 22% and 60% of physicians suffer from the phenomenon, with one 2022 study in Mayo Clinic Proceedings documenting these experiences among more than 3,000 surveyed physicians.

The syndrome operates through a characteristic cognitive pattern. Individuals attribute successes to external factors such as luck, timing or help from others. Setbacks, meanwhile, become confirmation of inadequacy. The pattern creates a trap that thrives in uncertainty and information gaps.

Recognition Without Resolution

AWS’s decision to address impostor syndrome during onboarding demonstrated institutional awareness. The company understood something fundamental: bringing high-performers into an environment populated with brilliant colleagues creates optimal conditions for self-doubt. The intensity of AWS’s “Learn and Be Curious” culture, while inspiring in principle, amplifies feelings of never knowing enough in practice.

The onboarding experience demanded deep dives across an impossibly broad technical landscape. Security, networking, software development, database architecture, machine learning, storage systems, compute infrastructure: the list stretched endlessly. Even arriving as an expert in several domains meant confronting dozens more requiring immediate competency. Customers and colleagues asked questions spanning this entire spectrum. Each inquiry carried the weight of potential disappointment, the fear of revealing gaps in knowledge that perhaps shouldn’t exist.

Technology’s velocity compounded the challenge. Documentation aged rapidly, rendered obsolete by service updates that arrived with relentless frequency. The daily avalanche of emails (tens of them announcing new features, architectural patterns, best practices) exposed the impossible task of staying current. FOMO developed quickly, that gnawing sense that somewhere in those unread messages lurked the critical information tomorrow’s customer conversation would demand.

Most critically, the network hadn’t yet formed. AWS operates fundamentally on personal connections: knowing precisely whom to ask when doubt emerges smooths work dramatically. But those relationships require time to develop. The first weeks passed in isolation punctuated by meetings with strangers whose expertise seemed boundless. Each knowledge gap became a potential trigger for impostor thoughts. Should this already be known? Will asking this question expose inadequacy? Everyone else seems to grasp this effortlessly.

The Core Problem: Information Access

Impostor syndrome occupies the space between what someone knows and what they believe they should know. The anxiety manifests through several mechanisms. First comes the uncertainty loop: not knowing something, feeling afraid to ask, spending hours attempting solitary comprehension, falling behind, experiencing intensified inadequacy. The comparison trap follows closely: observing colleagues who appear confident, assuming they possess comprehensive knowledge, failing to recognize their parallel learning journeys, feeling uniquely incompetent. Finally arrives the validation void: completing tasks while uncertain about their adequacy, receiving no immediate feedback, watching self-doubt grow, developing hesitancy to share work.

Traditional solutions carried inherent limitations despite their value. Mentors operated within finite availability and might not cover the specific domain where questions emerged. Documentation aged rapidly: a guide written six months prior might reference deprecated features or miss recent architectural patterns. Peers harbored their own impostor syndrome, creating mutual reluctance to expose vulnerability through questions that might reveal gaps. These constraints defined professional learning environments until recently.

The AI Intervention

Modern large language models and AI agents represent a genuine paradigm shift, not by replacing human support but by filling critical gaps in the impostor syndrome cycle.

The technology offers immediate, judgment-free information access. LLMs provide instant answers to questions that might trigger embarrassment when posed to colleagues. The conversational nature of these systems removes fear of appearing incompetent, eliminates waiting for availability, and dispenses with concerns about bothering others. Research on AI-assisted learning found that professionals using AI tools for skill development reported finding the experience both “fun” (44% of respondents) and “confidence-boosting” (35%), according to a 2023 study covered by Agility PR Solutions.

Unlike static documentation, AI tools adapt explanations to knowledge levels, generate analogies and offer clarifying follow-ups. Questions can continue until genuine understanding develops, freed from time pressure or social anxiety. Amazon’s implementation of tiered AI education programs, which categorizes employees into beginner, intermediate and advanced levels, demonstrated an 83% improvement in skill retention alongside a 27% reduction in impostor syndrome symptoms, as documented in a March 2025 article in HR Future.

The validation function addresses one of impostor syndrome’s most anxiety-inducing aspects: uncertainty about whether work meets quality standards. AI tools serve as first-pass reviewers, checking analyses, suggesting presentation improvements, helping verify understanding before sharing work with colleagues. When joining new teams or projects, AI agents rapidly synthesize relevant background information, reducing the “drinking from a firehose” sensation that exacerbates impostor thoughts during onboarding.

Knowledge that others rely on AI tools for learning creates beneficial cultural shifts. The practice normalizes incomplete knowledge and reframes intelligence from accumulated facts toward effective information finding and application. Google’s AI mentorship program, which paired employees for group-based rather than hierarchical AI training, reported a 38% increase in problem-solving confidence and a 72% boost in overall AI usage confidence as of October 2024, according to HR Future.

Had these tools existed during those first AWS weeks, the trajectory would have shifted dramatically. Instead of spending hours parsing obsolete documentation about VPC configurations or Lambda execution models, an LLM could have provided current, plain-language explanations tailored to existing knowledge. That customer question about cross-region replication that arrived via email at 9 PM could have been pressure-tested against an AI system before formulating a response. The network gap that made every question feel like an imposition would have mattered less. A conversational AI doesn’t judge, doesn’t get annoyed, doesn’t subtly communicate that the question reveals inadequacy. It answers, clarifies and invites follow-ups until genuine understanding develops. Those first days wouldn’t have eliminated impostor syndrome because the psychological pattern runs deeper than information access, but the intensity would have diminished substantially. More importantly, the cognitive bandwidth consumed by constant anxiety about knowledge gaps could have been redirected toward what actually mattered: building relationships, contributing meaningfully to discussions and developing the judgment that comes from confident engagement rather than fearful silence.

The Complications

AI introduces new psychological challenges alongside its benefits. When AI makes tasks too easy, it generates what researchers now term “AI impostor syndrome”: doubt arising because success lacks traditional struggle. John Nosta, writing in Psychology Today in March 2025, characterized the phenomenon: “AI-driven impostor syndrome flips this concept so that people experience self-doubt because their success lacks the traditional struggle associated with intellectual effort.”

Attribution confusion compounds the problem. When AI contributes significantly to work products, questions emerge about actual authorship and understanding. This intensifies fraudulence feelings rather than reducing them. The technology itself creates new competency anxieties. Research covered by Agility PR Solutions found that 21% of professionals have overstated their AI knowledge specifically because they “did not want to look foolish for not knowing or needing to ask for clarification.” Separately, 41% expressed concern about falling behind professionally without AI tool mastery.

Some professionals report feeling they are “cheating” by using AI, particularly in workplaces where adoption remains non-normalized. This leads to hiding AI usage, which recreates the isolation fueling impostor syndrome in the first place. Over-dependence on AI for routine tasks threatens to atrophy critical thinking skills. As with relying on GPS for every trip, the navigation succeeds but no internal cognitive map gets built. VentureBeat cautioned in an October 2025 article: “The ease of AI assistance creates a cognitive dissonance: one where mastery and doubt coexist.”

A 2024 study presented at the Academy of Management examined workplace AI augmentation through the lens of impostor phenomenon theory. The research posited that AI augmentation can evoke impostor thoughts in employees, subsequently decreasing their knowledge sharing and interpersonal citizenship behavior. These effects appeared more pronounced among employees with higher levels of intrinsic motivation.

Establishing Equilibrium

The distinction between AI as capability enhancer versus replacement for learning defines successful implementation. Effective approaches involve using AI to accelerate learning rather than substitute for it, requesting explanations and examples instead of merely accepting answers. Boundaries matter: reserving AI for genuinely complex or time-consuming tasks while personally tackling core work maintains skill development. Validating AI outputs through verification and understanding underlying reasoning prevents shallow engagement.

Research on workplace AI implementation published in HR Future found that when employees view AI as supporting capabilities rather than exposing shortcomings, impostor syndrome decreases. Transparency about appropriate AI usage normalizes the practice when organizational culture permits. The time AI saves creates opportunities for developing judgment, creativity, and relationship-building, the skills that distinguish humans and that algorithms cannot easily replicate.

Microsoft’s AI micro-learning modules, delivering education in small digestible lessons, increased task completion speed by 47% and knowledge retention by 31% according to the HR Future report. The approach suggests that implementation methodology matters as much as the technology itself.

Forward Implications

Organizations integrating AI tools into professional environments face opportunities to reshape fundamental assumptions about competence, learning and impostor syndrome. The goal is not to eliminate impostor syndrome outright. Rather, the objective is to prevent it from becoming paralyzing while reducing suffering and maintaining motivation.

Institutional responses require explicit AI tool integration into onboarding and training programs. Creating cultures where AI-assisted learning is normalized and openly discussed becomes essential. Providing training on effective AI usage that builds skills rather than creates dependency separates successful implementations from problematic ones. Organizations must address “AI impostor syndrome” proactively, celebrating how AI enhances rather than replaces human capability.

Individual professionals face parallel imperatives. Embracing AI tools as modern professional necessities (comparable to spreadsheets or search engines) establishes appropriate baseline expectations. Strategic deployment reduces anxiety-inducing knowledge gaps without fostering over-reliance. Maintaining personal learning and critical thinking alongside AI usage preserves cognitive development. Sharing AI usage openly helps normalize practices for others navigating similar transitions.

The Unfinished Transformation

Today’s landscape offers tools extending beyond acknowledgment toward active intervention against the information gaps fueling impostor feelings. The technology exists to accelerate learning, validate understanding, and redirect energy from anxious doubt toward meaningful work. The next cohort joining AWS or any demanding technical environment need not fight impostor syndrome with artificial constraints. Modern AI tools, deployed thoughtfully, provide unprecedented capability to cultivate confidence in learning rather than fixating on comprehensive knowledge.

The fundamental shift involves recognizing that impostor syndrome never concerned knowing everything. The syndrome centers on confidence to learn anything. Modern AI tools, applied with intention and awareness of their limitations, help cultivate precisely that confidence. Whether that potential becomes realized depends entirely on how organizations and individuals navigate the transition and how they embrace capabilities while guarding against new vulnerabilities the technology introduces.

The conversation about AI and impostor syndrome continues evolving as more professionals gain experience with these tools. Early evidence suggests genuine benefits alongside legitimate concerns. Neither uncritical adoption nor reflexive rejection serves workers navigating this landscape. The measured approach appears most likely to deliver on the technology’s promise without succumbing to its pitfalls.

Here’s what impostor syndrome gets wrong: if you’ve been selected for a strong team and have foundational competence, growth follows naturally when you commit to the work. The syndrome does not reveal the truth about your potential; it distorts your perception of where you currently stand. Most people experiencing it are precisely where they are supposed to be. They simply can’t see it yet.

How Modern AI Tools Could Transform Your Impostor Syndrome Journey was originally published in Mind In The Loop on Medium, where people are continuing the conversation by highlighting and responding to this story.

Hiring in the Age of AI: Why I Have Always Hired for Problem-Solving Over Code

Riccardo Gatti — Sat, 21 Feb 2026 11:58:01 GMT

For years, my hiring approach drew skepticism from peers across the industry. While most interview panels grilled candidates on algorithms, I took a different path. While they obsessed over coding challenges, I hired brilliant engineers by barely asking them to code at all.

My Hiring Philosophy: Conversations Over Code

Across my career building tech teams, I developed what many considered a counterintuitive methodology. I minimized technical questions. I rarely asked candidates to recite syntax or solve algorithm puzzles. Those skills can be Googled in seconds or now generated by AI in milliseconds.

Instead, I focused on assessing soft skills as thoroughly as possible. Communication mattered more than code. Curiosity revealed more than credentials. Adaptability told me more about future performance than any past project list.

I let candidates talk extensively about their past work through deep project discussions. Not the surface details of what they built, but the deeper questions of why they made certain choices. I listened for the level of detail they could master. I paid attention to the challenges they articulated and I observed how they described navigating complexity.

Then came the deliberately impossible scenarios. I’d present problems completely outside their domain expertise. The scenarios were intentionally ambiguous and unsolvable with their current knowledge. The exercise revealed everything I needed to know:

How do they ask clarifying questions?
Can they break a massive problem into manageable pieces?
Do they embrace feedback or become defensive?
How do they handle uncertainty?

If candidates wanted to show code samples, fine. Optional code reviews had their place. But their thinking process mattered far more than their syntax preferences. The approach delivered results. I built teams of exceptionally capable engineers. Across dozens of hires, I struggle to recall a single hiring mistake using this method. Those engineers built exceptional things.

The Industry Catches Up

Now, in early 2026, the technology industry has caught up to what seemed like an outlier philosophy. Artificial intelligence is transforming software development at unprecedented speed. The skills I always prioritized have become the consensus standard for what separates valuable engineers from replaceable ones. Problem-solving leads that list and adaptability follows close behind. Architectural thinking rounds out the essential trinity.

With vibe coding, the shift accelerated rapidly. Vlad Balazs, who oversees engineering at Intuit, acknowledged his company is redesigning interview processes around this reality. The new assessments present more complex problems with the explicit expectation that candidates will use AI tools to complete them, he explained, because that mirrors how they’ll actually work once hired.

The commoditization of coding knowledge has exposed a truth about skill hierarchies that researchers at Harvard Business School recently quantified. Their analysis found that nearly 80 percent of the wage premium commanded by advanced technical skills depends on underlying foundational abilities: communication, critical thinking, problem-solving. These capabilities, increasingly termed “durable skills” to distinguish them from perishable technical knowledge, demonstrate markedly different longevity. Technology-specific skills carry a half-life under 2.5 years, while problem-solving and decision-making skills persist beyond 7.5 years, according to workforce development firm Guild’s analysis of labor market data.
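To make the half-life comparison concrete, here is a rough sketch of the arithmetic. It assumes that "half-life" means simple exponential decay of a skill's market value, which is an interpretation and not something the Guild analysis is quoted as stating. The function name remaining_value and the five-year horizon are illustrative choices, and the 2.5 and 7.5 year figures are the bounds quoted above, not exact values.

```python
def remaining_value(years: float, half_life: float) -> float:
    """Fraction of a skill's original value left after `years`.

    Assumes plain exponential decay: the value halves every `half_life` years.
    """
    return 0.5 ** (years / half_life)


# Half-lives taken from the bounds quoted in the text (assumed exact here).
HALF_LIVES = {"technology-specific": 2.5, "durable": 7.5}

for label, half_life in HALF_LIVES.items():
    share = remaining_value(5, half_life)
    print(f"{label}: about {share:.0%} of original value left after 5 years")
```

Under these assumptions, a technology-specific skill keeps about a quarter of its value after five years, while a durable skill keeps roughly 63 percent.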

What CTOs Actually Want Now

CTOs describe the transformation in remarkably consistent terms. Engineering leaders echo the same themes. A survey of technology executives revealed unanimous emphasis on critical evaluation over code generation. One chief technology officer insisted the focus had shifted “from pure coding ability to evaluating deep problem-solving acumen, architectural foresight and that uniquely human ability to question, reason effectively and adapt swiftly.” Another observed the inherent irony: “AI was supposed to make coding easier, but it’s actually making the thinking parts of development more valuable than ever.”

This dynamic extends beyond individual contributor roles into architectural work, where the stakes prove even higher. AI excels at generating functions but struggles (at the moment) with system-level design. Engineers in this domain must balance competing constraints. They must anticipate failure modes. They must make judgment calls about infrastructure choices that AI cannot make. System design capabilities are becoming expected of engineers at all levels, argued one analysis of hiring trends. As AI handles repetitive tasks, even junior developers must think at architectural scale. They need to guide AI agents effectively. They must understand how components interact within larger systems.

The New Demands of Architecture

Meanwhile, the nature of architectural work itself has evolved. Modern software architects must navigate challenges their predecessors never confronted. They design systems where AI agents interact with traditional code. They create feedback loops that improve model outputs. They work within constraints like token budgets and devise new strategies for semantic drift. The fundamental responsibility remains unchanged: making decisions AI cannot make, noted an O’Reilly analysis in summer 2025. An AI can explain how to implement Kubernetes, but it lacks the contextual judgment to determine whether the complexity serves a particular organization’s needs.

Research into workplace skills reinforces these patterns. Deloitte’s 2025 survey of young professionals found that while nearly two-thirds focus on building AI capabilities, more than 85 percent identified communication as more vital to long-term success. Empathy ranked similarly high. Leadership completed the essential triad. The technology excels at pattern recognition and data processing, but inspiring teams remains exclusively human. Understanding human impact requires emotional intelligence, and creative problem-solving in unprecedented situations defies automation.

How Hiring Processes Are Adapting

Smart companies have begun redesigning their hiring processes accordingly. Leading firms now replace traditional coding tests with “problem-solving simulations” using real client scenarios. These assessments test how candidates interpret a problem and gather requirements. Some deliberately introduce errors into problem statements, while others add ambiguities by design. They monitor whether candidates ask clarifying questions rather than charging ahead with assumptions. They also evaluate code for risks, because functional correctness alone no longer suffices.

Zhi Sun, a startup founder who has interviewed hundreds of engineers, captured the transformation succinctly: “AI tools haven’t just changed how we code. They’ve changed what we should look for in engineers. The real challenge is knowing what to build, how to shape it, and how to ship it quickly.”

Marcel Weekes, Figma’s Vice President of Software Engineering, described how his teams leverage AI for pre-reviewing pull requests. The system catches redundancies before human reviewers see the code. It identifies inefficiencies automatically. The strongest developers learn to break down problems into smaller chunks for multiple AI agents to work on simultaneously, then synthesize the results. “One key skill going forward is spending time on documentation,” Weekes noted. “Providing additional context to LLMs matters more than ever, almost like you would help an intern ramp up on a problem.”

What This Means for Engineers

The implications ripple through career development strategies. Engineers who invested heavily in memorizing algorithms now face a landscape where those skills deliver diminishing returns. Syntax knowledge proves similarly devalued. Working with AI as a collaborative tool serves engineers better than viewing it as a threat. Communication skills enable engineers to translate technical concepts across teams. Explaining architecture to non-technical stakeholders requires capabilities AI cannot replicate. Engineers must also identify edge cases and spot the “confidently wrong” answers that large language models occasionally produce.

For hiring managers clinging to traditional assessment methods, the message carries urgency. Job descriptions emphasizing specific frameworks optimize for skills becoming less relevant by the month. Programming language requirements follow the same trajectory. By contrast, behavioral questions that reveal how candidates handle ambiguity deliver more predictive value than algorithm memorization tests. Questions about learning from failure matter more than questions about past successes. Collaboration with diverse teams predicts future performance better than solo coding prowess.

Looking back, the philosophy wasn’t unconventional at all; it was simply running on a different clock than the rest of the industry. It rested on a belief that technical skills were trainable given the right foundation. Problem-solving ability either developed over years of deliberate practice or proved largely innate. Curiosity could not be packaged into a boot camp curriculum. Adaptability didn’t come with a certification. As AI accelerates the commoditization of coding knowledge, what once felt like instinct has quietly become industry consensus.

The engineers built to thrive in this landscape are not the ones who code fastest or memorize the most frameworks. They are the ones who think deepest, ask the sharpest questions, and notice patterns others overlook. They bring clarity to messy problems and build systems that hold up under pressure. Most critically, they know when to trust AI output and when to push back on it. Those were the engineers I sought out then. They remain the ones every forward-thinking organization should be competing to hire right now.