The Generative AI Policy Landscape in Open Source: 77 Organizations, No Consensus
Created: 2026-03-30
TL;DR
Kate Holterhoff at RedMonk surveyed 77 open-source organizations on their generative AI contribution policies. The headline: 57% are permissive, but the conditions are all over the map. Quality concerns, not copyright, dominate the conversation. Disclosure labels are fragmenting into at least six incompatible formats. And almost nobody has thought about what happens when fully autonomous agents start submitting PRs.
The Data at a Glance
Of 77 surveyed organizations, spanning foundations, individual projects, and standards bodies, the policy breakdown looks like this:
44 organizations permit AI-assisted contributions. 14 ban them outright. 10 haven't decided yet. The remaining 9 either have no formal policy or operate as standards bodies.
This isn't a clean split between "pro-AI" and "anti-AI" camps. The permissive category covers everything from "go ahead, no strings attached" to "mandatory disclosure, DCO sign-off, and you must be able to explain every line". The ban count nearly doubled as more system-level projects formalized their stance.
2025 Was the Inflection Year
Policy adoption didn't ramp gradually. It exploded.
Six policies existed in 2023. By the end of 2025, there were 50. The conversation shifted from "should we address this?" to "what specifically should we require?" Early movers like the Linux Foundation, Apache, and OpenInfra set the tone. Then 2025 brought the Linux Kernel, CPython, Rust Foundation, Fedora, curl, QEMU, Wikipedia, KDE, SciPy, and dozens more. The floodgates opened.
Quality Trumps Copyright
Here's what surprised me most: while copyright contamination dominates the headlines, quality is the primary concern for the vast majority of policies. Maintainers aren't worried about accidentally shipping GPL-tainted code from a training set. They're drowning in low-quality AI-generated submissions that waste scarce volunteer review time.
This tracks with what we've seen in LLM code generation benchmarks: models that score 88% on synthetic tests hit 30% in real-world settings. When contributors use AI to generate code they don't fully understand, the reviewer burden shifts downstream. The maintainer becomes the debugger.
Copyright does matter at the foundation level; the Linux Foundation and Eclipse Foundation explicitly address it. But for individual projects, quality review is the bottleneck, and that's what their policies target.
The Disclosure Fragmentation Problem
Even among the 32 organizations that require or recommend disclosure, there's no standard format. The commit-tag ecosystem is splintering:
| Label Convention | Count |
|---|---|
| Generated-by: <tool> | 10 |
| Assisted-by: <tool> | 7 |
| AI-assisted (commit msg) | 9 |
| Verbose disclosure block | 3 |
| Human submitter required | 10 |
| No standard / varies | 12 |
Six different conventions across 77 organizations. This makes cross-project analysis nearly impossible. If you wanted to audit how much AI-generated code exists in the open-source ecosystem, you'd need to parse half a dozen different tag formats, and that's only for projects that require disclosure at all.
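For a sense of what that audit would involve, here is a minimal sketch in Python. The regexes cover only three of the conventions above, and the exact tag spellings are illustrative assumptions, since each project defines its own:

```python
# A minimal sketch of an AI-disclosure audit for a single repository.
# The tag spellings are illustrative; each project defines its own.
import re
import subprocess
from collections import Counter

TRAILER_PATTERNS = {
    "Generated-by": re.compile(r"^Generated-by:\s*\S", re.MULTILINE),
    "Assisted-by": re.compile(r"^Assisted-by:\s*\S", re.MULTILINE),
    "AI-assisted (free text)": re.compile(r"\bAI-assisted\b", re.IGNORECASE),
}

def count_disclosures(repo_path: str = ".") -> Counter:
    """Count commits whose messages match each disclosure convention."""
    # %B is the raw commit body; %x01 separates commits in the output.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%B%x01"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = Counter()
    for body in log.split("\x01"):
        for name, pattern in TRAILER_PATTERNS.items():
            if pattern.search(body):
                counts[name] += 1
    return counts

if __name__ == "__main__":
    print(count_disclosures())
```

Multiply this by half a dozen tag formats and 77 organizations, and the scale of the fragmentation becomes obvious.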
Standardization here would be a huge win. A single AI-Assisted-By: <tool> trailer that every project adopts would enable ecosystem-wide visibility. We're not there yet.
The DCO as Legal Fulcrum
The Developer Certificate of Origin is emerging as the primary legal mechanism for handling AI contributions. Rather than creating new legal frameworks, organizations like the Linux Foundation and QEMU frame the question around whether the contributor can truthfully sign off that they have the right to submit the code.
This is pragmatic. The DCO already exists, developers already use it, and it places responsibility squarely on the contributor. If you used an AI tool and can certify the output is yours to submit, fine. If you can't, don't submit it.
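To make that concrete, here is a minimal sketch of the kind of pre-submit check a project could run in CI, assuming (hypothetically) that it pairs the existing Signed-off-by trailer with one of the disclosure trailers above. No project's actual tooling is implied:

```python
# A minimal sketch of a commit-message check pairing the DCO sign-off with an
# AI disclosure trailer. The trailer names are assumptions, not a real standard.
import re
import sys

SIGNED_OFF = re.compile(r"^Signed-off-by: .+ <.+@.+>$", re.MULTILINE)
AI_TRAILER = re.compile(r"^(Generated-by|Assisted-by): .+$", re.MULTILINE)

def check_commit_message(message: str, ai_tool_used: bool) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    if not SIGNED_OFF.search(message):
        problems.append("missing DCO sign-off (Signed-off-by trailer)")
    if ai_tool_used and not AI_TRAILER.search(message):
        problems.append("AI tooling used but no disclosure trailer present")
    return problems

if __name__ == "__main__":
    issues = check_commit_message(sys.stdin.read(), ai_tool_used=True)
    sys.exit("rejected: " + "; ".join(issues) if issues else 0)
```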
Bans Cluster Around System-Level Code
The 14 outright bans aren't random. They cluster around:
- Kernels: Linux Kernel (partial restrictions), NetBSD
- Hypervisors: QEMU, Cloud Hypervisor
- System libraries: musl-libc, Gentoo
- Security-critical infrastructure
The pattern is clear: when bugs can brick hardware, enable exploits, or cascade through millions of downstream consumers, the risk calculus changes. These projects can't afford the reviewer burden of vetting AI-generated code that might look correct but subtly isn't.
Notably, even projects with strict code bans often carve out exceptions for documentation and translations, acknowledging that different contribution types carry different risk profiles.
The Agentic Blind Spot
This is the finding that should worry you most: almost no policy addresses agentic AI. The overwhelming majority assume a human is driving the tool: typing a prompt, reviewing the output, making a judgment call. Only a handful, like Matplotlib, explicitly address fully autonomous agents that submit code without direct human prompting.
This gap matters because agentic workflows are already here. GitHub's agentic workflows, Claude Code with subagents, Devin-style autonomous coders. These systems don't just assist a human developer. They plan, execute, test, and submit code with minimal human oversight. And the current policy landscape has almost nothing to say about them.
As agent capabilities improve and multi-agent systems become more sophisticated, the question isn't just "did a human use an AI tool?" but "was a human meaningfully in the loop at all?" Policies that only address copilot-style assistance are already behind.
The Decision Flow
Here's what the typical contribution pipeline looks like today for AI-generated code:
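Sketched very roughly in code, with the caveat that the gate names, ordering, and Policy fields below are illustrative assumptions rather than any single project's actual rules:

```python
# A rough, illustrative model of the contribution pipeline described above.
# Every project wires these gates together differently.
from dataclasses import dataclass

@dataclass
class Policy:
    allows_ai: bool
    requires_disclosure: bool
    requires_dco: bool

def review_ai_contribution(policy: Policy, disclosed: bool, dco_signed: bool,
                           contributor_understands_code: bool) -> str:
    if not policy.allows_ai:
        return "rejected: project bans AI-assisted contributions"
    if policy.requires_disclosure and not disclosed:
        return "rejected: disclosure trailer missing"
    if policy.requires_dco and not dco_signed:
        return "rejected: DCO sign-off missing"
    if not contributor_understands_code:
        return "rejected: contributor cannot explain the submitted code"
    return "proceeds to normal human code review"

# Example: a permissive project with mandatory disclosure and DCO sign-off.
print(review_ai_contribution(Policy(True, True, True),
                             disclosed=True, dco_signed=True,
                             contributor_understands_code=True))
```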
The complexity here is the point. There's no single answer. Each project navigates a maze of disclosure requirements, legal sign-offs, and quality checks, and every one does it slightly differently.
What This Means for Practitioners
If you're contributing to open source with AI tools:
- Check the policy first. More projects have one than you think. A rejected contribution wastes your time and the maintainer's.
- Disclose proactively. Even if not required, it builds trust and sets expectations for review.
- Understand what you submit. The recurring requirement across permissive policies: you must be able to explain every line. If you can't, don't submit it.
- Watch the agentic frontier. If you're using autonomous agents to generate contributions, you're in largely uncharted policy territory. Proceed with caution.
If you're a maintainer:
- Having a policy is better than not having one. Even "we're thinking about it" gives contributors guidance.
- Quality gates matter more than disclosure labels. A well-reviewed AI contribution beats a poorly-reviewed human one.
- Start thinking about agents now. The copilot era is already giving way to the agent era.
References
- The Generative AI Policy Landscape in Open Source - Kate Holterhoff, RedMonk (original source)
- AI Slopageddon and the OSS Maintainers - Kate Holterhoff, RedMonk
- open-source-ai-contribution-policies - Melissa Weber Mendonça, GitHub
- Awesome LLM Policy - CHAOSS Working Group on AI Alignment
- Your LLM Scores 88% on Code Benchmarks. In Production, It Hits 30%. - Daita blog
- The Evolution of Continuous Delivery: Embracing Agentic Workflows - Daita blog
- Agent Skills: The Paradigm Shift Hiding in Plain Text - Daita blog