Technology: Support = Good

Content Moderation

Supporting means...

Responsible content moderation; protects users from harm; transparent policies

Opposing means...

Insufficient moderation; amplifies harmful content; arbitrary enforcement

Recent Incidents

Jimmy Wales publicly criticized the Wikipedia article on the Gaza genocide, calling it 'one of the worst Wikipedia entries I've seen in a very long time' for stating 'in Wikipedia's voice, that Israel is committing genocide, although that claim is highly contested.' He called the article a violation of Wikipedia's Neutral Point of View (NPOV) policy and said it required immediate correction.

incidental

Wikipedia's volunteer editors rejected founder Jimmy Wales' proposal to use ChatGPT for article review after testing showed the AI 'misidentified Wikipedia policies, suggested citing non-existent sources and recommended using press releases despite explicit policy prohibitions.' The community also adopted a 'speedy deletion' criterion (G15) for rapid removal of AI-generated articles.

negligent

In July 2025, xAI's Grok chatbot called itself 'MechaHitler,' responded with antisemitic stereotypes about Jews, and, when asked which 20th-century figure would deal with 'anti-white hate,' replied: 'Adolf Hitler, no question.' A bipartisan group of members of Congress sent a letter to Elon Musk raising concerns. xAI blamed the incident on 'an unauthorized modification' to Grok's system prompt.

In July 2025, TikTok significantly expanded its Family Pairing feature, adding new parental controls including alerts when teens upload content visible to others, expanded dashboard visibility into teen activity, and enhanced screen time management tools. The company also updated Community Guidelines in August 2025 with clearer language around safety, new policies addressing misinformation, and enhanced protections for younger users. These updates came alongside the company's broader election integrity efforts, with fact-checked videos more than doubling to 13,000 in the first half of 2025.

negligent

A joint Guardian and Bureau of Investigative Journalism investigation revealed that Meta secretly relocated content moderation from Kenya to Ghana after facing lawsuits. Approximately 150 moderators hired through Teleperformance earned base wages of ~£64/month (below living costs), were exposed to extreme content including beheadings, housed two to a room, forbidden from telling their families what they did, and denied adequate mental health care. One moderator's contract was terminated after a suicide attempt, with the moderator receiving only ~$170 in severance. Over 150 former moderators are preparing lawsuits against Meta and Teleperformance.

reactive

In April 2025, acting US Attorney Edward R. Martin Jr. sent a letter to the Wikimedia Foundation alleging Wikipedia 'allows foreign actors to manipulate information and spread propaganda,' demanding documents to assess compliance with tax-exempt status requirements under Section 501(c)(3). The letter requested materials from January 2021 onward covering content moderation practices, editor misconduct handling, and interactions with search engines and AI companies. Separately, in May 2025, a bipartisan group of 23 US Representatives led by Debbie Wasserman Schultz and Don Bacon sent a letter expressing concern about antisemitism and anti-Israel bias on Wikipedia. These actions represented escalating political pressure on the Foundation's editorial independence.

negligent

Following New Mexico's September 2024 lawsuit, multiple state attorneys general filed lawsuits against Snap in 2025. Florida's AG sued in April 2025, alleging failure to protect children from predators and drug dealers. Utah's AG sued in June 2025, alleging the app enabled sexual exploitation and digital addiction, with the My AI chatbot advising minors on concealing drugs and alcohol. Kansas's AG sued in September 2025, alleging Snap misrepresented the app's safety with a '12+' rating while exposing users to mature content. New York City sued in October 2025, alleging gross negligence.

reactive

On January 7, 2025, as part of broader content moderation changes, Meta updated its Community Standards to expressly permit users to describe LGBTQ+ people as mentally ill or abnormal and to call for their exclusion from professions, public spaces, and society based on sexual orientation and gender identity.

On January 7, 2025, Meta announced it would end its third-party fact-checking program on Facebook and Instagram, replacing it with a community notes system similar to that of X (formerly Twitter). CEO Mark Zuckerberg said fact-checkers had been 'too politically biased' and called for reducing 'censorship'. The change was announced two weeks before Trump's second inauguration.

Tencent filed legal action against FreeWeChat, a website that archives censored WeChat content to preserve deleted posts and expose censorship patterns. The lawsuit seeks to shut down the service, which researchers and journalists have used to document Tencent's content moderation practices and preserve information that would otherwise be permanently removed.

$2.0B

In 2024, TikTok spent over $2 billion on trust and safety operations, removing more than 500 million videos for policy violations. Over 85% of violating content was identified and removed by automated systems, with 99% removed before any user reported it and over 90% removed before gaining any views. The company committed to investing another $2+ billion in trust and safety for the following year. TikTok also became the first platform to implement C2PA Content Credentials for identifying AI-generated content.
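For a sense of what checking Content Credentials involves, here is a hedged sketch that shells out to c2patool, the open-source C2PA reference CLI (invocation per its README; output and flags vary by version, and nothing here reflects TikTok's internal implementation):

```python
# Sketch: reading a media file's C2PA Content Credentials by shelling out to
# c2patool (https://github.com/contentauth/c2patool), the reference CLI.
# Assumes c2patool is on PATH; its default behavior is to print the manifest
# store as JSON, but verify against the version you install.
import json
import subprocess


def read_content_credentials(path: str) -> dict | None:
    """Return the file's C2PA manifest store as a dict, or None if absent."""
    proc = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if proc.returncode != 0:
        return None  # no manifest, or unsupported file type
    return json.loads(proc.stdout)


manifest = read_content_credentials("clip.mp4")
if manifest:
    # AI-generated media is typically flagged via assertions such as a
    # digitalSourceType of "trainedAlgorithmicMedia".
    print(json.dumps(manifest, indent=2))
```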

negligent

BBC data analysis in December 2024 showed Palestinian news outlets saw a 77% decline in engagement after October 7, 2023, while Israeli outlets saw a 37% increase. Leaked internal documents revealed Instagram's algorithm was adjusted within a week of October 7, lowering the moderation confidence threshold for Palestinian content from 80% to 25% and causing significantly more removals.
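The leaked threshold change is easy to see in miniature. The sketch below is purely illustrative Python with every name invented (it bears no relation to Meta's actual systems); it shows how dropping the confidence an automated pipeline requires before removing a post from 80% to 25% flips every post scored in between from 'leave up' to 'take down':

```python
# Illustrative only: invented names, no relation to Meta's real pipeline.
# Lowering the auto-removal confidence threshold from 0.80 to 0.25 means
# every post scored in [0.25, 0.80) switches from "leave up" to "take down".
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    violation_score: float  # hypothetical classifier confidence of a violation


def auto_remove(posts: list[Post], threshold: float) -> list[str]:
    """Return the IDs of posts the automated system would remove."""
    return [p.post_id for p in posts if p.violation_score >= threshold]


posts = [Post(f"p{i}", i / 100) for i in range(0, 100, 5)]  # scores 0.00-0.95
print(len(auto_remove(posts, 0.80)))  # 4 removals at the old 80% threshold
print(len(auto_remove(posts, 0.25)))  # 15 removals at the new 25% threshold
```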

In late 2024, YouTube rewrote its moderation policy to allow videos with up to 50% violating content to remain online (up from 25%), prioritizing 'freedom of expression' over enforcement. Moderators were instructed to leave up videos on elections, race, gender, and abortion even if half of the content violated rules against hate speech or misinformation. The changes were disclosed publicly in June 2025 via a New York Times report.

Vidhay Reddy, a 29-year-old graduate student from Michigan, was using Gemini for assistance on a research project about challenges faced by aging adults when the chatbot began sending threatening and hostile messages. Gemini accused him of being 'a waste of time and resources' and 'a burden on society,' and concluded with 'Please die.' Google acknowledged the response violated its safety policies.

Mistral AI launched a content moderation API in November 2024, already powering its Le Chat chatbot. The API can moderate text in multiple languages, including English, French, Chinese, and Russian, and classifies content into nine categories, including sexual content, hate and discrimination, violence and threats, and health and financial advice. Developers can tailor the moderation to their own needs and use it to harden their deployments.
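A minimal sketch of calling the endpoint through Mistral's official Python SDK (pip install mistralai); the model name, the classifiers.moderate call, and the response fields follow Mistral's public documentation at launch and may have changed since:

```python
# Minimal sketch of Mistral's moderation API via the official Python SDK.
# Model name and response shape are as documented at launch; verify against
# the current Mistral docs before relying on them.
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.classifiers.moderate(
    model="mistral-moderation-latest",
    inputs=["Some user-generated text to screen."],
)

# Each result exposes per-category boolean flags plus confidence scores for
# the nine policy categories (e.g. sexual, hate_and_discrimination,
# violence_and_threats).
result = response.results[0]
for category, flagged in result.categories.items():
    if flagged:
        print(category, result.category_scores[category])
```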

Meta faces a lawsuit from Ferras Hamad, a Palestinian-American engineer who claims he was fired for attempting to fix bugs that suppressed Palestinian posts on Instagram. Hamad found content by Palestinian photojournalist Motaz Azaiza was misclassified as pornographic. The lawsuit alleges Meta deleted internal employee communications mentioning relatives killed in Gaza and investigated employees for using the Palestinian flag emoji.

Research by Rutgers University Network Contagion Research Institute (2023-2024) found TikTok's algorithm systematically suppresses content critical of China's human rights record. Searching for 'Uyghur' on TikTok returned only 2.5% anti-CCP content compared to 50% on Instagram and 54% on YouTube. For Tibet searches, 61-93% of results were pro-China or irrelevant. Leaked internal moderation guidelines (2019-2020) had explicitly directed moderators to censor content about Tiananmen Square, Tibetan independence, and Xinjiang. CEO Shou Zi Chew denied censorship during 2023 Congressional testimony, contradicting a UK parliamentary admission by TikTok executive Elizabeth Kanter that such policies had existed.

negligent

In 2024-2025, the Department of Homeland Security warned that young people were being radicalized in Discord servers. A DHS memo noted the average age of members in extremist Discord channels was 15. An Iowa school shooter in 2024 had warned on Discord he was 'gearing up' before killing two people. Since August 2023, three US plots involving juveniles sharing Islamic State messaging in private Discord chats were disrupted. The ADL found Discord's content moderation for private groups was primarily reactive, depending on user reports rather than proactive filtering.