- Dogwhistles convey hidden meanings that signal in-group bias while appearing neutral to outsiders
- AI systems struggle to detect dogwhistles, especially those rooted in informal or online speech
- Failure to catch dogwhistles can reinforce racism, transphobia, antisemitism, and other forms of hate
- Detecting covert language requires both contextual understanding and cultural knowledge
- Addressing dogwhistles in AI begins with taxonomy, real-world examples, and rigorous testing methods
This article is a summary of Mendelsohn, J., Le Bras, R., Choi, Y., and Sap, M. (2023, May 26). “From dogwhistles to bullhorns: Unveiling coded rhetoric with language models” in ACL (Toronto, ON). https://doi.org/10.48550/arXiv.2305.17174.
Dogwhistles are terms that send one message, often benign, to an outgroup and a second message, often hostile toward a targeted group, to an ingroup. Dogwhistles give speakers plausible deniability: if challenged, they can claim they never actually said anything about the targeted group. Terms like states’ rights and law and order are dogwhistles dating back to the Civil War and Reconstruction era.
Figure 1. A flow chart that depicts how dogwhistle words work. It shows an example of the source, audience, and meaning for the term Cosmopolitan. Credit: From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models (https://aclanthology.org/2023.acl-long.845.pdf)
Unlike other forms of hate speech, dogwhistles cannot be identified in a vacuum; their meaning depends on an interplay of ingroup knowledge, context, and content. The authors aim to lay the foundations for NLP research on dogwhistles. They first create a new taxonomy that highlights systematicity and variation, categorizing dogwhistles by covert meaning, style and register, and the personae the speaker signals. They then compile a glossary (n = 340) of dogwhistles labeled according to that taxonomy, including context, explanations, and real-world examples.
Figure 2. A hierarchical representation of the dogwhistle taxonomy, with examples of each type. Credit: From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models (https://aclanthology.org/2023.acl-long.845.pdf)
Each dogwhistle was labeled as either formal/offline or informal/online in register: the former originated in offline contexts and are likely to appear in statements by mainstream politicians, while the latter originated in online contexts and are unlikely to appear in political speech. Type I dogwhistles “covertly signal the speaker’s persona but do not alter the implicatures of the message itself,” while Type II dogwhistles also alter the message’s implied meaning (Henderson and McCready 2018, as cited in Mendelsohn et al., 2023; emphasis mine).
Personae refer to the ingroup identities signaled by the dogwhistle; these are not static labels but are actively constructed and can overlap.
The study covers spoken political speech as well as written online speech, drawing on glossaries and wikis of online hate speech. The resulting glossary contains 340 dogwhistles and over 1,000 surface forms, with each dogwhistle coded with its register, type, signaled persona, a linked explanation, and at least one linked linguistic example. Antisemitic, transphobic, and racist dogwhistles were the most common, with over 70 entries for each of those personae.
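One way to picture the glossary structure described above is as a record with one field per annotation. The field names and example values below are illustrative assumptions, not the authors' actual schema.

```python
from dataclasses import dataclass

@dataclass
class DogwhistleEntry:
    term: str
    surface_forms: list   # alternate spellings, phrases, emoji, etc.
    register: str         # "formal/offline" or "informal/online"
    dw_type: str          # "Type I" or "Type II"
    personae: list        # ingroup identities signaled
    explanation: str      # explanation with a source link
    examples: list        # at least one linked linguistic example

# Hypothetical entry modeled on the paper's "cosmopolitan" example.
entry = DogwhistleEntry(
    term="cosmopolitan",
    surface_forms=["cosmopolitan", "cosmopolitan elite"],
    register="formal/offline",
    dw_type="Type II",
    personae=["antisemitic"],
    explanation="Covertly casts Jewish people as rootless outsiders (glossary link).",
    examples=["(linked example sentence)"],
)
print(entry.register)  # formal/offline
```

A flat record like this makes it easy to filter by register or persona, which is what the case studies below effectively do.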
Using their glossary, the authors conducted a case study to illustrate its uses. They analyzed the frequency of racist dogwhistles in the US Congressional Record from 1920 to 2020. They also measured how the ideologies of the speakers using those dogwhistles changed over time, using DW-NOMINATE scores, which place politicians on a two-dimensional map (conservative to liberal) based on voting records. They identified occurrences of glossary terms without distinguishing between covert and literal uses of the same expression.
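The frequency analysis above amounts to counting glossary surface forms in dated speeches. A minimal sketch, assuming a hypothetical list of surface forms and a toy corpus (not the authors' data or matching procedure):

```python
import re
from collections import Counter

# Hypothetical surface forms drawn from a racial-dogwhistle glossary.
SURFACE_FORMS = ["states' rights", "law and order", "inner city", "welfare reform"]

def dogwhistle_counts_by_decade(speeches):
    """Count surface-form matches per decade. speeches: iterable of (year, text)."""
    patterns = [re.compile(r"\b" + re.escape(f) + r"\b", re.IGNORECASE)
                for f in SURFACE_FORMS]
    counts = Counter()
    for year, text in speeches:
        decade = (year // 10) * 10
        counts[decade] += sum(len(p.findall(text)) for p in patterns)
    return counts

# Toy corpus of (year, speech text) pairs.
corpus = [
    (1968, "We must restore law and order in our cities."),
    (1994, "This bill is about welfare reform, plain and simple."),
    (1996, "Welfare reform and states' rights go hand in hand."),
]
print(dogwhistle_counts_by_decade(corpus))  # Counter({1990: 3, 1960: 1})
```

Note that, like the authors' method, this counts every match of a surface form, whether or not a given use is covert or literal.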
Dogwhistle use increased during the Civil Rights Era following the Brown v. Board of Education decision, which aligns with qualitative accounts of the Republicans’ Southern Strategy: as explicit racism became socially unacceptable, politicians turned to implicit dogwhistles to convey the same messages. This trend continued from the 1970s through the 1990s, paralleling Haney-López’s (2014) findings. Since the 1990s, dogwhistle use has fluctuated but remained high. After 9/11, racist dogwhistles expanded to become increasingly anti-Latine and Islamophobic rather than exclusively anti-Black.
Unsurprisingly, racial dogwhistles became increasingly associated with conservative politicians over time; however, the inflection point varies by term, indicating that specific dogwhistles emerged at different moments. Property rights has been increasingly associated with more conservative politicians since the 1960s, whereas welfare reform did not shift toward conservatives until the 1990s. The dogwhistle Willie Horton functioned as a bogeyman dogwhistle until it was called out as explicitly racist, after which it no longer functioned covertly.
The authors tested whether GPT-3 could identify the covert meanings of dogwhistles from the aforementioned glossary, then tested its ability to surface dogwhistles, i.e., to produce purposefully obscured dogwhistles on request. Finally, they performed a manual analysis of in-context dogwhistle recognition using GPT-4. For the first part, they provided GPT-3 with prompts containing a definition of dogwhistles, the dogwhistle itself, and, optionally, a secrecy cue (prompts without this cue simply omit the word “secretly”); the prompts covered multiple forms of glossary dogwhistles, including emojis for some. In total, 480 variants with 12 prompts each were used (N = 5,760). GPT-3 identified 80.3% of dogwhistles’ covert meanings at least once, with 56% of generations containing the correct covert meaning for formal dogwhistles but only 29.4% for online dogwhistles. Prompt form is a major factor: GPT-3 identified only 8.5% of covert meanings when the prompt contained neither a definition nor a secrecy cue. GPT-3 had the highest identification rates for Islamophobic and homophobic dogwhistles and the lowest for transphobic dogwhistles.
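The prompt setup above is a grid: each dogwhistle crossed with templates that optionally include a definition and a secrecy cue. A small sketch of how such a grid can be built; the templates, wording, and function names here are illustrative assumptions, not the authors' exact prompts.

```python
from itertools import product

DEFINITION = ("A dogwhistle is a word or phrase that sends one message to an "
              "outgroup and a second, often hateful, message to an ingroup.")

# Hypothetical templates; "{secretly_}" is filled or dropped to toggle the cue.
TEMPLATES = [
    "{definition} What does '{term}' {secretly_}mean?",
    "{definition} Explain what '{term}' {secretly_}communicates.",
]

def build_prompts(terms, with_definition=(True, False), with_secret=(True, False)):
    """Cross terms x templates x definition-on/off x secrecy-cue-on/off."""
    prompts = []
    for term, template, use_def, secret in product(
            terms, TEMPLATES, with_definition, with_secret):
        prompts.append(template.format(
            definition=DEFINITION if use_def else "",
            term=term,
            secretly_="secretly " if secret else "",
        ).strip())
    return prompts

prompts = build_prompts(["inner city", "cosmopolitan"])
print(len(prompts))  # 2 terms x 2 templates x 2 x 2 = 16 prompts
```

Each generated prompt would then be sent to the model, and the completions checked for the glossary's covert meaning.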
The second part of the experiment showed GPT-3 one of five definitions from the first part, followed by a question or a request for examples. GPT-3 can surface dogwhistles when prompted, but the results are imperfect and require manual verification. Precision varied across prompt types, averaging 66.8% and ranging from 50% for transphobic dogwhistles to 91.3% for generic prompts. The most common errors were explicit mentions of groups in stereotypes or conspiracy theories (e.g., that Jews were behind 9/11) and phrases that accompany dogwhistles but are not dogwhistles themselves (“I’m not racist, but...”). GPT-3 surfaced 153 of the 340 glossary entries, favoring formal dogwhistles (69.4%) over online ones (12.9%). Despite being able to generate emojis and symbols, it surfaced none except the antisemitic triple parentheses “((()))”. While it surfaced only 44.8% of formal/offline transphobic dogwhistles, its recall for them reached 71.7%. It had 80.3% and 83.3% recall for racist and white supremacist dogwhistles, respectively, and 100% recall for Islamophobic dogwhistles.
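The precision and recall figures above follow the standard definitions: precision is the share of model-surfaced candidates that are genuine glossary dogwhistles, and recall is the share of the glossary the model surfaced. A sketch with toy data (not the authors' results):

```python
def precision_recall(surfaced, glossary):
    """Compute precision/recall of surfaced candidates against a glossary."""
    surfaced, glossary = set(surfaced), set(glossary)
    true_positives = surfaced & glossary
    precision = len(true_positives) / len(surfaced) if surfaced else 0.0
    recall = len(true_positives) / len(glossary) if glossary else 0.0
    return precision, recall

# Toy glossary and model output; the last candidate is the kind of false
# positive the paper reports (a phrase that accompanies dogwhistles but
# is not one itself).
glossary = {"inner city", "cosmopolitan", "states' rights", "law and order"}
surfaced = {"inner city", "law and order", "I'm not racist but"}
p, r = precision_recall(surfaced, glossary)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

In practice the matching step is harder than set intersection, since surfaced phrases must be manually mapped onto glossary surface forms.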
The final part of the experiment gave GPT-4 the definition of a dogwhistle, then asked it to identify the one in a provided example. GPT-4 correctly identified and explained cosmopolitan (elite) and inner city but performed worse on other examples. In one case it recognized that a transphobic dogwhistle was being used but failed to select the right term. Likewise, it identified a dogwhistle in a set of emojis and connected it to its history, but failed to capture the emojis’ appropriation for transphobic purposes. In another case, GPT-4 missed the dogwhistle and covert meaning of did you see Kyle (spoken quickly, it sounds like Sieg Heil); although it eventually identified covert white supremacy, it generated a false explanation connecting the example to that persona.
Sources: