I believe much of the focus concerning generative language models in relation to false information has centered on several misconstrued premises:

1. It will take a state actor to develop a language model system that will have large scale political or social impact

2. We should be more concerned about the use of language models in the development of "fake news" articles, rather than other forms of false content

3. Large generative language models will serve as a strong defense against large generative language models

These three misplaced concerns are linked by an overarching conception of 'fake news' as full-length, coherent, traditional news articles. I suggest that we should consider these models not strictly in relation to "fake news" articles but in relation to fake content more generally, including false or misleading article summaries, fake user-generated comments, fake and misleading reviews, falsified profiles, and fake or misleading micro-blogs. In particular, we should concern ourselves with generated organic user posts and comments.

In this blog post, I focus on conversations regarding the recent releases of Grover. This post is not an indictment of the model's release strategy, which I found to be well planned. Rather, I comment on the discussions surrounding the releases, and on a troubling trend of evaluating model releases strictly in relation to full-length fake news articles.

Premise 1: It will take a state actor to develop a language model system that will have large scale political or social impact

Low Costs: The recent release of Grover suggests that we are in an 'Era of Neural Disinformation,' where the costs of building the datasets needed to train large generative models, and of training the models themselves, are relatively low. This is certainly true. In fact, my experiments suggest that the training-data costs reported in the paper are overstated. Regardless, the compute cost of training a full-scale GPT-2 model is around $43,000. Similarly, obtaining the data required to train Grover through Common Crawl cost $10k in AWS credits, and training the model itself, at $0.30 per TPU v3 core-hour over two weeks, cost roughly $25k in total. However, the cost to fine-tune the smaller models released by OpenAI ranges between $5 and $15 on a V100 AWS GPU, or is free on a Google Colab GPU. Similar techniques will undoubtedly be applied to Grover, where fine-tuning the full model will produce much more coherent results.
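The $25k figure checks out under the paper's reported setup. Assuming the 256 TPU v3 cores the Grover paper reports using (the core count comes from the paper, not from the pricing quoted above), the arithmetic is:

```python
# Back-of-the-envelope check on the quoted Grover training cost.
# The 256-core figure is the TPU v3 allocation reported in the Grover paper.
cores = 256
hours = 14 * 24                 # two weeks of wall-clock training
rate_per_core_hour = 0.30       # USD per TPU v3 core-hour
total = cores * hours * rate_per_core_hour
print(f"${total:,.2f}")         # roughly the quoted $25k
```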

Low Technical Expertise: The technical expertise necessary to fine-tune, scaffold, and operationalize these models is also low, requiring only basic machine learning and coding knowledge. I have shown previously that the barrier to entry for developing systems to disseminate misleading posts and comments is near the novice/junior software developer level. After the release of GPT-2 there was a flurry of activity to develop scripts for fine-tuning and operationalizing the model, including Python packages and one-liner command line interfaces. Actors with basic coding knowledge can easily gather data and fine-tune these models, and users with little-to-no coding skills can download and run operationalized models with simple terminal commands.
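To make that barrier concrete: with Max Woolf's gpt-2-simple package (listed in the references below), the entire fine-tune-and-sample workflow reduces to a handful of calls. The sketch below follows that package's documented API; actually running it requires TensorFlow, a GPU or TPU, and a scraped text corpus, so the heavyweight import is deferred until the function is called.

```python
def finetune_and_generate(corpus_path, model_name="117M", steps=1000):
    """Fine-tune a small GPT-2 checkpoint on a text corpus, then sample from it.

    Uses the gpt-2-simple package (minimaxir/gpt-2-simple); the import is
    deferred so this sketch can be read without the dependency installed.
    """
    import gpt_2_simple as gpt2
    gpt2.download_gpt2(model_name=model_name)           # fetch the base checkpoint
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, corpus_path, model_name=model_name, steps=steps)
    return gpt2.generate(sess, return_as_list=True)     # list of generated samples
```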

Premise 2: We should be more concerned about the use of language models in the development of "fake news" in article form rather than other forms of false content.

There is an assumption that the largest threat of false information is fake news in the form of false full-length articles, released on fake or alternative news sources. Indeed, the design strategy of Grover is to condition the model on news article metadata, such as a plausible site, author list, or title, and then to let the model generate the article body.

I don't intend to diminish full-length fake news articles as a threat. However, I argue that the concentration on full-length articles is misplaced. To date, fake content has proven more impactful in the form of short opinionated comments than in other forms of false information. In the Internet Research Agency's (IRA) interference in the 2016 election, organic posts had the largest impact compared to political ads, images, or news stories. Research on the amount and type of attacks perpetrated by the IRA throughout the previous presidential election shows a shift in strategy from advertisement-based attacks to organic posts: "In 2016, the average monthly volume of live ads was more than double the 2015 level and remained similar in 2017." However, "the monthly volume of organic Facebook posts rose steadily between 2015 and 2017. Between 2015 and 2016, monthly organic post volume increased almost sevenfold and continued to rise rapidly into 2017."

Consistently producing passable full-length fake news articles will prove much harder than producing passable fake user-generated comments. Whereas a fake full-length news article must be: 1. in the target domain, 2. up to date with current events, 3. factually plausible, and 4. posted to a site with some institutional credibility, the bar is not as high for user comments. Ultimately, operationalized models will have to fight model rot that moves at or near the speed of the news cycle. By contrast, I have shown that generated comments from the 117M model, when posted in the wild, are passable as real user comments. To support this theory, I developed a reddit bot that would read comments from subreddits and respond with generated comments, using the initial reddit comment as a prompt for the model. These generated comments typically pointed towards a general political ideology rather than a particular current event, and as such were more plausible. Further information on the experiment can be found here, and a followup on the GPT-2-345M model can be found here.
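The bot's control flow was deliberately simple. The sketch below reproduces that structure with the subreddit feed and the language model stubbed out; a live version would read comments via the reddit API and sample replies from the fine-tuned 117M model, and all names here are illustrative.

```python
def run_bot(comment_stream, generate, max_len=280):
    """Read comments, use each as a model prompt, and produce a reply.

    `comment_stream` stands in for a live subreddit feed and `generate`
    for a fine-tuned GPT-2 sampler; a deployed bot would post each reply
    instead of collecting it.
    """
    replies = []
    for comment in comment_stream:
        prompt = comment.strip()
        reply = generate(prompt)[:max_len]   # keep replies comment-sized
        replies.append((prompt, reply))
    return replies

# Stubbed demo: a canned "model" that leans on ideology, not current events.
stream = ["I think the new policy is a disaster.", "Did anyone read the article?"]
model = lambda prompt: "This is what happens when career politicians stop listening."
pairs = run_bot(stream, model)
```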

Premise 3:  Large generative language models will serve as a strong defense against large generative language models

"Defending Against Neural Fake News" provides clear evidence that Grover performs best in defending against Grover's fake news. As stated in the paper, Grover's discriminator detects roughly 90% of text generated by Grover across all model sizes. While this is an impressive result in theory, in practice identifying and removing Grover-generated false information will prove to be a challenging task.

As the paper puts it: "What should platforms do? Video-sharing platforms like YouTube use deep neural networks to scan videos while they are uploaded, to filter out content like pornography (Hosseini et al., 2017). We suggest platforms do the same for news articles. An ensemble of deep generative models, such as Grover, can analyze the content of text – together with more shallow models that predict human-written disinformation. However, humans must still be in the loop due to dangers of flagging real news as machine-generated, and possible unwanted social biases of these models."

This proposal could hold when assuming that:

A. Most platforms have the capacity to run discriminators like Grover's across their text, presumably with a fast response to remove the items

B. The discriminator will mainly be applied to fake news articles rather than generated comments, summaries, and other content

However, I argue that neither assumption holds in practice.

A. Most platforms have the capacity to run discriminators like Grover across their text, presumably with a fast response to remove the items:

While one can assume that larger platforms like Facebook and Twitter might operationalize Grover to detect false information, smaller online communities and platforms will certainly remain at risk. Moreover, many of the marginalized communities that are vulnerable to false information are not on the major platforms at all.
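Even for a platform that can afford to run such a discriminator, the pipeline the paper envisions amounts to scoring content and triaging high-confidence hits to human reviewers rather than removing them outright. A minimal sketch, with a stand-in scoring function in place of a real Grover-style discriminator:

```python
def triage(items, score, review_threshold=0.9):
    """Split content into a human-review queue and a pass-through list.

    `score` stands in for a Grover-style discriminator returning the
    probability that a text is machine-generated; nothing is removed
    automatically, matching the paper's humans-in-the-loop caveat.
    """
    review_queue, passed = [], []
    for text in items:
        (review_queue if score(text) >= review_threshold else passed).append(text)
    return review_queue, passed

# Stubbed demo with a toy scorer.
toy_score = lambda t: 0.95 if "miracle cure" in t else 0.10
queue, ok = triage(["Scientists hide miracle cure!", "City council meets Tuesday."],
                   toy_score)
```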

Fringe online communities serve as vulnerable targets in false information campaigns. By identifying a community that is already susceptible to conspiracy theories and polarizing political manipulation, a bad actor could generate comments that pull that community further away from mainstream political engagement. We have seen this tactic used by the IRA in the recent election. A concerning aspect in the development of generative text models is the emphasis on machine learning as a defense against their output. Because fringe communities tend to congregate on fringe message boards, they are particularly vulnerable to attack. Where Facebook and Twitter have community standards and guidelines for posting content, and the technical and staff infrastructure to enforce them, many websites that host vulnerable communities do not.

Additionally, communities without strict guidelines, as well as self-policed communities, have a significant impact on mainstream websites and affect the flow of false information. False news stories on Reddit, 4chan, and other message board sites have higher rates of posts from alternative news sources like breitbart.com, rt.com, infowars.com, and sputniknews.com. These posts then spread to more reputable sites and communities. While there have been few studies on the spread of user comments and sentiment from message boards like Reddit and 4chan to larger platforms, the same logic should follow. However, where fake full-length news stories can be verified and removed from reputable sites, false user-generated information spread in comment sections can gain traction much more quickly and prove much harder to stamp out. Historically, we have seen this pattern play out with Pizzagate and QAnon: whereas institutional news sources discredited the false information, the organically generated false information continues to thrive.

Lastly, regardless of the type of information, be it full-length news stories or otherwise, it has been shown that correcting misinformation does not necessarily change people's beliefs. Further, any repetition of misinformation, even in the context of refuting it, can be harmful. The suggestion that releasing a model that excels at generating fake news stories will also defend against fake news stories presupposes that removal will be faster than user engagement. While this touches on the design of systems to detect false information as much as on the model itself, it should be strongly considered.

B. The discriminator will mainly be applied to fake news articles rather than generated comments, summaries, and other content:

Detecting fake news is only half of the battle when developing production systems. It is also necessary to verify and remove the false content, preferably before it is viewed by other users. While a system using Grover could be automated to remove full-length fake news articles, consider scenarios where the same models are used to produce fake user comments. Whereas strict guidelines determine what image and video content users can upload to a site, user comments, particularly comments that are not overtly inflammatory, are not as cut and dried. Further, many users (correctly or incorrectly) tie content in written form to free speech. Removing false comments in practice means developing models with near-zero false-positive rates: removing or flagging organic user posts (particularly those pertaining to politics) as Grover-generated text could result in larger issues for platforms. Lastly, this approach assumes that the generative model will be used for creating coherent text rather than for denial-of-information attacks or automated social engineering attacks.
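A back-of-the-envelope calculation shows why the false-positive rate, not the 90% detection rate, is the binding constraint. Suppose a platform sees ten million comments a day, of which 0.1% are machine-generated (an assumed prevalence), and runs a discriminator with the paper's roughly 90% detection rate and an assumed, purely illustrative 1% false-positive rate:

```python
comments_per_day = 10_000_000
prevalence = 0.001    # assumed fraction of comments that are machine-generated
recall = 0.90         # detection rate reported in the Grover paper
fpr = 0.01            # assumed false-positive rate (illustrative)

true_hits = comments_per_day * prevalence * recall        # ~9,000 generated comments caught
false_alarms = comments_per_day * (1 - prevalence) * fpr  # ~99,900 real users flagged
precision = true_hits / (true_hits + false_alarms)
print(f"precision: {precision:.1%}")  # most flags land on real users
```

Under these assumptions, over 90% of flagged comments come from real users, which is exactly the "dangers of flagging real news as machine-generated" the paper warns about, applied at comment scale.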


Ultimately, "focus[ing] on text-only documents formatted as news articles: stories and their corresponding metadata that contain purposefully false information" is not a broad enough premise. By choosing to ignore more prominent types of false information and focus solely on "fake news" articles, we run the risk of releasing models that can be used in a myriad of other nefarious ways, without considering how defending against them might unfold in practice.


  • "Better Language Models and Their Implications,” OpenAI, 14-Feb-2019. [Online]. Available: https://openai.com/blog/better-language-models/. [Accessed: 31-May-2019].
  • “Combating Fake News: An Agenda for Research and Action.”
  • “Community Standards | Facebook.” [Online]. Available: https://www.facebook.com/communitystandards/. [Accessed: 31-May-2019].
  • “Generative Text Scenarios - GPT-2,” Matthew Kenney, 14-Apr-2019. [Online]. Available: http://www.mattkenney.me/gpt-2/. [Accessed: 31-May-2019].
  • S. Zannettou et al., “On the Origins of Memes by Means of Fringe Web Communities,” arXiv:1805.12512 [cs], May 2018.
  • H. G. Oliveira, D. Costa, and A. M. Pinto, “One does not simply produce funny memes!,” p. 8.
  • M. Woolf, “gpt-2-simple: Python package to easily retrain OpenAI’s GPT-2 text-generating model on new texts,” GitHub repository, minimaxir/gpt-2-simple, 2019.
  • “r/MachineLearning - [P] Python package to easily retrain OpenAI’s GPT-2 text-generating model on new texts + Colaboratory Notebook to use it w/ GPU for free,” reddit. [Online]. Available: https://www.reddit.com/r/MachineLearning/comments/bf137p/p_python_package_to_easily_retrain_openais_gpt2/. [Accessed: 31-May-2019].
  • “Republicans seem more susceptible to fake news than Democrats (but liberals, don’t feel too comfy yet),” Nieman Lab.
  • E. Chandrasekharan, M. Samory, A. Srinivasan, and E. Gilbert, “The Bag of Communities: Identifying Abusive Behavior Online with Preexisting Internet Data,” in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems  - CHI ’17, Denver, Colorado, USA, 2017, pp. 3175–3187.
  • “The Future of Truth and Misinformation Online | Pew Research Center,” 19-Oct-2017.
  • P. N. Howard, B. Ganesh, D. Liotsiou, J. Kelly, and C. François, “The IRA, Social Media and Political Polarization in the United States, 2012-2018,” p. 47.
  • D. J. Flynn, B. Nyhan, and J. Reifler, “The Nature and Origins of Misperceptions: Understanding False and Unsupported Beliefs About Politics: Nature and Origins of Misperceptions,” Political Psychology, vol. 38, pp. 127–150, Feb. 2017.
  • R. DiResta et al., “The Tactics & Tropes of the Internet Research Agency,” p. 101.
  • “The Twitter Rules.” [Online]. Available: https://help.twitter.com/en/rules-and-policies/twitter-rules. [Accessed: 31-May-2019].
  • S. Zannettou et al., “The Web Centipede: Understanding How Web Communities Influence Each Other Through the Lens of Mainstream and Alternative News Sources,” arXiv:1705.06947 [cs], May 2017.
  • S. Zannettou, M. Sirivianos, J. Blackburn, and N. Kourtellis, “The Web of False Information: Rumors, Fake News, Hoaxes, Clickbait, and Various Other Shenanigans,” Journal of Data and Information Quality, vol. 11, no. 3, pp. 1–37, May 2019.
  • S. Torrence, “gpt-2-colab: train GPT-2 in Colab,” GitHub repository, CaptainValor/gpt-2-colab, 2019.
  • M. Brundage, J. Clark, G. C. Allen, C. Flynn, S. Farquhar, R. Crootof, and J. Bryson, “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation,” Feb. 2018.