Experiments

In this post, I revisit the experiments presented in my previous Generative Text Scenarios work, rebuilding the models on the same datasets with the 345M version of GPT-2 and with new prompts.

In these experiments, as in my previous experiments with GPT-2, I pose the following question: what would it look like for a lone bad actor to pull a segment of the population further toward the fringes of political ideology? Here I build on a similar dataset developed in my previous blog post, combining comments from right-wing subreddits like /r/The_Donald with comments from Stormfront, a notoriously racist, white nationalist message board. Where much of the content generation from the IRA was manual, my experiments with the GPT-2 models suggest that these same attacks can be automated. An automated process reduces the man-hours necessary to carry out such a campaign and greatly widens the breadth and scope of its reach. While automating meme generation proves difficult, automating coherent user comments and responses is possible.

I splice comments and content from Stormfront, a source further from the political center than the /r/The_Donald community, into the training set. I then fine-tune the model on this mix of Stormfront comments and comments from right-wing subreddits, and generate content that reflects the syntax and linguistic structure of comments from the subreddit community while mixing in concepts from Stormfront.

Below, I prompt the GPT-2 345M model with the phrase "liberal media is", and I let the model complete the statement.  

[liberal media is] pushing a narrative that any and every statement they make is going to be anti-police.  So I'm going to take a step back and see what they're really like...
[liberal media is] really just a propaganda arm of the Democratic Party, which is why they won't allow the truth about the Clinton's foreign policy to come out. Instead they prefer to keep you believing that Iran is a "terrorist nation" that America and Israel are supporting ISIS, and that Iran is "responsible for the Shia Muslims in Iraq" ... and then when the bad guys are defeated or incapacitated they can unleash hell on the civilians. This is by design.   It is also why I have never supported Hillary Clinton, because I know her policies are just not based on the needs and concerns of the people she represents.  She represents the interests of Wall Street and corporate America
[liberal media is] pushing a total collapse of traditional morality and ethics, and total militarization of our police, and if we don't wake up and act like it, it's going to be a very slow, painful, and embarrassing process, if any at all.  I refuse to buy into that garbage, and neither should you.
[liberal media is] the source of all the negative stereotypes.   From the sub to the main page, you will see plenty of examples of sensationalist, negative, biased, inaccurate, distorted, misleading, and downright absurd reporting.  I have been to numerous news conferences where major players (players who have affiliations with BLM chapters) speak and act professionally, freely, and honestly about their personal experiences with BLM. They are not anti-police or anti-black. They are simply stating the facts, which are available, in the public interest, to the best of their ability. They are not anti-police because they are police.
[liberal media is] doing you a disservice.   If you want to see a real change of scenery,  I can understand why you are here.  I was here when it was first coming out of the South, and I can tell you it was not pretty.  I can tell you that at one time or another, every team or person that entered or remained within the organization that I have been on since the seventies has been on the defensive.  I have been on the offensive for most of them.  I would not have thought it would have mattered five years ago that Sonny List would have become the commissioner.

Output from the GPT-2 345M model is much more coherent than the output in my previous experiments and has the capability to fool unsuspecting readers. Models like GPT-2 345M, though smaller and less coherent than their larger counterparts, still pose a serious threat in spaces where organic, short- to medium-form comments are expected. They are particularly dangerous when directed at politically or socially marginalized communities. As developers, researchers, and policy-makers, we should handle models capable of output like that of GPT-2 345M with care.