Artificial intelligence that can write stories and crunch data is spreading in newsrooms. That's a good thing for journalists.
In 2009, a team of researchers and students at Northwestern University developed StatsMonkey, software that fully automates the writing of baseball game recaps.
The program relies on publicly available data, like box scores, and a series of pre-written partial story templates. It can analyze a team's changing win probability play by play, identify a game's most important moments and develop a narrative arc based on them. It then plugs in the names of the relevant teams and players and produces a surprisingly readable story.
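StatsMonkey's code was never released publicly, but the core move, finding the play with the biggest swing in win probability and dropping its details into a pre-written sentence, can be sketched in a few lines of Python. Everything below, from the plays to the template, is invented for illustration:

```python
# Hypothetical sketch of the StatsMonkey approach: find the play that
# swung the home team's win probability the most, then slot its details
# into a pre-written template. All plays and numbers here are invented.

plays = [
    {"inning": 3, "desc": "bases-loaded walk", "wp_before": 0.52, "wp_after": 0.61},
    {"inning": 7, "desc": "two-run homer", "wp_before": 0.38, "wp_after": 0.74},
    {"inning": 9, "desc": "game-ending strikeout", "wp_before": 0.91, "wp_after": 1.00},
]

# The "turning point" is the play with the largest win-probability swing.
turning_point = max(plays, key=lambda p: abs(p["wp_after"] - p["wp_before"]))

# A real system would pick among many templates and handle ordinal
# suffixes ("3rd", "7th"); one fixed sentence is enough to show the idea.
template = "The game turned in the {inning}th inning on a {desc}."
print(template.format(**turning_point))
# -> The game turned in the 7th inning on a two-run homer.
```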
The following year, Northwestern professors Kristian Hammond and Larry Birnbaum used the underlying technology to start Narrative Science, a Chicago-based company that builds natural language generation (NLG) programs for big businesses and newsrooms alike. NLG relies on artificial intelligence to interpret data and render it as readable text.
Since then, technology that can automate narrative reporting, often referred to as “robot journalism,” has grown by leaps and bounds in both its spread and sophistication. The Associated Press now automates the writing of 4,400 earnings reports for publicly traded companies every quarter, roughly 15 times as many as it could release when the reports were written by hand. The AP has also automated its coverage of minor league baseball games, allowing the news service to cover more games than was previously possible. The Los Angeles Times uses a program it calls Quakebot to report on earthquakes using seismic data from the U.S. Geological Survey. The Washington Post, Forbes, Yahoo! News, Reuters and The New York Times, among others, have all used similar technologies to release both fully and partially automated stories.
But journalists who fear that such technology might render them obsolete are missing the point. They can rest easy. The applications of fully automated journalism are still limited and, if anything, allow reporters to focus on more interesting tasks than writing quarterly earnings reports.
“The state of the technology is very much in the template-based approach, if you think of madlib type of things,” said Nick Diakopoulos, an assistant professor at Northwestern University and the author of the upcoming book Automating the News: How Algorithms Are Rewriting the Media. “Human journalists are writing templates that have gaps in them where data is inserted.”
The technology has a number of limitations. Automated journalism requires readily available, highly structured data, which is why it lends itself to sports game recaps, weather reports and financial earnings reports. When such data exists, software can gather it autonomously, perform basic analysis, like comparing a company's earnings from one year to the next, and plug the numbers into human-made templates to produce fully automated stories.
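As a toy illustration of that pipeline, here is what filling an earnings “madlib” might look like in Python. The company and its figures are invented, and production systems choose among many templates rather than one:

```python
# Toy illustration of the template ("madlib") approach. The company
# and figures are invented; real systems select among many templates.

company = {"name": "Acme Corp", "eps_now": 1.42, "eps_prior": 1.17}

# Basic analysis: compare this year's earnings per share to last year's.
direction = "up" if company["eps_now"] > company["eps_prior"] else "down"

template = ("{name} reported earnings of ${eps_now:.2f} per share, "
            "{direction} from ${eps_prior:.2f} a year earlier.")
print(template.format(direction=direction, **company))
# -> Acme Corp reported earnings of $1.42 per share, up from $1.17 a year earlier.
```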
“That was one of the early promises of automated journalism, that every high school would have a story for their game and you wouldn’t have to send a reporter to every game,” said Joe Germuska, director of Northwestern University’s Knight Lab, where journalism and computer science students collaborate on tools for media makers. “But somebody actually has to structure the data and schools aren’t doing that.”
Outside the world of sports reporting, structured data can be hard for local news outlets to come by, even when dealing with government entities. Documents produced by local governments often contain unstructured data, sometimes published in formats an algorithm would have to be specially tailored to recognize, raising the question of whether a small outlet's resources are best spent on technology with such limited applications.
Cost is another prohibitive factor. Because of this barrier, fully automated journalism is used almost exclusively by large news outlets like Reuters and the AP. Smaller news organizations also have little need for the volume of stories automated journalism makes possible, like recaps of hundreds of minor league baseball games.
Yet the relatively new field of computational journalism, the umbrella under which automated journalism falls, offers many tools that aid journalists rather than replace them. While fully automated journalism has limited applications, other forms of artificial intelligence can help journalists with tasks across many facets of content creation.
Brent Jones, a journalist at KWMU, St. Louis’ NPR station, has spent the last seven years working on the tech side of news. After graduating from Southern Illinois University at Carbondale in 2007 with a degree in journalism, he began working at the St. Louis Beacon, largely doing web design.
“Every once in a while a project would come up where we would need a project page built or we’d need a fairly simple graphic, so I would create that,” said Jones. “That part of it really interested me, so I started working more as that. I went to a computer-assisted reporting boot camp at the Investigative Reporters and Editors organization in Columbia at Mizzou … and that really kind of opened my eyes.”
When the St. Louis Beacon merged with KWMU, which already had an IT department, Jones pivoted to working with reporters on big-data projects and building internal tools to help reporters present and promote their stories.
“The data journalism is a big part of it, and seeing sort of how we can … take some of the burden off the reporters to perform some of those repetitive, boring, sometimes difficult tasks that a computer can do well,” Jones said. “That’s really a thing that I see as my goal, to figure out how I can make their lives easier by taking away some of that stuff and letting them do some of the more interesting and challenging work that is not easy for a computer.”
For a story on the podcast “We Live Here,” Jones helped reporters analyze disciplinary data from schools across Missouri to determine how race affected the rate of suspensions and other disciplinary measures for students.
“Being able to program a little bit, we were able to look at the statistics across the entire state and look at how many suspensions were given out, combine that with enrollment statistics, and find where there were imbalances or where suspensions given to minority students were sort of out of proportion with their enrollment,” Jones said.
Jones was able to build a database where listeners could zero in on their local school district and compare that data to the state-wide set.
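Jones has not published that code, but the heart of the analysis, comparing each group's share of suspensions to its share of enrollment, can be sketched with the pandas library. The file and column names below are invented for illustration:

```python
import pandas as pd

# Hypothetical sketch of the "We Live Here" comparison; the file and
# column names are invented, not the station's actual data.
districts = pd.read_csv("discipline_by_district.csv")

# Share of suspensions vs. share of enrollment for Black students.
districts["susp_share"] = (districts["black_suspensions"]
                           / districts["total_suspensions"])
districts["enroll_share"] = (districts["black_enrollment"]
                             / districts["total_enrollment"])

# A ratio above 1 means suspensions are out of proportion with enrollment.
districts["disparity"] = districts["susp_share"] / districts["enroll_share"]
print(districts.sort_values("disparity", ascending=False).head(10))
```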
Personalizing stories and using large swaths of data are not uncommon uses of computational assistance. While Jones' work is not the most sophisticated example of such practices, it stands out because of the small scale of the station and the limited resources he drew on: without computational assistance, a team that size could not have analyzed data at that scale.
The New York Times, for instance, published a story in 2015 that used a reader's geolocation data to personalize the text based on where it was being read. The story examined how children's financial opportunities were affected by the area where they grew up, and its text changed to include data on cities near the reader as reference points. As in a fully automated story, human-written templates and structured data filled the gaps the reporters left for the personalized sections.
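The Times has not released that code either, but the personalization step, picking a nearby reference city and writing it into a sentence, might look something like this sketch; the cities, coordinates and figures are invented:

```python
import math

# Hypothetical sketch of location-based personalization. The cities,
# coordinates and income figures are invented for illustration.
cities = [
    {"name": "Evanston", "lat": 42.05, "lon": -87.69, "income_effect": "+8%"},
    {"name": "Gary", "lat": 41.59, "lon": -87.35, "income_effect": "-12%"},
]

def rough_distance(lat1, lon1, lat2, lon2):
    # Flat-earth approximation; fine for ranking nearby cities.
    return math.hypot(lat1 - lat2, (lon1 - lon2) * math.cos(math.radians(lat1)))

reader_lat, reader_lon = 42.0, -87.7  # from the reader's geolocation

nearest = min(cities, key=lambda c: rough_distance(reader_lat, reader_lon,
                                                   c["lat"], c["lon"]))
print(f"In nearby {nearest['name']}, growing up changes expected "
      f"income by about {nearest['income_effect']}.")
```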
AI can also help reporters find stories. Reuters developed a program called News Tracer to monitor social media, helping it break stories faster than other outlets and determine the veracity of viral trends. According to a report published by Reuters, the software “runs machine-learning algorithms on a percentage of Twitter’s 700 million daily tweets to find breaking news. These algorithms look for clusters of tweets that are talking about the same event and the tool then generates a newsworthiness rating, questioning whether the event is worth reporting.”
The software helps reporters verify the events in question by identifying likely eyewitnesses and performing a rudimentary background check on the Twitter accounts where a story originated, using signals like whether an account is verified and who its followers are.
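News Tracer's internals are proprietary, so the following is only a crude sketch of what scoring a cluster of tweets by account signals could look like; the weights, fields and accounts are all invented:

```python
# Crude sketch of credibility scoring from account signals, in the
# spirit of the "rudimentary background check" described above. The
# weights and example accounts are invented, not Reuters' actual model.

def credibility_score(account):
    score = 0.0
    if account["verified"]:
        score += 0.4
    if account["followers"] > 10_000:
        score += 0.3
    if account["account_age_days"] > 365:
        score += 0.3
    return score

cluster = [
    {"handle": "@witness1", "verified": False, "followers": 210, "account_age_days": 40},
    {"handle": "@localtv", "verified": True, "followers": 88_000, "account_age_days": 2900},
]

# Average the account scores to rate the cluster as a whole.
rating = sum(credibility_score(a) for a in cluster) / len(cluster)
print(f"Cluster credibility: {rating:.2f}")
```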
News Tracer gave Reuters a head start on stories like the San Bernardino shooting and a bombing in Brussels.
The breadth of possible applications of AI in newsrooms is still being explored by journalists and developers. At the Knight Lab, Germuska is currently overseeing the development of a project called “Watch Me Work,” which would perform background research on subjects in reporters’ stories as they type them.
“It’s still in pretty early days, but the idea is that it finds, sort of extracts, the subjects of your story from your text, like the people and the nouns, things like that,” said Germuska. “It is meant to be present while you’re working in, say, a Google Doc, and to offer smart advice about information you might be needing for the work you’re working on with contextual clues and that kind of thing, but it’s not really aimed at doing the work for the journalist as much as making the journalist more effective.”
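Watch Me Work's code is not public, but the extraction step Germuska describes is standard named-entity recognition, which the spaCy library handles in a few lines (this assumes its small English model has been downloaded, and the draft sentence is invented):

```python
import spacy

# Standard named-entity extraction of the kind Germuska describes;
# this is not Watch Me Work's actual code. Requires the model from:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

draft = ("Senator Jane Doe met with officials from the EPA "
         "on Tuesday in Springfield.")

for ent in nlp(draft).ents:
    # Each extracted entity could seed a background search for the reporter.
    print(ent.text, ent.label_)
```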
As the technology grows more sophisticated and more readily available, smaller news outlets may begin to invest in technologies that assist reporters instead of writing stories for them.
AI has also been used to generate automated videos based on text, test multiple headlines for a story to see which one generates the most clicks, and create graphics that visualize data, among other things.
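Headline testing in particular is simple enough to sketch. Below is a toy epsilon-greedy version in Python, which mostly shows the headline with the best click-through rate so far and occasionally tries the others; the headlines and numbers are invented, and real newsrooms run this on full A/B-testing infrastructure:

```python
import random

# Toy epsilon-greedy headline test; headlines and stats are invented.
stats = {
    "Quake Shakes Downtown": {"shown": 0, "clicks": 0},
    "4.8 Earthquake Hits City Center": {"shown": 0, "clicks": 0},
}

def pick_headline(epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(list(stats))  # explore a random headline
    # Exploit: the highest observed click-through rate so far.
    return max(stats, key=lambda h: stats[h]["clicks"] / max(stats[h]["shown"], 1))

headline = pick_headline()
stats[headline]["shown"] += 1  # record the impression; clicks are logged on tap
```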
The spread of AI in newsrooms also raises a number of questions for the journalists using it. For example, what should a byline look like on a story that was automated, in full or in part? What is a news agency's responsibility to provide algorithmic transparency, letting readers see what drives the algorithms generating the content they consume? Algorithms have the potential to be as biased as any reporter, since they are made by humans and often have to be trained to recognize patterns in data that can itself be biased.
“If you train algorithms based on biased data, you’re going to get biased algorithms. And that is a thing we sort of have to be on the lookout for,” said Jones.
Algorithms can also be prone to errors, which reporters have to watch for especially carefully. When content is produced at mass scale, even by software with a strong record of accuracy, reporters and editors may grow lax about fact-checking.
In May 2015, The LA Times’ Quakebot falsely reported two earthquakes. A large seismic event in the Pacific Ocean near Japan apparently affected USGS equipment in California, and Quakebot then published two reports stating that earthquakes of magnitudes 4.8 and 5.5 had hit the state.
The incident underscores the importance of human oversight of news-generating algorithms, even for software with a proven track record.
Yet such errors are possible for human reporters, too. A 2015 study by German professor Andreas Graefe found that consumers preferred reading human-written content but found automated news more credible. The study's 986 participants were given articles, some correctly and some incorrectly labeled as written by a human or an algorithm. Participants' opinions did not vary significantly based on whether an article was said to be written by a human or an algorithm, but articles actually written by algorithms were consistently rated less readable and more trustworthy than their human-written counterparts.
“I tend to be of the mind that you can’t put genies back in the bottle,” Germuska said. “So if there’s really a case where the tool does the job better than people, then great. Let the tool do that and people can do better stuff.”
It is unlikely that AI will take the jobs of human journalists anytime soon.
Andrea Guzman, a former journalist, a professor at Northern Illinois University and the author of several books on how consumers interact with AI, including automated journalism, has followed the technology's development since its infancy.
While automated journalism used to be a discussion she would have once a semester in a few of her classes, she now teaches whole courses on the subject.
“People automatically jump to ‘well, machines are going to replace humans,’” said Guzman, pointing out that such thinking lacks nuance. “But there’s also a lot of journalists using these as tools, right? And using them to help improve their reporting.”
Guzman recalled talking about automated journalism at a conference in 2015 with student journalists and veterans alike. After her lecture, a newspaper editor approached her and chided her for giving students “misinformation” about automated journalism becoming a larger part of newsrooms.
“You always want to be careful in discussing these things,” said Guzman. “‘What do we have now? Where could it go?’ The ‘where could it go,’ we know we’re not historically very good at predicting these things. I mean, look at how newspapers and journalists reacted to the internet, you know, and how they thought that would end up.”
Ian Karbal is a Chicago-based freelance journalist. He can be found on Twitter at @iankarbal.