By Joe Lanane, LMA Contributor

Machine learning could soon make it easier for news publishers to monitor the equity of their coverage in real time.

A new content analysis tool from the Lenfest Local Lab and the Brown Institute for Media Innovation could help outlets audit the language, locations and visuals used in news stories. The resulting data can give news leaders insights as they work to address diversity and representation in their coverage. Audits like this aren't routinely performed in newsrooms today because the process can be time-consuming and costly.


“The biggest challenge still is bandwidth,” said Sarah Schmalbach, director of the Lenfest Local Lab, a local news product and user experience innovation team. “That is why we’re building a tool that allows newsrooms to zoom out and analyze the representation of their coverage.”

Newsrooms may think they are aware of the diversity of their coverage, both in a geographic and racial context. But for all the emphasis on audience analytics, Schmalbach believes many news operations could spend more time evaluating how their own resources are allocated.

In an attempt to streamline these efforts, the team behind the project is building an open-source tool that can scan large content archives as well as recent stories for data. The Philadelphia Inquirer, which is owned by the Lenfest Institute for Journalism, will be the first to test the tool, which is being developed with support from the Google News Initiative Innovation Challenge.

“The tool will support newsrooms who want to have a broader look at their body of work,” Schmalbach said.

A recent test product used natural language processing technology to analyze content data; however, language analysis alone couldn't extract locations as accurately as needed, said Michael Krisch, deputy director of the Brown Institute, a collaboration between Columbia University's journalism school and Stanford University's engineering school.

“One thing that became obvious was there’s no way to scale that process,” he said. “The insights just don’t translate correctly, so we started thinking about ways tech could assist editors.”


Machine learning proved more reliable in pulling location data from stories, he said. Once the technology is improved, it could help other newsrooms identify geographic coverage gaps more efficiently and adjust editorial strategies and business structures that might be contributing to underrepresentation.
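To make the idea concrete, here is a minimal sketch of the kind of pipeline such a tool might run: pull location mentions out of story text, then count how often each place appears. This is an invented illustration, not the team's actual implementation — the real tool uses machine learning models rather than the simple gazetteer string-matching shown here, and the neighborhood names and story text are placeholders:

```python
# Illustrative sketch: extract place mentions from stories and tally them.
# The actual Lenfest/Brown tool uses trained ML models; this toy version
# matches against a small, hand-picked gazetteer to show the basic
# stories -> locations -> coverage-counts flow.
from collections import Counter

GAZETTEER = {
    "Fishtown", "Kensington", "Germantown", "Center City",
}

def extract_locations(text: str) -> list[str]:
    """Return every gazetteer place name that appears in the text."""
    return [place for place in GAZETTEER if place in text]

def coverage_counts(stories: list[str]) -> Counter:
    """Count how often each place is mentioned across a set of stories."""
    counts = Counter()
    for story in stories:
        counts.update(extract_locations(story))
    return counts

stories = [
    "A new mural was unveiled in Fishtown on Saturday.",
    "Center City businesses report rising foot traffic.",
    "Fishtown residents debate a proposed zoning change.",
]
print(coverage_counts(stories))  # Fishtown mentioned twice, Center City once
```

Aggregating counts like these across an entire archive is what lets editors "zoom out" and see which places their coverage favors or neglects.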

“The goal is to make a tool that can move well beyond the Inquirer,” Krisch said. “The beauty of machine learning is that it’s able to adapt and learn from the environment it’s in.”

As part of the GNI Innovation Challenge, the Brown Institute is working with the Lenfest Local Lab to further improve the location data extracted by this tool. After improving location identification, the team hopes to incorporate outside datasets — such as U.S. Census stats — that add additional context to the analysis. That combination of data could also help beat reporters research and focus news coverage.
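The payoff of adding outside datasets is that raw story counts can be normalized against who actually lives in each area. A hedged sketch of that comparison, with all figures invented placeholders rather than real census or Inquirer data:

```python
# Illustrative comparison of story counts per neighborhood against
# population, to flag possible coverage gaps. All numbers are invented
# placeholders, not real census or newsroom data.

story_counts = {"Fishtown": 120, "Kensington": 15, "Center City": 200}
population = {"Fishtown": 25_000, "Kensington": 75_000, "Center City": 60_000}

def stories_per_10k(counts: dict, pop: dict) -> dict:
    """Stories per 10,000 residents: a crude per-capita coverage rate."""
    return {
        area: round(counts.get(area, 0) / pop[area] * 10_000, 1)
        for area in pop
    }

rates = stories_per_10k(story_counts, population)
# Sorting ascending puts the least-covered areas first.
for area, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(area, rate)
```

In this toy example, the most-mentioned neighborhood by raw count is not the best covered per capita, which is exactly the kind of insight per-resident normalization against census data can surface.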


“Incorporating outside datasets could contextualize beat coverage in ways we haven’t been able to do before,” said Ana Graciela Méndez, Lenfest Local Lab special projects editor.

Once a large content set is analyzed, the tool could also be used to organize coverage around communities in ways that benefit residents. For example, publishers might be able to deliver stories to readers about their neighborhood, school or part of town.

The project, once complete, could eventually complement similar industry efforts to analyze content in real time, whether those are content management systems that require reporters to manually enter source diversity data or dashboards that showcase this information.

“We are excited to be working alongside other DEI-focused Google GNI Innovation Challenge winners, including Gannett, Next City and Bloom Labs, because we are by no means the first or last people to be working in this space,” Schmalbach said.