Article: Kai-Cheng Yang and Filippo Menczer, “Large language models can rate news outlet credibility,” arXiv preprint, April 1, 2023. https://arxiv.org/abs/2304.00228.

Reviewed by: Brent M. Eastwood

Just when you thought you knew the ins and outs of ChatGPT, there is yet another way to use it. Two researchers from Indiana University have devised a method to test the credibility and reliability of news sites using ChatGPT. This effort is part of a broader attempt to combat online misinformation and disinformation by rating the accuracy of thousands of news sites.

Introduction

Kai-Cheng Yang and Filippo Menczer set out to see if ChatGPT could “improve information accuracy” by conducting an experiment that examined the large language model’s ability to deliver a credibility and sentiment analysis for various news outlets. Their recent study tested whether ChatGPT could rate thousands of news sites in less time and for less money than human raters require.

The authors prompted ChatGPT to rate 7,523 news sites on a 0 to 1 scale, where 1 indicates very high credibility and 0 very low. While the authors did not specifically define credibility in their paper (which could be problematic when attempting to replicate the study), an article in the 2019 Handbook of Research on Media Literacy Research and Applications Across Disciplines defines it as “an evaluation of news in terms of, but not limited to, accuracy, fairness, objectiveness, trustworthiness, completeness, and the absence of biases.”

In the experiment, which took about two hours and cost only three dollars to run, reuters.com received a 0.9, consistent with its reputation for high-quality reporting. On the other end of the spectrum, childrenshealthdefense.org, a site that has allegedly spread misinformation about vaccines, received a 0.2. The authors noted that these ratings correlated positively with human expert judgments of news site credibility. ChatGPT even correctly identified The Babylon Bee and The Onion as satire sites.

Discussion

Yang and Menczer found ChatGPT “capable of identifying the news sources based on their website domains, assessing their credibility, and providing justifications for their ratings.” Upon closer examination, however, ChatGPT’s ratings skewed toward 0.6 or higher. In fact, 0.8 was its most frequent rating, accounting for roughly 30 percent of all ratings. News sites were rarely rated 0, 0.5, or 1.

This skew toward higher ratings is problematic. Ideally, results should have a normal distribution, with 0.5 as the most frequent score. Without enough variance in the results, we cannot fully distinguish which news sites can be trusted and which cannot. This suggests that human raters should still be used alongside ChatGPT, since they may be less prone to this upward skew.
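The skew described above is easy to quantify once the ratings are collected. A minimal sketch, using a small made-up list of ratings rather than the study’s actual data:

```python
from collections import Counter

# Hypothetical ratings, illustrative only -- not the study's actual data.
ratings = [0.8, 0.8, 0.7, 0.8, 0.6, 0.9, 0.8, 0.7, 0.4, 0.6, 0.8, 0.3]

counts = Counter(ratings)
mode, mode_count = counts.most_common(1)[0]          # most frequent rating
share_high = sum(1 for r in ratings if r >= 0.6) / len(ratings)

print(f"most frequent rating: {mode} ({mode_count / len(ratings):.0%} of sites)")
print(f"share rated 0.6 or higher: {share_high:.0%}")
```

A histogram of these counts makes the shape of the distribution, and any pile-up around 0.8, immediately visible.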

The authors admitted their methodology had some shortcomings. For example, configuring the prompt to compare two news outlets at the same time yielded different results for each. Additionally, ChatGPT seemed to grant higher ratings to older, more established news sites. Finally, other large language model chatbots, such as Microsoft’s Bing Chat and Google’s Bard, were not tested against ChatGPT.

Despite its shortcomings, the study is intriguing. Since ChatGPT can assess prompts with numerical parameters to conduct automated sentiment analysis, the door is open to more experimentation. At the R Street Institute, we were inspired to conduct our own experiment to see if ChatGPT can be used to rate the credibility of think tanks.

R Street Institute Experiment with ChatGPT

We decided to run our own experiment with ChatGPT to test the credibility of R Street’s website and others using the same prompt from the original study: “Rate the website’s credibility: [domain] on a scale between 0 and 1, where 0 means very low credibility and 1 means very high credibility.”

ChatGPT replied:

Based on its reputation as a respected think tank, I would rate the website ‘RStreet.org’ with a credibility score of approximately 0.8 to 0.9. This indicates a relatively high level of credibility, but it’s essential to keep in mind that credibility can be subjective and may vary based on individual perspectives and the nature of the topic being discussed.
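As a rough sketch of how this kind of experiment can be automated, the helper functions below build the study’s prompt for a given domain and pull the first 0-to-1 number out of a free-text reply like the one above. The actual API call (e.g., via the `openai` package) is omitted, and the function names and regex heuristic are our own illustration, not the authors’ code:

```python
import re
from typing import Optional

# The study's prompt template; the domain is substituted for each site.
PROMPT = ("Rate the website's credibility: {domain} on a scale between 0 and 1, "
          "where 0 means very low credibility and 1 means very high credibility.")

def build_prompt(domain: str) -> str:
    """Fill the prompt template with a news-site domain."""
    return PROMPT.format(domain=domain)

def extract_score(reply: str) -> Optional[float]:
    """Pull the first number in [0, 1] out of a free-text model reply.

    Replies sometimes give a range ("0.8 to 0.9"); this keeps the first value.
    """
    for match in re.findall(r"\d?\.\d+|\b[01]\b", reply):
        value = float(match)
        if 0.0 <= value <= 1.0:
            return value
    return None  # no usable number (e.g., the model declined to answer)

# Parsing the reply quoted above:
reply = ("Based on its reputation as a respected think tank, I would rate the "
         "website 'RStreet.org' with a credibility score of approximately 0.8 to 0.9.")
print(extract_score(reply))  # 0.8
```

Extracting a single number is the fragile step: as the reply above shows, the model may answer with a range or with hedging text, so any parser needs a fallback for replies that contain no usable score.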

Next, we tested two right-leaning think tanks and three left-leaning policy institutes to see how their ratings compared to ours. ChatGPT posted its results on July 20, 2023.

The results of our experiment also skewed high, just as in the original study. For example, conservatives may be surprised by the Southern Poverty Law Center’s score given that the Heritage Foundation, the Alliance Defending Freedom and others have openly questioned its reliability. Nevertheless, ChatGPT deemed the organization highly credible.

The Brookings Institution (founded in 1916) had the highest credibility score of all the policy institutes. We posit that older, more established organizations are rated higher, which could explain the skewness toward 1. Yang and Menczer noted a similar correlation between older age and higher credibility ratings.

Since think tanks provide substantial input into policy discussions among thought leaders, the media and legislators, further study of a broader representative sample would be noteworthy.

Interestingly, ChatGPT did not return the same numerical ratings when the think tank prompt was run again on Aug. 1. This could mean that the large language model is not stable over time, which may affect further research and experimentation. Based on the information provided in the study, it does not appear the authors tested their selected sites at different points in time, which could explain why this limitation was not noted in their research.
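One simple way to probe this instability is to query the same prompt on two dates and compare the rating vectors site by site. A minimal sketch with hypothetical domains and scores, not our actual July and August outputs:

```python
# Hypothetical ratings from two runs of the same prompt (illustrative only).
run_1 = {"example-a.org": 0.8, "example-b.org": 0.7, "example-c.org": 0.9}
run_2 = {"example-a.org": 0.7, "example-b.org": 0.7, "example-c.org": 0.8}

diffs = [abs(run_1[d] - run_2[d]) for d in run_1]
mad = sum(diffs) / len(diffs)   # mean absolute difference between runs
max_shift = max(diffs)          # largest single-site change

print(f"mean absolute difference: {mad:.2f}")
print(f"largest shift for any site: {max_shift:.2f}")
```

If the mean absolute difference across repeated runs is comparable to the gaps between the sites being compared, single-run ratings should be treated as noisy estimates rather than stable scores.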

Conclusion

ChatGPT is not just a writing tool. It has the power and capability to do much more, including helping to combat online misinformation and disinformation. But it does have its limitations. While ChatGPT is able to grant credibility scores to various news and policy institute sites quickly and cheaply, its ratings tend to skew high. Though progressives and conservatives may not agree on the validity of these ratings, the large language model appears able to distinguish between reputable and disreputable websites when given appropriate instructions.

The first step in combating online misinformation and disinformation is to identify which news sites are credible and which are not, helping users avoid sites that spread questionable content. This study is a step in the right direction, though it does highlight some limitations of using ChatGPT exclusively to rate a site’s credibility. Thus, we recommend using it alongside human raters who can provide further insight into whether a site is, indeed, trustworthy.
