AI spies questionable science journals, with some human help

About 1,000 of roughly 15,000 open access scientific journals appear to exist mainly to extract fees from naive academics.

A trio of computer scientists from the University of Colorado Boulder, Syracuse University, and China’s Eastern Institute of Technology (EIT) arrived at this figure after building a machine learning classifier to help identify “questionable” journals and then conducting a human review of the results – because AI falls short on its own.

A questionable journal is one that violates best practices and has low editorial standards, existing mainly to coax academics into paying high fees to have their work appear in a publication that fails to provide expected editorial review.

As detailed in a research paper published in Science Advances, “Estimating the predictability of questionable open-access journals,” scientific journals prior to the 1990s tended to be closed, available only through subscriptions paid for by institutions. 

The open access movement changed that dynamic. It dates back to the 1990s, as the free software movement was gaining momentum, when researchers sought to expand the availability of academic research. One consequence of that transition, however, was that the costs associated with peer review and publication shifted from subscribing organizations to authors.

“The open access movement set out to fix this lack of accessibility by changing the payment model,” the paper explains. “Open-access venues ask authors to pay directly rather than ask universities or libraries to subscribe, allowing scientists to retain their copyrights.”

Open access scientific publishing is now widely accepted. For example, a 2022 memorandum from the White House Office of Science and Technology Policy directed US agencies to come up with a plan by the end of 2025 to make taxpayer-supported research publicly available.

But the shift toward open access has led to the proliferation of dubious scientific publications. For more than a decade, researchers have been raising concerns about predatory and hijacked [PDF] journals. 

The authors credit Jeffrey Beall, a librarian at the University of Colorado, with applying the term “predatory publishing” in 2009 to suspect journals that try to extract fees from authors without editorial review services. An archived version of Beall’s List of Potentially Predatory Journals and Publishers can still be found. The problem with a list-based approach is that scam journals can change their names and websites with ease.

In light of these issues, Daniel Acuña (UC Boulder), Han Zhuang (EIT), and Lizheng Liang (Syracuse) set out to see whether an AI model could help separate legitimate publications from questionable ones using detectable characteristics (e.g., authors who frequently cite their own work).
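As a rough illustration of that approach – and not the authors' actual model, features, or data – the following Python sketch trains a classifier on synthetic per-journal signals of the kind the paper describes. The feature names and training labels here are hypothetical.

```python
# A minimal sketch of a feature-based journal classifier.
# The features, data, and labels are hypothetical illustrations,
# not the authors' actual dataset or model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

# Hypothetical per-journal features: self-citation rate, share of authors
# with verifiable affiliations, median days from submission to acceptance.
n = 2000
X = np.column_stack([
    rng.uniform(0, 1, n),      # self_citation_rate
    rng.uniform(0, 1, n),      # verified_affiliation_share
    rng.uniform(5, 365, n),    # median_days_to_acceptance
])

# Synthetic labels: in this toy setup, journals with heavy self-citation
# and implausibly fast acceptance are marked "questionable" (1).
y = ((X[:, 0] > 0.6) & (X[:, 2] < 60)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

The point of a model like this is triage, not verdicts: it ranks journals for the kind of human review the researchers describe, rather than replacing it.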

“Science progresses through relying on the work of others,” Acuña told The Register in an email. “Bad science is polluting the scientific landscape with unusable findings. Questionable journals publish almost anything and therefore the science they have is unreliable. 

“What I hope to accomplish is to help get rid of this bad science by proactively helping flag suspected journals so that professionals (who are scarce) can focus their efforts on what’s most important.”

Acuña is also the founder of ReviewerZero AI, a service that employs AI to detect research integrity problems.

Winnowing down a data set of nearly 200,000 open access journals, the three computer scientists settled on 15,191 of them.

They trained a classifier model to identify dubious journals, and when they ran it on the set of 15,191, it flagged 1,437 titles. But the model missed the mark about a quarter of the time, based on subsequent human review.

“About 1,092 are expected to be genuinely questionable, ~345 are false positives (24 percent of the flagged set), and ~1,782 problematic journals would remain undetected (false negatives),” the paper says.

“At a broader level, our technique can be adapted,” said Acuña. “If we care a lot about false positives, we can flag more stringently.” He pointed to a passage in the paper that says under a more stringent setting, only five false alarms out of 240 would be expected.
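Taking the figures quoted above at face value, a quick back-of-the-envelope sketch shows what they imply. Only the counts come from the paper as quoted; the code and the derived metrics are illustrative, not from the authors.

```python
# Back-of-the-envelope metrics from the counts quoted above.
flagged = 1437           # journals the model flagged
true_positives = 1092    # expected genuinely questionable among the flagged
false_negatives = 1782   # problematic journals the model would miss

precision = true_positives / flagged                          # ~0.76
recall = true_positives / (true_positives + false_negatives)  # ~0.38
print(f"precision ~= {precision:.2f}, recall ~= {recall:.2f}")

# Raising the decision threshold trades recall for precision: fewer
# journals get flagged, but a larger share of the flags are correct.
# Under the stricter setting Acuña cites, only 5 of 240 flags would be
# false alarms, i.e. precision of roughly 0.98.
print(f"stricter setting: precision ~= {(240 - 5) / 240:.2f}")
```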

Acuña added that while many AI applications today aim for full automation, “for such delicate matters as the one we are examining here, the AI is not there yet, but it helps a lot.”

The authors are not yet ready to name and shame the dubious journals – doing so could invite a legal challenge.

“We hope to collaborate with indexing services and assist reputable publishers who may be concerned about the degradation of their journals,” said Acuña. “We could make it available in the near future to scientists before they submit to a journal.” ®

