Is the scientific process broken?

Science is, at least in principle, about making discoveries about how the world works. Equally important is communicating these discoveries to others, which usually means presenting the findings (and the methods used to achieve them) at conferences or publishing them as papers in scientific journals.

To ensure that the methods are sound and the conclusions reasonable, papers are subject to peer review before publication. After an initial scrutiny, the journal editor sends the draft to respected scientists in the relevant field, who give their opinions on the quality of the work and their recommendations for publication, rejection, or revision. Unless the paper is rejected outright, these comments go back to the authors, who are given the opportunity to revise and resubmit the draft. In the end, hopefully, the editor and the reviewers are satisfied, the draft becomes a published article, and the world has expanded its scientific library with yet one more brick of knowledge.

And equally important, at least to the authors of the freshly minted paper, is being able to add another line to their resumes. Since scientific results are reported in academic papers, the reasoning goes, a scientist with his or her name on many papers must surely be a more productive and generally better scientist. Moreover, if other scientists go on to read the paper and build upon the results, citing the paper in their own papers in the process, the paper is surely an important one.

My situation

This spring, my team submitted a paper to a scientific journal for consideration. As usual, we received reviews of varying quality. In particular, one of the reviewers had responded with a terse list of fairly generic comments. Provide more detail in the introduction. Clarify the language in this section. Things like that.

Fine, we can do this: having papers published is important, and we can rephrase the text to satisfy the reviewer easily enough. The general sentiment of this review was, after all, more positive than that of the other reviewer.

But one paragraph of the review is worth quoting in full:

I believe that incorporating relevant and recent academic sources could further strengthen your paper's validity and provide readers with more context and background on the topic. as follows. Genghis Khan shark optimizer, Geyser Inspired Algorithm; Prairie dog optimization algorithm, Dwarf mongoose optimization algorithm, Gazelle Optimization Algorithm, Lungs performance-based optimization, Multi-objective Snow Ablation Optimization Algorithm, A sinh cosh optimizer

This comment immediately struck me as puzzling. The reviewer thinks a list of very specific optimization algorithms should be discussed in our manuscript, but our work was not primarily about optimization. Why would these specific works be relevant to us? And if these are seminal works, why were they all completely unknown to me?

A quick investigation

A few minutes with Google Scholar converts the reviewer's request into the following list of papers:

  • Gang Hu, Yuxuan Guo, Guo Wei & Laith Abualigah: Genghis Khan shark optimizer: A novel nature-inspired algorithm for engineering optimization. Advanced Engineering Informatics 58 (2023)
  • Mojtaba Ghasemi, Mohsen Zare, Amir Zahedi, Mohammad-Amin Akbari, Seyedali Mirjalili & Laith Abualigah: Geyser Inspired Algorithm: A New Geological-inspired Meta-heuristic for Real-parameter and Constrained Engineering Optimization. Journal of Bionic Engineering 21, 374–408 (2024)
  • Absalom E. Ezugwu, Jeffrey O. Agushaka, Laith Abualigah, Seyedali Mirjalili & Amir H. Gandomi: Prairie Dog Optimization Algorithm. Neural Computing and Applications 34, 20017–20065 (2022)
  • Jeffrey O. Agushaka, Absalom E. Ezugwu & Laith Abualigah: Dwarf Mongoose Optimization Algorithm. Computer Methods in Applied Mechanics and Engineering 391 (2022)
  • Jeffrey O. Agushaka, Absalom E. Ezugwu & Laith Abualigah: Gazelle optimization algorithm: a novel nature-inspired metaheuristic optimizer. Neural Computing and Applications 35 (2023)
  • Mojtaba Ghasemi, Mohsen Zare, Amir Zahedi, Pavel Trojovský, Laith Abualigah & Eva Trojovská: Optimization based on performance of lungs in body: Lungs performance-based optimization (LPO). Computer Methods in Applied Mechanics and Engineering 419 (2024)
  • Sundaram B. Pandya, Kanak Kalita, Robert Čep, Pradeep Jangir, Jasgurpreet Singh Chohan & Laith Abualigah: Multi-objective Snow Ablation Optimization Algorithm: An Elementary Vision for Security-Constrained Optimal Power Flow Problem Incorporating Wind Energy Source with FACTS Devices. International Journal of Computational Intelligence Systems 17, 33 (2024)
  • Jianfu Bai, Yifei Li, Mingpo Zheng, Samir Khatir, Brahim Benaissa, Laith Abualigah & Magd Abdel Wahab: A Sinh Cosh optimizer. Knowledge-Based Systems (2023)

Perhaps you would take a minute to look over this list and see if you can figure out what's going on here?

Scientific metrics

In case it wasn't obvious, all of these works have a single author in common. Odds are this person also happens to be our reviewer, and that he suggests inclusion of his own works in reviews in order to boost citation numbers, and thus his scientific prestige. Now why would he do that?

Scientists compete for prestige, which in turn can be converted into promotions, job offers, successful research grants, and so on. This means that the merits of a scientist must often be judged by non-scientists, or at least by scientists from different fields. It is therefore convenient to have simple metrics (or at least, metrics of manageable complexity) to quickly rank and compare scientists.

One such metric is the number of papers the scientist has published, especially papers that have been accepted after peer review. (In many fields, conferences often accept abstracts with little evaluation, so these usually don't count, or count less.) Obviously, content matters as well, and here the number of citations is the currency of the land. Journals keep track of how often the papers they publish get cited, and report the average number of citations per article (typically over the preceding two years) as the journal's impact factor. One way to maximize prestige is to try to publish in journals with the highest impact factor, which is assumed to correlate with a selective editorial staff and a high bar for acceptance.

A more direct approach is to simply count the citations of each individual article. After all, my paper may be more or less important than the average paper published in the same journal, so why should I get credit from other authors' papers, or they from mine? To balance these two measures - the number of citations and the number of publications - we have the h-index: the largest number h such that h of the author's publications have been cited at least h times each. This prevents both the author who churns out tons of papers that nobody reads and the one-hit wonder who had a single lucky strike from maximizing their score.
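To make these metrics concrete, here is a minimal sketch in Python, using made-up citation counts: the average number of citations per paper, which is the kind of figure behind the impact factor, and the h-index as defined above.

    # Made-up citation counts for the papers of a hypothetical author (or journal).
    citations = [42, 17, 9, 6, 5, 3, 1, 0]

    # An impact-factor-style figure is just the average number of citations per paper.
    average_citations = sum(citations) / len(citations)

    def h_index(cites):
        """Largest h such that h papers have at least h citations each."""
        h = 0
        for rank, count in enumerate(sorted(cites, reverse=True), start=1):
            if count >= rank:
                h = rank
            else:
                break
        return h

    print(f"average citations per paper: {average_citations:.1f}")  # 10.4
    print(f"h-index: {h_index(citations)}")                         # 5

Here the h-index is 5, since five of the papers have at least five citations each; piling up either a mass of uncited papers or a single runaway hit will not move it much.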

If you check a scientist's profile on Google Scholar or similar sites that aggregate this information, it will display the list of publications as well as the h-index and other statistics.

Analysis

If we go to Google Scholar and look up our presumed reviewer, we find that over a short career he has collected about 25,000 citations, good for an h-index of about 70. This year alone he has published 150 papers, and it is still only August.

For comparison, I looked up the invited keynote speakers of ICCV, a major computer vision conference. Keep in mind that these are established researchers, leaders in their field, who - in contrast to the rest of us, who must fight for a few minutes in the limelight and pay for attendance - are personally invited by the organizers to share their insights. The six keynote speakers have h-indices ranging from 28 to 95 (the only one surpassing our reviewer's score), with an average of 54. In other words, by citation metrics, our reviewer is a shining star in the night sky of scientific achievement.

More realistically, it demonstrates the glaring flaws of the publication process, of the system of peer review, and of using citations as a proxy for merit. And the problem is one of incentives.

For the authors, it is important to get the paper published. The journal editor will almost always defer to the referees, so the authors need to placate them. Sometimes requests will be unreasonable or difficult to address, for instance, asking for a different experimental setup or stating that the work isn't of sufficient novelty or importance. But including a few more references is simple enough. Including a section on related works is both traditional and considered polite, so if including a sentence or two on obscure optimization algorithms is what gets your article published, many authors will do what the reviewer asks.

Journals have increasingly turned from a funding model based on subscriptions to one based on author payments. Institutional libraries used to be a major funder of journals, paying often expensive subscriptions so that their researchers would have access. With the recent trend towards Open Access, it is increasingly common that access is unmetered, and instead the authors pay a fee to have their article published.

For the reviewers, there are few incentives. The work is done without compensation, and while there are attempts to collect and publish statistics on refereeing, these statistics, if they are used at all, carry nowhere near the prestige of publications and citations. The result of all of this is a publishing scene where journals and authors are eager to publish a large number of papers, but where it is becoming increasingly difficult to find reviewers.

The result is that individual scientists receive more and more review requests, often from unknown journals, and often with poorly written abstracts outside their area of expertise. As review requests become less and less targeted, it is more and more tempting to send them all directly to the junk folder, along with the numerous invitations to edit special issues or participate in conferences.

The easy road to academic success

All of this adds up to an unfortunate recipe for success in a globalized academia squeezed between the profitability of journals, scientists' desperation to publish, and the focus on quantitative measures in hiring and funding:

First, find a suitable recipe for mass producing papers. These days you can probably use a text-generating AI to help you write manuscripts quickly and with acceptable quality.

Second, find a journal that is not too critical in its editorial process, and pay the required fee to publish your papers. Third, say yes to every review request you get, and always ask the authors to cite a bundle of your papers. Most will acquiesce, and in no time you will have a list of highly cited papers on your resume, and the bibliometrics to rival established scientists and even the leaders of your field.

Can scientific publishing be salvaged?

Goodhart's law states, loosely speaking, that as soon as a measure becomes the target, it ceases to be a good measure. This is the curse of all performance statistics and key performance indicators: the moment you use them to calculate rewards, people will start to optimize for the measure and not for the actually desired result. KPIs can work to the extent that they measure the desired goal closely and precisely, but it should be evident that citation counts and article counts are not why we, as a society, do science.

One way out of the mess could be to patch up the measures. It is possible to remove self-citations; the Scopus indexing service, for instance, claims to provide this as an option (available to subscribers only). Google famously built its search on the PageRank algorithm, where the importance of a web page depends not only on the number of links to it, but also on the (recursive) importance of the linking pages. It is straightforward to adapt this to scientific citations as well, but as we are all aware, this algorithm can also be gamed: there is a whole field called search engine optimization more or less specializing in thwarting Google's attempts at improving your search experience. Still, perfect is the enemy of good, and even such fairly straightforward improvements tend to be ignored by the scientific community. Perhaps even the small added complexity is too hard to grasp?
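As a rough illustration of how straightforward the adaptation is - a toy example on a made-up citation graph, not a proposal for a production metric - the power-iteration scheme behind PageRank can be run over citations, so that a citation from a heavily cited paper counts for more than one from an obscure paper:

    # Toy PageRank-style scoring of a citation graph (illustrative data only).
    # Each paper maps to the list of papers it cites.
    cites = {
        "A": ["B", "C"],
        "B": ["C"],
        "C": [],            # cites nothing (a "dangling" node)
        "D": ["A", "C"],
    }

    papers = list(cites)
    n = len(papers)
    damping = 0.85
    score = {p: 1.0 / n for p in papers}

    for _ in range(50):  # power iteration; 50 rounds is plenty for a toy graph
        new = {p: (1 - damping) / n for p in papers}
        for citing, cited in cites.items():
            targets = cited if cited else papers   # spread dangling mass evenly
            for p in targets:
                new[p] += damping * score[citing] / len(targets)
        score = new

    for paper, s in sorted(score.items(), key=lambda kv: -kv[1]):
        print(f"{paper}: {s:.3f}")

In this toy graph, C ends up on top because everything else cites it, and a citation from a high-scoring paper is worth correspondingly more; self-citations could simply be dropped from the graph before scoring.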

A second approach is to turn from quantitative measures to qualitative ones. After all, the peer review process at the foundation of scientific publishing is essentially a subjective expert judgement. Instead of rating journals by counting citations (the infamous impact factor), have a panel of experts rank journals within each field. This model is used by NOKUT (the Norwegian Agency for Quality Assurance in Education), sorting journals into either a high or a low tier, with the rest implicitly in a zero tier to be ignored. For individual researchers, there are formal qualifications for academic rank (from a meagre lab assistant up to tenured professor) evaluated by committees along the way, and there are awards and invitations. While this usually ensures that an effort has been made to evaluate the merits of the scientist, variations in practice across scientific fields and between nations make comparison difficult, and there is of course a risk of nepotism and scientific inbreeding.

A third option could be to revolutionize the scientific publishing process using new technologies. Think about it: current AIs may not be particularly inventive; they aren't able to cook up new material from whole cloth. But they are undeniably good at internalizing vast troves of knowledge and mimicking human output in form and content in a way that is seen as meaningful. These properties may make AIs less useful for novel scientific discovery, but the characteristics that give them an edge in large-volume paper mill production may also make them ideal for performing reviews. The criteria for judging manuscripts - form, style, effectiveness of communication, placing the work in the context of the existing literature, and novelty - are probably factors an AI would be well suited to evaluate. In the end, we may find that while AI was part of the problem, it is also the key to a solution.

Date: 2024-09

Author: Ketil Malde

Created: 2024-10-02 Wed 15:43
