Databases in Inorganic Chemistry from a Publication Statistics Perspective

Introduction. At the ACS National Meeting in San Diego, I gave two talks. The theme of the conference was "Computers in Chemistry". My first talk therefore dealt with the usefulness of the local order parameters that we recently developed in our NaCl nucleation study for database-related chemical science. When I was thinking about how to motivate our efforts concretely, I figured that a convincing argument could be found in the two most important scientific currencies: data and publications.

Methodology. So, I went to Thomson Reuters' Web of Science™ on March 4, 2016. I used their Core Collection database to perform an advanced search restricted to the "Chemistry, Inorganic & Nuclear" subject area because my talk was in a session of the Division of Inorganic Chemistry. Starting from 1985, I determined two publication numbers per year: first, the total number of articles in the "Chemistry, Inorganic & Nuclear" subject area; second, the (smaller) number of those articles that included the keyword "database" in the topic field. Appending a wildcard character to the end of the keyword also captured desired variants such as "databases". The results are depicted below.
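For readers who want to repeat the search, here is a minimal sketch of how the two per-year query strings could be put together. It is not the exact query I typed: the field tags `WC` (category), `PY` (publication year) and `TS` (topic) are Web of Science advanced-search tags, but their use here, the quoting, and the helper name `build_queries` are assumptions made for illustration.

```python
def build_queries(year, category="Chemistry, Inorganic & Nuclear", keyword="database*"):
    """Return (total_query, keyword_query) for one publication year.

    Assumption: the subject area is addressed with the WC field tag.
    The trailing * on the keyword is the wildcard that also matches "databases".
    """
    total_query = f'WC=("{category}") AND PY={year}'
    keyword_query = f"{total_query} AND TS=({keyword})"
    return total_query, keyword_query


# One pair of queries per year, starting from 1985 as in the text.
queries = {year: build_queries(year) for year in range(1985, 2016)}
print(queries[1985][1])
```

Each query string would be pasted into the advanced search form, and the reported hit count noted down by hand for that year.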

Results. The number of published articles in the Web of Science™ subject area "Chemistry, Inorganic & Nuclear" that include the keyword "database" started to rise significantly around 1991 (gray boxes in the figure above). Although there is some obvious fluctuation, the general trend is a clear increase over the past 25 years. But we all know that global scientific output is ever increasing. A 2014 Nature News Blog post by Richard Van Noorden highlighted that the output currently doubles roughly every nine years. The estimate is based on a bibliometric analysis by Lutz Bornmann and Rüdiger Mutz that aimed at identifying growth rates and phases in modern science. Hence, the more appropriate figure for rating database-related Inorganic Chemistry is the fraction of articles including the keyword "database" relative to the total output in the considered Web of Science™ subject area for a given year. That fraction is plotted as a blue curve (right y-axis), and it closely follows the absolute number of articles with the keyword "database". While the increasing trend is confirmed, both curves seem to approach plateau values.
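To put the nine-year doubling time into perspective, and to make the normalization explicit, here is a small sketch. The function names are my own, and the counts passed to `fraction_percent` are made-up placeholders, not the Web of Science numbers.

```python
def annual_growth_rate(doubling_time_years):
    """Constant yearly growth rate that doubles the output in the given time."""
    return 2.0 ** (1.0 / doubling_time_years) - 1.0


def fraction_percent(n_keyword, n_total):
    """Share of keyword articles in the total output of one year, in percent."""
    return 100.0 * n_keyword / n_total


rate = annual_growth_rate(9.0)
print(f"{100.0 * rate:.1f} % growth per year")

# Placeholder counts for illustration only: 5 keyword hits among 1000 articles.
print(f"{fraction_percent(5, 1000):.2f} % of articles")
```

A doubling every nine years corresponds to roughly 8% growth per year, which is why the absolute counts alone say little and the relative fraction is the fairer measure.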

Discussion. Given the approach-to-plateau observation and a 2006 blog post by Rich Apodaca that came in the midst of the chemistry database revolution, I became even more intrigued by the data. I asked myself: "Where could a likely ceiling of the relative output lie?" As a trained chemical engineer, I naturally resorted to chemical kinetics and modeled the evolution as a first-order reaction. For such a reaction, the product (here, the fraction of database articles) approaches its final value exponentially: it rises quickly at first and then levels off. Thus, I fitted a saturating exponential function to the data, which yielded a possible plateau value of 0.57%.
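A minimal sketch of such a fit is given below. It is not the script I used: the functional form f(t) = plateau * (1 - exp(-rate * (t - t0))), the fixed onset year t0 = 1991, the rate grid, and all names are assumptions for illustration. The data are synthetic and noise-free, generated from the model itself with a plateau of 0.57 and an arbitrary rate of 0.10 per year, so the fit only demonstrates that the procedure recovers known parameters.

```python
import math


def saturating_model(t, plateau, rate, t0):
    """First-order approach to a plateau: zero at t0, tends to plateau for large t."""
    return plateau * (1.0 - math.exp(-rate * (t - t0)))


def fit_plateau(years, fractions, t0, rates):
    """Scan candidate rates; for each one the best plateau follows from
    linear least squares in closed form. Returns (plateau, rate, sse)."""
    best = None
    for rate in rates:
        g = [1.0 - math.exp(-rate * (t - t0)) for t in years]
        denom = sum(gi * gi for gi in g)
        if denom == 0.0:
            continue
        plateau = sum(yi * gi for yi, gi in zip(fractions, g)) / denom
        sse = sum((yi - plateau * gi) ** 2 for yi, gi in zip(fractions, g))
        if best is None or sse < best[2]:
            best = (plateau, rate, sse)
    return best


# Synthetic illustration data (NOT the Web of Science numbers).
T0 = 1991
years = list(range(1991, 2016))
fractions = [saturating_model(t, 0.57, 0.10, T0) for t in years]

candidate_rates = [i / 1000 for i in range(1, 501)]  # 0.001 to 0.500 per year
plateau, rate, sse = fit_plateau(years, fractions, T0, candidate_rates)
print(f"plateau = {plateau:.2f} %, rate = {rate:.3f} per year")
```

With real, noisy data the plateau estimate depends strongly on how much of the leveling-off has already been observed, so the fitted ceiling should be read as a rough extrapolation.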

Conclusions. Clearly, the role of databases in Inorganic Chemistry has become more and more important. The future will reveal whether or not the relative scientific output in this context will reach a level close to the 0.57% plateau prediction. After all, a second "database revolution" could kick in because Bornmann and Mutz showed that growth rates of scientific output can change significantly and quite abruptly over time. So, let's see.



Acknowledgements. My research is currently financially supported through the Materials Project by the Department of Energy’s Basic Energy Sciences program under Grant No. EDCBEE. The figure was generated with gnuplot. Finally, I would like to thank Minke Zimmermann for proofreading this blog entry.