Project

Title of the project

New Model for Automatic Classification of Scientific Documents based on Metadata Vectorization

Summary

Scientific research is vital for today's societies. Therefore, it is important not only to invest in it but also to optimize its performance. For this reason, scientometric studies are becoming ever more relevant.

Such studies need to break research down by scientific discipline, both to report on the development of each discipline and because citation counts in some disciplines are not comparable with those in others. Traditionally, this breakdown has relied on the journal classifications of the large bibliographic databases. These classifications, however, pose major problems for scientometric studies: journals usually publish works from several scientific disciplines, so most journals are assigned to more than one scientific category, and multidisciplinary categories exist as well. Consequently, each work inherits all of its journal's scientific categories, even though the work itself is usually more specialized and should in most cases be assigned to a single scientific category.

The use of these classifications therefore entails two errors. One is quantitative, arising when the research is broken down into scientific categories. The other is qualitative, since the qualitative indicators are based on comparing the citations a work receives with those of the other works in the same category.

In this project we propose to develop a method that automatically assigns each document to one of the categories, solving the problem while keeping the same taxonomy of science. To this end, we shall use the documents' metadata: the references, which, as Glänzel & Schubert (1999) said, are what best define the document, together with the keywords, title, and abstract.
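As a rough illustration of what vectorizing metadata and assigning a single category could look like, the sketch below builds a sparse bag-of-features vector from a record's title, abstract, keywords, and references, and assigns the category whose centroid is most similar by cosine similarity. This is not the project's actual method: the record layout, the function names (`vectorize`, `build_centroids`, `classify`), the reference identifiers, and the nearest-centroid rule are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def vectorize(doc):
    """Turn a metadata record (hypothetical layout) into a sparse feature vector."""
    features = Counter()
    text = " ".join([doc.get("title", ""), doc.get("abstract", "")])
    features.update(re.findall(r"[a-z]+", text.lower()))            # title/abstract words
    features.update("kw:" + k.lower() for k in doc.get("keywords", []))
    features.update("ref:" + r for r in doc.get("references", []))  # cited-work IDs
    return features

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(w * b.get(f, 0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_centroids(labelled_docs):
    """Sum the vectors of already-labelled documents, one centroid per category."""
    centroids = {}
    for doc, category in labelled_docs:
        centroids.setdefault(category, Counter()).update(vectorize(doc))
    return centroids

def classify(doc, centroids):
    """Assign the single category whose centroid is closest to the document."""
    vec = vectorize(doc)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

# Toy, invented records; "R1", "R2", ... stand in for reference identifiers.
training = [
    ({"title": "Graph neural networks for molecules",
      "keywords": ["deep learning"], "references": ["R1", "R2"]}, "Computer Science"),
    ({"title": "Soil nitrogen dynamics in wheat fields",
      "keywords": ["agronomy"], "references": ["R7", "R8"]}, "Agricultural Sciences"),
]
centroids = build_centroids(training)
new_doc = {"title": "Deep learning on graphs",
           "abstract": "We study neural networks.",
           "keywords": ["deep learning"], "references": ["R2"]}
assigned = classify(new_doc, centroids)
```

In this toy case the new document shares words, a keyword, and a reference only with the first training record, so it is assigned to that record's category. A real system would need feature weighting and a far larger labelled set.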

For this, we shall use the Scopus database, which has greater coverage than the Web of Science (WoS) of non-English-language publications, conference proceedings, social sciences, arts, and humanities.

We will assess the quality of the resulting classification in terms of precision and relevance, and compare the results with those of other classifications.
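To make the evaluation idea concrete, the following minimal sketch computes per-category precision against a validated sample and the rate of agreement with another classification. The function names, the dictionary layout (document ID mapped to category), and the sample data are assumptions for illustration only. The project's actual quality measures, including how relevance is defined, are not specified here.

```python
from collections import Counter

def precision_per_category(predicted, validated):
    """For each category, the share of documents assigned to it that are correct."""
    assigned = Counter(predicted.values())
    correct = Counter(cat for doc_id, cat in predicted.items()
                      if validated.get(doc_id) == cat)
    return {cat: correct[cat] / n for cat, n in assigned.items()}

def agreement(a, b):
    """Share of documents present in both classifications with the same category."""
    shared = a.keys() & b.keys()
    return sum(a[d] == b[d] for d in shared) / len(shared) if shared else 0.0

# Invented sample: automatic assignments versus a manually validated subset.
predicted = {"d1": "Physics", "d2": "Physics", "d3": "Chemistry", "d4": "Chemistry"}
validated = {"d1": "Physics", "d2": "Chemistry", "d3": "Chemistry", "d4": "Chemistry"}

precision = precision_per_category(predicted, validated)
overlap = agreement(predicted, validated)
```

The same `agreement` function could be applied to compare the automatic assignments with a journal-based classification, restricted to documents that carry a single category there.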

Funding

Grant PID2020-115798RB-I00 funded by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe".