Errors of omission and their implications for computing scientometric measures in evaluating the publishing productivity and impact of countries
Purpose ‐ This paper explores how often data elements that are critical for the scientometric evaluation of countries' scientific productivity and impact are absent from two of the most widely used citation-enhanced databases. Such evaluation relies on the most common indicators, such as the number of publications, the number of citations and the impact factor (the ratio of citations received to papers published), and the paper also considers the effect that missing data may have on the h-index of countries.

Design/methodology/approach ‐ The author uses the Scopus database and Thomson-Reuters' (earlier known as ISI) three citation databases (Science, Social Sciences and Arts & Humanities), both as implemented on Dialog Information Services (Thomson ISI databases) and on the Web of Knowledge platform, where they are known as Web of Science (WoS). The databases were searched to discover how many records they have for each year, how many of those records have cited references, and what percentage of the records have other data elements that are essential or often used in bibliometric/scientometric evaluation.

Findings ‐ There is no difference between the databases in the presence of publication year data: all of them include this element in every record. The presence of the language field is comparable between the Thomson and Scopus databases, although a 2 per cent difference is not entirely negligible for mega-databases of this size. The rate of presence of the subject category field is better in Scopus, even though it has far fewer subject categories (27) than the Thomson databases (well over 200). The omission rate of country identification is the most critical and disappointing finding. It is caused primarily by the fact that journals have not had consistent policies for including the country affiliation of the authors. The huge 34 per cent omission rate of country identification in Scopus also hurts its impressive author identification feature. Country information is unavailable in more than 12 million records.

Originality/value ‐ Whatever the reasons for the very high omission rate of country names or codes, it should be recognised and prominently mentioned in any scientometric country report. The author has never seen this mentioned in published papers, nor in the manuscripts that he has peer reviewed. Many can live with the low omission rates of the language, document type and subject category elements, and many can simply avoid using these filters. The two factors that define the level of distortion in the assessment and ranking of the research achievements of countries are the rate of cited reference enhanced records and the rate of presence of country affiliation data.
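
To illustrate how missing country affiliation data can distort a country-level indicator, the following is a minimal sketch and not the author's method. The record layout, the function names (`h_index`, `omission_rate`, `country_h_index`) and the six toy records are all assumptions made up for illustration; they are not drawn from Scopus or Web of Science.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def omission_rate(records, field):
    """Share of records in which `field` is missing or empty."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not r.get(field))
    return missing / len(records)


def country_h_index(records, country):
    """h-index computed only from records that carry the country code."""
    return h_index([r["citations"] for r in records
                    if r.get("country") == country])


# Hypothetical toy data: assume all six papers are really from country "XX",
# but two records lack the country field in the database.
records = [
    {"country": "XX", "citations": 10},
    {"country": "XX", "citations": 8},
    {"country": None, "citations": 5},
    {"country": "XX", "citations": 4},
    {"country": None, "citations": 4},
    {"country": "XX", "citations": 1},
]

rate = omission_rate(records, "country")
h_visible = country_h_index(records, "XX")
h_complete = h_index([r["citations"] for r in records])
```

In this toy case a country filter retrieves only four of the six papers, so the h-index visible through the filter is lower than the one the complete record set would give. A real study would also have to account for records without cited references, which this sketch ignores.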