“we have mapped the 122 million objects in Crossref up to the end of May 2022 to languages (based on titles and abstracts, where available) and done an initial analysis. The results are a mix of the expected and surprising….
Not surprisingly, English dominates the literature (although with a slowly dropping proportion) with other European languages following including German, French, Spanish and then Portuguese, with Bahasa Indonesian as the next largest language. Spanish and Portuguese grew strongly over the period with Portuguese growing from around 7,000 outputs captured in 2000 to over 150,000 in 2021, reflecting the rise of Brazil as a research powerhouse, and the effectiveness of SciELO as a dissemination platform over that period. Indonesian shows massive growth, probably in part reflecting improved coverage of Crossref metadata over this period along with the massive growth of Indonesian publishing efforts….
Open access shows substantial differences across languages. Perhaps even more importantly, our ability to classify open access types is leading to issues across different languages. Indonesian is a great example. Currently we use DOAJ as the marker of a “completely OA journal” (and we differ from Unpaywall in this at the moment). Many Indonesian journals are not in DOAJ and therefore show as “hybrid”. Unpaywall is also not always able to pick up license information so full OA journals that are not in DOAJ may also get characterized as “bronze”. In Portuguese it is likely that a large proportion of “hybrid” is actually fully OA journals published through SciELO. Categories of open access publishing in Hungarian, Polish, Turkish and many other languages are also likely to need closer examination. We used DOAJ to identify non-APC journals as well and this is likely undercounting this category for Indonesian, Turkish, Portuguese and Spanish outputs.
Nonetheless, we observe high proportions of articles in non-APC journals in Spanish and Portuguese (attesting to the success of the diamond OA model in Latin America), as well as in a number of other languages, including Nordic languages, many Eastern European languages, and others. Overall, when looking at 2020-2022, for English articles in DOAJ journals, 21% are in non-APC journals, but for articles in languages other than English, this percentage is a massive 86%. Non-APC models appear to dominate the landscape for non-English full OA journals. And amongst English language articles in OA journals (as defined by registration in DOAJ) the APC model definitely dominates. As is often the case, innovation rooted in community needs is more common away from traditional centres of prestige.
Some countries with high levels of open access in English have comparatively low levels in the local language. This is the case for the Netherlands and to some extent France and Germany. This is most likely related to disciplinary differences in what is published in English (with a bias towards STEM and higher levels of OA) vs local language (with a bias towards HSS subjects). By contrast Nordic languages and Norwegian in particular show high levels of open access with an emphasis on APC-free OA journals, likely as a result of local initiatives to fund the conversion of national language journals (which tend to focus on HSS) to open access. The Hrčak central portal providing support for Croatian journals is another example with Croatian also showing a similar pattern….”