Open Call: Machine translation evaluation in the context of scholarly communication (proposals invited by Dec 23, 2022) | OPERAS

In 2020, the French Ministry of Higher Education and Research (MESR) launched the Translations and Open Science project with the aim of exploring the opportunities offered by translation technologies to foster multilingualism in scholarly communication and thus help remove language barriers in line with Open Science principles.

During the initial phase of the project (2020), a first working group, made up of experts in natural language processing and translation, published a report suggesting recommendations and avenues for experimentation with a view to establishing a scientific translation service combining relevant technologies, resources and human skills.

Once developed, the scientific translation service is intended to:

address the needs of different users, including researchers (authors and readers), readers outside the academic community, publishers of scientific texts, dissemination platforms or open archives;
combine specialised language technologies and human skills, in particular adapted machine translation engines and in-domain language resources to support the translation process;
be founded on the principles of open science, hence based on open-source software as well as shareable resources, and used to produce open access translations.

Project Goals

In order to follow up on recommendations and lay the foundation of the translation service, the OPERAS Research Infrastructure was commissioned by the MESR to coordinate a series of preparatory studies in the following areas:

(1) Mapping and collection of scientific bilingual corpora: identifying and defining the conditions for collecting and preparing corpora of bilingual scientific texts which will serve as training dataset for specialised translation engines, source data for terminology extraction, and translation memory creation.
(2) Use case study for a technology-based scientific translation service: drafting an overview of the current translation practices in scholarly communication and defining the use cases of a technology-based scientific translation service (associated features, expected quality, editorial and technical workflows, and involved human experts).
(3) Machine translation evaluation in the context of scholarly communication: evaluating a set of translation engines to translate specialised texts.
(4) Roadmap and budget projections: making budget projections to anticipate the costs to develop and run the service.

The four preparatory studies are planned over a one-year period starting in September 2022.

The present call for tenders covers only study (3), Machine translation evaluation in the context of scholarly communication.

Global impact or national accessibility? A paradox in China’s science | SpringerLink

Abstract: During the past decades, Chinese science policy has emphasized the international dissemination of research. Such policies were associated with exponential growth of English-language publications and have led China to become the largest contributor to international scientific literature. However, due to the paywalls and language barriers, China’s international publications are less accessible to local Chinese scholars, which suggests that the dissemination to the international scientific community may come at the expense of dissemination to the local Chinese community. This paper investigates the local accessibility of China’s international publications and finds that publishing internationally limits the visibility of Chinese research for the national Chinese scientific community, and the restriction is even worse for immediate access.

Less ‘prestigious’ journals can contain more diverse research, by citing them we can shape a more just politics of citation. | Impact of Social Sciences

“The ‘top’ journals in any discipline are those that command the most prestige, and that position is largely determined by the number of citations their published articles garner. Despite being highly problematic, citation-based metrics remain ubiquitous, influencing researchers’ review, promotion and tenure outcomes. Bibliometric studies in various fields have shown that the ‘top’ journals are heavily dominated by research produced in and about a small number of ‘core’ countries, mostly the USA and the UK, and thus reproduce existing global power imbalances within and beyond academia.

In our own field of higher education, studies over many years have revealed persistent western hegemony in published scholarship. However, we observed that most studies tend to focus their analysis on the ‘top’ journals, and (by default) on those that publish exclusively in English. We wondered if publication patterns were similar in other journals. So, we set out to compare (among other things) the author affiliations and study contexts of articles published in journals in the top quartile of impact (Q1), with those in the bottom quartile of impact (Q4)….”

Capacity-building for institutional open access publishing across Europe

“Projects are expected to contribute to the following expected outcomes:

Improved understanding of the current landscape of institutional scientific publishing activities across Europe.
Coordination amongst institutional publishing services and initiatives across Europe at the non-technological level and improve their overall service efficiency, in particular in a multilingual environment.
Actionable recommendations for strategies regarding institutional publishing in research performing organisations across the European Research Area.

These targeted outcomes in turn contribute to medium and long-term impacts:

Increased equity, diversity and inclusivity of open science practices in the European Research Area.
Increased capacity in the EU R&I system to conduct open science and set it as a modus operandi of modern science.

Scope:

Recent years have witnessed a sharp increase in open access publishing activities. Commercial scientific publishers and other service providers have turned their attention to open access publishing, responding to increased demand for open access by funders and research performing organisations. Research institutions have also developed their own open access publishing activities and services. These are either new and based on open access publishing, or are existing publishing activities transitioning into the new digital and open access environment. Libraries are often involved, while new types of mission-driven open access university presses are also emerging in Europe and beyond. Such initiatives do not require article fees for publishing, and are often supported by their institutions. They enable open access publishing of journals and other types of outcomes in various languages and are important in supporting multilingualism in Europe. At the same time, they often have not gained the prestige bestowed on established publishing venues, usually produced in collaboration with well-known commercial scientific publishers. Moreover, institutional publishing in the social sciences and the humanities is often in languages other than English, which is both an asset and a limitation….”

Recalibrating the Scope of Scholarly Publishing: A Modest Step in a Vast Decolonization Process | SciELO Preprints

Khanna, S., Ball, J., Alperin, J. P., & Willinsky, J. (2022). Recalibrating the Scope of Scholarly Publishing: A Modest Step in a Vast Decolonization Process. In SciELO Preprints. https://doi.org/10.1590/SciELOPreprints.4729

Abstract: By analyzing 25,671 journals largely absent from journal counts and indexes, this study demonstrates that scholarly communication is more of a global endeavor than is commonly credited. These journals, employing the open source publishing platform Open Journal Systems (OJS), have published 5.8 million items and represent 136 countries, with 79.9 percent publishing in the Global South and 84.2 percent following the OA diamond model (charging neither reader nor author). More than half (54.6 percent) of the journals operate in more than one language, while publishing research in 60 languages (led by English, Indonesian, Spanish, and Portuguese). The journals are distributed across the social sciences (45.9 percent), STEM (40.3 percent), and the humanities (13.8 percent). For all their geographic, linguistic, and disciplinary diversity, the Web of Science indexes 1.2 percent of the journals and Scopus 5.7 percent. On the other hand, Cabells Predatory Reports includes 1.0 percent of the journals, while Beall lists 1.4 percent of them as predatory. A recognition of the expanded scope and scale of scholarly publishing will help ensure that humankind takes full advantage of what is increasingly a global research enterprise.

Learned Societies and Responsible Research: Results of the survey for the TSV member societies | Tieteellisten seurain valtuuskunta

Abstract: The Federation of Finnish Learned Societies studied its member societies’ activities related to responsible research in connection to open science, research integrity and research evaluation. In addition to these areas, the assessment covered the societies’ scientific activities and activities promoting societal impact, as well as the effects of the coronavirus pandemic on the societies’ ability to operate. The material was gathered through a survey carried out in November 2021. A total of 116 member societies, representing various fields, responded to it.

Open access books: A global preference for regional subjects | Impact of Social Sciences

For many research disciplines, English functions as the global language for research. But how far does this align with patterns of research use globally? Drawing on download evidence from the OAPEN library of open access books, Ronald Snijder explores this global demand for open research and finds significant demand for regional research and research published in languages other than English.

A call for volunteers: German, Korean, Portuguese, Turkish – DOAJ News Service

“DOAJ has a network of skilled, voluntary Associate Editors and Editors who spend a few hours a week processing new journal applications. Would you like to join us? We are now recruiting volunteers who understand German, Korean, Portuguese and Turkish. (You do not have to be a native speaker.) You must also be proficient in written and spoken English.

As a DOAJ volunteer, you will do a few hours of voluntary, unpaid work a week. You will receive training materials to help you carry out your duties. Your work will directly contribute to the quality, reputation, and prominence of open access scholarly publishing around the globe….”

Preprints as a Language-Editing Funnel | Jeff Pooley

Preprint platform Research Square exists to drive business to English-language editing factory American Journal Experts (AJE), which launched the platform in 2018. Preprint authors receive a Language Quality Score, and are then shilled to spend hundreds of dollars on AJE services:

What does my Language Quality Score mean? AJE used machine learning to develop a tool that assesses your language quality. The model was trained using more than 100,000 academic papers in all areas of study that had been scored by professional editors based on the quality of English. Your Language Quality Score reflects how the quality of English in your paper compares to the other papers in our dataset. Scores take into account all aspects of readability in English, including grammar, consistency, and clarity.

This is grim stuff: leveraging English-language hegemony to squeeze Global South scholars, by way of preprinting’s corporate capture.

Exactly no one should be surprised that Springer Nature acquired a majority stake in Research Square/AJE in 2018, the year the preprint platform launched.

[…]

Open Research in the Humanities | Unlocking Research

“The Working Group on Open Research in the Humanities was chaired by Prof. Emma Gilby (MMLL) with Dr. Rachel Leow (History), Dr. Amelie Roper (UL), Dr. Matthias Ammon (MMLL and OSC), Dr. Sam Moore (UL), Prof. Alexander Bird (Philosophy), and Prof. Ingo Gildenhard (Classics). We met for four meetings in July, September, October and December 2021, with a view to steering and developing services in support of Open Research in the Humanities. We aimed notably to offer input on how to define Open Research in the Humanities, how to communicate effectively with colleagues in the Arts and Humanities (A&H), and how to reinforce the prestige around Open Research. We hope to add our perspective to the debate on Open Science by providing a view ‘from the ground’ and from the perspective of a select group of humanities researchers. These disciplinary considerations inevitably overlap, in some measure, with the social sciences and indeed some aspects of STEM, and we hope that they will therefore have a broad audience and applicability.

Academics in A&H are, in the main, deeply committed to sharing their research. They consider their main professional contribution to be the instigation and furthering of diverse cultural conversations. They also consider open public access to their work to be a valuable goal, alongside other equally prominent ambitions: aiming at research quality and diversity, and offering support to early career scholars in a challenging and often precarious employment landscape.  

Although A&H cover a diverse range of disciplines, it is possible to discern certain common elements which guide their profile and impact. These common elements also guide the discussion that follows….”

Language Diversity in Scholarly Publishing – COKI

“we have mapped the 122 million objects in Crossref up to the end of May 2022 to languages (based on titles and abstracts, where available) and done an initial analysis. The results are a mix of the expected and surprising….

Not surprisingly, English dominates the literature (although with a slowly dropping proportion) with other European languages following including German, French, Spanish and then Portuguese, with Bahasa Indonesian as the next largest language. Spanish and Portuguese grew strongly over the period with Portuguese growing from around 7,000 outputs captured in 2000 to over 150,000 in 2021, reflecting the rise of Brazil as a research powerhouse, and the effectiveness of SciELO as a dissemination platform over that period. Indonesian shows massive growth, probably in part reflecting improved coverage of Crossref metadata over this period along with the massive growth of Indonesian publishing efforts….

Open access shows substantial differences across languages. Perhaps even more importantly, our ability to classify open access types is leading to issues across different languages. Indonesian is a great example. Currently we use DOAJ as the marker of a “completely OA journal” (and we differ from Unpaywall in this at the moment). Many Indonesian journals are not in DOAJ and therefore show as “hybrid”. Unpaywall is also not always able to pick up license information so full OA journals that are not in DOAJ may also get characterized as “bronze”. In Portuguese it is likely that a large proportion of “hybrid” is actually fully OA journals published through SciELO. Categories of open access publishing in Hungarian, Polish, Turkish and many other languages are also likely to need closer examination. We used DOAJ to identify non-APC journals as well and this is likely undercounting this category for Indonesian, Turkish, Portuguese and Spanish outputs.

Nonetheless, we observe high proportions of articles in non-APC journals in Spanish and Portuguese (attesting to the success of the diamond OA model in Latin America), as well as in a number of other languages, including Nordic languages, many Eastern European languages, and others. Overall, when looking at 2020-2022, for English articles in DOAJ journals, 21% are in non-APC journals, but for articles in languages other than English, this percentage is a massive 86%. Non-APC models appear to dominate the landscape for non-English full OA journals. And amongst English language articles in OA journals (as defined by registration in DOAJ) the APC model definitely dominates. As is often the case, innovation rooted in community needs is more common away from traditional centres of prestige.

Some countries with high levels of open access in English have comparatively low levels in the local language. This is the case for the Netherlands and to some extent France and Germany. This is most likely related to disciplinary differences in what is published in English (with a bias towards STEM and higher levels of OA) vs local language (with a bias towards HSS subjects). By contrast Nordic languages and Norwegian in particular show high levels of open access with an emphasis on APC-free OA journals, likely as a result of local initiatives to fund the conversion of national language journals (which tend to focus on HSS) to open access. The Hrčak central portal providing support for Croatian journals is another example with Croatian also showing a similar pattern….”
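
The classification problem described in the excerpt above can be illustrated with a minimal sketch. It is not COKI's or Unpaywall's actual code: the `Article` record, its field names, and the category labels are all hypothetical, and the rule is only an assumed reading of the excerpt (DOAJ registration as the sole marker of a fully OA journal, with license detection separating "hybrid" from "bronze").

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record; field names are illustrative, not a real schema."""
    journal_in_doaj: bool    # journal is registered in DOAJ
    free_at_publisher: bool  # full text is free to read on the publisher site
    license_detected: bool   # an open license was found in the metadata


def classify_oa(article: Article) -> str:
    """Assumed rule: DOAJ is the only marker of a fully OA journal."""
    if not article.free_at_publisher:
        return "closed"
    if article.journal_in_doaj:
        return "fully_oa_journal"
    # Outside DOAJ, a free article with a detected license is labelled hybrid,
    # and one without a detectable license is labelled bronze.
    return "hybrid" if article.license_detected else "bronze"


# A fully OA journal that is simply missing from DOAJ:
print(classify_oa(Article(journal_in_doaj=False, free_at_publisher=True, license_detected=True)))
print(classify_oa(Article(journal_in_doaj=False, free_at_publisher=True, license_detected=False)))
```

Under these assumptions the two calls print `hybrid` and `bronze`, which is the mislabelling the excerpt reports for journals that are fully OA but not registered in DOAJ.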

Lexibank, a public repository of standardized wordlists with computed phonological and lexical features | Scientific Data

Abstract: The past decades have seen substantial growth in digital data on the world’s languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, most published datasets lack standardization which makes their comparison difficult. Here, we present a new approach to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that make these datasets more Findable, Accessible, Interoperable, and Reusable (FAIR). We test the Lexibank workflow on 100 lexical datasets from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.

Shedding light on linguistic diversity and its evolution | Max-Planck-Gesellschaft

“Scholars from the Max Planck Institute for Evolutionary Anthropology in Germany and the University of Auckland in New Zealand have created a new global repository of linguistic data. The project is designed to facilitate new insights into the evolution of words and sounds of the languages spoken across the world today. The Lexibank database contains standardized lexical data for more than 2000 languages. It is the most extensive publicly available collection compiled so far….”

Impact of Open Research: Challenges and Opportunities in the ‘Scientific Periphery’ Call for Abstracts.

“Professor Nelius Boshoff at Stellenbosch University and Dr Lai Ma are co-editing a special issue, Impact of Open Research: Challenges and Opportunities in the ‘Scientific Periphery’, to be published in Online Information Review. They invite contributions to address the impact of the open research agenda on research and scholarship in the scientific periphery, including topics such as epistemic injustice, epistemic diversity, multilingualism, decolonisation, knowledge practices, publication practices, research infrastructure and scholarly communication. The open research agenda can also be more widely interpreted, beyond open access, to address issues related to open research data and open peer review….”