Category Archives: oa.predictions
Journal of Medical Internet Research – Evaluating the Ability of Open-Source Artificial Intelligence to Predict Accepting-Journal Impact Factor and Eigenfactor Score Using Academic Article Abstracts: Cross-sectional Machine Learning Analysis
Abstract: Strategies to improve the selection of appropriate target journals may reduce delays in disseminating research results. Machine learning is increasingly used in content-based recommender algorithms to guide journal submissions for academic articles.
Objective:
We sought to evaluate the performance of open-source artificial intelligence to predict the impact factor or Eigenfactor score tertile using academic article abstracts.
Methods:
PubMed-indexed articles published between 2016 and 2021 were identified with the Medical Subject Headings (MeSH) terms “ophthalmology,” “radiology,” and “neurology.” Journals, titles, abstracts, author lists, and MeSH terms were collected. Journal impact factor and Eigenfactor scores were sourced from the 2020 Clarivate Journal Citation Report. The journals included in the study were allocated percentile ranks based on impact factor and Eigenfactor scores, compared with other journals that released publications in the same year. All abstracts were preprocessed, which included the removal of the abstract structure, and combined with titles, authors, and MeSH terms as a single input. The input data underwent preprocessing with the inbuilt ktrain Bidirectional Encoder Representations from Transformers (BERT) preprocessing library before analysis with BERT. Before use for logistic regression and XGBoost models, the input data underwent punctuation removal, negation detection, stemming, and conversion into a term frequency-inverse document frequency array. Following this preprocessing, data were randomly split into training and testing data sets with a 3:1 train:test ratio. Models were developed to predict whether a given article would be published in a first, second, or third tertile journal (0-33rd centile, 34th-66th centile, or 67th-100th centile), as ranked either by impact factor or Eigenfactor score. BERT, XGBoost, and logistic regression models were developed on the training data set before evaluation on the hold-out test data set. The primary outcome was overall classification accuracy for the best-performing model in the prediction of accepting journal impact factor tertile.
Results:
There were 10,813 articles from 382 unique journals. The median impact factor and Eigenfactor score were 2.117 (IQR 1.102-2.622) and 0.00247 (IQR 0.00105-0.03), respectively. The BERT model achieved the highest impact factor tertile classification accuracy of 75.0%, followed by an accuracy of 71.6% for XGBoost and 65.4% for logistic regression. Similarly, BERT achieved the highest Eigenfactor score tertile classification accuracy of 73.6%, followed by an accuracy of 71.8% for XGBoost and 65.3% for logistic regression.
Conclusions:
Open-source artificial intelligence can predict the impact factor and Eigenfactor score of accepting peer-reviewed journals. Further studies are required to examine the effect on publication success and the time-to-publication of such recommender systems.
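The labelling and evaluation setup described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the journal names, impact factors, and abstracts are hypothetical toy data, the rank-based tertile rule is an assumed simplification of the percentile ranking in the study, and a majority-class placeholder stands in for the BERT, XGBoost, and logistic regression models, which are not reproduced here.

```python
import random
from bisect import bisect_left


def journal_tertiles(scores):
    """Assign each journal a tertile (0 = lowest, 2 = highest) by score rank."""
    ordered = sorted(scores.values())
    n = len(ordered)
    # bisect_left counts journals with a strictly lower score; scaling that
    # count by 3/n gives the tertile index without floating-point boundaries.
    return {name: 3 * bisect_left(ordered, s) // n for name, s in scores.items()}


def split_3_to_1(items, seed=0):
    """Shuffle items and return (train, test) in a 3:1 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true tertile."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical toy data: journal -> impact factor, and (text, journal) articles.
impact_factor = {"J1": 0.8, "J2": 1.1, "J3": 2.1, "J4": 2.6, "J5": 4.0, "J6": 9.5}
articles = [(f"abstract {i}", f"J{i % 6 + 1}") for i in range(12)]

# Each article inherits the tertile of the journal that published it.
tertile_of = journal_tertiles(impact_factor)
labelled = [(text, tertile_of[journal]) for text, journal in articles]
train, test = split_3_to_1(labelled, seed=42)

# Placeholder "model": always predict the most common tertile in the training set.
train_labels = [label for _, label in train]
majority = max(set(train_labels), key=train_labels.count)
test_labels = [label for _, label in test]
print(len(train), len(test), accuracy(test_labels, [majority] * len(test)))
```

A real classifier would replace the majority-class placeholder; the hold-out accuracy computed the same way corresponds to the primary outcome reported in the abstract.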
The Future of the Monograph in the Arts, Humanities and Social Sciences: Publisher Perspectives on a Transitioning Format | SpringerLink
Abstract: A web-based survey of academic publishers was undertaken in 2021 by a team at Oxford International Centre for Publishing into the state of monograph publication in the arts, humanities, and social sciences. 25 publishing organisations responded, including many of the larger presses, representing approximately 75% of monograph output. Responses to the survey showed that the Covid 19 pandemic has accelerated the existing trend from print to digital dissemination and that Open Access (OA) titles receive substantially greater levels of usage than those published traditionally. Responses also showed that for most publishers OA publication stands at under 25% of output and that fewer than 10% of authors enquire about OA publication options. Continuing problem areas highlighted by respondents were the clearing of rights for OA publication and the standardisation of title and usage metadata. All responding organisations confirmed that they expect to be publishing monographs in ten years’ time, but that they anticipate the format and/or the model will be different, with open access expected to play a key part in the future, perhaps in the context of a mixed economy of OA and ‘toll access’ publication.
Tackling overpublishing by moving to open-ended papers | Nature Materials
“Regarding the future of publishing, we suggest that its current rapid expansion should result in a phase transition, eventually offering new opportunities for research communication. A fast evolution towards data and code sharing, open-access publishing and the widespread use of preprints seems to be just the beginning. Below we outline our view on the paradigm shift in publishing that we think will benefit the scientific community.
First, we can make it easy to track scientific progress and reduce overpublishing by moving to open-ended and stackable publications instead of publishing multiple papers for each research direction. For example, instead of ten papers published on one line of research, a scientist can prepare a single study where each piece (‘chapter’) can be stacked with or inserted into the previous piece. A similar approach is implemented on Github where codes can be updated and expanded; or on Jupyter where the data, analysis and text can be published on a single page (with more chapters being added as the study develops further). Importantly, Jupyter notebooks are free and do not charge for open access as most publishers do, pointing towards a possible solution for reduced publishing fees….”
The curious internal logic of open access policymaking – Samuel Moore
“This week, the White House Office of Science and Technology Policy (OSTP) declared 2023 its ‘Year of Open Science‘, announcing ‘new grant funding, improvements in research infrastructure, broadened research participation for emerging scholars, and expanded opportunities for public engagement’. This announcement builds on the OSTP’s open access policy announcement last year that will require immediate open access to federally-funded research from 2025. Given the state of the academic publishing market, and the tendency for US institutions to look towards market-based solutions, such a policy change will result in more article-processing charge payments and, most likely, publishing agreements between libraries and academic publishers (as I have written about elsewhere). The OSTP’s policy interventions will therefore hasten the marketisation of open access publishing by further cementing the business models of large commercial publishers — having similar effects to the policy initiatives of European funders.
As the US becomes more centralised and maximalist in its approach to open access policymaking, European institutions are taking a leaf out of the North American book by implementing rights retention policies — of the kind implemented by Harvard in 2008 and adopted widely in North America thereafter. If 2023 will be the ‘year of open science’ in the USA, it will surely be the year of rights retention in Europe. This is largely in response to funders now refusing to pay APCs for hybrid journals — a form of profiteering initially permitted by many funders who now realise the errors of their ways. With APC payments prohibited, researchers need rights retention to continue publishing in hybrid journals while meeting their funder requirements….”
On the culture of open access: the Sci-hub paradox | Research Square
Abstract: Shadow libraries have gradually become key players in the dissemination of scientific knowledge, despite their illegality in most countries of the world. Many publishers and scientist-editors decry such libraries for their copyright infringement and loss of publication usage information, while some scholars and institutions support them, sometimes in a roundabout way, for their role in reducing inequalities of access to knowledge, particularly in low-income countries. Although there is a wealth of literature on shadow libraries, none of it has focused on their potential role in knowledge dissemination through the open access movement. Here we analyze how shadow libraries can affect researchers’ citation practices, highlighting some counter-intuitive findings about their impact on the Open Access Citation Advantage (OACA). Based on a large randomized sample, this study first shows that OA publications, including those in fully OA journals, receive more citations than their subscription-based counterparts do. However, the OACA has slightly decreased over the last seven years. Distinguishing, among subscription-based publications, between those that are accessible via the Sci-hub platform and those that are not suggests that the generalization of its use cancels the positive effect of OA publishing. The results show that publications in fully OA journals (and to a lesser extent those in hybrid journals) are victims of the success of Sci-hub. Thus, paradoxically, although Sci-hub may seem to facilitate access to scientific knowledge, it negatively affects the OA movement as a whole, by reducing the comparative advantage of OA publications in terms of visibility for researchers. The democratization of the use of Sci-hub may therefore lead to a vicious cycle against the development of fully OA journals.
event: The next 10 years of Open Data, 13th December 2022 | Digital Science
“As 2022 draws to a close, join us for a Figshare webinar that looks ahead to the next 10 years of open data. What should the roadmap of open data uptake look like in academia? Figshare celebrated their 10th anniversary in 2022 and have been reflecting on 10 years of providing leading repository software to universities, publishers, funders, government agencies, pharmaceutical organizations, labs and more. As we embark on the next phase of our journey, this webinar will take stock of the current landscape of Open Data and what the coming years could bring for Figshare and the community as a whole. 2022 also saw the so-called ‘seismic’ OSTP memo and in January 2023, the NIH’s new Data Management and Sharing Policy will take full effect. During our webinar we’ll discuss the rise of national and international open data mandates and what they mean for publishers, universities and importantly researchers themselves….”
Nature’s Take: what’s next for the preprint revolution
“In this first episode of Nature’s Take, we get four of Nature’s staff around microphones to get their expert take on preprints. These pre-peer-review open access articles have spiked in number over recent years and have cemented themselves as an integral part of scientific publishing. But this has not been without its issues.
In this discussion we cover a lot of ground. Amongst other things, we ask whether preprints could help democratise science or contribute to a loss of trust in scientists. We pick apart the relationship between preprints and peer-reviewed journals and tackle some common misconceptions. We ask how preprints have been used by different fields and how the pandemic has changed the game. And as we look to the future, we ask how preprints fit into the discussion around open access and even if they could do away with journals all together….”
What to expect from post-pandemic publishing – Research Professional News
“Luckily for the world, as the world’s scientists grappled to understand Covid-19, the publishing situation is very different to Sars. The Covid-19 pandemic prompted what Barbour calls “an outpouring of research”, and most of it was rapidly available online and on preprint servers.
This time around scientists were able to disseminate early data and release initial findings in preprints, publications which are not peer reviewed and are a relatively recent innovation in the research landscape. Traditional journal publishing processes could not keep pace with the pandemic.
Post-Covid, says Barbour, publishing should be heading for a permanent change.
“My view is that the pandemic has reinforced [the view] that traditional journals on their own can’t respond to the rapid flow of information that’s needed in an emergency,” she says. “Traditional journals will have a role in that system, but it’s a limited one and should not be the dominant method.”
The tide appears to be turning in favour of novel forms of academic publishing. In December 2021, the Australian Research Council performed a major U-turn and uncancelled 32 applicants who had been disqualified from entry to the ARC Future Fellowships and Discovery Early Career Researcher Awards because their applications contained references to preprints.
If this is progress, though, there are already questions about whether it can be maintained….”
Death of academic journal greatly exaggerated, says ERC president | Times Higher Education (THE)
“Publishing in highly selective journals will remain important to scientists in future because academics will always recognise the value added by scholars attached to such publications, the new president of the European Research Council has said.
Dismissing predictions that traditional scholarly publishers will not be needed in the near future as preprint and other open access platforms grow in popularity, Maria Leptin said she did not foresee a world without journals.
Even in decades to come, researchers “will still be submitting articles for peer review in the same way as they do now”, said Professor Leptin, who took over the European Union’s research funder in November, having been director of the European Molecular Biology Organization (EMBO), which publishes a select number of journals, since 2010….”
Perspectives on Open Science and The Future of Scholarly Communication: Internet Trackers and Algorithmic Persuasion | Research Metrics and Analytics
The current digital content industry is heavily oriented towards building platforms that track users’ behaviour and seek to convince them to stay longer on the platform and come back sooner. Similarly, authors are incentivised to publish more and to become champions of dissemination. Arguably, these incentive systems are built around public reputation supported by a system of metrics that is hard to assess. Generally, the digital content industry is permeable to non-human contributors (algorithms that are able to generate content and reactions), anonymity and identity fraud. It is pertinent to present a perspective paper about early signs of tracking and persuasion in scholarly communication. To build our views, we ran a pilot study to determine the opportunity for conducting research about the use of “track and persuade” technologies in scholarly communication. We collected observations on a sample of 148 relevant websites and we interviewed 15 experts in the field. Through this work, we tried to identify 1) the essential questions that could inspire proper research, 2) good practices to be recommended for future research, and 3) whether citizen science is a suitable approach to further research in this field. The findings could contribute to determining a broader solution for building trust and infrastructure in scholarly communication. The principles of Open Science will be used as a framework to see if they offer insights into this work going forward.
Strong open access growth set to continue – report | Research Information
“Around a third of all global research articles are now published open access, according to a new report from the STM association. Recent strong growth in OA publishing is projected to continue – with some countries, such as the UK, on track for 90 per cent of their researchers’ output to be published OA within a year due to business model and operational innovations.
STM (the Association of Scientific, Technical and Medical Publishers) published the latest edition of The STM Report, the organisation’s overview of the scientific and scholarly publishing market. The revised report, which adopts a new supplement format to be issued in regular thematic updates, reveals significant publisher-driven growth in OA and ‘continued dynamism’ in the scholarly communication ecosystem.
For the past 15 years, the STM Report has provided data and analysis for all involved in the global activity of research, highlighting and exploring the trends, issues and challenges facing scholarly publishing. The latest edition in the series: ‘STM Global Brief 2021 – Economics and market size’ provides an update on the size and shape of scholarly publishing and offers the latest global market values for the industry across scientific and technical, medical, and social sciences and humanities fields….”
Content at Scale – The Third Wave – The Scholarly Kitchen
“Third Wave – 2020s – AI and Open Content
This decade will see the tipping point reached for open research content between the [top down] expansion of OA initiatives from commercial publishers and the [bottom up] support for Open Science efforts from within the academy. Having more content freely available and more content on the same platforms enables large scale analyses. The economic models are shifting from the value of the content at the unit level to the deployment of tools to uncover intelligence in a large body of content….”
Universities without walls: A vision for 2030
“Open Science, making research accessible to all, will be the default way of producing knowledge. Universities will support a diverse non-commercial publishing system and will, themselves, be directly involved in such a system, by promoting and supporting non-commercial and smaller publishing initiatives. Data and other outputs resulting from research will be made FAIR (Findable, Accessible, Interoperable, Reusable). Scientists will be adequately rewarded for the processing and publishing of data. Europe’s scholarly information infrastructure will facilitate cross-border, multidisciplinary research with advanced digital services and tools….”