Updating controlled vocabularies by analysing query logs: Online Information Review



Purpose

– Controlled vocabularies play an important role in information retrieval. Numerous studies have shown that conceptual searches based on vocabularies are more effective than keyword searches, at least in certain contexts. Consequently, new ways must be found to improve controlled vocabularies. The purpose of this paper is to present a semi-automatic model for updating controlled vocabularies through the use of a text corpus and the analysis of query logs.



Design/methodology/approach

– An experimental development is presented in which, first, the suitability of a controlled vocabulary to a text corpus is examined. The keywords entered by users to access the text corpus are then compared with the descriptors used to index it. Finally, both the query logs and text corpus are processed to obtain a set of candidate terms to update the controlled vocabulary.



Findings

– This paper describes a model applicable both in the context of the text corpus of an online academic journal and to repositories and intranets. The model is able to: first, identify the queries that led users from a search engine to a relevant document; and second, process these queries to identify candidate terms for inclusion in a controlled vocabulary.


Research limitations/implications

– Ideally, the model should be used in controlled web environments, such as repositories, intranets or academic journals.


Social implications

– The proposed model directly improves the indexing process by facilitating the maintenance and updating of controlled vocabularies. In so doing, it helps to optimise access to information.



Originality/value

– The proposed model takes into account the perspective of users by mining queries in order to propose candidate terms for inclusion in a controlled vocabulary.
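
The two steps the abstract describes (identifying the queries that brought users from a search engine to a document, then processing them into candidate terms) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `q` query parameter, the example URLs, the function names and the `min_count` threshold are all hypothetical, and each whole query string is treated as a single candidate term for simplicity.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple
from urllib.parse import parse_qs, urlparse


def extract_query(referrer: str) -> Optional[str]:
    """Return the search string carried in a referrer URL, or None."""
    params = parse_qs(urlparse(referrer).query)
    # Assumption: the search engine passes the user's query as ?q=...
    values = params.get("q")
    return values[0].strip().lower() if values else None


def candidate_terms(
    referrers: Iterable[str],
    descriptors: Iterable[str],
    min_count: int = 2,
) -> List[Tuple[str, int]]:
    """Count queries that are not already descriptors in the vocabulary."""
    vocabulary = {d.lower() for d in descriptors}
    counts: Counter = Counter()
    for referrer in referrers:
        query = extract_query(referrer)
        # Keep only queries that the vocabulary does not cover yet.
        if query and query not in vocabulary:
            counts[query] += 1
    # Frequent uncovered queries become candidates for human review.
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Hypothetical log entries: referrer URLs recorded when a document was opened.
referrers = [
    "https://search.example/?q=open+access",
    "https://search.example/?q=Open+Access",
    "https://search.example/?q=information+retrieval",
    "https://search.example/?q=altmetrics",
    "https://journal.example/issue/1",
]
descriptors = ["Information retrieval", "Indexing"]

print(candidate_terms(referrers, descriptors))
```

Because the model is described as semi-automatic, the returned list is best read as a set of suggestions for a human indexer to accept or reject, not as terms to be added directly.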

Associate University Librarian for Scholarly Communications and Publishing | VCU Libraries

“The Virginia Commonwealth University Libraries invites applications and nominations for the position of Associate University Librarian for Scholarly Communications and Publishing. The successful candidate will provide innovative, creative leadership for a newly-created division of the VCU Libraries dedicated to advancing the university’s growing engagement with contemporary scholarly communications and scholarly publishing issues….”

Harvard Law Library Readies Trove of Decisions for Digital Age – The New York Times

“Shelves of law books are an august symbol of legal practice, and no place, save the Library of Congress, can match the collection at Harvard’s Law School Library. Its trove includes nearly every state, federal, territorial and tribal judicial decision since colonial times — a priceless potential resource for everyone from legal scholars to defense lawyers trying to challenge a criminal conviction.

Now, in a digital-age sacrifice intended to serve grand intentions, the Harvard librarians are slicing off the spines of all but the rarest volumes and feeding some 40 million pages through a high-speed scanner. They are taking this once unthinkable step to create a complete, searchable database of American case law that will be offered free on the Internet, allowing instant retrieval of vital records that usually must be paid for….”

Listing of Open Access DataBases (LOADB) launched: http://loadb.org/

During Open Access Week, Dr. Girish Sahni (Director General, CSIR, India), Chief Guest at the Sixteenth Foundation Day celebrations of CSIR-URDIP on 24th October 2015, launched the Listing of Open Access DataBases (LOADB), a portal developed by CSIR-URDIP that showcases databases in multiple areas of science and technology that are freely available for public use. In his address, he lauded the work done by CSIR-URDIP and emphasized the pivotal role the institution needs to play in aligning with the CSIR mandate of Science for Societal need.

LOADB is a service of CSIR’s Unit for Research and Development of Information Products (URDIP) located at Pune in India and is being developed for the Open Science and Open Innovation Infrastructure Project supported by CSIR at URDIP.

The objective of LOADB is to create a web-enabled, linked, classified and categorized collection of Open Access Databases which one can access from a single portal. Although the initial focus is on science and technology subjects, the ultimate aim is to include all subject areas.

Academy of Science of South Africa (ASSAf) articles published in celebration of OA Week 2015

The Academy of Science of South Africa (ASSAf) celebrated Open Access Week 2015 through publishing three articles in The Conversation (Africa pilot). The first article tried to set the scene and highlight the issues our country (South Africa) is facing, the second article tried to highlight challenges in terms of publishing fees, and the third article comes with possible solutions on how the challenges can be addressed.

Experts from ASSAf presented papers on OA at institutions throughout the country, and entered into many discussions with publishers.

Open Access Week ended on a high note with a webinar on Open Journal Systems, presented by Kevin Stranack. ASSAf is looking forward to hosting more and more journals following the golden route to Open Access.

We are excited about taking OA forward and building on the great progress made thus far!

Here’s one way to recover and protect Africa’s ‘lost science’

Robin Crewe, University of Pretoria

It’s been 20 years since Wayt Gibbs introduced the phrase “lost science” to the world. Writing in Scientific American, Gibbs suggested that science and research from the developing world was being lost because it wasn’t shared on global platforms. He wrote:

Many researchers in the developing world feel trapped in a vicious circle of neglect and – some say – prejudice by publishing barriers (and structural obstacles) they claim doom good science to oblivion.

Not much has changed. In 2010 the Africa Institute’s Solani Ngobeni warned that library budget cuts and the rising costs of subscribing to scholarly e-resources meant research from the developing world remains largely “lost”. This science is invisible to the reading public.

This invisibility has consequences. During the 2014 Ebola outbreak, international research about the virus was not immediately available to the countries affected, which may have slowed treatment responses.

But developing countries are working hard to correct this imbalance with a homegrown Open Access research index that started life in Brazil two years after Gibbs warned the world about “lost science”.

Bringing African research to the world

Brazil established the Scientific Electronic Library Online (SciELO) portal in 1997. Today there are 14 developing countries in the SciELO network, mostly from Latin America. The platform is designed to tackle the global under-use of research publications from developing countries.

It is an open access – that is, free to access and free to publish – database of selected, high-quality scholarly journals. The full text of all articles is available rather than just an abstract. SciELO articles figure prominently in Google Scholar.

In 2009 South Africa became the first – and to date the only – African country to join the SciELO network. It was introduced by the Academy of Science of South Africa, which appreciated both its open access format and SciELO’s focus on developing countries. SciELO SA forms part of the academy’s scholarly publishing programme. The programme focuses on enhancing the quality, quantity and worldwide visibility of original, peer-reviewed publications produced by researchers in South Africa.

The platform is funded by the South African Department of Science and Technology. Its journals which are listed in the SciELO Citation Index are accredited for funding purposes by the South African Department of Higher Education and Training.

A resource on the rise

To date, articles in the SciELO SA open access collection have been viewed almost three-and-a-half million times.

Graph: A resource on the rise. Source: Google Analytics

As this graph shows, usage has climbed steadily and almost doubled over the last year. That’s significant exposure for the until recently “lost science” of South Africa.

The platform is helping to change South Africa’s research environment by providing equitable access to all researchers, globally and at home.

Some of these researchers may come from universities that don’t have access to traditional, peer-reviewed academic journals which charge high subscription fees. With SciELO SA, researchers can view, download and study information for free. To date there are 60 South African scholarly journals in the collection, and the Academy hopes this will eventually rise to more than 180.

Up next: the continent

After seven years of implementing SciELO SA, building expertise, establishing the model and enhancing the impact of the platform and journals, it is time to replicate this model in other African countries.

The Academy is working with the Network of African Science Academies to promote similar Open Access projects throughout the continent – a move that, we hope, will bring a great deal of Africa’s “lost science” to public attention.

This article was co-authored by Louise van Heerden, SciELO SA operations manager at the Academy of Science of South Africa (ASSAf) and Susan Veldsman, director of ASSAf’s Scholarly Publishing Unit

Robin Crewe, Professor of Zoology and Director, Centre for the Advancement of Scholarship, University of Pretoria

This article was originally published on The Conversation. Read the original article.

Why it’s getting harder to access free, quality academic research

Leti Kleyn, University of Pretoria

Academics at South Africa’s universities increased their research output by 250% between 2000 and 2013. Taxpayers funded a great deal of that research. For instance, R24 billion was spent on research and development in the 2012-13 financial year – more than half of it from the public purse.

That’s a wealth of research and knowledge. The problem is that it may not be accessible to the broader public, even though it was they who footed the bill. It may also be hard for policymakers and the private sector to access this information and apply it when developing initiatives that can help develop the country.

Why is South Africans’ access to important knowledge and research so limited? And, in the age of Open Access, what is being done to improve the situation?

The birth of a movement

It’s been more than two decades since the birth of the international Open Access movement.

The demand for access to information in an open society has grown rapidly since the 1990s, driven by the fast developing internet. Resources and movements like Creative Commons, founded in 2001; the Budapest Open Access Initiative (2002); the Bethesda Statement on Open Access Publishing (2003); the Berlin Declaration on Open Access (2003) and the Lyon Declaration on Access to Information and Development (2014) have followed.

South African universities followed international trends. They drafted Open Access policies and made available thousands of already-published journal articles and chapters from books free of charge through online platforms. They also used institutional research repositories to share “grey literature” – research not controlled by commercial publishers. This included theses and dissertations, research reports, conference proceedings and student projects.

The idea was to ensure that universities’ research outputs, which were all at least partially funded with taxpayers’ money, were made visible and accessible.

Until then, academic research was largely published and protected by international conglomerate publishers. They used online sales, library leasing and subscription fees to charge for access to research outputs.

Models change, profits don’t

The Open Access movement also saw the rise of new publishing platforms and mega journals like the Public Library of Science. It also birthed new business models for academic publishing, from the traditional journal subscription model to the Article Processing Charges (APC) or publication fee model and hybrid Open Access publishing options with traditional publishers.

Under the APC model, researchers, research funders or research institutions take responsibility for the payment of these charges, covering the journal’s costs, so that articles can be published in an Open Access manner and be free to use.

But these changes in support of broader public access seem to have been to little avail. Publishers are maximising profits with a hybrid model of double payments, also referred to as “double dipping”. They collect Article Processing Charges from researchers to publish in an Open Access format and still collect subscription fees from users.

British higher education support body JISC conducted a study to explore this practice. It put the average APC paid in 2014 by 20 universities in the United Kingdom at £1581. It concluded in a separate study that the overall increase in the total cost of ownership – subscription and APCs – when compared to capped subscription fees was as high as 73% at one UK institution.

The shifting model also brought with it a flood of predatory publishers, pirated academic journals and a variety of unethical research practices.

The South African story

So where does access to research stand in South Africa today? A survey by the country’s National Research Foundation revealed that only 20 of the country’s universities and three of its science councils have Open Access repositories. These repositories are used to make institutions’ research outputs publicly available while honouring existing copyright regulations.

The Academy of Science of South Africa (ASSAf) also conducted an Open Access audit of accredited journals. Only 48% of published research in local journals is free and accessible to the public.

South African institutions are fighting the same battle with publishers as their international counterparts. The results of preliminary, unpublished research by ASSAf estimated that university libraries paid around R470 million to national and international publishers for subscription fees to academic journals in 2014. Access to these journals was limited to registered students and employees at universities.

With the weakening rand and the implementation of a value-added tax on electronic resources, libraries claim to have lost an estimated 40% of their buying power over the last four years.

This makes it hard to continue subscribing to available research and knowledge sources and impossible to also pay APCs in support of research visibility and public access to knowledge.

A global fightback – but is it too late?

Researchers, libraries and universities have started to lobby against large academic publishing houses. There is increasing resistance to publishers who are trying to restrict access to information with stricter regulatory policies on the placement of articles in institutional repositories.

To date, these protests have had little effect on the global transition to Open Access proposed by the Max Planck Digital Library.

This makes it hard not to conclude that South Africans will in future be paying far more for knowledge – and will have even less access to it.

Leti Kleyn, Research Fellow and Manager, Open Scholarship Programme, University of Pretoria

This article was originally published on The Conversation. Read the original article.

Managing the Transition to Open Access Publication: EPS and EuCheMS Statement November, 2013

“In a meeting on April 5th, 2013, organised by the European Physical Society, representatives from a number of Learned Societies met in Strasbourg, France in order to discuss and formulate conditions for a transition towards Open Access publishing that both respects the need to make publically-funded research results freely available whilst at the same time maintaining peer-reviewed high-quality journals, secure archiving, and a strong and successful international scientific enterprise. The present paper which has been drafted from this meeting has since benefitted from the input and support of further societies organised in the Initiative for Science in Europe. The paper is addressed to all stakeholders in scientific publication and aims to both raise awareness of a number of important issues, and to make specific recommendations that the signatories believe are necessary for Open Access publishing in science to be successfully implemented….”

Groundbreaking University of California policy extends free access to all scholarly articles written by UC employees

“Today the University of California expands the reach of its research publications by issuing a Presidential Open Access Policy, allowing future scholarly articles authored by all UC employees to be freely shared with readers worldwide. Building on UC’s previously-adopted Academic Senate open access (OA) policies, this new policy enables the university system and associated national labs to provide unprecedented access to scholarly research authored by clinical faculty, lecturers, staff researchers, postdoctoral scholars, graduate students and librarians – just to name a few. Comprising ten campuses, five medical centers, three national laboratories and nearly 200,000 employees, the UC system is responsible for over 2% of the world’s total research publications. UC’s collective OA policies now cover more authors than any other institutional OA policy to date….”

Why is the OA debate so focussed on the journal article?

Together with Prof. Jean-Claude Guédon and Ass. Prof. Thomas Wiben Jensen I have just published an article (http://dx.doi.org/10.7557/11.3619) in the new journal “Nordic Perspectives on Open Science” (http://nopos.eu).

In the article the two scholars go beyond the concept of open access and challenge the rather anachronistic way research results are being distributed among scholars today, i.e. through journal articles.

As Jean-Claude Guédon reflects in the article:

What is striking in the debate about open access is that the notion of open access is not considered in itself; rather, it is refracted mainly by the ways in which it may affect existing dissemination tools, habits, actors, and institutions.

The existing dissemination tools are typically journals and articles. But why is the debate so focussed on the journal article as the channel of knowledge distribution?

Jean-Claude Guédon reflects:

Predictably, because journals and articles are taken to be objects located beyond critical thinking, the sought answer rests on the need to preserve the journal (and the articles it contains). Open access is no longer an objective; it is a potential threat to a familiar and comfortable situation. It is immediately viewed as disruptive. As a result, the discussion finds itself constrained within a framework where the emerging digital world is supposed to emulate the printing world, but do its copying faster, more efficiently, more accurately. This is precisely the point that must be questioned.

From this question an exciting dialogue unfolds between the two scholars. Through a historical and epistemological account of the distribution of knowledge, the conversation takes us to a deeper level. While acknowledging the present and historical importance of journals and articles as vehicles for the distribution of knowledge, we witness the limitations of these kinds of “frozen moments” due to the slow pace at which they are produced and distributed and, very importantly, also due to the nature of the article format itself.

Instead, the two scholars suggest experimenting with smaller units of intervention in an attempt to “liquefy” the scientific conversation (as has always been the ambition – the invention of the journal aimed at this, too), hence changing the focus from the product (the journal entity) to the process (the exchange of research results).

If this is going to happen, they argue, researchers with sufficient reputation to allow time for experimentation of this sort are needed as well as public funding. And perhaps, it could be argued, a common understanding and agreement that we should be looking ahead for new ways of communicating scientific results other than through the traditional channels like journals and articles is also needed.

Digitization goes far beyond just electrifying journals! Luckily, a lot of experimentation is happening in the field. Yet, these experiments only make up tiny, tiny fractions of the total output of published research results.

All this is reflected in the slightly alternative, conversational article format (!). Please feel free to join the conversation at http://nopos.eu