More on OA at U. Pittsburgh Press

Peter Murray, Online Editions of Out-of-Print Books Results from Library/Press Partnership at Univ of Pittsburgh, Disruptive Library Technology Jester, May 26, 2009.

… Earlier today, I had a conversation with Rush Miller, library director at the University of Pittsburgh, about the joint effort between the university press and the university library system. Cynthia Miller (press director) and Rush arrived at approximately the same time 15 years ago at the University of Pittsburgh. Over the course of that time, the two have shared many discussions about open access content. A few years ago, they established a model for working together: the press would clear the rights for books (the press generally had the rights to publish in paper, but not digital) while the libraries would digitize the books, mount them on library servers, and do the graphic design work for the online site. With this model, they mounted 15 titles from the press’ Latin American series. The libraries also supplied the Chicago Digital Distribution Center (CDDC) with the digital scans for the Bibliovault print-on-demand service. The library has seven full-time people in the digital services department, plus support from systems analysis and developers from elsewhere in the library.

They had been closely studying the usage and sales data with the trial content and had found that online access didn’t necessarily cannibalize print sales. In fact, one title sold about 100 copies last year while having near zero sales the previous few years. (Adoption for a course is the suspected reason, and the item was probably found because the digital edition was online). Books that have been out of print for 20 years are now getting use as soon as the digital editions are available.

With the initial success, the libraries and press moved forward with digitizing and mounting the 500-title backfile represented by this announcement. This was a significant effort on the part of the press to clear the rights for all of these titles — about a year’s worth of work. The partners are already looking forward to another round of titles to be digitized and mounted online. …

See also our past post on the recent announcement.

More on the access crisis: UC libraries face budget cuts of up to 20%

The University of California Libraries has released an Open Letter to Licensed Content Providers, May 26, 2009.  (Thanks to ResourceShelf.)  Excerpt:

The University of California Libraries ask all information providers with whom we negotiate content licenses to respond to the major fiscal challenges affecting higher education in California in a spirit of collaboration and mutual problem-solving. We expect to work with each of our vendors at renewal to develop creative solutions that can preserve the greatest amount of content to meet the information needs of the University of California’s students, faculty, and researchers.

The University of California Libraries, including the California Digital Library (CDL), share the economic concerns expressed in the Statement to Scholarly Publishers on the Global Economic Crisis issued by the Association of Research Libraries and the Statement on the Global Economic Crisis issued by the International Coalition of Library Consortia. The economic crisis affecting libraries is particularly acute in California….

As a state-supported institution, the University of California has experienced significant budget reductions in fiscal year 2009, with more reductions to come. The $531 million shortfall now anticipated in state funding for the 2009-10 fiscal year amounts to nearly 17 percent of the $3.2 billion the state provides UC annually. Numerous cost containment measures are in place across the university, including salary and other compensation freezes for senior managers, hiring curtailments for other staff, travel restrictions, and other mandated reductions. More information about the UC budget situation is available on the University’s Web site….

UC Libraries are being hit hard by the budget reduction mandates in effect at each of the UC campuses. Targeted reductions to library materials budgets for fiscal year 2010 vary across the campuses, with some as high as 20%. Many campuses have been alerted that additional cuts will be levied in fiscal year 2011. Coupled with the typical inflationary increases for scholarly publications, the erosion of library buying power will have a profound and lasting impact on all of the UC libraries. Monographic purchasing has already been seriously curtailed, and every electronic content license is being placed under careful scrutiny.

Comment.  In addition to the ARL and ICOLC statements mentioned in the letter, also see the statements from RIN (in March 2009) and NERL (in April 2009). 

Bringing Gutenberg ebooks to more readers

Michael Hart, New Goal Set for Project Gutenberg: One Billion Readers, Project Gutenberg News, May 24, 2009. (Thanks to ResourceShelf.)

The first goal of Project Gutenberg was simply to reach totals of estimated audiences of 1.5% of the world population, or the total of 100 million people.

With the advent of cell phone access we are now setting our goal at 15% of the world population or 1 billion.

Given that there are approximately 4.5 billion cell phones now in service around the world, that means we would have to reach just over 1/5 of all cell phone users to accomplish this. …

This has to include many more languages than English, of course, so our effort also has to be multi-lingual, if we are to reach anyone beyond the number of people comfortable enough with English to read our eBooks on their cell phones.

As many of you know, we already have well over a thousand book titles in French, followed by lesser numbers in German and the other more popular languages, but not nearly enough to really, sincerely, say we are offering a library in these languages. …

Next NIH Director: probably Francis Collins, probably soon

Francis Collins said to be contender to run NIH, Los Angeles Times, May 23, 2009.  Excerpt:

Francis S. Collins, the scientist who led the U.S. government drive to map the human genetic code, is the leading candidate to run the National Institutes of Health, a source familiar with the selection process said.

Screening for Collins, 59, is in the final stages, said the source. Collins would take over an agency that President Obama has made key to his plans for reviving the U.S. economy and overhauling healthcare. The 27 institutes and centers under the NIH umbrella employ more than 18,000 people and fund research at thousands of universities and medical schools.

The former head of the National Human Genome Research Institute, a member agency, Collins became a driving force in the race to catalog the 3 billion letters of the human genetic code. As director of the institutes, Collins will face calls to boost spending on cancer research and free science from politics as well as financial conflicts of interest.

"NIH is a huge enterprise, and I think Francis has very good experience with getting the best out of a huge enterprise from what he did in the genome project," said David Baltimore, a biology professor at Caltech who won the 1975 Nobel Prize in medicine, in a telephone interview in February. "He’s also very well liked in Congress."

Collins didn’t respond to efforts to reach him. The White House declined to comment….

Comments.  This matters for two reasons:

  1. Collins is not just a leader in mapping the human genome, but in making the results OA.  He has also defended OA at the NIH’s PubChem against anti-OA lobbying by the ACS.  Kathy Hudson, Director of the US Genetics and Public Policy Center, described Collins as "a tireless champion of data sharing and open access to scientific information…."  When Celera made its genomic data OA in 2005, Collins told the Baltimore Sun that "[t]his data just wants to be public….It’s the kind of fundamental information that has no direct connection to a product, it’s information that everybody wants, and it will find its way into the public."  Collins would be the most experienced defender of OA ever to take the reins of a US federal agency.
  2. The fact that Collins is in the final stages of vetting means that we’ll soon have an NIH Director.  The position has been vacant since Elias Zerhouni stepped down in October 2008, and the leadership vacuum has impaired the fight against the Conyers bill.  Note David Baltimore’s assessment that Collins is "very well liked in Congress." 

Also see our past posts on Collins.

Canadian cities moving on open data

City of Vancouver embraces open data, standards and source, CBC News, May 22, 2009. (Thanks to Michael Geist.) See also our past post.

Vancouver city council has endorsed the principles of making its data open and accessible to everyone where possible, adopting open standards for that data and considering open source software when replacing existing applications. …

[City councillor Andrea] Reimer had argued that supporting the motion would allow the city to improve transparency, cut costs and enable people to use the data to create new useful products, including commercial ones. She had also noted that taxpayers paid for the data to be collected in the first place. …

According to Reimer, only a few other cities such as Washington, D.C., San Francisco and Toronto have started moving toward this kind of increased openness. …

Toronto Announces Open Data Plan at Mesh09, Visible Government, April 13, 2009.

City of Toronto mayor David Miller announced [the city]’s plans for an open data catalogue at Mesh09 [Toronto, April 7-8, 2009] last week. Miller, who is in charge of the 6th largest government body in Canada, made a strong case for the benefits of open government data. His arguments (transcribed from video) deserve repeating:

… I am very pleased to announce today at Mesh09 the development of, which will be a catalogue of city generated data. The data will be provided in standardized formats, will be machine readable, and will be updated regularly. This will be launched in the fall of 2009 with an initial series of data sets, including static data like schedules, and some feeds updated in real time.

The benefits to the city of Toronto are extremely significant. Individuals will find new ways to apply this data, improve city services, and expand their reach. By sharing our information, the public can help us to improve services and create a more liveable city. And as an open government, sharing data increases our transparency and accountability. …

Draft code of conduct for public health data sharing

On May 8, Elizabeth Pisani released the first draft of the Bamako data sharing code of conduct.

The code arose from last year’s Global Ministerial Forum on Research for Health (Bamako, Mali, November 17-19, 2008), where participants formulated the Bamako Call to Action on Research for Health, which included a call for "open and equitable access to research data, tools, and information…."  For more background, see Pisani’s slide presentation at the November 2008 meeting on the need for a data sharing code of conduct, and a report on the discussion following Pisani’s presentation.

From the May draft code:

…What is driving the exponential growth in knowledge in areas such as genetics, astrophysics, information technology? Data sharing….

Epidemiology and public health have been left behind in this data sharing revolution, mired in a culture that restricts access to data and information. This is in part because of a perceived need to protect the privacy of individuals involved in research. But public health is a public good; in public health research there’s an ethical imperative to use information gathered from individuals to benefit the greatest possible number of people. Public health deserves to advance at the same speed as genetics, where data sharing has led to an explosion of progress. The World Health Organisation and several funders of public health research, led by the Wellcome Trust, are thus supporting the development of a code of conduct to encourage greater sharing of public health data. The code seeks to provide guidance for funders of data collection and for institutions that collect and analyse data, including those who perform secondary analysis on data collected by other people. The principles espoused by the code are universal….

The draft code presented here is the product of initial discussions between epidemiologists and data managers from all continents. They gathered with a number of representatives from governments, international organisations and major funders of public health research in London on October 6th, 2008 to agree on the core principles in the code. The discussions of this Working Group were informed by a background paper which reviewed the major challenges to more open exchange of public health data, challenges that can be categorised broadly as incentive-related, capacity-related, ethical and technical. The draft code is structured around these four areas. The background paper has been updated to reflect the outcome of the meeting, and is appended here….

A code of conduct on data sharing is an important first step in striking the balance between the advancement of science and the rights and needs of individuals and communities….

To the extent possible, the code promotes the sharing of micro-level data — that is, individual level records. There may occasionally be reason to restrict access to individual level data. There is rarely any reason at all to restrict access to aggregated data….

We support the maximum public access to data of public health importance compatible with the following principles:

  • The protection of privacy of individuals from whom data are gathered
  • Fair reward for the work of data collectors and primary investigators
  • Maximum public health benefit delivered in a reasonable time frame….

Limited-time exclusive access for primary researchers

Data are available to the research team involved in data collection and their institutional partners for a fixed period (between six and 18 months) before they are shared. This allows the research team a head start on data analysis and publication….

Following a period of exclusive access for primary researchers where necessary, the most common levels for access to data of public health importance will be:

Fully open access

Data (anonymised where necessary) are made available in machine-readable formats on publicly-accessible websites. This is most desirable and should be encouraged where feasible and compatible with privacy….

Controlled public access

Data are made available to authorised users after a screening process. This is likely to be the most common form of access for data of public health importance….

Collaborative access among scientists

Data are made available to other scientists in a collaborative network. Collaborative access may be necessary for complex datasets that include sensitive information where anonymisation is difficult (e.g. longitudinal data sets including HIV status)….

Exclusive access for primary researchers

Data are only available to the research team involved in data collection and their institutional partners. This is currently the norm in public health data collection, but it is precisely this norm that the current code seeks to change. There are few cases in which this degree of exclusivity is necessary in the long term….

Increasing the incentives to share data

Under the Code of Conduct on Data Sharing we agree to:

Put past data sharing performance on a par with publication as a criterion for evaluating the performance and job suitability of scientists, as well as evaluating grant proposals.

Reward concrete plans for data sharing when evaluating funding proposals for research and routine health systems functions such as surveillance.

Develop citation standards and indices for shared data sets; commit to using them when publishing secondary analysis.

Require registration of public-health related research and data collection in open access data-bases to facilitate data discovery and create demand for shared data.

Encourage submission of micro-data to public repositories as a condition for journal publication of research results.

Promote a “creative commons” approach, in which derived datasets and secondary analysis files based on shared data are in turn made publicly available.

Support an ombudsman system to oversee the fair use and proper acknowledgement by secondary users of shared data….

Using technology to increase data sharing

Under the Code of Conduct on Data Sharing we agree to:

Commit to a single metadata standard for datasets of public health interest….

Ensure that metadata are open access and machine-readable, even for data that are shared under the controlled or collaborative access standards.

Support the development of “open source” software for management, documentation and analysis of public health data….

Taking the code forward

In trying to meet the needs of [a] huge and varied constituency, the current draft code is vague: phrases such as “promote x” and “encourage y” predominate. As the code develops, we hope that it will become more concrete: “Funding institutions commit to investing in x”, “Secondary analysts agree to provide y”….

WHO innovation plan approved after dropping R&D treaty

William New, Broad Plan On IP, Innovation In Developing Countries Approved At WHO, Intellectual Property Watch, May 22, 2009.

Applause broke out at the annual World Health Assembly Friday as agreement was reached at the end of a five-year process to devise a plan for boosting research and development on and access to drugs needed by developing countries. Now with the full assembly’s approval, the focus turns to five-year implementation and as-yet unclear ways to pay for it. …

Agreement in committee was reached after a group of developing countries eager to discuss a possible treaty on biomedical R&D dropped a demand to include the WHO as a stakeholder in discussions about the treaty …

The approved global strategy and plan of action on public health, innovation and intellectual property aims by 2015 to train over 500,000 R&D workers, improve research infrastructure, national capacity and technology transfer, and lead to numerous other outcomes such as creating 10 public access compound libraries and 35 new health products (vaccines, diagnostics and medicines). …

The WHO legal counsel gave an opinion to the committee that dropping the WHO as a stakeholder would not prejudice the R&D treaty issue as it is addressed in a separate expert working group on financing to continue deliberations this year under a mandate from the 2008 assembly. Those proposals are still on the table and could go to the assembly next year, the counsel said. It also would not prevent any member state from making any proposals to the Executive Board as is standard WHO process. …

To accomplish all of the proposed activities was estimated by the secretariat to cost nearly $150 billion over the period of implementation. But several participants de-emphasised those estimates as hard to verify. …

Meanwhile, NGOs Health Action International and IQsensato this week issued a proposed way for countries to monitor implementation of the strategy and action plan. The proposal is available here. …

Comments. Background:

  • The World Health Assembly, the governing body for the World Health Organization, formally approved the Global Strategy and Plan of Action on Public Health, Innovation and Intellectual Property. The broad plan was drafted and revised through a working group over several years. The WHA had approved a draft plan last year, which did not include complete timeframes, progress indicators, estimated costs, and lists of stakeholders.
  • The plan includes an element directly related to OA, element 2.4(b), “strongly encouraging” publicly-funded researchers to self-archive. This element was watered down from a mandate in 2007.
  • The plan also suggests working on an R&D treaty. The draft treaty includes an OA mandate. The plan as finally approved retains the reference to the treaty, but removes WHO from the stakeholders: i.e., WHO will not proceed with work on the treaty under the aegis of this plan. However, discussions on the treaty can proceed independently of WHO, and the treaty can continue to be discussed through other mechanisms at WHO (and in fact is already included in ongoing discussions on another topic). So the removal of WHO from the list of stakeholders in an R&D treaty doesn’t kill the treaty, but it delays WHO’s involvement indefinitely.

More on Merck’s Sage

Rick Mullin, Merck Seeds An Open Database With Computers And Data, Chemical & Engineering News, May 25, 2009.

Stephen Friend and Eric Schadt came to Merck & Co. in 2001 when the drug company purchased Rosetta Inpharmatics. They will be leaving this summer and taking with them a data-packed 10,000-processor computer cluster at Rosetta’s facilities in Seattle.

Friend and Schadt are launching Sage Bionetwork, an open-platform database for sharing and disseminating complex disease biology data. What’s spurred them is the massive influx of biological data in drug research and the need for collaboration in understanding the biological mechanisms of disease. …

“We are headed toward a clinical genomic Tower of Babel where people each have their own view of what’s going on and can’t talk to each other,” Friend says. “The reason I am leaving Merck is that I fundamentally believe we need a different mechanism, a space between the private and public sectors that will reward people for sharing and that will make disease biology a precompetitive space.” …

Friend says Merck’s willingness to share the data in Seattle with other research organizations by simply handing it over to Sage is an indication that large drug companies are becoming more willing to work collaboratively and are beginning to broaden the definition of public, precompetitive data. “I don’t think the pharmaceutical industry is willing to share compound data,” he says, “but it is willing to share disease biology data.”

Sage intends to expand its data center through partnerships. Its first partner, the Fred Hutchinson/University of Washington Cancer Consortium, is local. But Schadt envisions partnerships worldwide. The group is in talks with the Wellcome Trust, in London, and with potential partners in China. Sage’s tentative launch date is July 1.

See also another story in the same issue on the use of cloud computing for research, including its use at Sage and comments by John Wilbanks.

See also our past posts on Sage.

Update. Schadt has announced he’ll be taking a new day job rather than working at Sage full-time.