Google adds search for public data

Ola Rosling, Adding search power to public data, Official Google Blog, April 28, 2009.

… We just launched a new search feature that makes it easy to find and compare public data. …

If you go to Google and type in [unemployment rate] or [population] followed by a U.S. state or county, you will see the most recent estimates:

Once you click the link, you’ll go to an interactive chart that lets you add and remove data for different geographical areas. …

The data we’re including in this first launch represents just a small fraction of all the interesting public data available on the web. There are statistics for prices of cookies, CO2 emissions, asthma frequency, high school graduation rates, bakers’ salaries, number of wildfires, and the list goes on. Reliable information about these kinds of things exists thanks to the hard work of data collectors gathering countless survey forms, and of careful statisticians estimating meaningful indicators that make hidden patterns of the world visible to the eye. All the data we’ve used in this first launch are produced and published by the U.S. Bureau of Labor Statistics and the U.S. Census Bureau’s Population Division. …

Since Google’s acquisition of Trendalyzer two years ago, we have been working on creating a new service that makes lots of data instantly available for intuitive, visual exploration. Today’s launch is a first step in that direction. We hope people will find this search feature helpful, whether it’s used in the classroom, the boardroom or around the kitchen table. We also hope that this will pave the way for public data to take a more central role in informed public conversations. …

See also Google’s Information for public data publishers:

… Google wants to eventually display data from other governmental agencies, research institutes, and other private organizations as well. To do so, we want to identify free, authoritative, high-quality data, irrespective of topic and locale. We are interested in both aggregated statistics and the underlying raw data from which they were derived. Other types of structured information like reference lists and classifications are also of great interest. We will not use any data that compromises the privacy of individuals or infringes upon any proprietary rights.

If you are a data publisher, get your data out to a wider audience, through Google, by telling us about your public data. …

APA adds Wellcome-compliant OA option

Robert Kiley, American Psychological Association develops Wellcome-compliant OA option, UK PubMed Central Blog, April 28, 2009.

The American Psychological Association (APA) – publisher of titles such as Journal of Abnormal Psychology and Psychological Bulletin – have developed a Wellcome-Trust compliant author-pays model.

In return for paying an OA fee ($4000) APA will deposit on behalf of the author, the final, published version directly into PMC, where it will be mirrored to UKPMC.

Such papers will be licenced such that anyone may “access, download, copy, display, and redistribute this article or manuscript as well as adapt, translate, or data and text mine the content contained in this document”, as long as this is done for non-commercial purposes, and proper attribution is given.

Upon submission to an APA journal, Wellcome-funded authors will be asked to identify their manuscript as being Wellcome Trust funded. If the paper is accepted for publication, Wellcome-funded authors should complete this form to ensure that APA journals will deposit the manuscript in PMC.

As of April 2009, this author-pays option is only available to Wellcome-funded researchers.

Stimulus for cyberinfrastructure

James Boyle, What the information superhighways aren’t built of…, Financial Times, April 17, 2009. (Thanks to Lawrence Lessig.)

… We know that the United States’ experiments with freely providing publicly generated data — on everything from weather to roads to navigation — yield an incredible economic return. More than 30-fold by some estimates. We know that investment in basic science can provide stellar multipliers.

Some scholars have been arguing that the architecture of the internet, its embrace of openness as a design principle, might revolutionize science if we could apply the same principles there — if we could break down the legal and technical barriers that prevent the efficient networking of state funded research and data. Imagine a scientific research process that worked as efficiently as the web does for buying shoes. Then imagine what economic growth a faster, leaner, and more open scientific research environment might generate.

Streamlining science, learning from the success of the internet, more open access to state funded basic research: these kinds of initiatives are the ones that might provide the “superhighways of the mind,” the “freeways of the information age” — but they are too abstract, more likely to involve open data protocols than bundles of wires, and thus they garner little attention. Now would be an ideal time to invest in the architecture of openness, but this kind of architecture doesn’t get built with cement. …

More on the deepening access crisis

Charles Bailey, Seven ARL Libraries Face Major Planned or Potential Budget Cuts, DigitalKoans, April 28, 2009.  Excerpt:

Seven Association of Research Libraries member libraries are facing major planned or potential budget cuts….These examples suggest that significant budget cuts may be widespread in ARL libraries.

The Cornell University Library will have to cut about $944,000 from the fiscal year 2010 materials budget.

The Emory University Libraries have "already cut $200,000 from the current (2008/2009) collections budget" and more cuts are planned in FY 2010….

The MIT Libraries are faced with a $1.4 million budget cut this summer….

The UCLA Libraries are facing a cut of over $400,000 this year alone….

The University of Tennessee Libraries sent a February 16th memo to deans, department heads, and library representatives saying that they were "facing a potential 8% base budget cut. This cut represents reductions totaling $1,343,299 from the library’s operations, personnel, and collections budget." …

The University of Washington Libraries have submitted a business plan to the Provost and Executive Vice President that reflects "levels of reduction in central support of 8%, 10%, and 12%." In dollar terms, these reductions are $2,457,962, $3,072,452, and $3,686,943 respectively.

The Yale University Library is cutting its collection budget for the first time due to budget shortfalls….

Related post: "University of Florida Libraries Propose to Cut Budget by over $2.6 Million."

PS:  I’ve argued that the recession will have mixed results for OA, but will strengthen the case for it.

Preview of Wolfram/Alpha

Wolfram Research previewed its question-answering service, Wolfram/Alpha, at Harvard yesterday.  See the 105:57 webcast or David Weinberger’s blog notes.

Also see Larry Dignan’s preview of Wolfram/Alpha at ZDNet or Frederic Lardinois’ preview at ReadWriteWeb.

Comment.  At first glance Alpha looks like any other free search engine.  But it returns direct answers, sometimes with graphs of relevant data, not just links to pages which might contain answers.  I’m looking forward to its launch next month.  This kind of service –from humans or machines– is what I meant (in an article last summer) by solving the last-mile problem for knowledge.

More on the NIH policy and Conyers bill in the MSM

Brian Blank, Copyright Battle Looms for Docs Who ‘Grew Up Google’, ABC News, April 22, 2009.  Excerpt:

…Now a fourth-year student at Harvard Medical School, [Carolina] Solis spent a summer doing research in San Juan del Sur, Nicaragua. Before the sun rose each morning, she boarded an old school bus bound for some of Nicaragua’s most remote regions. When she finally arrived, farmers would be waiting for her, clutching small cups. The cups contained samples of their own stools, which Solis would check for evidence of certain parasites. Gathering up the samples, she then made the long trek back to San Juan del Sur….

Her results were startling. Up to 80 percent of some communities were infected. Contaminated well water was a likely culprit.

Like many researchers, she plans to submit her findings for publication in a medical journal. What she discovered could benefit not just Nicaraguan communities but those anywhere that face similar problems. When she submits her paper, though, she says the doctors she worked with back in San Juan del Sur will probably never get a chance to read it.

"They were telling me their problems accessing these [journals]. It can be difficult for them to keep up with all the changes in medicine." …

Now, with Washington rushing to transform health care, a debate often limited to hospital wards, medical schools and Internet forums is pushing to the fore. It’s a debate deeply rooted in beliefs about access to information — medical research. Increasingly, a generational gap is emerging.

On one side of the gap are those who say such research should be free to all, that it’s too valuable to keep firmly planted in the walled gardens of the prestigious journals that publish it. And for research that’s taxpayer-funded, the public that paid for it, at least, deserves access.

On the other side of the gap are those who say the copyright interests of the journals come first….

The [traditional subscription] pay-to-play model doesn’t jibe with a generation of soon-to-be docs who "grew up Google," with information no farther than a search button away. It’s a generation that…doesn’t see why something as important as medical research should be locked behind the paywalls of private journals….

Washington recently got involved. Squirreled away in the massive $410 billion spending package the president signed into law last month is an open access provision. It makes permanent a previous requirement that says the public should have access to taxpayer-funded research free of charge in an online archive called PubMed Central….

But Democrats are divided on the issue. In February, Rep. John Conyers, D-Mich., submitted a bill that would reverse open access. HR 801, the Fair Copyright in Research Works Act, would prohibit government agencies from automatically making that research free….

[Dr. C. Michael Gibson] says it’s only a matter of time before the generation that verbified "Google" abandons the more traditional journal model. A cardiologist at Beth Israel Deaconess Medical Center in Boston, he didn’t grow up with the Internet but has embraced it. Four years ago, he literally borrowed a page from Wikipedia and started his own medical wiki, called WikiDoc. Like its progenitor, anyone can edit its pages. And because names are attached, Gibson says the whole process is a purer form of peer review. "In an era where information’s ubiquitous, the days of highly cloistered, secretive processes are just over." …