Workshop WP5: Policy guidelines for open access and data preservation and dissemination workshop – 25 September 2014, Meervaart Conference Center, Amsterdam, Netherlands.

The RECODE project invites you to participate in its final policy recommendations workshop in Amsterdam on 25 September 2014. RECODE has provided a forum for European stakeholders in the open access ecosystem to work together on common solutions. The project will culminate in a series of over-arching recommendations for a policy framework to support open access to European research data.

This final workshop is addressed to policy-makers and decision-makers within the following categories:

  • Research funders
  • Research performing organizations
  • Data centers
  • Scholarly societies
  • Publishers
  • Scholarly communication and research management experts
  • Information specialists

We invite the above stakeholders and experts to provide their input to the project’s recommendations on open access policies to research data and their implementation. Participants will be provided with a copy of the recommendations in advance of the workshop and invited to review and validate them. This open process is expected to enrich the recommendations further, giving participants the opportunity to make a direct contribution to the formulation of the policy recommendations and be part of the EU policy-making process.

Location: Meervaart Conference Center, De Meervaart, Meer en Vaart 300, 1068 LE Amsterdam

Time: 09:00 – 17:00 (to be finalised)

Content Mining: Extraction of data from Images into CSV files – step 0

Last week I showed how we can automatically extract data from images. The example was a phylogenetic tree, and although lots of people think these are wonderful, even more will have switched off. So now I’m going to show how we can analyse a “graph” and extract a CSV file. This will be in instalments so that you will be left on a daily cliff-edge… (actually it’s because I am still refining and testing the code). I am taking the example from “Acoustic Telemetry Validates a Citizen Science Approach for Monitoring Sharks on Coral Reefs” (I’ve not read it, but I assume they got volunteers to see how long they could evade being eaten with and without the control).

Anyway here’s our graph. I think most people can understand it. There’s:

  • an x-axis, with ticks, numbers (0-14), title (“Sharks detected”) and units (“Individuals/day”)
  • a y-axis, with ticks, numbers (0-20), title (“Sharks observed”) and units (“Individuals/day”)
  • 12 points (black diamonds)
  • 12 error bars (like Tie-fighters) appearing to be symmetric
  • one “best line” through the points


We’d like to capture this as CSV. If you want to sing along, follow: (the link will point to a static version – i.e. not updated as I add code).

This may look simple, but let’s magnify it:


Whatever has happened? The problem is that we have a finite number of pixels. We might paint them black (0) or white (255) but this gives a jaggy effect which humans don’t like. So the plotting software adds gray pixels to fool your eye. It’s called antialiasing (not a word I would have thought of). So this means the image is actually gray.

Interpreting a gray scale image is tough, and most algorithms can only count up to 1 (binary) so we “binarize” the image. That means that each pixel becomes either 0 (black) or 1 (white). This has the advantage that the file/memory can be much smaller and also that we can do topological analyses as in the last blog post. But it throws information away and if we are looking at (say) small characters this can be problematic. However it’s a standard first step for many people and we’ll take it.

The simplest way to binarize a gray scale (which goes from 0 to 255 in unit steps) is to classify 0-127 as “black” and 128-255 as “white”. So let’s do that:
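
The thresholding rule just described can be sketched in a few lines of plain Python. This is only an illustration, not the code used for this series: the function name `binarize`, the constant `THRESHOLD` and the tiny `patch` of gray values are all made up for the example, and the image is assumed to be a list of rows of 8-bit gray values.

```python
# Minimal sketch: threshold an 8-bit grayscale image at 128.
# `binarize`, `THRESHOLD` and `patch` are illustrative names, not from the post's code.
THRESHOLD = 128


def binarize(gray_rows, threshold=THRESHOLD):
    """Map each gray value (0-255) to 0 (black) or 1 (white)."""
    return [[0 if value < threshold else 1 for value in row] for row in gray_rows]


# A hypothetical patch: a dark stroke with antialiased (gray) edges.
patch = [
    [255, 200, 90, 0, 90, 200, 255],
    [255, 130, 127, 0, 128, 130, 255],
]

binary = binarize(patch)
for row in binary:
    print("".join(str(v) for v in row))
```

Values 0-127 come out as 0 and values 128-255 as 1, so the gray fringe on either side of the stroke is pushed to whichever side of the threshold it happens to fall on. That is exactly the information loss mentioned above.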



Now if we zoom in we can see the pixels are binary:


So this is the next step on our journey – how are we going to turn this into a CSV file? Not quite as simple as I have made it out – keep your brain in gear…

I’ll leave you on the cliff edge…



Not Just a Pretty Face: Island Poppies Defend With Prickles

Although the vibrant, waifish petals of the poppy may appear inviting to the casual observer, a closer look reveals a pricklier message: Stay away! To discourage plant eaters like insects and birds from biting into their leafy appendages, many plant species protect themselves with defense mechanisms, like tougher leaves, distasteful latex, and armor made of prickles. Developing these defense features is part of a plant’s natural growth throughout its lifetime. Some plants, however, are able to activate additional protection when faced with attacking herbivores. The authors of a recent PLOS ONE paper investigated these defense mechanisms in two species of poppy currently found in Hawaii, where natural herbivores have long been extinct. The authors’ results reveal that island poppies may have more “nettle” in the face of simulated adversity than previously predicted.

The authors chose two species of poppy for testing, Argemone glauca, a species native to Hawaii, and Argemone mexicana, a species originally hailing from the North American continent and a recent inhabitant of the Hawaiian islands. Both species come pre-equipped with permanent features that may function as defense strategies. However, permanent defenses are costly to maintain for a plant: They divert energy away from other functions, like reproduction and growth, and are therefore an energy investment for the plant. To combat the cost of maintaining a full suite of permanent defenses, some plants respond to attacks from plant eaters only when they occur by activating additional defenses, known as inducible defenses. Unlike defense features that develop throughout the course of a plant’s lifetime, also known as constitutive defenses, inducible defenses are not permanent, only prompted by specific need.

In this study, the researchers simulated the need for additional defenses by subjecting the two species to various “attacks” to see how the poppies would respond. Plants were assigned to one of four random treatment groups:

  1. The control group, which received no treatment
  2. The damage group, where the authors clipped off portions of the leaves
  3. The Jasmonic acid group, where researchers sprayed the leaves with a harmful solution that inhibits growth
  4. And the combination group, where authors defoliated plants first and then sprayed them with Jasmonic acid

The researchers then allowed for two new leaves to grow to ensure that the plants had an adequate amount of time to respond.

Although neither species developed additional leaf toughness or produced more natural latex in response to treatments, both species exhibited increased prickle density on new leaves that grew after treatment. To evaluate prickle density, the authors harvested new leaves and counted all the new prickles on the surfaces of the leaves, excluding prickles found along the leaf edge. They also quantified the leaf area and performed statistical analyses to identify patterns in the various groups.

The authors found that Hawaiian native A. glauca responded more intensely to treatment by developing significantly more prickles than its continental North American counterpart, A. mexicana. The authors report that prickles for A. glauca were 20x more dense and 2.7x higher than A. mexicana.

Plant defenses are selected for over time due to snacking pressures from herbivores. On the Hawaiian Islands, however, natural herbivores of A. glauca, such as flightless ducks and beetles, are now extinct. The lack of natural predators for island plants has given rise to the idea that island plants have ‘gone soft’ over time. The authors consider A. glauca’s robust response to external attacks evidence that island plants may be better defended than previously thought.

Although it may be impossible to determine whether these island defenses have been selected for by herbivores of the past, no longer present, the inducibility of prickles in A. glauca and A. mexicana demonstrates that these poppies have the mettle to fight back against attackers and snackers.

Citation: Hoan RP, Ormond RA, Barton KE (2014) Prickly Poppies Can Get Pricklier: Ontogenetic Patterns in the Induction of Physical Defense Traits. PLoS ONE 9(5): e96796. doi:10.1371/journal.pone.0096796

Image 1: Argemone glauca by Forest and Kim Starr

About UTS ePress | UTS ePRESS

“We are very happy to announce that all UTS ePRESS journal works submitted after March 31, 2014 will be licensed CC BY 4.0 or Creative Commons Attribution 4.0 International. This upgrade to CC BY 4.0 is a central pillar of our ongoing work to enhance the Open Access credentials and functionality of UTS ePRESS, the publishing arm of the University of Technology, Sydney. It is also a part of our wider commitment to the key principles of the Open Knowledge movement generally, and the BOAI Declaration specifically. UTS ePRESS will soon add DOIs to its journal content and provide HTML versions (in addition to existing PDFs) to enable better web-sharing and re-use of our journal content. We hope this not only builds the global reach and impact of our journal content, for the scholars involved, but also allows the largest possible audience to reap the benefits of the global sharing and use of publicly funded research-knowledge.” …

Social Machines, SOCIAM, WWMM, machine-human symbiosis, Wikipedia and the Scientist’s Amanuensis

Over 10 years ago, when peer-to-peer was an exciting and (through Napster) a liberating idea, I proposed the World Wide Molecular Matrix (Cambridge), (wikipedia) as a new approach to managing scientific information. It was bottom-up, semantic, and allowed scientists to share data as peers. It was ahead of the technology and ahead of the culture.

I also regularly listed tasks that a semi-artificially-intelligent chemical machine – the Scientists’ Amanuensis – could do,  such as read the literature, find new information and compute the results and republish to the community. I ended with:

“pass a first year university chemistry exam”

That would be possible today – by the end of this year – we could feed past questions into the machine and devise heuristics, machine learning and regurgitation that would get a 40% pass mark. Most of the software was envisaged in the 1970s in the Stanford and Harvard AI/Chemistry labs.

The main thing stopping us doing it today is that the exam papers are Copyright. And that most of published science is Copyright. And I am spending my time fighting publishers rather than building the system. Oh dear!

Humans by themselves cannot solve the problem – the volume is too great – 1500 new scientific papers each day. And machines can’t solve it, as they have no judgment. Ask them to search for X and they’ll often find 0 hits or 100,000.

But a human-machine symbiosis can do wonderfully. Its time has now come, epitomised by the SOCIAM project, which involves Southampton and Edinburgh (and others). Its aim is to build human-machine communities. I have a close lead as Dave Murray-Rust (son) is part of the project and asked if The Content Mine could provide some synergy/help for a meeting today in Oxford. I can’t be there, and suggested that Jenny Molloy could (and I think she’ll meet in the bar after she has fed her mosquitoes).

There’s great synergy already. The world of social machines relies on trust – that various collaborators provide bits of the solution and that the whole is larger than the parts. Academic in-fighting and meaningless metrics destroy progress in the modern world – the only thing worse is publishers’ lawyers. The Content Mine is happy to collaborate with anyone – the more you use what we can provide, the better for everyone.

Dave and I have talked about possible SOCIAM/ContentMine projects. It’s hard to design them because a key part is human enthusiasm and willingness to help build the first examples. So it’s got to be something where there is a need, where the technology is close to the surface, where people want to share and where the results will wow the world. At present that looks like bioscience – and CM will be putting out result feeds of various sorts and seeing who is interested. We think that evolutionary biology, especially of dinosaurs, but also of interesting or threatened species, would resonate.

The technology is now so much better and, more importantly, so much better known. The culture is ready for social machines. We can output the results of searches and scrapings in JSON, link to DBPedia using RDF – reformat and repurpose using XPath or CSS. The collaboration doesn’t need to be top-down – each partner says “here’s what we’ve got” and the others say “OK here’s how we glue it together”. The vocabularies in bioscience are good. We can use social media such as Twitter – you don’t need to have an RDF schema to understand #tyrannosaurus_rex. One of the great things about species is that the binomial names are unique (unless you’re a taxonomist!) and that Wikipedia contains all the scientific knowledge we need.
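
To make the “here’s what we’ve got” idea concrete, here is a rough sketch of pulling species names out of a scraped fragment with an XPath-style query and emitting them as a JSON feed with a Twitter-style hashtag. Everything in it is an assumption for illustration: the XML fragment, its element and attribute names, and the function `species_feed` are invented, and it uses only the Python standard library, not any ContentMine or SOCIAM code.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical scraped fragment; element and attribute names are made up.
SCRAPED = """
<results>
  <hit source="paper-1"><species>Tyrannosaurus rex</species></hit>
  <hit source="paper-2"><species>Carcharhinus amblyrhynchos</species></hit>
</results>
"""


def species_feed(xml_text):
    """Turn each <hit> into a small JSON-ready record."""
    root = ET.fromstring(xml_text.strip())
    feed = []
    # ElementTree supports a limited XPath subset; "./hit" selects direct children.
    for hit in root.findall("./hit"):
        name = hit.findtext("species")
        feed.append({
            "source": hit.get("source"),
            "binomial": name,
            "hashtag": "#" + name.lower().replace(" ", "_"),
        })
    return feed


feed = species_feed(SCRAPED)
print(json.dumps(feed, indent=2))
```

A partner who has never seen the scraper only needs the JSON keys to glue this onto their own data. Linking the binomial to an RDF resource would be a further step and is not shown here.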

There don’t seem to be any major problems [1]. If it breaks we’ll add glue just as TimBL did for URLs in the early web. Referential and semantic integrity are not important in social machines – we can converge onto solutions. If people want to communicate they’ll evolve to the technology that works for them – it may not be formally correct but it will work most of the time. And for science that’s good enough (half the science in the literature is potentially flawed anyway).



[1] One problem. The STM publishers are throwing money at politicians desperately trying to stop us. Join us in opposing them.


Canada’s supreme court decision, or aren’t we all indigenous to this planet?

On June 26, 2014 Canada’s Supreme Court issued a landmark decision on aboriginal title. In my opinion this was a very wise decision, and there is at least one part of this decision that I think merits global consideration. In brief, the Supreme Court decision subjects Aboriginal title to a responsibility to group interest and the enjoyment of the land by future generations. To me, this is perfectly appropriate but begs the question: why are non-aboriginal governments not held to this standard? I’d like to suggest that this concept should be expanded – to continue to recognize aboriginal title, but also to look at the world’s entire human population as indigenous to the planet, and hold every government everywhere accountable for making decisions in the collective interest and for the benefit of future generations – and to include water along with land.

Quote from the Supreme Court decision:

The nature of Aboriginal title is that it confers on the group that holds it the exclusive right to decide how the land is used and the right to benefit from those uses, subject to the restriction that the uses must be consistent with the group nature of the interest and the enjoyment of the land by future generations.  Prior to establishment of title, the Crown is required to consult in good faith with any Aboriginal groups asserting title to the land about proposed uses of the land and, if appropriate, accommodate the interests of such claimant groups. The level of consultation and accommodation required varies with the strength of the Aboriginal group’s claim to the land and the seriousness of the potentially adverse effect upon the interest claimed. 

Citation Tsilhqot’in Nation v. British Columbia, 2014 SCC 44

Date: 20140626
Docket: 34986 

Updated July 8 to correct spelling of “indigenous”. Thanks to Douglas Carrall for spotting the error and letting me know. 

Why I am fortunate to live and work in Cambridge


Today was the Tour de France; third day – Cambridge to London. A once-in-a-lifetime opportunity. Should I “take the morning off” to watch the race – or should I continue to hack code for freedom? After all, we are in a neck and neck race with those who wish to control scientific information and restrict our work in the interests of capitalist shareholders.

I’m very fortunate in that I can do both. I’m 7 mins cycle from the historic centre of Cambridge. I can carry my laptop in my sack, find a convenient wall to sit on – and later stand on – and spend the waiting time hacking code. And when I got into the Centre I found the “eduroam” network. Eduroam is an academic network which is common in parts of the anglophone world, especially the British Commonwealth. So I could sit in front of the Norman Round Church – 1000 years old – and pick up eduroam, perhaps from St John’s College.

The peloton rode ceremonially through Cambridge (it speeded up 2 kilometers down the road) but even so it only took 20 seconds to pass.

So I can do my work anywhere in Cambridge – on a punt, in a pub, in the Market Square, at home

and sometimes even in the Chemistry Department…

So thank you everyone who makes the networks work in Cambridge.

And here, if you can see it half way up the left-hand side (to the left of the red shirt), is the bearsuit who came to watch the race.


Significant updates to the Open Access Directory.

“Nicole Contaxis is a summer intern for the Harvard Open Access Project at the Berkman Center for Internet & Society. In addition to some offline research, she has been making significant updates to the Open Access Directory. I’m very happy to say that she’s already updated these three sections of the OAD:

Declarations in support of OA
OA advocacy organizations
OA journal funds

Thanks, Nicole!”

SPARC and the World Bank to co-host kickoff event for 2014 International Open Access Week

Mark your calendars! The SPARC/World Bank kickoff is a great way to start your Open Access Week planning. Watch the livestream or showcase the recording at your events!

Also make sure you add your livestream events here so others can utilize them in their event planning.

More information soon.



On Monday, October 20th, from 3:00 to 4:00pm EDT, SPARC and the World Bank will co-host the official kickoff event for International Open Access Week 2014 with a reception to follow.  The event will be held at the headquarters of the World Bank in Washington, DC with a live webcast for online participation around the world.

The program will focus on this year’s theme of “Generation Open.” Speakers will discuss the importance of students and early career researchers in the transition to Open Access and explore how changes in scholarly publishing affect scholars and researchers at different stages of their careers.

Registration for the in-person event will open in September.  For those planning to participate virtually, the live stream of the kickoff can serve as programming for a local event or a watch party.  To receive updates on Open Access Week 2014, including the SPARC-World Bank Kickoff Event, please fill in the form below.

To be held from October 20 – 26, 2014, International Open Access Week is an opportunity for the academic and research community to continue to learn about the potential benefits of Open Access, to share what they’ve learned with colleagues, and to help inspire wider participation in helping to make Open Access a new norm in scholarship and research.

Dramatic Growth of Open Access June 30, 2014

The June 30, 2014 Dramatic Growth of Open Access celebrates the milestone of more than half a million articles funded by the U.S. National Institutes of Health that are now freely accessible! After 3 years, the percentage of items found through a PubMed search that are freely available rises to 71% for research by NIH staff, 66% for NIH externally funded research, and 31% for any article regardless of funding. At first glance, this looks a lot like evidence suggesting the NIH Public Access Policy is very effective, more than doubling the percentage of items freely available! Thanks to Jihane Salhab from the Sustaining the Knowledge Commons team for the charts, data gathering and analysis of PMC Free this quarter.

Charts (PMC Free, by PubMed search): Research Support, N.I.H., Extramural + Intramural; Research Support, N.I.H., Intramural [pt]; Research Support, N.I.H., Extramural [pt]; No Limits (no distinction based on researcher).

The Dramatic Growth of Open Access Series is a quarterly series (end of March, June, September, and December) of key data illustrating the growth of open access, with additional comments and analysis. The series is available in open data and blogpost (commentary) editions. The quarterly series began December 31, 2005, and is predated by a peer-reviewed journal article featuring data as of February 2005. To download the data or the rationale & method, see the Dramatic Growth of Open Access dataverse. Morrison, Heather, 2014-03, “Dramatic Growth of Open Access”, Morrison, Heather [Distributor] V1 [Version]. The rationale and method have not been updated; the March 31 version is the latest. If you are using the June 30, 2014 PMC Free data, please cite Morrison, Heather and Salhab, Jihane.

More highlights this quarter

By the numbers, it’s usually the large, well-established and much used services that tend to impress. This quarter, the Bielefeld Academic Search Engine added 140 content providers and over 2 million documents for a total of over 3,000 content providers (illustrating the growth of the repository movement) and 62 million items (illustrating the growth of self-archiving). The Internet Archive gathered another 14 billion webpages for a total of 416 billion. The Electronic Journals library added another 958 journals that can be read free-of-charge for a total of over 45 thousand free journals. PubMedCentral added about 100 thousand free articles, for a total of over 3 million, and the number of journals actively contributing to PMC that now provide immediate free access grew by 63 to a total of 1,315. Searchable article growth in DOAJ was 75,000, bringing the total number of articles searchable by article in DOAJ to over 1.6 million.

By percentage growth, it’s the newest services starting off with nothing that have the greatest ability to impress. SCOAP3, the high energy physics full flip to open access global collaboration, started this January and nearly doubled the article count this quarter, to a total of over 2,000 articles. The Directory of Open Access Books added 6 publishers and 175 books for a total of 68 publishers and over 200 books.

Highwire Press added 8 completely free sites, for a total of 107 completely free sites, 8% growth this quarter (annual equivalent 32%).
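
For readers who want to check these growth figures, the arithmetic can be sketched as follows. The assumption here, inferred only from the numbers quoted (8% and 32%), is that the “annual equivalent” is simply the quarterly percentage multiplied by four; the series’ own rationale and method document is the authority on how it is actually calculated. The variable names are illustrative.

```python
# Figures quoted above for Highwire Press completely free sites.
added_this_quarter = 8
total_now = 107

# Growth is measured against the total at the start of the quarter.
total_before = total_now - added_this_quarter  # 99
quarterly_pct = 100 * added_this_quarter / total_before

# Assumed convention: annual equivalent = 4 x quarterly growth (no compounding).
annual_simple_pct = 4 * quarterly_pct

# For comparison only: the same quarterly rate compounded over four quarters.
annual_compound_pct = 100 * ((1 + quarterly_pct / 100) ** 4 - 1)

print(f"quarterly growth:  {quarterly_pct:.1f}%")
print(f"annual (simple):   {annual_simple_pct:.1f}%")
print(f"annual (compound): {annual_compound_pct:.1f}%")
```

The simple figure rounds to the 32% quoted; a compounded rate would come out a few points higher, which is worth keeping in mind when comparing against growth figures from other sources.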

Items of interest since March 31, 2014

  • June 4: the home page for Peter Suber’s MIT Press book Open Access passed the milestone of 100,000 page views (I highly recommend this as an excellent brief starting point for learning about OA).

This post is part of the Dramatic Growth of Open Access series.

Open science, when you don’t have time for closed science.

“Moving his right eye to control his computer, [Eric Valor] used social media to build a coalition of other open-science activists who work in amateur labs such as BioCurious, Berkeley Biolabs, and Bio Tech & Beyond. Then he founded SciOpen Research Group, a virtual company for research that relies on crowd-sourced fundraising, global collaborations and shared scientific discoveries, all contrary to the traditional closed private-research model. The open approach, writes Valor, ‘allows people with ideas to bootstrap investigations and complete them doing professional-quality research at a tenth of the cost of traditional research institutions. The failure rate might be high, but the barrier to entry for innovation is quite low. So it will allow for novel ideas to be tried against problems which have seemed insurmountable — i.e. “unprofitable” — for a long time.’ Data, he says, is not a profit-generating commodity; it is a ‘shared commons in which ideas can flourish.’  …”

Bravo to India’s DBT/DST on proposing a new world standard for OA policy

Government of India Department of Biotechnology and the Department of Science and Technology (DBT / DST) Proposed Open Access Policy
Comments submitted by Heather Morrison to the Open Access Policy Committee and cross-posted to Sustaining the Knowledge Commons and The Imaginary Journal of Poetic Economics
Congratulations to the Open Access Policy Committee for a proposed policy that can be considered a new model for the world in almost every respect!
My two suggestions to perfect this policy are as follows:
1. After this sentence on page 1: “Grantees can make their papers open-access by publishing in an open-access journal or, if they choose to publish in a subscription journal, by posting the final accepted manuscript to an online repository”, add this sentence: “Grantees who publish in an open-access journal should post the final published manuscript to an online repository based in India”.
Rationale: journals and publishers are free to come and go and change business models as they please. A journal that is open access today could cease to exist, or be sold to a publisher that uses a toll access business model in the future. The only way to ensure ongoing open access to publicly funded research is through the use of repositories under the direct or indirect control of the funding agency.
2. p. 2: “Suggest that the period of embargo be no greater than one year” – change “Suggest” to “Insist”, and add this phrase: “Future revisions of this policy will look to decreasing and eventually eliminating accommodation for publisher embargoes”.
“Suggest” to “Insist”: the experience of one early open access policy leader, the U.S. National Institutes of Health, illustrated very well that certain publishers will take every advantage of any policy loophole available. The 2004 policy merely requesting open access had a dismal compliance rate; this changed dramatically with the strong 2008 policy. If researchers have options, publishers will refuse open access or demand longer embargoes. If policies are strong, publishers adjust, as can be easily observed through the Sherpa RoMEO Publisher Copyright Policies and Self-Archiving service, which illustrates the shifting landscape of scholarly publishing overall towards compliance with open access policy as well as concessions for specific policies.
“Decreasing and eventually eliminating…publisher embargoes”: the purpose of permitting publisher embargoes is to give the industry time to adjust. Publishers have now had more than a decade to adjust to open access policies around the world, including many by the world’s largest research funders. There are now close to 10,000 fully open access peer-reviewed scholarly journals, employing a variety of business models, including commercial operations that are quite successful financially. There is no reason for publishers to continue to need the “training wheels” support of embargo periods indefinitely.
There is no reason to delay the advance of research by one year at every step. We need clean energy solutions and answers to tough questions like climate change today. Since scientific advance is incremental in nature, a one-year embargo at every step towards an advance can mean an actual delay of many years in achieving a breakthrough.
Particular strengths of this policy that I would like to highlight:
p. 1:  “DBT/DST will not underwrite article processing charges levied by some journals”.
Bravo! The purpose of public funding of research is and should be to facilitate the conduct of research, not to subsidize secondary support services such as scholarly publishing.  The priority for DBT/DST funding should be ensuring that India’s research facilities are state of the art and providing salaries for Indian researchers and support for Indian students.
Also, there are areas (with this policy being a good example) where government policy is the best approach, and other areas that are best left to the market. It is appropriate for governments to direct researchers benefiting from public funding to make their work openly accessible. However, there are reasons to leave business models to the market. One reason is that commercial companies employing the article processing fee method are likely to be subject to the same market forces that caused distortion in the subscriptions market, and targeted government funding in this area could easily exacerbate the problem. Another is that currently many publishers using the open access article processing fee approach provide waivers for authors from developing countries; this may even be the default. This information is from my research in progress (my apologies that my data is not yet ready to share; it will be posted as open data as soon as it is ready). If governments provide funding for authors from developing countries for article processing fees, this concession may well disappear and have a severe impact on authors without the benefit of such funds.
p. 1: “The DBT/DST affirms the principle that the intrinsic merit of the work, and not the title of the journal in which an author’s work is published, should be considered in making future funding decisions. DBT/DST does not recommend the use of journal impact factors, as a surrogate measure of the quality of individual research articles, to assess an individual scientist’s contributions, or in hiring, promotion, or funding decisions”
Bravo! This is the approach recommended by the San Francisco Declaration on Research Assessment, and an approach that I heartily support. Among other things, heavy reliance on the impact factor as surrogate for quality of academic work has been a factor in market distortion in scholarly publishing. Also, reliance on impact factor has been an incentive for scholars to focus on topics of interest to high impact factor journals generally based in developed countries. For scholars in the developing world, this is an incentive to redirect focus from problems and issues of local concern to topics of interest to the developed world. This has also been a disincentive to development of local scholarly publishing systems. The ease of publishing on the internet means that it is timely for scholars in India and elsewhere to consider growing local scholarly publishing initiatives, providing opportunities for local leadership, outlets for research on topics of particular interest to India, and taking advantage of local currency and economic conditions to get the best deal on publishing services.
Other strengths shared with previous open access policies:
  • The policy is required, not just requested
  • Strong incentives for compliance (compliance considered in future funding and promotion requests)
  • Immediate deposit of final manuscript post peer review is required, even when access must be delayed due to publisher embargoes
In summary, India’s DBT/DST proposed open access policy is sound, innovative, and in my expert opinion, sets a new standard for the world. The two recommendations for improvement are to ensure that all articles are deposited in a local open access repository, including articles published in open access journals (which may in future cease to exist, change ownership or business model), and to insist on rather than suggest an embargo of no more than one year with language indicating eventual elimination of embargoes. Particular strengths highlighted are the refusal to provide funds for article processing fees and the direction to consider the quality of the work, not the impact factor of the journal in which it is published.
Dr. Heather Morrison
Assistant Professor
École des sciences de l’information / School of Information Studies
Master of Information Studies (M.I.S.) program accredited by the American Library Association
Maîtrise en sciences de l’information (M.S.I.) accréditée par l’American Library Association
University of Ottawa
July 5, 2014

