CopyCamp: why Copyright reform has failed TDM / ContentMining – 1 The vision and the tragedy

I am honoured to have been invited to speak at CopyCamp2017, “The Internet of Copyrighted Things”. I’ve not been to CopyCamp before, but I’ve been to similar events, and I’m delighted to see that it is sponsored by organisations, some of which I belong to, that are fighting for digital freedom. In these posts I’ll show why copyright has failed science; this first post shows why knowledge is valuable and must be free.

I’m giving a workshop on Thursday and talking on Friday (after scares from Ryanair), and I’m blogging (as I often do) to clear my thoughts and to supplement the static slides. This is the latest step in a 40-year journey of hope, a hope that is increasingly being destroyed by copyright maximalism. I am being turned from an innovative scientist with a dream of building something excitingly new into an angry activist fighting for everyone’s rights. I can accept it when science doesn’t work, because it often just doesn’t; I get angry when mega-capitalists use science as a way to generate money and, in their wake, destroy something potentially wonderful.

Here’s the story. 45 years ago, working with Jack Dunitz in Zurich, I had my first scientific insight: by collecting many seemingly unrelated observations (in this case crystal structures) I could find new science in the patterns between them (“reaction pathways”). This is knowledge-driven research, where a scientist takes the results of others and interprets them in different ways. It’s as old as science itself, exemplified in chemistry by Mendeleev’s collection of the properties of the elements and their compounds, and his analysis of them in the Periodic Table of the Elements. Mendeleev didn’t measure all those properties – many had already been reported in the scientific literature – his genius was to make sense of seemingly unrelated data.

40 years ago chemists started to use computers to carry out simple chemical artificial intelligence – the analysis of spectra and the planning of chemical syntheses. I was entranced by the prospect, but realised that taking it further would require large amounts of knowledge. I was transformed by Tim Berners-Lee’s (TimBL’s) vision of the Semantic Web, where knowledge could be computed. I moved to Cambridge in 1999 with the long-term aim of creating “chemical AI”. I conceived a dream – the WorldWide Molecular Matrix – where knowledge would be constantly captured and formalized, and where logic or knowledge engines would extract, or even create, new chemical insights.

To do this we’d need machines to extract information automatically from thousands of articles, or even more. In 2005-2010 I was funded (with others) by EPSRC and JISC to develop tools to extract chemical knowledge from the scientific literature. It’s hard and horrible, because scientific papers are not authored to be read by machines. I have spent years writing code to do this and now have a toolset that can read tens of thousands of papers a day (or more if we pay for clouds) and extract high-quality chemistry. This chemistry is novel because it’s too expensive and boring to extract by hand, and it would be an important addition to what we have. As an example, Nick Day in my group built CrystalEye, which extracted 250,000 crystal structures, improved them and published them under an Open Licence – we’ve now joined forces with the wonderful Crystallography Open Database. Later, Peter Corbett, Daniel Lowe and Lezan Hawizy built novel, Open software for extracting chemistry from the text of papers.

So now I have everything I want – thousands of scientific articles every day, maybe 10-15% containing some chemistry, and a set of Open tools that anyone can use and improve. I’m ready to try the impossible dream – of building a chemical AI…

What will it find?

NOTHING. Because if I, or anyone else, use it without the PUBLISHER’s permission, the University will be immediately cut off by the publisher because …

… because it might upset their market, or their perceived dominance over researchers. This isn’t a scare or an over-reaction – there are enough stories of scientists in many disciplines being cut off arbitrarily to show that it is standard practice. One day 2 years ago the American Chemical Society’s automatic triggers cut off 200 universities. Publishers send bullying emails saying “you have been illegally downloading content” (totally untrue), or accusing researchers of “stealing” (also untrue).

This is now so common that many researchers, and even more librarians, are scared of publishers. This blog has outlined much of this in the past, and it’s not getting better. My dream has been destroyed by avarice, fear and conservatism. I’ll outline the symptoms and what needs to be done, and I’ll urge citizens to own this problem and to assert that they have a fundamental right to open scientific knowledge.

My slides at CopyCamp provide additional material.