#scholrev: Strategy and decentralisation

I have already suggested that our #scholrev should be decentralized; now I’ll say why and suggest how we proceed.

Those of us in #scholrev are disillusioned enough that we want to do something different. Perhaps the best-promoted proposal was “an alternative to Google Scholar” (http://www.force11.org/node/4291) by Stian Håklev:

We need an open alternative to Google Scholar (like OSM [OpenStreetMap] is to GMaps). Imagine OJS/EPrints/DSpace pinging a central server with bibliographic metadata whenever a new article is published (like blogs pinging pingomatic), letting users contribute their own bibliographies. Every article would have a unique ID, enabling easy citation in any setting (a simple API would give citations in any format given the identifier, would also let you look up a PDF file based on its hash, like MusicBrainz, or search). The database would be available for bulk download and data mining. Strongly integrated into all OA tools/citation managers, etc.
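To make the proposal concrete, here is a minimal sketch of the three operations it names: a repository pinging the index with metadata, a citation produced from an identifier, and a PDF looked up by its hash. Everything in it is an assumption for illustration: the names `Record`, `OpenIndex`, `ping`, `cite`, `find_by_pdf` and `dump`, the choice of SHA-256, and the way the ID is derived are all hypothetical and are not part of the original proposal or of any existing service.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class Record:
    """Bibliographic metadata for one article (field names are illustrative)."""
    title: str
    authors: list
    year: int
    source: str            # the repository that sent the ping
    pdf_sha256: str = ""   # hex digest of the PDF, if the sender supplied one


class OpenIndex:
    """Toy in-memory stand-in for the 'central server' (hypothetical)."""

    def __init__(self):
        self._by_id = {}
        self._by_hash = {}

    def ping(self, record):
        """Register a record and return its identifier.

        Assumption: the ID is derived from the metadata, so the same
        ping sent twice yields the same ID.
        """
        key = json.dumps([record.title.lower(), list(record.authors), record.year])
        article_id = "art-" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
        self._by_id[article_id] = record
        if record.pdf_sha256:
            self._by_hash[record.pdf_sha256] = article_id
        return article_id

    def cite(self, article_id, fmt="plain"):
        """Return a citation for the given ID in the requested format."""
        r = self._by_id[article_id]
        if fmt == "plain":
            return ", ".join(r.authors) + " (" + str(r.year) + "). " + r.title + "."
        if fmt == "bibtex":
            lines = [
                "@article{" + article_id + ",",
                "  author = {" + " and ".join(r.authors) + "},",
                "  title = {" + r.title + "},",
                "  year = {" + str(r.year) + "}",
                "}",
            ]
            return "\n".join(lines)
        raise ValueError("unknown citation format: " + fmt)

    def find_by_pdf(self, pdf_bytes):
        """Look up an article ID from the bytes of its PDF, or return None."""
        digest = hashlib.sha256(pdf_bytes).hexdigest()
        return self._by_hash.get(digest)

    def dump(self):
        """Serialise the whole index as JSON, for bulk download."""
        records = {k: asdict(v) for k, v in self._by_id.items()}
        return json.dumps(records, indent=2, sort_keys=True)


if __name__ == "__main__":
    index = OpenIndex()
    pdf = b"%PDF-1.4 placeholder bytes"   # made-up content, not a real PDF
    rec = Record(
        title="An Example Article",       # made-up metadata
        authors=["A. Author", "B. Writer"],
        year=2013,
        source="demo-repository",
        pdf_sha256=hashlib.sha256(pdf).hexdigest(),
    )
    article_id = index.ping(rec)
    print(index.cite(article_id))
    print(index.cite(article_id, "bibtex"))
    print(index.find_by_pdf(pdf) == article_id)
```

The `dump` method is there because the proposal asks for bulk download: the whole index can be exported, not just queried one record at a time.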

Why hasn’t this happened already? Because libraries would rather buy things than build them. That gets us locked into a worsening cycle of deprivation: the more we buy, the less capacity we have for building. And every year it gets worse. We already see that institutional repositories look 10 years out of date: they aren’t full, no-one wants to put things in, they can’t be searched, etc. Compare that with StackOverflow, Github and Bitbucket, OpenStreetMap, etc. and you can’t escape the sense of frustration.

We want to do our own thing.

So for me, #scholrev has the following drivers:

  • Innovation
  • Social justice
  • Cost-effectiveness
  • Democracy

How to proceed? We have a lot of ground to catch up. But if OSM could change the world in 5 years so can we. We face two main problems:

  • The indifference and possibly hostility of universities
  • Lawyers and vested interests

The first problem just requires courage and determination (Wikipedia was trashed by universities until they couldn’t ignore it). The second is a real problem and we have to minimise it. But both suggest that we should have some or all of our work outside the current academic infrastructure. If we are to reach out to the #scholarlypoor (the global South, SMEs everywhere, patients, etc.) we cannot do this through centralised mechanisms. Wikipedia and OSM had single clear goals initially (an open encyclopedia of everything, and an open map of the world). Our task is more varied. The grand visions for reforming scholarship include (and you will think of more):

  • Machine semantic indexing of, and access to, some or all of the literature (“some” if the lawyers stop us doing “all”)
  • Democratising scholarship
  • Creative approaches to combining scholarship and authoring
  • Intelligent machines for reading and interpreting the literature
  • Alternatives to monographs

(these are all impossible at present).

These visions are too large and varied to plan top-down and must be bottom-up. They are also too large to coordinate at a detailed level. However, #scholrev has shown there are lots of groups starting to do their own thing. The history of the web shows that some of these will flourish and others won’t. This isn’t an absolute judgment; it’s more that the time is right for some and not for others (it’s taken us 20 years to get semantic Chemistry moving). So we shouldn’t judge new developments too quickly but give them time to flourish.

What about duplication and waste? Wouldn’t (say) 20 independent authoring systems be worse than none at all? Shouldn’t we coordinate this centrally and have just one? In fact both are problematic. In the Blue Obelisk (v.i.) we’ve effectively solved this by constantly keeping in touch and watching what others do. For example I once spent a lot of time on developing a graphical display for chemistry. It wasn’t very good. And then I saw Jmol (http://jmol.org) and realised that *I* didn’t need to do it all myself.

I junked my code. A year’s worth. And rejoiced. From there we went on to the Blue Obelisk and now we have this great ecosystem. There are a few partial duplicates, but that’s useful for checking correctness and for covering different platforms. And because we have legitimised the idea of components that interoperate, the world has come to understand and respect what we have done.

That’s the key step. We don’t have to boil the ocean by ourselves. Or even in our groups. We build components. It’s the right way to build.

Can you publish components in high-impact closed journals?

Probably not. But that is not why we are building them. By building components we can reach out well beyond academia. An open scholarly indexer does not have to be built solely, or even at all, by academics. Let’s get software engineers and journalists and graphic designers involved. And patients.

We couldn’t have done this 5 years ago. We can now. What’s happened?

  • Wikipedia, OSM have shown that grand visions can be accomplished
  • GalaxyZoo has shown that meaningful subtasks can be created and that huge numbers of citizens can take part, bringing their own innovation and enhancing the process
  • StackOverflow has shown that social tools can be compelling and exciting
  • Github and Bitbucket have shown how to create repositories that people want to put things in because these repos do something useful
  • New lightweight tools such as NoSQL, d3.js, and HTML5 have become available
  • In the Open Knowledge Foundation we see the world outside academia adopting new ideas by the week

So we can’t tell where and how the new things will happen. Something that looked impossible 2 years ago may now be very tractable. Glueing distributed systems together is far easier than it used to be.

So a distributed system is now a positive asset, not a problem to be solved by aggregation and central control. In the same way the communities can be glued by modern approaches and culture. That’s why I’m suggesting we should be distributed but communicating.

There are only a few basic rules:

  • Respect others
  • Try to work with people rather than compete
  • Keep everything completely open. An open API is problematic if the data can’t be dumped. Code relying on a closed component will crash when that component disappears.
  • Creating and giving are critically important. Some jobs are boring, tedious and necessary. We must find social ways of making them worthwhile.

So we can have more than one discussion list. More than one wikipage. Let’s first see what we can offer rather than what we want to accomplish. (Doesn’t have to be gold-plated.) And make these creations and their creators easy to find.

To start the process here’s some of what I and my collaborators can Openly offer:

  • A PDF2XHTML converter for scholarly articles and converters
  • Pubcrawler to discover and collect bibliographic metadata
  • Semantic scientific units of measurement
  • Semantic tools for physical science (especially chemistry) (useful for indexing and transforming)

So let’s see what we want to bring to and get from our marketplace of tools and ideas.