Ross Mounce and I are starting to extract content (“content-mining”) from BMC journals. [Why BMC only? Because most of the other major publishers refuse to let us do it even when we subscribe.] [Why not PLoS? For technical informatics reasons which I have communicated to PLoS and which they have taken on board.]
I am going to appeal frequently for like-minded people to form a community of Open Content Miners, so if you are interested, let us know.
Anyway, we are going through BMC Evolutionary Biology and looking at data types. In general we are optimistic.
DISCLAIMER: I shall use examples from BMC because BMC is all I can mine. I shall frequently be critical, but in most of these respects BMC is no better or worse than anyone else. My criticism of Elsevier, Wiley, Springer, Nature, RSC, ACS, etc. is an order of magnitude harsher.
Anyway, here’s the first diagram I came across. I’ll explain later HOW we extract, but for the moment look at how badly the information is presented. That’s partly because of the slavery of the printed page (and “print” is the evil word, because authors and publishers expect readers to print the page). Tell ME what you think is suboptimal about this figure (I have at least 3 complaints, some of which are very common). The diagram should scale to a size where the text is (just) readable.
People have been reading this – I’d value your comments.