How Wikidata can change the world of scientific information 1/n

>> Hang on! What’s Wikidata? And Wikimedia? I’ve heard of Wikipedia, but…

Wikipedia is a free encyclopedia. It doesn’t do everything. It’s one of about 12 projects under the aegis of the Wikimedia Foundation. It’s the one everyone has heard of, but there are lots of others which are also about making structured information and knowledge available for free and freely reusable by everyone. For example, Wikimedia Commons is a huge resource of free images, videos, etc. Many of them are linked from Wikipedia articles, but there are lots more which can be reused in all sorts of ways. Teaching, research, new media …

>> OK, so Wikidata is the same thing for data? …

… Yes, but it’s not “all the world’s free data”. It’s carefully described data, carefully selected, and with clear provenance. When you find some Wikidata you know:

  • what it is
  • where it came from
  • how it can be used
  • what other data it is related to

>> So give me an example. If I want to find out where Zika is endemic, can I find it in Wikidata?

… Yes. Good example. Actually “Zika” represents quite a lot of different things. It represents a virus…

>> Yes, but surely that’s it?

… No, it also represents the fever caused by the virus. They aren’t the same …

>> OK, I can see that. So there would have to be two entries…

… No, there’s more. Do you know where Zika virus was first discovered?

>> In Africa? But no idea where…

… In the Zika forest – in Uganda. The virus was named after the forest. So it’s got a separate identifier. Lots of diseases are named after the place where they were first identified.

And then there are people called “Zika”.

>> But they wouldn’t cause any confusion?

… Yes, some of them are authors of scientific papers. Which have nothing to do with Zika virus, Zika forest, Zika fever…

>> H’mm. So if I search for “Zika” in G**gle, I’ll get all of these?

… G**gle will guess what you want, and add in what it and its sponsors want you to see. I didn’t find any of the authors in the first 4 pages of results. It’s powerful, but it’s not objective, and it’s not reproducible. If you search tomorrow you’ll get different results.

>> And Wikidata is more objective?

… Yes. Wikidata has different entries (items) for each of the categories above. The virus, the fever, the forest and the authors have different identifiers.

>> Identifiers?

… Yes. Good information systems have unique identifiers for each piece of information. Your passport number is unique. That’s what the machines read at airports. So here are some identifiers:

  • Zika Virus Q202864
  • Zika Fever Q8071861
  • Zika Forest Q22138769

Have a look at the entry for the virus; that’s got masses of information about Zika virus. Oh, and here’s a botanist, Peter Francis Zika, whose Wikidata identifier is Q21613657.

>> Help – that’s too much at once…

… Understood.
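
… For when you’re ready: here is a minimal sketch in Python of why those identifiers matter. It is only an illustration. The table ZIKA_ITEMS and the helpers item_url and matches are names made up for this example, and the table holds just the four identifiers quoted above rather than anything fetched from Wikidata. The one outside assumption is that an item’s page lives at https://www.wikidata.org/wiki/ followed by its identifier.

```python
import re

# Hypothetical lookup table: the four labels and identifiers quoted above.
# Nothing here is downloaded from Wikidata.
ZIKA_ITEMS = {
    "Zika virus": "Q202864",
    "Zika fever": "Q8071861",
    "Zika Forest": "Q22138769",
    "Peter Francis Zika": "Q21613657",
}


def item_url(qid: str) -> str:
    """Return the web page address for a Wikidata item identifier."""
    # Item identifiers are the letter Q followed by a positive integer.
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


def matches(term: str) -> dict:
    """Return every label in the table that contains the search term."""
    term = term.lower()
    return {
        label: qid
        for label, qid in ZIKA_ITEMS.items()
        if term in label.lower()
    }


if __name__ == "__main__":
    # The word "zika" is ambiguous; each identifier is not.
    for label, qid in matches("zika").items():
        print(f"{label}: {qid} -> {item_url(qid)}")
```

Searching the table for “zika” gives back all four entries, which is the ambiguity we started with. Each identifier, on the other hand, points at exactly one of them.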

>> H’m. So does everything in the scientific world have an identifier in Wikidata?

… No – there’s far too much. Even G**gle won’t get everything. But everything with a Wikipedia article will (or should) have a Wikidata item.
And lots of things are in Wikidata that don’t have articles.
The Wikidata community has imported lots of information directly from authoritative sources.

>> OK, so I can assume that every *important* scientific fact is in Wikidata?

… That depends on what counts as “important”. But there are already huge amounts of bioscientific information. Drugs, diseases …

>> Hm, my brain is really starting to overheat. Let’s take a break and come back. Maybe with some more examples?

… Certainly with some more examples. I’ll show you how items can be linked together by properties…

>> OK. We’ve not even talked about how it will change science. You may have to reteach me some of this when we next meet…

… Just remember “Wikidata”. Be seeing you.