Crowdsourcing, Deep Reading, and Narrative: Part 3

Ok, so it’s no Berlin Trilogy, but the reason we wanted to break this up into three posts was so we could take the time to really tease out some subtler points about the value of citizen science approaches for natural history digitization.  The main points we want to make are:

  1. Citizen science projects connect like-minded people and, by doing so, create working communities that enhance both the volunteer experience AND data quality;
  2. These tools not only connect people but also allow them to become part of the narrative of discovery, and by doing so enhance the quantity and quality of the data produced;
  3. Contextual knowledge that helps explain what is being transcribed, and why, enriches the experience for citizen scientists and leads to improved data quality.

More below.

POINT THE FIRST:  There are smart amateurs out there, dammit (and we’ve relied on them for years).  It might be a “duh”, but citizen science is a FANTASTIC way to connect a task with people who are well suited to perform it.  This isn’t just about efficiency — it is about finding people who deeply ENJOY the work, who want to be involved, and who want to discover other like-minded folks and talk about the experience.  Natural history work has relied on volunteers for decades, if not centuries — in field work, in cataloguing, and, yes, in in-house transcription projects.  Online citizen science is a natural evolution from in-museum volunteerism.

We think translating the volunteer experience from a museum collection to an online community could have striking benefits both for volunteers and museum collections — benefits that aren’t possible in a local environment.  First benefit: access to an expanded pool of up-to-date knowledge that can help with difficult tasks — say, a hard-to-read label or an unfamiliar locality (we’ll explain this more in Point 3).  Second benefit: rapid access to innovations and ideas from the “hivemind.”  With enough fairly basic infrastructure (read: message boards) volunteers can (and will) talk amongst themselves to share the newest, fastest ways of solving problems and tackling tasks — be they transcription or, say, the identification of a whole new class of interstellar object.

POINT THE SECOND:  The narrative necessary to keep volunteers engaged in a project is also necessary to create good, usable data and to start the process of scientific inquiry.  Part of the charm of Old Weather is the sense of being on the boats, tracking where they went, and learning about the people on them.  But this sense of being on a journey is more than window dressing — it provides volunteers with valuable context that allows them to make better decisions while transcribing and correcting data.  The combination of mapping functions, access to other ships’ logs, and access to other people working on the same project gives volunteers the context necessary to quickly “enter” the narrative and begin working with the data.

The benefit of engaging with history, of “entering” the narrative, is, ironically, that it allows us to release data from the very history in which they are bound (i.e. catalog ledgers) so that they can be used in new contexts.  We argue that this process of collectively unlocking and then re-assembling biodiversity data into new contexts also increases their fitness for use; data are only considered usable after they’ve been checked (or referenced, to borrow from Latour*) against other data.  It is this process that ultimately leads to scientific discoveries.

We want to emphasize here the “science” part of citizen science, and its ability to build new, collective knowledge.  Can we show transcribers the fruit of others’ digitization efforts to build this collective knowledge?  How might this be accomplished for natural history ledger/specimen transcription?  One obvious idea is a collaborative map showing new records as they are digitized by all participants.  Such a map would show gaps in what we know about biodiversity and how those gaps are being filled by the efforts of transcribers.  It could also link to other scientific projects that utilize the “what/where/when” data being transcribed in order to document species distributions.  For example, Rob routinely uses these data for just that purpose — documenting species distributions and how they are changing in response to rapidly accelerating environmental change.  What do you think!?

POINT THE THIRD:  We are uniquely fortunate in our discipline to have excellent reference materials in electronic format.  For example, the Encyclopedia of Life (EOL) is a remarkable resource that continues to fulfill the promise of its name.  Linking contextualizing resources such as the EOL directly into the transcription workflow enhances both the user experience and data quality.  It could help volunteers decipher hard-to-read taxon names (e.g. does that label say Pinus torreyanus or Pinus torreyana?).  Furthermore, EOL’s value extends beyond facilitating data quality improvements because it has a built-in reward system: citizen scientists can create personal collections of the species they have transcribed, along with information about those taxa (e.g. where they are located, what they sound like).  These collections can be a sharable legacy of their work while also providing a link back to natural history and the excitement of discovery.

We have two closing thoughts.  First: in our original post we couched our assertion that crowdsourcing trumps machine-only transcription techniques in the caveat that we were being “a little provocative as opposed to right.”  Well, after reading comments, researching, and cogitating, we are increasingly convinced that we might just be flat-out right.  Crowdsourcing science works.  Period.

Second closing thought:  All of our blog postings start as labyrinthine Google Docs, and there is always a long list of ideas, partial paragraphs, etc. that don’t get incorporated into the posting at hand.  Well, we’ve taken many of the (better) spare pieces and parts — PLUS ANY COMMENTS WE GET, SO PLEASE COMMENT — and molded them (with attribution to you, dear readers) into a talk for the annual TDWG meeting (Oct 16–21, New Orleans).  Talk title: “Enlisting the Use of Educated Volunteers at a Distance: Or, Why Crowdsourcing and Citizen Science Will NOT Create Nightmare Zombies That Will Destroy Us All”.  Yeah!  Hope to see you there!

*Recommended reading for this week (and thanks to the CIRSS Tuesday Reading Group for it):

Latour, B. (1999). Circulating reference: Sampling the soil in the Amazon forest. In Pandora’s Hope. Harvard University Press.


About Andrea

Andrea is a Ph.D. student in Library and Information Science at the University of Illinois at Urbana-Champaign, and is supported by the Center for Informatics Research in Science and Scholarship.

2 Responses to Crowdsourcing, Deep Reading, and Narrative: Part 3

  1. Pingback: Session Proposal: Crowdsourcing - THATCamp American Historical Association 2012

  2. Thank you Andrea. It’s good to know I’m not a touch-typing zombie! 🙂 I absolutely _love_ the idea of the map – yes please?
