JHFNP, Post 4.5

A quick mini-post here, to tell of some interesting things:

1) Notebook 1 is DONE.  Fully annotated, and all within 3 days of our last post.  This represents many hours of work and the creation of hundreds of annotations:

{{place|…}} annotations: 218
{{dated|…}} annotations: 64
{{taxon|…}} annotations: 347

So total annotations = 629!
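
For anyone who wants to check our arithmetic, or tally annotations on another notebook, here is a minimal Python sketch. It assumes you already have the wikitext of a page as a plain string (how you fetch it is up to you). The names `TEMPLATE_NAME` and `count_annotations` are ours, made up for illustration; they are not part of any Wikisource tool.

```python
import re
from collections import Counter

# Matches the start of a template call such as {{taxon|...}} and captures its name.
# Only the three annotation templates discussed in these posts are counted.
TEMPLATE_NAME = re.compile(r"\{\{\s*(place|dated|taxon)\s*\|")


def count_annotations(wikitext: str) -> Counter:
    """Count place, dated and taxon template calls in a block of wikitext."""
    return Counter(TEMPLATE_NAME.findall(wikitext))


# A made-up line of wikitext, stitched together from the examples in these posts.
sample = (
    "{{dated|1905-08-04|Aug. 4}} Saw a {{taxon|Paridae|chickadee}} near "
    "{{place|Florissant Lake|Florissant Lake basin}}."
)
sample_counts = count_annotations(sample)

# The Notebook 1 totals reported above: 218 + 64 + 347 = 629.
reported = {"place": 218, "dated": 64, "taxon": 347}
reported_total = sum(reported.values())
```

Running `count_annotations` over every page of a notebook and adding up the counters would give per-template totals like the ones listed above.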

2) Furthermore, Notebook 1 was annotated almost entirely by 1 person… who we can’t thank by name because they didn’t make a WikiSource account.  So, thank you, IP address 173.69.207.29!  And also to: 207.145.38.73.  If you would like a “thank you” coffee mug please let us know in the comments or via email.

3)  We have set up a Wikisource Project page.  We still need to write detailed, step by step descriptions for some pages (e.g. transcription and annotation), but we do have an initial help page on upload available, and more will be posted there soon.

4)  We’re putting new notebooks and transcriptions onto Wikisource as fast as we can get transcriptions and scans associated with one another; transcriptions of Notebook 2 are already up.  This notebook features more paleontology and geology than the last, and describes several trips to Northern Colorado: “Instead of attending commencement and taking my B.A. degree I started north on foot, up the Lykins lateral valley NW of Colorado Sanitarium.”  

We are excited about, and appreciative of, the annotation help, and we really do want to acknowledge as many folks as possible in the upcoming ZooKeys paper.  So once again, if you want to be credited by name, make sure to create (and log in to) a WikiSource account before annotating (though if you prefer public anonymity, just email us directly to say you are helping); we probably can’t acknowledge IP addresses, and we’re not sure if we can acknowledge aliases in a published paper (though how great would it be to say, “special thanks to Paul Flemons, Laura Russell and SquirrelFan23 for her or his help”).

Next, a more substantial update: annotations and occurrences, as promised last time.

Posted in Henderson Project | 8 Comments

Field Notes Challenge Part 4: Help, ‘Cause We Need Somebod(y/ies)

Co-written once again with Gaurav Vaidya.

Over the last week, Gaurav has continued to pull templates out of his hat (leaving rabbit pulling to Rob and his bunnies) and we now have templates for locations and dates.  The syntax for these templates is very similar to that of the “taxon” template we discussed in the last post (and touch on below).

Let’s start with date.  The general syntax is:

{{dated|<date in YYYY-MM-DD Format>|<date as transcribed>}}

So this example from Page 12 of Notebook 1 becomes, “{{dated|1905-08-04|Aug. 4}}.”
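
If you would rather not type the ISO date by hand, a tiny Python helper can build the annotation for you. This is only a sketch: `dated_annotation` is a name we made up, and it assumes you already know the calendar date the transcribed text refers to.

```python
from datetime import date


def dated_annotation(day: date, transcribed: str) -> str:
    """Build a {{dated|...}} annotation from a calendar date and the text as transcribed."""
    # date.isoformat() produces the YYYY-MM-DD form the template expects.
    return "{{dated|" + day.isoformat() + "|" + transcribed + "}}"


# The Page 12 example: Henderson wrote "Aug. 4" in his 1905 notebook.
example = dated_annotation(date(1905, 8, 4), "Aug. 4")
```

Here `example` is the same string as the annotation shown above.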

The Location template looks similar –

{{place|Location Name|Location name as transcribed}}

– but does something particularly nifty: it creates a linkout to OpenStreetMap which immediately resolves the place name on a map, along with links to Wikipedia and Wikimedia Commons.  So this example on Page 6 of Notebook 1 is annotated like this:
“{{place|Florissant Lake|Florissant Lake basin}},” which creates a box like this: [image: the rendered annotation box, with the map icon circled]  The little circled map in the image is the link out.  YAY MAPS.
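
To make the idea concrete, here is a rough Python sketch of the two pieces: building the annotation and building a map link for the place name. Big caveat: we don't spell out here which URL the template actually generates, so the OpenStreetMap search URL below is our assumption for illustration, and the function names `place_annotation` and `osm_search_url` are hypothetical.

```python
from urllib.parse import urlencode


def place_annotation(name: str, transcribed: str) -> str:
    """Build a {{place|...}} annotation from a place name and the text as transcribed."""
    return "{{place|" + name + "|" + transcribed + "}}"


def osm_search_url(name: str) -> str:
    """Build an OpenStreetMap search link for a place name.

    Assumption: this uses the public search page on openstreetmap.org.
    The real template may link somewhere else entirely.
    """
    return "https://www.openstreetmap.org/search?" + urlencode({"query": name})


annotation = place_annotation("Florissant Lake", "Florissant Lake basin")
link = osm_search_url("Florissant Lake")
```

Note that the link is built from the first argument (the resolvable place name), not from Henderson's wording, which is the whole reason the template takes two arguments.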

As always, the million dollar question is: what next?!?  The next step is to continue using these templates to fully annotate Henderson’s first notebook, which is where you guys come in.  All these experiments are well and good, but until the rubber hits the road (or the fingers hit the keyboard), it’s more theory than practice. We need your help.  If you have time and inclination, please jump in and annotate.  We want to make it clear that you can’t hinder our progress, only help, and that this is really easy.  It’s a wiki, so hit “edit” on one of the pages (for example this one, which mentions pikas (!)), and just try out a taxon or location annotation.  For example, on Page 31, you could hit edit and replace the word “chickadee” with this:  {{taxon|Paridae|chickadee}}, or even {{taxon|chickadee}} and VOILA: annotation!  Go back to the main index page (the up arrow next to the “Page | Discussion | Image” links) and you can see the changes have also shown up in the main contents page.
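
For the script-inclined, the chickadee edit can be sketched in a few lines of Python. This is an illustration of the find-and-replace only, not a bot we actually run; `annotate_taxon` is a made-up name, and it works on a plain string that you would still have to paste back into the wiki edit box yourself.

```python
import re


def annotate_taxon(text: str, word: str, taxon_name: str) -> str:
    """Wrap the first whole-word occurrence of `word` in a {{taxon|...}} template."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    replacement = "{{taxon|" + taxon_name + "|" + word + "}}"
    # A function is used as the replacement so that no character in it
    # is treated as a regex escape sequence.
    return pattern.sub(lambda match: replacement, text, count=1)


# A made-up sentence standing in for the transcription on Page 31.
before = "Saw a chickadee and a robin."
after = annotate_taxon(before, "chickadee", "Paridae")
```

One thing this sketch does not do is check whether the word is already inside an annotation, so running it twice on the same text would nest templates. Doing it by hand in the edit box is honestly just as easy.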

We have a particular need to get this done quickly, because we have been asked to assemble our experiments into a peer-reviewed and hopefully published paper in a special issue of ZooKeys.  A manuscript is due in mid-March and, yeah, that isn’t very far away.  So we could really use your help with annotating this text. The rewards (apart from a general sense of well-being and the satisfaction of contributing to the furthering of knowledge about our planet) will be a direct mention of your help in the acknowledgements section of our paper.  If you are interested in making a more substantive contribution in terms of work and writing, we’d be pleased to chat more and possibly include you as a co-author.

We have been on the fence previously about the utility of prizes, and whether these are effective incentives, or a titch gimmicky.  In the past we’ve given (very small) Amazon gift cards as prizes, as a way to say thank you to those who took the time to comment on a post, but this time around we’re thinking of something different: Rob is happy to make a small donation from (very limited) personal funds to make Junius Henderson coffee mugs and then give those away as prizes.  But we wanna know: do prizes help motivate you to get involved in annotating?  Or are they eye-roll inducing?

COME ON who doesn’t want a coffee mug?

Next post: text mining, annotations and occurrences!

Posted in crowdsourcing, field notes, Henderson Project | 1 Comment

Field Notes Challenge Part 3: New Year’s Digital Resolutions

“Resolution” is a word with many meanings. It can refer to the granularity of a digital image, or the solution to a problem, or a firm decision to do or not do something.  The Carefree Cogitation Coalition here at So You Think You Can Digitize has been thinking all about resolutions as we enter the new year. We left 2011 with some exciting developments and new challenges related to making digitized and transcribed field notes openly available.  We have resolved some of the issues mentioned in our last post, and are now resolved to tackle perhaps the biggest challenge yet: to find a flexible way to annotate these notes.

First, to recap BRIEFLY:
1) We decided to use Wikisource as a platform for providing scans of CU Museum founder Junius Henderson’s field notes along with transcriptions.
2) The first notebook scans and transcriptions are now available.  Wikisource’s navigation can be less than intuitive, so here’s the index page, which lists all the pages of the Field Notebook along with metadata.  Next, click the “Notebook 1” link on the upper right hand side of the screen to get to the contents page for that notebook.  You’ll see a Table of Contents here and, as you scroll down, the full transcription, with page numbers listed along the left border of the page.   Click on any of these page numbers (for example, Page 5) and you will go to a page displaying the scanned image of the Henderson notebook and transcription.  This page is editable according to Wiki rules.
3) We’d originally resolved to only spend 5 hours apiece TOTAL on this project.  Yeah, consider that resolution broken.  Maybe 5 hours apiece… per week?

Figure 1.  Page 5 of Henderson’s field notebook as shown on Wikisource

So far, most of our work has been focused on figuring out how to get Wikisource to represent these notes in a way that’s consistent with existing Wikisource standards and policy, while also serving our needs as field-note data-miners.  We think we’ve done this pretty well; Gaurav Vaidya has put a ton of work into developing templates for taxon annotations.  You see those little items in boxes up there in Figure 1? Gaurav’s template (which has its own Wikisource page) automagically creates both a direct hyperlink to the species page and the floating boxes that link out to Wikispecies and the Wikimedia Commons.  The markup itself looks like this:

{{taxon|<taxon-name>|<text-to-appear on transcription>}}.

So “Lark buntings” would become “{{taxon|Calamospiza melanocorys|Lark buntings}}.”
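
Because the markup is this regular, it is also easy to pull back out of a page. Here is a minimal Python sketch of a taxon extractor. The names `TaxonAnnotation` and `find_taxa` are ours and purely illustrative. The sketch also assumes that a one-argument call such as {{taxon|chickadee}} means the name and the displayed text are the same, and it does not try to handle templates nested inside other templates.

```python
import re
from typing import List, NamedTuple


class TaxonAnnotation(NamedTuple):
    taxon_name: str   # first argument: the name to link to
    transcribed: str  # second argument: the text as it appears in the transcription


# First argument is required, second is optional; neither may contain | { or }.
TAXON = re.compile(r"\{\{taxon\|([^|{}]+)(?:\|([^|{}]*))?\}\}")


def find_taxa(wikitext: str) -> List[TaxonAnnotation]:
    """Return every taxon annotation found in a block of wikitext, in order."""
    found = []
    for match in TAXON.finditer(wikitext):
        name = match.group(1).strip()
        # Assumption: with a single argument, the name doubles as the displayed text.
        shown = match.group(2) if match.group(2) is not None else name
        found.append(TaxonAnnotation(name, shown.strip()))
    return found


taxa = find_taxa("Saw {{taxon|Calamospiza melanocorys|Lark buntings}} today.")
```

Each result keeps both the linked name and Henderson's own wording, which is exactly the pair you need for any later text mining.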

Pretty simple!  Our next steps are similarly simple.  We (read: Gaurav) will create annotation templates for “Dates” and “Locations,” and then start marking them up in the text.  Andrea has been linking together resources (uBio’s FindIt, Europeana’s Geoparser,  and her own rudimentary code) to automate this markup, so that the future notebooks we upload will be pre-loaded with links (we’ll talk about this more in our next post).  Locations, at least, will also be inter-wiki linked so interested readers can learn about the places Henderson visited during his journeys.  As soon as we have the templates for location and date done, we’ll post the syntax here and you can just jump in and try.  We’d love the help!

So now we have annotated field notes online, readily and freely available to everybody!  Exciting!  But here is the really exciting part: we think we can push these annotations out of the World of Wikipedia and into the larger semantic web.  Our plan is to unlock those annotations from Wikisource and try to represent them as separate Darwin Core observation records; more on what those records look like here.  We… aren’t entirely sure how we’re going to do this yet, but we’ll keep you posted.  We also have some interesting ideas about what to do with these:  http://commons.wikimedia.org/wiki/Category:Junius_Henderson.  Maybe you do too.  If you do, please comment.  We live for comments.  Be resolved to let us know what you think, and Happy New Year.
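
Since we just admitted we aren't sure how we'll do this, take the following Python sketch as thinking out loud rather than a plan. It reads taxon, place and dated annotations (in the two-argument form used on this blog) from a page of wikitext and emits one dictionary per taxon annotation, keyed by Darwin Core term names. The pairing rule is purely our assumption: each taxon gets whatever date and place were most recently annotated before it on the page. The function name `to_occurrences` is hypothetical too.

```python
import re
from typing import Dict, List

# Two-argument annotation templates: {{kind|normalized value|text as transcribed}}
ANNOTATION = re.compile(r"\{\{(taxon|place|dated)\|([^|{}]+)\|([^|{}]*)\}\}")


def to_occurrences(wikitext: str) -> List[Dict[str, str]]:
    """Turn annotated wikitext into Darwin Core style observation records."""
    current_date: Dict[str, str] = {}
    current_place: Dict[str, str] = {}
    records: List[Dict[str, str]] = []
    for kind, value, verbatim in ANNOTATION.findall(wikitext):
        if kind == "dated":
            current_date = {"eventDate": value, "verbatimEventDate": verbatim}
        elif kind == "place":
            current_place = {"locality": value, "verbatimLocality": verbatim}
        else:
            # Assumption: one record per taxon annotation, inheriting the most
            # recent date and place seen earlier on the same page.
            record = {
                "basisOfRecord": "HumanObservation",
                "scientificName": value,
                "verbatimIdentification": verbatim,
            }
            record.update(current_date)
            record.update(current_place)
            records.append(record)
    return records


# A made-up page combining annotation examples from these posts.
page = (
    "{{dated|1905-08-04|Aug. 4}} At {{place|Florissant Lake|Florissant Lake basin}} "
    "saw {{taxon|Calamospiza melanocorys|Lark buntings}}."
)
occurrences = to_occurrences(page)
```

The "most recent date and place" rule will certainly be wrong some of the time (Henderson does not always write in that order), so real records would need a human check before anyone treats them as observations.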

Posted in Henderson Project | 4 Comments

Field Note Challenge Part 2: Veni, Vidi, Wiki

SYTYCD would like to welcome guest blog co-author Gaurav Vaidya.

A week ago, we told you about our cunning plan to play around with annotating and publishing one transcribed notebook of Junius Henderson’s field notes. We’ve had two big successes in the last seven days, which is not bad for the holiday season (a.k.a. soul-crushing finals and project deadline week).

Success #1:  YOU GUYS, the internet is amazing.  Within half an hour of posting our last post, we were contacted by Dena Smith and Kathy Hollis, who alerted us to the existence of scans of Henderson’s notebooks (remember, we only had transcribed text files when we started!).  This started a chain of events that put us in touch with two folks from the National Snow and Ice Data Center (NSIDC): Allaina Wallace, librarian and analog data archivist, and Ruth Duerr, Manager of Data Stewardship.  Less than 24 hours later, Rob and Gaurav had a productive meeting at the NSIDC offices in east Boulder, and came away with a DVD containing all the scans.  This DVD included three notebooks we hadn’t known about, two of which cover Henderson’s travels between 1927 and 1936 (adding another decade to his life on the road) AND were accompanied by more of Peter Robinson’s transcriptions.

From the field notes of Junius Henderson, Notebook 1

Success #2:  Having the scans made a huge impact on what we were able to do with the text.  In particular, Gaurav has made headway in using WikiSource as a platform for maximal use and re-use of Henderson’s notes. WikiSource is “an online digital library of free content textual sources on a wiki, operated by the Wikimedia Foundation” (i.e. the organization behind Wikipedia). Uploading the scans of Henderson’s first notebook to the Wikimedia Commons was easy: they are now available as PDF or DjVu files. Once the scans were in the Commons, Gaurav created an Index page (following instructions on the Beginner’s Guide to Index: files and the Introduction to Proofreading on WikiSource). The Index maps pages from the scanned DjVu file to pages on WikiSource. Click on a yellow-coloured page number to proofread or edit an existing page (for example, page 3), or on a red-coloured page to transcribe it.   Transcription itself is dead-easy: the page image is displayed on the right, and a textbox (which accepts all MediaWiki syntax) is displayed on the left.  In our case, since we have the transcriptions “done”, it was mostly cutting and pasting sections of Peter’s transcribed text so that it aligned with Henderson’s scrawl on the scanned pages.

So yay, successes!!  The fruits of a week’s worth of work are available on the “Notebook 1” page on WikiSource, where, using WikiSource’s <pages /> command, Gaurav mapped pages from the scanned DjVu file to pages on WikiSource.  Numbers along the left margin of the main page link back to the corresponding page from the Index, making it easy to verify or fix transcription errors.  Also, Gaurav compiled pages from the Index into sections representing field trips (just as Henderson did in his notes), and listed them in a “Contents” box at the top of the page.

Henderson’s field notes continue to be, first and foremost, a good read. “Notebook 1” features details from Henderson’s week-long trips to Florissant, Colorado (August 1905) and Silver Lake Arapahoe (September 1905). He keeps record of everything from the stamina of his comrades:

“The party showed fatigue in the following order: Sievert least, I next, then Watts, Then Markman, then Frank.” (August 30, 1905)

to train delays and opportunities for rumination:

“Train again so late as to afford ample opportunity for philosophic meditation upon the motives which inspire railroad people to advertise time which they do not expect to make except under rare circumstances.” (September 3, 1905)

What next?  Our sense of what we want to do and what is possible is rapidly evolving.  Simply having the scanned field notebook pages completely changed our game plan.  Before Wednesday of this week, we just had transcriptions.  Now we have the whole enchilada.  What we currently want is a no-cost, minimal effort system that will make scans AND transcriptions AND annotations available, and that can facilitate text mining of the transcriptions.  Do we have that in WikiSource?  We will see.  More on annotations to follow in our next post, but some thoughts are already percolating and we have even implemented some rudimentary examples.

We’d like to encourage you to try your hand at transcribing or annotating this notebook along with us, and to let us know what you think about the process (reminder: Henderson’s first field notebook is still available as plain text or as a Word document).  As on Wikipedia, all edits are saved, so you can’t really mess up – be bold, jump in (!) and tell us what you think.

Posted in crowdsourcing, field notes, Henderson Project, projects | 11 Comments

An Ode to Founders and a Field Notes Challenge: Part 1

Junius Henderson was the founder and first curator of the University of Colorado (CU) Museum of Natural History, where Rob works.  Because Rob is the Curator of Invertebrate Zoology, and his training is in malacology (not “bad ecology” or “evil” but the study of molluscs such as squids, clams, snails), he has always been pleased that he can trace a direct taxonomic line back to Henderson, who was first and foremost one of the great descriptive malacologists working in western North America.  One hundred years later, brick-and-mortar testaments to CU’s Founders remain throughout campus: the Henderson building, where the CU Museum of Natural History exhibits are housed, and the Ramaley building (named after compatriot Francis Ramaley), home to the majority of the ecology and evolutionary biology department.

Junius Henderson kept copious field notes describing his many collecting trips; these were compiled into eleven volumes, and are archived in the museum.  The notes start in 1905 with this entry:

“Boulder, Colorado. July 28, 1905. Saw Say Phoebe and Siskins, Robin, Flicker.”

Another very early entry reads,

“Expenses Florissant trip, 2 tickets to Denver Dr. Ramaley and I —-$2.00. Saw a Kingbird and Robin on way to depot…  Went to City Park and heard band and saw moving pictures including ‘Stage Robbery’ which, to say the least, was not an elevating spectacle, nor helpful to venturesome boys, apt to be carried away with the wildness of such a life.” [emphasis added for the benefit of any venturesome readers]

Twenty-two years and ten notebooks later, here is one of the last entries:

“Virginia Dale, Colo., Wednesday, June 15, 1927. Cloudy, foggy, rainy, cold morning, with a strong northwest wind. Started at 8 a.m. At edge of Laramie basin, speedometer 9728; Laramie 9747; Rock River 9787, at noon for lunch: Medicine Bow 9804; Ft. Steele 9848, about (speedometer slipped off just before reaching there); Rawlins 9864. Roads mostly gravelled and good; but in some places clay, and soft and slippery. Cleared about middle of afternoon and warmer this evening in camp at Rawlins.”

Fast forward another century (give or take): shortly after Rob’s arrival at CU in 2000, the now retired Curator of Paleontology, Peter Robinson, mentioned he had personally transcribed ALL ELEVEN VOLUMES and saved each notebook as a separate Word document.  This is a best-case scenario for transcription in many ways; Peter is an expert with deep experience in natural history and paleontology, so his transcriptions of esoteric species names and locations are likely as accurate as they could possibly be.  While there are no scans of Henderson’s notes (yet), Peter did add some annotations (always using double parentheses) such as, “((at some later date Henderson wrote an emphatic ‘NO.’ at this place in the notebook))” to let readers know where they should refer back to the original notes.  So one disappointment is that Peter often added the annotation “((Drawing in field book)),” flagging a drawing that one cannot (yet) view.

Rob has made use of these notes in his research at CU; in 2003, he headed out on a summer-long collecting expedition as part of a State of Colorado survey of molluscs and crayfish in Western Colorado. Henderson’s field notes provided invaluable context and information about past collecting trips.  Henderson’s notes aren’t just part of the scientific record, however; they’re also a vivid image of the American West in a moment of swift change, as his modes of transportation transition from stagecoach to trains to automobiles, and his travels take him along new routes and through new towns and cities.  In our last post we talked about how we can best do work at the intersection of the sciences and the humanities; rich corpora of field notes like Henderson’s are exactly the media that tie these seemingly disparate disciplines together.

So why are we telling you all this?  Because we think that:

a) Henderson’s meticulousness and Peter Robinson’s hard work provide a remarkable resource that should be publicly available, and;
b) we’ve talked a lot about how to digitize, what to digitize, why to digitize, but we haven’t done quite as much work discussing what to do once you’ve digitized.  In other words, say you’ve transcribed 1000 pages of field notes.  Now what?

So over the last week we’ve been working on just this question of “Now What” using  Henderson’s field notes, with the following goals and caveats for this project:
1) We want to make the notes publicly accessible, easily discoverable, and preferably bundled with appropriate descriptive, structural and preservation metadata;
2) We want to do so using the least restrictive licensing available (and we appreciate the support and encouragement of CU Museum Director Patrick Kociolek and Peter Robinson to do so);
3) We want to make use of some of the automated data extraction tools we’ve stumbled across over the last couple of months to do things like link names of taxa, places, people and dates to other sources of biodiversity knowledge;
4) We want to produce at least one Nifty Thing as a result of this project — like a map on Google Earth showing Henderson’s travels;
5) We don’t want to spend more than five hours each on this.  This is because we’re both super busy, and we also like the idea of figuring out what substantial products can be produced on a budget of no money and close-to-no time.

Rob’s student Gaurav Vaidya has also been working on this project with us, focusing on possible Wikipedia-oriented solutions, and we’re all nearing/exceeding the end of our respective 5 hour allotments (even when excluding time spent looking up movies from the 1900s and pictures of Say’s Phoebe).   In the interim, here (in text and Word formats) is the first notebook of Henderson’s for your perusal and to get you thinking and doing. In the posts that follow, we will report some of our next steps with the full corpus, along with releasing the other notebooks.  More soon!

Posted in Henderson Project | 3 Comments

Where do the digital humanities and eScience intersect? — Crosspost with VertNet

This special post was co-written with David Bloom, VertNet Coordinator, and crossposted (with some minor mods) at the VertNet Blog.

First and foremost, digitization of natural history collections and tools to make these digitized records available, such as VertNet, support global biodiversity research.  We suspect that the majority of use of digitized records will be to generate products such as species distribution models and change assessments, and to answer questions about what is in any given museum collection.  However, in the broader context of academic endeavor, these data could also serve as a unique link between the digital sciences and the digital humanities.  Work in the digital humanities includes everything from crowdsourcing manuscript transcription to humanistic fabrication to data mining — work that is not so dissimilar in method, description, or data type from that in the digital sciences.

Biological collections aren’t the only organizations engaged in massive digitization efforts; libraries and archives have been digitizing and making their materials discoverable and interoperable for decades as well.  As a result of these efforts, an unprecedented number of research materials from a wide range of domains are now available for free on the Web.  What VertNet does for biodiversity data, the University of Illinois’ Digital Collections and Content project does for cultural heritage records, and the National Library of Australia’s Trove does for newspapers, articles, and music.  The Hathi Trust makes more than 9 million books available, and the list goes on.  Digitization allows these materials to be recombined and analyzed quickly and (relatively) easily in new ways.

Our question is a simple one:  Where do the digital humanities and e-science overlap and interconnect?  One method of digital investigation that caught our attention is the mapping of novels and other historic texts; researchers take prose text and mine it for mappable units.  Erin Sells and her students, for instance, have used this method to create dynamic maps of Virginia Woolf’s Mrs. Dalloway, which incorporate “pictures, sounds, videos, and the text itself into the map.”  Similarly, in the Google Ancient Places project, researchers mine archaeological and historical texts to create databases of georeferenced ancient locales which can then be mapped.  Though these researchers are working with novels, they’re producing data in formats similar to those used for species occurrence records in databases such as VertNet.

This made us think: what sorts of questions could we ask of a data set composed of all kinds of georeferences — not just species occurrence records, but locations from history or works of fiction as well?  If students of the humanities can create maps with such texture using similarly organized data sets, could they build on this richness by including analysis of the natural world as it existed at the time described in the novel?  Perhaps searching on the VertNet portal (or GBIF or ALA) could provide a detailed list of vertebrate species and, with a little more work, the associated ranges of these species.  Suddenly, the map of Mrs. Dalloway’s world, and the atmosphere of Clarissa’s party, can be enriched not only with human influence and creation, but by the natural environment, too.  Conversely, data from diaries or other digitized sources could be mined for data about distributions of now-extinct species.  Could these data be used as observations and published as records along with those from natural history collections?

We hope that VertNet will support interdisciplinary research in the sciences and the humanities by providing new avenues for deeper readings, and new ways to reconstruct real and imagined worlds.  Where are the specimens that Lewis and Clark found on their expeditions and how do those link up with their journals (online already!!)?  What about whale species described by Melville?   How accurate are James Fenimore Cooper’s depictions of the animals Hawkeye and Cora encountered as they traveled through the Great Lakes?  What does this accuracy or inaccuracy tell you about Cooper as an author?  What about Thoreau’s notebooks of life at Walden Pond, and how have this iconic landscape and its animals and plants changed since his stay?

We also hope that other folks have more ideas about what new combinations of data and domains of inquiry are possible now that so many different sources of knowledge have been digitized.  How can eScience support and enrich the digital humanities and vice-versa? What happens when images of specimens* mix with drawings from the literature? Point-radius georeferences, for example, are easy enough to pull together from different sources — what further visualizations could be created with the combination of journals, books, and catalog ledgers?  What further ways can we use data and smarts to bridge gaps between the sciences and the humanities?

SYTYCD is offering the inaugural Thinky People’s Digitization Challenge (THIPDIC).   This first THIPDIC will go to the person or people who provide our favorite comment showing how digital science and the digital humanities intersect.  Any cool examples?  Any deeper thoughts about how this happens?  Any cute pictures of animals reading books?  Winners will be celebrated the world over and will be eligible for a (modest) prize, offered by Rob (don’t worry, it’ll be something interesting and of actual value).  You may now talk amongst yourselves.

* gigapan snakes in jar!

Posted in Uncategorized | 8 Comments

Zombies versus Unicorns at TDWG (or, a recap of citizen science talks)

So You Think You Can Digitize was in the Big Easy for TDWG 2011 last week. Summarizing the whole meeting is best left for friends Nico and Gaurav, who have longer attention spans than us. Nor should you miss the Unicorn Magic from friends at VertNet. Instead, we’ll focus our efforts on a set of talks in the citizen science session.

Batting first: Enlisting the Use of Educated Volunteers at a Distance: Or, Why Crowdsourcing and Citizen Science Will NOT Create Nightmare Zombies That Will Destroy Us All.

Presented by us! Slides are here. This talk developed organically out of the last few SYTYCD posts, but also gave us an opportunity to push a bit further on some trickier concepts we’ve been cogitating on for the last few months.  Particularly:

1)  We presented some neat (and we think relevant) education literature that shows that knowledge may be constructed more quickly through peer discussion in the classroom. We argued that volunteers communicating and using existing resources to vet records is analogous to students talking to their neighbors in the classroom. What do you think? Discuss!

2)  We also argued that the creation of these large crowdsourcing interfaces and applications (e.g. Old Weather, Atlas of Living Australia) necessarily forces “articulation work”: that is, the work of explaining what one group of people wants done by another group of people (e.g. curators by web developers, collections managers by volunteers).  A fundamental concern of citizen science is how best to connect the people collecting or annotating data back to the scientists who use them.  Using web applications to facilitate this connection forces both the citizen scientists and the experts to understand the data and encode that understanding into those apps.  For a standards group like TDWG, this act of encoding is particularly important to consider and understand; we need to remember that standards aren’t just ways of passively creating databases with consistent field names, but are means of facilitating communication and a shared sense of mission between people as well.

Notes: We might still have some work articulating articulation work.  Also, our best intentions to collect data on how easily people can use existing web resources to more effectively digitize foundered on the rocks of too little time to get through some, uh, minor logistics issues (in particular, IRB Human Subject approvals – facepalm).  However, we still hope to do this in the future.

Batting in the 2-Spot:  Crowd sourcing record transcription to unlock historical species data from natural history collections.

Andrew Hill, Vizzuality wunderkind and semi-erstwhile PhD student at CU Boulder with Rob, discussed Vizzuality’s rapid development of citizen science projects like “Old Weather” and a new one for NASA called NEEMO. Andrew showed that citizen scientists work together in the spirit of both cooperation and competition by relating how he and company owner Javi De La Torre kept vying for the top scoring spot in NEEMO — only to be blown away by a NASA employee who was also working/playing. It is an interesting line, at least from our perspective, where elements of competition and collaboration can both be optimized in developing citizen science applications. We here at SYTYCD have tended to focus on cooperation and narrative — not on game-ification and competition — but maybe there is a middle ground that yields the best of both worlds, and maybe the broadest appeal. Perhaps competition works better for some demographics and cooperation for others. Andrew also announced that Vizzuality is likely going to be involved, in some capacity, in developing a citizen science project for natural history transcription. We love this plan and can’t wait to hear more.

Batting Third:  Crowd-sourcing: perpetual valuable resource or a passing shower of dubious worth?

Paul Flemons, who Rob thinks looks just a teensy bit like Samuel Vimes (famous fictional cop), presented his work with the ALA’s “Australian Museum Cicada Expedition” while deftly weaving in musings about the long-term value of crowdsourcing as a digitization tool.  One thing we particularly liked seeing was a frequency plot showing the “long tail” of transcription efforts.  That is, most volunteers who drop by the site will only transcribe one or two records; however, there are a few extraordinarily dedicated folks who will transcribe much larger numbers (hundreds or thousands of records).  Why?  Well, this gets to incentives; really, all the talks in the session ultimately touched on this essential topic.  Is it possible to build a citizen science tool that shifts that long tail to be shorter and stouter so that more people are willing to transcribe more records?   Paul ended his talk saying he wasn’t entirely sure about the future of crowdsourced transcription for natural history collections; he is still not sure that we have the critical mass of volunteers needed to transcribe EVERYTHING, or that the links between the volunteer work and science are always fully exposed.

After seeing all the talks and the excellent demonstrations by Beth Mantle, Katja Schulz, and Tony Kirchgessner, we are more optimistic than Paul. One reason for optimism: we overheard comments like “Wow!  This session was amazingly well attended” and, to paraphrase, “this might actually work.” So, yeah, TDWG was indeed great, even if one of us who isn’t Rob did get suckered into Co-Chairing the Citizen Science Interest Group.  And yes, we do indeed still think we can digitize.

Speaking of digitization, we have been following the crowd-sourcing thread for a long time now, and next posts may swing back around to other topics of interest in the broader realm of natural history digitization.  With the ramping up of Thematic Collections Networks and the iDigBio HUB, the hard work of digitizing and the even harder work of innovating is just getting started….

Posted in Uncategorized | Leave a comment