This will be old news to many, but regardless: two big projects related to specimen digitization and biodiversity informatics launched in the past couple of weeks. Quick impressions on both below, focusing on the good, the buggy, and a few items of curiosity. Both projects are great, but how will they fit into the broader landscape of existing resources, and which niches will they fill?
1) Notes From Nature: a new Zooniverse project for the transcription of natural history collection ledgers. This has been a long time in the making (more details here) and as of this writing, the two available collections (herbarium specimens from SERNEC and insects from CALBUG) are already 26% and 21% transcribed, respectively.
The Good: As always, clean and intuitive interfaces from the Zooniverse team make transcription fast and easy. Data entry screens are customized to each type of collection (e.g. plant labels often contain more detailed locality descriptions than insect labels, whereas insect labels often contain data about the host organism they were found on). Awesomely, all the code is available on GitHub (https://github.com/zooniverse/notesFromNature) in case other museums want to set up their own transcription engines locally. There is also an intriguing teaser buried at the very bottom of the Notes From Nature “About” page: “Interested in publishing your collection? Contact us.”
The Buggy: Maaaan, I’ve transcribed around 40 labels and my total isn’t showing up under my user profile. This bothers me more than I care to admit, though it’s primarily out of worry that my transcriptions aren’t being saved.
The Curious: It would be great to learn more about how these data get back to the collections databases, and how exactly that handoff happens. What do the transcribed files look like? How is accuracy checked? Do the museums have plans to make these records publicly available, or harvestable by aggregators like GBIF?
2) The patriotically named Biodiversity Information Serving Our Nation (AKA BISON) biodiversity data portal out of USGS: I know little about this project beyond what I’ve learned at various conference talks, though I’ve heard it referred to as the “federal version of iDigBio.”
The Good: On first look, really nice integration of specimen occurrence data with USGS map layers, and as Hilmar Lapp pointed out, there’s an API, which is great.
The Buggy: There are no identifiers on these specimens, not even their local catalog numbers. Per Stinger Guala in the G+ thread linked above, the data is there; it’s just not visible yet (though it will be soon). Perhaps there are reasons (a need for better formatting? a need for cleaner data? a need for more server space?) that they’re not making this data visible yet, but it struck me as a pretty glaring omission. While I realize that many researchers don’t spend a lot of time looking at catalog numbers, I imagine that they’d be absolutely critical if one were integrating BISON data with that from other sources (say, something from another portal like GBIF). Also, how could any of these records ever be linked back to the source data or any other data out there? Provenance = important, no?
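To make the catalog number point concrete, here is a minimal sketch of how one might line up records from two portals. It assumes each record is a plain dictionary carrying Darwin Core-style fields (`institutionCode`, `collectionCode`, `catalogNumber`); whether a given BISON or GBIF export uses exactly those names is an assumption on my part, and the function names and sample records are invented for illustration. The thing to notice is that a record with no catalog number simply has nothing to match on.

```python
from collections import defaultdict


def record_key(record):
    """Build a normalized matching key from identifier fields.

    Field names follow Darwin Core conventions (an assumption about the
    export format). Returns None when there is no catalog number.
    """
    catalog = (record.get("catalogNumber") or "").strip()
    if not catalog:
        return None
    institution = (record.get("institutionCode") or "").strip().upper()
    collection = (record.get("collectionCode") or "").strip().upper()
    return (institution, collection, catalog.upper())


def link_records(source_a, source_b):
    """Pair records from source_a with records from source_b sharing a key.

    Returns (matched_pairs, unmatched_from_a).
    """
    index = defaultdict(list)
    for rec in source_b:
        key = record_key(rec)
        if key is not None:
            index[key].append(rec)

    matched, unmatched = [], []
    for rec in source_a:
        key = record_key(rec)
        if key is not None and key in index:
            matched.extend((rec, other) for other in index[key])
        else:
            unmatched.append(rec)
    return matched, unmatched


# Invented sample records, not real portal data.
portal_a = [
    {"scientificName": "Quercus alba", "institutionCode": "XYZ",
     "collectionCode": "Herb", "catalogNumber": "12345"},
    {"scientificName": "Quercus alba"},  # no identifiers at all
]
portal_b = [
    {"scientificName": "Quercus alba L.", "institutionCode": "xyz",
     "collectionCode": "HERB", "catalogNumber": " 12345 "},
]

matched, unmatched = link_records(portal_a, portal_b)
print(len(matched), len(unmatched))  # 1 1
```

The second record in `portal_a` shares a species name with the record in `portal_b`, but without identifiers there is no safe way to say whether it is the same specimen, which is exactly the provenance worry above.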
The Curious: BISON is apparently the US node of GBIF, which I had assumed meant they would be providing GBIF with US data. However, BISON appears to invert that model: its data looks like a US-centric mirror of GBIF. I hope that BISON becomes a platform through which federally owned and managed US biocollections can be made publicly discoverable, and would be interested to hear from BISON reps if there are any plans in place to do this.
