…overarching goal of transforming innovations in research and education into sustained software resources that are an integral part of the cyberinfrastructure.
Or, in other words, science + software = good!
This particular workshop focused broadly on the issues surrounding the digitization of biological and paleontological collections — basically, databasing label data from natural history museum collections, but also databasing images and other media* — and more specifically, on identifying gaps within current digitization practices.
Workshop organizers Chris Norris and Jim Beach also wanted to help guide potential recipients of digitization funding, which comes from separate programs. Several such programs currently support these efforts. The most prominent is the Advancing Digitization of Biological Collections (ADBC) program (http://www.nsf.gov/pubs/2010/nsf10603/nsf10603.htm), but there are also programs such as Improvements to Biological Research Collections (http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=5448) and Dimensions of Biodiversity (http://www.nsf.gov/pubs/2011/nsf11518/nsf11518.htm).
So, the problem, in a nutshell:
- There are up to a _billion_ natural history specimens in the US — this includes biological specimens (animal pelts, taxidermied animals, skeletons, snakes and snails in jars of alcohol, insect collections, herbarium specimens, and more) and paleontological specimens (fossils).
- Only 10% of these (at the MOST) have been digitized and made available.
- Collections grow by somewhere around 5-15% per annum. New additions tend to get digitized as they’re added, so older collections keep getting pushed to the back of the queue.
- Digitization is important for a range of reasons, but the biggest is simply that it greatly increases the discoverability and usability of these specimens and their data for many downstream uses.
- Older collections that predate (human-caused) climatic and landscape changes are particularly useful (and irreplaceable!), providing a baseline against which to measure biodiversity changes happening now.
- Digitizing collections can also preserve the essential data about specimens. Often the only information we have is a single ‘ink on paper’ label floating around in a specimen jar. Multiple copies of data: it just seems like a good idea!
It’s an intimidating problem, and one that collections managers have been addressing off and on, in different ways, for more than 30 years. There have been numerous grant-funded and volunteer-fueled projects over the years. Even so, digitization efforts usually remain local and uncoordinated, and the problems are just as bad in large museums as in small ones. As important as mass digitization is, it simply isn’t the only thing on museums’ dockets.
However, all is not lost: over the course of this three-day workshop, we saw numerous presentations on different museums’ digitization efforts, and left with the overwhelming feeling that the natural history museum community suffers no shortage of innovation or initiative! What they do lack: mechanisms for large scale implementation and organization — and maybe some insights from neighboring fields like Systems Engineering or Library and Information Science (the latter being where I come in). There are some fundamental changes that could be made to how we conceive of and cogitate upon this problem of mass digitization that could help us leap over that 10% hurdle, for instance:
- creating workflows organized around the physical type of a collection, and the way its “labels” are stored, rather than around taxonomic group, could give museums transferable tools that work across domains.**
- creating collection-level descriptions that include detailed information about the state of the ledger would not only provide important finding aids across collections, but could also help us zero in on the kinds of digitization activities and technology that still need to be developed.
- the same collection-level descriptions could greatly help prioritize efforts. They might also let us identify and consolidate some of the simplest materials to digitize (e.g., ledgers), so that we can make rapid progress and build on early successes.
- continuing the digitization discussion beyond workshops like this!
Digitization of natural history collections is achievable, but it requires communication and coordination; the goal of this blog is to share what we learned/discussed at this workshop, and to hopefully keep that ball rolling!
* “What do we mean when we say digitization?!?” is one of those discussions that inevitably happens at meetings like this, and for good reason: there is a range of digitization activities surrounding the capture of natural history collections. For the purposes of this discussion, however, I think it most fundamentally means digitizing so-called “label data”: the catalogue numbers, taxon names and locality information that are generally common across the diverse field of natural history collections. Sounds easy, right? It would be, if everyone had easy-to-scan paper catalogue books, but many collections don’t. As became the running meme of the workshop, many collections consist of thousands of “snakes in a jar”: specimens preserved in alcohol, with a label inside or on the jar. This makes systematic data collection or digitization next to impossible.
** The distinction here is that some “alcohol-lot” collections, such as the “snakes in a jar,” have labels inside the jars and no other record of the specimen data. In RPG’s invertebrate collection, by contrast, lots can be very small, and the only label in the jar is a specimen number. That number refers to a set of “label data” in a specimen ledger (e.g., a book).