S2I2: What is it?? What does it all mean??!? (or intro part II)

So another source of inspiration for this blog’s creation: Rob and I both attended an NSF-funded S2I2 workshop at the Field Museum a few weeks ago (Rob, because he was properly invited; me, because I have developed great skill in Tagging Along). S2I2 stands for Scientific Software Innovation Institutes, which are part of NSF’s Software Infrastructure for Sustained Innovation (SI2) program; in NSF’s words, these workshops are designed with the…
…overarching goal of transforming innovations in research and education into sustained software resources that are an integral part of the cyberinfrastructure.


Or, in other words, science + software = good!

This particular workshop focused broadly on the issues surrounding the digitization of biological and paleontological collections — basically, databasing label data from natural history museum collections, but also databasing images and other media* — and more specifically, on identifying gaps within current digitization practices.

Workshop organizers Chris Norris and Jim Beach had the further goal of helping to guide potential recipients of digitization funding, which comes from several other programs. Right now, a number of such programs support these efforts, especially the Advancing Digitization of Biological Collections (ADBC) program (http://www.nsf.gov/pubs/2010/nsf10603/nsf10603.htm), but also programs such as Improvements to Biological Research Collections (http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=5448) and Dimensions of Biodiversity (http://www.nsf.gov/pubs/2011/nsf11518/nsf11518.htm).

So, the problem, in a nutshell:

  1. There are up to a _billion_ natural history specimens in the US — this includes biological specimens (animal pelts, taxidermied animals, skeletons, snakes and snails in jars of alcohol, insect collections, herbarium specimens, and more) and paleontological specimens (fossils).
  2. Only 10% of these (at the MOST) have been digitized and made available.
  3. The rate of addition to these collections is somewhere around 5-15% per annum — and though new additions tend to get digitized as they’re added, that means that older collections are continually being pushed aside.
  4. Digitization is important for a range of reasons, but the biggest is simply that it greatly increases the discoverability and usability of these specimens and their data for many downstream uses.
  5. Older collections that predate (human-caused) climatic and landscape changes are particularly useful (and irreplaceable!), providing a baseline against which to measure biodiversity changes happening now.
  6. Digitizing collections can also preserve the essential data about specimens.  Often the only information we have is on a single label composed of ‘ink on paper’ floating around in specimen jars.  Multiple copies of data — it just seems like a good idea!

It’s an intimidating problem, and one that collections managers have been addressing off and on, in different ways, for more than 30 years.  There have been numerous grant-funded and volunteer-fueled projects over the years, but digitization efforts usually remain local and uncoordinated, and the problems are just as bad in large museums as small.  As important as this issue of mass digitization is, it simply isn’t the only thing on museums’ dockets.

However, all is not lost: over the course of this three-day workshop, we saw numerous presentations on different museums’ digitization efforts, and left with the overwhelming feeling that the natural history museum community suffers no shortage of innovation or initiative! What they do lack: mechanisms for large-scale implementation and organization, and maybe some insights from neighboring fields like Systems Engineering or Library and Information Science (the latter being where I come in). There are some fundamental changes that could be made to how we conceive of and cogitate upon this problem of mass digitization that could help us leap over that 10% hurdle, for instance:

  • creating workflows for specific physical types of collections (and for the way “labels” are stored for those collections), rather than for taxonomic types, could provide museums with transferable tools that work across domains.**
  • creating collections-level descriptions, including detailed information about the state of the ledger, would not only create important finding aids between collections, but could really help us zero in on the kinds of digitization activities and technology that need to be developed.
  • the same collections-level descriptions could greatly help prioritize efforts, and maybe also help consolidate some of the simplest materials to digitize (e.g. ledgers) so that we can make rapid progress and have early successes.
  • continuing the digitization discussion beyond workshops like this!
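To make the collections-level description idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the field names, the example collections, and especially the ease ranking are made up and are not something the workshop produced. It only shows how a simple description of each collection (its physical type and where its label data lives) could be used to sort out which collections to tackle first.

```python
from dataclasses import dataclass


@dataclass
class CollectionDescription:
    """A hypothetical collections-level description (all field names are assumptions)."""
    name: str
    physical_type: str   # e.g. "alcohol lot", "pinned insects", "herbarium sheets"
    label_storage: str   # where the label data lives, e.g. "ledger" or "label in jar"
    specimen_count: int


# Assumed ranking of label storage by ease of digitization (lower = easier).
# These ranks are illustrative only, not findings from the workshop.
EASE_RANK = {"ledger": 0, "label on container": 1, "label in jar": 2}


def prioritize(collections):
    """Order collections easiest-first; among equally easy ones, larger collections come first."""
    return sorted(
        collections,
        key=lambda c: (EASE_RANK.get(c.label_storage, len(EASE_RANK)), -c.specimen_count),
    )


if __name__ == "__main__":
    # Made-up example collections, purely for demonstration.
    examples = [
        CollectionDescription("Snakes in jars", "alcohol lot", "label in jar", 5000),
        CollectionDescription("Small invertebrate lots", "alcohol lot", "ledger", 200),
        CollectionDescription("Pinned beetles", "pinned insects", "ledger", 900),
    ]
    for c in prioritize(examples):
        print(f"{c.name}: {c.label_storage}, {c.specimen_count} specimens")
```

The point of sorting on label storage rather than on taxon is the same as in the first bullet: two very different taxonomic collections that both keep their data in a ledger could probably share a workflow.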

Digitization of natural history collections is achievable, but it requires communication and coordination; the goal of this blog is to share what we learned/discussed at this workshop, and to hopefully keep that ball rolling!

* “What do we mean when we say digitization?!?” is one of those discussions that inevitably happens at meetings like this, and for good reason: there is a range of digitization activities surrounding the capture of natural history collections. However, I think for the purposes of this discussion, it means most fundamentally digitizing so-called “label data”: the catalogue numbers, taxon names and locality information that are generally common across the diverse field of natural history collections. Sounds easy, right? Well, it would be nice and easy if everyone had easy-to-scan paper catalogue books, but many collections don’t; as became the meme of the workshop, many collections consist of thousands of “snakes-in-a-jar”: a specimen preserved in alcohol, with a label inside or on the jar, making systematic data collection or digitization next to impossible.

** The distinction here is that some “alcohol-lot” collections (the “snakes in a jar”) keep the full label inside the jar and have no other record of the specimen data. In RPG’s invertebrate collection, by contrast, lots can be very small, and the only label in the jar is a specimen number that refers to a set of “label data” in a specimen ledger (i.e., a book).


About Andrea

Andrea is a Ph.D. student in Library and Information Science at the University of Illinois at Urbana-Champaign, and is supported by the Center for Informatics Research in Science and Scholarship.

3 Responses to S2I2: What is it?? What does it all mean??!? (or intro part II)

  1. Emma says:


    There are a lot of factors to consider when digitizing materials and documents. I found this museum digitization guide from CHIN very useful.

    Thanks for interesting post!

  2. Pingback: A confluence of drawers « So You Think You Can Digitize

  3. Hi! Amazing post, and really nice blog in general! We are now working in a relevant project and many of the issues just came up in one of our first project meetings, a couple of days ago! Really blown of by the relevance of the blog, the timing I found it but mainly the contents! Would be really happy to talk to you some day about some of these issues… http://www.natural-europe.eu


    Nikos (Information scientist)
