Confluence. n. a. the flowing together of two or more streams b. the place of meeting of two streams c. the combined stream formed by conjunction [Merriam-Webster online]
Drawer. n. a sliding box or receptacle opened by pulling out and closed by pushing in [Merriam-Webster online]
Over the past few years, at many collections digitization workshops, one's head (or at least my head) can get turned around by this neat idea or that amazing technology. Things can get theoretical, or perhaps speculative-science-fiction-y, fast. Which raises the question: what are people doing in their collections, right now? What I have learned is that when it comes to pragmatic choices and space/money/efficiency, there is a lot of reason to be excited and to see, yes, confluences.
I hadn’t really realized how much digitization solutions are beginning to converge until I saw Vince Smith give a (great!) presentation at iEvoBio 2011 on digitizing collections at the Natural History Museum London (NHML). I don’t want to over-paraphrase his talk, and the slides are excellent (from an earlier version of the talk: http://www.slideshare.net/vsmithuk/scalingup-collections-digitisation), but the gist of it was that at current rates, digitization would take a LONG time: thousands of years. So the folks at NHML are working with a company called SmartDrive. SmartDrive builds motorized cameras that move along a track above an object (such as a collections drawer), taking photos. Vince has been working with them to develop a system to photograph collections drawers at high resolution (more on the company’s hardware, software, and approach here: http://www.smartdrive.co.uk/satscancollections.html) (note: not a pitch, just really interesting and great images of collections drawers!).
The important thing is that with this technology, high resolution, stitched-together images can be generated relatively quickly, scaling down the time it takes to image all the collections drawers from thousands of years to less than ten. This still leaves “snakes in jars” (see our previous S2I2 post) but we’ll come back to those at some point soon. What is intriguing is that rather than conflict, we are experiencing _confluence_ in an area where there has been a lot of wailing and gnashing of teeth about how we’ll likely end up with a billion (YES, a BILLION) different solutions.
So what about this “confluence?” While I was in Australia hanging out with good friend Paul Flemons (note: currently fu manchu-less), he showed me a similar setup at the Australian Museum. Again, the idea is to image collections drawers, this time using very high resolution cameras (100 megapixels). Similar approaches using Gigapan (http://www.gigapan.org/) are being pioneered by Andy Deans at North Carolina State University (see their excellent and “insectlent” blog here). And Paul Tinerella at the University of Minnesota (who is almost 100% likely to be farmer-goatee-ed at this moment) is using a similar solution to first scan many slides of mounted insects en masse, and then automate the disassembly of these slides into single images of a specimen. The specifics of how the cameras move over the drawers or a set of slides may be different, but the general idea is the same:
Capture a drawer or slide collection quickly → disassemble the image into pieces → capture labels → move data further downstream → etc.
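The “disassemble the image into pieces” step above can be sketched in a few lines of code. This is a hypothetical illustration only, not the actual SatScan or Gigapan software: the function name and tile sizes are invented, and real systems would detect unit trays or specimens rather than cut a fixed grid.

```python
# Hypothetical sketch: cut one large stitched drawer image into
# per-tile bounding boxes for downstream label capture.
# All names and dimensions here are illustrative assumptions.

def tile_boxes(width, height, tile_w, tile_h):
    """Return (left, upper, right, lower) boxes covering a
    width x height image, clipped at the image edges."""
    boxes = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            boxes.append((left, top,
                          min(left + tile_w, width),
                          min(top + tile_h, height)))
    return boxes

# e.g. a 4000x3000 px drawer image cut into 2000x1500 px quadrants
for box in tile_boxes(4000, 3000, 2000, 1500):
    print(box)
```

Each box could then be handed to an image library (Pillow’s `Image.crop`, say) to write out one sub-image per tray, ready for label transcription.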
Confluence. This is good.
So what does all this mean? Well, there are still challenges, especially for insects, where the specimen often occludes the label from top view. But assuming cameras can move all around specimens to generate photos, the answer is that there may be a fast method to capture LOTS of high resolution data in drawers. Since Andie is spending part of her summer looking at thousands of little clams stuffed in such drawers, and Rob has even worked on similar clams in the collection he curates, and since there are hundreds of other collections folks doing the same thing, this is a big step forward.
What challenges remain? Tons. How are we going to unlock data from a 500MB image of a drawer and use those data most effectively? Argument: Data needs to be machine-readable and properly documented to maximize its use and re-use. Period. Images are not so good for that! If biocollections data have further utility in new kinds of science, it likely relates to having the what, where, when (taxonomy, location, date) information readily available as simple text that is interoperable with other sources of environmental data. What is _excellent_ (so good I am waving my hands in the air with enthusiasm) is that many people are beginning to talk about similar solutions to this challenge of converting the image of a label to text. That is the subject of our next blog post, but to presage it, we’ll just say two words: crowd sourcing. And two other words: Old Weather. See if you can connect the dots!