Field Note Challenge Part 2: Veni, Vidi, Wiki

SYTYCD would like to welcome guest blog co-author Gaurav Vaidya.

A week ago, we told you about our cunning plan to play around with annotating and publishing one transcribed notebook of Junius Henderson’s field notes. We’ve had two big successes in the last seven days, which is not bad for the holiday season (also known as soul-crushing finals and project deadline week).

Success #1:  YOU GUYS, the internet is amazing.  Within half an hour of posting our last post, we were contacted by Dena Smith and Kathy Hollis, who alerted us to the existence of scans of Henderson’s notebooks — remember, we only had transcribed text files when we started!  This started a chain of events that put us in touch with two folks from the National Snow and Ice Data Center (NSIDC): Allaina Wallace, librarian and analog data archivist, and Ruth Duerr, Manager of Data Stewardship.  Less than 24 hours later, Rob and Gaurav had a productive meeting at NSIDC offices in east Boulder, and a DVD containing all the scans.  This DVD included three notebooks we hadn’t known about, two of which cover Henderson’s travels between 1927 and 1936 — adding another decade to his life on the road — AND were accompanied by more of Peter Robinson’s transcriptions.

From the field notes of Junius Henderson, Notebook 1

Success #2:  Having the scans made a huge impact on what we were able to do with the text.  In particular, Gaurav has made headway in using WikiSource as a platform for maximal use and re-use of Henderson’s notes. WikiSource is “an online digital library of free content textual sources on a wiki, operated by the Wikimedia Foundation” (i.e. the organization that also runs Wikipedia). Uploading the scan of Henderson’s first notebook to the Wikimedia Commons was easy: the scans are now available as PDF or DjVu files. Once the scans were in the Commons, Gaurav created an Index page (following instructions on the Beginner’s Guide to Index: files and the Introduction to Proofreading on WikiSource). The Index maps pages from the scanned DjVu file to pages on WikiSource. Click on a yellow-coloured page number to proofread or edit an existing page (for example, page 3), or on a red-coloured page number to transcribe it.  Transcription itself is dead-easy: the page image is displayed on the right, and a textbox (which accepts all MediaWiki syntax) is displayed on the left.  In our case, since we have the transcriptions “done”, it was mostly a matter of cutting and pasting sections of Peter’s transcribed text so that they aligned with Henderson’s scrawl on the scanned pages.

So yay, successes!!  The fruits of a week’s worth of work are available on the “Notebook 1” page on WikiSource, where, using WikiSource’s <pages /> command, Gaurav transcluded the transcribed pages from the Index into a single page.  Numbers along the left margin of the main page link back to the corresponding page from the Index, making it easy to verify or fix transcription errors.  Also, Gaurav compiled pages from the Index into sections representing field trips (just as Henderson did in his notes), and listed them in a “Contents” box at the top of the page.
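
For the curious, here is a rough sketch of what that <pages /> step boils down to. It is not the markup we actually used: the Index file name, the trip titles and the page ranges below are made-up placeholders, and the two helper functions are hypothetical. The only thing taken from WikiSource itself is the general shape of the tag, which names an Index and a range of scan pages to pull into one page.

```python
# Sketch only: generate one <pages /> tag per field trip.
# INDEX and the page ranges in TRIPS are hypothetical placeholders,
# not the real values for Henderson's Notebook 1.


def pages_tag(index: str, start: int, end: int) -> str:
    """Return a <pages /> tag that transcludes scan pages start..end."""
    if start > end:
        raise ValueError("start page must not exceed end page")
    return f'<pages index="{index}" from={start} to={end} />'


def section_wikitext(title: str, index: str, start: int, end: int) -> str:
    """Return a wiki section heading followed by its transclusion tag."""
    return f"== {title} ==\n{pages_tag(index, start, end)}"


INDEX = "Example notebook.djvu"  # hypothetical file name
TRIPS = [  # hypothetical page ranges
    ("Florissant, August 1905", 3, 12),
    ("Silver Lake, September 1905", 13, 20),
]

if __name__ == "__main__":
    # Print the wikitext for all sections, separated by blank lines.
    print("\n\n".join(section_wikitext(t, INDEX, a, b) for t, a, b in TRIPS))
```

Pasting the printed text into a wiki page would, under these assumptions, give one heading per trip with the matching transcribed pages underneath. With only a couple of trips per notebook, typing the tags by hand is just as quick.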

Henderson’s field notes continue to be, first and foremost, a good read. “Notebook 1” features details from Henderson’s week-long trips to Florissant, Colorado (August 1905) and Silver Lake Arapahoe (September 1905). He keeps a record of everything from the stamina of his comrades:

“The party showed fatigue in the following order: Sievert least, I next, then Watts, Then Markman, then Frank.” (August 30, 1905)

to train delays and opportunities for rumination:

“Train again so late as to afford ample opportunity for philosophic meditation upon the motives which inspire railroad people to advertise time which they do not expect to make except under rare circumstances.” (September 3, 1905)

What next?  Our sense of what we want to do and what is possible is rapidly evolving.  Simply having the scanned field notebook pages completely changed our game plan.  Before Wednesday of this week, we just had transcriptions.  Now we have the whole enchilada.  What we currently want is a no-cost, minimal-effort system that will make scans AND transcriptions AND annotations available, and that can facilitate text mining of the transcriptions.  Do we have that in WikiSource?  We will see.  More on annotations to follow in our next post, but some thoughts are already percolating and we have even implemented some rudimentary examples.

We’d like to encourage you to try your hand at transcribing or annotating this notebook along with us, and to let us know what you think about the process (reminder: Henderson’s first field notebook is still available as plain text or as a Word document).  As on Wikipedia, all edits are saved, so you can’t really mess up – be bold, jump in (!) and tell us what you think.

About Andrea

Andrea is a Ph.D. student in Library and Information Science at the University of Illinois at Urbana-Champaign, and is supported by the Center for Informatics Research in Science and Scholarship. Her research interests include text mining; scholarly communication; data curation; biodiversity, phylogenetic and natural history museum informatics; and mining and making available undiscovered public knowledge. She is particularly interested in information extraction from natural history field notes and texts, and improving methods of digitizing and publishing data about the world's 3–4 billion museum specimen records so they can be used to better model evolutionary and ecological processes.
This entry was posted in crowdsourcing, field notes, Henderson Project, projects.

11 Responses to Field Note Challenge Part 2: Veni, Vidi, Wiki

  1. Paul Flemons says:

    Nice work guys – very impressed. Just regarding your links – they were all done manually? I will have a play.

  2. Andie says:

    YES, for now, but next week/post we’re gonna be talking about some of our experiments with more automated ways of making annotations. We (read: Gaurav) have had a lot to figure out with how to best include these annotations in WikiSource….

  3. Paul Flemons says:

    What was the extent of coding required by Gaurav to set this up, if any? The functionality is simple and easy and seems to all work well on an iPad. Any chance of publishing the process on your blog? The discussion page seems to be very slow and I’m not sure what it’s seeking to achieve – ideally it would be good if it facilitated discussion around transcription of pages.

    • Rob says:

      I should let Gaurav reply but I think the answer is “no coding” just utilizing existing Wiki-tools. Although I shouldn’t hazard that much of a guess, I will anyway and say that Gaurav put in maybe 10 hours on this total and I “helped” with transcriptions and annotations for another 2 or 3. We spent a bit of time together trying to determine if there was a better way to get transcriptions to match to page images besides cut and paste. And Gaurav probably also spent a bit of time sighing and explaining wiki mark up to me.

    • Paul: No coding required! The trickiest bit is creating the Index: page; I’ve just cleaned up the Beginner’s guide to Index: files, so hopefully it’s quite easy to replicate. Transcluding the transcribed file into a single page (as we do at the Field Notes/Notebook 1 page) is a little trickier, but quite straightforward once you figure out how it works (I’ll put that into the “beginner’s guide” in a day or two). That’s pretty much all we did last week!

    • Re: discussions — yeah, there isn’t a straightforward place to do it on WikiSource at the moment, since there are so many talk pages where this discussion might happen! Hopefully, a consensus will work out in which most people are using a particular talk page for particular types of discussion. Meanwhile, per-page discussion pages seem to me to be most valuable in transcribing, correcting transcription errors, and deciding how to annotate that particular page, although there’s no straightforward way to monitor all the per-page discussion pages for an Index file that I know of.

  4. I left a pretty detailed digression on my blog, but wanted to ask if you’d also thought about cropping and embedding the illustrations in the transcription? ProofreadPage can do that, as you can see in the maps in the exemplary German Wikisource transcription of F. W. Winkler’s Bemerkungen über den Feldzug gegen Rußland in den Jahren 1812 und 1813

    • Rob says:

      That is a great idea, Ben. We’ll have to add that to our list of experiments. Loved the post on “manuscripttranscription” and we have some ideas about indexing, which we 100% agree is absolutely key. We’ll be leaving some feedback on your post ASAP!

    • Hi Ben! No, this isn’t something we’ve discussed very much amongst ourselves, so thanks for the idea!

      I’m a little confused by your example, though: it looks to me like those maps are coming straight out of the Commons, like any other image? I think for the few illustrations Henderson put into his notebooks (for instance, Notebook 1 has exactly three illustrations in all!), there probably isn’t a very good reason to upload them to the Commons separately: they only really make sense in the context of the notebooks. So getting ProofreadPage to insert the page image into the text (such as by transcluding the page image from ProofreadPage) might be the best way to add illustrations!

      Another possibility would be to illustrate this work using actual photographs taken by Henderson on his travels, available on the Glacier Photograph Collection!

  5. A very well-written post. I read and liked the post and have also bookmarked you. All the best for future endeavors
