SYTYCD would like to welcome guest blog co-author Gaurav Vaidya.
A week ago, we told you about our cunning plan to play around with annotating and publishing one transcribed notebook of Junius Henderson’s field notes. We’ve had two big successes in the last seven days, which is not bad for soul-crushing finals and project deadline week, er, the holiday season.
Success #1: YOU GUYS, the internet is amazing. Within half an hour of posting our last post, we were contacted by Dena Smith and Kathy Hollis, who alerted us to the existence of scans of Henderson’s notebooks — remember, we only had transcribed text files when we started! This started a chain of events that put us in touch with two folks from the National Snow and Ice Data Center (NSIDC): Allaina Wallace, librarian and analog data archivist, and Ruth Duerr, Manager of Data Stewardship. Less than 24 hours later, Rob and Gaurav had a productive meeting at NSIDC offices in east Boulder, and a DVD containing all the scans. This DVD included three notebooks we hadn’t known about, two of which cover Henderson’s travels between 1927 and 1936 — adding another decade to his life on the road — AND were accompanied by more of Peter Robinson’s transcriptions.
Success #2: Having the scans made a huge impact on what we were able to do with the text. In particular, Gaurav has made headway in using WikiSource as a platform for maximal use and re-use of Henderson’s notes. WikiSource is “an online digital library of free content textual sources on a wiki, operated by the Wikimedia Foundation” (the organization that also runs Wikipedia). Uploading the scans of Henderson’s first notebook to the Wikimedia Commons was easy: they are now available as PDF or DjVu files. Once the scans were in the Commons, Gaurav created an Index page (following instructions on the Beginner’s Guide to Index: files and the Introduction to Proofreading on WikiSource). The Index maps pages from the scanned DjVu file to pages on WikiSource. Click on a yellow-coloured page number to proofread or edit an existing page (for example, page 3), or on a red-coloured page number to transcribe it. Transcription itself is dead-easy: the page image is displayed on the right, and a textbox (which accepts all MediaWiki syntax) is displayed on the left. In our case, since we already have the transcriptions “done”, it was mostly a matter of cutting and pasting sections of Peter’s transcribed text so that they aligned with Henderson’s scrawl on the scanned pages.
So yay, successes!! The fruits of a week’s worth of work are available on the “Notebook 1” page on WikiSource, where, using WikiSource’s <pages /> command, Gaurav mapped pages from the scanned DjVu file to pages on WikiSource. Numbers along the left margin of the main page link back to the corresponding page from the Index, making it easy to verify or fix transcription errors. Also, Gaurav compiled pages from the Index into sections representing field trips (just as Henderson did in his notes), and listed them in a “Contents” box at the top of the page.
Henderson’s field notes continue to be, first and foremost, a good read. “Notebook 1” features details from Henderson’s week-long trips to Florissant, Colorado (August 1905) and Silver Lake Arapahoe (September 1905). He keeps a record of everything from the stamina of his comrades:
“The party showed fatigue in the following order: Sievert least, I next, then Watts, Then Markman, then Frank.” (August 30, 1905)
to train delays and opportunities for rumination:
“Train again so late as to afford ample opportunity for philosophic meditation upon the motives which inspire railroad people to advertise time which they do not expect to make except under rare circumstances.” (September 3, 1905)
What next? Our sense of what we want to do and what is possible is rapidly evolving. Simply having the scanned field notebook pages completely changed our game plan. Before Wednesday of this week, we just had transcriptions. Now we have the whole enchilada. What we currently want is a no-cost, minimal-effort system that will make scans AND transcriptions AND annotations available, and that can facilitate text mining of the transcriptions. Do we have that in WikiSource? We will see. More on annotations to follow in our next post, but some thoughts are already percolating and we have even implemented some rudimentary examples.
We’d like to encourage you to try your hand at transcribing or annotating this notebook along with us, and to let us know what you think about the process (reminder: Henderson’s first field notebook is still available as plain text or as a Word document). As on Wikipedia, all edits are saved, so you can’t really mess up – be bold, jump in (!) and tell us what you think.