|Movement of books from medieval libraries in the MLGB3. Medieval locations (red), Current locations (blue)|
Today I'm teaching a workshop on using "screen scraping" in the digital humanities. No workshop is really useful without practical examples, so last week I decided to try out my screen scraping chops on an exciting new database of book history data. The Kislak Center at Penn (where I'm Scholar in Residence) is quickly becoming one of the most important sites for book and manuscript provenance research, and I wanted to see what I could do to highlight the potential for making extant provenance data more useful through new visualizations.
Several years ago, a few of the scholars behind the monumental Corpus of British Medieval Library Catalogues project (now at fifteen volumes), led by Richard Sharpe, began working on an online database to update and provide access to the wealth of information on medieval manuscripts contained in Neil Ker's Medieval Libraries of Great Britain (1941, 1964, and 1987). These volumes include accounts of books and manuscripts known to survive today that were once owned within Great Britain before the mid-16th century. Recently, through grants from the Mellon Foundation and others, the team has taken much of this information and made it available online in the MLGB3 searchable database. The site appears to be in beta at the moment and is only intermittently accessible, but when it launches fully it will be an amazing resource and the culmination of a good deal of work by Sharpe and others. Looking through the database, I was especially intrigued by the wealth of data on the current location of many of these medieval books and manuscripts. Given how comprehensive and detailed the project data is, even at this stage, I wanted to get a sense of what kind of picture would develop if we looked at the points of origin and current locations of all these manuscripts in aggregate.
As of last week, the MLGB3's online database included over 6,000 records for books and manuscripts owned by medieval libraries. To look at them in aggregate, I used the ever-helpful wget utility to pull down each record in order, which left me with a gigantic mess of HTML with the useful data hidden within it. After extensive cleanup and parsing, I was able to throw the location names of the original medieval libraries, as well as those of the current owners, against David Zwiefelhofer's geocoding service (which I believe uses the Yahoo API) to get longitudes and latitudes. This didn't go entirely smoothly, as the names of ruined monasteries tend not to register very well in geo databases. Fortunately, there is a wealth of Wikipedia entries providing detailed longitude and latitude information on a wide range of English historical sites, and I was able to fill in the blanks.
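For workshop purposes, here is a minimal Python sketch of the last two steps: filling in the geocoder's misses from a hand-entered table and then counting how many books moved between each pair of places. It is not the code I actually ran. The function names, the place-name strings, and the rounded coordinates are all illustrative assumptions, and the geocoder's output is stood in for by a plain dictionary rather than a live API call.

```python
from collections import Counter

# Assumed stand-in for geocoder output: place name -> (lat, lon), or None on a miss.
# Coordinates are rough, illustrative values only.
GEOCODED = {
    "London": (51.51, -0.13),
    "Oxford": (51.75, -1.26),
    "Canterbury, St Augustine's Abbey": None,  # ruined monastery: geocoder miss
}

# Hand-entered fallback (e.g. copied from a Wikipedia infobox); hypothetical entry.
MANUAL = {
    "Canterbury, St Augustine's Abbey": (51.28, 1.09),
}


def locate(name):
    """Return (lat, lon) for a place name, trying the geocoder table first."""
    coords = GEOCODED.get(name)
    if coords is None:
        coords = MANUAL.get(name)
    return coords


def aggregate_flows(records):
    """Count books per (origin coords, current coords) pair.

    `records` is an iterable of (medieval_library, current_owner) name pairs.
    Returns the flow counts plus the set of names that still lack coordinates.
    """
    flows = Counter()
    unresolved = set()
    for origin, current in records:
        o, c = locate(origin), locate(current)
        if o is None:
            unresolved.add(origin)
        if c is None:
            unresolved.add(current)
        if o is not None and c is not None:
            flows[(o, c)] += 1
    return flows, unresolved


records = [
    ("Canterbury, St Augustine's Abbey", "London"),
    ("Canterbury, St Augustine's Abbey", "London"),
    ("Canterbury, St Augustine's Abbey", "Oxford"),
    ("Unknown Priory", "London"),
]
flows, unresolved = aggregate_flows(records)
```

Each key in `flows` is one line to draw on the map, weighted by its count, and `unresolved` is the to-do list of names that still need to be looked up by hand.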
|Libraries in Medieval Great Britain (MLGB3)|
|Current Locations of Books from the MLGB3|
|Worldwide Current Location of Books in MLGB3|
|Benedictine Abbey of St. Augustine, Canterbury|
|Benedictine Cathedral Priory of the Holy Trinity, Canterbury|
|Psalters in the MLGB3|
When the data are finalized, I look forward to examining in detail what mapping can tell us about the differential fate of manuscripts from certain locations, or even of certain kinds of manuscripts. For example, see above for the relatively similar dispersal patterns of two Canterbury libraries, or at right for the dispersal patterns of psalters. Likewise, in the future I would love to combine the MLGB3 records with those in the Schoenberg Database of Manuscripts (SDBM) here at Penn. For instance, manuscripts from St. Augustine's in Canterbury feature in over 100 transaction records in the SDBM. Similarly, the database staff here has entered over 3,200 manuscripts based on entries from Ker. I can also imagine how the fantastic resources within the MLGB3 project could be linked with extant digitized copies of the manuscripts mentioned. The one Penn manuscript noted in MLGB3 (ID 316, formerly Phillipps 20547 and Lea 23) comes from the church of St. Deiniol in Bangor, and it would be fantastic to display the digital facsimile of the ex libris inscription alongside the entry. In other words, there's no more exciting place to be for linked digital humanities data than provenance and book history!
|UPenn Ms. Codex 75. Ownership inscription, f. 193v: "Iste liber pertinet Ecclesie sancti Daniellis"|