Thursday, November 6, 2014

Mapping pre-1600 European manuscripts in the U.S. and Canada

Pre-1600 European manuscripts in the United States and Canada (detail)

Today marks the beginning of the 7th annual Schoenberg Symposium on Manuscript Studies in the Digital Age here in Philadelphia. This year the symposium theme is "Collecting Histories" and features a lineup of speakers discussing the ways in which provenance and the history of collecting inform our wider knowledge about manuscript culture. As readers of this blog know, I'm very much interested in the historical movement of books and manuscripts, and I'm excited to speak during the conference on the ways in which the Schoenberg Database of Manuscripts (SDBM) can be used to track manuscripts over time.

For this post, though, I want to highlight the fantastic work done by two scholars whose work very much informs the SDBM project. Over the past two decades, Lisa Fagin Davis and Melissa Conway have worked to create a new directory for all institutions in the U.S. and Canada which hold European manuscripts dating to before 1600. They have published their own excellent description of the origins and methodology of the project, but in short their work began as a way to update the censuses of American manuscripts created by Seymour de Ricci from 1935-40 and supplemented by Faye and Bond in 1962. Their census includes entries for 937 entities: historical owners of manuscripts derived from previous censuses, the former names of institutions now renamed, as well as current holders. Running to 126 pages in a freely available PDF sponsored by the Bibliographical Society of America, the census is an incredibly helpful resource, and I wanted to find a way to make the data contained within it browseable in a different way than just on the printed page.

Example of a listing from the Fagin Davis & Conway Census (p.37)
I extracted the text from the PDF census, chopped it up into relevant delimited fields like "Name," "Address," "Holdings," etc., and then mapped the results using CartoDB. I had to make a few decisions about display along the way, especially when it came to how to determine the size of each manuscript-owning dot on the map. Most institutions provided Fagin Davis and Conway with numbers for how many manuscript codices they held as well as how many leaves, documents, and scrolls were in their collection (though others reported only an aggregate number). Most institutions with full-fledged manuscript books had a fairly well-informed count of exactly how many they had, but the numbers for leaves and documents were often estimated in larger round figures. As a result, the default map view shows all locations in the census with dots sized by the total number of manuscripts held (leaves, codices, scrolls, documents, etc.). Using the "visible layers" dropdown you can show just those locations currently holding manuscripts, just those recorded in earlier censuses which no longer hold manuscripts, or both together. Of course sizing the dots by total manuscript holdings is necessarily a bit misleading, as a university with 2 codices and 37 leaves appears to have total holdings of 39 manuscripts, so there is also an option in the "visible layers" menu to view only holdings of codices.
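To make the "total" versus "codices only" distinction concrete, here is a minimal Python sketch. The `Holding` record and the `dot_radius` function are hypothetical names invented for illustration, and the square-root scaling is an assumption about how one might size dots, not a description of what CartoDB does.

```python
import math
from dataclasses import dataclass


@dataclass
class Holding:
    """One census entry, reduced to its counts (hypothetical structure)."""
    name: str
    codices: int = 0
    leaves: int = 0
    documents: int = 0
    scrolls: int = 0

    @property
    def total(self) -> int:
        # The catch-all "total" lumps every kind of manuscript together.
        return self.codices + self.leaves + self.documents + self.scrolls


def dot_radius(count: int, scale: float = 2.0) -> float:
    """Radius grows with sqrt(count) so that dot *area* tracks the count.

    This scaling is an assumption for the sketch, not CartoDB's rule.
    """
    return scale * math.sqrt(count)


# The example from the text: 2 codices and 37 leaves.
example = Holding("Example University", codices=2, leaves=37)
print(example.total)    # 39
print(example.codices)  # 2
```

Sized by `total`, this entry gets a dot for 39 manuscripts; sized by `codices`, it gets a dot for 2, which is why the two map layers can look so different.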

Unsurprisingly, one can see the concentration of pre-1600 European manuscript holdings along the east coast. In a league table of manuscript holders New York, Washington, and Philadelphia(!) come out on top by volume, but in terms of individual institutions the Huntington and Folger, with their extensive holdings of pre-1600 documents, lead the list.

Top 15 current owners of pre-1600 manuscripts by "total" count in the Fagin Davis/Conway census
Given the fuzziness of this catch-all "total" manuscript number it's helpful to also get a sense of institutions by number of codices held:

Top 15 current owners of pre-1600 manuscript codices in the Fagin Davis/Conway census
One of the advantages of using the Fagin Davis and Conway survey is that it lists private collections, and in the cases where these were dispersed or relocated, notes their current location. I don't think it would be terribly controversial to say that most collections of medieval manuscripts in the U.S. and Canada rest on substantial gifts from individual collectors or families. The remarkable extent of these private collections can be seen in part below:

Collections of pre-1600 manuscripts now identified as being relocated in the Fagin Davis/Conway census
Top 15 now-relocated collections of pre-1600 manuscript codices in the census

It's edifying to see the late Larry Schoenberg at the top of the list of codices, especially today during the conference celebrating his legacy. His manuscripts are now here at Penn, but a decade ago when they were in Longboat Key, Florida they made that small community the largest holder of pre-1600 manuscript codices in the south. Others on that list will be familiar to many, including George Plimpton, whose manuscripts are now largely at Columbia University, Thomas Marston, whose collection is at the Beinecke, and Ricketts, whose collection is now mostly at the Lilly Library.

Saturday, June 28, 2014

Tracking the Rare Book and Manuscript market

This past week I attended the annual conference of the Rare Book and Manuscript Section of the Association of College and Research Libraries. It's fantastic to be around so many wonderful book people and hear their take on the state of the field. As part of the program, RBMS hosted a panel on "the market" with Nina Musinsky and other members of the trade and library world. Seeing the plenary and Musinsky's talk reminded me that I'd started several months ago to make sense of some data on 2013 book and manuscript auction sales but never finished.

On January 1st this year, the collector services site Americana Exchange (AE) posted a list of the "top 500" auction results by price for books and manuscripts for the previous year based on their valuable in-house data. I thought I'd clean up and parse this data a bit and try to make some sense from it. The AE's table makes it easy to see the list by value, capped off by the Bay Psalm Book which sold for $14 million at Sotheby's. I wanted to get a sense though of the field as a whole. First off, while I was unsurprised that Sotheby's and Christie's dominated the field in terms of auction houses selling top lots, I was impressed by the fact that 48 different auction houses were represented over all 500 lots!
Top 10 Auction houses in 2013 by number of the top 500 lots sold.

I then thought I'd look a bit at the age of the items being sold in the market - was the 20th century the hottest? The 16th? After a bit of cleanup I assigned dates to 497 of the 500 items and plotted them out.
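Bucketing dated items by century is simple arithmetic once each lot has a year. A minimal sketch, assuming each item has already been reduced to a single year (the function name `century_of` and the short list of years are illustrative, not the actual dataset):

```python
from collections import Counter


def century_of(year: int) -> int:
    """Return the century a CE year falls in (1501-1600 -> 16)."""
    if year < 1:
        raise ValueError("expected a year of 1 CE or later")
    return (year - 1) // 100 + 1


# Three years used only as examples: the 1555 Labé, the 1640 Bay Psalm
# Book, and the 1937 Turing offprint.
years = [1555, 1640, 1937]
counts = Counter(century_of(y) for y in years)

for century, n in sorted(counts.items()):
    print(century, n)
```

The `(year - 1)` step keeps round years such as 1600 in the 16th century rather than tipping them into the 17th.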

Number of items in top 500 by century
There are no huge surprises in the above table, with the 19th and 20th centuries responsible for the majority of the top value book and manuscript auction sales and the 17th century the poor relative in the printed-book era. The list is of course worth looking at carefully in comparison with the numbers; you'll see, for example, that a sale of comic books at Heritage Auctions really boosted the number of items from the 20th century.
Total value (US$) of items in top 500 by century
The twentieth century also fares well in looking at the total value of all lots by century, but you'll see the 17th century recovers thanks largely to the Bay Psalm Book whose high price compensates for lower total sales from the period.
Average price (US$) earned by items in top 500 by century
In looking at averages, medieval manuscripts, though numerically fewer on the list, shine through thanks to their higher per-item value. You'll see of course that the Bay Psalm Book is responsible for that inflated 17th century average.

The AE data also includes information on auction house estimates which provides an interesting window on which items blew away expectations (or which had artificially low estimates). I've divided the final sales price by the low estimate to get a kind of 'estimate factor' by which lots overperformed.
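The 'estimate factor' is just the sale price divided by the low estimate. A minimal sketch, using the figures this post reports for the two top lots (the function name is hypothetical, and each lot is computed in its own sale currency so no conversion is needed):

```python
def estimate_factor(sale_price: float, low_estimate: float) -> float:
    """How many times over its low estimate a lot sold for."""
    if low_estimate <= 0:
        raise ValueError("low estimate must be positive")
    return sale_price / low_estimate


# Labé: sold for $485,000 against a low estimate of $3,000.
labe = estimate_factor(485_000, 3_000)
# Turing offprint: sold for £205,250 against a low estimate of £3,000.
turing = estimate_factor(205_250, 3_000)

print(round(labe, 1))    # 161.7
print(round(turing, 1))  # 68.4
```

Because the factor is a ratio, it is sensitive to very low estimates: a lot with an artificially modest estimate can outrank a far more expensive one.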

Top 10 auction lots of 2013 by how much they exceeded their low estimate

The two top expectation-beaters make an illustrative contrast. The top lot, a 1555 first edition of the works of Louise Charly Labé, was given an estimate of $3,000-$5,000 at Sotheby's (New York) on 11 June 2013; instead it sold for an astounding $485,000. Where the Labé volume is a beautiful 16th c. letterpress book with a fantastic binding, the second highest performer, sold just eight days later, couldn't be more different. The ugly little pamphlet on the right dates from 1937 and contains a printed offprint of "On Computable Numbers," one of Alan Turing's seminal articles. It was offered at Bonhams in London on June 19th for the modest estimate of £3,000-5,000 and instead achieved a whopping £205,250 ($349,591). One of maybe fewer than 100 offprints of the article, this one is inscribed by Turing to a Cambridge philosopher and is clear evidence of the enormous appetite for items related to the history of computing.

Works of Louise Charly Labé (Lyon, 1555) [USTC 1135
Turing's "On Computable Numbers, with an Application to the Entscheidungsproblem," Proceedings of the London Mathematical Society

It's nice to see this juxtaposition, which shows two of the many sides of the collecting market. Early printed books like the 1555 Labé continue to do well both for their physical beauty and for their historical importance (one of the most important early printed compilations of a female poet), while the Turing offprint demonstrates the power and interest of a cohort of collectors attracted by the recent history of science and computing. Both are historically significant material and intellectual objects and, I think, pretty compelling evidence for why it's a great time to be working in the Rare Book and Manuscript field.

My data set, based on that of AE but with my addition of dates and the 'estimate factor' can be found here.

Friday, January 24, 2014

Charting Former Owners of Penn's Codex Manuscripts

Today is the American Library Association midwinter meeting LibHackathon here at the Penn Libraries. I thought I'd share a project using library data that I've been working on for a little while now in the hopes that it will be not only useful to scholars but also might generate some conversation over how libraries and archives distribute their valuable descriptive information.

In short, this piece is all about how we get to this:

Network diagram of Penn codex manuscripts and former owners
From this:

MARC record for UPenn Ms. Codex 465

Over the years and especially here at Penn I've been fortunate enough to work with a number of catalogers in both special and general collections. I can't think of a more under-appreciated part of the scholarly community. I've seen first-hand how much time, energy, and bibliographic skill goes into the description of texts and objects of all kinds. I've heard heated debates over whether one piece of information or another should go into one of the million-and-one MARC fields. What comes out of the other side of this process should be a goldmine of easily usable truly 'big' bibliographic data. Instead, I think it's safe to say that 99% of library users have no idea why one might want to search the 752 field instead of the 260 field for place of publication. Moreover, this is hardly the sole fault of users. Try searching any library online catalog for just information from subfield c of field 300 and see how far you get! So much structured data ignored and thousands of hours of cataloger effort hidden from the world [1].

Fortunately the data is there if you know how to find it [2]! I've been playing around with our catalog data at Penn for a while now and decided a few weeks ago that I wanted an easy way to visually display networks of provenance in our manuscript collection. Penn has a deep commitment to provenance and book history and for my money our catalogers have done some of the richest work in describing provenance of any manuscript collection I've seen. The Kislak Center here at the Penn Libraries currently has cataloged around 1,640 codex manuscripts (manuscripts bound in book form) as well as around 300 codex manuscripts from the Lawrence J. Schoenberg collection [3]. I knew from experience that most of these had detailed descriptions of former ownership in their online catalog records and it seemed reasonable to just download them all and make a quick visualization of who owned which manuscripts in common.

I realize now that this task would have been near to impossible at most libraries where the online catalogs and back-end databases don't easily allow public users to batch download full records. Fortunately at Penn all of our catalog records are available in MARC-XML form which looks something like this:

I knew that our catalogers were keen on including structured data about former owners in the 700 field with a "former owner" phrase after their name. It was easy enough to download a list of all of the manuscripts that possessed this field. Then, after some much needed coaching from Dot Porter, the Kislak Center's XML guru and medievalist extraordinaire, I was able to write an XSL transformation which would spit out just what I wanted. At first glance though, I didn't turn up nearly as many results as I'd hoped and I seemed to be missing a lot of data. Looking through the records I saw that, on the plus side, the 700 field was highly structured with authorized name headings, but it didn't always incorporate all of the rich narrative textual information in the 561 field (labeled "provenance" in our public catalog). For example, an owner like Sir Thomas Phillipps would have his name included in the 700 field but the auction house which sold the manuscript would appear only in the 561. This is for very good reasons ("Sotheby's" is rarely a "former owner") but I really wanted to know everything about a text, so I moved on to extracting every 561 field from the manuscripts. Instead of nice, neat authorized names, I of course got a lot of fascinating narrative:
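For readers who would rather not write XSLT, the same two extractions can be sketched with Python's standard library. This is a swapped-in technique, not the transformation described above. The sample record is entirely hypothetical, and placing the "former owner" phrase in subfield e (the MARC relator term) is an assumption about how the records are encoded.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC-XML ("MARC21 slim") records.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A made-up record with placeholder names, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="561" ind1=" " ind2=" ">
    <subfield code="a">Hypothetical note: sold at auction by Example Auctions.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Example, Owner,</subfield>
    <subfield code="e">former owner.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Example, Scribe,</subfield>
    <subfield code="e">scribe.</subfield>
  </datafield>
</record>"""


def former_owners(record: ET.Element) -> list[str]:
    """Names from 700 fields whose relator term says 'former owner'."""
    owners = []
    for field in record.findall("marc:datafield[@tag='700']", NS):
        roles = [sf.text or "" for sf in field.findall("marc:subfield[@code='e']", NS)]
        if any("former owner" in role.lower() for role in roles):
            name = field.find("marc:subfield[@code='a']", NS)
            if name is not None and name.text:
                # Drop the trailing comma that precedes the relator term.
                owners.append(name.text.strip().rstrip(","))
    return owners


def provenance_notes(record: ET.Element) -> list[str]:
    """Free-text provenance narrative from every 561 field."""
    path = "marc:datafield[@tag='561']/marc:subfield[@code='a']"
    return [sf.text.strip() for sf in record.findall(path, NS) if sf.text]


record = ET.fromstring(SAMPLE)
print(former_owners(record))
print(provenance_notes(record))
```

The contrast between the two functions mirrors the problem in the text: the 700 field yields a clean name, while the 561 field yields a sentence that still has to be read and parsed.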

Provenance note for UPenn Ms. Codex 234
I broke each of these lines of narrative into sentences and began the arduous work of identifying each owner in a chain of provenance uniquely. After some maddening time using OpenRefine, regular expressions, and plain copying and pasting I got a list I was happy with. In the end I came up with 3,252 manuscript/provenance pairs, like so:

1,285 of our 1,640-odd codices (including two ms. rolls, because: why not) had at least some provenance data recorded, as did an additional 265 of the 311 Schoenberg manuscripts we've cataloged. Out of these I was able to identify 985 "unique" entities through whose hands our manuscripts had passed. More interestingly, 225 of these owners had formerly been in possession of two or more of our manuscripts.
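Counts like these fall out of the manuscript/provenance pairs with very little code. A minimal sketch, assuming the pairs are available as (manuscript, owner) tuples; the sample pairs and the function name are hypothetical placeholders, not the Penn data:

```python
from collections import defaultdict

# Placeholder pairs; a repeated pair is included to show it is not
# double-counted.
pairs = [
    ("Ms. A", "Owner X"),
    ("Ms. B", "Owner X"),
    ("Ms. A", "Owner Y"),
    ("Ms. C", "Owner Z"),
    ("Ms. B", "Owner X"),
]


def manuscripts_by_owner(pairs):
    """Map each owner to the set of manuscripts that passed through their hands."""
    owned = defaultdict(set)
    for manuscript, owner in pairs:
        owned[owner].add(manuscript)
    return owned


owned = manuscripts_by_owner(pairs)
# Owners linked to two or more distinct manuscripts.
repeat_owners = sorted(o for o, mss in owned.items() if len(mss) >= 2)

print(len(owned))      # 3
print(repeat_owners)   # ['Owner X']
```

The same pairs double as an edge list for a network tool such as Gephi, with manuscripts and owners as the two kinds of node.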

Past possessors of Penn's manuscript codices in yellow with individual manuscripts in grey. (Gephi network diagram rendered with sigma.js).

The historical strengths of our collection and Penn's institutional history can be seen pretty clearly here at the center of the cluster. Our codices primarily come from European and American collections as mediated by the prominent dealers and auction houses of London, New York, Philadelphia, Paris, Florence, and Munich. In addition we have received several very large collections over the years including the Gondi-Medici collection via the dealer Bernard Rosenthal and the recent gift of the Lawrence J. Schoenberg collection.

Center Cluster showing a variety of donors, bookdealers, and auction houses

Thursday, January 2, 2014

Linking Archival Sources in the 2013 AHR

With the annual conference of the American Historical Association (AHA) starting today I'm excited to see friends and hear some great papers. I'm always struck by just how broad a field 'history' represents, yet how often historians are able to make connections to each other's work, even when far removed temporally and geographically. In reading the AHA's flagship journal, The American Historical Review (AHR), this year I especially enjoyed seeing places where seemingly unconnected articles spoke from similar frames of reference and, most interestingly, from overlapping source bases (be sure to check out my Penn colleague Vanessa Ogle's great article on the history of time reform!).

Authors of articles in the 2013 AHR connected by commonly used archives

As this site indicates, I'm very interested in tracking the circulation of texts, ideas, and archives over time as well as how these sources are used by scholars. Tracking networks of citation is nothing new and has been a favorite activity of scholars for centuries, but recently there's been a surge of interest in quantitative analysis of academic citation patterns. Most of this interest has been in the sciences and social sciences where "impact factors" (put simply, the quantity and importance of articles citing one's work) are de rigueur in weighing scholarly merit. Though I'm wary of many of the developments in this "bibliometrics" field, some of the more useful advances have been in using data about authorship and citation to show the material ways fields are constructed, i.e. the influence of certain universities, graduate programs, or scholars in a specific sub-discipline. Here at Penn for instance, my colleagues at the library have helped the School of Medicine and others to create a way to view co-authorship networks of particular researchers.

Though tracking citation of articles and secondary sources in a journal like the AHR would really illuminate networks of influence, interest, and argument, I'm more interested in how historians use archival sources. This is especially important given that the bibliometric wizards at big publishing companies like Elsevier and ProQuest have done a decent job of figuring out article and book citations and linking them together, but have had much less success with archival sources.

I extracted data on archival sources from 16 of the 17 feature articles in the five AHR issues for 2013 [1]. The authors of these pieces did not disappoint, citing 66 different archives and libraries located in 54 different cities from Berkeley to Sarajevo to Zanzibar [2].

Location of Archives and Libraries cited in 2013 AHR articles
Despite disparate topics and the relatively random assortment of scholars and articles across the year's issues (as far as I can tell none of the articles were grouped in 'theme' issues) there were several nodes of archival overlap.

Archives used by multiple 2013 AHR authors
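Finding these nodes of overlap amounts to inverting an author-to-archives table and then pairing up authors who share a repository. A minimal sketch with placeholder authors and archives (the names, the data, and both function names are hypothetical, not drawn from the 2013 AHR):

```python
from collections import defaultdict
from itertools import combinations

# Placeholder data: each author mapped to the archives they cite.
citations = {
    "Author A": {"Archive 1", "Archive 2"},
    "Author B": {"Archive 2"},
    "Author C": {"Archive 3"},
    "Author D": {"Archive 2", "Archive 3"},
}


def shared_archives(citations):
    """Archives cited by two or more authors, with those authors listed."""
    users = defaultdict(set)
    for author, archives in citations.items():
        for archive in archives:
            users[archive].add(author)
    return {a: sorted(authors) for a, authors in users.items() if len(authors) >= 2}


def author_edges(citations):
    """Pairs of authors linked by at least one archive in common."""
    edges = {}
    for a, b in combinations(sorted(citations), 2):
        common = citations[a] & citations[b]
        if common:
            edges[(a, b)] = sorted(common)
    return edges


print(shared_archives(citations))
print(author_edges(citations))
```

The first function gives the "archives used by multiple authors" view; the second gives the edge list for an author-to-author network diagram.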

Obviously one year of the AHR is a pretty weak sample, but I suspect the pattern established would hold across a wider swath of the journal, i.e. an impressive array of geographically dispersed archives based on the focus of particular authors as well as a concentration of overlapping citation from the major state and university archives and libraries of Europe and North America. Along these lines I would be curious to see how the influence of particular archives has waxed and waned over the years in the profession. I imagine that a select number of repositories (NARA, the UK National Archives, the British Library, Library of Congress, the BN in Paris, various German archives, etc.) have long been dominant across geographic and temporal fields given the institutional makeup of the historical profession, but I would also be surprised if the dominance of these central archives hasn't decreased given methodological and theoretical shifts in the discipline since the 1970s.