Saturday, June 28, 2014

Tracking the Rare Book and Manuscript market

This past week I attended the annual conference of the Rare Book and Manuscript Section of the Association of College and Research Libraries. It's fantastic to be around so many wonderful book people and hear their take on the state of the field. As part of the program, RBMS hosted a panel on "the market" with Nina Musinsky and other members of the trade and library world. Seeing the plenary and Musinsky's talk reminded me that several months ago I'd started trying to make sense of some data on 2013 book and manuscript auction sales but never finished.

On January 1st this year, the collector services site Americana Exchange (AE) posted a list of the "top 500" auction results by price for books and manuscripts for the previous year, based on their valuable in-house data. I thought I'd clean up and parse this data a bit and try to make some sense of it. AE's table makes it easy to see the list by value, capped off by the Bay Psalm Book, which sold for $14 million at Sotheby's. I wanted, though, to get a sense of the field as a whole. First off, while I was unsurprised that Sotheby's and Christie's dominated in terms of auction houses selling top lots, I was impressed that 48 different auction houses were represented across all 500 lots!
Top 10 Auction houses in 2013 by number of the top 500 lots sold.
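Counting distinct houses and ranking the top ten takes only a few lines once the cleaned list has a column of auction-house names. A minimal Python sketch, assuming the AE rows have already been reduced to a list of house names; the seven names below are hypothetical placeholders, not the AE data:

```python
from collections import Counter

# Hypothetical placeholder rows: one auction-house name per lot in the cleaned list.
houses = ["Sotheby's", "Christie's", "Sotheby's", "Heritage", "Bonhams", "Christie's", "Sotheby's"]

counts = Counter(houses)
print("distinct houses:", len(counts))

# most_common(10) ranks houses by number of lots (ties keep first-seen order).
for house, n in counts.most_common(10):
    print(f"{house}: {n}")
```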

I then thought I'd look a bit at the age of the items being sold in the market - was the 20th century the hottest? The 16th? After a bit of cleanup I assigned dates to 497 of the 500 items and plotted them out.

Number of items in top 500 by century
There are no huge surprises in the chart above: the 19th and 20th centuries account for the majority of the top-value book and manuscript auction sales, with the 17th century the poor relative of the printed-book era. The list is of course worth reading carefully alongside the numbers; you'll see, for example, that a sale of comic books at Heritage Auctions really boosted the number of items from the 20th century.
Total value (US$) of items in top 500 by century
The 20th century also fares well when looking at the total value of all lots by century, but you'll see that the 17th century recovers thanks largely to the Bay Psalm Book, whose high price compensates for lower total sales from the period.
Average price (US$) earned by items in top 500 by century
In looking at averages, medieval manuscripts, though numerically fewer on the list, shine through thanks to their higher per-item value. You'll see of course that the Bay Psalm Book is responsible for that inflated 17th century average.
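The three charts above are one grouping summarized three ways (count, total, average). A minimal Python sketch, assuming each lot has been reduced to a (year, price) pair, with None for the few items that could not be dated; the sample values are invented. The single large 17th-century price shows how one outlier, like the Bay Psalm Book, inflates a century's average:

```python
from collections import defaultdict

def century(year: int) -> int:
    """Ordinal century of a CE year: 1555 -> 16, 1900 -> 19, 1901 -> 20."""
    return (year - 1) // 100 + 1

def summarize_by_century(lots):
    """lots: iterable of (year or None, price). Returns {century: (count, total, average)}."""
    prices = defaultdict(list)
    for year, price in lots:
        if year is None:          # skip items that could not be dated
            continue
        prices[century(year)].append(price)
    return {c: (len(p), sum(p), sum(p) / len(p)) for c, p in sorted(prices.items())}

# Invented sample values (year, price in US$); not taken from the AE list.
sample = [(1640, 1_000_000), (1690, 20_000), (1555, 50_000),
          (1937, 30_000), (1950, 10_000), (None, 5_000)]

for c, (count, total, avg) in summarize_by_century(sample).items():
    print(f"century {c}: {count} items, total ${total:,}, average ${avg:,.0f}")
```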

The AE data also includes information on auction house estimates, which provides an interesting window onto which items blew away expectations (or which had artificially low estimates). I've divided the final sales price by the low estimate to get a kind of 'estimate factor' measuring how far lots overperformed.
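The 'estimate factor' is a simple ratio. A minimal Python sketch using the two lots discussed in this post; because price and estimate are in the same currency for each lot, the ratio needs no currency conversion, which is why the Turing figures can stay in pounds:

```python
def estimate_factor(price, low_estimate):
    """Final price divided by the low estimate; values above 1 beat the estimate.
    Price and estimate must be in the same currency, so the ratio itself is currency-free."""
    if low_estimate <= 0:
        raise ValueError("low estimate must be positive")
    return price / low_estimate

# Figures quoted in this post.
labe = estimate_factor(485_000, 3_000)    # Sotheby's New York, US$
turing = estimate_factor(205_250, 3_000)  # Bonhams London, GBP
print(f"Labé {labe:.1f}x, Turing {turing:.1f}x")  # Labé 161.7x, Turing 68.4x
```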

Top 10 auction lots of 2013 by how much they exceeded their low estimate

The two top expectation-beaters are illustratively quite different. The top lot, a 1555 first edition of the works of Louise Charly Labé, was given an estimate of $3,000-$5,000 at Sotheby's (New York) on 11 June 2013; instead it sold for an astounding $485,000. Whereas the Labé volume is a beautiful 16th-century letterpress book with a fantastic binding, the second-highest performer, sold just eight days later, couldn't be more different. The ugly little pamphlet on the right dates from 1937 and is a printed offprint of "On Computable Numbers," one of Alan Turing's seminal articles. It was offered at Bonhams in London on June 19th with a modest estimate of £3,000-5,000 and instead achieved a whopping £205,250 ($349,591). One of perhaps fewer than 100 offprints of the article, this one is inscribed by Turing to a Cambridge philosopher, and its price is clear evidence of the enormous appetite for items related to the history of computing.

Works of Louise Charly Labé
(Lyon, 1555) [USTC 1135
Turing's "On Computable Numbers, with an Application to the Entscheidungsproblem," Proceedings of the London Mathematical Society

It's nice to see this juxtaposition, which shows two of the many sides of the collecting market. Early printed books like the 1555 Labé continue to do well both for their physical beauty and for their historical importance (it is one of the most important early printed compilations of a female poet), while the Turing offprint demonstrates the power and interest of a cohort of collectors attracted by the recent history of science and computing. Both are historically significant material and intellectual objects, and I think they are pretty compelling evidence for why it's a great time to be working in the Rare Book and Manuscript field.

My data set, based on that of AE but with my addition of dates and the 'estimate factor', can be found here.

Friday, January 24, 2014

Charting Former Owners of Penn's Codex Manuscripts

Today is the American Library Association midwinter meeting LibHackathon here at the Penn Libraries. I thought I'd share a project using library data that I've been working on for a little while now, in the hope that it will not only be useful to scholars but might also generate some conversation over how libraries and archives distribute their valuable descriptive information.

In short, this piece is all about how we get to this:

Network diagram of Penn codex manuscripts and former owners
From this:

MARC record for UPenn Ms. Codex 465

Over the years and especially here at Penn I've been fortunate enough to work with a number of catalogers in both special and general collections. I can't think of a more under-appreciated part of the scholarly community. I've seen first-hand how much time, energy, and bibliographic skill goes into the description of texts and objects of all kinds. I've heard heated debates over whether one piece of information or another should go into one of the million-and-one MARC fields. What comes out of the other side of this process should be a goldmine of easily usable truly 'big' bibliographic data. Instead, I think it's safe to say that 99% of library users have no idea why one might want to search the 752 field instead of the 260 field for place of publication. Moreover, this is hardly the sole fault of users. Try searching any library online catalog for just information from subfield c of field 300 and see how far you get! So much structured data ignored and thousands of hours of cataloger effort hidden from the world [1].

Fortunately the data is there if you know how to find it [2]! I've been playing around with our catalog data at Penn for a while now and decided a few weeks ago that I wanted an easy way to visually display networks of provenance in our manuscript collection. Penn has a deep commitment to provenance and book history and for my money our catalogers have done some of the richest work in describing provenance of any manuscript collection I've seen. The Kislak Center here at the Penn Libraries currently has cataloged around 1,640 codex manuscripts (manuscripts bound in book form) as well as around 300 codex manuscripts from the Lawrence J. Schoenberg collection [3]. I knew from experience that most of these had detailed descriptions of former ownership in their online catalog records and it seemed reasonable to just download them all and make a quick visualization of who owned which manuscripts in common.

I realize now that this task would have been near to impossible at most libraries where the online catalogs and back-end databases don't easily allow public users to batch download full records. Fortunately at Penn all of our catalog records are available in MARC-XML form which looks something like this:

I knew that our catalogers were keen on including structured data about former owners in the 700 field, with a "former owner" phrase after the name. It was easy enough to download a list of all of the manuscripts that possessed this field. Then, after some much-needed coaching from Dot Porter, the Kislak Center's XML guru and medievalist extraordinaire, I was able to write an XSL transformation that would spit out just what I wanted. At first glance, though, I didn't turn up nearly as many results as I'd hoped, and I seemed to be missing a lot of data. Looking through the records, I saw that, on the plus side, the 700 field was highly structured with authorized name headings, but it didn't always incorporate all of the rich narrative information in the 561 field (labeled "provenance" in our public catalog). For example, an owner like Sir Thomas Phillipps would have his name included in the 700 field, but the auction house that sold the manuscript would appear only in the 561. This is for very good reasons ("Sotheby's" is rarely a "former owner"), but I really wanted to know everything about a text, so I moved on to extracting every 561 field from the manuscripts. Instead of nice, neat authorized names, I of course got a lot of fascinating narrative:

Provenance note for UPenn Ms. Codex 234
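The extraction described above was done in XSLT; for comparison, here is a minimal Python sketch of the same idea with the standard library's ElementTree. It assumes records use the MARC 21 slim namespace, that the relator appears in subfield $e (or as the code "fmo" in $4) of a 700, that the name is in $a, and that the 561 note text is in $a. The record below is invented for illustration, not a Penn record:

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def subfields(field, code):
    """Stripped text of every subfield with the given code in one datafield."""
    return [(sf.text or "").strip() for sf in field.findall(f"marc:subfield[@code='{code}']", NS)]

def provenance_fields(xml_text):
    """Return (former_owner_names, provenance_notes) for one MARC-XML record."""
    record = ET.fromstring(xml_text)
    owners, notes = [], []
    for field in record.findall("marc:datafield[@tag='700']", NS):
        roles = subfields(field, "e") + subfields(field, "4")
        if any("former owner" in r.lower() or r == "fmo" for r in roles):
            owners.extend(name.rstrip(",") for name in subfields(field, "a") if name)
    for field in record.findall("marc:datafield[@tag='561']", NS):
        notes.extend(n for n in subfields(field, "a") if n)
    return owners, notes

# Invented record for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Phillipps, Thomas,</subfield>
    <subfield code="c">Sir,</subfield>
    <subfield code="d">1792-1872,</subfield>
    <subfield code="e">former owner.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane,</subfield>
    <subfield code="e">editor.</subfield>
  </datafield>
  <datafield tag="561" ind1=" " ind2=" ">
    <subfield code="a">Sold at auction in London; later owned by Sir Thomas Phillipps.</subfield>
  </datafield>
</record>"""

owners, notes = provenance_fields(SAMPLE)
print(owners)  # ['Phillipps, Thomas']
print(notes)
```

The 700 filter keeps only former-owner headings, while the 561 notes come back whole, ready for the sentence-level work described next.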
I broke each of these lines of narrative into sentences and began the arduous work of identifying each owner in a chain of provenance uniquely. After some maddening time using OpenRefine, regular expressions, and plain copying and pasting I got a list I was happy with. In the end I came up with 3,252 manuscript/provenance pairs, like so:

1,285 of our 1,640-odd codices (including two ms. rolls, because: why not) had at least some provenance data recorded, as did an additional 265 of the 311 Schoenberg manuscripts we've cataloged. Out of these I was able to identify 985 "unique" entities through whose hands our manuscripts had passed. More interestingly, 225 of these owners had formerly been in possession of two or more of our manuscripts.
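Both steps described above, breaking 561 narratives into statements and then counting owners who recur across manuscripts, can be roughly sketched in Python. This is not the workflow actually used (which leaned on OpenRefine and a lot of manual work); the regular expression is deliberately naive and will split wrongly after initials such as "H. P.", which is exactly why hand cleanup is needed. Shelfmarks, notes, and owner names are invented placeholders:

```python
import re
from collections import defaultdict

# Naive rule: a statement ends at ".", ";" or "!" followed by whitespace and a capital letter.
# Initials ("H. P. Kraus") and abbreviations ("Ms. Codex") will be split wrongly.
STATEMENT_BREAK = re.compile(r"(?<=[.;!])\s+(?=[A-Z])")

def provenance_pairs(shelfmark, note):
    """Yield (manuscript, statement) pairs from one 561 narrative."""
    for statement in STATEMENT_BREAK.split(note.strip()):
        if statement.strip():
            yield shelfmark, statement.strip()

def owners_summary(pairs, min_manuscripts=2):
    """pairs: (manuscript, owner). Returns (number of unique owners,
    {owner: manuscripts} for owners linked to at least min_manuscripts)."""
    holdings = defaultdict(set)          # a set, so duplicate pairs count once
    for manuscript, owner in pairs:
        holdings[owner].add(manuscript)
    shared = {o: sorted(ms) for o, ms in holdings.items() if len(ms) >= min_manuscripts}
    return len(holdings), shared

# Invented narrative for a hypothetical shelfmark.
note = "Written in northern Italy, ca. 1450. Sold at auction in London, 1898. Later in a private collection."
statements = list(provenance_pairs("Ms. X", note))

# After statements are resolved to named owners (the manual step), counting is simple.
owner_pairs = [("Ms. A", "Owner 1"), ("Ms. B", "Owner 1"), ("Ms. B", "Owner 2"),
               ("Ms. C", "Owner 3"), ("Ms. A", "Owner 1")]
n_owners, shared = owners_summary(owner_pairs)
print(len(statements), n_owners, shared)  # 3 3 {'Owner 1': ['Ms. A', 'Ms. B']}
```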

Past possessors of Penn's manuscript codices in yellow with individual manuscripts in grey. (Gephi network diagram rendered with sigma.js).

The historical strengths of our collection and Penn's institutional history can be seen pretty clearly here at the center of the cluster. Our codices primarily come from European and American collections as mediated by the prominent dealers and auction houses of London, New York, Philadelphia, Paris, Florence, and Munich. In addition we have received several very large collections over the years, including the Gondi-Medici collection via the dealer Bernard Rosenthal and the recent gift of the Lawrence J. Schoenberg collection.

Center Cluster showing a variety of donors, bookdealers, and auction houses

Thursday, January 2, 2014

Linking Archival Sources in the 2013 AHR

With the annual conference of the American Historical Association (AHA) starting today, I'm excited to see friends and hear some great papers. I'm always struck by just how broad a field 'history' represents, and yet how often historians are able to make connections to each other's work, even when far removed temporally and geographically. In reading the AHA's flagship journal, The American Historical Review (AHR), this year, I especially enjoyed seeing places where seemingly unconnected articles spoke from similar frames of reference and, most interestingly, from overlapping source bases (be sure to check out my Penn colleague Vanessa Ogle's great article on the history of time reform!).

Authors of articles in the 2013 AHR connected by commonly used archives

As this site indicates, I'm very interested in tracking the circulation of texts, ideas, and archives over time, as well as how these sources are used by scholars. Tracking networks of citation is nothing new and has been a favorite activity of scholars for centuries, but recently there's been a surge of interest in quantitative analysis of academic citation patterns. Most of this interest has been in the sciences and social sciences, where "impact factors" (put simply, the quantity and importance of articles citing one's work) are de rigueur in weighing scholarly merit. Though I'm wary of many of the developments in this "bibliometrics" field, some of the more useful advances have used data about authorship and citation to show the material ways fields are constructed, e.g. the influence of certain universities, graduate programs, or scholars in a specific sub-discipline. Here at Penn, for instance, my colleagues at the library have helped the School of Medicine and others create a tool for viewing co-authorship networks of particular researchers.

Though tracking citation of articles and secondary sources in a journal like the AHR would really illuminate networks of influence, interest, and argument, I'm more interested in how historians use archival sources. This is especially important given that the bibliometric wizards at big publishing companies like Elsevier and ProQuest have done a decent job of figuring out article and book citations and linking them together, but have had much less success with archival sources.

I extracted data on archival sources from 16 of the 17 feature articles in the five AHR issues for 2013 [1]. The authors of these pieces did not disappoint, citing 66 different archives and libraries located in 54 different cities from Berkeley to Sarajevo to Zanzibar [2].

Location of Archives and Libraries cited in 2013 AHR articles
Despite disparate topics and the relatively random assortment of scholars and articles across the year's issues (as far as I can tell none of the articles were grouped in 'theme' issues) there were several nodes of archival overlap.

Archives used by multiple 2013 AHR authors
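Both figures, the archives used by several authors and the author network linked by common archives, come from inverting an author-to-archives table. A minimal Python sketch, assuming citations have been reduced to a set of archive names per article author; the authors and archives below are placeholders. Each edge weight counts how many archives a pair of authors share:

```python
from collections import defaultdict
from itertools import combinations

def archive_overlap(citations):
    """citations: {author: set of archives cited}.
    Returns (archives used by 2+ authors, {(author_a, author_b): shared archive count})."""
    users = defaultdict(set)
    for author, archives in citations.items():
        for archive in archives:
            users[archive].add(author)
    shared = {a: sorted(us) for a, us in users.items() if len(us) > 1}
    edges = defaultdict(int)
    for authors in shared.values():
        for a, b in combinations(authors, 2):   # authors are sorted, so (a, b) is canonical
            edges[(a, b)] += 1
    return shared, dict(edges)

# Placeholder data.
citations = {
    "Author A": {"Archive P", "Archive Q"},
    "Author B": {"Archive Q", "Archive R"},
    "Author C": {"Archive R", "Archive Q", "Archive S"},
}
shared, edges = archive_overlap(citations)
print(shared)
print(edges)
```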

Obviously one year of the AHR is a pretty weak sample, but I suspect the pattern established would hold across a wider swath of the journal, i.e. an impressive array of geographically dispersed archives reflecting the focus of particular authors, alongside a concentration of overlapping citations from the major state and university archives and libraries of Europe and North America. Along these lines, I would be curious to see how the influence of particular archives has waxed and waned over the years in the profession. I imagine that a select number of repositories (NARA, the UK National Archives, the British Library, the Library of Congress, the BN in Paris, various German archives, etc.) have long been dominant across geographic and temporal fields given the institutional makeup of the historical profession, but I would also be surprised if the dominance of these central archives hasn't decreased given methodological and theoretical shifts in the discipline since the 1970s.

Tuesday, November 12, 2013

The Dispersal of the Medieval Libraries of Great Britain

Movement of books from medieval libraries in the MLGB3. Medieval locations (red), Current locations (blue)

Today I'm teaching a workshop on using "screen scraping" in the digital humanities. No workshop is really useful without practical examples, so last week I decided to try out my screen-scraping chops on an exciting new database of book history data. The Kislak Center at Penn (where I'm Scholar in Residence) is quickly becoming one of the most important sites for book and manuscript provenance research, and I wanted to see what I could do to highlight the potential for making extant provenance data more useful through new visualizations.

Several years ago, a few of the scholars behind the monumental Corpus of British Medieval Library Catalogues project (now at fifteen volumes), led by Richard Sharpe, began working on an online database to update, and provide access to, the wealth of information on medieval manuscripts contained in Neil Ker's Medieval Libraries of Great Britain (1941, 1964, and 1987). These volumes include accounts of books and manuscripts known to survive today which were once owned within Great Britain before the mid-16th century. Recently, through grants from the Mellon Foundation and others, the team has taken much of this information and made it available online in the MLGB3 searchable database. The site appears to be in beta mode at the moment and is intermittently accessible, but when it launches fully it will be an amazing resource and the culmination of a good deal of work by Sharpe and others. Looking through the database I was especially intrigued by the wealth of data on the current location of many of these medieval books and manuscripts. Given how comprehensive and detailed the project data is, even at this stage, I wanted to get a sense of what kind of picture would develop if we looked at the points of origin and current location of all these manuscripts in aggregate.

As of last week, the MLGB3's online database included over 6,000 records for books and manuscripts owned by medieval libraries. In order to look at them in aggregate, I used the ever-helpful wget utility to pull down each record in order. I was left with a gigantic mess of HTML with the useful data hidden within it. After extensive cleanup and parsing of the data, I was able to run the location names of the original medieval libraries, as well as those of current owners, through David Zwiefelhofer's geocoding service (which I believe uses the Yahoo API) to get longitudes and latitudes. This didn't go entirely smoothly, as the names of ruined monasteries tend not to register very well in geo databases. Fortunately, there is a wealth of Wikipedia entries providing detailed long./lat. information on a wide range of English historical sites, and I was able to fill in the blanks.
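The parsing half of a screen-scraping job like this can be sketched with Python's standard html.parser. This assumes, purely for illustration, that each saved record page lays out its fields as <dt>label</dt><dd>value</dd> pairs; the real MLGB3 markup may well differ, and the field names and values in the example are hypothetical:

```python
from html.parser import HTMLParser

class FieldParser(HTMLParser):
    """Collect <dt>label</dt><dd>value</dd> pairs from one saved record page.
    The dt/dd layout is an assumption for illustration, not MLGB3's documented markup."""

    def __init__(self):
        super().__init__()          # convert_charrefs=True by default, so entities arrive decoded
        self.fields = {}
        self._open = None           # "dt" or "dd" while inside one of them
        self._label = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._open, self._buf = tag, []

    def handle_data(self, data):
        if self._open:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._open:
            return                  # ignore closing tags of nested elements such as <a>
        text = " ".join("".join(self._buf).split())   # collapse runs of whitespace
        if tag == "dt":
            self._label = text
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._open = None

def parse_record(html):
    parser = FieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields

# Hypothetical page fragment.
page = """<dl>
  <dt>Medieval library</dt><dd>Canterbury,
      <a href="#">St Augustine's Abbey</a></dd>
  <dt>Current location</dt><dd>Cambridge</dd>
</dl>"""
print(parse_record(page))
```

Running parse_record over every file that wget saved yields one dictionary per record, from which the place names can be sent off for geocoding.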

Libraries in Medieval Great Britain (MLGB3)
Current Locations of Books from the MLGB3

Worldwide Current Location of Books in MLGB3
What most struck me from this preliminary view (I'll wait until the final MLGB3 release to make sure) is how much less movement there was than I expected. That is, if books owned by medieval libraries are any indication, the cultural patrimony of Great Britain has not moved far from home. Over 93% of the books in the MLGB3 data (5,900 of 6,316) show up as currently held in Great Britain, leaving just 416 in other locations. This visualization of course elides the many movements of books between when they were cataloged or inventoried in the medieval period and when they reached their current place of residence. That being said, I wonder what a similar map of the dispersal of French or German monastic libraries would look like. Are 93% still in their country of origin (loosely defined)? I doubt it.
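One way to put a number on "how much less movement" is to measure, per record, the great-circle distance between the medieval library and the book's current home. A minimal Python sketch using the haversine formula, assuming both locations have already been geocoded to decimal latitude and longitude; the last lines simply recompute the share quoted in this paragraph:

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp float overshoot

# Share still in Great Britain, from the counts quoted in this post.
in_gb, total = 5900, 6316
print(f"{in_gb / total:.1%} in Great Britain, {total - in_gb} elsewhere")
# 93.4% in Great Britain, 416 elsewhere
```

A histogram of these per-record distances would show the same concentration close to home that the maps do.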

Benedictine Abbey of St. Augustine, Canterbury
Benedictine Cathedral Priory of the Holy Trinity, Canterbury
Psalters in the MLGB3

When the data are finalized I look forward to examining in detail what mapping can tell us about the differential fate of manuscripts from certain locations, or even certain kinds of manuscripts. For example, see above for the relatively similar dispersal patterns of two Canterbury libraries, or at right for the dispersal patterns of psalters. Likewise, in the future I would love to combine the MLGB3 records with those in the Schoenberg Database of Manuscripts (SDBM) here at Penn. For instance, manuscripts from St. Augustine's in Canterbury feature in over 100 transaction records in the database. Similarly, the database staff here has entered over 3,200 manuscripts based on entries from Ker. I can also imagine how the fantastic resources within the MLGB3 project could be linked with extant digitized copies of the manuscripts mentioned. The one Penn manuscript noted in MLGB3 (ID 316, formerly Phillipps 20547 and Lea 23) comes from the church of St. Deiniol in Bangor, and it would be fantastic to display the digital facsimile of the ex libris inscription alongside the entry. In other words, there's no more exciting place to be for linked digital humanities data than provenance and book history!

UPenn Ms. Codex 75. Ownership inscription, f. 193v: "Iste liber pertinet Ecclesie sancti Daniellis"

Thursday, October 10, 2013

Library Markings from Looted Books

Bookplate of the Komite zur Förderung Thoradienst Gemeinden in Palästina Frankfurt a.M.
Here at Penn, the rare books cataloging team has been working for the past several years to put images of bookplates, bookstamps, and other provenance markings online in order to facilitate identification of former owners and libraries. Thanks to the project, I've become increasingly interested in how digital tools might help scholars reconstruct historical libraries and networks of texts.

I've long been interested in the mass movement of books that took place over the 19th and 20th centuries, whether as a result of the dissolution of monasteries, the increased economic and cultural resources of the United States, or the unprecedented tragedies of the World Wars. The wide-scale looting and destruction of books and cultural artifacts by the Third Reich in the 1930s and 40s has drawn an increasing amount of scholarly interest in the past few decades [1]. Even George Clooney is getting in on the action with his upcoming movie on the "Monuments Men" team that worked to locate and preserve works of art during the last months of the war. In reading more about the fate of books and libraries destroyed or stolen by the Nazi regime I was excited to see that the records kept by the central collecting point for looted books at Offenbach were available both in microfilm and (for a fee) online. These records were largely compiled and saved by Ardelia Hall (1899-1979) who was an adviser to the State Department with a tireless focus on returning looted WWII property.

Front Cover of Vol. II (Western) of the albums assembled at Offenbach
(this image: S.J. Pomrenze Papers, Center for Jewish History)

By mid-1946, U.S. and other Allied forces had assembled more than 2 million books from Nazi repositories at Offenbach, with the aim of returning books to their rightful owners wherever possible. The records of this endeavor are voluminous and are available in some 13 reels of microfilm from the National Archives as NARA M1942. This microfilm series has been digitized by Fold3 and is available to subscribers of that service.

To aid their work of cultural restitution, officers at the Offenbach depot made several albums of photographed bookstamps and marks found inside books in their care, which they organized by apparent place of origin. They also created additional albums featuring markings from private libraries and owners with no readily identifiable geographic point of origin. All told, the albums contain thousands of ownership marks, a perfect candidate for mapping. Feeling decidedly unqualified to tackle the album of markings from Eastern Europe or the vast number of miscellaneous private stamps, I started with those from Western Europe. The Western European album compiled at Offenbach includes pages categorized by country, i.e. America, Argentina, Austria, Belgium, Denmark, France, Germany, Great Britain, Italy, the Netherlands, Palestine, Spain, and Switzerland, with by far the greatest number coming from Germany (344/514). In all, there are more than 500 ownership markings in this geographically sorted album [2].

Distribution of book markings in "Germany" volume from the Offenbach Archival Depot

Each page of the album usually contained many reproductions of book markings crammed together with a reference number but no textual caption. Wanting to create a database of individual library marks, I began by isolating each bookstamp or mark from the album, starting with those from Germany. I wanted to see where these likely-destroyed libraries and private collections were located geographically and to be able to sort out the different types of institutions that had been targeted by the Nazis. The results of this mapping can be seen above and are searchable online.

In all I mapped 289 library markings to 127 locations, with 55 markings remaining unknown to me (images of each individual library mark, including the 55 unknown, are also available on Flickr). The very top of the list is not surprising: Berlin and Frankfurt are virtually tied (32 and 31) as the cities with the most library markings recorded in the Offenbach album. I was a little surprised, though, that the relatively small city of Hildesheim had as many markings recorded as Hamburg.
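Tallying markings by city, and checking that placed plus unplaced markings add up to the total (289 + 55 = 344 in this post), can be sketched in a few lines of Python. The sample below is hypothetical, assuming each marking has been recorded as an (id, city) pair with None for unidentified marks:

```python
from collections import Counter

def markings_by_city(markings):
    """markings: iterable of (mark_id, city or None).
    Returns (Counter of cities, number of unplaced marks)."""
    markings = list(markings)                        # allow generators; we iterate twice
    cities = Counter(city for _, city in markings if city)
    unplaced = sum(1 for _, city in markings if not city)
    return cities, unplaced

# Hypothetical sample records.
sample = [(1, "Berlin"), (2, "Frankfurt am Main"), (3, "Berlin"), (4, None), (5, "Hildesheim")]
cities, unplaced = markings_by_city(sample)

# Sanity check mirroring 289 placed + 55 unknown = 344 total.
assert sum(cities.values()) + unplaced == len(sample)
print(cities.most_common(2), unplaced, len(cities))
```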

It should also be kept in mind that these figures represent only the library markings in the "Germany" Offenbach album; countless private and otherwise unidentified-by-place markings exist in the other albums. I faced more difficulty in coming up with vocabulary with which to categorize the types of libraries present in this album. The overwhelming majority of book markings of course came from Jewish institutional or private libraries, but in cataloging the book markings I have largely reserved the "Jewish" library label for institutional libraries, such as those of synagogues and communal organizations, and not for private libraries of people whose names might suggest Jewish ancestry. As a result, a significant number of library markings are coded as "other." Nonetheless, as the map below shows, there is still value in looking at the library markings by type:

Cluster of Jewish libraries near Koblenz

NARA M1942 (reel 12, frame 541)
These caveats pale in comparison, though, to one of the central problems with drawing conclusions about wartime destruction of libraries from the Offenbach albums. The Offenbach team photographed every provenance mark they could find in a book, and these marks do not necessarily represent the library from which the book was looted. This can be readily seen in the "State Library" category on the map. The stamp of the Bibliothek des Bayerischen Landtags in Munich (right) is included in the "Germany" album, but this obviously does not mean that the library was looted by the Third Reich, only that the book had once been in that library's collections at some unspecified prior point. Thus, without further investigation, it is difficult to know from this evidence exactly which library owned a given book on the day it was seized by the Third Reich.

First Page of "Germany" in album II (marks 1-7)

(this image: S.J. Pomrenze Papers, Center for Jewish History)
Nonetheless, I think mapping out these places of origin is exceedingly important when done with a more nuanced set of questions in mind. Taking the markings more broadly as evidence of the location of Jewish and other libraries in the decades prior to World War II provides a kind of historical recovery and might eventually offer data that scholars could use to make new arguments about the diffusion of reading and book culture in central Europe as well as its subsequent destruction.

Obviously mapping just shy of 350 library markings is not going to accomplish this task and I'm excited to move forward to try and catalog all of the markings in the Offenbach albums. This can only be accomplished by a large number of participants with the knowledge and language skills to identify often hard-to-read reproductions [3]. Fortunately, the Center for Jewish History in New York has digitized copies of the albums owned by Col. Seymour Pomrenze, one of the American officers assigned to Offenbach. These albums are virtually identical to those in the National Archives and the CJH digitized images are of better quality than the NARA microfilm. Though I haven't cataloged or geo-located them yet I have used the CJH images to put online the remaining 174 library markings from the "Western Europe" album on Flickr. Melanie Meyers and others at the CJH are working on identifying a broad swath of Eastern European and other marks from the albums and I hope in time a more complete picture, usable for research and discovery, emerges.



[1] This literature is large; a good introduction is the collection of essays The Holocaust and the Book: Destruction and Preservation (University of Massachusetts, 2001). Here at Penn, historian Kathy Peiss has been working for several years on the responses of the library profession to wartime looting and post-war book distribution/repatriation policies. See her "Cultural Policy in a Time of War: The American Response to Endangered Books in World War II," Library Trends 55.3 (2007), 370-386. For a specific example of work on the bookplates in the Offenbach collection, see Frederik J. Hoogewoud, "Dutch Jewish Ex Libris Found among Looted Books in the Offenbach Archival Depot (1946)," in Chaya Brasz & Yosef Kaplan, Dutch Jews as Perceived by Themselves and by Others (Leiden, 2001), pp. 247-261 (a version is available online through the Clinton Presidential Library).


[2] The "Western European" album (album II) can be found at the National Archives as NARA 260-LM-II-F and on microfilm as M1942 reel 12, frames 506-548. Another copy of this album is in the Colonel Seymour J. Pomrenze papers (P-933) at the Center for Jewish History. An additional copy of the bookplates can also be found at the University of Chicago (Codex Ms 1393). The NARA microfilm has also been digitized through Fold3 and is available online to members of that service. For the 344 "Germany" library markings mapped here I have used microfilm images from M1942 via Fold3; for the remaining 174 markings from album II on Flickr I have used digitized images from the Pomrenze papers at CJH.


[3] Library markings from WWII-era books are already available online in a few forms outside of the Offenbach records. See, for example, the Brisman collection digitized at Washington University in St. Louis. The Koordinierungsstelle Magdeburg in Germany maintains a database that includes some descriptions and pictures of library markings in looted books.

Monday, July 29, 2013

Mapping pre-1500 Printed Books Today

Last week the Penn Libraries hosted a Rare Book School course on the 15th-century European book in print and manuscript, taught by Will Noel and Paul Needham. As someone interested in the history of libraries and the movement of books over time, I've long been impressed by the volume of detailed information available in digital form about early European printed books. Online catalogs like the Incunabula Short Title Catalogue (ISTC) and the Gesamtkatalog der Wiegendrucke (GW) contain tens of thousands of entries about these books, including the whereabouts of known copies today. In browsing both catalogs I had been surprised by the wide distribution of incunabula in libraries throughout the world, and, inspired by the work of the Atlas of Early Printing, I figured it would be interesting to see the global scope of these collections in visual rather than textual form.

Both the ISTC and GW allow users to browse lists of libraries that hold incunabula, but where the ISTC displays library abbreviations/codes (see e.g. this list), the GW actually lists geographic locations, with libraries grouped by city. In addition, the GW provides helpfully detailed alternate spellings and names for locations, which make them easier to geocode, for example: "Alba Julia [Gyulafehérvár, Karlsburg, Weißenburg]/Rumänien." For that reason I decided to use data from the GW here, which in all contains listings for some 2,330 place names with institutions holding incunabula.
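Labels like the one quoted above can be split into a primary name, alternate names, and a country, and each variant tried in turn against a geocoder. A minimal Python sketch; the pattern "Name [Alt1, Alt2]/Country" is inferred from that single example and is an assumption, so other GW labels may need more cases:

```python
import re

# Assumed GW label shape: "Name [Alt1, Alt2]/Country"; the bracket and country parts are optional.
GW_PLACE = re.compile(
    r"^\s*(?P<name>[^\[/]+?)\s*(?:\[(?P<alts>[^\]]*)\])?\s*(?:/(?P<country>.+?))?\s*$"
)

def parse_gw_place(label):
    """Return (primary name, [alternate names], country or None)."""
    m = GW_PLACE.match(label)
    if m is None:
        raise ValueError(f"unrecognized place label: {label!r}")
    alts = [a.strip() for a in (m.group("alts") or "").split(",") if a.strip()]
    return m.group("name"), alts, m.group("country")

def geocoding_queries(label):
    """Candidate query strings for a geocoder, primary name first."""
    name, alts, country = parse_gw_place(label)
    suffix = f", {country}" if country else ""
    return [n + suffix for n in [name, *alts]]

print(geocoding_queries("Alba Julia [Gyulafehérvár, Karlsburg, Weißenburg]/Rumänien"))
```

Trying the alternates only when the primary name fails keeps the number of geocoder requests down.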

I scraped the raw data from the GW web interface and then parsed it myself, which resulted in a few problems: while I captured all the place names accurately, some holdings libraries seem to have been lost in the shuffle. I've worked to correct these manually but would not be surprised if further corrections are needed. Likewise, the GW helpfully lists some libraries which formerly owned incunabula and which are now defunct or subsumed into other libraries. For example, for Philadelphia, I know that the number of holdings libraries listed (19) includes the former Mercantile Library of Philadelphia with 5 incunabula. All of these books are now in the Free Library of Philadelphia, which means that the total for Philadelphia in my visualization includes one extra holdings location and 5 extra incunabula. In addition, and most importantly, my results from the GW are most useful in counting editions rather than actual physical books. That is, while there may be just over 5,000 separate 15th-century editions in Stuttgart, the Landesbibliothek there holds closer to 7,000 actual 15th-century books as a result of having multiple copies of the same edition (many thanks to Paul Needham for pointing this out). As a result, the exact numbers contained in the visualization should be taken with a grain of salt.
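Both caveats can be handled at counting time. A minimal Python sketch, assuming the scraped holdings are rows of (city, library, edition id); the successor mapping mirrors the Mercantile-to-Free Library example above, while the rows and "Library B" are invented. This version counts distinct editions per city, so an edition recorded under both a defunct library and its successor is counted once; whether the totals in this post were computed that way is not stated:

```python
from collections import defaultdict

# Hypothetical mapping from defunct or absorbed libraries to their current successor.
SUCCESSOR = {"Mercantile Library": "Free Library"}

def city_totals(holdings, successor=SUCCESSOR):
    """holdings: iterable of (city, library, edition_id) rows as scraped.
    Returns {city: (institutions, distinct editions)} after folding defunct
    libraries into their successors. Counts editions, not physical copies."""
    libraries = defaultdict(set)
    editions = defaultdict(set)
    for city, library, edition in holdings:
        libraries[city].add(successor.get(library, library))
        editions[city].add(edition)      # a set: the same edition held twice counts once
    return {city: (len(libraries[city]), len(editions[city])) for city in libraries}

rows = [("Philadelphia", "Free Library", "ed-1"),
        ("Philadelphia", "Mercantile Library", "ed-1"),
        ("Philadelphia", "Mercantile Library", "ed-2"),
        ("Philadelphia", "Library B", "ed-3")]
print(city_totals(rows))  # {'Philadelphia': (2, 3)}
```

Counting physical copies instead, as in the Stuttgart example, would require copy-level records that this edition-level data does not provide.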

Top 15 cities by holdings of incunable editions. Number of editions in center column, number of holding institutions in a given city in right column.

So, despite these caveats, what does the data look like? The top 15 list is hardly surprising. Munich tops the list thanks to the Bayerische Staatsbibliothek and its massive collection, but thinking geographically rather than nationally, Rome would come out as the clear winner if Vatican City and its libraries were included. Likewise, judging by the number of libraries/institutions reporting incunabula holdings (admittedly a somewhat hazy category), London emerges as the extreme outlier. I found the numbers further down the list more surprising: I would not have guessed that Dallas (1,013) holds roughly the same number of early printed editions as Zurich (1,002), or that Copenhagen (4,146) would have a more diverse collection than Venice (3,464), one of the centers of early printing.

That being said, if anything the map hews more closely to the geographic origins of the books themselves than I fully realized (excepting the large holdings in the US of course!). The densest clusters of holdings institutions and indeed of incunabula themselves are in the homelands of early printing, German-speaking central Europe and Italy. Compare for example the two maps below, one from the current holdings data and the other from the excellent Atlas of Early Printing showing where incunabula were actually printed. The two pair up pretty well!

Current Incunabula Holdings in Europe (GW data)
Volume of Book Production by Place of Printing 1450-1500 (Atlas of Early Printing)
I expected that monastic dissolution and library centralization throughout the 19th century would have resulted in a fairly spread-out pattern of incunabula holdings, with capital cities and regional centers as the big players and a few scattered libraries in between. This certainly seems to be the case in France and Spain, where provincial cities and towns are less well represented, but in central Europe, while the big state and university libraries may hold a large share of the books, there are still hundreds of small religious colleges, town libraries, and monasteries holding incunabula in the hinterlands. (If anyone is interested, the weighted geographic center of all current institutions holding incunabula is near the Atlantic coast of France outside of Nantes.)
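One common way to compute a weighted geographic center is to turn each location into a 3-D unit vector, take the weighted mean, and convert the result back to latitude and longitude; simply averaging raw longitudes breaks down near the antimeridian (179° and -179° would average to 0°). The post does not say how its center was computed or what the weights were, so weighting each place by, say, its number of incunabula is an assumption in this sketch.

```python
import math
from typing import Iterable, Tuple


def weighted_center(points: Iterable[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Weighted geographic center of (lat_deg, lon_deg, weight) triples.

    Each point becomes a unit vector on the sphere; the weighted mean vector
    is converted back to latitude/longitude in degrees. The result is
    undefined if the weighted points cancel out (e.g. two antipodal points).
    """
    x = y = z = total = 0.0
    for lat, lon, w in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += w * math.cos(phi) * math.cos(lam)
        y += w * math.cos(phi) * math.sin(lam)
        z += w * math.sin(phi)
        total += w
    if total <= 0:
        raise ValueError("total weight must be positive")
    x, y, z = x / total, y / total, z / total
    lat = math.atan2(z, math.hypot(x, y))
    lon = math.atan2(y, x)
    return math.degrees(lat), math.degrees(lon)
```

Fed with one (latitude, longitude, count) triple per geocoded place, this returns a single point on the map.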

Incunabula holdings in the Adriatic Region
These maps also drew my eye to blank spaces, which in turn highlighted borderlands between book-dense areas and those with relative scarcity today. The Adriatic seems to be one such area: the string of Catholic and state libraries extending down the Croatian coast, including Dubrovnik, Zadar, and Šibenik, serves to highlight the lack of 15th-century printed books in the interior of the former Yugoslavia, perhaps reflecting the ravages of war, different book/manuscript cultures in Orthodox and Muslim regions, or simply a lack of good library data.

Something similar struck me about the region east of Berlin and west of Poznan, a seemingly "empty" salient stretching south from the Baltic Sea (left). I know next to nothing about this area but would have expected a more even distribution of libraries.

Of course, scale is everything. While the views above are intended to highlight cities which possess truly significant incunabula collections, the map below is perhaps a fairer representation of the data, with the sizes of the dots scaled by quartiles. In this view the truly broad range of holding locations becomes apparent: the top quartile (largest dot) is reserved for any place holding 65 incunabula or more, a seemingly low bar which reflects just how many locations own a very small number of early European printed books.
Current Incunabula Holdings Worldwide - scaled in quartiles.
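Quartile scaling of this kind can be reproduced with Python's standard library: compute the three cut points over all places' holdings counts, then bin each place into classes 1 to 4. The place/count dictionary and the marker sizes below are hypothetical placeholders, and `statistics.quantiles` with its default "exclusive" interpolation may put the cut points in slightly different places than whatever mapping tool produced the map above.

```python
import bisect
import statistics
from typing import Dict

# Hypothetical marker sizes for quartile classes 1 (smallest) to 4 (largest).
MARKER_SIZE = {1: 3, 2: 6, 3: 9, 4: 12}


def quartile_classes(counts: Dict[str, int]) -> Dict[str, int]:
    """Assign each place a quartile class 1-4 from its holdings count."""
    # Three cut points (25th, 50th and 75th percentiles), 'exclusive' method.
    cuts = statistics.quantiles(list(counts.values()), n=4)
    # bisect_right puts a count equal to a cut point into the higher class.
    return {place: bisect.bisect_right(cuts, c) + 1 for place, c in counts.items()}


def marker_sizes(counts: Dict[str, int]) -> Dict[str, int]:
    """Dot size for each place, looked up from its quartile class."""
    return {place: MARKER_SIZE[q] for place, q in quartile_classes(counts).items()}
```

Because the classes come from the distribution itself, a heavily skewed dataset like this one puts the top-quartile threshold far lower than the biggest collections would suggest.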

Finally, this world view impressed on me the lack of reported holdings in North Africa and the Middle East generally. The fact that only four incunabula from Istanbul are reported in the GW is somewhat shocking (for more, see Les incunables de la bibliotheque des Musees Archeologiques d'Istanbul). Considering the place of the Ottoman Empire in Mediterranean and world history, the scarcity of early printed books in Turkish libraries demands an explanation (library destruction? lack of cataloging?). Likewise, the lack of reported holdings in Egypt prompted me to start searching library catalogs. I found six unreported incunabula in the new Bibliotheca Alexandrina but am sure there must be more in other Egyptian libraries as well.

I look forward to discovering more in the data over the coming weeks, and I can't stress enough how important rich bibliographic databases like the ISTC and GW are for scholars. They are exceptional resources that took decades of work to put together. Given the amount of work that went into creating their data, I hope that in the future both will offer machine interfaces that make downloading the raw data simple and make these kinds of visualizations second nature to researchers.

Saturday, July 20, 2013

Expanding the Republic of Letters: India and the Circulation of Ideas in the Late Eighteenth Century

Today I'm presenting at the Society for the History of Authorship, Reading & Publishing (SHARP) annual conference which is being held here at Penn. Rather than giving a traditional conference paper I will be participating in the "digital project showcase" which features a number of really fantastic digital book history projects. I thought it would be helpful to post here some of what I will be showing today at the conference.

My project was inspired in a way by one of the most successful visualization projects of the last few years, Stanford’s Mapping the Republic of Letters project (ROL). The project uses data about thousands of seventeenth- and eighteenth-century letters to provide a powerful visual representation of how intellectual and correspondence networks functioned over the long eighteenth century. The resulting visualizations are illustrative and have immediate impact on students and others trying to get a sense of the geography of the Enlightenment. Taking the 1751-1800 period below as an example, one finds in the ROL visualization what one might expect: Paris, London, Edinburgh, and Geneva all show up brightly as nodes of discourse and communication:

Without diminishing the ROL's achievements, though, I was immediately struck by the absences encoded into this sweeping view of the Enlightenment. As a historian of 18th-century India, I was especially concerned by what it meant that India is visualized in the ROL as connected to the European Enlightenment in this period by just a single slender line:

In my own research on legal culture in early modern India I had long been struck by the ways in which legal information and texts flowed in all directions between and through India and Europe. For the SHARP showcase, then, I proposed a new visualization of the eighteenth century, one which would focus on circuits of knowledge exchange in the form of textual movement between India and the rest of the world.

The resulting project is based on extensive research and data from wills, inventories, auction and library catalogs, as well as correspondence and other records. To be more precise, the visualizations below come from some 2,400 mentions of print and manuscript texts sent to India from abroad or produced or owned in parts of European-ruled India. Spelling out these sources, I think, makes clear the limits as well as the potential of the project. Records of book ownership and text circulation in 18th-century India are difficult to get at, and since my goal was to show connections with the wider world, I necessarily focused on nodes of greatest contact, especially the East India Company port cities of Bombay, Madras, and Calcutta, as well as other European enclaves like Tranquebar and liminal zones like Lucknow. Much is obviously lost in this survey, especially the enormous body of Persianate literature that circulated throughout central and south Asia, as well as those texts which moved between China, southeast Asia, and India. Yet for now there is only so far I can go, and I look forward to building on the project with the assistance of other scholars.

So what were the results?

Instead of that measly thin line connecting India with Europe in the 18th century, we see a robust array of connections. The blue lines represent texts flowing from Europe/Americas to India and, perhaps more importantly, the red lines represent texts moving outward from and within India. Though you can manipulate the visualization above as you choose, I thought I would highlight some of the more significant questions that I think come out of this view.

First is the need to look beyond print to see networks of circulation. In his impressive bibliography of printing in South Asia, Graham Shaw lists just 1,344 imprints from mainland South Asia before 1800 (another 427 come from Dutch Sri Lanka). Many of these books were printed in extremely small numbers and are not known to have circulated particularly far. As a result, the print connections involving Indian-produced materials pale in comparison to the inflow from Europe. If we select only flows of manuscript material, however, we remove many of those large blue print lines from Europe and see a richer picture of the circulation of Indian texts:

Movement of manuscripts to and from India c. 1750-1800
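Filtering by medium before aggregating is the step that turns the all-texts map into the manuscript-only one. A minimal sketch, assuming each record reduces to an origin, a destination, and a medium (a hypothetical schema; the post does not describe its own), with line colour derived from whether the origin lies in India. `INDIAN_PLACES` is an illustrative, deliberately partial set.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Hypothetical, deliberately partial set of Indian places in the data.
INDIAN_PLACES = {"Bombay", "Calcutta", "Madras", "Lucknow", "Tranquebar"}


class TextRecord(NamedTuple):
    origin: str
    destination: str
    medium: str  # assumed values: "print" or "manuscript"


def line_colour(origin: str) -> str:
    """Blue for texts entering India from outside, red for texts leaving or
    moving within India (assumes every record touches India at one end)."""
    return "red" if origin in INDIAN_PLACES else "blue"


def flow_weights(records: Iterable[TextRecord], medium: Optional[str] = None) -> Counter:
    """Count records per (origin, destination, colour), optionally keeping one medium."""
    return Counter(
        (r.origin, r.destination, line_colour(r.origin))
        for r in records
        if medium is None or r.medium == medium
    )
```

Each key of the resulting counter corresponds to one line on the map, with its count available for line width.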

In addition to showing the movement of texts in aggregate, I also wanted to be able to say something about the nature of these texts. Thinking of the Stanford ROL project, I decided to trace the movement of texts by authors whose correspondence is represented in that project (40 or so, including Adam Smith, Voltaire, Rousseau, and Locke). Their texts were some of the most popular in my records, though notably, because of the nature of the data, most of them appear in English translation:

Flow of texts by "Enlightenment" authors c.1750-1800
This kind of geographic visualization also flattens different kinds of textual transmission. Should the fact that an English soldier in Calcutta owned a European-printed copy of Goethe's Sorrows of Young Werther be represented equally with the fact that a pirated translation was printed at Calcutta in 1792 (though no copy survives today)?

Though slightly disappointed with the informational value of the Enlightenment authors map, I was more curious about those texts which I labeled as broadly scientific (algebra texts, accounts of experiments, journals of temperatures, Persian treatises on medicine, etc.):

The map to the right shows the interplay and diversity of transmission of these "scientific" texts. Rather than a homogeneous block of European science entering India, there was a robust interest in locally produced scientific and medical accounts by authors of all kinds.

Yet perhaps the most widely distributed exchange of ideas seems to have taken place in the realm of historical texts and accounts of political structures produced in both Europe and India. Though scholarship on early Orientalism has often focused on religious and philological translation and collecting, perhaps more than anything else, 18th-century Indian readers and collectors relished histories. These included texts from Europe like Paul de Rapin's History of England and from India like the Alamgir-Nama of Mirza Muhammed Kazim, both of which circulated widely:

"Historical" texts and their circulation c. 1750-1800

I'm just starting to take a look at these maps in an attempt to formulate further research questions and I do hope readers will play with the interactive features to ask questions of their own.

Geographic maps only go so far though in representing this circulation of texts. They tend to aggregate and obscure individual books and historical actors. For that reason I turned to another type of visualization in an attempt to understand which books and readers featured most prominently in my data.

To the right is the bewildering array of connections formed when one plots texts with common owners: who is connected by shared ownership of particular titles, and what can that tell us about the circulation of texts in India? This view is of course of little use in its current state, other than to show a central cluster of connected people and texts and, at the bottom, an array of people and texts that remain unconnected. The full network is available in PDF form here.

A different view of the same data I think proves more instructive:
Books (black) by size according to number of connections in the data
red dots represent individual owners

This view shows the very center of that cluster above; this time, however, the size of each node (dot) is determined by the number of connections it shares with its neighbors. The black dots represent particular titles and the red dots particular owners. The large nodes here are the most popular texts, including the Bible; the works of Jonathan Swift, Alexander Pope, and Shakespeare; Addison and Steele's Spectator; a variety of print and manuscript Persian dictionaries; Tristram Shandy; and the classic Persian prose work, Sa'di's Gulistan.

Looking further afield from the classics, though, there are some interesting questions to be asked. In perusing the records I noticed that two Bengali men in Calcutta seemed to be purchasing a number of books at estate sales. One of these, "Gopee Tagoror [Tagore]", seems to have been especially interested in anti-onanism tracts. In fact, in 1767 he bought a hot-off-the-press warning on the "Detestable Vice of Self-Pollution" [ESTC T207134], which today is held in only two libraries worldwide. Was he a bookseller? A fan of self-improvement literature? A committed anti-onanist? There is much to interpret here, and I hope, both at today's showcase and in future conversations, to begin mining these connections for what they can and cannot tell us about the cultural world of colonial India in the late 18th century.
Gopee Tagore's books 1767
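Sizing nodes by their number of connections amounts to computing each node's degree in a two-sided (bipartite) graph of owners and titles, and the "who is connected by shared ownership" view is that graph projected onto owners. A plain-Python sketch, assuming the data reduces to (owner, title) pairs; it is an illustration, not the graphing tool used for the figures above, and a network library such as NetworkX could do the same job.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (owner, title)


def bipartite_degrees(pairs: Iterable[Pair]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Degree of every title and every owner: the number of distinct neighbours
    on the other side of the owner-title graph (duplicate pairs count once)."""
    owners_of: Dict[str, Set[str]] = defaultdict(set)
    titles_of: Dict[str, Set[str]] = defaultdict(set)
    for owner, title in pairs:
        owners_of[title].add(owner)
        titles_of[owner].add(title)
    title_deg = {t: len(o) for t, o in owners_of.items()}
    owner_deg = {o: len(t) for o, t in titles_of.items()}
    return title_deg, owner_deg


def owner_links(pairs: Iterable[Pair]) -> Counter:
    """Project onto owners: two owners are linked once per title they share."""
    owners_of: Dict[str, Set[str]] = defaultdict(set)
    for owner, title in pairs:
        owners_of[title].add(owner)
    links: Counter = Counter()
    for owners in owners_of.values():
        for a, b in combinations(sorted(owners), 2):
            links[(a, b)] += 1
    return links
```

Node size can then be any increasing function of degree; the weights from `owner_links` show which owners overlap most in what they read.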

As a final coda, just hours away from the showcase itself, I'm humbled by how frustrating a task this proved to be. At the end of this stage of work I realize just how central absence and omission are to any visualization of historical information. No matter how much I tried to "fill in the gaps," my visualization remains constrained by the available data and by historical uncertainty. I've come away knowing that while I may have added a useful addendum to the vision of the Enlightenment that ROL offers, it is far from complete, and perhaps its greatest value lies in forcing us to ask what is missing.

Sources of records

966 records from
Inventories of Estates at Madras, 1768-1779 [3 volumes]
Inventories of Estates at Calcutta, 1764-1772 [7 volumes]
Sample of Madras, Bombay, and Calcutta wills 1750-1780

415 records extracted from provenance information contained in Graham Shaw's magisterial South Asia and Burma Retrospective Bibliography (SABREB) (London, 1987).

359 records from 1777-1800 (majority 1778-1782) taken from official lists of inventories sent to the East India Company in London. These were coded by Margot Finn and her team under the ESRC funded project: "Colonial possession : personal property and social identity in British India, 1780-1848" and are available as UK Data Archive: Study Number 5254.

255 records extracted from 28 major catalogs of Persian and other oriental manuscripts, including those of the British Library, India Office Library, Oxford, Cambridge, Edinburgh, the Bibliothèque nationale, the Salar Jung Library, Harvard, Yale, Michigan, the Royal Asiatic Society, the Phillipps collection, the Danish Royal Library, the Khuda Bakhsh Library, and others. This work is ongoing.

234 records extracted from sampled newspaper advertisements in three Bombay and Calcutta newspapers 1782-1793

141 records based on notes from assorted inventories, library lists, and mentions of books contained in official East India Company correspondence, printed reports of the Supreme Court at Calcutta (1774-1800), and other secondary sources.