Thursday, January 2, 2014

Linking Archival Sources in the 2013 AHR

With the annual conference of the American Historical Association (AHA) starting today, I'm excited to see friends and hear some great papers. I'm always struck by just how broad a field 'history' is, and yet how often historians are able to make connections to each other's work, even when far removed temporally and geographically. In reading the AHA's flagship journal, The American Historical Review (AHR), this year, I especially enjoyed seeing places where seemingly unconnected articles spoke from similar frames of reference and, most interestingly, drew on overlapping source bases (be sure to check out my Penn colleague Vanessa Ogle's great article on the history of time reform!).

Authors of articles in the 2013 AHR connected by commonly used archives

As this site indicates, I'm very interested in tracking the circulation of texts, ideas, and archives over time, as well as how these sources are used by scholars. Tracking networks of citation is nothing new and has been a favorite activity of scholars for centuries, but recently there has been a surge of interest in quantitative analysis of academic citation patterns. Most of this interest has been in the sciences and social sciences, where "impact factors" (put simply, the quantity and importance of articles citing one's work) are de rigueur in weighing scholarly merit. Though I'm wary of many developments in this "bibliometrics" field, some of the more useful advances have come from using data about authorship and citation to show the material ways fields are constructed, e.g. the influence of certain universities, graduate programs, or scholars in a specific sub-discipline. Here at Penn, for instance, my colleagues at the library have helped the School of Medicine and others create a way of viewing the co-authorship networks of particular researchers.

Though tracking citations of articles and secondary sources in a journal like the AHR would really illuminate networks of influence, interest, and argument, I'm more interested in how historians use archival sources. This is especially important given that the bibliometric wizards at big publishing companies like Elsevier and ProQuest have done a decent job of identifying article and book citations and linking them together, but have had much less success with archival sources.

I extracted data on archival sources from 16 of the 17 feature articles in the five AHR issues for 2013 [1]. The authors of these pieces did not disappoint, citing 66 different archives and libraries located in 54 different cities from Berkeley to Sarajevo to Zanzibar [2].

Location of Archives and Libraries cited in 2013 AHR articles
Despite disparate topics and the relatively random assortment of scholars and articles across the year's issues (as far as I can tell, none of the articles were grouped in 'theme' issues), there were several nodes of archival overlap.

Archives used by multiple 2013 AHR authors
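The overlap in the figure can be computed by treating the survey as a bipartite author-archive network and projecting it onto pairs of authors. What follows is a minimal sketch, assuming a hand-built mapping from author to archives; the data are only the handful of citations named elsewhere in this post (not the full 66-archive list), and the archive labels are hypothetical keys rather than standard identifiers.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative subset only: author -> archives cited, limited to examples
# mentioned in this post. Labels are hypothetical keys, not standard IDs.
CITED = {
    "Stanwood": {"National Archives, Kew", "British Library IOR"},
    "Hilliard": {"National Archives, Kew"},
    "Mikhail": {"National Archives, Kew"},
    "Ogle": {"National Archives, Kew", "British Library IOR"},
    "Pettit": {"Bancroft Library"},
    "Smith": {"Gosudarstvennyi arkhiv Iaroslavskoi oblasti"},
}


def author_edges(cited):
    """Link every pair of authors who cite at least one archive in common."""
    edges = {}
    for a, b in combinations(sorted(cited), 2):
        shared = cited[a] & cited[b]
        if shared:
            edges[(a, b)] = shared
    return edges


def shared_archives(cited):
    """Invert author -> archives and keep archives cited by 2+ authors."""
    by_archive = defaultdict(set)
    for author, archives in cited.items():
        for archive in archives:
            by_archive[archive].add(author)
    return {arc: who for arc, who in by_archive.items() if len(who) > 1}


if __name__ == "__main__":
    for (a, b), shared in sorted(author_edges(CITED).items()):
        print(f"{a} -- {b}: {', '.join(sorted(shared))}")
```

The projection only works if every author's citation of the same repository ends up under the same key, which is exactly where the naming problems discussed later in this post bite.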

Obviously one year of the AHR is a pretty weak sample, but I suspect the pattern would hold across a wider swath of the journal: an impressive array of geographically dispersed archives reflecting the focus of particular authors, alongside a concentration of overlapping citations of the major state and university archives and libraries of Europe and North America. Along these lines, I would be curious to see how the influence of particular archives has waxed and waned in the profession over the years. I imagine that a select number of repositories (NARA, the UK National Archives, the British Library, the Library of Congress, the BN in Paris, various German archives, etc.) have long been dominant across geographic and temporal fields given the institutional makeup of the historical profession, but I would also be surprised if the dominance of these central archives hasn't decreased given methodological and theoretical shifts in the discipline since the 1970s.

This question, though, speaks to my point in writing this piece. Doing such an analysis of the AHR or any historical journal across the last 50 or 100 years would be extremely difficult (at least for me), given both the formats in which back issues of electronic journals are available and the way we as historians cite archival sources. The first point is the easier to address. Looking at citations in the 2013 AHR was relatively easy because all but one of the feature articles are available in HTML, with their notes gathered in one clump at the end for easy scraping. For older issues of the journal only OCR'd PDFs are available, which makes precise scraping of just the footnotes a bit more complex [3].
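For the HTML articles, pulling out just the notes can be a short script. Here is a sketch using only Python's standard-library `html.parser`, assuming (hypothetically) that the notes sit as `<li>` items inside an `<ol class="notes">` container; the journal's real markup will differ, and the sample HTML is invented for illustration.

```python
from html.parser import HTMLParser


class NoteExtractor(HTMLParser):
    """Collect the text of each <li> inside <ol class="notes">.

    The container markup is an assumption, not the AHR's actual HTML,
    and nested <ol> elements inside the notes are not handled.
    """

    def __init__(self, notes_class="notes"):
        super().__init__(convert_charrefs=True)
        self.notes_class = notes_class
        self.in_notes = False  # inside the notes container
        self.in_note = False   # inside a single note
        self.buffer = []
        self.notes = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ol" and self.notes_class in classes:
            self.in_notes = True
        elif tag == "li" and self.in_notes:
            self.in_note = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self.in_note:
            # Join text fragments (including text inside <i>, etc.)
            # and collapse runs of whitespace.
            self.notes.append(" ".join("".join(self.buffer).split()))
            self.in_note = False
        elif tag == "ol" and self.in_notes:
            self.in_notes = False

    def handle_data(self, data):
        if self.in_note:
            self.buffer.append(data)


def extract_notes(html, notes_class="notes"):
    parser = NoteExtractor(notes_class)
    parser.feed(html)
    parser.close()
    return parser.notes


# Invented sample markup for illustration only.
SAMPLE = """
<p>Body text with a note.<sup>1</sup></p>
<ol class="notes">
  <li>The National Archives, Kew [hereafter TNA], SP 96/9.</li>
  <li>British Library, <i>IOR</i> L/R/5/161.</li>
</ol>
"""
```

Body text outside the container is skipped, so only the note text reaches the later, mostly manual, step of picking out archive names.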

The second issue is perhaps the more difficult one. While the names historians use for archives and archival sources within their own work follow the rule of internal consistency (state an acronym or short title for an archive and stick with it), they don't always match how other scholars cite the same repositories. Take, for starters, the vexing problem of the National Archives of the United Kingdom at Kew, the archive most commonly used by the 2013 AHR authors. When it comes to Kew, my own citation practices are willfully disobedient: I can't stomach writing the preferred abbreviation "TNA" (really? 'the' in an acronym? not cricket). But even when historians do the right thing, it still isn't that simple. When citing the National Archives in the 2013 AHR, Stanwood and Hilliard both used the identical form "The National Archives, Kew [hereafter TNA]", but Mikhail and Ogle used slight variants: "The National Archives of the United Kingdom [hereafter TNA]" (Mikhail) and "British National Archives, Kew [hereafter NA]" (Ogle). Additionally, after having a computer sort the data, I was convinced that Fair had also cited the Kew archives in her work on East Africa, with several footnotes reading something like "TNA 435/B/2/2" (p. 1088, n. 44). But wait... on the first page of Fair's article:

Laura Fair, "Drive-In Socialism: Debating Modernities and Development in Dar es Salaam, Tanzania," The American Historical Review (2013) 118 (4): 1077
Now that's a proper acronym! Of course, in context this isn't much of a problem for human readers (though one can easily imagine both TNAs being cited in an article on British East Africa), but when looking at patterns at scale it has the potential to be a bit troublesome.
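One way to keep the two TNAs apart is to resolve abbreviations per article, using each author's own "[hereafter ...]" declaration, before mapping the full name to a shared key. This is a minimal sketch, assuming the declarations have already been pulled out of the notes. The canonical keys (`UK-KEW`, `TZ-NA`) are hypothetical, and the wording of Fair's declaration is an assumption inferred from context, not a quotation.

```python
import re

# Matches declarations such as 'The National Archives, Kew [hereafter TNA]'.
DECLARATION = re.compile(r"(.+?)\s*\[hereafter (\w+)\]")

# Hypothetical canonical keys for the variant full names quoted in this post.
CANONICAL = {
    "the national archives, kew": "UK-KEW",
    "the national archives of the united kingdom": "UK-KEW",
    "british national archives, kew": "UK-KEW",
    "tanzania national archives": "TZ-NA",  # assumed wording, see lead-in
}

# Per-article declarations; Fair's wording is an assumption.
ARTICLES = {
    "Stanwood": ["The National Archives, Kew [hereafter TNA]"],
    "Hilliard": ["The National Archives, Kew [hereafter TNA]"],
    "Mikhail": ["The National Archives of the United Kingdom [hereafter TNA]"],
    "Ogle": ["British National Archives, Kew [hereafter NA]"],
    "Fair": ["Tanzania National Archives [hereafter TNA]"],
}


def normalize(name):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def abbreviation_table(declarations):
    """Map each article-local abbreviation to a canonical key (None if unknown)."""
    table = {}
    for text in declarations:
        match = DECLARATION.fullmatch(text.strip())
        if match:
            full_name, abbr = match.groups()
            table[abbr] = CANONICAL.get(normalize(full_name))
    return table


def resolve(short_citation, table):
    """Resolve e.g. 'TNA 435/B/2/2' using only the citing article's table."""
    abbr = short_citation.split()[0]
    return table.get(abbr)


TABLES = {author: abbreviation_table(decls) for author, decls in ARTICLES.items()}
```

A naive global lookup of "TNA" would send Fair's citations to Kew. Resolving within each article avoids that, at the cost of requiring every article to declare its abbreviations somewhere a script can find them.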

But why does this really matter, except for macroanalysis of journals or the dreaded 'metrics'? I was a bit skeptical myself until I saw the power of linking sources as brought to bear by ProQuest in its electronic edition of my dissertation. Their system managed to extract 452 "items" from my dissertation bibliography and, though striking out completely on archival sources, did an amazing job with book and article citations, opening up a world of possibilities for finding colleagues and sources in the field I hadn't known about before.

For example, in its automated search of my bibliography, ProQuest correctly identified a somewhat obscure (with apologies to epigraphers!) monograph of Indian inscriptions. Helpfully, anyone looking at my work can then click the name of the book and see a list of the three other dissertations that also cited it.

Unsurprisingly, this led me to hours of clicking through to other dissertations and finding all sorts of people I'd never heard of before.

This kind of linking and search capability would obviously also be enormously useful for archival sources. Taking just the 2013 AHR articles as examples, imagine if every one of Pettit's citations to the Atherton Papers at the Bancroft Library linked to other articles citing this collection, or if, when Stanwood cites State Papers 96/9 at Kew, other scholars' work citing the same record number also popped up. As a historian of India, I would have been fascinated to see Ogle's use of the Indian newspaper summaries in BL IOR L/R/5/161 linked to other work using the same documents, and excited to see the connection when Stanwood cited a particular letter to St. Helena (BL IOR E/3/92 f.17) that is also referenced in several pieces by Phil Stern in his work on the East India Company [4].
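At its core, this kind of linking is an inverted index from a record reference to the works that cite it. The sketch below assumes citations have already been parsed into (work, repository, reference) triples; the work labels and repository keys are hypothetical, and the records are only the examples named in the paragraph above (including the Stern citation in its footnote).

```python
from collections import defaultdict
from typing import NamedTuple


class Citation(NamedTuple):
    work: str        # citing article or book (hypothetical label)
    repository: str  # hypothetical repository key
    reference: str   # collection name or record number as cited


CITATIONS = [
    Citation("Stanwood 2013", "BL", "IOR E/3/92 f.17"),
    Citation("Stanwood 2013", "Kew", "SP 96/9"),
    Citation("Ogle 2013", "BL", "IOR L/R/5/161"),
    Citation("Pettit 2013", "Bancroft", "Atherton Papers"),
    Citation("Stern 2011", "BL", "IOR  E/3/92 f.17"),  # stray space on purpose
]


def record_key(c):
    """Collapse whitespace so trivially different spellings share one key."""
    return (c.repository, " ".join(c.reference.split()))


def build_index(citations):
    """Map each (repository, reference) to the set of works citing it."""
    index = defaultdict(set)
    for c in citations:
        index[record_key(c)].add(c.work)
    return index


def also_cited_by(work, citations, index):
    """For each record `work` cites, list the other works citing that record."""
    links = {}
    for c in citations:
        if c.work == work:
            others = index[record_key(c)] - {work}
            if others:
                links[record_key(c)] = sorted(others)
    return links


INDEX = build_index(CITATIONS)
```

Whitespace normalization is deliberately the only cleanup here; matching "State Papers 96/9" to "SP 96/9" would need the same kind of alias table as repository names do.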

Needless to say, the possibilities are pretty exciting, but a lot of work needs to be done to get there. For a start, I wonder if historians and historical journals could begin adopting some of the linked data sources available for libraries and archives. Much like airports have codes (IAD, DCA, BWI, etc.), IDs are also available for many of the world's major libraries and archives. For more than you ever wanted to know about the history of library identification codes in the US, see the Library of Congress summary. Though it is an imperfect solution, when seeking unique identifiers for libraries and archives I usually start with OCLC's registry; see, e.g., the entry for the Bancroft Library (OCLC-RQE), which also lists ISIL numbers. I don't know how hard I want to push this, but what about embedding ISIL numbers or other codes within electronic versions of articles? Use "TNA," "BNA," or "BA" all you want in the text of a piece, but perhaps with "OCLC-UKARC" embedded in some fashion. Even for archives and libraries in parts of the world not covered by these indexes, some kind of identifier is usually available. For example, the archive in which Smith found some of her fantastic sources (the "Gosudarstvennyi arkhiv Iaroslavskoi oblasti") could also be identified as ArcheoBiblioBase R-263.
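As one possible shape for this, the author's preferred abbreviation could stay in the visible text while the identifier travels in standard HTML markup (an `<abbr>` element with a `data-*` attribute). The sketch below assumes a small hand-built registry; the markup scheme and the `data-repository-id` attribute name are hypothetical, and the codes are the ones quoted in the paragraph above.

```python
import re
from html import escape

# Codes as quoted in this post; the registry structure itself is hypothetical.
REGISTRY = {
    "kew": ("The National Archives, Kew", "OCLC-UKARC"),
    "bancroft": ("The Bancroft Library", "OCLC-RQE"),
    "gaio": ("Gosudarstvennyi arkhiv Iaroslavskoi oblasti", "ArcheoBiblioBase R-263"),
}

# Hypothetical attribute name used to carry the identifier.
ID_ATTR = re.compile(r'data-repository-id="([^"]+)"')


def tag_repository(short_form, repo_key):
    """Keep the author's abbreviation visible; embed the stable ID as data."""
    full_name, identifier = REGISTRY[repo_key]
    return (
        f'<abbr title="{escape(full_name)}" '
        f'data-repository-id="{escape(identifier)}">{escape(short_form)}</abbr>'
    )


def repository_ids(html_text):
    """Recover embedded IDs from marked-up text, whatever abbreviation was shown."""
    return ID_ATTR.findall(html_text)
```

Readers would see "TNA" or "BNA" exactly as each author wrote it, while a harvester reading the attribute would see a single identifier for both.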

In a perfect world I can even imagine these kinds of identifiers being integrated with other linked data sources for proper nouns. Why not go all-in and include references for people and places as well? For example, Michael Pettit, the author of one of the AHR's 2013 articles, is better known in the library linked data world as VIAF 233554130, a record that also includes alternative forms of his name.

Though much of this linking work is impractical for individual historians, I would love to see flagship journals like the AHR step forward and contribute. In a world in which libraries and scholars balk at the increasing cost of electronic journal subscriptions, this seems like a clear added value that is easily justified. In addition, the people best placed to see the whole range of work cited in a field are in many ways those involved at the journal level, and they might therefore be able to implement linked data more universally.

Finally, I want to make sure to say that this is a great problem for a field to have. How exciting it is that historians get to visit archives in every corner of the globe and seek out new sources to better our understanding of the past. The fact that we don't yet have an easy set of archival identifiers is probably a great indication of how diverse and fast-growing the discipline is, and I hope no effort to implement these kinds of standards gets in the way of that!



[1] I chose only those pieces identified by the AHR as "articles", not discussion pieces or forum articles. Much of the extraction of archive and library names was done by hand after some computer pre-processing. Importantly, this survey ignores the citation of printed sources outside the archival context. This is no small issue. It is not common for historians to cite the name of the library at which they used a copy of a printed text (though it is much more common in book history and some literary history). This necessarily biases my survey towards what we think of as "archives": manuscripts and papers, which themselves may contain printed texts.


[2] See the full list here. It's likely I missed some along the way, and I'm more than happy to be corrected by the authors!


[3] Data from long runs of journals, even in OCR'd form, has proved useful in other analyses, especially that of the full run of PMLA (Publications of the Modern Language Association) done by Ted Underwood and Andrew Goldstone: "What Can Topic Models of PMLA Teach Us About the History of Literary Scholarship?" Journal of Digital Humanities 2.1 (2012).


[4] See this letter cited in P. Stern, The Company State (Oxford, 2011), p. 36.

AHR articles mined for sources above:

Owen Stanwood, "Between Eden and Empire: Huguenot Refugees and the Promise of New Worlds," The American Historical Review (2013) 118 (5): 1319-1344 doi:10.1093/ahr/118.5.1319

Michel Gobat, "The Invention of Latin America: A Transnational History of Anti-Imperialism, Democracy, and Race," The American Historical Review (2013) 118 (5): 1345-1375 doi:10.1093/ahr/118.5.1345

Vanessa Ogle, "Whose Time Is It? The Pluralization of Time and the Global Condition, 1870s–1940s," The American Historical Review (2013) 118 (5): 1376-1402 doi:10.1093/ahr/118.5.1376

Daniel Magaziner, "Two Stories about Art, Education, and Beauty in Twentieth-Century South Africa," The American Historical Review (2013) 118 (5): 1403-1429 doi:10.1093/ahr/118.5.1403

Sarah M. S. Pearsall, "'Having Many Wives' in Two American Rebellions: The Politics of Households and the Radically Conservative," The American Historical Review (2013) 118 (4): 1001-1028 doi:10.1093/ahr/118.4.1001

Alison K. Smith, "Freed Serfs without Free People: Manumission in Imperial Russia," The American Historical Review (2013) 118 (4): 1029-1051 doi:10.1093/ahr/118.4.1029

Michael Pettit, "Becoming Glandular: Endocrinology, Mass Culture, and Experimental Lives in the Interwar Age," The American Historical Review (2013) 118 (4): 1052-1076 doi:10.1093/ahr/118.4.1052

Laura Fair, "Drive-In Socialism: Debating Modernities and Development in Dar es Salaam, Tanzania," The American Historical Review (2013) 118 (4): 1077-1104 doi:10.1093/ahr/118.4.1077

Talbot C. Imlay, "International Socialism and Decolonization during the 1950s: Competing Rights and the Postcolonial Order," The American Historical Review (2013) 118 (4): 1105-1132 doi:10.1093/ahr/118.4.1105

Christopher Hilliard, "'Is It a Book That You Would Even Wish Your Wife or Your Servants to Read?' Obscenity Law and the Politics of Reading in Modern England," The American Historical Review (2013) 118 (3): 653-678 doi:10.1093/ahr/118.3.653

Max Bergholz, "Sudden Nationhood: The Microdynamics of Intercommunal Relations in Bosnia-Herzegovina after World War II," The American Historical Review (2013) 118 (3): 679-707 doi:10.1093/ahr/118.3.679

Alan Mikhail, "Unleashing the Beast: Animals, Energy, and the Economy of Labor in Ottoman Egypt," The American Historical Review (2013) 118 (2): 317-348 doi:10.1093/ahr/118.2.317

Ryan Tucker Jones, "Running into Whales: The History of the North Pacific from below the Waves," The American Historical Review (2013) 118 (2): 349-377 doi:10.1093/ahr/118.2.349

Daniel Jütte, "Interfaith Encounters between Jews and Christians in the Early Modern Period and Beyond: Toward a Framework," The American Historical Review (2013) 118 (2): 378-400 doi:10.1093/ahr/118.2.378

Nile Green, "Spacetime and the Muslim Journey West: Industrial Communications in the Making of the 'Muslim World,'" The American Historical Review (2013) 118 (2): 401-429 doi:10.1093/ahr/118.2.401

Yumi Moon, "Immoral Rights: Korean Populist Collaborators and the Japanese Colonization of Korea, 1904–1910," The American Historical Review (2013) 118 (1): 20-44 doi:10.1093/ahr/118.1.20

I did not use Jennifer Evans' excellent article in the April issue ("Seeing Subjectivity: Erotic Photography and the Optics of Desire") given its unavailability in html form due to copyright restrictions.
