Digital History and the Digital Modern

Center for Digital History Aarhus (CEDHAR) Launch Keynote

22 February, 2019

This talk was an early attempt to explore ideas that I've been pondering for some time. I'm conscious some areas need work. My intention is to expand it to include work by R. G. Collingwood and other writers, along with a brief history of digital history, and to provide more precision when discussing probabilistic methods.

In my book The Digital Humanities and the Digital Modern I make one primary claim: that to understand the digital humanities properly (to understand its full implications for epistemology and method, rather than to merely position it as a transitory epiphenomenon of Silicon Valley capitalism) it needs to be positioned within the wider context of what I termed ‘The Digital Modern’. In the introduction I note that “The ‘digital modern’ is offered as a conceptual device to understand the contemporary world. It is presented as a metonym to describe the computational rationalities, tools, methods, and products used across government, business, culture, and society. The term is also designed to connect digital culture with the shock of the new that was experienced during the late nineteenth and twentieth centuries, emerging references to postdigital aesthetics that advocate a return to simpler forms of modernist culture, and future shock discourses associated with twentieth-century commentators such as Jacques Ellul, Lewis Mumford, and Alvin Toffler. The digital modern is circumscribed by its cultural, discursive, political, legal, geographical, temporal, physical, and mathematical boundaries. My aim is to describe the limits of the digital world in order to offer a critical frame for the chapters that follow, but—in doing so—to diminish claims for its ubiquity.”[1]

Historians will recognise my construction of ‘the digital modern’ as a conceptual category as an act of simple historicization, and so it is: it is more productive to position our new tools and methods within the broad sweep of history than to critique them one by one, in isolation, as they appear. We put up with the latter approach for ten or twenty years after the ‘digital turn’ started to gain pace in the mid-1990s, but it has become intellectually tiresome and rhetorically unsustainable. By historicizing our present moment we can examine our predicament from all angles.

The fact I felt the need to construct a new conceptual device to provide context for my argument is suggestive: even with so many commentators crowding the field, it is challenging to find modes of analysis that survive contact with the technology they are designed to analyse. And I might be tilting at windmills, but I don’t want to shift register when I move from the library to the laboratory, from text to code, from reading to engineering. It feels inadequate – or at best clunky – to think one way in the humanities and another in the digital humanities: there must be a way to square the circle, to develop modes of thinking properly suited to the historical context we live and work in.

This talk represents a continuation of that effort, but with the increased disciplinary focus offered by digital history. My hope is that this kind of ‘return to the disciplines’, so nicely encapsulated by CEDHAR, will result in improvements in the quality of digital scholarship that the wider (and still absolutely necessary) digital humanities community sometimes lacks. Initiatives like CEDHAR are of their time and representative of the emerging cutting edge in digital humanities scholarship and engineering: to my mind they hold the promise of an intellectual reconciliation between the old and the new, a return to traditional values of methodological continuity, scientific transparency, and scholarly precision.

To do so they will need guidance from historical theory and method. Unfortunately, however, historical theorists have a habit of disregarding contemporary digital experience. The blind spot is apparent in even rather avant-garde theoretical initiatives that we might expect to be attuned to signals emanating from the wider culture. In their 2018 theoretical manifesto, for example, Ethan Kleinberg, Joan Scott, and Gary Wilder pointed to the psychical, epistemological, ethical, and political implications of historical scholarship but (although a generous reading could position it as an element within their 'epistemological' category) didn’t mention the technical.[2] The journal Rethinking History is more open to technical topics, but they still sometimes seem ‘tacked on’ to the main historiographical and theoretical tradition rather than integral to it: we struggle to naturalise technical topics within History as a discipline, making them appear unsophisticated or brittle.

Why is this elision of technically oriented historical theory troubling? It should hardly need spelling out. When researchers and students pursue their histories they don’t often walk into ivy-covered stone buildings containing only card catalogues and physical books and newspapers. And if they do enter a physical building, they often quickly immerse themselves in digital worlds composed of digital catalogue systems, digital repositories of newspapers, books, and documents – and now mind-numbingly large archives of social media posts and websites that are often impossible to navigate. Our entanglement with technical research infrastructure is profound, and we should assume it has a profound impact on our understanding of the past. Even more so when we start designing and building digital tools and content ourselves, to help us answer our research questions or disseminate our findings to colleagues and the wider public. More confoundingly, as I realised in my attempt to sketch the boundaries of our contemporary digital humanities cyberinfrastructure [represented in the slide diagram], our technical entanglements extend to the massive source collections guarded by the likes of Google, Facebook, and Amazon. Any historical theory that omits that domain is missing an elementary feature.

The implications of this entanglement are profound. It’s worth pausing for a moment to understand the scale of the problem. I often go back to a comment made by Dan Cohen in 2011, when he pointed out that a single historian might have been able to read and analyse the 40,000 memos issued at the White House during the Johnson administration, but that such a methodology could never handle the 4 million email memos sent while Bill Clinton was in office.[3] We don’t need to get started with the Clintons and their email servers, or even to remember that Dan made that comment before social media had been weaponized for political gain: the simple fact is that contemporary historians are hopelessly entangled with vast and complex archives of digital material. And medieval historians – and historians of all periods – are swimming in much the same pool. Understanding our psychical, epistemological, ethical, and political entanglement with History is now almost always mediated, to some degree, by path dependencies exerted by technical systems and data structures: it is surely one of the key theoretical issues of our day.

So what do I mean by entanglement? British archaeologist Ian Hodder articulates it nicely, by pointing out that – rather than being children of nature in the romantic sense, divorced from the built world – humans have always had intimate relationships with the objects we design and use. We ‘are always busy along the strings or cables of entanglement mending things, putting fingers in dykes, fixing holes in buckets and so on.’[4] This amounts to a degree of dependency or ‘entrapment’ with everything from the plants and animals we breed to the buildings we occupy. Historians of technology will be relieved to hear that ‘entanglement’ doesn’t imply hard determinism, more a kind of gentle, symbiotic, evolving relationship that shapes (and is in turn shaped by) human identity and society. Accepting that we are entangled with ‘things’ brings us to a closer understanding of how we act and what we value.[5]

And so my rather simple claim is that contemporary life entangles us – historians – with the digital world (or to use my historicized version of it, the digital modern). Understanding the entanglement of historians with the digital modern is important work. But where to start?

Perhaps we can start by understanding the way historians are entangled with a large source of historical content like Europeana? We could talk about the federated nature of the archive, spread across multiple institutional locations; issues related to its information architecture (the way items are described and indexed); what is included and what is excluded; how indigenous groups are represented or misrepresented; the gender balance of the content producers and of the content across the archive as a whole: in short, the stories the archive tells, about the past but also about itself. There are a huge number of issues to explore there, but perhaps too many for a 40-minute paper.

What about with a digital research tool like DigiPal (built by Peter Stokes and company at King’s College London, and now supported by King’s Digital Lab as the Archetype framework)? We could pull a vast number of threads with a tool like this too, designed from the ground up to help answer a set of quite specific palaeographic questions, and now being used for a wide variety of applications from art history to religious history. We’re being contacted by people all around the world wanting to use the tool, raising questions of sustainability as well as design: whose research questions should the tool help answer, and who do we need to point in another direction?

Or Omeka – an archive management system built at the Centre for History and New Media at George Mason University, and used for everything from student projects to post-disaster archiving. Again, it would take at least 40 minutes to introduce the methodological implications of a tool like that. Omeka makes it incredibly easy to produce historical archives, making available to very wide audiences content that might otherwise never have been accessible at all. It has deep pedagogical as well as methodological implications, and huge implications for public history.

Or Mapping the Republic of Letters, the famous Stanford University project that used Geographic Information Systems (GIS) technology to map correspondence amongst Enlightenment philosophers? No chance. Its sophisticated visualizations entrance us, but the scholarly findings it produced require significant engagement with the historical literature for proper understanding.

Each of these examples is comparable to a monograph in the demands it places on interpretation, in other words. And perhaps more so if we remember that any adequate interpretation will attend to technical and architectural issues alongside more traditional questions of interpretation and method.

So where do we start? To get down to brass tacks, we need to reduce our argument to the purely abstract level, represented here in a random database diagram I found online. Each of these fields represents a piece of data, the data model as a whole representing the sum total of the information, and its inter-relationships, held in the database it was used to build. Our core theoretical and methodological problem as historians in a digital age – if we are to analyse the issue at the level of abstraction it deserves – is to work out our epistemological relationship to this collection of – can I say ‘data as data’? Not ‘content as data’ (not a digitized manuscript, or even metadata about a manuscript), but simply the logical representation of digital bits organised in digital space. How do we confront that, theoretically and methodologically? What opportunities does it offer, and what problems does it create?
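To make that level of abstraction a little more concrete, here is a minimal sketch (in Python, using the standard library's sqlite3 module) of the kind of relational data model such a diagram describes. Everything in it is hypothetical: the tables, fields, and records are invented for illustration rather than drawn from any real project, but they show what ‘data as data’ looks like once a source has been reduced to fields and relationships.

```python
import sqlite3

# A toy relational data model: two tables and one relationship standing in
# for 'data as data'. All names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        born      INTEGER                     -- year of birth, possibly unknown
    );
    CREATE TABLE letter (
        letter_id INTEGER PRIMARY KEY,
        sender_id INTEGER REFERENCES person(person_id),
        sent_date TEXT,                       -- e.g. '1764-03-02'
        summary   TEXT
    );
""")

# The 'content' of the source is gone; what remains is structured data.
conn.execute("INSERT INTO person VALUES (1, 'Example Correspondent', 1720)")
conn.execute("INSERT INTO letter VALUES (1, 1, '1764-03-02', 'A hypothetical letter')")

# A historian's question arrives as a query over the model, not over the archive itself.
query = """
    SELECT p.name, COUNT(*) AS letters_sent
    FROM letter AS l JOIN person AS p ON p.person_id = l.sender_id
    GROUP BY p.name
"""
for name, letters_sent in conn.execute(query):
    print(name, letters_sent)   # Example Correspondent 1
```

Every decision embedded in a model like this (what counts as a field, which relationships are recorded, what is left out) is precisely the kind of epistemological decision the rest of this talk is concerned with.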

It turns out that the theoretical and methodological implications are rather profound, and have a neat tie-in with two seminal essays (or more accurately, ‘thought experiments’) in historical theory, both published in 1960: Isaiah Berlin’s article ‘The Concept of Scientific History’, published in the first issue of History and Theory,[6] and Ernest Nagel’s ‘Determinism in History’, published in Philosophy and Phenomenological Research.[7] The essays were written at the height of professional angst about Marxist interpretations of history that brought issues of economic and technological determinism to the fore. (I sometimes wonder if historical logic had a similar resonance in Cold War Europe to our various ‘digital’ logics today, in its pressing relationship to political economy and statecraft.)

Berlin pressed the case for History being an inductive science, taking readers through a rigorous logical process to demonstrate that, in the end, such a method would always fall short. Human societies cohere around stories about the past that will always result in different interpretations of events: final historical knowledge is a chimera; the historical record is too fragmentary to support it, and human culture too unstable to transmit it.

Nagel’s article could be said to have dealt with the same topic from the other side of the coin. He asked whether history was determined and could therefore be deduced from universal laws. In much the same way as Berlin, after walking readers through rigorous logical steps, he decided that yes, History was in fact determined – but only at the very large and very small scales. As in quantum physics, strange phenomena were at work that made it impossible to capture universal historical Truth on the wing. At the end of the day, both philosophers decided, History depended too much on narrative to ever be classed as an inductive or deductive science.

We’ve come a long way since the 1960s, but that tension at the heart of History as a discipline – between science and narrative – still exists. I suspect that the turn towards digital history (especially data-intensive digital history) will intensify it. The urge to turn History into a nomothetic science has existed for centuries, of course. Berlin notes that the tendency was especially strong in the nineteenth century, following Auguste Comte. The goal was worthy: to produce a vast labour-saving system that could automate the historical imagination and, ‘like a perpetual motion machine’, continuously generate new insights into the past independent of human bias, drawn only from a vast ocean of facts.[8] Epistemological niceties such as the relative merits of inductive or deductive method would become trivialities next to the great firehoses of historical knowledge that would be produced. Theorists taught us the folly of such imaginings during the twentieth century, though, didn’t they? Hayden White, Keith Jenkins, Frank Ankersmit and company: all is narrative and representation… and even (following Fukuyama) the end of History itself? Positivism has not fared well in historical theory over the past 50 years or so.

But digital history changes that, in ways that I don’t think the historical profession has quite grasped yet. Berlin and Nagel would have been fascinated by the myriad online services cropping up, with names like Amazon SageMaker, that allow data scientists to analyse massive datasets with minimal infrastructural overhead. I’m sure that if Auguste Comte were confronted with Google’s Ngram Viewer, which allows you to search for words and their immediate collocates (or ‘ngrams’) across millions of volumes, he would have immediately typed in ‘Truth’ and waited for the results. But he would have been disappointed, wouldn’t he?
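For anyone unfamiliar with the term, an ‘n-gram’ is simply a run of n adjacent words, and the viewer plots the relative frequency of such runs over time. The toy sketch below counts bigrams (2-grams) in a single invented sentence; it is only an illustration of the underlying idea, not an interface to Google’s service.

```python
from collections import Counter

# Count bigrams (pairs of adjacent words) in a tiny, invented example text.
text = "truth is the daughter of time and not of authority"
tokens = text.lower().split()
bigrams = Counter(zip(tokens, tokens[1:]))

# Print the three most frequent word pairs and their counts.
for pair, count in bigrams.most_common(3):
    print(" ".join(pair), count)
```

Scale that counting across millions of digitized volumes, year by year, and you have the essence of the Ngram Viewer, along with all of the data-quality problems discussed below.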

Numerous studies have pointed out the inadequacies of the Google Books corpus, from poor Optical Character Recognition (OCR) and metadata to collections compromised by the inclusion of scientific papers – and the primary problem of it being a library, with one copy of each book, making it impossible to deduce the historical importance of any single book or word.[9] As invaluable as it is when used with caution (and even better, via the tools available at the HathiTrust Research Center), the underlying dataset (think of that data model slide I showed earlier) is at best tainted and at worst horribly misleading.

Tim Hitchcock made a good point when he noted that the ‘Googleization’ of the world has led to a ‘deracination of knowledge’ rather than a transition towards all-knowing machines, uprooting ‘what was once a coherent collection of beliefs and systems for discovering and performing taxonomies on information’.[10] As promising as the many new services are, any purity of ‘scientific’ method attained through the logical rigour of a Berlin or Nagel can easily be confounded by poor quality data or naïve application of analytical tools. I’m not saying that Google Ngrams is of no use to historians, then – I will probably use it in an important section of my next book – but it needs to be used with caution and due diligence. Nomothetic it most definitely is not.

But we historians are a persistent bunch, and that nomothetic machine sounds very cool. I suspect that’s the reason we’re now seeing a new generation of large-scale, data-intensive projects that aim to produce very large but also very high-quality datasets that will be more conducive to scientific methods than Google Books. We’re entering an era of the ‘historical macroscope’,[11] and it holds great promise. By not only attending to the quality of the underlying datasets and organising data models, but also providing powerful analytical tools and reproducible workflows, these projects are pushing the boundaries of empirically grounded history and enabling a range of methods: inductive, deductive, and merely exploratory. True, the datasets and tools will always be limited when placed against the great oceans of the historical past, but they offer hope of real iterative improvement and a degree of transparency and reproducibility hitherto thought impossible. In many ways they are an example of historians showing the engineers at Google how to do things properly: take care with your sources, make sure your provenance is sound and your arguments traceable, let other people test and share your findings.

But just as our friendly positivists start sitting forward on their seats in anticipation, push-button truth slips from our grasp again. Because many of the most interesting analytical techniques – often used to explore the very large datasets – are neither inductive nor deductive but probabilistic. Straightforward word counts, or searches for places of interest or dates, will always be useful (and these large new datasets will surely provide plenty of fodder for that kind of work), but much of the work they enable will be driven in turn by algorithms implementing mathematical models of remarkable complexity.

Rather than working from a preconceived theory, or building a theory up using discrete pieces of data, probabilistic methods use Bayesian statistics to establish a degree of belief – which can be updated as new information comes to light. This is no place for a lesson on Bayesian statistics from a numerically challenged historian but – to over-simplify things to a reckless degree – we could perhaps view it as a mathematically mediated compromise between inductive and deductive methods, allowing the researcher to move between their preconceived expectations and the available evidence in an ongoing process of refinement and calibration. The mathematical models that determine this process are various, and complex. Latent Dirichlet Allocation, or LDA, is one of the best known amongst digital humanists.[12] It is implemented in so-called ‘topic modelling’ algorithms, which traverse corpora of texts grouping together semantically similar words into ‘topics’ they ‘probably’ correspond to. The results can range from being utterly opaque to quite revealing, but are neither inductively nor deductively provable. Researchers are confronted with a gestalt of information that needs to be weighed and incorporated into their narratives as evidence, based on a very wide range of factors, from the quality of the dataset to the chosen mathematical model and the calibrations made to the algorithm implementing it.
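By way of illustration only, the sketch below runs scikit-learn's implementation of LDA over a handful of invented ‘documents’. It is a minimal toy, assuming scikit-learn is installed; a real application would involve thousands of texts and careful, documented choices about preprocessing, the number of topics, and model hyperparameters, every one of which shapes the ‘probable’ topics that emerge.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# A handful of invented 'documents'; a real corpus would contain thousands.
docs = [
    "the court heard evidence of theft from the warehouse",
    "the jury convicted the prisoner of burglary and theft",
    "wheat and barley prices rose sharply at the market",
    "merchants complained about the price of wheat at market",
]

# Convert the documents to a word-count matrix (the model works on counts, not meaning).
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Fit a two-topic LDA model; the number of topics is a researcher's choice.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Show the most heavily weighted words in each inferred topic.
vocab = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top_words = [vocab[j] for j in weights.argsort()[-4:][::-1]]
    print(f"topic {i}:", ", ".join(top_words))
```

The point of the sketch is simply to show where researcher judgement enters the process: in the corpus selected, the preprocessing applied, the number of topics requested, and the reading of the resulting word lists as evidence.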

A wide variety of these techniques exist, variously gathered under the label of ‘machine learning’. Any historians using them need to take great care with what might be termed ‘methodological hygiene’, carefully delimiting the boundaries of their probabilistic evidence and taking care to colligate it against a wide variety of other, more straightforward historical evidence. Having large, high-quality datasets is one thing: understanding your algorithms and their mathematical models is another thing entirely.

I was a little worried a couple of years ago that the DH community were adopting probabilistic machine learning methods uncritically, making claims that couldn’t be supported by the methods, but that problem seems to have dissipated. It might be related to increasing publicity about the dangers of probabilistic methods even for the hard sciences, with commentators noting the drastic fall in replicable studies in a range of disciplines, but I suspect it was simply that most historians who experimented with the techniques shrank away in horror at the opaqueness of the results and the complexity of implementation. For most of us the only option is to enlist the help of colleagues in the computational sciences and statistics, which I did in a study of the Old Bailey Online in 2015.[13] CEDHAR’s relationships with colleagues in computational statistics and humanities computing will be important to ensure that kind of work is done to a high quality.

As ever, though, the trick is to view this kind of evidence as just one more historical source, with its own affordances and limitations. Some excellent work has appeared deploying probabilistic methods and reporting on them in responsible ways. Eun Seo Jo and Mark Algee-Hewitt provided useful insights into the past using neural networks in an article they produced last year, but actively resisted overly empirical readings of their work, noting that ‘As with other digital humanities applications of machine learning, these “distant” or “macro” findings aid us in formulating globally-aware hypotheses but require closer analysis for more substantial humanist interpretations.’[14] Roe, Gladstone, and Morrissey are similarly circumspect in their claims, despite offering fascinating insights into Enlightenment discourse.[15] These kinds of papers are, to my mind, like specialist legal or medical opinions: the ones that have been accepted by the community become strong sources of historical evidence, but only ever within the very tight constraints they define for themselves. A good computational statistician will guide historians towards the right approach but be prepared to be an active traveller on the journey: defining datasets, choosing appropriate algorithms, and testing for accuracy require significant amounts of domain knowledge. This is history as a team sport.

So really, after taking this deepest of dives into the complexities of one particular mode of digital history, where do we end up? If you ask me, we’re back at the same place Berlin, and Nagel, and many other theoreticians have landed since time immemorial: narrative and representation. The digital modern entangles historians with a complex new range of tools, from digital archives to mathematical models, but it doesn’t change the fundamental theoretical concerns that lie at the heart of History as a discipline. Unlike in the hard sciences, the question isn’t so much whether these fancy new techniques are undermining the status of History as a discipline, but how they are best integrated into our existing practices, and whether they require us to change the way we represent the past.

And we don’t always need to reach for new concepts, like I did with the digital modern. In rehabilitating Hayden White from his more venomous critics in 1998, Frank Ankersmit pointed to White’s fascination with the Greek ‘middle voice’, a mode of speech somewhere between the active and passive voice: not ‘I wash’ or ‘I am washed’ but ‘I wash myself’. The middle voice suggested to White a way to reconcile the contradictory impulses at the heart of History as a discipline: a way to transcend the subject / object dichotomy implicit in both our sources and the stories we write about them.[16] It seems nicely suited to the new-found complexities of our Bayesian age, too, doesn’t it? For me, it suggests a depth of continuity we should aspire to as we seek to integrate digital history with our theoretical and historiographical traditions. Not only Google and Amazon, but also Herodotus and Thucydides. Nothing less will do.

Centers like CEDHAR offer us the opportunity to naturalise exciting new tools and methods deep into the historical grain of our disciplinary practices, speaking to long-standing debates and issues in historiography, theory, and method. We are only at the start of a long journey in digital history, navigating our way through – and I hope eventually out of – the digital modern into something more elegant, more integrated into our traditional approaches to history.

That will take a great deal of experimentation with a wide variety of tools and methods. I wish Helle and Adela and the many colleagues who will join you in CEDHAR well, as you explore the many possibilities.

References

[1] James Smithies, The Digital Humanities and the Digital Modern (Basingstoke: Palgrave Macmillan, 2017), pp. 9–10.

[2] Ethan Kleinberg, Joan Wallach Scott, and Gary Wilder. “Theses on Theory and History.” Accessed January 28, 2019. http://theoryrevolt.com.

[3] W. J. Turkel, Kevin Kee, and Spencer Roberts. “A Method for Navigating the Infinite Archive.” In Toni Weller, ed., History in the Digital Age. Routledge, 2012, p. 62.

[4] Ian Hodder. Entangled: An Archaeology of the Relationships between Humans and Things. Hoboken: Wiley, 2012, p. 98.

[5] Ian Hodder. “Wheels of Time: Some Aspects of Entanglement Theory and the Secondary Products Revolution.” Journal of World Prehistory 24, no. 2–3 (September 1, 2011), p. 178.

[6] Isaiah Berlin. “The Concept of Scientific History.” History and Theory 1, no. 1 (1960): 1–31.

[7] Ernest Nagel. “Determinism in History.” Philosophy and Phenomenological Research 20, no. 3 (1960): 291–317.

[8] Berlin, “The Concept of Scientific History,” p. 7.

[9] Melissa K. Chalmers and Paul N. Edwards. “Producing ‘One Vast Index’: Google Book Search as an Algorithmic System.” Big Data & Society 4, no. 2 (December 1, 2017).

[10] Tim Hitchcock. “Confronting the Digital.” Cultural and Social History 10, no. 1 (March 1, 2013): 9–23.

[11] Pieter François, J. G. Manning, Harvey Whitehouse, Rob Brennan, Thomas Currie, Kevin Feeney, and Peter Turchin. “A Macroscope for Global History: Seshat Global History Databank, a Methodological Overview.” Digital Humanities Quarterly 10, no. 4 (October 24, 2016); Rik Hoekstra and Marijn Koolen. “Data Scopes for Digital History Research.” Historical Methods: A Journal of Quantitative and Interdisciplinary History (November 14, 2018): 1–16; Shawn Graham, Ian Milligan, and Scott Weingart. Exploring Big Historical Data: The Historian’s Macroscope. Imperial College Press, 2015.

[12] David Blei, Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3 (March 2003): 993–1022.

[13] Jasper Mackenzie, Raazesh Sainudiin, James Smithies, and Heather Wolffram. “A Nonparametric View of the Civilizing Process in London’s Old Bailey.” UCDMS Research Report. Christchurch, N.Z.: University of Canterbury, 2015. http://www.math.canterbury.ac.nz/~r.sainudiin/preprints/20150828_civilizingProcOBO.pdf.

[14] Eun Seo Jo and Mark Algee-Hewitt. “The Long Arc of History: Neural Network Approaches to Diachronic Linguistic Change.” Journal of the Japanese Association for Digital Humanities 3, no. 1 (2018): 1–32.

[15] Glenn Roe, Clovis Gladstone, and Robert Morrissey. “Discourses and Disciplines in the Enlightenment: Topic Modeling the French Encyclopédie.” Digital Literary Studies, 2016.

[16] F. R. Ankersmit. “Hayden White’s Appeal to the Historians.” History and Theory 37, no. 2 (May 1998), p. 189.