Old Bailey Voices

This demonstration site is designed to illustrate how a range of tools for textual and data analysis might be combined to create a working ‘macroscope’: an environment where ‘big data’ can be explored both at scale and at the level of the single datum. Its purpose is to support a new, ‘open-eyed’ way of working with data of all sorts, one that allows macro-patterns and clusters to be identified while single words and phrases can be fully contextualised.

The site builds on the Old Bailey Online dataset, which encompasses accounts of some 197,745 trials held at the Old Bailey in London between 1674 and 1913. The Proceedings contain some 127 million words of accurately transcribed text, which has in turn been substantially marked up in XML to encode the administrative process (crime, verdict, punishment, etc.) reflected in the accounts. Text that purports to reflect direct speech has also been marked up, to allow for analysis of verbal and linguistic change.

The Old Bailey dataset has been chosen because it combines a consistent textual record at scale; because it incorporates structured data; because it includes a variety of text types (speech versus administrative record); and because it represents a uniquely consistent dataset reflecting historical change across almost 240 years. The purpose of this demonstrator, however, is to suggest a more generic approach to ‘big data’ as a whole: an approach in which large-scale patterns can be rapidly explored and their constituent elements identified, or in which single words or phrases (or even a single datum) can be radically contextualised within the full body of evidence available.
