Or, ‘how we wrote Elasticsearch queries.’ It’s good to have these different skills and forms of expertise in the team, not least because we have to explain to each other what we’re doing – and why it’s worth doing. Nat has been wrestling with the data all day today, while Rioghnach has been doing two essential jobs: (1) providing the project with some kind of documentation and reflection, and (2) checking some of our searches on the London’s Pulse and JISC Medical Heritage Library sites. I’ve been trying to work out whether the search terms we came up with are too focused (meaning we will only find the sorts of ideas about alcohol we started off with) or too broad (meaning we get a wide selection but also a whole load of red herrings).
I suggested there might be three kinds of results:
- Discussions of alcohol we were expecting. Medical Officers returned the number of deaths from cirrhosis in many reports, for example, because that was their job.
- Discussions of alcohol we weren’t expecting. There’s much more on checking samples of beer and other drinks for adulteration than I was expecting, and that’s rather different to worrying about cirrhosis, because Officers were trying to make sure that drinkers got all that lovely alcohol they’d paid for and not (for example) arsenic poisoning.
- Hits that aren’t actually talking about alcohol at all, despite one or more terms that look like they are.
We cut out a few of the terms, partly because of this last point, but also because Nat’s wizardry began to show us some actual patterns (for the test sample, at least). There were a few OCR issues, too, where terms were absent from one search though we knew they were there somewhere. We ended up asking Elasticsearch to run a two-step search, looking first for key terms that must refer to drink, and then searching for other terms. We then thought we might wait until all the data sets were ready before tweaking the searches any further (including one code-named ‘all the drinks.’)
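For anyone curious what a two-step search of this kind might look like, here is a rough sketch in Python that only builds the query body, without contacting a server. It is not our actual query: the field name `text`, the helper `build_two_step_query` and the example terms are all placeholders chosen for illustration. It assumes one plausible reading of the approach, where a document must contain at least one key drink term, and the other terms are optional extras that only affect ranking.

```python
import json


def build_two_step_query(key_terms, other_terms, field="text"):
    """Build an Elasticsearch bool query body (hypothetical helper).

    Step 1: the document must match at least one key drink term.
    Step 2: other terms are optional and only raise the relevance score.
    The field name "text" is an assumption, not the project's real mapping.
    """
    return {
        "query": {
            "bool": {
                # Step 1: require at least one of the key terms.
                "must": [
                    {
                        "bool": {
                            "should": [
                                {"match_phrase": {field: term}}
                                for term in key_terms
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ],
                # Step 2: optional terms, used for scoring only.
                "should": [
                    {"match_phrase": {field: term}} for term in other_terms
                ],
            }
        }
    }


# Placeholder terms for illustration only.
query = build_two_step_query(["beer", "spirits"], ["adulteration", "cirrhosis"])
print(json.dumps(query, indent=2))
```

The printed JSON could be pasted into any tool that accepts an Elasticsearch query body. Keeping the key terms in `must` is what stops a document that mentions only a secondary term from turning up as a red herring.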
So we have decided to look at a set of about 3,000 MOH reports, 50 cookery books and household manuals, and 14 handbooks for medical examiners, covering the period between 1842 and 1930. We hope to have this ready to go tomorrow, and then we need to settle some lingering questions about how we present the results. Are we interested in the distant reading of these three data sets, so that we can quickly identify relationships between terms and different kinds of documents? Or in using these broadly-conceived maps to dive into the more complicated contexts in which these words surfaced, roaming free range through the documents?
Team Drink and Health