A funny sort of day. We played with the search terms some more, with an eye to getting a smaller number of better quality results. We were also keen to create a bigger gap between the neutral and medical/critical search terms, so we made sure these were exclusive, in one list or another but not both. Now we were only getting hits where the document contained ‘alcohol’ and/or ‘cirrhosis,’ and then sorting these into two columns, ‘neutral’ and ‘critical.’ And we managed to get all the documents into the system – well, nearly all of them. Elasticsearch wasn’t so helpful today and left us scratching our heads over a few mysterious problems.
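For anyone curious about what 'exclusive lists' and 'two columns' mean in practice, here is a minimal sketch in plain Python. It is not our actual Elasticsearch setup: the term lists are invented placeholders and `classify` is a hypothetical helper. It only shows the logic of gating on 'alcohol' and/or 'cirrhosis' and then counting neutral and critical terms separately.

```python
import re

# Placeholder term lists for illustration only, not the project's real ones.
GATE_TERMS = {"alcohol", "cirrhosis"}
NEUTRAL_TERMS = {"beer", "wine", "sherry", "brandy"}
CRITICAL_TERMS = {"intemperance", "drunkenness", "inebriety", "delirium"}

# The two lists must be exclusive: a term lives in one list or the other.
assert NEUTRAL_TERMS.isdisjoint(CRITICAL_TERMS)


def classify(text):
    """Return neutral/critical counts, or None if the gate terms are absent."""
    words = re.findall(r"[a-z]+", text.lower())
    if not GATE_TERMS.intersection(words):
        return None  # the document never mentions alcohol or cirrhosis
    return {
        "neutral": sum(w in NEUTRAL_TERMS for w in words),
        "critical": sum(w in CRITICAL_TERMS for w in words),
    }
```

A document with no gate term is dropped entirely; everything else gets one count per column, which is what makes the gap between the two vocabularies visible.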
I’m really quite new to this kind of thing, but it seems to me that the appeal of digital humanities is:
- adopting a methodical approach, finding things you hadn’t anticipated finding;
- and working with more results than one person could manage on their own.
We certainly got somewhere with the first of these. The (imperfect) results we started to see included a few unusual ones, just as the extent of the Medical Officer’s dealings with adulterated alcoholic drinks had been rather surprising. For a start, some of the household manuals seemed to use the more critical or medical language you would expect from doctors – though this is because they did, of course, offer medical advice to their readers. But I was still surprised to find ‘cirrhosis’ in the index of the 1906 edition of Isabella Beeton’s famous Book of Household Management, between ‘Cinnamon, The Tree’ and ‘Cisterns, Closing of Polluted.’
The entry does not even mention alcohol, one of the key causes of cirrhosis of the liver, but this discussion is still situated within a classic text on middle class living, where alcohol is normally something you are told to use in making a trifle. It wasn’t the only example we found, and that’s interesting. These types of sources are more mixed than I had first anticipated.
But we could not claim that this was the result of a search of the entire set of works we had chosen; it was simply a case of having some of the legwork done for us before we dove into the texts themselves. Hopefully the end result of this process will allow us to see at a glance which terms crop up where, and to see with some certainty where terms appear to be where we expected them to be, or not.
And that’s where we ended the penultimate day. Tomorrow Rioghnach and Nat will reflect on the process and see what can be done with these searches.
Seeing the scattered records of Maria Beauclerk brought together in a virtual file through the brilliant work by Frankie and Natalie was amazing. For me it symbolised what Ticehurst – for all its faults and Victorian prejudices – was genuinely trying to do: reintegrate shattered personalities and restore them to health. Beauclerk herself was seemingly beyond help as she never left the asylum.
My main task for the day was to compile brief biographies for the five patients I had selected as guinea pigs for the project. Although more by accident than design they turn out to be an interestingly diverse bunch: an incurable schizophrenic spinster from one of England’s premier landed families; the ‘nymphomanic’ wife of a possibly abusive husband; a manic-depressive author; a clergyman hymn-writer who may have been in the last stages of syphilis; and an exotic Egyptian prince despatched to England to deal with his acute paranoia.
This is all very well of course but the test of success will be to see if we can create a tool that allows researchers to track the careers of all one thousand or so patients through the Ticehurst archive. More than that, since most Victorian private (and for that matter public) asylums used standardised forms of record-keeping, how great it would be to develop a methodology that could be widely applicable across multiple digitised collections.
We ended the day by diving into the store to look at some of the records ‘in the flesh’. It doesn’t matter how good your digitised images are, it is always a surprise to see the actual paper documents: somehow they never seem to be the same size or quite the same colour you imagined looking at their surrogate on the screen. There are still some parts of our imagination it seems that lie just beyond the boundary of even the best digital technology.
We now find ourselves more than half-way through the week! After the initial introductions, realisations of feasibility and refinement of ideas, we have begun to create some really interesting tools and uncover some unanticipated finds.
Project boards are now looking full, with post-its now in the done column.
We even have some fully fledged tools to show: check out Team Ticehurst’s project webpage – have a go at tagging the records yourself!
Or, ‘how we wrote Elasticsearch queries.’ It’s good to have these different skills and forms of expertise in the team, not least because we have to explain to each other what we’re doing – and why it’s worth doing. Nat has been wrestling with the data all day today, while Rioghnach has been doing two essential jobs – 1. providing the project with some kind of documentation and reflection, and 2. checking some of our searches on the London’s Pulse and JISC Medical Heritage Library sites. I’ve been trying to work out whether the search terms we came up with are too focused (meaning we are only going to find the sorts of ideas about alcohol we started off with) or too broad (meaning we get a broad selection but also a whole load of red herrings.)
I suggested there might be three kinds of results:
- Discussions of alcohol we were expecting. Medical Officers returned the number of deaths from cirrhosis in many reports, for example, because that was their job.
- Discussions of alcohol we weren’t expecting. There’s much more on checking samples of beer and other drinks for adulteration than I was expecting, and that’s rather different to worrying about cirrhosis, because Officers were trying to make sure that drinkers got all that lovely alcohol they’d paid for and not (for example) arsenic poisoning.
- Hits that aren’t actually talking about alcohol at all, despite one or more terms that look like they are.
We cut out a few of the terms, partly because of this last point, but also because Nat’s wizardry began to show us some actual patterns (for the test sample, at least). There were a few OCR issues, too, where terms were absent from one search though we knew they were there somewhere. We ended up asking Elasticsearch to run a two-step search, looking for key terms that must refer to drink first, and then searching for other terms. We then thought we might wait until all the data sets were ready before tweaking the searches any further (including one code-named ‘all the drinks.’)
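One way a two-step search like this could be expressed is as a single Elasticsearch `bool` query, where the key drink terms sit in a `must` clause and the other terms sit in `should`. The sketch below only builds the query body as a Python dictionary. The field name `text`, the helper name `two_step_query` and the example terms are all assumptions for illustration, and this may well differ from the queries Nat actually wrote.

```python
def two_step_query(gate_terms, other_terms, field="text"):
    """Build an Elasticsearch bool query body (as a plain dict).

    Step 1: a document must match at least one gate term.
    Step 2: other terms are optional and only influence scoring.
    The field name is an assumption about how the documents are indexed.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "bool": {
                            "should": [{"match": {field: t}} for t in gate_terms],
                            "minimum_should_match": 1,
                        }
                    }
                ],
                "should": [{"match": {field: t}} for t in other_terms],
            }
        }
    }


# Example terms are placeholders, not the project's search lists.
body = two_step_query(["alcohol", "cirrhosis"], ["beer", "adulteration"])
```

The resulting `body` could then be passed to whatever client sends the search request. Keeping the gate terms in `must` is what filters out hits that only look like they are about drink.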
So we have decided to look at a set of about 3,000 MOH reports, 50 cookery books and household manuals, and 14 handbooks for medical examiners, covering the period between 1842 and 1930. We hope to have this ready to go tomorrow, and then we need to settle some lingering questions about how we present the results. Are we interested in the distant reading of these three data sets, so that we can quickly identify relationships between terms and different kinds of documents? Or in using these broadly-conceived maps to dive into the more complicated contexts in which these words surfaced, roaming free range through the documents?
Team Drink and Health
We’re midway through Data Week, and our group is still in the thick of trying to figure out how to detect plagiarism—and other forms of similarity and difference between texts—through the UK Medical Heritage Library. We’re hoping that this will be worked out soon, so we’ll have some time to develop a way of visualising what we find in the UK MHL.
One of the challenges we’ve encountered is figuring out how to detect identical phrases and chunks of text, rather than similarity between all of the words in two books. The latter process will suggest, misleadingly, that two works on, say, syphilis, are virtually identical. Not too helpful for research purposes.
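To make the distinction concrete, here is a minimal sketch of phrase-level matching using word n-grams (sometimes called shingles). It is an illustration of the general idea, not the method we have settled on: the function names and the default window of five words are assumptions.

```python
import re


def shingles(text, n=5):
    """Return the set of all runs of n consecutive words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(text_a, text_b, n=5):
    """Return the n-word phrases that appear verbatim in both texts."""
    return shingles(text_a, n) & shingles(text_b, n)
```

Two books on the same subject will share plenty of individual words, but they only share a five-word shingle when a passage has been copied (or quoted) word for word, which is closer to what we want to detect.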
Another challenge has been figuring out the best source for extracting our data from the UK MHL. We can download full text from the Wellcome Library website or from the Internet Archive. The text from the Wellcome Library comes with metadata that we think could be useful, but the Internet Archive offers us a wider range of texts to choose from and compare. We’re giving text from the Internet Archive a go right now, with the dataset we have prepared, and have another one in the works for if we end up using text from the Wellcome site.
Thankfully, one of the books we’ve been using as a key text for testing out matching capabilities, Manhood: The Causes of its Premature Decline with Directions for its Perfect Restoration, is fun to work with!
Up to this point, our project – to develop ways of searching the enormous digital resource that is The Chemist and Druggist journal to present meaningful and visually stimulating results to researchers – has been a three-pronged attack. How to carry out a comprehensive search across the whole data set (that doesn’t take 3 hours to run), spearheaded by David; how to present the resulting adverts and images in a coherent and stimulating way, spearheaded by Olivia; how to interpret the results to answer research questions and set them in historical context, spearheaded by Briony. As the technical side of the work focusses on improved ways of running searches, David jokingly described what we’ve been up to as “artisan production” and he has a point – the manual, labour- and time-intensive nature of what we’ve been able to achieve up to this point is clearly very restrictive. But we feel optimistic that we’re edging closer to a better way to bring the mass of material into focus.
Using our best existing search results, Olivia has created a timeline which displays each page that contains the search term “asthma”. For the first time, we can see our results visually, which is an exciting development. However, it also made it clear that the original search had not been as comprehensive as we had thought – back to the drawing board to refine the process. The visualisation has also thrown up a major challenge in the sheer quantity of material. The pages cascading into the structure as they load is certainly impressive, but raises lots of questions about how best the presentation can ultimately be de-cluttered to allow effective use of the material. Our approach has been to organise the results by individual year and then issue date, which has resulted in some obvious patterning of similar adverts run over consecutive issues and prompted interesting ideas for research questions about subtle design changes and updated content. However, visually the mass of pages will need much more work, which inevitably means a return to the problems surrounding searching such an enormous data set, in order to filter the results to be useful.
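The 'year, then issue date' ordering can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not the code behind Olivia's timeline: the `(issue_date, page_id)` tuple shape and the function name `group_by_year` are assumptions, and the sample values below are made up.

```python
from collections import defaultdict
from datetime import date


def group_by_year(hits):
    """Group (issue_date, page_id) hits by year, in issue-date order."""
    by_year = defaultdict(list)
    for issue_date, page_id in sorted(hits):
        by_year[issue_date.year].append((issue_date, page_id))
    return dict(by_year)


# Invented sample hits, purely to show the shape of the output.
sample = [
    (date(1901, 3, 9), "page-12"),
    (date(1900, 5, 5), "page-03"),
    (date(1901, 1, 5), "page-07"),
]
timeline = group_by_year(sample)
```

Once hits are grouped like this, adverts repeated over consecutive issues sit next to each other within a year, which is where the patterning becomes visible.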
Our initial aim was that a timeline approach would allow clear presentation of trends over time, but we also want to be able to carve up the results further to provide answers to research questions that are more refined than the overarching “what medicines were advertised to treat asthma in the Chemist and Druggist between 1859 and 2010?” Exploring themes and filters to allow exploration of questions such as “what additional products were advertised by the same companies that made asthma remedies?” or “are there common active ingredients in the medicines across time?” would obviously be enormously helpful.
And we also want to allow users of the resource to take advantage of glimpses of other research avenues that they might grasp, so the context is all important. Hence, for example, our decision that showing a cropped asthma advert is not as valuable as keeping it situated within its full journal page with its neighbouring adverts. If the end result for a researcher was that they were distracted by an adjacent advert for, say, ballroom floor polish (!) and pursued this through the journal, this would be an equally satisfying result for our project. While grappling with technical and visualisation issues, our motivation is still to make the richness of the journal’s content more accessible.
Continuing our project of making the archives of Ticehurst House Hospital more accessible, Frankie updated our website on day two.
The archives hold a lot of information scattered through different series, e.g. admissions certificates, case notes, discharge records and bills. We wanted to bring all the separate material together for each patient. This is not the easiest thing to do in the paper or online archives, so we felt it would be useful to consolidate all this information by patient. The somewhat tedious but ultimately satisfying task of plucking out all the digitized images of the relevant information was my task for the day.
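The consolidation itself is conceptually simple, even if finding the images is not. As a rough sketch (not the code behind our website), each image reference can be treated as a `(patient, series, image_id)` triple and regrouped into one 'virtual file' per patient. The function name and the image identifiers below are hypothetical placeholders.

```python
from collections import defaultdict


def consolidate(image_refs):
    """Regroup (patient, series, image_id) triples into one file per patient."""
    files = defaultdict(lambda: defaultdict(list))
    for patient, series, image_id in image_refs:
        files[patient][series].append(image_id)
    # Convert nested defaultdicts to plain dicts for easier display.
    return {patient: dict(series) for patient, series in files.items()}


# Placeholder references; the image identifiers are invented for illustration.
refs = [
    ("Maria Beauclerk", "admissions certificates", "img-001"),
    ("Maria Beauclerk", "case notes", "img-014"),
    ("Maria Beauclerk", "case notes", "img-015"),
    ("Patient B", "bills", "img-102"),
]
patient_files = consolidate(refs)
```

Each patient then has a single entry listing their images by series, which is the shape a patient web page needs.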
Richard had already identified 5 interesting-looking patients, and useful pointers such as record numbers and dates for each, that put us on the right track. Then it was a case of trawling the archives online to identify the relevant images. As these are handwritten papers and not printed books there’s no easy text search available. It’s often a question of just opening a set of papers online and scanning through until you find the person you want. Scanning through handwritten papers, even in nice Victorian copperplate, takes some time whilst you’re trying to get your eye in.
By the end of the day we’d managed to get a lot of image references for our patients and were using that to pull in the digitized images to our web pages. Our first patient to get the image treatment is Lady Maria Beauclerk. You can now see images of her admissions certificate, case notes (including some interesting doodles), her death note and her bills. Lady Maria was admitted to Ticehurst in 1851 and died over 20 years later from epilepsy whilst still an inmate at Ticehurst.
Richard has been researching more about these particular patients so we can learn more about them than what’s only in the Ticehurst papers. We’ve even got retrospective diagnoses for three patients from Trevor Turner’s thesis ‘A diagnostic analysis of the casebooks of Ticehurst House Asylum, 1845-1890’.
Next up: trying to find a way to semi-automate finding and recording the digital images for these records to display on many more patients’ pages.
Written by Natalie Pollecutt, library systems officer