At the end of 5 days, perhaps inevitably, we are just getting our teeth into the meat of the issues! As we planned yesterday, Alex has run a number of searches within our existing subset of pages that contain the word “inhaler.” Although not without their own problems, the results have enabled Olivia and David to investigate what more refined searches might look like. What would a search looking for pages containing “asthma” + “inhaler” look like? What about finding only illustrations that contain those terms? Would you be able to look for a brand name such as “Potter’s Asthma Cure” or “Ventolin” within these results? The resulting timeline visualisations show how advanced searches and filters might work, and show masses of potential to prompt further, interesting research questions.
David has also created a timeline that shows, through density of coloured dots plotting the results, the frequency of results occurring in each issue of the journal.
Spending time today experimenting with more of this user interface confirms that our ambitions are justifiable, but not currently attainable. Hopefully, if the time and resources are found in the future to spend on the back end search functionality of The Chemist and Druggist, it will be a much more accessible and useful resource for a wide range of historians. So, we haven’t reached our holy grail this week, but it has been really enjoyable trying.
Check out what team MOH managed to create in a week:
Their interactive map demonstrates where in the MOH reports terms relating to women’s work have been mentioned, mapping these across boroughs and time, with the size of each bubble relating to the number of relevant reports in that borough.
Day 5 of data week really concentrated our minds as we knew we didn’t have long left. It turned out that one good way to get more done was to have more people working on our project. We put out a call amongst Wellcome Library staff to help us for half an hour or so using our Image annotator tool (described on Day 4). The task of transcribing page numbers is pretty straightforward, and quick, so it was easy enough to explain to our helpers to get them doing a bit of page numbering. It was really gratifying to receive help from across the Wellcome Library (Frankie coined this ‘staffsourcing’), and our thanks go to Danny, Tania, Lalita, Hannah, Chloe, Juulia, Philippa and Jonathan.
Frankie did yet more tool optimisation, this time for our second tool for transcribing the case book indexes. We can now delete items that have been added, rather than Frankie having to amend the database, which makes for more streamlined working. The case book pages also now have tables of contents showing patients and which pages their case notes are on:
We’d talked about doing some data visualisations on our first day of data week and did want to get some done even though we were running out of time. Frankie pulled out the case books’ spine images to make a clickable display which looks like books on a shelf:
Although this isn’t how the case books look in the archive I like to think that they may have looked like this whilst in use in the 19th century.
Frankie also did a useful visualisation with the stay dates of the patients. Stays are plotted for each patient between 1792 for the first stay to 1989 for the last. These are colour-coded showing their status upon leaving, e.g. ‘cured’ or ‘not improved’. Frankie also gave the patients list clickable column headings so you can sort by name alphabetically, length of stay, and stay dates:
And that brought us to the end of data week. I think we’ve all found it very interesting and gratifying to try out so many new things in just a week. If you get the right mix of subject and technical knowledge around a table, a lot of good work can be achieved very quickly. Another Wellcome data week would definitely get my vote.
Written by Natalie Pollecutt
After manually pulling together the records for our five selected patients (which took a couple of days), we’ve spent the remainder of our time investigating ways to speed up doing this work for the remaining 1,641 patients.
To try and achieve this, we’ve built two tools. The first is an Image annotator. This lets us perform the simple yet crucial task of identifying what type of page each image contains (from Front Cover through the Index to the Back Cover).
The interface we built for this is fairly simple. It shows a full image, which helps us identify the type of page it is, e.g. cover, title page, index page, content, etc. It also shows the top left and top right hand corners of the image. This zooms into where the page numbers usually appear, making our reading and transcribing quicker:
Frankie added some nifty optimisations to this tool to help us speed through the work. When index pages are indicated with a letter, e.g. ‘a’, ‘b’, the ‘Index’ page type gets automagically selected. When a number is indicated, as above, the page type is filled with ‘Content’. Sequences of letters or numbers can be zipped through just by tapping the space bar. This fills in the page numbers, the page type is dealt with as previously described, and you just have to keep an eye on it to make sure the images and numbers don’t get out of alignment. I was amazed at how much quicker we can progress with these small but crucial improvements.
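As a rough illustration of the shortcuts described above, here is a minimal Python sketch that guesses a page type from a transcribed label and proposes the next label in a run. The function names and the exact rules are assumptions made for illustration; they are not taken from Frankie's tool.

```python
def _is_number(label: str) -> bool:
    """True for plain ASCII digits such as '12'."""
    return label.isascii() and label.isdigit()


def _is_letter(label: str) -> bool:
    """True for a single ASCII letter such as 'a'."""
    return len(label) == 1 and label.isascii() and label.isalpha()


def infer_page_type(label: str) -> str:
    """Guess the page type from a transcribed label (hypothetical rules)."""
    label = label.strip()
    if _is_number(label):
        return "Content"  # numbered pages are treated as case notes
    if _is_letter(label):
        return "Index"    # single letters are treated as index pages
    return ""             # anything else is left for a person to choose


def next_label(label: str) -> str:
    """Propose the label that follows in a run of numbers or letters."""
    label = label.strip()
    if _is_number(label):
        return str(int(label) + 1)
    if _is_letter(label) and label.lower() != "z":
        return chr(ord(label) + 1)
    return ""             # no obvious successor, so ask the transcriber
```

In a tool like this, pressing the space bar could call something like `next_label` on the previous image's label and then `infer_page_type` on the result, leaving the person to check that images and numbers stay in step.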
The second tool of the day built on the output of the first. As we’ve now identified which images are the index pages, we want to transcribe the names from the index pages and assign images to them. Doing that will start to fill up the patient pages with images of their records. The tool shows an image of the index page to be transcribed along with fields to transcribe into. A very helpful dropdown list of all the names with their associated dates makes it easy to find the person – if you can read the handwriting on the page! With the person picked, you add the page number from the index page:
Saving this name information along with the page number makes the person’s case notes appear on their patient page:
Another nice feature completed on day 4 was adding brief biographies of our five selected patients to their patient pages. You can see these by selecting the top five highlighted names on the patients page.
Frankie & Natalie
Confession: we spent 3 minutes today watching a stop motion version of Paddington Bear dancing to ‘Singin’ in the Rain.’ https://www.youtube.com/watch?v=kHg6QjhvsCM Light relief? Yes, but it came about from a discussion of issues facing our project – honest.
Bear with me (excuse the pun) while I explain: ‘Singin’ in the Rain’ shows the impact of movies going from silent to sound in the late 1920s, a major technological development which we now take entirely for granted. Our parallel thoughts were that we today entirely take our ability to perform complex searches across massive amounts of data for granted. The digitised material exists, and we expect to be able to find what we want from it. However, the root of our problems on the penultimate day of Data Week is that just because The Chemist and Druggist is fully digitised does not automatically mean that it is fully searchable. Far from it.
So having grappled with searches for four days, we have decided to concentrate fully on the front end on our final day. Our approach is to pretend that the complete search functionality is running perfectly, and play with what we would be able to present to a researcher. So we’re faking this with a series of searches carried out on the full run of C&D: firstly to create a subset containing the word “inhaler”, then subsequent searches to split these results into adverts and/or articles, another search looking for the brand name “Ventolin” and a final look at presenting results by the decade they appear. It isn’t feasible to carry these out in ‘real time’ at the moment, so we’re probably going to produce an animation that pretends we can. A stop motion animation if you like – thank you Paddington!
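To make the chain of faked searches a little more concrete, here is a small Python sketch of the same steps: narrow to a term, split by type, look for a brand name, and count by decade. The `Page` record, the function names and the sample pages are all invented for illustration; they say nothing about how the C&D data is actually stored or searched.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Page:
    year: int
    kind: str  # "advert" or "article" (assumed labels)
    text: str


def containing(pages, term):
    """Keep pages whose text mentions the term, ignoring case."""
    term = term.lower()
    return [p for p in pages if term in p.text.lower()]


def of_kind(pages, kind):
    """Keep only adverts or only articles."""
    return [p for p in pages if p.kind == kind]


def by_decade(pages):
    """Count pages per decade, e.g. 1905 is counted under 1900."""
    return Counter(p.year // 10 * 10 for p in pages)


# Invented sample pages, only to show the calls working together.
SAMPLE = [
    Page(1905, "advert", "Potter's Asthma Cure, with inhaler"),
    Page(1908, "article", "A new inhaler for asthma patients"),
    Page(1972, "advert", "Ventolin inhaler"),
    Page(1975, "article", "Pharmacy notes"),
]

inhaler_pages = containing(SAMPLE, "inhaler")
ventolin_adverts = of_kind(containing(inhaler_pages, "Ventolin"), "advert")
decade_counts = by_decade(inhaler_pages)
```

Each step works on the output of the one before, which is the behaviour the animation is meant to imitate.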
Written by Briony Hudson
A funny sort of day. We played with the search terms some more, with an eye to getting a smaller number of better quality results. We were also keen to create a bigger gap between the neutral and medical/critical search terms, so we made sure these were exclusive, in one list or another but not both. Now we were only getting hits where the document contained ‘alcohol’ and/or ‘cirrhosis,’ and then sorting these into two columns, ‘neutral’ and ‘critical.’ And we managed to get all the documents into the system – well, nearly all of them. Elasticsearch wasn’t so helpful today and left us scratching our heads over a few mysterious problems.
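For readers who like to see this spelled out, here is a minimal Python sketch of the idea: two term lists that must not overlap, a filter on ‘alcohol’ and/or ‘cirrhosis’, and a count of hits for each column. The term lists are placeholders (our real lists are not reproduced here), the function names are made up, and this is plain Python rather than what we ran in Elasticsearch.

```python
import re

# Placeholder term lists, for illustration only.
NEUTRAL_TERMS = {"wine", "beer", "brandy"}
CRITICAL_TERMS = {"intemperance", "drunkenness", "inebriety"}

# A document must mention at least one of these to count as a hit.
REQUIRED_TERMS = {"alcohol", "cirrhosis"}


def check_exclusive(neutral, critical):
    """Raise an error if any term appears in both lists."""
    overlap = neutral & critical
    if overlap:
        raise ValueError(f"terms in both lists: {sorted(overlap)}")


def column_counts(text):
    """Return hit counts per column, or None if the document is not a hit."""
    words = re.findall(r"[a-z]+", text.lower())
    if not REQUIRED_TERMS & set(words):
        return None
    return {
        "neutral": sum(w in NEUTRAL_TERMS for w in words),
        "critical": sum(w in CRITICAL_TERMS for w in words),
    }


check_exclusive(NEUTRAL_TERMS, CRITICAL_TERMS)
```

Checking the lists up front means a term accidentally added to both is caught before any documents are counted.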
I’m really quite new to this kind of thing, but it seems to me that the appeal of digital humanities is
- adopting a methodical approach, finding things you hadn’t anticipated finding;
- and working with more results than one person could manage on their own.
We certainly got somewhere with the first of these. The (imperfect) results we started to see showed us a few unusual results, just as the extent of the Medical Officer’s dealings with adulterated alcoholic drinks had been rather surprising. For a start some of the household manuals seemed to use the more critical or medical language you would expect from doctors – though this is because they did, of course, offer medical advice to their readers. But I was still surprised to find ‘cirrhosis’ in the index of the 1906 edition of Isabella Beeton’s famous Book of Household Management, between ‘Cinnamon, The Tree’ and ‘Cisterns, Closing of Polluted.’
The entry does not even mention alcohol, one of the key causes of cirrhosis of the liver, but this discussion is still situated within a classic text on middle class living, where alcohol is normally something you are told to use in making a trifle. It wasn’t the only example we found, and that’s interesting. These types of sources are more mixed than I had first anticipated.
But we could not claim that this was the result of a search of the entire set of works we had chosen; it was simply a case of having some of the legwork done for us before we dove into the texts themselves. Hopefully the end result of this process will allow us to see at a glance which terms crop up where, and to see with some certainty whether or not terms appear where we expected them to.
And that’s where we ended the penultimate day. Tomorrow Rioghnach and Nat will reflect on the process and see what can be done with these searches.
Seeing the scattered records of Maria Beauclerk brought together in a virtual file through the brilliant work by Frankie and Natalie was amazing. For me it symbolised what Ticehurst – for all its faults and Victorian prejudices – was genuinely trying to do: reintegrate shattered personalities and restore them to health. Beauclerk herself was seemingly beyond help as she never left the asylum.
My main task for the day was to compile brief biographies for the five patients I had selected as guinea pigs for the project. Although more by accident than design, they turn out to be an interestingly diverse bunch: an incurable schizophrenic spinster from one of England’s premier landed families; the ‘nymphomanic’ wife of a possibly abusive husband; a manic-depressive author; a clergyman hymn-writer who may have been in the last stages of syphilis; and an exotic Egyptian prince despatched to England to deal with his acute paranoia.
This is all very well of course but the test of success will be to see if we can create a tool that allows researchers to track the careers of all one thousand or so patients through the Ticehurst archive. More than that, since most Victorian private (and for that matter public) asylums used standardised forms of record-keeping, how great it would be to develop a methodology that could be widely applicable across multiple digitised collections.
We ended the day by diving into the store to look at some of the records ‘in the flesh’. It doesn’t matter how good your digitised images are, it is always a surprise to see the actual paper documents: somehow they never seem to be the same size or quite the same colour you imagined looking at their surrogate on the screen. There are still some parts of our imagination it seems that lie just beyond the boundary of even the best digital technology.