Ticehurst – What Next?

It’s been a week since Data Week finished, and whilst it’s still fresh in our minds I thought I’d jot down a few thoughts on possible next steps.

Firstly, given that we managed, remarkably, to annotate all the case books with page numbers and page types (thanks to one of the tools we built), it would be great to incorporate this data back into the main Wellcome Library interface.

If you view Volume 38 of the Case Books in the Wellcome Library viewer interface right now, you’ll see that there’s a “jump to page” feature at the top, but that it doesn’t work for this book because the data isn’t available to the player. There’s also no table of contents, a feature usually available for scanned books, but not for archival material.

To demonstrate how this could be improved, we generated a IIIF manifest for each of the case books (see an example for Volume 38). This is a data file, in JSON format, that follows the IIIF metadata format and that the player can use. If you take this URL and plug it into the Universal Viewer demo, you can see how it improves the interface: there’s now a table of contents, and the page-switcher works. (One slight bug is that the pages are labelled as “page 12 – 13 of Spine”, because “Spine” is the label of the last image.)

[Screenshot: Volume 38 in the Universal Viewer demo, with a table of contents and a working page-switcher]
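To give a feel for what such a manifest contains, here is a minimal sketch that builds a IIIF Presentation 2.1 style manifest from a list of (page label, page type) pairs, assuming that is roughly the shape of the annotation data. The canvas labels carry the page numbers, and consecutive pages of the same type are grouped into ranges under "structures", which is what a viewer can use for a table of contents. The base URI, the placeholder canvas dimensions and the function name are all hypothetical, and a real manifest would also attach an image annotation to every canvas; this is not the code we actually used.

```python
import json


def build_manifest(base_uri, label, pages, width=3000, height=2000):
    """Build a minimal IIIF Presentation 2.1-style manifest (hypothetical sketch).

    pages: list of (page_label, page_type) tuples in scan order,
    e.g. ("12", "Content"). Consecutive pages sharing a type are grouped
    into one range, which a viewer can show as a table of contents entry.
    """
    canvases = []
    structures = []
    for i, (page_label, page_type) in enumerate(pages, start=1):
        canvas_id = f"{base_uri}/canvas/c{i}"
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": page_label,  # label shown for this page in the viewer
            "width": width,       # placeholder dimensions
            "height": height,
            "images": [],         # image annotations omitted in this sketch
        })
        # Extend the current range if the page type hasn't changed,
        # otherwise start a new range.
        if structures and structures[-1]["label"] == page_type:
            structures[-1]["canvases"].append(canvas_id)
        else:
            structures.append({
                "@id": f"{base_uri}/range/r{len(structures) + 1}",
                "@type": "sc:Range",
                "label": page_type,
                "canvases": [canvas_id],
            })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{base_uri}/manifest",
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
        "structures": structures,
    }
```

Calling json.dumps on the result gives a file that could be served alongside the images.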

There’s room to improve the viewer further: currently each image shows two pages (left side and right side) together, but for reading purposes it would probably be better to show only one side at a time (so that it can be larger, making the handwriting easier to read). This isn’t something the viewer currently supports, as far as I know, but it would be a useful improvement.
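One possible building block, if the images are served through a IIIF Image API endpoint, is that the API lets you request just a region of an image, so each half of a two-page spread could be fetched separately. The sketch below builds such URLs using the Image API 2.x syntax ({service}/{region}/{size}/{rotation}/{quality}.{format}; version 3.0 uses "max" rather than "full" for the size). The service URL is hypothetical, and the assumption that the gutter sits at 50% of the width will not hold for every scan, hence the adjustable split.

```python
def half_page_urls(image_service, split_pct=50, fmt="jpg"):
    """Return (left, right) IIIF Image API 2.x URLs for the two halves of a spread.

    image_service: base URL of a (hypothetical) IIIF image service.
    split_pct: where the gutter falls, as a percentage of the image width.
    """
    # Region syntax "pct:x,y,w,h" selects a rectangle in percentages of the full image.
    left = f"{image_service}/pct:0,0,{split_pct},100/full/0/default.{fmt}"
    right = f"{image_service}/pct:{split_pct},0,{100 - split_pct},100/full/0/default.{fmt}"
    return left, right
```

A viewer would still need to present each half as its own page, so this only covers the image-fetching side of the idea.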

At the moment, there isn’t an easy way to save the new metadata about the case books back into the Wellcome Library systems – but this is something that can be investigated further.

The other metadata we’ve generated records which pages are about which patients. This could perhaps be imported back into the current main library systems using “person as subject” fields, but only at the book level, not the page-range level (although that would still be better than nothing).
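Going from page-range annotations to book-level subject fields is essentially a lossy aggregation: the page ranges are thrown away and each volume keeps only its set of patients. A minimal sketch, assuming (hypothetically) that the annotations can be exported as (volume, patient, first page, last page) tuples:

```python
from collections import defaultdict


def book_level_subjects(annotations):
    """Collapse page-range annotations into one patient list per volume.

    annotations: iterable of (volume, patient_name, first_page, last_page).
    Returns {volume: sorted unique patient names}; the page ranges are dropped,
    since a book-level "person as subject" field cannot hold them.
    """
    subjects = defaultdict(set)
    for volume, patient, _first, _last in annotations:
        subjects[volume].add(patient)
    return {volume: sorted(names) for volume, names in subjects.items()}
```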

Ultimately, given our goal was to enable researchers to follow the stories of individual patients, that might require a specialised interface which is more detailed than the generic library book finder & reader interface. Whether this should be bespoke to the requirements of the Ticehurst archive, or whether it might apply more generically across archival material, is an open question.

Finally, a thought on how all this metadata might be generated. One reason given for none of this metadata already existing (not even page-number annotations) is that it is so time-consuming to create, especially in the context of mass-digitisation projects. I think we’ve partly answered this by showing both that specialised tools can make the process a lot faster and that the results are useful, but there will always be a trade-off between the quality and the quantity of metadata. Personally, I’d shift this balance slightly by ensuring that some basic metadata (like page numbers) is always captured at the time of scanning. Full indexing of the content, however, might be best done later.

One approach, which we discussed a few times over the week, is to open up metadata annotation so that it’s not done just by library staff, but can also be contributed to by researchers using the material. After all, it’s quite possible that researchers are already selectively transcribing the handwritten material that they’re interested in, for use in their own publications or research, and so it makes sense to ask them if they’d mind contributing it back to the library so that future researchers don’t have to do the same job all over again.


Day 5 – Ticehurst

Day 5 of data week really concentrated our minds, as we knew we didn’t have long left. It turned out that one good way to get more done was to have more people working on our project. We put out a call amongst Wellcome Library staff to help us for half an hour or so using our Image annotator tool (described on Day 4). The task of transcribing page numbers is pretty straightforward and quick, so it was easy to explain to our helpers and get them doing a bit of page numbering. It was really gratifying to receive help from across the Wellcome Library (Frankie coined this ‘staffsourcing’), and our thanks go to Danny, Tania, Lalita, Hannah, Chloe, Juulia, Philippa and Jonathan.

Frankie did yet more tool optimisation, this time on our second tool, for transcribing the case book indexes. We can now delete items that have been added, rather than Frankie having to amend the database, which makes for more streamlined working. The case book pages now also have tables of contents showing patients and which pages their case notes are on:

[Image: a case book page with its table of contents of patients and case-note pages]
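A table of contents like this can be derived from the transcribed index entries by grouping the page numbers under each patient and ordering patients by where their notes first appear. A minimal sketch, assuming the entries for one case book are available as (patient name, page number) pairs; the names in the test are made up:

```python
def table_of_contents(entries):
    """Build a per-volume table of contents from index entries.

    entries: iterable of (patient_name, page_number) pairs for one case book.
    Returns [(patient_name, [pages...]), ...] ordered by each patient's first page.
    """
    pages_by_patient = {}
    for name, page in entries:
        pages_by_patient.setdefault(name, []).append(page)
    # Deduplicate and sort each patient's pages, then order by first page.
    toc = [(name, sorted(set(pages))) for name, pages in pages_by_patient.items()]
    toc.sort(key=lambda item: item[1][0])
    return toc
```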

We’d talked about doing some data visualisations on our first day of data week and did want to get some done even though we were running out of time. Frankie pulled out the case books’ spine images to make a clickable display which looks like books on a shelf:

[Image: the case book spines displayed as books on a shelf]

Although this isn’t how the case books look in the archive, I like to think that they may have looked like this whilst in use in the 19th century.

Frankie also produced a useful visualisation of the patients’ stay dates. Stays are plotted for each patient, from 1792 for the first stay to 1989 for the last, and are colour-coded by the patient’s status upon leaving, e.g. ‘cured’ or ‘not improved’. Frankie also gave the patients list clickable column headings, so you can sort by name (alphabetically), length of stay, or stay dates:

[Image: timeline of patient stays, colour-coded by status upon leaving]
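Behind both features is a simple record per stay: sortable columns need one sort key per heading, and the colour-coding is a lookup on the leaving status. A minimal sketch, assuming stays have admission and discharge dates and a status string; the class, the colour choices and the grey fallback are hypothetical, not how the site was actually built:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Stay:
    patient: str
    admitted: date
    discharged: date
    status: str  # status upon leaving, e.g. "cured" or "not improved"

    @property
    def length_days(self) -> int:
        return (self.discharged - self.admitted).days


# Hypothetical colour scheme; any status not listed falls back to grey.
STATUS_COLOURS = {"cured": "green", "not improved": "red"}


def colour_for(stay):
    return STATUS_COLOURS.get(stay.status, "grey")


# One sort key per clickable column heading.
SORT_KEYS = {
    "name": lambda s: s.patient.casefold(),  # case-insensitive alphabetical
    "length": lambda s: s.length_days,
    "admitted": lambda s: s.admitted,
}


def sort_stays(stays, column, descending=False):
    return sorted(stays, key=SORT_KEYS[column], reverse=descending)
```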

And that brought us to the end of data week. I think we’ve all found it very interesting and gratifying to try out so many new things in just a week. If you get the right mix of subject and technical knowledge around a table, a lot of good work can be achieved very quickly. Another Wellcome data week would definitely get my vote.

Written by Natalie Pollecutt

 

Day 4 – Ticehurst

After manually pulling together the records for our five selected patients (which took a couple of days), we’ve spent the rest of our time investigating ways to speed up this work for the remaining 1,641 patients.

To try to achieve this, we’ve built two tools. The first is an Image annotator, which lets us perform the simple yet crucial task of identifying what type of page each image contains (from Front Cover through the Index to the Back Cover).

The interface we built for this is fairly simple. It shows a full image, which helps us identify the type of page it is, e.g. cover, title page, index page, content, etc. It also shows the top left and top right hand corners of the image. This zooms into where the page numbers usually appear, making our reading and transcribing quicker:

[Image: the Image annotator, showing a full page alongside its zoomed-in top corners]
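The zoomed corners amount to cropping two small boxes from the top of each image. The sketch below computes those boxes as (left, upper, right, lower) pixel tuples, the box convention used by Pillow's Image.crop. The 25% width and 15% height fractions are guesses rather than the tool's real settings; in practice they would be tuned to where the numbers sit on these pages.

```python
def corner_boxes(width, height, frac_w=0.25, frac_h=0.15):
    """Return (top_left, top_right) crop boxes for an image of the given size.

    Boxes are (left, upper, right, lower) pixel tuples. frac_w and frac_h are
    hypothetical fractions of the image covered by each corner crop.
    """
    w = round(width * frac_w)
    h = round(height * frac_h)
    top_left = (0, 0, w, h)
    top_right = (width - w, 0, width, h)
    return top_left, top_right
```

With Pillow installed, something like img.crop(top_right) would then give the zoomed-in corner to display.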

Frankie added some nifty optimisations to this tool to help us speed through the work. When index pages are indicated with a letter, e.g. ‘a’, ‘b’, the ‘Index’ page type gets automagically selected. When a number is indicated, as above, the page type is filled with ‘Content’. Sequences of letters or numbers can be zipped through just by tapping the space bar. This fills in the page numbers, the page type is dealt with as previously described, and you just have to keep an eye on it to make sure the images and numbers don’t get out of alignment. I was amazed at how much quicker we can progress with these small but crucial improvements.
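The two optimisations described above can be sketched as a pair of small rules: a single letter implies an Index page and a number implies a Content page, and "tapping the space bar" means predicting the next label in the sequence. This is a hypothetical reconstruction, not Frankie's code; the step parameter is an assumption to cover cases where consecutive images are numbered in jumps (for example, if each image carries two page numbers).

```python
from typing import Optional


def infer_page_type(label: str) -> Optional[str]:
    """A single letter suggests an index page, a number a content page."""
    label = label.strip()
    if label.isdecimal():
        return "Content"
    if len(label) == 1 and label.isalpha():
        return "Index"
    return None  # e.g. "Front Cover": the type has to be picked by hand


def next_label(label: str, step: int = 1) -> Optional[str]:
    """Predict the label of the next image in a run of letters or numbers."""
    label = label.strip()
    if label.isdecimal():
        return str(int(label) + step)
    if len(label) == 1 and label.isalpha():
        nxt = chr(ord(label.lower()) + step)
        return nxt if nxt <= "z" else None
    return None


def fill_run(first_label: str, count: int, step: int = 1):
    """Labels and page types for `count` consecutive images (one per space-bar tap)."""
    rows = []
    label: Optional[str] = first_label
    for _ in range(count):
        if label is None:
            break  # sequence ran out (e.g. past "z"); the user takes over
        rows.append((label, infer_page_type(label)))
        label = next_label(label, step)
    return rows
```

As in the real tool, the person annotating still needs to watch for the images and the predicted numbers drifting out of alignment.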

The second tool of the day built on the output of the first. As we’ve now identified which images are the index pages, we want to transcribe the names from the index pages and assign images to them. Doing that will start to fill up the patient pages with images of their records. The tool shows an image of the index page to be transcribed along with fields to transcribe into. A very helpful dropdown list of all the names with their associated dates makes it easy to find the person – if you can read the handwriting on the page! With the person picked, you add the page number from the index page:

 

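[Images: the index transcription tool, showing an index page, the patient dropdown and the page-number field]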

Saving this name information along with the page number makes the person’s case notes appear on their patient page:

[Image: a patient page showing the linked case notes]
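The link between the two tools is essentially a join: the first tool records which page numbers appear on which image, and the second records which page number each patient's entry points to. A minimal sketch, assuming both outputs are lists of tuples for a single case book (page numbers repeat across volumes) and that an image showing two pages appears once per page; entries whose page has no annotated image are returned separately so they can be checked by hand. The function and data shapes are hypothetical.

```python
def link_case_notes(page_annotations, index_entries):
    """Join index entries to images via their page numbers (one case book).

    page_annotations: (image_id, page_label) pairs from the Image annotator;
        an image showing two pages can appear twice.
    index_entries: (patient_name, page_label) pairs from the index tool.
    Returns ({patient: [image_id, ...]}, [(patient, page_label) with no image]).
    """
    image_for_page = {}
    for image_id, page_label in page_annotations:
        image_for_page.setdefault(page_label, image_id)  # keep the first match
    linked, missing = {}, []
    for patient, page_label in index_entries:
        image_id = image_for_page.get(page_label)
        if image_id is None:
            missing.append((patient, page_label))
        else:
            linked.setdefault(patient, []).append(image_id)
    return linked, missing
```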

Another nice feature completed on day 4 was adding brief biographies of our five selected patients to their patient pages. You can see these by selecting the top five highlighted names on the patients page.

Frankie & Natalie

 

Day 3 – Ticehurst

Seeing the scattered records of Maria Beauclerk brought together in a virtual file through the brilliant work of Frankie and Natalie was amazing. For me it symbolised what Ticehurst, for all its faults and Victorian prejudices, was genuinely trying to do: reintegrate shattered personalities and restore them to health. Beauclerk herself was seemingly beyond help, as she never left the asylum.

My main task for the day was to compile brief biographies of the five patients I had selected as guinea pigs for the project. Although more by accident than design, they turn out to be an interestingly diverse bunch: an incurable schizophrenic spinster from one of England’s premier landed families; the ‘nymphomaniac’ wife of a possibly abusive husband; a manic-depressive author; a clergyman hymn-writer who may have been in the last stages of syphilis; and an exotic Egyptian prince despatched to England to deal with his acute paranoia.

This is all very well of course but the test of success will be to see if we can create a tool that allows researchers to track the careers of all one thousand or so patients through the Ticehurst archive. More than that, since most Victorian private (and for that matter public) asylums used standardised forms of record-keeping, how great it would be to develop a methodology that could be widely applicable across multiple digitised collections.

We ended the day by diving into the store to look at some of the records ‘in the flesh’. However good your digitised images are, it is always a surprise to see the actual paper documents: somehow they never seem to be quite the size or colour you imagined from looking at their surrogates on the screen. There are still some parts of our imagination, it seems, that lie just beyond the boundary of even the best digital technology.

Day 2 – Ticehurst

Continuing our project of making the archives of Ticehurst House Hospital more accessible, Frankie updated our website on day two.

The archives hold a lot of information scattered through different series, e.g. admissions certificates, case notes, discharge records and bills. We wanted to bring all the separate material together for each patient. This is not easy to do in either the paper or the online archives, so we felt it would be useful to consolidate all this information by patient. Plucking out all the digitised images of the relevant records was my somewhat tedious but ultimately satisfying task for the day.

Richard had already identified five interesting-looking patients, along with useful pointers for each (such as record numbers and dates), which put us on the right track. Then it was a case of trawling the archives online to identify the relevant images. As these are handwritten papers and not printed books, there’s no easy text search available. It’s often a question of just opening a set of papers online and scanning through until you find the person you want. Scanning through handwritten papers, even in nice Victorian copperplate, takes some time whilst you’re trying to get your eye in.

By the end of the day we’d gathered a lot of image references for our patients and were using them to pull the digitised images into our web pages. Our first patient to get the image treatment is Lady Maria Beauclerk. You can now see images of her admissions certificate, case notes (including some interesting doodles), her death note and her bills. Lady Maria was admitted to Ticehurst in 1851 and died over 20 years later from epilepsy, whilst still an inmate at Ticehurst.

[Image: Lady Maria Beauclerk’s case notes]

Richard has been researching these particular patients further, so that we can learn more about them than the Ticehurst papers alone can tell us. We’ve even got retrospective diagnoses for three patients from Trevor Turner’s thesis, ‘A diagnostic analysis of the casebooks of Ticehurst House Asylum, 1845-1890’.

Next up: trying to find a way to semi-automate finding and recording the digital images for these records, so that they can be displayed on many more patients’ pages.

Written by Natalie Pollecutt, library systems officer

Day 1 – Ticehurst

Hello. Our project for the week is to explore the archives of Ticehurst House Hospital, a private lunatic asylum which opened in 1792.

In particular, we’d like to try and make it easier to follow the individual stories of the patients who resided there, sometimes for many years. To do this currently involves a lot of work, as the patient records are mostly ordered chronologically, not by patient.

As part of the preparatory work, we looked at some of the existing research that has been done on these archives. One very useful resource is an index of patient names, which was painstakingly compiled by researcher Charlotte MacKenzie in the 1980s. The Wellcome Library had a copy of this list (actually two copies, one organised by admission date, the other by patient surname), but only in printed form.

So our first task was to digitise the list. With limited time, we initially did this ourselves by scanning one of the copies of the list on a photocopier, which is not the way digitisation usually works here at Wellcome. (It has also since been put through the Internet Archive digitisation workflow.)

The result of this scanning was a set of 36 images. Digital, but still not that useful. To turn them into text, we put the images through an OCR (optical character recognition) process, where a computer algorithm tries to detect the actual typewritten characters.

This worked pretty well: probably around 95% accurate. Most of the 5% of errors were caused by extra marks on the pages, such as the holes from the spiral binding being detected as nonsense characters. There were also some handwritten notes on the pages, either adding corrections or giving references to which file in the archives the data came from (useful!). The OCR software seems pretty terrible with handwriting, even though this handwriting is fairly neat.

So there was quite a bit of manual correction to do, plus re-formatting the text into tabular form where the OCR had assumed columns. (I suspect there’s a way to configure OCR software to handle tabular data better, but I have no idea how.)
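Some of the spiral-binding noise could perhaps be stripped automatically before the manual pass. The sketch below is a hypothetical heuristic, not what we did: it assumes the holes show up as a short run of non-alphanumeric symbols at the start of a line and removes them, along with lines that contain nothing else. It would wrongly strip a line that legitimately starts with, say, a dash, so the output still needs checking by eye.

```python
import re

# Hypothetical heuristic: one or more short runs of symbols (not letters,
# digits, whitespace or "(") at the start of a line, followed by whitespace
# or the end of the line, are treated as binding-hole noise.
LEADING_NOISE = re.compile(r"^(?:\s*[^\w\s(]{1,4})+(?:\s+|$)")


def clean_ocr_line(line: str) -> str:
    return LEADING_NOISE.sub("", line.rstrip())


def clean_ocr_text(text: str) -> str:
    """Strip leading noise from each line and drop lines left empty."""
    cleaned = (clean_ocr_line(line) for line in text.splitlines())
    return "\n".join(line for line in cleaned if line.strip())
```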

At around 10 minutes per page for the OCR tidy-up, the 36 pages took about 6 hours in total! Good job we started this before the week began.

With the index now formatted as plain text, most of Day 1 was spent importing this into a mini database (after first transforming the text files into CSV via some quick regular expressions).
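The text-to-CSV-to-database step might look something like the sketch below. The line layout is an assumption for illustration (name, admission date, discharge date and status, separated by runs of two or more spaces), as are the table and column names; the real index layout may differ. Lines that don't match are collected rather than silently dropped, since they are usually the ones that need a human to look at them.

```python
import csv
import io
import re
import sqlite3

# Hypothetical layout: columns separated by runs of two or more spaces.
LINE = re.compile(
    r"^(?P<name>.+?)\s{2,}(?P<admitted>\S+)\s{2,}"
    r"(?P<discharged>\S+)\s{2,}(?P<status>.+?)\s*$"
)
FIELDS = ["name", "admitted", "discharged", "status"]


def text_to_csv(text):
    """Convert matching lines to CSV text; return (csv_text, unmatched_lines)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    unmatched = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            writer.writerow(match.groupdict())
        elif line.strip():
            unmatched.append(line)  # left for manual correction
    return out.getvalue(), unmatched


def load_csv(conn, csv_text):
    """Insert the CSV rows into a simple 'stays' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stays "
        "(name TEXT, admitted TEXT, discharged TEXT, status TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO stays VALUES (:name, :admitted, :discharged, :status)", rows
    )
    conn.commit()
```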

One thing to note is that the index is a list of patient stays (from admission to discharge), and many patients returned multiple times. So where the patient name is exactly the same, we’re grouping the information about the stays under a single patient record. It’s quite possible though that there were two patients with the exact same name which we’ve incorrectly grouped together, or that sometimes the same patient is recorded with a slightly different name in two places. Hopefully we’ll spot these errors later.
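The grouping rule, and the two kinds of error it can produce, can be sketched in a few lines. Grouping on the exact name string merges any two different people who share a name, and keeps apart any one person whose name is spelled differently in two places. For the second case, Python's difflib can flag similar-looking names for a human to review. The data shapes and the 0.85 similarity cutoff are hypothetical choices for illustration.

```python
import difflib
from collections import defaultdict


def group_stays_by_name(stays):
    """Group (name, admitted, discharged) stays under one record per exact name."""
    patients = defaultdict(list)
    for name, admitted, discharged in stays:
        patients[name].append((admitted, discharged))
    return {name: sorted(periods) for name, periods in patients.items()}


def possible_duplicates(names, cutoff=0.85):
    """Flag pairs of similar names that exact matching would keep apart."""
    names = sorted(set(names))
    pairs = []
    for i, name in enumerate(names):
        # Compare each name only with those after it, so each pair appears once.
        for match in difflib.get_close_matches(name, names[i + 1:], n=3, cutoff=cutoff):
            pairs.append((name, match))
    return pairs
```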

For now, you can browse the list of patients on a simple website we’ve built.

[Screenshot: the list of patients on our website]