Techniques for recovering lost texts

Research Fellow Renate Smithuis and Research Associate Stefania Silvestri are working on a Catalogue of Codices, Scrolls, and Other Texts in Hebrew Script in The University of Manchester Library.

The Library holds one of the most important smaller collections of Hebrew manuscripts in Europe, and this project will create a full online catalogue compliant with current cataloguing and metadata standards. To support the production of the catalogue, a number of manuscripts are being digitised. All images, including fully digitised volumes, are added to the Hebraica Collection in the John Rylands Library Image Collections, LUNA.

A substantial portion of the Rylands Gaster manuscript collection has already been selected for digitisation, including a number of manuscripts that suffered water damage during the Second World War. The level of water damage varies: some texts are still legible but faint, while others have whole sections of pages rendered illegible.

The Heritage Imaging Team have been investigating the best way to recover the text in these volumes. Unsurprisingly, we have found that a single solution does not fit all. The aim of this blog post is to demonstrate the different processing options available to researchers. We are researcher-led in the work that we do with Multispectral Imaging of our collections, so if you come across a text you cannot read, please get in touch to discuss your needs in more detail (email:

I should note at this point that these examples are not exhaustive and we are always in the process of developing new techniques.

Trials in image processing have been run on pages from Gaster Hebrew MS 1832. The first step in carrying out any specialist technique is to produce high-resolution ‘standard’ light photographs. These are the images that you can access in high resolution in our online image collections. Often, close inspection at high resolution enables a reader to decipher more than they can read with the ‘naked eye’.

In this example, the first image shows page 1 recto of Gaster Hebrew MS 1832 in ‘standard’ light. You can see that there is some text on the page, but it is extremely faded in some areas:

Gaster Hebrew MS 1832 1 recto standard image

The second image shows a standard high resolution image which has undergone additional image processing in Photoshop. The image has been inverted to help the text show through in certain areas of the page.

Gaster Hebrew MS 1832 1 recto standard image with processing

The third image shows page 1 recto again, a standard high resolution image which has undergone processing in Photoshop to bring out the most faded central areas of the text.

Gaster Hebrew MS 1832 1 recto standard image with additional processing

Here is a detail from each file type for comparison:

The benefit of this approach is that these results can be achieved without any additional imaging of the manuscript and standard photo manipulation software can be used. In addition, once results have been achieved, these can be batch applied to a set of images for an entire manuscript. The results may not be 100% consistent depending upon the range of damage to each page, but if the results are ‘good enough’ it will save many hours of image processing time.
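As a rough illustration of what batch-applying one recipe to a whole manuscript might look like outside Photoshop, here is a minimal Python sketch using NumPy. It is not the workflow used for the images in this post: the function names, the 2%/98% percentile cut-offs and the choice of a simple inversion followed by a contrast stretch are all assumptions made for the example, and loading and saving the image files is left out.

```python
import numpy as np


def invert(image: np.ndarray) -> np.ndarray:
    """Invert an 8-bit greyscale image so dark ink becomes light and vice versa."""
    return 255 - image


def stretch_contrast(image: np.ndarray,
                     low_pct: float = 2.0,
                     high_pct: float = 98.0) -> np.ndarray:
    """Linearly rescale so the chosen percentiles map to 0 and 255.

    The percentile cut-offs are illustrative defaults, not recommended values.
    """
    low, high = np.percentile(image, [low_pct, high_pct])
    if high <= low:
        # Flat image: there is no tonal range to stretch.
        return image.copy()
    scaled = (image.astype(np.float64) - low) / (high - low)
    return (np.clip(scaled, 0.0, 1.0) * 255).round().astype(np.uint8)


def process_batch(pages):
    """Apply the same recipe (invert, then stretch) to every page in a list."""
    return [stretch_contrast(invert(page)) for page in pages]
```

Because every page goes through the same fixed recipe, heavily damaged pages may still need individual attention, which matches the ‘good enough’ caveat above.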


Our next example shows, firstly, page 2 recto of Gaster Hebrew MS 1832 in ‘standard’ light, plus an example of the same image which has been processed in Photoshop:

However, with this example we took several further steps to recover the lost text. In this instance, the manuscript has also been imaged using Multispectral Imaging. We now use a Phase One Achromatic IQ260 digital back, iXr camera body and standard lens combined with Megavision LED lighting panels and a filter wheel to capture 17 images at different points along the electromagnetic spectrum. I have included 2 images here, taken at 370nm (UV) with a long pass violet filter and at 448nm (deep blue), as these single images give the best results. In the infrared wavelengths, the text on this manuscript disappeared completely, which suggests that it is an iron gall-based ink.


Using multispectral imaging we are able to take our image processing and textual recovery even further. Using ImageJ software I have combined several of the individual wavelengths to create a ‘pseudocolour’ image. This applies false colours to areas of difference across the page. Note the two images below in colour.
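For readers curious about the principle, one simple way to build a false-colour composite is to normalise three single-wavelength captures and assign each to the red, green or blue channel. The Python sketch below (using NumPy) shows that idea only; it is not the ImageJ procedure used for the images in this post, and the function names and the choice of which band goes to which channel are assumptions for illustration.

```python
import numpy as np


def normalise(band: np.ndarray) -> np.ndarray:
    """Rescale one single-wavelength capture to the range 0.0 to 1.0."""
    band = band.astype(np.float64)
    span = band.max() - band.min()
    if span == 0:
        # A completely flat band carries no contrast; map it to zero.
        return np.zeros_like(band)
    return (band - band.min()) / span


def pseudocolour(band_r: np.ndarray,
                 band_g: np.ndarray,
                 band_b: np.ndarray) -> np.ndarray:
    """Stack three normalised bands as the R, G and B channels of one image.

    Areas that respond differently across the three wavelengths come out in
    different false colours; areas that respond alike stay close to grey.
    """
    rgb = np.stack([normalise(b) for b in (band_r, band_g, band_b)], axis=-1)
    return (rgb * 255).round().astype(np.uint8)
```

For example, a UV capture, a deep blue capture and an infrared capture of the same page could be passed in as `band_r`, `band_g` and `band_b`; trying different band-to-channel assignments is part of finding the combination that shows the text best.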

The colour results are not attractive to every eye, especially for colour-blind readers, so they can be converted into greyscale. In the examples here, I have added an additional filter using the Channel Mixer in Photoshop to increase the contrast of the text even further.
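The idea behind a channel mix to greyscale can be sketched in a few lines of Python with NumPy: each output pixel is a weighted sum of its red, green and blue values. This is only an approximation of the concept, not a reproduction of Photoshop's Channel Mixer, and the default weights below are placeholder assumptions rather than the settings used for the images in this post.

```python
import numpy as np


def channel_mix(rgb: np.ndarray, weights=(0.6, 0.6, -0.2)) -> np.ndarray:
    """Convert an RGB image to greyscale as a weighted sum of its channels.

    Weights that sum to 1 leave neutral greys unchanged. A negative weight
    subtracts a channel, which can push the text and the background further
    apart when they differ mainly in that channel.
    """
    w = np.asarray(weights, dtype=np.float64)
    grey = rgb.astype(np.float64) @ w  # (H, W, 3) @ (3,) -> (H, W)
    return np.clip(grey, 0, 255).round().astype(np.uint8)
```

In practice the weights would be tuned by eye for each pseudocolour image, since the colour in which the recovered text appears depends on which wavelengths were combined.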

Here you can see details of all 8 examples described above; click on the image to flick through each detail.

There are obvious benefits to taking every possible step of image processing to recover as much text as possible. However, there are also drawbacks. The manuscript must be subjected to a second round of digitisation using the Multispectral Imaging system. This is not only time consuming but, for fragile items, also increases the risk of damage to the physical item. The photographer needs additional time to process the images and store the additional data. Metadata must be produced to accompany the new images and to detail the processing work that has been carried out on them.

Finally, depending upon the nature of the damage to the page, a reader may need to consult a combination of 2 or 3 final processed images in order to read the entire page. Additionally, there must be a flow of communication between the reader/researcher and the person processing the images in order to produce the ‘best’ results.

Specialists are currently working on software solutions that will let us present the data to readers in a way that allows them to combine and ‘play’ with the images to suit their needs. We will report on developments in this area when they are available. Until then, we will continue to take a ‘triage’ approach to image recovery, assessing each item against the needs of the researcher to take the right steps to uncover lost texts.

