Transcribing Medieval Manuscripts and Archival Material

Kathleen Walker-Meikle for the CENDARI project (October 2015)


The aim of this Archival Research Guide (created in the context of the CENDARI project) is to assist the user when reading primary sources and transcribing them into their own notes on the Virtual Research Environment platform of CENDARI. It complements the many thematic medieval Archival Research Guides already uploaded by researchers onto the CENDARI system. As this guide is focused on online resources, the instability of such resources must be stressed: links may break, software may become incompatible, and so on.


Palaeography is the study of ancient handwriting. Letter forms make up a ‘script’, a particular type of handwriting for a certain place and time. The way in which each individual scribe writes a script is called a ‘hand’. Hands can be divided into ‘book’ and ‘documentary’ hands, depending on the source. Codicology is the study of the codex, and examines the book as a physical object and how it was produced. Objects of study could include writing material (parchment, paper), ink, binding, the folding and division of the leaves, page layout, the illumination, the book’s ‘history’ (ownership), marginalia, etc. Details of all of these concepts and their presentation in primary source materials can be found in the digital resources presented in this guide and in the bibliography. Nomenclature often varies from one author to another. Palaeography is an essential skill for medieval scholars, as nearly all of the source material predates the invention of printing.

Author's note

The digital resources listed in this Archival Research Guide were selected as relevant sources for training and teaching transcription. They are intended to assist researchers who are starting to work on original documents. A few relevant resources for early modern historians have also been included. This guide does not aim for completeness and does not cover all digital resources on manuscript transcription currently available. Future contributors are encouraged to add new digital resources to the CENDARI environment if they cover the following subjects: palaeography, codicology, digitisation of medieval manuscripts, and medieval book history. Similarly, if a digital resource linked to the ARG is no longer extant, contributors may delete the entry.

Palaeography Tutorials

There are various online tutorials and training materials available to improve palaeographical skills. They are also helpful if the researcher wishes to practice transcribing digitised documents.

Tutorials for medieval material

Tutorials for early modern material

English handwriting
Scottish handwriting
French handwriting
German handwriting
Italian handwriting

Digital resources on palaeography

This guide does not aim to list all manuscript libraries and archives that have digitised all or part of their collections. The following links detail manuscript projects, a selection of digitised medieval material, along with additional resources for palaeography.

Transcription – in archival institutions or digitally

The researcher, depending on the availability of their sources, will transcribe material either in situ in the reading room of the archive or library in question, or from digitised images.

All archives and libraries will have strict conventions for handling manuscript material. Research these if possible before visiting the institution, and always ask for either a copy of their handling rules or a verbal explanation. These rules might include the use of weights or ‘snakes’, cotton gloves for certain items, a ban on all ink pens even for taking personal notes, boxes for unbound material, or special regulations for opening rolls, maps or large items.

Many libraries and archives now allow personal photography of material for research purposes. As always, check with the reading room staff before photographing any material. Normally, written permission does not need to be sought if it is for personal research, but if you plan on uploading material to CENDARI and making it public (‘publishing’), contact the staff to confirm that you have permission to do this, as the material might fall under copyright rules, and permissions and payment for publication might be needed. In addition, there might be further handling conventions when photographing material.

If the relevant material has already been digitised by the institution and you plan on uploading digitised images of manuscript folios or archival material directly onto the NTE (as a ‘document’) for future transcription, check with the institution before doing so and confirm that you have permission, particularly if you plan on making it ‘public’. Most institutions keep the copyright of digitised material, so take care when downloading this material. Once you have uploaded your image files as ‘documents’ onto the NTE, you can begin transcribing. There are transcription conventions regarding issues such as expanded abbreviations, textual omissions, cancellations, illegible letters, or the end of manuscript lines (see suggestions for transcription conventions here). It is usually a good idea to make a note of which conventions you will be using (for example, italicising all expanded abbreviations) for both yourself and future readers of any publication that uses the transcribed text, so that consistency can be kept. When transcribing texts in a digital environment, you will want to be aware of and follow the standards set by the TEI: Text Encoding Initiative.


Medieval and early modern dates may be tricky, due to the change of calendars and different starts to the new year. For example, many medieval countries regarded the 25th of March (the Feast of the Annunciation) as the start of the New Year, rather than the current 1st of January. In addition, many documents are dated by the regnal year of the ruling monarch. Some sites can convert dates: see The Perpetual Calendar and Ian's English Calendar. As always, adopt a convention when transcribing dates and make it clear to any future readers whether you are transcribing the dates as written or converting them.
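As a rough illustration of the new-year problem described above, the following Python sketch converts a year written under a 25 March reckoning into the year as counted from 1 January. It assumes the style in which the year number changes on the 25 March that follows the modern new year (as in English and Florentine usage); other Annunciation styles, such as the Pisan, run a year ahead and would need a different rule. The function names are hypothetical, and the sketch attempts neither a Julian-to-Gregorian day shift nor any regnal-year conversion.

```python
def modern_year(day: int, month: int, year_as_written: int) -> int:
    """Return the year counted from 1 January for a date whose written
    year began on 25 March (assumed English/Florentine reckoning)."""
    if not 1 <= month <= 12 or not 1 <= day <= 31:
        raise ValueError("day or month out of range")
    # Dates from 1 January to 24 March still carry the old year number,
    # so they fall in the following year by modern reckoning.
    if (month, day) < (3, 25):
        return year_as_written + 1
    return year_as_written


def dual_year(day: int, month: int, year_as_written: int) -> str:
    """Format the year so that both reckonings stay visible to the reader."""
    converted = modern_year(day, month, year_as_written)
    if converted == year_as_written:
        return str(year_as_written)
    return f"{year_as_written}/{converted}"


if __name__ == "__main__":
    # A document dated 10 February 1502 under the 25 March reckoning.
    print(dual_year(10, 2, 1502))
    # A document dated 25 March 1502 needs no adjustment.
    print(dual_year(25, 3, 1502))
```

Keeping both years visible, as dual_year does, is one way of making the chosen dating convention explicit to later readers of the transcription.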

Non-Latin medieval scripts

When transcribing non-Latin letters onto the CENDARI Note Taking Environment (NTE), use Unicode characters if you do not have the font available. Apart from non-Latin alphabets (Armenian, Greek, etc.), Unicode characters may be needed for particular letters in some Latin alphabets. For example, in Old Norse or Old English, the characters thorn and eth can be transcribed using Unicode. The Medieval Unicode Initiative is focused on encoding special medieval characters.
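To make the point about thorn and eth concrete, the short Python sketch below lists the Unicode code point and official character name of every non-ASCII letter in a transcribed word. It uses only the standard unicodedata module; the helper name describe_special_letters and the sample word are illustrative assumptions, not part of the CENDARI platform.

```python
import unicodedata

THORN = "\u00fe"  # LATIN SMALL LETTER THORN
ETH = "\u00f0"    # LATIN SMALL LETTER ETH


def describe_special_letters(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each non-ASCII character."""
    # Normalise to NFC first so that a base letter plus a combining mark
    # is reported as one precomposed character where one exists.
    normalised = unicodedata.normalize("NFC", text)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in normalised
        if ord(ch) > 127
    ]


if __name__ == "__main__":
    # Sample Old English word written with escapes: thorn, ash, t.
    word = "\u00fe\u00e6t"
    for code_point, name in describe_special_letters(word):
        print(code_point, name)
```

Recording the code points in this way can help confirm that a special letter was entered as the intended character rather than as a visually similar one.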

Importing transcriptions and edited material onto the NTE

Apart from uploading images of manuscript material and transcribing directly onto the CENDARI NTE, it is also possible to import transcriptions. The user could import these transcriptions by a simple copy and paste, but online transcription software is also available that allows transcribed text to be imported onto the NTE as HTML.

T-Pen, a free web-based tool developed by St Louis University, allows the user to attach new or imported transcription data line-by-line to uploaded manuscript images. The transcribed HTML text can then be imported onto the NTE. XML tagging of elements of interest in the transcription is also possible. Transcript 2.5 is another online tool for digital transcription, although unlike T-Pen the transcription is not done line-by-line. Another tool that could be useful is IIIF - Mirador, an open-source web-based viewer which allows the user to display and compare images and annotate them.

Editing medieval manuscripts

After having transcribed the text, medieval scholars might have to collate a number of different manuscript witnesses of the same text. The edited text can then be uploaded to the NTE. Several tools are available for editing medieval manuscripts:

  • Tradamus is a free web application for creating digital critical editions, developed by the team behind T-Pen at St Louis University. Transcriptions can be imported in a variety of formats and can be fully TEI encoded. Witnesses can be annotated and collated, commentary and editorial material can be added, and multiple users can collaborate on the text. The final edition can be exported in multiple formats (XML, TXT and JSON).
  • Classical Text Editor is a licensed software application developed by the Austrian Academy of Sciences and the Corpus Scriptorum Ecclesiasticorum Latinorum. It helps the user create critical editions, commentaries and parallel texts, notes and apparatus, which can be exported in HTML format. Other available software for digitally editing manuscripts includes TextGrid and Oxygen XML Editor. For collating texts, CollateX and Juxta are also very suitable.
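To give a sense of what collating witnesses involves, here is a minimal Python sketch that compares two transcriptions word by word and lists the places where they differ. It relies on the standard difflib module as a stand-in and does not reproduce how the tools named above work; the function name collate and the sample readings are invented for illustration.

```python
from difflib import SequenceMatcher


def collate(base: str, witness: str) -> list[tuple[str, str]]:
    """Return (base reading, witness reading) pairs where the texts differ.

    An empty string on either side marks an omission or an addition.
    """
    base_words = base.split()
    witness_words = witness.split()
    matcher = SequenceMatcher(None, base_words, witness_words, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append(
                (" ".join(base_words[i1:i2]), " ".join(witness_words[j1:j2]))
            )
    return variants


if __name__ == "__main__":
    # Two invented witnesses of the same short passage.
    witness_a = "in principio erat verbum"
    witness_b = "in principio fuit verbum"
    for base_reading, variant_reading in collate(witness_a, witness_b):
        print(f"{base_reading!r} ] {variant_reading!r}")
```

A pairwise word comparison of this kind is only a starting point: it does not handle spelling normalisation, transpositions, or more than two witnesses at once.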


Abbreviations are very common in medieval and early modern handwriting. Abbreviations can be suspensions (e.g. quid[em]), contractions (e.g. m[ih]i), Tironian notae (e.g. ÷ for est) and nomina sacra (e.g. d[omi]n[u]s). In a semi-diplomatic transcription, the researcher would normally expand the abbreviated text. This is in contrast to a diplomatic transcription, which would transcribe the text exactly as it appears on the page. Abbreviations are conventionally italicised or underlined when expanded by the transcriber in a semi-diplomatic transcription.

  • The Ad fontes project is a crowdsourcing project that aims to digitise Cappelli’s Lexicon abbreviaturarum and make every abbreviation searchable. It will not be limited to alphabetical searches, and users will be able to filter their search results by identified letters or the position of abbreviation signs. The data will be freely available to all users.
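The bracket notation used in the abbreviation examples above (quid[em], d[omi]n[u]s) can be turned mechanically into other presentations. The Python sketch below is one possible approach: it assumes that square brackets in a working transcription mark only expanded letters, and it wraps them in the TEI ex element, which the TEI Guidelines provide for letters supplied by an editor when expanding an abbreviation. The function names are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Assumption: square brackets enclose expanded letters and nothing else.
_EXPANSION = re.compile(r"\[([^\[\]]+)\]")


def to_tei_ex(text: str) -> str:
    """Wrap each bracketed expansion in a TEI <ex> element."""
    parts = []
    position = 0
    for match in _EXPANSION.finditer(text):
        # Escape the plain text so that &, < and > remain valid XML.
        parts.append(escape(text[position:match.start()]))
        parts.append(f"<ex>{escape(match.group(1))}</ex>")
        position = match.end()
    parts.append(escape(text[position:]))
    return "".join(parts)


def to_expanded(text: str) -> str:
    """Drop the brackets, leaving the silently expanded word."""
    return _EXPANSION.sub(r"\1", text)


if __name__ == "__main__":
    for word in ("quid[em]", "m[ih]i", "d[omi]n[u]s"):
        print(word, "->", to_tei_ex(word), "->", to_expanded(word))
```

Keeping the bracketed form as the working copy means the same transcription can later be rendered with italics, underlining, or TEI markup, whichever convention has been adopted.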

Sources for Medieval Culture Archival Research Guides in CENDARI

From the author: The Dominican lectores were not just preachers or teachers. They also composed philosophical and theological treatises and pastoral works on confession, appropriate Christian conduct, and cases of conscience. Bartholomew of Pisa, known as Bartolomeo da San Concordio, wrote two works widely diffused during the Middle Ages: the Summa de casibus and the Libro degli ammaestramenti degli antichi. Iacopo Passavanti is the author of the Specchio di vera penitenza, an adaptation of his own sermons that was widely copied. Remigio de’ Girolami was a major lector and prior of the convent of Santa Maria Novella. As a philosopher he developed a doctrine on the bonum commune, the common good of the city. Several Dominican lectores played an important role in the cities, and contributed to the formation and consolidation of civic identities and a new social and political awareness.

  • ARG - Medieval Preaching (Martin Hlouch) - This ARG details available sources on medieval preaching in Central European libraries which are available on the Manuscriptorium digital library, with pages on individual preachers and their geographical remit and available manuscript sources.

From the author: A major problem when editing medieval sermons involves researching the sources. Catalogue records do not contain exact or complete information on preaching collections. Manuscript collections often do not include names of authors, sermon titles or indications of the sermons themselves. The identification of collections is based on the cataloguer’s knowledge, and a researcher often cannot easily find all the manuscripts containing sermons of interest. For example, Bishop Robert of Olomouc’s collection of sermons in a Prague National Library manuscript (ref. XX.A.11) contains the name of the author and the title of the collection: Opus Rudolperti Holomouciensis super epistolas. But the collection is divided into two parts, separated by a note written by a different hand. The cataloguer considered this text as two collections. Only after studying other manuscripts can it be recognized as a single collection.

  • ARG Science in Medieval Central European Sources (Tomáš Klimek, with contributions by Miroslava Hejnová & Tomislav Kolar) - This ARG offers an overview of medieval science (astronomy, mathematics, botany, zoology, medicine, geography, among others). Details of authors and texts are provided, along with references to relevant manuscripts in Central European libraries.

From the author: When dealing with medieval geographical works such as maps and texts containing geographical information, scholars have the challenge of analyzing, identifying, and editing local names. As with personal names, many local names had several variants. There are different language variants (both coexisting local variants and exonyms), and also synonymic versions in one language with different etymologies. Apart from personal names, some local names developed through time, and it is not always easy to identify the same locale under various different names. Another issue is caused by the existence of unknown localities. There can be a local name for an abandoned settlement whose position is only known approximately, which would have to be resolved by archaeology. Sometimes not even an approximate position is known. Another example is the fictional or symbolic names of nonexistent places known from the theological and literary tradition.

  • ARG Medieval Collections of Saints' Lives (Silvia Nocentini) - An ARG on medieval legendaries, with lists of extant manuscripts and bibliography.

From the author: A major problem when dealing with medieval collections of saints' lives is that they are almost all unedited and can only be read in manuscript form. The cataloguer's description of these manuscripts is crucial for the researcher, but unfortunately can often be ambiguous. A description such as “lives of saints” is insufficient to confirm whether the manuscript contains a legendary or only collected lives, what category of legendary it might be, which saints are included, and so on. Attention must also be given to the history of individual manuscripts, to add to our knowledge of how the collection was disseminated throughout European scriptoria.

  • ARG - Italian Vernacular Bibles (13th-15th c.) (Caterina Menichetti) - This ARG focuses on manuscripts and archival materials on Italian translations of the Latin Bible. Detailed codicological and palaeographical information is provided regarding materials, layout, dimensions, scripts and decoration.
  • ARG - Italian Books of Poetry (Irene Tani) - This ARG focuses on the manuscript tradition. Detailed codicological and palaeographic information is provided for the main manuscripts that preserve medieval Italian lyrics.

From the author: A manuscript's handwriting can tell us about the copyist, the period and the environment in which it was created. Some of the manuscripts that contain medieval Italian lyrics were written in littera textualis (also known as textura or Gothic script). This script developed in the second half of the 12th century and was usually used for texts rather than documents. Ms Vatican latin 3793 was written in a documentary merchant script, which was used by those involved in a mercantile environment, mostly for account books and documents. Secretary script developed in chancelleries from the beginning of the 13th century, and a century later was widely used in manuscripts containing vernacular lyrics (such as Ms El Escorial, e. III. 23 and Chigi L. VIII. 305).

  • ARG - Homiliaries (Lidia Buono and Eugenia Russo) - This ARG presents sources and project links for the study of homiliaries.

From the authors: The ARG Homiliaries covers all manuscripts in Beneventan and Caroline script dating from the 10th to the 12th century. Beneventan script takes its name from the ancient duchy of Benevento and remained in use for nearly five centuries in the monasteries and schools throughout Southern Italy, and across the Adriatic in Dalmatia. It is possible to distinguish four periods in the script's development, from the tentative period (8th c.) to its decline (13th c.). The main features of the Beneventan script are the cordellato effect, some ligatures (ti, li, ei, ri, fi, gi) and the shape of some letters, such as a and t. The most important study on the Beneventan script is E. A. Loew, The Beneventan Script. A History of the South Italian Minuscule, I-II, Rome: Edizioni di Storia e Letteratura, 1980 (2nd ed.). Caroline minuscule was used all over Europe c. 800-1200. It is named after the emperor Charlemagne, who created a hugely influential court school at Aachen. Caroline minuscule is a clear and easy to read script, in which letters are clearly separated. For more details, see Bernhard Bischoff, “Die karolingische Minuskel”. Karl der Grosse. Werk und Wirkung. Ausstellungskatalog, Aachen: Schwann, 1965.

  • ARG - Dispersed Monastic Sources (Roberta Giacomi, Vinicio Serafini) - This ARG focuses on surviving archival materials for Florentine monastic institutions. After an overview of the process of dissolution of these institutions and the subsequent dispersal of their archives, it is organised into sub-pages for each relevant institution, with details of extant sources.

From the authors: Working with primary sources was one of the major issues for this ARG. We focused on analysing information in data collections. A principal problem involved place names, as some of them are now different from the ones used in the sources that register the suppression. This made finding exact archival sources for a specific institution difficult. In addition, homonyms are common, so the data had to be accurately filtered. Another issue in some cases was the lack of digitised archival material. In these cases, we had to physically find the sources in question or at least find references to them. Last but not least, internet searching for sources was not easy, as searches generated many inaccurate results for events and places, which had to be filtered as well.


The original of this CENDARI Archival Research Guide is available here.

