TEI P5

Description of TEI
TEI stands for Text Encoding Initiative, a consortium that collaboratively develops a standard format for the representation of texts in digital form. Its guidelines specify encoding methods for machine-readable texts and are widely used by libraries, museums, publishers and individual scholars to present texts for online research, teaching and preservation. The most recent version of the guidelines is TEI P5.

Because the TEI tagset is generic while projects and user groups have specific needs, the guidelines can be customised to meet the requirements of particular use cases. Several well-documented TEI customisations are widely used by the community. One of the most popular is TEI Lite, a subset that is simple and relatively easy to learn, which is why it has been widely adopted. The current version of TEI Lite is derived from the P5 version of the guidelines. Other customisations provided by the consortium include TEI Tite, Bare, All, Corpus, MS, Drama and Speech. All schemas are available as ODD, DTD, RNG and XSD.

Staatsbibliothek zu Berlin/Berlin-Brandenburg Academy of Sciences - Deutsches Textarchiv
The German Text Archive (Deutsches Textarchiv, DTA) of the Berlin-Brandenburg Academy of Sciences provides over 1300 manuscripts published from the 17th to the 20th century. All full texts of these manuscripts are encoded as TEI-XML.

The analysis and the transformation of the TEI-XML documents have also been carried out by the Berlin State Library (SBB).

Download links
SBB BBAW-DTAMapping.xls

University of Bergen
The Wittgenstein Archives at the University of Bergen (WAB) contributes primary sources and metadata for 5000 pages of the Wittgenstein Nachlass kept at the Wren Library of Trinity College Cambridge (TCC).

The 5000 pages include:
(1) Ts-201a1 and Ts-201a2 (1913-14) from the "Notes on Logic" corpus. This contains the first expression of Wittgenstein's view of philosophy and specific material on logic.
(2) Ms-139a and Ts-207 (1929) from the "Lecture on Ethics" corpus. The "Lecture on Ethics" is the only "popular" lecture Wittgenstein ever gave. It is a masterpiece expressing his early views on the tension between meaningful language and the nonsensical language we fall into when speaking about ethics.
(3) Ms-114, Ms-115 (first part), Ms-148, Ms-149, Ms-150, Ms-153a, Ms-153b, Ms-154, Ms-155, Ms-156a, Ts-212 and Ts-213 (1931-34) from the "Big Typescript" corpus. The Big Typescript is the great "summa" of the thoughts Wittgenstein elaborated from 1929 to 1933. It addresses most of the philosophical subjects Wittgenstein was ever interested in.
(4) Ms-115 (second part), Ms-140 (p. 39v), Ms-141, Ms-152 and Ts-310 (1934-36) from the "Brown Book" corpus. The "Brown Book" was first presented to students in Cambridge and is therefore laid out very pedagogically. One aspect is the introduction of "language games", which shed light on the complexity of our language. The book was revised several times; the last revision led to the Philosophical Investigations.

In terms of philosophical development, the 5000 pages cover four high points in Wittgenstein's thinking from 1913 to 1936. In terms of philosophical themes, they encompass all themes addressed by Wittgenstein, including philosophy of language, logic, mathematics, psychology, ethics and metaphilosophy. In terms of type of Nachlass material, they include all kinds of manuscripts written by Wittgenstein: first drafts, lecture notes, notebooks, copybooks and typescript cuttings, as well as elaborated materials such as typescripts and materials prepared in cooperation with others. In terms of language, they contain both German and English materials.

In response to the metadata requirements defined for the DM2E project, WAB improved and supplemented its TEI P5 XML encoding of the 5000 pages' transcriptions both on the  and on the element levels. Revisions include, in the  element, for example standardisation and improvements of the element and addition of  elements; revisions in the element include standardisation and improvements of the attributes recording a CHO's dating or its references to persons and works. For the Wittgenstein Incubator project, WAB added a significant amount of encoding of intertextual reference and text genesis.

For DM2E ingestion, the metadata are extracted by an XSLT stylesheet from the XML transcriptions and mapped to the DM2E v1.1, revision 1, metadata model.
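The extract-and-map step described above is done with XSLT in the source workflow. As a rough illustration of the same idea, the following Python sketch uses only the standard library. The sample TEI fragment, the function name extract_metadata and the target property names (dc:title, dc:creator) are assumptions made for this example; they are not taken from the WAB stylesheet and are not guaranteed to match the DM2E v1.1 model.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace, bound to the prefix "tei" for path lookups.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified transcription file (illustration only).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample item</title>
        <author>Sample author</author>
      </titleStmt>
      <publicationStmt><p>Sample only</p></publicationStmt>
      <sourceDesc><p>Sample only</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Sample text</p></body></text>
</TEI>"""

# Assumed mapping from target property to a path inside the TEI header.
FIELD_MAP = {
    "dc:title": "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
    "dc:creator": "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:author",
}


def extract_metadata(tei_xml):
    """Return a dict of target properties found in one TEI document."""
    root = ET.fromstring(tei_xml)
    record = {}
    for prop, path in FIELD_MAP.items():
        node = root.find(path, TEI_NS)
        # Skip properties whose source element is missing or empty.
        if node is not None and node.text:
            record[prop] = node.text.strip()
    return record


if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
```

Keeping the mapping in a single table (FIELD_MAP here) mirrors how a stylesheet keeps one template per target property, so a model revision only requires changes in one place.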

Humboldt-Universität zu Berlin
The Humboldt-Universität zu Berlin (UBER) provides the digitised edition of the “Polytechnisches Journal”. The journal was first published by the German chemist and industrialist J. G. Dingler and is therefore often simply referred to as “Dingler”. The schema used for the Dingler records is TEI without any extensions: no elements have been added to the schema, and all classes used come from the full TEI set. The logical description of the records follows the recommendations of the Guidelines. The only customisations were the exclusion of unused elements and the definition of restricted value lists for some elements.
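In the Dingler setup, restricted value lists are enforced by the customised schema itself. The following Python sketch only illustrates what such a restriction means in practice, as a plain closed-list check. The choice of the div element, its type attribute and the three allowed values are hypothetical and are not taken from the Dingler schema.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace in ElementTree's "{uri}" notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical closed value list for the type attribute of div.
ALLOWED_DIV_TYPES = {"volume", "issue", "article"}


def invalid_div_types(tei_xml):
    """Return every div/@type value that is not in the closed list."""
    root = ET.fromstring(tei_xml)
    bad = []
    for div in root.iter(TEI + "div"):
        value = div.get("type")
        # A missing attribute is not treated as a violation in this sketch.
        if value is not None and value not in ALLOWED_DIV_TYPES:
            bad.append(value)
    return bad


if __name__ == "__main__":
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
        '<div type="article"/><div type="poem"/>'
        "</body></text></TEI>"
    )
    print(invalid_div_types(sample))
```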

For the final version of the mapping to the DM2E model, DM2E received local copies of the most recently modified TEI-XML metadata records of the complete journal at volume and article level. The mappings were created and tested with these records. The current mapping is based on the first test mappings, which were carried out using the “DM2E v1.0 Fixed Ranges” schema in MINT. Two different ore:Aggregation and edm:ProvidedCHO classes were created: one for a journal issue, another for a journal article. After the first mapping cycle with MINT, which already covered about two-thirds of the first mapping, further mapping steps were carried out by manually editing the MINT output (supported by the Oxygen editor). This was done mainly for readability (the output file was split into separate files for the creation of journal issues and articles), to reduce redundant steps in the mapping workflow (the URIs of all classes were defined as variables instead of being typed repeatedly) and to include steps that could not be carried out in MINT (e.g. normalising URIs or creating titles for smaller CHOs). Furthermore, the mappings were first created for the DM2E model v1.0 and then manually adapted to DM2E v1.1, revision 1.0. Doing this step by hand was much easier and faster than repeating the whole mapping in MINT.
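The two manual steps mentioned above, defining class URIs once instead of typing them repeatedly and normalising URIs, can be pictured with a short Python sketch. The actual work was done in the MINT output, not in Python. The base address, the URI pattern and the function names normalise_segment and build_uris are hypothetical and do not reproduce the DM2E URI scheme.

```python
import re
from urllib.parse import quote

# Placeholder base address, defined once and reused for every URI.
BASE = "http://example.org/dingler"


def normalise_segment(raw):
    """Lower-case, trim, replace whitespace runs and percent-encode the rest."""
    segment = raw.strip().lower()
    segment = re.sub(r"\s+", "_", segment)
    return quote(segment, safe="_-.")


def build_uris(kind, identifier):
    """Build the URI pair for one record; kind is 'issue' or 'article'."""
    if kind not in ("issue", "article"):
        raise ValueError("kind must be 'issue' or 'article'")
    segment = normalise_segment(identifier)
    return {
        "ore:Aggregation": f"{BASE}/aggregation/{kind}/{segment}",
        "edm:ProvidedCHO": f"{BASE}/item/{kind}/{segment}",
    }


if __name__ == "__main__":
    print(build_uris("article", " Ar 001 042 "))
```

Deriving both URIs from one normalised segment keeps the aggregation and the CHO of a record in step, which is the same benefit the source attributes to using variables in the mapping.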