Texts and Transcriptions: mapping scribal complexities onto a line of text.
Edward Vanhoutte
evanhoutte@kantl.be / edward.vanhoutte@ua.ac.be
The transcription of modern manuscript material is the core activity of a couple of newly initiated electronic editing projects at the Centre for Scholarly Editing and Document Studies (Centrum voor Teksteditie en Bronnenstudie) of the Royal Academy of Dutch Language and Literature in Ghent, Belgium (Koninklijke Academie voor Nederlandse Taal- en Letterkunde). For these projects to result in the publication of versioning editions representing multiple texts (Reiman 1987) - in facsimile as well as in machine-readable form, in concordances, stemmata, lists of variants, etc. - transcriptions of all the extant material have to be made in a platform-independent and non-proprietary markup language which can deal with the linguistic and the bibliographic text of a work and which can guarantee maximal accessibility, longevity and intellectual integrity (Sperberg-McQueen 1994 & 1996: 41). The encoding schemes proposed by the TEI Guidelines for Electronic Text Encoding and Interchange (Sperberg-McQueen & Burnard 1994) have generally been accepted as the most promising solution to this problem. The transcription of primary source material using TEI enables automatic collation, stemma (re)construction, the creation of (cumulative) indexes and concordances, etc. by the computer.
Although the TEI subsets for the transcription of primary source material "have not proved entirely satisfactorily" for a number of problems (Driscoll 2000), transcription and digitization guidelines for older texts can be produced on the basis of the TEI encoding scheme (Robinson 1994, and Robinson & Solopova 1993). The transcription of modern manuscript material using TEI proves more problematic because of at least two essential characteristics of such complex source material: the notions of time and of overlapping hierarchies. Since SGML (and thus XML) was devised on the assumption that a document is a logical construct containing one or more trees of elements that make up the document's content (Goldfarb 1995: 18), several scholars began to theorize about the assumption that text is an ordered hierarchy of content objects (OHCO thesis), which always nest properly and never overlap,[1] and about the difficulties attached to this claim.[2]
The TEI Guidelines propose five possible methods to handle non-nesting information,[3] but state that
"Non-nesting information poses fundamental problems for any encoding scheme, and it must be stated at the outset that no solution has yet been suggested which combines all the desirable attributes of formal simplicity, capacity to represent all occurring or imaginable kinds of structures, suitability for formal or mechanical validation, and clear identity with the notations needed for simpler cases (i.e. cases where the textual features do nest properly). The representation of non-hierarchical information is thus necessarily a matter of choices among alternatives, of tradeoffs between various sets of different advantages and disadvantages." (chapter 31 "Multiple Hierarchies" of the TEI Guidelines)
The editor using an encoding scheme to transfer any feature of a modern manuscript text to a machine-readable format is essentially confronted with the dynamic concept of time, which constitutes non-hierarchical information. The simple representation of a (printed) prose text can be thought of as a logical tree of hierarchical and structural elements such as book, part, chapter, and paragraph, and an alternative tree of hierarchical and physical elements such as volume, page, column, and line (structures which can be applied to the vast majority of printed texts and medieval manuscripts). The modern manuscript, by contrast, shows a much more complicated web of interwoven and overlapping relationships of elements and structures.
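The clash between the logical and the physical tree can be made concrete with a small sketch. The Python below is purely illustrative and is not part of any TEI tooling: the `Span` type, the character offsets, and the function name are assumptions chosen for the example. It tests whether two elements intersect without one containing the other, which is the configuration a single tree of properly nested elements cannot express.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A hypothetical element, located by character offsets in the text."""
    name: str
    start: int  # offset of the first character
    end: int    # offset one past the last character


def overlaps_improperly(a: Span, b: Span) -> bool:
    """Return True when a and b intersect but neither contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


# Invented offsets: a paragraph that runs on past the end of its page.
paragraph = Span("paragraph", 40, 120)
page = Span("page", 0, 80)
line = Span("line", 40, 60)

print(overlaps_improperly(paragraph, page))  # True: the two trees collide
print(overlaps_improperly(paragraph, line))  # False: the line nests inside
```

A paragraph and a page that overlap in this way can only be encoded together by one of the workarounds for non-nesting information, for instance by reducing one of the two elements to an empty milestone.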
Modern manuscripts, as Almuth Grésillon defines them, are "manuscrits qui font partie d'une genèse textuelle attestée par plusieurs témoins successifs et qui manifestent le travail d'écriture d'un auteur." ["manuscripts which are part of a textual genesis for which many consecutive witnesses give evidence, and which are the manifestation of the author's labour of writing." (my translation).] (Grésillon 1994: 244) The French school of Critique Génétique primarily deals with modern manuscripts, and its primary aim is to study the avant-texte, not so much as the basis to set out editorial principles for textual representation, but as a means to understand the genesis of the literary work or, as Daniel Ferrer put it: "it does not aim to reconstitute the optimal text of a work; rather, it aims to reconstitute the writing process which resulted in the work, based on surviving traces, which are primarily author's draft manuscripts" (Ferrer 1995: 143). Therefore, the structural unit of a modern manuscript is not the paragraph, nor the page or the chapter, but the temporal unit of writing. These units form a complex network which is often not bound to the chronology of the page.
The application of hypertext technology and the possibility to display digital facsimiles in establishing electronic dossiers génétiques provide the editor with a multiplicity of ways in which s/he can regroup a series of documents which are akin to each other on the basis of resemblance or difference. The experiments with proprietary software systems (Hypercard, Toolbook, Macromedia, PDF, etc.), however, are too much oriented towards display, and often do not comply with the rule of "no digitization without transcription" (Robinson 1997).
Further, the TEI solutions for the transcription of primary source material do not cater for modern manuscripts because the current (P4) and previous versions of the TEI have never addressed the encoding of the time factor in text. Since a writing process by definition takes place in time, four central complications may arise in connection with modern manuscripts and should thus be catered for in an encoding scheme for the transcription of modern primary source material. The complications are the following:
- Its beginning and end may be hard to determine and its internal composition difficult to define (document structure vs. unit of writing): authors frequently interrupt writing, leave sentences unfinished and so on.
- Manuscripts frequently contain items such as scriptorial pauses which have immense importance in the analysis of the genesis of a text.
- Even non-verbal elements such as sketches, drawings, or doodles may be regarded as forming a component of the writing process for some analytical purposes.
- Below the level of the chronological act of writing, manuscripts may be segmented into units defined by thematic, syntactic, stylistic, etc. phenomena; no clear agreement exists, however, even as to the appropriate names for such segments.
These four complications are exactly the ones the TEI Guidelines cite when trying to define the complexity of speech, emphasizing that "Unlike a written text, a speech event takes place in time." (Sperberg-McQueen and Burnard 2001: 254). This may suggest that the markup solutions employed in the transcription of speech could prove useful for the transcription of modern manuscripts, in particular the chapter in the TEI Guidelines on Linking, Segmentation, and Alignment (esp. 14.5. Synchronization).
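How a speech-style timeline might separate the order of writing from the order on the page can be sketched as follows. The markup in `SAMPLE` is a hypothetical, simplified adaptation of the `<timeline>`, `<when>` and `synch` mechanism to a manuscript; it has not been validated against the TEI DTD, and the sentence, the identifiers and the function name are invented for illustration. The Python reads the segments and returns them in the order in which they were written, which here differs from their order in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup loosely modelled on the TEI <timeline>,
# <when> and synch mechanism; not validated against the TEI DTD.
# The author first wrote "The house was old." and later inserted "very".
SAMPLE = """
<div>
  <timeline id="TL1" origin="w1">
    <when id="w1"/>
    <when id="w2"/>
    <when id="w3"/>
  </timeline>
  <p>
    <seg synch="w1">The house was</seg>
    <seg synch="w3">very</seg>
    <seg synch="w2">old.</seg>
  </p>
</div>
"""


def segments_in_writing_order(xml_text):
    """Return the text of each <seg>, sorted by its point on the timeline."""
    root = ET.fromstring(xml_text.strip())
    # The rank of a point in time is its position within the <timeline>.
    rank = {when.get("id"): i for i, when in enumerate(root.iter("when"))}
    segs = list(root.iter("seg"))
    segs.sort(key=lambda seg: rank[seg.get("synch")])
    return [seg.text.strip() for seg in segs]


print(segments_in_writing_order(SAMPLE))
# ['The house was', 'old.', 'very']
```

The document order of the segments still yields the reading text, while the timeline yields the sequence of writing acts, so both views are recoverable from a single transcription.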
This paper will deal with the practicalities underlying the production of electronic scholarly editions and will report on the results of a research stay at the Wittgenstein Archives in Bergen with an EU Research Infrastructure grant in June-July 2002.
Notes
- 1. Annex C of ISO 8879 introduces the optional CONCUR feature (not available in XML) which "supports multiple concurrent structural views in addition to the abstract view. It allows the user to associate element, entity, and notation declarations with particular document type names, via multiple document type declarations." (Goldfarb 1995: 89).
- 2. See on overlap-related problems: Barnard et al. 1988; Barnard et al. 1995; DeRose et al. 1990; Durand et al. 1996; Huitfeldt 1995; Renear et al. 1996; Sperberg-McQueen & Huitfeldt 1999; Sperberg-McQueen & Huitfeldt s.d., and chapter 31 "Multiple Hierarchies" of the TEI Guidelines.
- 3. The suggested methods are CONCUR, milestone elements, fragmentation of an element, virtual joins, and redundant encoding of information in multiple forms. Cf. Chapter 31 "Multiple Hierarchies" of the TEI Guidelines. The Rossetti Archive based in Virginia and the Wittgenstein Archives at Bergen created their own encoding systems (RAD, Rossetti Archive Document, and MECS, Multi Element Code System, respectively) out of dissatisfaction with the operationality of the options suggested by TEI. Cf. McGann 2001, 88-97, and "Text Encoding at the Wittgenstein Archives" <http://www.hit.uib.no/wab/1990-99/textencod.htm>
Literature
- Centrum voor Teksteditie en Bronnenstudie. Website. <http://www.kantl.be/ctb/>
- Barnard, D.T., R. Hayter, M. Karababa, G. Logan, and J. McFadden (1988). "SGML-Based Markup for Literary Texts: Two Problems and Some Solutions." In: Computers and the Humanities, 22 (1988): 265-276.
- Barnard, D.T., L. Burnard, J.-P. Gaspart, L.A. Price, C.M. Sperberg-McQueen, and G.B. Varile (1995). "Hierarchical Encoding of Text: Technical Problems and SGML solutions." In: Computers and the Humanities, 29: 211-231.
- DeRose, Steven J., D.G. Durand, E. Mylonas, and A. Renear (1990). "What is Text, Really?" In: Journal of Computing in Higher Education, 1 (1990): 3-26.
- Driscoll, M.J. (2000). "Encoding Old Norse/Icelandic Primary Sources using TEI-Conformant SGML." In: Literary and Linguistic Computing, 15/1: 81-91.
- Durand, David, Elli Mylonas, and Steve DeRose (1996). "What should markup really be? Applying theories of text to the design of markup systems." ALLC/ACH'96 Joint Conference of the ALLC and ACH, Bergen 1996.
- Ferrer, Daniel (1995). "Hypertextual Representation of Literary Working Papers." In: Literary and Linguistic Computing, 10/2: 143-145.
- Huitfeldt, C. (1995). "Multi-Dimensional Texts in a One Dimensional Medium." In: Computers and the Humanities, 28 (1995): 235-241.
- Grésillon, Almuth (1994). Eléments de critique génétique. Lire les manuscrits modernes. Paris: Presses Universitaires de France.
- Reiman, Donald H. (1987). Romantic Texts and Contexts. Columbia: University of Missouri Press. (Chapter 10: "'Versioning': The Presentation of Multiple Texts.", 167-180).
- Renear, A., D. Durand, and E. Mylonas (1996). "Refining our Notion of What Text Really Is." In: S. Hockey and N. Ide (eds.). Research in Humanities Computing 4: Selected Papers from the 1992 ALLC/ACH Conference. Oxford: OUP. 263-280.
- Robinson, P.M.W. (ed.) (1996). The Wife of Bath's Prologue on CD-ROM. Cambridge: Cambridge University Press.
- Robinson, Peter M.W. (1997). "New Directions in Critical Editing." Kathryn Sutherland (ed.). Electronic Text. Investigations in Method and Theory. Oxford: Clarendon Press, 145-171.
- Sperberg-McQueen, C. M. (1991). "Text in the Electronic Age: Textual Study and Text Encoding, with Examples from Medieval Texts." In: Literary and Linguistic Computing, 6/1: 34-46.
- Sperberg-McQueen, C. M. (1994). "Textual Criticism and the Text Encoding Initiative." Paper presented at MLA '94, San Diego, 1994. Accessed on March 15, 2002. <http://www.tei-c.org/Vault/XX/mla94.html>
- Sperberg-McQueen, C. M. (1996). "Textual Criticism and the Text Encoding Initiative." Finneran, Richard, J. (ed.) (1996). The Literary Text in the Digital Age. Ann Arbor: The University of Michigan Press, 37-61.
- Sperberg-McQueen, C. M. and Lou Burnard (eds.) (1994). Guidelines for Electronic Text Encoding and Interchange. (TEI P3). Chicago and Oxford: Text Encoding Initiative.
- Sperberg-McQueen, C. M. and Lou Burnard (eds.) (2001). TEI P4 Guidelines for Electronic Text Encoding and Interchange. XML-compatible edition. Oxford, Providence, Charlottesville, and Bergen: The TEI Consortium.
- Sperberg-McQueen, C.M., and Claus Huitfeldt. (1999). "Concurrent Document Hierarchies in MECS and SGML." In: Literary and Linguistic Computing, 14 (1999): 29-42.
- Sperberg-McQueen, C.M., and Claus Huitfeldt (s.d.). "GODDAG: A Data Structure for Overlapping Hierarchies." Accessed on March 15, 2002. <http://www.hit.uib.no/claus/goddag.html>
© Edward Vanhoutte, 12 June 2002.
This is the abstract for a paper to be presented at ALLC/ACH 02. Tübingen: University of Tübingen, 25 July 2002.
XHTML author: Edward Vanhoutte
Last revision: 27/11/2002