Reading Assignments
List of useful resources for Document Engineering and text encoding
Nonhierarchical markup and textual phenomena
Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies, by Renear, Mylonas, and Durand. This is a first paper on the problems of non-hierarchical texts. There was a follow-up: What Should Markup Really Be? Applying Theories of Text to the Design of Markup Systems, by Durand, DeRose, and Mylonas. Robin Cover has created a nice abstract page covering many of the papers in this area. Please pick one other paper from that page and be prepared to discuss it in class on Friday. The two papers above are assigned; they lay out a problem space and some possible solutions. Since then a moderate amount of work has been done. The two efforts that I think are most significant are GODDAG and LMNL.
Web Information harvesting
This journal paper by Jon Kleinberg presents a very elegant algorithm for finding authoritative sources of information on the web. You may want to start by reading the Hypertext conference paper on the same work.
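As a warm-up before reading, here is a minimal sketch of the mutual-reinforcement idea, assuming the paper meant is Kleinberg's hubs-and-authorities work: a good hub links to good authorities, and a good authority is linked to by good hubs. The function names (`hits`, `normalize`), the iteration count, and the toy graph are illustrative assumptions, not taken from the paper, and the sketch leaves out how the paper selects the subgraph of pages to score.

```python
from math import sqrt


def normalize(scores):
    """Scale scores so their squares sum to 1 (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a small link graph.

    links: dict mapping each page to the list of pages it links to.
    This is a simplified illustration, not the paper's full procedure.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to it.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub score: sum of the authority scores of the pages it links to.
        new_hub = {
            page: sum(new_auth[dst] for dst in links.get(page, ()))
            for page in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy graph: three pages all point at "x"; "a" also points at "y".
    hub, auth = hits({"a": ["x", "y"], "b": ["x"], "c": ["x"]})
    print("best authority:", max(auth, key=auth.get))
    print("best hub:", max(hub, key=hub.get))
```

In the toy graph, "x" collects links from every hub, so it ends up as the strongest authority, while "a" links to the most authoritative pages and ends up as the strongest hub.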
XML Schema languages
This week's reading is a survey paper on XML schema languages by Makoto Murata. This paper is pretty substantial, so on Friday, depending on feedback from the class, I can be prepared to walk you through some of the theory, so that we can have an informed discussion on Monday. Because the paper is widely available on the web, there are no redistribution issues, and this link points to a copy stored directly on the course web site. I'd like to remind everyone that you need to write 6 paper responses before the end of the semester, and people are starting to fall behind.
Page layout
This week we will read about the Lout formatting system while we learn about XSL FO and its approach to specifying text formatting. There are two Lout papers. The first is a short overview of the features of the system. The second is a much fuller and longer description of the details of the system and their implementation. Read the whole of the first paper. When reading the second, skim or skip section 2.4, which details the concatenation of objects and the data structures used. Also skim or skip section 5.3. The idea in reading the second paper is to get an overview of the major algorithms and approaches used in Lout, and their limitations, not to learn every detail.
XSL FO and CSS
You should read chapters 12 and 13 in the Nutshell book.
HTTP: readings
In addition to Nelson (for historical interest), you should read the following (less historic) specifications from the WWW. HTTP is basic to the web, but is frequently misunderstood. Reading these will give you the foundation you need to understand the remaining issues that the final specifications deal with. Notice the expansion factor in the size of these specifications; that is why you are not being asked to read HTTP 1.1. The overall expansion rate continued to be quite considerable.
Hypertext in XML
You should read Chapters 10 and 11 of Nutshell. You may want to review Chapter 9 on XPath, if XPointer starts to seem confusing.
Text Formatting
This week we are going to start discussing text formatting by considering an older system that explores a direction that has not become very popular. Kenneth Brooks's Lilac text editor (written at the end of the 1980s) was an attempt to combine the relatively new model of text formatting introduced by Knuth's TeX program with the WYSIWYG paradigm of word processing that was then in its infancy. Along the way, Brooks more or less rediscovered hierarchical content markup, albeit with some prior exposure to the ideas of Scribe. The journal article about Lilac is in the library at Brown, but it is not available in machine-readable form. You can download the doctoral dissertation with the complete report of the work (but I would not recommend printing it all out). The first four chapters of the dissertation are pretty easy reading, and cover most of the material that is still of interest in Lilac.
XSLT and XPath
XML in a Nutshell: Chapters 8 and 9.
Implementing transformation languages
These articles discuss some of the algorithmic issues in implementing XSLT processing applications, and especially in building efficient XSLT implementations. You may want to think about how realistic the models in these papers are, and how the notion of the best efficiency tradeoff varies with the problem being solved.
Noga et al., "Lazy XML Processing"
Villard, Layaïda, "An Incremental XSLT Transformation Processor for XML Document Manipulation"
DTDs and document analysis
There's a new homepage design for the course and a much easier update mechanism for me to get new stuff in. Currently it focuses on the
Content Markup
Coombs, Renear, DeRose. "Markup Systems and the Future of Scholarly Text Processing"
DeRose, Durand, Mylonas, Renear. "What is Text, Really?"
Comments and responses on "What is Text, Really?"