Course News


List of useful resources for Document Engineering and text encoding

Posted: 5 Feb


FRESS data publishing
:

The FRESS system was one of the earliest hypertext systems ever created, preceded only by HES, also done at Brown, and Engelbart's NLS system. Among the early projects done in FRESS was the teaching of a poetry course. At that time, the idea of teaching English poetry with the assistance of computer software was very radical.

I have the entire database of that course, and a number of other early FRESS resources, converted to SGML from the binary IBM mainframe format that FRESS used.

The final project would be to complete the data conversion process that I started some years ago. There are good tools available for converting SGML to XML, but there is also embedded formatting markup in the documents that has not yet been converted to XML.

Once the files have been fully converted to XML, writing stylesheets to publish the data on the web, and (perhaps) to publish it in a print form (PDF via XSL-FO?), would complete the project.

While this project is in some senses straightforward, there are a number of tricky aspects to it; there will certainly be a real audience for the results of the conversion once it's complete.


processing
:

These project ideas consider a variety of problems in markup or document processing.


data
:

These projects all revolve around the management and publication of document data: either the creation of tools or systems, or the processing of a significant real-world data set that presents interesting problems.


Collaborative editing slides
:

These PowerPoint slides cover the material on collaborative editing and Palimpsest that we talked about in class.


Archival
:


Final Projects
Update:
Due: 9 May

Don't forget to make an appointment with me to deliver and demonstrate your final projects on Monday the 12th.


Nonhierarchical markup and textual phenomena
Reading Assignment:
Due: 18 Apr

Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies, Renear, Mylonas, Durand. This is a first paper on the problems of non-hierarchical texts.

There was a followup:

What should markup really be? Applying theories of text to the design of markup systems.

Durand, DeRose, Mylonas

Robin Cover has created a nice abstract page of many of the papers in this area. Please pick one other paper from that page and be prepared to discuss it in class on Friday.

These two papers are assigned and they lay out a problem space and some possible solutions.

Since then, a moderate amount of work has been done. The two efforts that I think are most significant are GODDAG and LMNL.
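
To make the overlap problem concrete before reading, here is a minimal sketch in Python. It assumes each structural layer (for example verse lines and sentences) is recorded as named half-open character ranges over the same text. The layer names, offsets, and the helpers `overlaps` and `conflicts` are hypothetical illustrations and are not taken from the papers. Two ranges that intersect without one containing the other cannot both be elements of a single tree, which is the situation the readings address.

```python
def overlaps(a, b):
    """Return True if half-open ranges a and b intersect but neither contains the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    intersect = a_start < b_end and b_start < a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return intersect and not (a_in_b or b_in_a)


def conflicts(layer_a, layer_b):
    """List the pairs of ranges (one from each layer) that cannot nest in one hierarchy."""
    return [
        (name_a, name_b)
        for name_a, range_a in layer_a.items()
        for name_b, range_b in layer_b.items()
        if overlaps(range_a, range_b)
    ]


# Hypothetical offsets: the second sentence starts in the middle of the first verse line.
lines = {"line1": (0, 20), "line2": (20, 40)}
sentences = {"s1": (0, 12), "s2": (12, 40)}

print(conflicts(sentences, lines))
```

In this toy data only the pair `s2` and `line1` conflicts, since `s1` nests inside `line1` and `line2` nests inside `s2`.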


Web Information harvesting
Reading Assignment:
Due: 14 Apr

This journal paper by Jon Kleinberg presents a very elegant algorithm for finding authoritative sources of information on the web. You may want to start by reading the Hypertext conference paper on the same work.
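
As a preview of the reading, here is a minimal Python sketch of the hub and authority iteration at the heart of Kleinberg's algorithm (commonly known as HITS). It is a simplification under stated assumptions: the function name `hits`, the fixed iteration count, and the toy link graph are illustrative only, and the sketch omits the step in which a query-focused subgraph of the web is built before the iteration runs.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores for a directed graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]

        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}

        # Normalize both score vectors to unit length so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm

    return hub, auth


# Toy graph (an assumption for illustration): a links to c and d, b links to c.
toy_links = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
hub_scores, auth_scores = hits(toy_links)
print(max(auth_scores, key=auth_scores.get), max(hub_scores, key=hub_scores.get))
```

In the toy graph, page c is linked to by both a and b, so it ends up with the highest authority score, while a, which links to both c and d, ends up as the strongest hub.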


Project definition appointments
Update:
Due: 11 Apr

I would like to meet with each of you to define your final project. This will be the first of several meetings, in which we will refine the idea, check progress and difficulties, and revise specifications as necessary.

Please send me mail to make an appointment.


Programming assignments
:
Due: 15 Feb

This is some test text.