In this project, our objective
was to develop a client-side tool that lets the user get continuous
and timely information from the web about what he is interested in, minimizing
the time the user spends browsing web pages. The data source
should be a well-structured site whose content changes rapidly (we
chose CNN.com). Although the query language itself is generic,
we need to manually build site trees for the web sites we want to apply
it to, and to specify some functions on the leaves of these trees to
allow finer-grained information retrieval. Such ``half-automatic'' extraction
of structure and information from HTML pages is admittedly not very elegant,
but it seems that the only way to make it fully automatic is to store
data on the web in XML.
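As an illustration only, a hand-built site tree with functions attached to its leaves might be sketched as follows. This is not the project's actual implementation: the `SiteNode` class, the `temperature` extractor, the placeholder URL, and the page layout in `sample_page` are all hypothetical names and data invented for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SiteNode:
    """One node of a manually built site tree (hypothetical structure)."""
    name: str
    url: Optional[str] = None  # set on leaves only
    children: Dict[str, "SiteNode"] = field(default_factory=dict)
    # Leaf functions: name -> callable(page_text, *args) -> str
    functions: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def add(self, child: "SiteNode") -> "SiteNode":
        self.children[child.name] = child
        return child

    def resolve(self, path: str) -> "SiteNode":
        """Walk a path such as /weather/forecast/Boston down from this node."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node


def temperature(page: str, day: str, kind: str, unit: str) -> str:
    """Site-specific extractor, written by hand for an invented page layout."""
    pattern = rf"{re.escape(day)}\s+{re.escape(kind)}\s+(-?\d+)\s*{re.escape(unit)}"
    match = re.search(pattern, page)
    if match is None:
        raise ValueError("pattern not found; the page layout may have changed")
    return match.group(1)


# Build the tree by hand, as the text describes.
root = SiteNode("root")
weather = root.add(SiteNode("weather"))
forecast = weather.add(SiteNode("forecast"))
boston = forecast.add(SiteNode(
    "Boston",
    url="http://example.invalid/weather/boston",  # placeholder, not a real URL
    functions={"temperature": temperature},
))

# Canned text standing in for a fetched HTML page.
sample_page = "day1 high 48 F\nday2 high 57 F\nday2 low 41 F"
leaf = root.resolve("/weather/forecast/Boston")
high = leaf.functions["temperature"](sample_page, "day2", "high", "F")
```

The extractor is the fragile, site-specific part: it has to be rewritten whenever the page layout changes, which is the inelegance noted above.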
The user specifies his interests through queries such as:
FROM  /cnn/sports/baseball/schedules/* A
      /weather/forecast/Boston B
WHERE A.city("home", "May 11") = "Boston" AND
      B.temperature("day2", "high", "F") > "50"
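One possible reading of how such a query could be evaluated is sketched below. It is a minimal illustration under stated assumptions, not the project's evaluator. The `LEAVES` table, the team path names, and the helpers `call`, `bind`, and `run_query` are invented. The sketch also assumes that `*` matches any leaf under the given path and that the temperature comparison is numeric.

```python
from fnmatch import fnmatchcase
from itertools import product

# Invented data: leaf path -> {function name -> {argument tuple -> value}}.
# In a real system these values would be extracted from fetched pages.
LEAVES = {
    "/cnn/sports/baseball/schedules/teamA": {
        "city": {("home", "May 11"): "Boston"}},
    "/cnn/sports/baseball/schedules/teamB": {
        "city": {("home", "May 11"): "Chicago"}},
    "/weather/forecast/Boston": {
        "temperature": {("day2", "high", "F"): "57"}},
}


def call(path, function, *args):
    """Apply a leaf function to its arguments (here: a table lookup)."""
    return LEAVES[path][function][args]


def bind(pattern):
    """FROM clause: return every leaf path matching the pattern."""
    return sorted(p for p in LEAVES if fnmatchcase(p, pattern))


def run_query():
    """Evaluate the example query over all (A, B) bindings."""
    results = []
    for a, b in product(bind("/cnn/sports/baseball/schedules/*"),
                        bind("/weather/forecast/Boston")):
        # WHERE clause; the temperature is compared as a number (assumption).
        if (call(a, "city", "home", "May 11") == "Boston"
                and float(call(b, "temperature", "day2", "high", "F")) > 50):
            results.append((a, b))
    return results


matches = run_query()
```

With the invented data above, only the `teamA` schedule paired with the Boston forecast satisfies both conditions.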
Our system should be easily extensible to query and integrate data from multiple sites. This is a potential advantage of this client-side system over a server-side service.