GlossaNet: Parsing a web site as a corpus
Abstract: GlossaNet is an automated system that monitors Web sites. On dates and at intervals selected by the user, GlossaNet downloads the Web site, converts it to an electronic corpus, and parses it using the INTEX programs (M. Silberztein 1993) and the linguistic resources of the LADL (electronic dictionaries and libraries of local grammars). Once the software has been set up, it automatically repeats the task at regular intervals (as the Web site is updated). Results, if any, are e-mailed to the user.
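The cycle described in the abstract (convert a downloaded page to a corpus, search it, report only when there are results) can be illustrated with a minimal sketch. This is not GlossaNet's implementation: the names `TextExtractor`, `html_to_corpus`, `find_matches`, and `run_once` are hypothetical, and a plain regular expression stands in for the INTEX dictionaries and local grammars, which are far more expressive. Downloading, scheduling, and e-mail delivery are left out so that the example stays self-contained.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_corpus(html):
    """Convert one HTML page to whitespace-normalized plain text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(" ".join(extractor.parts).split())


def find_matches(corpus, pattern):
    """Stand-in for the parsing step: a regex instead of a local grammar."""
    return [m.group(0) for m in re.finditer(pattern, corpus)]


def run_once(html, pattern):
    """One monitoring cycle: returns a report string, or None if nothing matched."""
    matches = find_matches(html_to_corpus(html), pattern)
    if not matches:
        return None  # "Results, if any": nothing would be sent to the user
    return "\n".join(matches)
```

In a full system, `run_once` would be called by a scheduler at the user's chosen interval on a freshly downloaded copy of the site, and a non-empty report would be sent by e-mail.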
Document Type: Research Article
Affiliations: Laboratoire d’Automatique Documentaire et Linguistique UMR N°7546 du CNRS, Université Paris 7
Publication date: October 1, 2000