Publication Details

HTML Document Analysis for Information Extraction

BURGET Radek. HTML Document Analysis for Information Extraction. In: Proceedings of 8th EEICT conference. Brno: Faculty of Information Technology BUT, 2002, pp. 426-430. ISBN 80-214-2116-9.

Czech title

Analýza HTML dokumentů pro extrakci informace

Type

conference paper

Language

English

Authors

Burget Radek, doc. Ing., Ph.D. (DIFS FIT BUT)

Keywords

HTML Analysis, Information Extraction

Abstract

Today's World Wide Web contains a vast amount of information stored in HTML documents. However, the HTML language primarily describes the appearance of documents and provides no means of describing the structure of the data they contain. In this paper we propose a model of a Web site that describes the logical structure of the data it contains. Furthermore, we propose methods for creating such a model by analyzing the visual appearance and the structure of HTML documents.

Published

2002

Pages

426-430

Proceedings

Proceedings of 8th EEICT conference

Conference

Electrical Engineering, Information and Communication Technologies 2002, Brno, CZ

ISBN

80-214-2116-9

Publisher

Faculty of Information Technology BUT

Place

Brno, CZ

BibTeX

@INPROCEEDINGS{FITPUB6921,
   author = "Radek Burget",
   title = "HTML Document Analysis for Information Extraction",
   pages = "426--430",
   booktitle = "Proceedings of 8th EEICT conference",
   year = 2002,
   location = "Brno, CZ",
   publisher = "Faculty of Information Technology BUT",
   ISBN = "80-214-2116-9",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/6921"
}