Publication Details
Extrakce informace z WWW na základě znalosti struktury dat
BURGET Radek. Extrakce informace z WWW na základě znalosti struktury dat. In: Sborník příspěvků 2. ročníku konference Znalosti 2003. Ostrava: Faculty of Electrical Engineering and Computer Science, VSB-TU Ostrava, 2003, pp. 271-280. ISBN 80-248-0229-5.
English title
Information Extraction from WWW based on the data structure knowledge
Type
conference paper
Language
Czech
Authors
Burget Radek, doc. Ing., Ph.D. (DIFS FIT BUT)
Keywords
Information Extraction, HTML, XML
Abstract
This paper deals with modelling the logical structure of a Web site and using such a model for information extraction. It proposes an algorithm for creating a site model based on analysis of the HTML code, and an XML/XSL-based system for extracting information from this model. It also discusses the possibility of using tree matching algorithms to automate the extraction process.
Published
2003
Pages
271-280
Proceedings
Sborník příspěvků 2. ročníku konference Znalosti 2003
Conference
Znalosti 2003, Ostrava, CZ
ISBN
80-248-0229-5
Publisher
Faculty of Electrical Engineering and Computer Science, VSB-TU Ostrava
Place
Ostrava, CZ
BibTeX
@INPROCEEDINGS{FITPUB7136,
  author = "Radek Burget",
  title = "Extrakce informace z WWW na z\'{a}klad\v{e} znalosti struktury dat",
  pages = "271--280",
  booktitle = "Sborn\'{i}k p\v{r}\'{i}sp\v{e}vk\r{u} 2. ro\v{c}n\'{i}ku konference Znalosti 2003",
  year = 2003,
  location = "Ostrava, CZ",
  publisher = "Faculty of Electrical Engineering and Computer Science, VSB-TU Ostrava",
  ISBN = "80-248-0229-5",
  language = "czech",
  url = "https://www.fit.vut.cz/research/publication/7136"
}