Publication Details
Creating Searchable Web Page Snapshots using Semantic Technologies
Salem Hamza, MSc (FIT BUT)
Web page snapshot, Page rendering, Data extraction, RDF, SPARQL
For many applications, it is necessary to create snapshots of web pages that accurately describe how the page appeared in a browser at a given point in time. Storing the original code (even with all referenced resources included) and creating bitmap screenshots both have many drawbacks when it comes to searching, viewing and manipulating such snapshots. In this paper, we demonstrate a different approach that uses a remotely controlled web browser for rendering web pages. We capture complete information about the rendered page and every piece of its content, and transform it into an explicit RDF-based model that is stored in a repository. The stored page models can then be examined using interactive web-based tools, exported in different formats, linked with other data sources, and queried using SPARQL.
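The idea of turning rendered content into an explicit, queryable model can be illustrated with a minimal sketch. It is not the authors' implementation: the `ex:` property names, the `box_to_triples` and `match` helpers, and the sample boxes are all hypothetical, and plain Python tuples stand in for an RDF repository and a SPARQL engine.

```python
from typing import Dict, List, Optional, Tuple

# A triple is (subject, predicate, object); all values are plain strings here.
Triple = Tuple[str, str, str]


def box_to_triples(page_iri: str, index: int, box: Dict[str, object]) -> List[Triple]:
    """Describe one rendered box as triples (hypothetical ex: vocabulary)."""
    subject = f"{page_iri}#box{index}"
    triples: List[Triple] = [
        (subject, "rdf:type", "ex:Box"),
        (subject, "ex:belongsTo", page_iri),
    ]
    # One triple per captured property (text, position, font size, ...).
    for key in sorted(box):
        triples.append((subject, f"ex:{key}", str(box[key])))
    return triples


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Assumed sample data standing in for what a browser would report.
PAGE = "http://example.org/snapshot/1"
boxes = [
    {"text": "Special offer", "x": 10, "y": 20, "fontSize": 24},
    {"text": "Contact", "x": 10, "y": 400, "fontSize": 12},
]

graph: List[Triple] = []
for i, box in enumerate(boxes):
    graph.extend(box_to_triples(PAGE, i, box))

# Query: text of every box rendered with a font size of at least 20.
large = [s for (s, _, o) in match(graph, p="ex:fontSize") if float(o) >= 20]
headings = [o for s in large for (_, _, o) in match(graph, s=s, p="ex:text")]
print(headings)
```

Because visual properties such as position and font size are stored explicitly, a query can select content by how it was rendered, which is not possible with raw HTML or a bitmap screenshot alone.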
@INPROCEEDINGS{FITPUB12965,
   author = "Radek Burget and Hamza Salem",
   title = "Creating Searchable Web Page Snapshots using Semantic Technologies",
   pages = "355--358",
   booktitle = "Web Engineering - 23rd International Conference, ICWE 2023",
   series = "Lecture Notes in Computer Science",
   year = 2023,
   location = "Alicante, ES",
   publisher = "Springer Nature Switzerland AG",
   ISBN = "978-3-031-34443-5",
   doi = "10.1007/978-3-031-34444-2\_26",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12965"
}