Thesis Details

Automated Factoid Question Answering and Fact-Checking in Natural Language

Ph.D. Thesis
Student: Fajčík Martin
Academic Year: 2023/2024
Supervisor: Smrž Pavel, doc. RNDr., Ph.D.
Czech title
Automatické odpovídání na faktické otázky a ověřování faktů v přirozeném jazyce
Language
English
Abstract

This thesis examines two problems that rely on a precise understanding of factual information. In factoid question answering (QA), it addresses three topics. Firstly, it introduces a novel probability formulation and training objective for systems that extract the answer as a span of text. The experiments show that the proposed compound objective with a joint probability space is Pareto optimal with respect to the other commonly used objectives. Secondly, the thesis studies the problem of open-domain QA. It shows that extractive and abstractive approaches have complementary strengths and proposes R2-D2, a state-of-the-art pipeline system that serves as a strong baseline for the community. Thirdly, it studies the effect of pruning the retrieval corpus used by R2-D2. The experiments demonstrate that for two popular datasets, NaturalQuestions and TriviaQA, two-thirds of the retrieval corpus can be removed without loss of performance, and 92% can be removed with a performance drop of at most 3 exact-match points. The findings also indicate that the same pruning mechanism is implicitly present in modern supervised retrieval mechanisms such as DPR.

In fact-checking, the thesis studies two topics. Firstly, it shows that approaches based on pretrained models can reach competitive performance in rumor stance detection without using any handcrafted features or metadata. Specifically, our system targets rumor stance detection in social media threads and classifies whether each post supports, denies, queries, or comments on the rumor present in the discussion thread. Experiments demonstrate that using just the first post and the previous post of the thread is sufficient to obtain strong performance in determining the stance of the current post. Secondly, the thesis studies evidence-grounded fact-checking and proposes Claim-Dissector, a system that jointly identifies the relevant evidence and produces a veracity verdict. The proposed system can find supporting and refuting evidence for a claim at any level of language granularity, including tokens, sentences, or paragraphs, and link this evidence to the verdict in an interpretable way. The model is shown to transfer successfully from coarse-grained supervision to fine-grained predictions. In particular, training with sentence-level relevance labels is sufficient to obtain relevant token-level rationales, and training with block-level labels provides competitive sentence-level cues. The strong performance of Claim-Dissector is demonstrated across five datasets, including the newly collected TLR-FEVER, and two underlying pretrained models. The code for all experiments is available online.

Keywords

question answering, fact checking, fact-checking, QA, FC, extractive question answering, R2-D2, Claim-Dissector, RumourEval, QA corpus pruning, compound objective, TriviaQA, EfficientQA, NaturalQuestions

Department
Degree Programme
Computer Science and Engineering, Field of Study Computer Science and Engineering
Status
defended
Date
25 April 2024
Citation
FAJČÍK, Martin. Automated Factoid Question Answering and Fact-Checking in Natural Language. Brno, 2023. Ph.D. Thesis. Brno University of Technology, Faculty of Information Technology. 2024-04-25. Supervised by Smrž Pavel. Available from: https://www.fit.vut.cz/study/phd-thesis/1224/
BibTeX
@phdthesis{FITPT1224,
    author = "Martin Faj\v{c}\'{i}k",
    type = "Ph.D. thesis",
    title = "Automated Factoid Question Answering and Fact-Checking in Natural Language",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2024,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/phd-thesis/1224/"
}