Information Extraction

Currently, information extraction methods fall into two approaches: supervised learning and large language models (LLMs).

However, both approaches have their own limitations:

Supervised Learning
- In the NER task: It operates at the sentence level and cannot capture keyword relationships across the entire document.
- In a specific domain: It is costly, but the quality may be higher.
Large Language Models (LLMs)
- In the NER task: They process information at the document level but may generate hallucinated content.
- In a specific domain: They are less costly, but the quality is uncertain because LLMs may generate output that goes beyond their underlying domain knowledge.

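Information Extraction from Legal Wills: A Performance Analysis of GPT-4
Examines GPT-4's performance in extracting information from legal will texts, focusing on four entity types (testator, beneficiary, asset, will) and four relation types (such as the relationship between testator and beneficiary).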
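
To make the extraction target concrete, here is a minimal sketch of how the four entity types and a relation between two entities could be represented. The class names, the `label` strings, and the helper `relations_between` are hypothetical. The notes name only the testator-beneficiary relation, so the other three relation types are left unspecified here.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    # The four entity types listed in the notes.
    TESTATOR = "testator"
    BENEFICIARY = "beneficiary"
    ASSET = "asset"
    WILL = "will"


@dataclass(frozen=True)
class Entity:
    text: str
    type: EntityType


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str  # hypothetical free-text label; the notes do not list all four


def relations_between(relations, head_type, tail_type):
    """Return the relations whose head and tail have the given entity types."""
    return [
        r for r in relations
        if r.head.type is head_type and r.tail.type is tail_type
    ]


# Illustrative, made-up example (not taken from any dataset).
testator = Entity("John Doe", EntityType.TESTATOR)
beneficiary = Entity("Jane Doe", EntityType.BENEFICIARY)
house = Entity("my house", EntityType.ASSET)
relations = [
    Relation(testator, beneficiary, "testator-beneficiary"),
    Relation(testator, house, "unspecified"),
]
pairs = relations_between(relations, EntityType.TESTATOR, EntityType.BENEFICIARY)
```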

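An Analysis of GPT-4's Application to Scientific Information Extraction
Evaluates whether GPT-4 can, with basic one-shot prompting, correctly understand narrative text and tabular data and complete schema-based scientific information extraction tasks.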
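
One-shot, schema-based extraction can be sketched as follows: the prompt lists the schema fields, shows a single worked example, and then presents the target text. The reply is parsed as JSON and restricted to the schema fields. This is only an illustration under stated assumptions: the schema, the example texts, and the function names `build_one_shot_prompt` and `parse_response` are hypothetical, and no model API is called.

```python
import json


def build_one_shot_prompt(schema, example_text, example_output, target_text):
    """Assemble a prompt: field list, one worked example, then the target input."""
    fields = "\n".join(f"- {name}: {desc}" for name, desc in schema.items())
    return (
        "Extract the following fields and answer with a JSON object only.\n"
        f"Fields:\n{fields}\n\n"
        f"Example input:\n{example_text}\n"
        f"Example output:\n{json.dumps(example_output)}\n\n"
        f"Input:\n{target_text}\n"
        "Output:\n"
    )


def parse_response(raw, schema):
    """Keep only schema fields; absent fields become None; non-JSON replies yield None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return {name: data.get(name) for name in schema}


# Hypothetical schema and texts, for illustration only.
SCHEMA = {
    "material": "name of the material studied",
    "temperature_c": "temperature in degrees Celsius",
}
EXAMPLE_TEXT = "Sample A (copper) was annealed at 400 C."
EXAMPLE_OUTPUT = {"material": "copper", "temperature_c": 400}

prompt = build_one_shot_prompt(
    SCHEMA, EXAMPLE_TEXT, EXAMPLE_OUTPUT, "Sample B (nickel) was heated to 650 C."
)
```

Dropping keys that are not in the schema is one simple guard against fields the model invents, though it does not detect hallucinated values inside valid fields.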

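This study examines the ability of unsupervised, weakly supervised, and pre-trained models (such as DistilBERT and Longformer) to extract 11 key features (such as case type, education level, and parole assessment score) from dialogues. The results show that most models achieve F1 scores below 0.85, indicating that these tasks remain challenging. The main difficulties include: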