Event Extraction

Event extraction is the task of identifying events mentioned in text and extracting their structured representations. An event is typically defined by a trigger — the word or phrase that most clearly denotes the event's occurrence — an event type drawn from a predefined ontology, and a set of arguments filling semantic roles specific to the event type. For example, in "Boeing acquired McDonnell Douglas for $13 billion in 1997," the trigger is "acquired," the event type is Acquisition, and the arguments include the buyer (Boeing), the acquired entity (McDonnell Douglas), the price ($13 billion), and the time (1997).
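As a minimal sketch of what such a structured representation might look like in code, the Boeing example can be written as a small record type. The class name `EventRecord`, its field names, and the role labels ("Buyer", "Acquired", and so on) are illustrative assumptions rather than part of any standard ontology.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EventRecord:
    """One extracted event: a trigger, a type, and role-labelled arguments."""
    event_type: str   # label from the ontology, e.g. "Acquisition"
    trigger: str      # word or phrase that evokes the event
    arguments: Dict[str, str] = field(default_factory=dict)  # role -> filler text


# The Boeing sentence from the text; role names are hypothetical.
acquisition = EventRecord(
    event_type="Acquisition",
    trigger="acquired",
    arguments={
        "Buyer": "Boeing",
        "Acquired": "McDonnell Douglas",
        "Price": "$13 billion",
        "Time": "1997",
    },
)
```

Keeping arguments as a role-to-filler mapping reflects that the set of roles depends on the event type; a real system would usually also store character offsets for the trigger and each argument span.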

Event Detection and Argument Extraction

Event Extraction Pipeline

Step 1. Trigger detection: identify event-evoking words
    P(type | trigger, context) for each candidate trigger

Step 2. Argument extraction: identify role fillers
    P(role | arg, trigger, context) for each candidate argument

Event record: {type: Attack, trigger: "bombed", Attacker: "rebels", Target: "embassy", Place: "Nairobi"}

Event extraction is typically decomposed into two subtasks: event detection (identifying triggers and classifying their event types) and argument extraction (identifying argument spans and classifying their roles). Event detection can be formulated as a token classification task: for each word in the sentence, determine whether it triggers an event and, if so, of what type. Argument extraction then identifies which entities or phrases fill the semantic roles defined by the event type. The ACE (Automatic Content Extraction) 2005 annotation scheme defines 8 event types with 33 subtypes and 35 argument roles, and has served as the standard ontology for event extraction research.
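The two-stage decomposition can be sketched as follows. This is a toy illustration, not a trained system: the hand-written tables `TRIGGER_PROBS` and `ROLE_PROBS` stand in for the classifiers that would estimate P(type | trigger, context) and P(role | arg, trigger, context), and every word, label, and probability in them is made up for the example.

```python
SENTENCE = ["rebels", "bombed", "the", "embassy", "in", "Nairobi"]

# Stand-in for P(type | word, context); unlisted words are non-triggers.
TRIGGER_PROBS = {"bombed": {"Attack": 0.92, "None": 0.08}}

# Stand-in for P(role | candidate, event type, context).
ROLE_PROBS = {
    ("rebels", "Attack"): {"Attacker": 0.85, "Target": 0.10, "None": 0.05},
    ("embassy", "Attack"): {"Target": 0.80, "Place": 0.15, "None": 0.05},
    ("Nairobi", "Attack"): {"Place": 0.90, "Target": 0.05, "None": 0.05},
}


def argmax(dist):
    """Return the label with the highest probability."""
    return max(dist, key=dist.get)


def detect_triggers(tokens):
    """Step 1: classify every token as an event type or "None"."""
    triggers = []
    for index, token in enumerate(tokens):
        label = argmax(TRIGGER_PROBS.get(token, {"None": 1.0}))
        if label != "None":
            triggers.append((index, token, label))
    return triggers


def extract_arguments(tokens, event_type):
    """Step 2: classify each candidate's role given the predicted type."""
    roles = {}
    for token in tokens:
        role = argmax(ROLE_PROBS.get((token, event_type), {"None": 1.0}))
        if role != "None":
            roles[role] = token
    return roles


def run_pipeline(tokens):
    """Run trigger detection, then argument extraction per trigger."""
    events = []
    for _, trigger, event_type in detect_triggers(tokens):
        record = {"type": event_type, "trigger": trigger}
        record.update(extract_arguments(tokens, event_type))
        events.append(record)
    return events


events = run_pipeline(SENTENCE)
```

Note that step 2 only ever sees the event type chosen in step 1, so a wrong trigger decision cannot be corrected later.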

Joint Models and Document-Level Extraction

Pipeline approaches to event extraction suffer from error propagation between the trigger detection and argument extraction stages. Joint models address this by performing both tasks simultaneously, allowing trigger identification to inform argument extraction and vice versa. Neural joint models typically use shared representations and structured prediction techniques to capture dependencies between triggers and arguments. The JMEE (Jointly Multiple Events Extraction) model uses attention-based graph convolution networks to capture interactions between multiple events in the same sentence, handling the common case where a single sentence describes multiple events.
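To make the error-propagation argument concrete, here is a small sketch contrasting pipeline decoding with joint decoding on one ambiguous trigger ("fired" in "The board fired the CEO"). It is not an implementation of JMEE or of any neural model: it simply scores each event type together with its best compatible role assignment. The probability tables and the `VALID_ROLES` constraint table are invented for the example.

```python
import math

# Local trigger scores: the trigger classifier slightly prefers the wrong type.
TRIGGER_PROBS = {"Attack": 0.55, "End-Position": 0.45}

# Local role scores for the two candidate arguments.
ROLE_PROBS = {
    "board": {"Entity": 0.7, "Attacker": 0.2, "None": 0.1},
    "CEO": {"Person": 0.8, "Target": 0.15, "None": 0.05},
}

# Which roles each event type allows (hypothetical mini-ontology).
VALID_ROLES = {
    "Attack": {"Attacker", "Target", "None"},
    "End-Position": {"Entity", "Person", "None"},
}


def best_roles(event_type):
    """Pick each argument's most probable role among those the type allows."""
    chosen = {}
    for arg, dist in ROLE_PROBS.items():
        allowed = {r: p for r, p in dist.items() if r in VALID_ROLES[event_type]}
        chosen[arg] = max(allowed, key=allowed.get)
    return chosen


def joint_log_score(event_type, roles):
    """Log-probability of a full (type, roles) assignment."""
    score = math.log(TRIGGER_PROBS[event_type])
    for arg, role in roles.items():
        score += math.log(ROLE_PROBS[arg][role])
    return score


def pipeline_decode():
    """Commit to the best trigger type first, then fit roles to it."""
    event_type = max(TRIGGER_PROBS, key=TRIGGER_PROBS.get)
    return event_type, best_roles(event_type)


def joint_decode():
    """Choose the type and roles that maximise the combined score."""
    candidates = [(t, best_roles(t)) for t in TRIGGER_PROBS]
    return max(candidates, key=lambda c: joint_log_score(*c))


pipeline_result = pipeline_decode()
joint_result = joint_decode()
```

In this toy setting the pipeline commits to Attack and is forced into low-probability roles, while the joint score lets the strong argument evidence override the weak trigger preference and selects End-Position.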

ACE and Beyond

The ACE evaluations (2000–2008) established the foundational benchmarks for event extraction, defining event types, argument roles, and evaluation metrics that the field still uses. However, the ACE ontology covers only 33 event subtypes, far fewer than the diversity of events described in real-world text. More recent efforts such as the KAIROS programme and the MAVEN dataset have expanded event ontologies well beyond this (MAVEN alone defines 168 event types) and introduced more complex scenarios, including cross-document event coreference, event-event relations, and schema-guided extraction that can adapt to new event types without retraining.

Document-level event extraction extends the task beyond individual sentences, requiring systems to aggregate event information scattered across multiple sentences and to resolve argument coreference. A news article about a corporate acquisition might mention the buyer in the first paragraph, the price in the third, and the completion date in the fifth, so assembling a complete event record calls for long-range reasoning across the whole text. Transformer-based models with document-level attention have shown promise, but the task remains substantially harder than sentence-level extraction, with current systems achieving F1 scores 15–20 points lower than their sentence-level counterparts.
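The aggregation step can be sketched as a merge over sentence-level partial records. This is a simplified illustration under strong assumptions: an upstream event coreference step has already assigned a shared `event_id` to mentions of the same event, and an entity coreference step has produced the `CANONICAL` mention-to-name mapping. Both inputs, and the function name `merge_records`, are hypothetical.

```python
# Hypothetical sentence-level outputs for one article about an acquisition.
SENTENCE_RECORDS = [
    {"sent": 0, "event_id": "e1", "type": "Acquisition",
     "args": {"Buyer": "Boeing"}},
    {"sent": 2, "event_id": "e1", "type": "Acquisition",
     "args": {"Price": "$13 billion"}},
    {"sent": 4, "event_id": "e1", "type": "Acquisition",
     "args": {"Time": "1997", "Buyer": "the company"}},
]

# Hypothetical entity coreference output: mention -> canonical name.
CANONICAL = {"the company": "Boeing"}


def merge_records(records, canonical):
    """Combine partial records that refer to the same event."""
    merged = {}
    for rec in sorted(records, key=lambda r: r["sent"]):
        event = merged.setdefault(
            rec["event_id"], {"type": rec["type"], "args": {}}
        )
        for role, mention in rec["args"].items():
            filler = canonical.get(mention, mention)
            # Keep the earliest filler for a role; later mentions fill gaps only.
            event["args"].setdefault(role, filler)
    return merged


document_events = merge_records(SENTENCE_RECORDS, CANONICAL)
```

Keeping the earliest filler per role is one simple conflict policy among several; the hard part in practice is producing reliable coreference inputs, which this sketch takes as given.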
