Natural language does not function at the level of isolated sentences. When humans produce and interpret text or speech, they rely on discourse structure — the set of relations, hierarchies, and organizational patterns that bind individual clauses into coherent wholes. Computational linguistics has devoted substantial effort to formalizing these structures, both to improve natural language understanding systems and to test theoretical claims about how humans process extended text. Discourse structure encompasses phenomena ranging from local coherence between adjacent sentences to the global organization of entire documents.
Levels of Discourse Organization
Formally, a text T can be modeled as a sequence of discourse units u₁, …, uₙ together with a set of coherence relations E = {(uᵢ, uⱼ, r) | r ∈ R}, where R is an inventory of relation types such as Cause or Contrast. A simple coherence score then weights each relation by its type and by the similarity of the units it connects:

C(T) = Σ_{(i,j,r)∈E} w(r) · sim(uᵢ, uⱼ)
Discourse can be analyzed at multiple levels. At the sentential level, adjacent clauses are linked by coherence relations such as Cause, Contrast, and Elaboration. At the paragraph level, topic sentences organize clusters of supporting detail. At the document level, macrostructures such as introduction-body-conclusion or problem-solution frames guide global organization. Each level imposes constraints on what constitutes a well-formed text, and violations at any level produce incoherence that readers readily detect.
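The coherence score C(T) defined above can be sketched in a few lines. The relation weights and the word-overlap similarity function below are illustrative assumptions, not part of any standard model:

```python
# Toy implementation of C(T) = Σ w(r) · sim(uᵢ, uⱼ) over annotated relations.
# Relation weights and the Jaccard similarity stand-in are assumptions.
RELATION_WEIGHTS = {"Cause": 1.0, "Contrast": 0.8, "Elaboration": 0.6}

def jaccard_sim(a: str, b: str) -> float:
    """Word-overlap similarity between two discourse units (a stand-in for sim)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def coherence_score(units, relations):
    """Sum w(r) · sim(uᵢ, uⱼ) over coherence relations (i, j, r)."""
    return sum(RELATION_WEIGHTS[r] * jaccard_sim(units[i], units[j])
               for (i, j, r) in relations)

units = ["the storm closed the airport",
         "flights were delayed for hours",
         "the storm passed by evening"]
relations = [(0, 1, "Cause"), (0, 2, "Elaboration")]
score = coherence_score(units, relations)
```

In a real system, sim would typically be an embedding-based similarity and the relations would come from a discourse parser rather than hand annotation.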
Formal Theories of Discourse
Several competing frameworks have been proposed for representing discourse structure. Rhetorical Structure Theory (RST) organizes text into hierarchical tree structures, where nucleus-satellite and multinuclear relations connect elementary discourse units. The Penn Discourse Treebank (PDTB) takes a lexically grounded approach, annotating discourse connectives and the arguments they relate. Segmented Discourse Representation Theory (SDRT) combines dynamic semantics with discourse relations, allowing underspecified attachments and complex anaphoric dependencies. Each framework captures different aspects of discourse organization and has motivated distinct computational approaches.
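The hierarchical, nucleus-satellite organization that RST posits can be made concrete with a small tree data structure. The class and field names below are illustrative, not taken from any particular RST parser:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Minimal sketch of an RST-style discourse tree: internal nodes carry a
# relation, leaves are elementary discourse units (EDUs). Names are assumed.
@dataclass
class DiscourseNode:
    relation: Optional[str] = None        # e.g. "Cause", "Elaboration"
    nuclearity: str = "nucleus"           # "nucleus" or "satellite"
    text: Optional[str] = None            # set only on EDUs
    children: List["DiscourseNode"] = field(default_factory=list)

    def is_edu(self) -> bool:
        return not self.children

# [The weather was terrible,]EDU1 [so the match was cancelled.]EDU2
tree = DiscourseNode(
    relation="Cause",
    children=[
        DiscourseNode(nuclearity="satellite", text="The weather was terrible,"),
        DiscourseNode(nuclearity="nucleus", text="so the match was cancelled."),
    ],
)
```

A PDTB-style representation, by contrast, would store only the connective ("so") and its two argument spans, with no recursive tree above them.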
An alternative perspective on discourse structure focuses on entity coherence — the patterns of entity mentions across sentences. Centering Theory (Grosz, Joshi, and Weinstein, 1995) models local coherence through the tracking of discourse entities and their salience. The entity grid model (Barzilay and Lapata, 2008) operationalizes this idea by representing documents as matrices of entity transitions, achieving strong results in coherence assessment and text ordering tasks without explicit relation annotation.
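The entity grid idea can be sketched in simplified form. Real implementations use a parser to assign grammatical roles (subject, object, other) and a coreference system to find entity mentions; here entities are given by hand and we only record presence versus absence:

```python
# Simplified entity grid in the spirit of Barzilay and Lapata (2008):
# rows are entities, columns are sentences, cells mark occurrence.
# "X"/"-" presence marking is a simplification of the S/O/X role scheme.
def build_entity_grid(sentences, entities):
    """Return a {entity: [cell per sentence]} grid via string matching."""
    grid = {}
    for entity in entities:
        grid[entity] = ["X" if entity in sent.lower() else "-"
                        for sent in sentences]
    return grid

def transitions(row):
    """Adjacent-sentence transition bigrams for one entity, e.g. ('X', '-')."""
    return list(zip(row, row[1:]))

sentences = ["The senator proposed a bill.",
             "The bill faced opposition.",
             "Critics attacked the senator."]
grid = build_entity_grid(sentences, ["senator", "bill"])
```

Coherence models then use the distribution of transition bigrams — dense runs of ("X", "X") tend to indicate locally coherent text — as features for ranking alternative orderings.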
Computational Applications
Discourse structure plays a critical role in numerous NLP applications. In automatic summarization, identifying the nucleus of each RST relation allows systems to extract the most important content. In sentiment analysis, discourse relations help determine whether an opinion expressed in a subordinate clause should be attributed to the author or to a reported speaker. Machine translation benefits from discourse-level analysis when translating connectives, pronouns, and other phenomena whose interpretation depends on cross-sentential context.
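Nucleus-based extraction for summarization can be sketched as a recursion that keeps only text reachable through nucleus children. The tuple-based tree encoding here is an assumption chosen for brevity:

```python
# Toy nucleus extraction: follow nucleus children of an RST-style tree
# and collect their leaf text. Tree format (relation, [(nuclearity, child)])
# is an illustrative assumption, not a standard serialization.
def extract_nuclei(node):
    """Collect leaf text reachable through nucleus children only."""
    if isinstance(node, str):             # a leaf: elementary discourse unit
        return [node]
    _relation, children = node
    texts = []
    for nuclearity, child in children:
        if nuclearity == "nucleus":
            texts.extend(extract_nuclei(child))
    return texts

tree = ("Elaboration", [
    ("nucleus", ("Cause", [
        ("nucleus", "The company missed its targets."),
        ("satellite", "Demand fell sharply in Asia."),
    ])),
    ("satellite", "Its shares have traded publicly since 1998."),
])
summary = extract_nuclei(tree)
```

The satellites (the cause and the background detail) are dropped, leaving the single most central claim as the extract.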
Recent neural approaches have begun to learn implicit discourse representations through large-scale pre-training. Transformer-based models capture long-range dependencies that correspond to discourse structure, though whether they genuinely learn hierarchical discourse organization or merely exploit surface correlations remains an active area of investigation. Probing studies suggest that intermediate layers of models like BERT encode information about discourse relations, centering transitions, and paragraph structure, but explicit discourse parsing still improves downstream performance on tasks requiring document-level understanding.
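The probing methodology mentioned above amounts to training a small linear classifier on frozen hidden states. The sketch below uses synthetic activations as stand-ins; in a real study they would be extracted from an intermediate layer of a model such as BERT, and the labels would be discourse relation annotations:

```python
import numpy as np

# Minimal linear probe: logistic regression on frozen "activations" to
# predict a binary discourse label (e.g. Contrast vs. Elaboration).
# The activations here are synthetic stand-ins for real hidden states.
rng = np.random.default_rng(0)
n, d = 200, 16
labels = rng.integers(0, 2, size=n)
# Two classes whose means differ along one direction of the space.
activations = rng.normal(size=(n, d)) + 2.0 * labels[:, None] * np.eye(d)[0]

w, b = np.zeros(d), 0.0
for _ in range(500):                  # plain gradient descent on logistic loss
    p = 1.0 / (1.0 + np.exp(-(activations @ w + b)))
    w -= 0.5 * (activations.T @ (p - labels)) / n
    b -= 0.5 * float(np.mean(p - labels))

accuracy = float(np.mean(((activations @ w + b) > 0) == labels))
```

Probe accuracy well above a majority-class baseline is taken as evidence that the layer encodes the probed property, though, as the paragraph above notes, this does not by itself show the model uses that information hierarchically.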