Projective vs. Non-Projective

Projectivity is a structural constraint on dependency trees requiring that no arcs cross when words are laid out in linear order; non-projective trees arise in languages with freer word order.

An arc (h, d) from head h to dependent d is non-projective iff ∃ k: min(h, d) < k < max(h, d) and k is not dominated by h

A dependency tree is projective if, for every arc from head h to dependent d, all words between h and d in the linear order are dominated by h (i.e., are descendants of h in the tree). Equivalently, a tree is projective if no two arcs cross when drawn above the sentence. Non-projective trees contain at least one crossing arc pair. The distinction is important because projectivity constrains which parsing algorithms are applicable and which linguistic phenomena can be captured.

Formal Definition

Projectivity Arc (h, d) is projective iff for all k with min(h,d) < k < max(h,d):
h →* k (k is a descendant of h)

A tree is projective iff all its arcs are projective.

Equivalently: a tree is projective iff no two of its arcs cross.
Writing each arc (h, d) as the interval [min(h, d), max(h, d)], two arcs cross iff their intervals overlap but neither contains the other.
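The domination-based definition translates directly into code. A minimal sketch (the function name and head-array encoding are our own: heads[i] holds the 1-based head of word i+1, with 0 marking the root):

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the 1-based head of word i+1; 0 marks the root.
    An arc (h, d) is projective iff every word strictly between
    h and d in the linear order is a descendant of h.
    """
    n = len(heads)

    def dominates(h, k):
        # Follow head pointers upward from k; True if we reach h.
        while k != 0:
            if k == h:
                return True
            k = heads[k - 1]
        return False

    for d in range(1, n + 1):
        h = heads[d - 1]
        if h == 0:          # root arc spans nothing by convention
            continue
        lo, hi = min(h, d), max(h, d)
        for k in range(lo + 1, hi):
            if not dominates(h, k):
                return False
    return True
```

For example, heads = [0, 4, 1, 1] encodes a tree with a crossing pair (the arc 4 → 2 spans word 3, which word 4 does not dominate), so the check returns False.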

English is largely projective: dependency conversions of the Penn Treebank contain fewer than 2% non-projective arcs. Languages with freer word order show much higher rates, with roughly 23% of Czech, 36% of Dutch, and 40% of Ancient Greek sentences containing at least one non-projective arc. Common sources of non-projectivity include topicalization, wh-movement, extraposition, and scrambling.

Parsing Implications

Projective dependency parsing can be solved in O(n³) time using Eisner's algorithm, which is essentially a variant of CYK for dependencies. Standard transition-based parsers with arc-standard or arc-eager transition systems can only produce projective trees. Non-projective parsing requires different algorithms: the Chu-Liu/Edmonds maximum spanning arborescence algorithm runs in O(n²) for arc-factored models, while pseudo-projective parsing (Nivre and Nilsson, 2005) transforms non-projective trees into projective ones with augmented labels, parses projectively, and then restores the non-projective arcs.
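For illustration, the Chu-Liu/Edmonds algorithm can be sketched compactly over a dense arc-score matrix: greedily pick each word's best incoming arc, and if the result contains a cycle, contract it into a supernode, recurse, and break the cycle where the recursive solution enters it. This is a sketch under assumed conventions (scores[h][d] is the score of arc h → d, node 0 is an artificial root), not a reference implementation:

```python
import math

def _find_cycle(heads):
    """Return the set of nodes on a cycle in the head map, or None."""
    for start in range(len(heads)):
        seen, node = set(), start
        while node != -1 and node not in seen:
            seen.add(node)
            node = heads[node]
        if node != -1:                 # revisited a node: cycle found
            cycle, cur = [], node
            while True:
                cycle.append(cur)
                cur = heads[cur]
                if cur == node:
                    break
            return set(cycle)
    return None

def chu_liu_edmonds(scores):
    """Maximum spanning arborescence over a dense score matrix.
    scores[h][d] is the score of arc h -> d; node 0 is the artificial
    root. Returns heads[d] = head of word d, with heads[0] = -1."""
    n = len(scores)
    heads = [-1] * n
    for d in range(1, n):              # greedy best incoming arc
        heads[d] = max((h for h in range(n) if h != d),
                       key=lambda h: scores[h][d])
    cycle = _find_cycle(heads)
    if cycle is None:
        return heads

    # Contract the cycle into one supernode and recurse.
    rest = [v for v in range(n) if v not in cycle]
    c = len(rest)                      # index of the supernode
    idx = {v: i for i, v in enumerate(rest)}
    sub = [[-math.inf] * (c + 1) for _ in range(c + 1)]
    enter, leave = {}, {}
    for v in rest:
        for w in rest:                 # arcs outside the cycle carry over
            if v != w:
                sub[idx[v]][idx[w]] = scores[v][w]
        # entering arc v -> cycle: gain of rerouting cycle node d to v
        best_d = max(cycle, key=lambda d: scores[v][d] - scores[heads[d]][d])
        sub[idx[v]][c] = scores[v][best_d] - scores[heads[best_d]][best_d]
        enter[v] = best_d
        # leaving arc cycle -> v: best head inside the cycle
        best_h = max(cycle, key=lambda h: scores[h][v])
        sub[c][idx[v]] = scores[best_h][v]
        leave[v] = best_h
    sub_heads = chu_liu_edmonds(sub)

    # Expand the solution: keep cycle arcs, then break one at the entry.
    result = [-1] * n
    for d in cycle:
        result[d] = heads[d]
    for i, w in enumerate(rest):
        h = sub_heads[i]
        if h != -1:
            result[w] = rest[h] if h < c else leave[w]
    v = rest[sub_heads[c]]             # node whose arc enters the cycle
    result[enter[v]] = v
    return result
```

With scores = [[0, 5, 1], [0, 0, 10], [0, 10, 0]], the greedy step produces the cycle 1 ↔ 2; contraction and expansion yield the arborescence root → 1 → 2 (heads [-1, 0, 1]).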

Mildly Non-Projective
Most non-projective structures in natural language are only mildly non-projective: they have a small gap degree (the number of discontinuities in a node's projection) and limited edge degree. This has motivated algorithms that handle bounded non-projectivity, such as parsing with gap-degree constraints, which are more efficient than fully non-projective parsing.
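Gap degree can be computed directly from a head array: collect the positions a node dominates (its projection, including the node itself) and count the breaks in that interval. A sketch under an assumed encoding (heads[i] is the 1-based head of word i+1, 0 marks the root):

```python
from collections import defaultdict

def gap_degree(heads, node):
    """Number of gaps (discontinuities) in the projection of `node`.

    heads[i] is the 1-based head of word i+1; 0 marks the root.
    A projective tree has gap degree 0 at every node.
    """
    children = defaultdict(list)
    for d, h in enumerate(heads, start=1):
        children[h].append(d)

    # Collect the projection (yield) of `node` by depth-first search.
    stack, projection = [node], []
    while stack:
        k = stack.pop()
        projection.append(k)
        stack.extend(children[k])

    projection.sort()
    # Each jump of more than 1 between consecutive positions is a gap.
    return sum(1 for a, b in zip(projection, projection[1:]) if b - a > 1)
```

In the non-projective tree heads = [0, 4, 1, 1], node 4 dominates positions {2, 4}, so its projection has one gap: the structure is non-projective but only of gap degree 1.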

Cross-Linguistic Patterns

Typological studies using Universal Dependencies have revealed systematic patterns in non-projectivity across languages. Head-final languages (like Japanese and Korean) tend to be more projective, while languages allowing scrambling and topicalization (like Czech and German) show higher non-projectivity rates. Understanding these patterns is important for selecting appropriate parsing algorithms and for designing annotation schemes that handle discontinuous structures.

References

  1. Nivre, J., & Nilsson, J. (2005). Pseudo-projective dependency parsing. Proceedings of the 43rd Annual Meeting of the ACL, 99–106. https://doi.org/10.3115/1219840.1219853
  2. Kuhlmann, M., & Nivre, J. (2006). Mildly non-projective dependency structures. Proceedings of COLING-ACL 2006, 507–514. https://doi.org/10.3115/1220175.1220238
  3. McDonald, R., Pereira, F., Ribarov, K., & Hajič, J. (2005). Non-projective dependency parsing using spanning tree algorithms. Proceedings of HLT-EMNLP 2005, 523–530. https://doi.org/10.3115/1220575.1220641
  4. Havelka, J. (2007). Beyond projectivity: Multilingual evaluation of constraints and measures on non-projective structures. Proceedings of the 45th Annual Meeting of the ACL, 608–615. https://aclanthology.org/P07-1077