Causality across languages: State of the art

Jürgen Bohnemeyer, University at Buffalo – SUNY

This presentation introduces the project Causality Across Languages (CAL; NSF Awards BCS-1535846 and BCS-1644657). CAL investigates which aspects of causal cognition are invariant across human populations, and thus may have an innate basis, and which are learned and subject to cross-cultural variation. The project primarily uses language as a window into causal cognition. Its principal objective is the first large-scale comparison of how speakers of different languages categorize causal chains for the purpose of describing them. The project will help identify the criteria by which speakers choose among the expressive options available to them when representing causal chains.
These options have been shown to vary enormously across languages. CAL furthermore aims to elucidate the principles governing the encoding of causal relations at the syntax-semantics interface, a key aspect of the grammatical structure of human language. It also includes the largest-scale investigation to date of how language interfaces with nonlinguistic cognition in the representation of causality. An international team of 42 researchers is investigating the representation of causality in 34 languages belonging to 32 genera and spoken on six continents.
The presentation summarizes three ongoing studies. The first, directed by Erika Bellingham (University at Buffalo), examines the strategies speakers of different languages employ when verbalizing causal chains in narratives. These strategies result from decisions about which subevents to represent specifically, which to represent in an underspecified manner, and which to leave to nonmonotonic inferences such as conversational implicatures. Similar decisions must be made about the causal relations among the subevents. The ideal outcome of this investigation is a narrative production model for the causal domain, stated in computational terms purely on the basis of linguistic data and focusing on the deployment of linguistic resources and pragmatic strategies.
For this study, we created the CAL Clips, a set of 58 short video clips (43 core scenes plus 15 additional scenes) featuring causal chains with two or three participants, one or two of whom are human. The scenes systematically vary properties such as the animacy of the participants, the intentionality of their actions, and the domain of causation (physical vs. psychological or speech act causation). To date, narratives of these clips have been collected from 12 speakers each of English, Japanese, Korean, Russian, and Yucatec (Mayan; Mexico and Belize). Only a subset of this data has been coded at the time of writing.
The results so far indicate remarkably parallel patterns across languages in what is expressed (specifically or generically) and what is left implicit.
The second study targets the semantic typology of causative constructions. We implemented a multiphasic design protocol combining two techniques. The first phase involved collecting rich production data from a smaller number of speakers using the CAL Clips. In the second phase, comprehension data were collected from a larger number of speakers in the form of acceptability judgments of the goodness of fit between the clips and the descriptions produced during the first phase. Goodness-of-fit judgments were collected on an eight-point scale. So far, data has been collected from 12 speakers each of Datooga (Nilotic; Tanzania), Mandarin, Sidaama (Cushitic; Ethiopia), Urdu, and Yucatec (Mayan; Mexico and Belize). The research was conducted in the speakers’ native languages. Data collection from speakers of additional languages is underway.
To assess the structural complexity of the responses, we coded them for the ‘juncture’ level (nuclear, core, or clause-layer) and ‘nexus’ type (coordination, subordination, or cosubordination) at which the causal relation between the initial and final subevents of each chain shown in the videos was encoded. Juncture and nexus relations have been argued to project into a single ‘Interclausal Relations Hierarchy’ (e.g., Van Valin 2005: 209). An ordinal mixed effects logistic regression model based only on the Datooga and Yucatec speakers’ ratings of descriptions of the 43 core scenes indicates a strong, highly significant correlation between the domain of causation and the most compact juncture-nexus type judged acceptable: scenes involving psychological or speech act causation generally require more loosely integrated representations, independently of language. There is also a strong main effect of language: the Datooga speakers generally preferred more complex descriptions than the Yucatec speakers. This can be explained by periphrastic causative constructions having subordinative nexus in Datooga but cosubordinative nexus in Yucatec. Surprisingly, by contrast, mediation – the closest equivalent to directness of causation in the traditional sense – did not exert a significant effect, possibly lending some support to Escamilla’s (2012) observations.
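The measure ‘most compact juncture-nexus type acceptable’ presupposes a ranking of the nine combinations of juncture and nexus. The Python sketch below shows one way such a measure could be derived for a single scene. It assumes the ordering of Van Valin’s (2005: 209) hierarchy, in which juncture levels run from nuclear to core to clause-layer (labeled "clausal" here) and, within each level, cosubordination is tighter than subordination, which is tighter than coordination. The function name, the 1 to 8 range of the scale, the acceptability threshold of 5, and the example ratings are all hypothetical illustrations, not the project’s actual coding procedure or data.

```python
# Hypothetical sketch, not the CAL project's coding procedure.
# Juncture levels and nexus types, each listed from tightest to loosest
# integration, following the Interclausal Relations Hierarchy
# (Van Valin 2005: 209).
JUNCTURES = ["nuclear", "core", "clausal"]
NEXUS_TYPES = ["cosubordination", "subordination", "coordination"]

# Rank 0 = most compact (nuclear cosubordination),
# rank 8 = least compact (clausal coordination).
RANK = {
    (juncture, nexus): i * len(NEXUS_TYPES) + k
    for i, juncture in enumerate(JUNCTURES)
    for k, nexus in enumerate(NEXUS_TYPES)
}


def most_compact_acceptable(ratings, threshold=5):
    """Return the most compact juncture-nexus type rated acceptable.

    ratings: list of ((juncture, nexus), rating) pairs for one scene,
    with ratings assumed to lie on a 1-8 scale. The threshold of 5 is a
    hypothetical cut-off. Returns None if no description reaches it.
    """
    acceptable = [jn for jn, rating in ratings if rating >= threshold]
    if not acceptable:
        return None
    # The lowest rank is the most tightly integrated description.
    return min(acceptable, key=lambda jn: RANK[jn])


# Invented ratings for one scene, for illustration only.
scene = [
    (("core", "cosubordination"), 6),
    (("nuclear", "cosubordination"), 3),
    (("clausal", "coordination"), 8),
]
print(most_compact_acceptable(scene))  # prints ('core', 'cosubordination')
```

Here the nuclear description falls below the hypothetical threshold, so the core-cosubordinative description is the most compact acceptable one. How such per-scene values are aggregated across speakers is not specified in the abstract and would be a separate modeling decision.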
The third study investigates the relationship between responsibility and intentionality in nonverbal attributions of causality. Participants watch a subset of 24 CAL Clips featuring two actors involved in a causal chain initiated by one of them. After each video, participants divide 10 tokens into piles indicating how they assign responsibility for the resulting event. To date, we have analyzed data from 12 Yucatec, 16 Mandarin, and 20 Spanish speakers. A linear mixed effects regression model indicated a significant interaction between intentionality and population: causer and causee intentionality made a significant difference for the Spanish and Yucatec participants, but not for the Mandarin participants. This supports previous findings suggesting that internal dispositions play a lesser role in responsibility attribution in societies in which attention to group agency is far more common than attention to individual agency (e.g., Morris & Peng 1994).
A question we intend to take up in the next phase of our investigation is whether the apparent difference in causal attribution also manifests itself in the grammatical means members of the different groups use when talking about causality. It has often been observed that more agentive causal chains tend to be represented more compactly in language than less agentive ones. Thus, ‘Sally made Floyd knock over the cup tower’ implicates, but does not entail, that Sally acted intentionally, whereas ‘Sally bumped into Floyd and he knocked over the cup tower’ does not (McCawley 1976). If intentionality weighs less in the causal attributions of members of sociocentric societies, this predicts that they may use relatively more compact representations of low-intentionality scenarios than members of egocentric societies. If confirmed, this could mean that grammars are influenced by folk theories of agency.


Escamilla, R. M. Jr. (2012). An updated typology of causative constructions: Form-function mappings in Hupa (Californian Athabaskan), Chungli Ao (Tibeto-Burman) and beyond. PhD dissertation, University of California, Berkeley.

McCawley, J. (1976). Remarks on what can cause what. In M. Shibatani (Ed.), Syntax and semantics 6: The grammar of causative constructions (pp. 117–129). New York, NY: Academic Press.

Morris, M. W., & Peng, K. (1994). Culture and cause: American and Chinese attributions for social and physical events. Journal of Personality and Social Psychology, 67(6), 949–971.

Van Valin, R. D. Jr. (2005). Exploring the syntax-semantics interface. Cambridge: Cambridge University Press.