This study proposes a novel approach to studying cross-cultural relationships among folktales that employs powerful, quantitative methods of phylogenetic analysis.

In each case, the aim of a phylogenetic analysis is to construct a tree or graph that represents relationships of common ancestry inferred from shared inherited traits (homologies).

Folktales represent an excellent target for phylogenetic analysis because they are, almost by definition, products of descent with modification: Rather than being composed by a single author, a folktale typically evolves gradually over time, with new parts of the story added and others lost as it gets passed down from generation to generation.

The present study aims to establish whether these methods can also be used to differentiate the tale types themselves, and to test the empirical validity of the international type system.

In addressing this question, phylogenetics has several advantages over traditional historic-geographic methods. First, rather than basing the classification of related tales on just a few privileged motifs, phylogenetic analysis can take into account all the features that a researcher believes might be relevant.

Second, phylogenetic reconstruction does not assume a priori that the most common form of a trait, or the form exhibited by the oldest recorded variant, is necessarily ancestral.

It is therefore likely to be less vulnerable to the strong European bias in the folktale record than traditional historic-geographic methods.

Third, phylogenetics provides useful tools for quantifying the relative roles of descent versus other processes, such as convergence and contamination, in generating similarities among taxa.

These include statistical techniques for measuring how well patterns in a dataset fit a tree-like model of descent [34], and network-based phylogenetic methods that have been designed to capture conflicting relationships [35], [36].

Such methods make it possible to evaluate the coherence and degree of overlap between international types indicated by the analyses.

Most versions of the story in modern popular culture are derived from the classic literary tale published by Charles Perrault in seventeenth century France [39] , which recounts the misadventures of a young girl who visits her grandmother's house, where she is eaten by a wolf disguised as the old woman.

In many of these tales, the girl lacks her characteristic red hood and nickname, and manages to outwit the wolf before he can eat her: After finally seeing through the villain's disguise, the girl asks to go outside to the toilet.

The wolf reluctantly agrees, but ties a rope to her ankle to prevent her from escaping. When she gets out, the girl cuts the rope, ties the end to a tree, and flees into the woods before the villain realises his mistake.

On the way there, she eats the cakes and replaces them with donkey dung. The poem, which purports to be based on a local folktale, tells of a girl who wanders into the woods wearing a red baptism tunic given to her by her godfather.

She encounters a wolf, who takes her back to its lair, but the girl manages to escape by taming the wolf's cubs.

Highly similar stories to Little Red Riding Hood have been recorded in various non-western oral literatures.

When the children hear the sound of their youngest sibling being eaten, they trick the villain into letting them outside to go to the toilet, where, like the heroine of The Story of Grandmother, they manage to escape.

Another tale, found in central and southern Africa [43], [44], tells of a girl who is attacked by an ogre after he imitates the voice of her brother.

In some cases, the victim is cut out of the ogre's belly alive — an ending that echoes some variants of Little Red Riding Hood recorded in Europe, including a famous text published by the Brothers Grimm in nineteenth century Germany [1].

Despite these similarities, it is not clear whether these tales can in fact be classified as ATU. Some writers [44], [45], [46] suggest they may belong to another international tale type, ATU, The Wolf and the Kids, which is popular throughout Europe and the Middle East.

In this tale, a nanny goat warns her kids not to open the door while she is out in the fields, but is overheard by a wolf. When she leaves, the wolf impersonates her and tricks the kids into letting him in, whereupon he devours them.

Versions of the tale occur in collections of Aesop's fables, in which the goat kid avoids being eaten by heeding the mother's instruction not to open the door, or seeks further proof of the wolf's identity before turning him away.

Although ATU is believed to be closely related to ATU, it is classified as a separate international tale type on the basis of two distinguishing features.

First, ATU features a single victim who is a human girl, whereas ATU features multiple victims (a group of siblings) who are animals.

Second, in ATU the victim is attacked in her grandmother's house, while in ATU the victims are attacked in their own home.

However, the application of these criteria to non-western oral traditions is highly problematic. Thus, in most of the African tales the victim is a human girl (grouping them with ATU), but she is attacked in her own home rather than a relative's (grouping them with ATU). In most variants of the tale, they are attacked after being left at home by their mother (ATU), but in some cases they encounter the villain en route to their grandmother's house (as per ATU).

The ambiguities surrounding the classification of the East Asian and African tales exemplify the problems of current folklore taxonomy.

While ATU and ATU are easy to discriminate between in a western context, tales from other regions share characteristics with both types and do not comfortably fit the definitions of either.

With that in mind, the present study addresses two key questions: Can the tales described above be divided into phylogenetically distinct international types?

Relationships among the tales were reconstructed using three methods of phylogenetic analysis: cladistics, Bayesian inference and NeighbourNet (see Methods for a full description).

The analyses focused on 72 plot variables, such as the character of the protagonist (single child versus group of siblings; male versus female) and the character of the villain (wolf, ogre, tiger, etc.).

A full list of characters and an explanation of the coding scheme are provided in the Supporting Information (File S1), together with the character matrix (File S2).

The cladistic analysis returned equally most parsimonious trees (MPTs). The fit between the data and the trees was measured with the Retention Index (RI), which was calculated as 0.

Figure 2 shows a consensus tree representing relationships that were present in the majority of the MPTs and levels of support for them returned in a bootstrap analysis.
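A majority-rule consensus keeps every clade that occurs in more than half of the input trees. The minimal Python sketch below illustrates that rule under simplifying assumptions that do not come from the study: each tree is represented as a set of clades, each clade is a `frozenset` of tale labels (which presumes the trees are rooted or share a fixed reference taxon), and the function name `majority_rule_clades` is hypothetical. It is not how PAUP* computes its consensus internally, and the toy trees are made up for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]  # a clade is the set of tale labels it contains


def majority_rule_clades(trees: List[Set[Clade]]) -> Dict[Clade, float]:
    """Return each clade found in more than half of `trees`, with its frequency."""
    # Each tree is a set, so a clade is counted at most once per tree.
    counts = Counter(clade for tree in trees for clade in tree)
    n_trees = len(trees)
    return {
        clade: count / n_trees
        for clade, count in counts.items()
        if count / n_trees > 0.5
    }


# Toy input (illustrative only, not data from the study): three trees on tales A-D.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AB"), frozenset("ABC")},
]
for clade, freq in majority_rule_clades(trees).items():
    print(sorted(clade), f"{freq:.0%}")
```

Clades that pass the 50% threshold are always mutually compatible, which is why they can be assembled into a single consensus tree.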

The tree, which is unrooted, splits the tales into three principal groups. The latter include two non-European tales, one from Iran, the other collected from the Ibo of Nigeria.

The Little Red Riding Hood clade separates Perrault's classic version from more recent versions, including the Grimms' nineteenth-century German text.

However, the low levels of bootstrap support indicate a substantial degree of conflicting signal surrounding these relationships. The second major group can be identified as international type ATU. The third major group is formed by the East Asian tales.

Major groupings are labelled by region or ATU international type and indicated by the coloured nodes.

Numbers beside the edges represent the level of support for individual clades returned by the bootstrap analysis. The Bayesian analysis returned a very similar set of results.

Figure 3 shows an unrooted maximum clade credibility tree obtained from the posterior distribution. It represents the same three major groupings, with varying levels of support in the posterior distribution of trees.
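A maximum clade credibility tree is conventionally the sampled tree whose clades have the highest product of posterior clade frequencies, and those frequencies are also the support values printed beside the edges. The sketch below illustrates that selection rule, assuming each sampled tree is represented as a set of clades (frozensets of tale labels); the representation and the names `clade_credibilities` and `maximum_clade_credibility` are hypothetical, not the software used in the study, and the sample is a made-up toy.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

Clade = FrozenSet[str]


def clade_credibilities(trees: List[Set[Clade]]) -> Dict[Clade, float]:
    """Fraction of sampled trees in which each clade occurs."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: count / len(trees) for clade, count in counts.items()}


def maximum_clade_credibility(trees: List[Set[Clade]]) -> Tuple[int, Dict[Clade, float]]:
    """Index of the sampled tree with the highest product of clade credibilities,
    plus the credibility of each of its clades."""
    cred = clade_credibilities(trees)

    def log_score(tree: Set[Clade]) -> float:
        # Summing logs is equivalent to multiplying probabilities but avoids underflow.
        return sum(math.log(cred[clade]) for clade in tree)

    best = max(range(len(trees)), key=lambda i: log_score(trees[i]))
    return best, {clade: cred[clade] for clade in trees[best]}


# Toy posterior sample (illustrative only).
sample = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("CD"), frozenset("ACD")},
]
index, support = maximum_clade_credibility(sample)
print(index, {"".join(sorted(c)): f"{p:.0%}" for c, p in support.items()})
```

Unlike a majority-rule consensus, the tree chosen this way is always one that was actually visited by the chain.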

Relationships within the group separate variants of the Aesopic fable from the other narratives. Numbers beside the edges represent the percentage of trees in the Bayesian posterior distribution of trees in which a given node occurred.

The scale bar indicates the average number of changes per character along a given edge. The NeighbourNet graph is shown in Figure 4.

Once again, the tales are divided into the same three main groups, except the Indian tale, which does not cluster with any of them. Although the groups are clearly discernible, the overlapping boxes demonstrate conflicting splits in the data.

This is especially clear in the East Asian clade, which exhibits a highly reticulated structure. Similarly, overlapping boxes obscure the phylogenetic structure within the ATU group, although it is possible to identify a split between the fable and folktale versions of the story, with the latter again including a clade of African tales plus the Antiguan variant.

Overlapping box-like structures indicate conflicting signal in the data. The scale bar indicates the proportion of characters in which states differ among the tales being compared.
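A proportion of differing characters is an uncorrected p-distance, and NeighbourNet builds its network from a matrix of such pairwise distances. The sketch below computes that matrix from a character matrix in which each tale is a string of single-character state codes. Treating `?` as missing data and skipping those positions pairwise is an assumption of this sketch, the names `p_distance` and `distance_matrix` are hypothetical, and the toy matrix is invented.

```python
from itertools import combinations
from typing import Dict, Tuple

MISSING = "?"  # assumed code for unknown or inapplicable states


def p_distance(a: str, b: str) -> float:
    """Proportion of comparable characters whose states differ between two tales."""
    pairs = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not pairs:
        return float("nan")  # nothing to compare
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(matrix: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every pair of tales (keys in sorted order)."""
    return {
        (s, t): p_distance(matrix[s], matrix[t])
        for s, t in combinations(sorted(matrix), 2)
    }


# Toy matrix (illustrative only): four characters coded 0/1/2, "?" = missing.
toy = {"A": "0120", "B": "0121", "C": "1?21"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```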

Comparative cross-cultural studies of folklore have long been dogged by debates concerning the durability and integrity of oral traditions.

To address this problem, the present study employed three methods of phylogenetic reconstruction together with several techniques for quantifying the relative contributions of descent versus other processes in generating relationships between Little Red Riding Hood and other similar tales from around the world.

Overall, the results demonstrate a high degree of consistency in the groupings returned by the cladistic, Bayesian and NeighbourNet analyses.

The RI of the most parsimonious trees 0. Simulations of cultural evolutionary processes carried out by Nunn et al.

The average delta score 0. These figures are within the range of values obtained from linguistic cognate vocabulary sets reported in Gray et al.

However, it is worth noting that it is possible to obtain lower values than the ones reported here from datasets that include known borrowings and even hybrid languages.

For example, Gray et al. In other words, while relationships among the folktales fit a branching model of descent quite well, borrowing and blending could have potentially played a more significant role than indicated by the RI of the MPTs.

This would be consistent with the low bootstrap support and posterior probabilities for some of the clades returned by the cladistic and Bayesian analyses.

Like the NeighbourNet graph, both these analyses indicate conflicting signal surrounding the East Asian group, as well as among geographically proximate variants of ATU and ATU. In this case, the accuracy of the relationships depicted in Figures 2, 3, and 4 is supported by qualitative evidence regarding the historiography of the tales.

Thus, all three analyses identified Little Red Riding Hood, The Story of Grandmother and Catterinella as a single tale type that is distinct from The Wolf and the Kids, which folklorists believe to be a more distantly related tale [11].

The position of the Grimms' version of Little Red Riding Hood supports historiographical evidence that it is directly descended from Perrault's earlier tale via a literate informant of French Huguenot extraction [37].

The results of the analyses also concur with the literary record on The Wolf and the Kids, which suggests the tale evolved from an Aesopic fable that was first recorded around AD [46].

All three analyses indicate that Aesopic versions of the tale — in which the victim sees through the villain's disguise before letting him through the door — diverged at an early point in the history of the lineage, prior to the existence of the last common ancestor shared by other variants of The Wolf and the Kids.

In sum, the consistency of the relationships returned by different phylogenetic methods, their fit to the data, and their compatibility with independent lines of folklore research provide compelling evidence that, contrary to the claim that the vagaries of oral transmission are bound to wipe out all traces of descent in folktales, it is possible to establish coherent narrative traditions over large geographical distances and historical periods.

While these findings broadly support the goals of historic-geographic approaches to folklore, they also demonstrate that phylogenetic analysis can help resolve some of the problems arising from more traditional methods.

As mentioned previously, one of the key problems with existing folklore taxonomy is that it defines international types in reference to European type specimens on the basis of just a few traits.

In this case, African and East Asian tales are grouped with Little Red Riding Hood because they feature human protagonists, and with The Wolf and the Kids because the villain attacks the victims in their own home, rather than their grandmother's.

The phylogenetic approach used here, on the other hand, defines types in reference to the tales' inferred common ancestors rather than any existing variants, and uses all the traits they exhibit as potential evidence for their relationships.

All the analyses clustered the African stories with ATU. The sole exception was an Ibo tale, which grouped with European variants of Little Red Riding Hood, thus endorsing the collector's belief that the story is not of local origin, but an Ibo oral translation of the western fairy tale [50].

The tale was subsequently modified to create a novel redaction that spread across societies in central and southern Africa, and even as far as Antigua.

Although bootstrap and posterior support for this clade was relatively modest, it is remarkable that the phylogenetic signal in this tradition was sufficiently strong to be detected by all three analyses, despite the massive cultural and human upheavals that occurred during the forced displacement of African populations during the slave trade.

Since there is no evidence to suggest they share a more recent common ancestor with The Wolf and the Kids or Little Red Riding Hood, they cannot be classified as members of either international type.

One intriguing possibility raised in the literature on this topic that would be consistent with these results is that the East Asian tales represent a sister lineage that diverged from ATU and ATU before they evolved into two distinct groups.

A more detailed exposition of this theory has been set out by the Sinologist Barend J. Noting that The Tiger Grandmother encompasses a spectrum of variants, some more like ATU and others more like ATU, Haar argues that the East Asian tales represent an ancient autochthonous tale type that is ancestral to the other two.

On the basis of qualitative comparisons among these and other Asian tales, he conjectures that the tale originated in China and spread westwards to the Middle East and Europe between the twelfth and fourteenth centuries, a period during which there were extensive trade and cultural exchanges between east and west.

At some unspecified later point, the tale type split into the lineages that gave rise to Little Red Riding Hood and The Wolf and the Kids.

Although it is tempting to interpret the results of the analyses in this light, there are several problems with this theory.

Of course, as mentioned previously, literary evidence about the origins of oral tales can be unreliable and biased toward Europe.

However, at the very least, the existence of ATU in first-century Europe means that the putative Asian ancestral tale type would have had to spread west long before the opening of trade routes in the twelfth to fourteenth centuries, as suggested by Haar [51].

Second, if ATU and ATU are more closely related to each other than they are to the East Asian tales, they would be expected to share derived characters i.

However, there is not a single characteristic shared by these two tale types that does not also occur in the East Asian group.

If that were the case, we would expect earlier versions of ATU and ATU to be more similar to the East Asian tales than later variants, as original elements of the story would be lost or substituted as each tradition evolved.

However, this prediction is contradicted by the available chronological data on the tales' histories. In ATU, this test first appears in a version of the fable recorded in the fourteenth century [46], and is lacking in the original version.

An almost identical episode occurs in variants of The Wolf and the Kids and is also present in the African tales, in which the wolf drinks something or cuts his tongue to smooth out his voice.

However, it does not appear in any recorded versions prior to the publication of the Grimms' Children and Household Tales in [1]. The latter trait has excited particular interest among folklorists, since it occurs in the oral tale The Story of Grandmother and not in Little Red Riding Hood where the girl gets eaten.

To investigate the evolution of these similarities more rigorously, the ancestral states of the traits discussed above were reconstructed on the tale phylogenies (see Methods for details).

The results are shown in Table 1 below. An alternative — and, to the best of this author's knowledge, novel — explanation for the relationship of the East Asian tales to ATU and ATU is that the former is derived from the latter two, rather than vice versa.

This hypothesis would account for the finding that important traits shared by the East Asian tales with Little Red Riding Hood and The Wolf and the Kids are not ancestral, suggesting that they were borrowed instead.

Given the number and striking nature of these resemblances, it seems unlikely that they could have evolved independently.

Borrowing is also consistent with patterns of conflicting signal in the NeighbourNet graph, which appear to be especially prevalent around the East Asian group.

This impression is confirmed by a comparison of taxon-specific delta scores and Q-residuals, which are higher on average for the East Asian tales than for other tales.
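Delta scores measure how far each quartet of tales departs from a perfectly tree-like pattern of distances. For four taxa x, y, u, v, the three sums d(x,y)+d(u,v), d(x,u)+d(y,v) and d(x,v)+d(y,u) are sorted so that m1 >= m2 >= m3, and the quartet score is (m1 - m2)/(m1 - m3); on an additive tree the two largest sums are equal, so the score is 0. A taxon-specific score is the mean over all quartets that contain that taxon. The sketch below follows this definition; the function names are hypothetical, scoring a quartet as 0 when all three sums tie is an assumption, and Q-residuals are not covered.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, List, Tuple

Distances = Dict[FrozenSet[str], float]  # keyed by an unordered pair of tales


def quartet_delta(dist: Distances, x: str, y: str, u: str, v: str) -> float:
    def d(a: str, b: str) -> float:
        return dist[frozenset((a, b))]

    # The three ways of pairing four taxa, largest sum first.
    m1, m2, m3 = sorted(
        [d(x, y) + d(u, v), d(x, u) + d(y, v), d(x, v) + d(y, u)], reverse=True
    )
    if m1 == m3:
        return 0.0  # all three sums equal: assumed to score as 0
    return (m1 - m2) / (m1 - m3)


def delta_scores(taxa: List[str], dist: Distances) -> Tuple[float, Dict[str, float]]:
    """Mean delta over all quartets, and per taxon over the quartets containing it.
    Requires at least four taxa."""
    per_taxon: Dict[str, List[float]] = {t: [] for t in taxa}
    overall: List[float] = []
    for quartet in combinations(taxa, 4):
        delta = quartet_delta(dist, *quartet)
        overall.append(delta)
        for taxon in quartet:
            per_taxon[taxon].append(delta)
    return mean(overall), {t: mean(v) for t, v in per_taxon.items()}


# Toy distances (illustrative only) that fit the tree ((A,B),(C,D)) exactly.
tree_like = {frozenset(p): d for p, d in [
    ("AB", 2), ("CD", 2), ("AC", 3), ("AD", 3), ("BC", 3), ("BD", 3)]}
print(delta_scores(list("ABCD"), tree_like))
```

Higher scores for a taxon indicate that its distances to the others are harder to reconcile with any single tree.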

The average delta score of the East Asian tales is 0. To investigate this hypothesis further, another set of analyses was carried out in which the East Asian tales were removed from the data along with the characters that were only present in this group.

It was reasoned that if these tales evolved by blending together elements of ATU and ATU, then their removal should result in a more phylogenetically robust distinction between these two groups.
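The data-reduction step can be sketched as follows, assuming a string-coded character matrix with `?` for missing data. The sketch interprets "characters that were only present in this group" as characters that no longer vary among the remaining tales once the excluded tales are dropped; that interpretation, the function name `prune_taxa`, and the placeholder labels `E1` and `E2` for excluded tales are assumptions rather than the study's documented procedure.

```python
from typing import Dict, List, Set, Tuple

MISSING = "?"  # assumed code for missing data


def prune_taxa(matrix: Dict[str, str], drop: Set[str]) -> Tuple[Dict[str, str], List[int]]:
    """Remove the tales in `drop`, then remove characters that no longer vary."""
    kept = {tale: states for tale, states in matrix.items() if tale not in drop}
    n_chars = len(next(iter(kept.values())))
    varying = [
        i for i in range(n_chars)
        if len({states[i] for states in kept.values()} - {MISSING}) > 1
    ]
    pruned = {tale: "".join(states[i] for i in varying) for tale, states in kept.items()}
    return pruned, varying


# Toy matrix (illustrative only); E1 and E2 stand in for the excluded tales.
toy = {"A": "0101", "B": "0110", "E1": "1011", "E2": "1?11"}
pruned, kept_chars = prune_taxa(toy, drop={"E1", "E2"})
print(pruned, kept_chars)  # {'A': '01', 'B': '10'} [2, 3]
```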

This prediction was tested by maximum parsimony bootstrapping and Bayesian inference. For reference, consensus trees derived from both analyses are presented in the Supporting Information, together with a NeighbourNet graph excluding the East Asian tales (Figures S1, S2 and S3).

Thus, both analyses indicate that the East Asian tales are a source of conflicting signal in the data, in line with the hybridisation hypothesis.

While on current evidence this appears to be the best available explanation for the relationships between the East Asian group and ATU and ATU , questions remain about how, where and when the latter two tale types were adopted and combined.

Based on the similarities described above, it seems likely to have occurred sometime between the origin of the lineage leading to Little Red Riding Hood and The Story of Grandmother, but before the publication of Perrault's classic tale in Given the antiquity and wide geographic diffusion of The Wolf and the Kids, it is certainly plausible that ATU would have also reached China by this time, perhaps between the twelfth and fourteenth centuries, i.

Given the current state of the evidence, such scenarios are necessarily speculative. However, the digitisation and translation of an ever increasing number of folklore collections from Asia, as well as other regions, promise to yield a wealth of new data with which to investigate these questions more thoroughly in the future.

In the meantime, this case study has shown that phylogenetic methods provide powerful tools for analysing cross-cultural relationships among folktales that can be used to classify groups based on common ancestry, reconstruct their evolutionary histories, and identify patterns of contamination and hybridisation across traditions.

While these goals are clearly of crucial importance to comparative studies of folklore, they also have potentially exciting applications in other fields.

As previous researchers have pointed out, the faithful transmission of narratives over many generations and across cultural and linguistic barriers is a rich source of evidence about the kinds of information that we find memorable and motivated to pass on to others [9], [53], [54].

In the present case, stories like Little Red Riding Hood, The Tiger Grandmother and The Wolf and the Kids would seem to embody several features identified in experimental studies as important cognitive attractors in cultural evolution.

Equally, these methods could be applied to explore how tales are influenced by cultural, rather than psychological, selection pressures.

Such an analysis might examine whether local modifications of different tale types exhibit consistent patterns, and whether they covary with specific ecological, political or religious variables.

Future work on these questions promises to generate important insights into the evolution of oral traditions, and open new lines of communication between anthropologists, psychologists, biologists and literary scholars.

Cladistic analysis employs a branching model of evolution that clusters taxa on the basis of shared derived (evolutionarily novel) traits.

To search for the most parsimonious tree (MPT), the present analysis employed an efficient tree-bisection-reconnection (TBR) algorithm implemented in the heuristic search option of PAUP* 4 [57], carrying out 1, replications to ensure a thorough exploration of tree-space.
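A heuristic search such as tree-bisection-reconnection repeatedly rearranges candidate trees and scores each one by its length, the minimum number of character-state changes it requires. For unordered characters, that score can be computed with Fitch's algorithm, sketched below as an illustration rather than as PAUP*'s implementation. Assumptions: trees are rooted binary trees written as nested tuples of tale labels, every tale has an observed state for every character (no `?` handling), and `fitch` and `tree_length` are hypothetical names. Because Fitch length does not depend on where the root is placed, rooting is only a computational convenience for unrooted trees.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label or a pair of subtrees


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch down-pass for one unordered character: (state set at node, changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Total number of changes over all characters (lower is more parsimonious)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {tale: row[i] for tale, row in matrix.items()})[1]
        for i in range(n_chars)
    )


toy = {"A": "00", "B": "01", "C": "11", "D": "10"}
print(tree_length((("A", "B"), ("C", "D")), toy))  # 3
print(tree_length((("A", "C"), ("B", "D")), toy))  # 4
```

On this toy matrix the first topology is shorter, so a parsimony search would prefer it.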

The RI is a measure of how well similarities among a group of taxa can be explained by the retention of shared derived traits on a given tree [34].

A maximum RI of 1 indicates that all similarities can be interpreted as shared derived traits, without requiring additional explanations, such as losses, independent evolution or borrowing.

As the contribution of these latter processes increases, generating similarities that conflict with the tree, the RI will approach 0.
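The ensemble Retention Index is usually defined as RI = (G - S)/(G - M), where S is the observed number of steps on the tree, M is the minimum possible number of steps (for each character, one fewer than the number of states it shows) and G is the maximum possible number (for an unordered character, the number of taxa minus the count of its most frequent state), each summed over characters. The sketch below computes RI from a character matrix and per-character step counts supplied by the caller, for example from a Fitch scoring of the tree. The function name, the string encoding of the matrix and the absence of missing data are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence


def retention_index(matrix: Dict[str, str], observed_steps: Sequence[int]) -> float:
    """Ensemble RI = (G - S) / (G - M) for unordered characters without missing data."""
    rows = list(matrix.values())
    n_taxa, n_chars = len(rows), len(rows[0])
    g_total = m_total = 0
    for i in range(n_chars):
        counts = Counter(row[i] for row in rows)
        m_total += len(counts) - 1                # fewest steps on any tree
        g_total += n_taxa - max(counts.values())  # most steps on any tree (star tree)
    s_total = sum(observed_steps)
    if g_total == m_total:
        return float("nan")  # no parsimony-informative characters: RI undefined
    return (g_total - s_total) / (g_total - m_total)


toy = {"A": "00", "B": "01", "C": "11", "D": "10"}
print(retention_index(toy, observed_steps=[1, 2]))  # 0.5
```

With the step counts of the worse topology, [2, 2], the same matrix gives an RI of 0, matching the interpretation that no similarity is retained as a shared derived trait.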

Maximum parsimony bootstrapping is a technique for measuring support for individual clades [58]. It involves carrying out cladistic analyses of pseudoreplicate datasets generated by randomly resampling characters with replacement from the original matrix.

Support for the clades returned by the original analysis is then estimated by calculating the frequency with which they occur in the most parsimonious trees obtained from the pseudoreplicates.
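The resampling and counting steps can be sketched as below. `pseudoreplicate` draws as many characters as the original matrix has, with replacement, and `bootstrap_support` reports the percentage of replicate trees that contain each clade of the original tree. The parsimony search run on each pseudoreplicate is deliberately left out, and representing each replicate's result as a single set of clades, rather than several equally parsimonious trees, is a simplifying assumption; both function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def pseudoreplicate(matrix: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample characters (columns) with replacement, keeping the matrix size."""
    n_chars = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {tale: "".join(row[i] for i in columns) for tale, row in matrix.items()}


def bootstrap_support(original: Iterable[Clade],
                      replicate_trees: List[Set[Clade]]) -> Dict[Clade, float]:
    """Percentage of replicate trees that contain each clade of the original tree."""
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    return {clade: 100 * counts[clade] / len(replicate_trees) for clade in original}


rng = random.Random(1)
print(pseudoreplicate({"A": "0120", "B": "0121", "C": "1021"}, rng))
```

Because whole columns are resampled, each tale keeps its states for a given character together, which is what makes the replicate a plausible alternative dataset.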

The bootstrap analyses reported here were carried out in PAUP* 4 [57] using heuristic searches of 1, replicates.

Bayesian inference proceeds by calculating the likelihood of the data given an initially random tree topology, set of branch lengths and model of character evolution, and iteratively modifies each of these parameters in a Markov chain Monte Carlo (MCMC) simulation.

Moves that improve the likelihood of the data are always accepted, while those that do not are usually rejected, although some may occasionally be accepted within a certain threshold so as to avoid getting trapped in local optima.
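One common way to realise this acceptance rule is a Metropolis step on log-likelihoods: an improvement is always accepted, and a worse proposal is accepted with probability equal to the likelihood ratio. The sketch below assumes symmetric proposals and flat priors, so that the Metropolis-Hastings ratio reduces to the likelihood ratio; a full Bayesian phylogenetic sampler also includes prior and proposal terms. `accept_move` is a hypothetical name.

```python
import math
import random


def accept_move(current_log_lik: float, proposed_log_lik: float,
                rng: random.Random) -> bool:
    """Metropolis rule on log-likelihoods (symmetric proposals, flat priors assumed)."""
    if proposed_log_lik >= current_log_lik:
        return True  # improvements are always accepted
    # Worse moves are accepted with probability L_proposed / L_current.
    return rng.random() < math.exp(proposed_log_lik - current_log_lik)


rng = random.Random(42)
# A proposal half as likely as the current state should be accepted about half the time.
trials = 10_000
accepted = sum(accept_move(0.0, math.log(0.5), rng) for _ in range(trials))
print(accepted / trials)
```

Working with log-likelihoods avoids numerical underflow, since the likelihoods of whole datasets are typically far smaller than the smallest representable float.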

