Punctuation in literature of major languages is intriguingly mathematical

A moment’s hesitation… Yes, a full stop here—but shouldn’t there be a comma there? Or would a hyphen be better? Punctuation can be a nuisance; it is often simply neglected. Wrong! The most recent statistical analyses paint a different picture: punctuation seems to “grow out” of the foundations shared by all the (examined) languages, and its features are far from trivial.

To many, punctuation appears as a necessary evil, to be happily ignored whenever possible. Recent analyses of literature written in the world’s current major languages require us to alter this opinion. In fact, the same statistical features of punctuation usage patterns have been observed in several hundred works written in seven, mainly Western, languages.

Punctuation, all ten representatives of which can be found in the introduction to this text, turns out to be a universal and indispensable complement to the mathematical perfection of every language studied. Such a remarkable conclusion about the role of mere commas, exclamation marks or full stops comes from an article by scientists from the Institute of Nuclear Physics of the Polish Academy of Sciences (IFJ PAN) in Cracow, published in the journal Chaos, Solitons & Fractals.

“The present analyses are an extension of our earlier results on the multifractal features of sentence length variation in works of world literature. After all, what is sentence length? It is nothing more than the distance to the next specific punctuation mark: the full stop. So now we have taken all punctuation marks under a statistical magnifying glass, and we have also looked at what happens to punctuation during translation,” says Prof. Stanislaw Drozdz (IFJ PAN, Cracow University of Technology).

Two sets of texts were studied. The main analyses concerning punctuation within each language were carried out on 240 highly popular literary works written in seven major Western languages: English (44), German (34), French (32), Italian (32), Spanish (32), Polish (34) and Russian (32). This particular selection of languages was based on a criterion: the researchers assumed that no fewer than 50 million people should speak the language in question, and that the works written in it should have been awarded no fewer than five Nobel Prizes for Literature.

In addition, for the statistical validity of the research results, each book had to contain at least 1,500 word sequences separated by punctuation marks. A separate collection was prepared to observe the stability of punctuation in translation. It contained 14 works, each of which was available in each of the languages studied (two of the 98 language versions, however, were omitted due to their unavailability).

Together, the two collections included works by writers such as Conrad, Dickens, Doyle, Hemingway, Kipling, Orwell, Salinger, Woolf, Grass, Kafka, Mann, Nietzsche, Goethe, La Fayette, Dumas, Hugo, Proust, Verne, Eco, Cervantes, Sienkiewicz and Reymont.

The attention of the Cracow researchers was primarily drawn to the statistical distribution of the distance between consecutive punctuation marks. It soon became evident that in all the languages studied, it was best described by one of the precisely defined variants of the Weibull distribution.
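To make "distance between consecutive punctuation marks" concrete, here is a minimal Python sketch that counts the words between one mark and the next. It rests on assumptions that are not in the article: the set of marks in MARKS, the tokenising pattern and the function name are all illustrative, and the researchers' own procedure may differ.

```python
import re

# Hypothetical set of marks; the article does not list the ten it analysed.
# \u2026 is the ellipsis character and \u2014 is the long dash.
MARKS = frozenset(".,;:!?()-\u2026\u2014")

# A word may contain inner apostrophes or hyphens ("shouldn't", "tongue-in-cheek").
# Any other non-space character is treated as a candidate mark.
TOKEN = re.compile(r"(?P<word>\w+(?:['’-]\w+)*)|(?P<mark>[^\w\s])")


def punctuation_gaps(text, marks=MARKS):
    """Return the number of words between consecutive punctuation marks.

    Words after the last mark are dropped, because that run is never closed.
    """
    gaps = []
    words = 0
    for match in TOKEN.finditer(text):
        if match.lastgroup == "word":
            words += 1
        elif match.group() in marks:
            if words:  # ignore marks that directly follow another mark
                gaps.append(words)
            words = 0
    return gaps
```

For example, punctuation_gaps("Yes, a full stop here. Wait!") returns [1, 4, 1]. A list like this, collected over a whole book, is the kind of sample whose distribution the study examines.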

A curve of this type has a characteristic shape: it rises rapidly at first and then, after reaching a maximum, falls off somewhat more slowly to a certain critical value, below which it approaches zero at a small and steadily diminishing rate. The Weibull distribution is usually used to describe survival phenomena (e.g. population as a function of age), but also various physical processes, such as the increasing fatigue of materials.
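The article does not say which variant of the Weibull distribution was used. As an illustration only, the sketch below assumes the common discrete form, in which the probability that a run is exactly k words long is q^((k-1)^beta) - q^(k^beta) with q = 1 - p. The parameter names p and beta are labels chosen here, not taken from the source.

```python
def discrete_weibull_pmf(k, p, beta):
    """Probability that a run lasts exactly k words (k = 1, 2, 3, ...).

    Assumed form: (1 - p) ** ((k - 1) ** beta) - (1 - p) ** (k ** beta).
    With beta == 1 this reduces to the geometric distribution.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    q = 1.0 - p
    return q ** ((k - 1) ** beta) - q ** (k ** beta)


def discrete_weibull_table(p, beta, k_max):
    """Probabilities for k = 1 .. k_max, handy for plotting or comparison."""
    return [discrete_weibull_pmf(k, p, beta) for k in range(1, k_max + 1)]
```

Under this assumed form the probabilities sum to one by construction, and language-to-language differences of the kind the researchers describe would show up as different values of p and beta.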

“The concordance of the distribution of word sequence lengths between punctuation marks with the functional form of the Weibull distribution was better the more types of punctuation marks we included in the analyses; for all marks the concordance turned out to be almost complete. At the same time, some differences in the distributions are apparent between the different languages, but these merely amount to the selection of slightly different values for the distribution parameters, specific to the language in question. Punctuation thus seems to be an integral part of all the languages studied,” notes Prof. Drozdz.

After a moment he adds with some amusement: “…and since the Weibull distribution is concerned with phenomena such as survival, it can be said with not too much tongue-in-cheek that punctuation has in its nature a literally embedded struggle for survival.”

The next stage of the analyses consisted of determining the hazard function. In the case of punctuation, it describes how the conditional probability of success—i.e., the probability of the next punctuation mark—changes if no such mark has yet appeared in the analysed sequence.
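As a rough illustration, the hazard at length k is the probability that a mark appears after exactly k words, given that none has appeared in the first k - 1. The sketch below estimates it from a list of observed run lengths and also gives the closed form for a discrete Weibull distribution. The discrete form, the parameter names p and beta, and both function names are assumptions made here, not details from the article.

```python
from collections import Counter


def empirical_hazard(gaps):
    """Estimate h(k) = count(gap == k) / count(gap >= k) from observed gaps."""
    counts = Counter(gaps)
    remaining = len(gaps)  # runs that have survived to the current length
    hazard = {}
    for k in sorted(counts):
        hazard[k] = counts[k] / remaining
        remaining -= counts[k]
    return hazard


def discrete_weibull_hazard(k, p, beta):
    """Hazard of the assumed discrete Weibull form, for k = 1, 2, 3, ...

    h(k) = 1 - (1 - p) ** (k ** beta - (k - 1) ** beta)
    """
    return 1.0 - (1.0 - p) ** (k ** beta - (k - 1) ** beta)
```

With beta equal to 1 the hazard is constant, so a mark is equally likely at every step. With beta above 1 it rises with k, which means a mark becomes more likely the longer a run of words has gone without one.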

The results here are clear: the language characterized by the lowest propensity to use punctuation is English, with Spanish not far behind; Slavic languages proved to be the most punctuation-dependent. The hazard function curves for punctuation marks in six of the languages studied followed a similar pattern; they differed mainly in vertical shift.

German proved to be the exception. Its hazard function is the only one that intersects most of the curves constructed for the other languages. German punctuation thus seems to combine the punctuation features of many languages, making it a kind of Esperanto punctuation.

The above observation dovetails with the next analysis, which examined whether the punctuation features of original literary works can be seen in their translations. As expected, the language that most faithfully carried punctuation over from the original language to the target language turned out to be German.

In spoken communication, pauses can be justified by human physiology, such as the need to catch one’s breath or to take a moment to structure what is to be said next in one’s mind. And in written communication?

“Creating a sentence by adding one word after another while ensuring that the message is clear and unambiguous is a bit like tightening the string of a bow: it is easy at first, but becomes more demanding with each passing moment. If there are no ordering elements in the text (and this is the role of punctuation), the difficulty of interpretation increases as the string of words lengthens. A bow that is too tight can break, and a sentence that is too long can become unintelligible. Therefore, the author is faced with the necessity of ‘freeing the arrow’, i.e. closing a passage of text with some sort of punctuation mark. This observation applies to all the languages analysed, so we are dealing with what could be called a linguistic law,” states Dr. Tomasz Stanisz (IFJ PAN), first author of the article in question.

Finally, it is worth noting that the invention of punctuation is relatively recent—punctuation marks did not occur at all in old texts. The emergence of optimal punctuation patterns in modern written languages can therefore be interpreted as the result of their evolutionary advancement. However, the excessive need for punctuation is not necessarily a sign of such sophistication.

English and Spanish, today the most universal languages, appear, in the light of the above studies, to be less strict about the frequency of punctuation use. It is likely that these languages are so formalized in terms of sentence construction that there is less room for ambiguity that would need to be resolved with punctuation marks.

For more such insights, log into our website https://international-maths-challenge.com

Credit of the article given to The Henryk Niewodniczanski Institute of Nuclear Physics Polish Academy of Sciences


Theoretical biologists test two modes of social reasoning and find surprising truths in simplicity

Imagine a small village where every action someone takes, good or bad, is quietly followed by ever-attentive, nosy neighbours. An individual’s reputation is built through these actions and observations, which determines how others will treat them. They help a neighbour and are likely to receive help from others in return; they turn their back on a neighbour and find themselves isolated. But what happens when people make mistakes, when good deeds go unnoticed, or errors lead to unjust blame?

Here, the study of behaviour intersects with Bayesian and abductive reasoning, says Erol Akçay, a theoretical biologist at the University of Pennsylvania’s School of Arts & Sciences.

Bayesian reasoning refers to a method for assessing probability, in which individuals use prior knowledge paired with new evidence to update their beliefs or estimates about a certain condition, in this case the reputation of other villagers. Abductive reasoning, by contrast, involves a simple “what you see is what you get” approach to rationalizing and making a decision, Akçay says.

In two papers, one published in PLoS Computational Biology and the other in the Journal of Theoretical Biology, researchers from the Department of Biology explored how these reasoning strategies can be effectively modeled and applied to enhance biologists’ understanding of social dynamics.

Making the educated guess

The PLoS Computational Biology paper investigates how Bayesian statistical methods can be used to weigh the likelihood of errors and align the judgments of actors within a social network with a more nuanced understanding of reputation. “It’s something we may commonly do when we’re trying to offer up an explanation for some phenomena with no obvious, straightforward, or intuitive solution,” Akçay says.

Bryce Morsky, a co-author on both papers and now an assistant professor at Florida State University, began the work during his postdoctoral research in Akçay’s lab. He says he initially believed that accounting for errors in judgment could substantially enhance the reward-and-punishment system that underpins cooperation, and he expected that understanding these errors better and incorporating them into the model would promote more effective cooperation.

“Essentially, the hypothesis was that reducing errors would lead to a more accurate assessment of reputations, which would in turn foster cooperation,” he says.

The team developed a mathematical model to simulate Bayesian reasoning. It involved a game-theoretical model where individuals interact within a framework of donation-based encounters. Other individuals in the simulation assess the reputations of actors based on their actions, influenced by several predefined social norms.

In the context of the village, this means judging each villager by their actions—whether helping another (good) or failing to do so (bad)—but also taking into account their historical reputation and the potential that you didn’t assess correctly.

“So, for example, if you observe someone behaving badly, but you thought they were good before, you keep an open mind that you perhaps didn’t see correctly. This allows for a nuanced calculation of reputation updates,” Morsky says. He and colleagues use this model to see how errors and reasoning would affect the villagers’ perception and social dynamics.
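The "open mind" Morsky describes can be sketched with a single application of Bayes' rule. This is a toy illustration, not the model from the paper: it assumes that a good villager is seen acting well with probability 1 - error_rate and a bad villager with probability error_rate, and the function and parameter names are invented here.

```python
def update_reputation(prior_good, observed_good, error_rate):
    """Return the updated probability that a villager is good.

    prior_good    -- belief, before the observation, that the villager is good
    observed_good -- True if the villager was seen acting well, False otherwise
    error_rate    -- assumed chance that an observation shows the opposite
                     of what the villager's type would normally produce
    """
    if observed_good:
        likelihood_if_good = 1.0 - error_rate
        likelihood_if_bad = error_rate
    else:
        likelihood_if_good = error_rate
        likelihood_if_bad = 1.0 - error_rate

    weight_good = likelihood_if_good * prior_good
    weight_bad = likelihood_if_bad * (1.0 - prior_good)
    return weight_good / (weight_good + weight_bad)
```

Under these assumptions, an observer who was 90% sure a neighbour was good and then sees one bad act, with a 10% chance of misperception, ends up at about 50%: the earlier good reputation softens the verdict rather than being wiped out.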

The five key social norms the study explores are: Scoring, Shunning, Simple Standing, Staying, and Stern Judging; each affects the reputation and subsequent behaviour of individuals differently, altering the evolutionary outcomes of cooperative strategies.

“In some scenarios, particularly under Scoring, Bayesian reasoning improved cooperation,” Morsky says. “But under other norms, like Stern Judging, it generally resulted in less cooperation due to stricter judgment criteria.”

Morsky explains that under Scoring a simple rule is applied: it is good to cooperate (give) and bad to defect (not give), regardless of the recipient’s reputation. Under Stern Judging, by contrast, not only are the actions of individuals considered, but their decisions are also critically evaluated based on the reputation of the recipient.

In the context of the nosy-neighbours scenario, if a villager decides to help another, this action is noted positively under Scoring, regardless of who receives the help or their standing in the village. Conversely, under Stern Judging, if a villager chooses to help someone with a bad reputation, it is noted negatively, the researchers say.
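The contrast between the two norms can be written as a small lookup, sketched below. The function name and string labels are invented for illustration. One rule is an assumption beyond what the article states: that under Stern Judging, refusing to help a villager with a bad reputation counts as good. That is the usual definition in the indirect-reciprocity literature.

```python
def assess(norm, action, recipient_reputation):
    """Return the reputation ("good" or "bad") an observer assigns to a donor.

    norm                 -- "scoring" or "stern_judging"
    action               -- "cooperate" (give) or "defect" (not give)
    recipient_reputation -- "good" or "bad"
    """
    if action not in ("cooperate", "defect"):
        raise ValueError(f"unknown action: {action!r}")
    if recipient_reputation not in ("good", "bad"):
        raise ValueError(f"unknown reputation: {recipient_reputation!r}")

    helped = action == "cooperate"

    if norm == "scoring":
        # Only the action matters; the recipient's standing is ignored.
        return "good" if helped else "bad"

    if norm == "stern_judging":
        # Helping the good and refusing the bad are both judged good
        # (the second half is the assumed, standard part of the rule).
        recipient_is_good = recipient_reputation == "good"
        return "good" if helped == recipient_is_good else "bad"

    raise ValueError(f"unknown norm: {norm!r}")
```

So assess("scoring", "cooperate", "bad") gives "good", while assess("stern_judging", "cooperate", "bad") gives "bad", matching the village example.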

Morsky adds that the lack of cooperation was particularly evident under norms where Bayesian reasoning led to less tolerance for errors, which could exacerbate disagreements about reputations instead of resolving them. This, coupled with the knowledge that humans do not weigh all the relevant information before deciding whom to work with, prompted Akçay and Morsky to investigate other modes of reasoning.

More than just a hunch

While working in Akçay’s lab, Morsky recruited Neel Pandula, then a sophomore in high school. “We met through the Penn Laboratory Experience in the Natural Sciences program,” Morsky says. “In light of the Bayesian reasoning model, Neel proposed abductive reasoning as another approach to modeling reasoning, and so we got to writing that paper for the Journal of Theoretical Biology, which he became first author of.”

Pandula, now a first-year student in the College of Arts and Sciences, explains that he and Morsky used Dempster-Shafer Theory—a probabilistic framework to infer best explanations—to form the basis of their approach.

“What’s key here is that Dempster-Shafer Theory allows for a bit of flexibility in handling uncertainty and allows for integrating new evidence into existing belief systems without fully committing to a single hypothesis unless the evidence is strong,” Pandula says.

For instance, the researchers explain, in a village, seeing a good person help another good person aligns with social norms and is readily accepted by observers. However, if a villager known as bad is seen helping a good person, it contradicts these norms, leading observers to question the reputations involved or the accuracy of their observation. They then apply the rules of abductive reasoning, specifically Dempster-Shafer Theory, weighing error rates and typical behaviours to determine the most likely truth behind the unexpected action.
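To give a flavour of how Dempster-Shafer Theory merges an existing belief with a new observation, here is a minimal sketch of Dempster's rule of combination over the two hypotheses "good" and "bad". It is a generic textbook illustration, not the model from the paper. The example masses are invented, and mass placed on the whole set {good, bad} stands for "undecided".

```python
from itertools import product

GOOD = frozenset({"good"})
BAD = frozenset({"bad"})
EITHER = GOOD | BAD  # mass here means the observer has not committed


def combine(m1, m2):
    """Combine two mass functions (dicts: frozenset -> mass) by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # the two sources flatly disagree
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise so that the remaining masses sum to one.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Hypothetical numbers: a fairly good prior reputation meets a bad-looking act.
prior = {GOOD: 0.6, EITHER: 0.4}
observation = {BAD: 0.5, EITHER: 0.5}
belief = combine(prior, observation)
```

With these made-up numbers, belief assigns 3/7 to "good", 2/7 to "bad" and 2/7 to "undecided": the observer shifts towards doubt without fully committing to either verdict, which is the flexibility Pandula describes.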

The team anticipated that abductive reasoning would handle errors in reputation assessments more effectively, especially in public settings in which individuals may be pressured one way or another, resulting in discrepancies and errors. Under Scoring and the other norms, they found that abductive reasoning could better foster cooperation than Bayesian reasoning in public settings.

Akçay says that it came as a bit of a surprise to see that in navigating social networks, such a simple “cognitively ‘cheap, lazy’ reasoning mechanism proves this effective at dealing with the challenges associated with indirect reciprocity.”

Morsky notes that in both models the researchers chose not to factor in any cost of a cognitive burden. “You’d hope that performing a demanding task like remembering which individuals did what and using that to inform you on what they’re likely to do next would yield some positive, prosocial outcome. Yet even if you make this effort costless, under Bayesian reasoning, it generally undermines cooperation.”

As a follow-up, the researchers are interested in exploring how low-cost reasoning methods, like abductive reasoning, can be evolutionarily favoured in larger, more complex social circles. They are also interested in applying these reasoning methods to other social systems.

 

 


Credit of the article given to Nathi Magubane, University of Pennsylvania