Magic numbers: the beauty of decimal notation

While adding up your grocery bill in the supermarket, you’re probably not thinking how important or sophisticated our number system is.

But the discovery of the present system by unknown mathematicians in India roughly 2,000 years ago – and its spread to Europe from the 13th century onwards – was pivotal to the development of our modern world.

Now, what if our “decimal” arithmetic, often called the Indo-Arabic system, had been discovered earlier? Or what if it had been shared with the Western world earlier than the 13th century?

First, let’s define “decimal” arithmetic: we’re talking about the combination of zero, the digits one through nine, positional notation, and efficient rules for arithmetic.

“Positional notation” means that the value a digit contributes depends both on the digit itself and on its position within the string of digits.

Thus 7,654 means:

(7 × 1000) + (6 × 100) + (5 × 10) + 4 = 7,654
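For readers who like to see this spelled out, here is a minimal Python sketch of the positional rule (the function name and example digits are simply illustrative choices):

```python
# Evaluate a string of digits using positional notation:
# each step shifts the running total one position left (multiply by the base),
# then adds the next digit.
def positional_value(digits: str, base: int = 10) -> int:
    value = 0
    for d in digits:
        value = value * base + int(d)
    return value

print(positional_value("7654"))  # (7 * 1000) + (6 * 100) + (5 * 10) + 4 = 7654
```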

The benefit of this positional notation system is that we need no new symbols or calculation schemes for tens, hundreds or thousands, as was needed when manipulating Roman numerals.

While numerals for the counting numbers one, two and three were seen in all ancient civilisations – and some form of zero appeared in two or three of those civilisations (including India) – the crucial combination of zero and positional notation arose only in India and Central America.

Importantly, only the Indian system was suitable for efficient calculation.

Positional arithmetic can be in base-ten (or decimal) for humans, or in base-two (binary) for computers.

In binary, 10101 means:

(1 × 16) + (0 × 8) + (1 × 4) + (0 × 2) + 1

Which, in the more-familiar decimal notation, is 21.

The rules we learned in primary school for addition, subtraction, multiplication and division can be easily extended to binary.
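A quick Python check makes both points concrete (10101 is the example above; the small sums and products are extra illustrative choices):

```python
# Base-two positional notation: Python's int() accepts an explicit base.
n = int("10101", 2)        # (1 * 16) + (0 * 8) + (1 * 4) + (0 * 2) + 1
print(n)                   # 21

# The school rules carry over: addition and multiplication work the same way,
# the answers are simply written in base two.
print(bin(0b10101 + 0b1))  # 0b10110  (21 + 1 = 22)
print(bin(0b101 * 0b11))   # 0b1111   (5 * 3 = 15)
```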

The binary system is the one implemented in the electronic circuits of computers, largely because its multiplication table is far simpler than the decimal one: it has only four entries (0 × 0, 0 × 1, 1 × 0 and 1 × 1), compared with 100 in the decimal table.

Of course, computers can readily convert binary results to decimal notation for us humans.

As easy as counting from one to ten

Perhaps because we learn decimal arithmetic so early, we consider it “trivial”.

Indeed, the discovery of decimal arithmetic is given disappointingly brief mention in most western histories of mathematics.

In reality, decimal arithmetic is anything but “trivial”, since it eluded the best minds of the ancient world, including the Greek mathematical super-genius Archimedes of Syracuse.

Archimedes – who lived in the 3rd century BCE – saw far beyond the mathematics of his time, even anticipating numerous key ideas of modern calculus. He also used mathematics in engineering applications.

Nonetheless, he used a cumbersome Greek numeral system that hobbled his calculations.

Imagine trying to multiply the Roman numerals XXXI (31) and XIV (14).

First, one must rewrite the second argument, XIV, in purely additive form as XIIII, then multiply it by each letter of the first (X, X, X and I) to obtain CXXXX CXXXX CXXXX XIIII.

These numerals can then be sorted by magnitude to arrive at CCCXXXXXXXXXXXXXIIII.

This can then be rewritten to yield CDXXXIV (434).
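To make the procedure concrete, here is a rough Python sketch of the additive scheme just described; the helper functions and their names are illustrative choices rather than anything standard:

```python
# A rough sketch of the letter-by-letter multiplication described above
# (not a full Roman-numeral library).
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral: str) -> int:
    # Standard conversion: subtract a letter when a larger one follows it (e.g. IV, XL).
    total = 0
    for i, ch in enumerate(numeral):
        value = VALUES[ch]
        if i + 1 < len(numeral) and VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total

def int_to_roman(n: int) -> str:
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
             (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def multiply_roman(a: str, b: str) -> str:
    # Expand the subtractive pair IV to IIII (a fuller version would expand IX, XL,
    # XC, CD and CM too), then form every letter-by-letter product and add them up.
    a, b = a.replace("IV", "IIII"), b.replace("IV", "IIII")
    total = sum(VALUES[x] * VALUES[y] for x in a for y in b)
    return int_to_roman(total)

print(multiply_roman("XXXI", "XIV"))  # CDXXXIV
print(roman_to_int("CDXXXIV"))        # 434
```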

(For a bit of fun, try adding MCMLXXXIV and MMXI. First person to comment with the correct answer and their method gets a jelly bean.)

Thus, while possible, calculation with Roman numerals is significantly more time-consuming and error-prone than with our decimal system (although it is harder to alter the amount payable on a Roman cheque).

History lesson

Although decimal arithmetic was known in the Arab world by the 9th century, it took many centuries to make its way to Europe.

Italian mathematician Leonardo Fibonacci travelled the Mediterranean world in the 13th century, learning from the best Arab mathematicians of the time. Even then, it was several more centuries until decimal arithmetic was fully established in Europe.

Johannes Kepler and Isaac Newton – both giants in the world of physics – relied heavily on extensive decimal calculations (by hand) to devise their theories of planetary motion.

In a similar way, present-day scientists rely on massive computer calculations to test hypotheses and design products. Even our mobile phones do surprisingly sophisticated calculations to process voice and video.

But let us indulge in some alternate history of mathematics. What if decimal arithmetic had been discovered in India even earlier, say 300 BCE? (There are indications it was known by this date, just not well documented.)

And what if a cultural connection along the Silk Road had been made between Indian mathematicians and Greek mathematicians at the time?

Such an exchange would have greatly enhanced both worlds, resulting in advances beyond the reach of each system on its own.

For example, a fusion of Indian arithmetic and Greek geometry might well have led to full-fledged trigonometry and calculus, thus enabling ancient astronomers to deduce the laws of motion and gravitation nearly two millennia before Newton.

In fact, the combination of mathematics, efficient arithmetic and physics might have accelerated the development of modern technology by more than two millennia.

It is clear from history that without mathematics, real progress in science and technology is not possible (try building a mobile phone without mathematics). But it’s also clear that mathematics alone is not sufficient.

The prodigious computational skills of ancient Indian mathematicians never flowered into advanced technology, nor did the great mathematical achievements of the Greeks, or many developments in China.

On the other hand, the Romans, who were not known for their mathematics, still managed to develop some impressive technology.

But a combination of advanced mathematics, computation, and technology makes a huge difference.

Our bodies and our brains today are virtually indistinguishable from those of ancient times.

With the earlier adoption of Indo-Arabic decimal arithmetic, the modern technological world of today might – for better or worse – have been achieved centuries ago.

And that’s something worth thinking about next time you’re out grocery shopping.


Article credit: Jonathan Borwein (Jon), University of Newcastle, and David H. Bailey, University of California, Davis


How do neural networks learn? A mathematical formula explains how they detect relevant patterns

Neural networks have been powering breakthroughs in artificial intelligence, including the large language models that are now being used in a wide range of applications, from finance to human resources to health care. But these networks remain a black box whose inner workings engineers and scientists struggle to understand.

Now, a team led by data and computer scientists at the University of California San Diego has given neural networks the equivalent of an X-ray to uncover how they actually learn.

The researchers found that a formula used in statistical analysis provides a streamlined mathematical description of how neural networks, such as GPT-2, a precursor to ChatGPT, learn relevant patterns in data, known as features. This formula also explains how neural networks use these relevant patterns to make predictions.

“We are trying to understand neural networks from first principles,” said Daniel Beaglehole, a Ph.D. student in the UC San Diego Department of Computer Science and Engineering and co-first author of the study. “With our formula, one can simply interpret which features the network is using to make predictions.”

The team present their findings in the journal Science.

Why does this matter? AI-powered tools are now pervasive in everyday life. Banks use them to approve loans. Hospitals use them to analyse medical data, such as X-rays and MRIs. Companies use them to screen job applicants. But it’s currently difficult to understand the mechanism neural networks use to make decisions and the biases in the training data that might impact this.

“If you don’t understand how neural networks learn, it’s very hard to establish whether neural networks produce reliable, accurate, and appropriate responses,” said Mikhail Belkin, the paper’s corresponding author and a professor at the UC San Diego Halicioglu Data Science Institute. “This is particularly significant given the rapid recent growth of machine learning and neural net technology.”

The study is part of a larger effort in Belkin’s research group to develop a mathematical theory that explains how neural networks work. “Technology has outpaced theory by a huge amount,” he said. “We need to catch up.”

The team also showed that the statistical formula they used to understand how neural networks learn, known as Average Gradient Outer Product (AGOP), could be applied to improve performance and efficiency in other types of machine learning architectures that do not include neural networks.
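The article does not spell the formula out, but in the research literature the Average Gradient Outer Product of a model is usually defined as the average, over the training inputs, of the outer product of the model's gradient (taken with respect to its input) with itself. The sketch below is only a toy illustration of that definition, using a small hand-written network rather than anything from the study:

```python
import numpy as np

# A minimal sketch of the Average Gradient Outer Product (AGOP):
# the average, over the data, of the outer product of the model's
# input-gradient with itself. The "model" here is a tiny two-layer
# network written out by hand so that the gradient is explicit.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 3))          # first-layer weights (8 hidden units, 3 inputs)
w2 = rng.normal(size=8)               # second-layer weights

def f(x):
    return w2 @ np.tanh(W1 @ x)       # scalar prediction for one input x

def grad_f(x):
    # d/dx of w2 . tanh(W1 x)  =  W1^T (w2 * (1 - tanh(W1 x)^2))
    h = np.tanh(W1 @ x)
    return W1.T @ (w2 * (1.0 - h ** 2))

X = rng.normal(size=(200, 3))         # 200 toy inputs with 3 features

# AGOP = (1/n) * sum_i grad_f(x_i) grad_f(x_i)^T  -- a 3x3 matrix here.
agop = sum(np.outer(grad_f(x), grad_f(x)) for x in X) / len(X)

# Large entries (and their eigenvectors) indicate the input directions the
# model's predictions are most sensitive to -- the "features" it has learned.
print(np.round(agop, 3))
```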

“If we understand the underlying mechanisms that drive neural networks, we should be able to build machine learning models that are simpler, more efficient and more interpretable,” Belkin said. “We hope this will help democratize AI.”

The machine learning systems that Belkin envisions would need less computational power, and therefore less power from the grid, to function. These systems also would be less complex and so easier to understand.

Illustrating the new findings with an example

(Artificial) neural networks are computational tools for learning relationships between data characteristics (for example, identifying specific objects or faces in an image). One example of a task is determining whether a person in a new image is wearing glasses or not. Machine learning approaches this problem by providing the neural network with many example (training) images labeled as images of “a person wearing glasses” or “a person not wearing glasses.”

The neural network learns the relationship between images and their labels, and extracts data patterns, or features, that it needs to focus on to make a determination. One of the reasons AI systems are considered a black box is because it is often difficult to describe mathematically what criteria the systems are actually using to make their predictions, including potential biases. The new work provides a simple mathematical explanation for how the systems are learning these features.

Features are relevant patterns in the data. In the example above, there is a wide range of features that the neural network learns, and then uses, to determine whether a person in a photograph is in fact wearing glasses or not.

One feature it would need to pay attention to for this task is the upper part of the face. Other features could be the eye or the nose area where glasses often rest. The network selectively pays attention to the features that it learns are relevant and then discards the other parts of the image, such as the lower part of the face, the hair and so on.

Feature learning is the ability to recognize relevant patterns in data and then use those patterns to make predictions. In the glasses example, the network learns to pay attention to the upper part of the face. In the new Science paper, the researchers identified a statistical formula that describes how the neural networks are learning features.

Alternative neural network architectures

The researchers went on to show that inserting this formula into computing systems that do not rely on neural networks allowed these systems to learn faster and more efficiently.
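The paper's actual construction is more involved, but the general recipe can be sketched roughly like this: fit a simple (non-neural) predictor, compute its AGOP, use that matrix to reweight the input features, and repeat. Everything below (the kernel, the bandwidth, the toy data) is an assumption made for illustration, not a detail taken from the study:

```python
import numpy as np

BW = 2.0        # kernel bandwidth (an arbitrary choice for this sketch)
REG = 1e-3      # ridge regularisation

def kernel(X, Z, M):
    # Gaussian kernel with a learned metric M: k(x, z) = exp(-(x-z)^T M (x-z) / (2*BW^2))
    d = X[:, None, :] - Z[None, :, :]
    dist2 = np.einsum("ijk,kl,ijl->ij", d, M, d)
    return np.exp(-dist2 / (2 * BW ** 2))

def fit(X, y, M):
    # Kernel ridge regression; returns the gradient function of the fitted predictor.
    alpha = np.linalg.solve(kernel(X, X, M) + REG * np.eye(len(X)), y)
    def grad_f(x):
        diff = X - x
        k = np.exp(-np.einsum("ij,jk,ik->i", diff, M, diff) / (2 * BW ** 2))
        return (alpha * k) @ (diff @ M) / BW ** 2
    return grad_f

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))      # five input features...
y = np.sin(X[:, 0])                # ...but only the first one matters

M = np.eye(5)                      # start from a plain Euclidean metric
for _ in range(3):
    grad_f = fit(X, y, M)
    M = sum(np.outer(grad_f(x), grad_f(x)) for x in X) / len(X)  # AGOP becomes the new metric
    M /= np.trace(M)               # keep the overall scale stable

print(np.round(np.diag(M), 3))     # the weight on the first feature should dominate
```

After a few rounds, the metric concentrates on the feature the target actually depends on, which is the "selective attention" behaviour described in the quotes below.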

“How do I ignore what’s not necessary? Humans are good at this,” said Belkin. “Machines are doing the same thing. Large Language Models, for example, are implementing this ‘selective paying attention’ and we haven’t known how they do it. In our Science paper, we present a mechanism explaining at least some of how the neural nets are ‘selectively paying attention.'”


Article credit: University of California San Diego