Can bigger-is-better ‘scaling laws’ keep AI improving forever? History says we can’t be too sure

Milad Fakurian / Unsplash

OpenAI chief executive Sam Altman – perhaps the most prominent face of the artificial intelligence (AI) boom that accelerated with the launch of ChatGPT in 2022 – loves scaling laws.

These widely admired rules of thumb, linking the size of an AI model with its capabilities, inform much of the headlong rush by the AI industry to buy up powerful computer chips, build unimaginably large data centres, and re-open shuttered nuclear plants.

As Altman argued in a blog post earlier this year, the thinking is that the “intelligence” of an AI model “roughly equals the log of the resources used to train and run it” – meaning you can steadily produce better performance by exponentially increasing the scale of data and computing power involved.

First observed in 2020 and further refined in 2022, the scaling laws for large language models (LLMs) come from drawing lines on charts of experimental data. For engineers, they give a simple formula that tells you how big to build the next model and what performance increase to expect.
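As a rough illustration of what "drawing lines on charts" means here, a scaling law of this kind is typically a power law, which appears as a straight line on a log-log chart and can be fitted in a few lines of Python. The numbers below are made up for the sketch, not real measurements:

```python
import numpy as np

# Illustrative sketch: a scaling law relates training compute to held-out
# loss via a power law, loss(C) = a * C**(-b), fitted to measured runs.
# These are synthetic stand-in numbers, not real experimental data.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # training FLOPs
loss = np.array([3.10, 2.61, 2.20, 1.85, 1.56])     # held-out loss

# A power law is a straight line in log-log space, so a linear fit suffices.
slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)
a, b = 10**intercept, -slope
print(f"loss ~ {a:.1f} * C^(-{b:.3f})")

# Extrapolating an order of magnitude past the data is exactly the step that
# assumes the fitted curve keeps holding outside its measured range.
print(f"predicted loss at 1e23 FLOPs: {a * (1e23)**(-b):.2f}")
```

The fit itself is just curve-fitting; whether the extrapolation holds is the bet the article describes.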

Will the scaling laws keep on scaling as AI models get bigger and bigger? AI companies are betting hundreds of billions of dollars that they will – but history suggests it is not always so simple.

Scaling laws aren’t just for AI

Scaling laws can be wonderful. Modern aerodynamics is built on them, for example.

Using an elegant piece of mathematics called the Buckingham π theorem, engineers discovered how to compare small models in wind tunnels or test basins with full-scale planes and ships by making sure some key numbers matched up.

Those scaling ideas inform the design of almost everything that flies or floats, as well as industrial fans and pumps.

Another famous scaling idea underpinned the boom decades of the silicon chip revolution. Moore’s law – the idea that the number of the tiny switches called transistors on a microchip would double every two years or so – helped designers create the small, powerful computing technology we have today.

But there’s a catch: not all “scaling laws” are laws of nature. Some are purely mathematical and can hold indefinitely. Others are just lines fitted to data that work beautifully until you stray too far from the circumstances where they were measured or designed.

When scaling laws break down

History is littered with painful reminders of scaling laws that broke. A classic example is the collapse of the Tacoma Narrows Bridge in 1940.

The bridge was designed by scaling up what had worked for smaller bridges to something longer and slimmer. Engineers assumed the same scaling arguments would hold: if a certain ratio of stiffness to bridge length worked before, it should work again.

Instead, moderate winds set off an unexpected instability called aeroelastic flutter. The bridge deck tore itself apart, collapsing just four months after opening.

Likewise, even the “laws” of microchip manufacturing had an expiry date. For decades, Moore’s law (transistor counts doubling every couple of years) and Dennard scaling (a larger number of smaller transistors running faster while using the same amount of power) were astonishingly reliable guides for chip design and industry roadmaps.

As transistors became small enough to be measured in nanometres, however, those neat scaling rules began to collide with hard physical limits.

When transistor gates shrank to just a few atoms thick, they started leaking current and behaving unpredictably. Operating voltages could also no longer be reduced without signals being lost in background noise.

Eventually, shrinking was no longer the way forward. Chips have still grown more powerful, but now through new designs rather than just scaling down.

Laws of nature or rules of thumb?

The language-model scaling curves that Altman celebrates are real, and so far they’ve been extraordinarily useful.

They told researchers that models would keep getting better if you fed them enough data and computing power. They also showed earlier systems were not fundamentally limited – they just hadn’t had enough resources thrown at them.

But these are undoubtedly curves fitted to data. They are less like the derived mathematical scaling laws used in aerodynamics and more like the useful rules of thumb used in microchip design – and that means they likely won’t work forever.

The language model scaling rules don’t necessarily encode real-world problems such as limits to the availability of high-quality data for training, or the difficulty of getting AI to deal with novel tasks – let alone safety constraints or the economic difficulties of building data centres and power grids. There is no law of nature or theorem guaranteeing that “intelligence scales” forever.

Investing in the curves

So far, the scaling curves for AI look pretty smooth – but the financial curves are a different story.

Deutsche Bank recently warned of an AI “funding gap” based on Bain Capital estimates of a US$800 billion mismatch between projected AI revenues and the investment in chips, data centres and power that would be needed to keep current growth going.

JP Morgan, for its part, has estimated that the broader AI sector might need around US$650 billion in annual revenue just to earn a modest 10% return on the planned build-out of AI infrastructure.

We’re still finding out which kind of law governs frontier LLMs. The realities may keep playing along with the current scaling rules; or new bottlenecks – data, energy, users’ willingness to pay – may bend the curve.

Altman’s bet is that the LLM scaling laws will continue. If that’s so, it may be worth building enormous amounts of computing power because the gains are predictable. On the other hand, the banks’ growing unease is a reminder that some scaling stories can turn out to be Tacoma Narrows: beautiful curves in one context, hiding a nasty surprise in the next.

For more such insights, log into www.international-maths-challenge.com.

*Credit for article given to Nathan Garland*

 


Girls and boys solve math problems differently – with similar short-term results but different long-term outcomes

Math teachers have to accommodate high school students’ different approaches to problem-solving. RJ Sangosti/MediaNews Group/The Denver Post via Getty Images

Among high school students and adults, girls and women are much more likely to use traditional, step-by-step algorithms to solve basic math problems – such as lining up numbers to add, starting with the ones place, and “carrying over” a number when needed. Boys and men are more likely to use alternative shortcuts, such as rounding both numbers, adding the rounded figures, and then adjusting to remove the rounding.

But those who use traditional methods on basic problems are less likely to solve more complex math problems correctly. These are the main findings of two studies our research team published in November 2025.

This new evidence may help explain an apparent contradiction in the existing research – girls do better at math in school, but boys do better on high-stakes math tests and are more likely to pursue math-intensive careers. Our research focuses not just on getting correct answers, but on the methods students use to arrive at them. We find that boys and girls approach math problems differently, in ways that persist into adulthood.

A possible paradox

In a 2016 study of U.S. elementary students, boys outnumbered girls 4 to 1 among the top 1% of scorers on a national math test. And over many decades, boys have been about twice as likely as girls to be among the top scorers on the SAT and AP math exams.

However, girls tend to be more diligent in elementary school and get better grades in math class throughout their schooling. And girls and boys across the grades tend to score similarly on state math tests, which tend to be more aligned with the school curriculum and have more familiar problems than the SAT or other national tests.

Beyond grades and test scores, the skills and confidence acquired in school carry far beyond, into the workforce. In lucrative STEM occupations, such as computer science and engineering, men outnumber women 3 to 1. Researchers have considered several explanations for this disparity, including differences in math confidence and occupational values, such as prioritizing helping others or making money. Our study suggests an additional factor to consider: gender differences in approaches to math problems.

When older adults think of math, they may recall memorizing times tables or doing the tedious, long-division algorithm. Memorization and rule-following can pay off on math tests focused on procedures taught in school. But rule-following has its limits and seems to provide more payoff among low-achieving than high-achieving students in classrooms.

More advanced math involves solving new, perplexing problems rather than following rules.

Math can be creative, not rote. AP Photo/Jacquelyn Martin

Differing strategies

In looking at earlier studies of young children, our research team was struck by findings that young boys use more inventive strategies on computation problems, whereas girls more often use standard algorithms or counting. We wondered whether these differences disappear after elementary school, or whether they persist and relate to gender disparities in more advanced math outcomes.

In an earlier study, we surveyed students from two high schools with different demographic characteristics to see whether they were what we called bold problem-solvers. We asked them to rate how much they agreed or disagreed with specific statements, such as “I like to think outside the box when I solve math problems.” Boys reported bolder problem-solving tendencies than girls did. Importantly, students who reported bolder problem-solving tendencies scored higher on a math problem-solving test we administered.

Our newer studies echo those earlier results but reveal more specifics about how boys and girls, and men and women, approach basic math problems.

Algorithms and teacher-pleasing

In the first study, we gave three questions to more than 200 high school students: “25 x 9 = ___,” “600 – 498 = ___,” and “19 + 47 + 31 = ___.” Each question could be solved with a traditional algorithm or with a mental shortcut, such as solving 25 x 9 by first multiplying 25 x 8 to get 200 and then adding the final 25 to get 225.
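The shortcut given for the first item can be checked directly; the shortcuts shown for the other two items below are illustrative examples of the same idea, not strategies specified in the study:

```python
# 25 x 9: multiply 25 x 8 = 200, then add one more 25 (shortcut from the text).
assert 25 * 8 + 25 == 25 * 9 == 225

# 600 - 498: subtract 500, then add back the 2 (an illustrative shortcut).
assert 600 - 500 + 2 == 600 - 498 == 102

# 19 + 47 + 31: regroup 19 + 31 = 50 first, then add 47 (also illustrative).
assert (19 + 31) + 47 == 19 + 47 + 31 == 97
```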

Regardless of their gender, students were equally likely to solve these basic computation items correctly. But there was a striking gender difference in how they arrived at their answers. Girls were almost three times as likely as boys – 52% versus 18% – to use a standard algorithm on all three items. Boys were far more likely than girls – 51% versus 15% – to never use an algorithm on the questions.

Girls were far more likely than boys to use an algorithm

When given three basic math problems, high school girls were three times more likely than boys to use a standard algorithm to solve all three. High school boys were nearly three times more likely than girls to use an alternative strategy for all three problems.

We suspected that girls’ tendency to use algorithms might stem from greater social pressure toward compliance, including complying with traditional teacher expectations.

So, we also asked all the students eight questions to probe how much they try to please their teachers. We also wanted to see whether algorithm use might relate to gender differences in more advanced problem-solving, so we gave students several complex math problems from national tests, including the SAT.

As we suspected, we found that girls were more likely to report a desire to please teachers, such as by completing work as directed. Students who reported that desire used the standard algorithm more often.

Also, the boys in our sample scored higher than the girls on the complex math problems. Importantly, even though students who used algorithms on the basic computation items were just as likely to compute these items correctly, algorithm users did worse on the more complex math problems.

Continuing into adulthood

In our second study, we gave 810 adults just one problem: “125 + 238 = ___.” We asked them to add mentally, which we expected would discourage them from using an algorithm. Again, there was no gender difference in answering correctly.

But 69% of women, compared to 46% of men, reported using the standard algorithm for their mental calculation, rather than using another strategy entirely.

We also gave the adults a more advanced problem-solving test, this time focused on probability-related reasoning, such as the chances that rolling a seven-sided die would result in an even number. Similar to our first study, women and those who used the standard algorithm on the computation problem performed worse on the reasoning test.
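The die question mentioned above reduces to simple counting – three of the seven equally likely faces are even – which can be written out in a couple of lines:

```python
from fractions import Fraction

# The reasoning item described above: the chance that a fair seven-sided die
# (faces 1 to 7) lands on an even number. The even faces are 2, 4 and 6.
faces = range(1, 8)
even_faces = [f for f in faces if f % 2 == 0]
p_even = Fraction(len(even_faces), len(faces))
print(p_even)  # 3/7
```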

The importance of inventiveness

We identified some factors that may play a role in these gender differences, including spatial-thinking skills, which may help people develop alternate calculation approaches. Anxiety about taking tests and perfectionism, both more prevalent among women, may also be a factor.

We are also interested in the power of gender-specific social pressures on girls. National data has shown that young girls exhibit more studious behavior than do boys. And the high school girls we studied were more likely than boys to report they made a specific effort to meet teachers’ expectations.

More research definitely is needed to better understand this dynamic, but we hypothesize that the expectation some girls feel to be compliant and please others may drive teacher-pleasing tendencies that result in girls using algorithms more frequently than boys, who are more socialized to be risk-takers.

While compliant behavior and standard math methods often lead to correct answers and good grades in school, we believe schools should prepare all students – regardless of gender – for when they face unfamiliar problems that require inventive problem-solving skills, whether in daily life, on high-stakes tests or in math-intensive professions.


*Credit for article given to Sarah Lubienski, Colleen Ganley & Martha Makowski*


One university boosted gender diversity in advanced maths by over 30% in 5 years – here’s how

ThisIsEngineering/Pexels

As the artificial intelligence (AI) and quantum computing industries explode, trained STEM professionals are in high demand. Mathematics is foundational to these fields.

But mathematics is missing an important ingredient: people who are female or gender-diverse.

In New South Wales, for example, only one-third of high school graduates who complete mathematics at the highest level are female or gender-diverse. And when students choose university courses in December, a large proportion of these highly qualified people will step away from mathematics and STEM.

Australia cannot stay competitive by only accessing half of its young talent. By leaving mathematics early, young women and gender-diverse people limit their own career opportunities. Worse, the new technologies resulting from the current revolutions may not serve broader society well, if women and gender-diverse people are not involved in their development.

But at the University of Sydney over the past five years we have run a successful pilot program to reverse this trend – and to empower young women to make informed career choices. Better, the program is cheap to run and can be easily adopted elsewhere so mathematics – and the many industries it underpins – can be more diverse in ways that benefit everyone, regardless of their gender.

Declining enrolments

Before 2020, female and gender-diverse enrolments in advanced mathematics at the University of Sydney were in decline.

In 2020 the incoming cohort was nearly 80% male. Non-STEM directions offer attractive and important career options, and some movement between specialisations is expected. But a nosedive from 35% female students at the end of high school to 22% at the start of university indicates a problem.

Over five years, a team I lead piloted an intervention which has increased the ratio of female and gender-diverse students in advanced first-year mathematics from 22% to 30% – nearly back to the high school levels.

Our program consists of two components:

1. information, personalised invitations and enrolment advice for incoming female and gender-diverse students, and

2. a mentoring program for female and gender-diverse students who enrol in advanced mathematics.

Targeting the problem from year one

Before the start of semester, we compare first-year enrolments with students’ high school certificates and majors. As in high school, mathematics at university is offered at multiple parallel levels.

When students are enrolled at a lower level than their background and major would justify, we send personalised emails encouraging them to switch to the advanced level. We hold a welcome event and multiple drop-in sessions, offering tailored advice.

In the mentoring program we match female and gender-diverse advanced maths students with groups of eight to twelve peers of mixed year levels. Matching is based on timetables.

Each group is mentored by a senior (Honours or PhD) student, and an academic – at least one of whom is female or gender-diverse. Student mentors bring invaluable insight to the program, as they had walked in the mentees’ shoes only a few years before.

Each year 50–80 students participate in the program, roughly two-thirds of whom are first-year students.

Mentoring groups meet weekly for an hour: sometimes with both mentors, sometimes with the student mentor alone. Meeting topics are loosely structured around academic advice and sharing experiences.

Many groups develop their own agendas organically. The program does not focus on tutoring, though students enjoy discussing key mathematical techniques and concepts.

Fostering community and belonging

At the heart of the program is the opportunity to build community with peers, away from the pressure of assessments. While student feedback on the program is overall enthusiastic, it is a puzzle to maintain engagement with mentoring as semesters get hectic. It is difficult for students to prioritise community building when marks are on the line elsewhere.

We suspected the large drop in female and gender-diverse enrolments at the transition to university is at least partly explained by these students’ lack of confidence in their mathematical abilities.

Research shows such insecurities disproportionately affect women. General messaging is ineffective in the face of self-doubt, so we aimed for a personalised but scalable approach.

The mentoring component fosters community and belonging. This combats isolation, provides ongoing support and enables long-term retention.

A low-cost solution

Our program is a low-cost solution that can be implemented in most academic contexts.

The first year of university is a place to start, but it is too late to fully address Australia’s pipeline problem. We can’t expect to have women and gender-diverse students participating in STEM at university in higher numbers than they did at the end of high school.

Similar programs could be put in place in high schools, and personal invitations can even be used to bring more girls to elementary school enrichment programs. This would help boost diverse and equitable participation in STEM from the roots.


*Credit for article given to Zsuzsanna Dancso*

 


Nine-year-olds in England sit timed multiplication test – but using times tables is about more than quick recall

Halfpoint/Shutterstock

What’s seven times nine? Quick, you’ve got six seconds to answer.

This June, over 600,000 children in England in year four, aged eight and nine, will be expected to answer questions like this. They will be sitting the multiplication tables check (MTC), a statutory assessment of their multiplication fact recall.

The MTC was introduced in 2022 with the aim of driving up standards in mathematics. It’s an online test that children take on a tablet or computer, made up of 25 questions with six seconds per question.

Being able to quickly recall multiplication facts is valuable. Not having to think about seven times nine, just knowing that it’s 63, frees up a child’s mental thinking space. This means they can focus on different aspects of the mathematics they are doing, such as completing multi-step problems or using reasoning to solve context-based problems.

Being able to quickly recall multiplication facts is also the foundation for more advanced mathematics topics that children will encounter at secondary school.

Our research shows that the MTC is an accurate reflection of children’s multiplication fact recall. But the learning they do for this test doesn’t necessarily help them apply this knowledge in other areas of mathematics. What’s more, focus on the MTC may be diverting teaching time away from other maths knowledge.

Since the multiplication tables check was introduced in 2022, the average score in the test has increased year-on-year from 19.8 in 2022 to 20.6 in 2024. This suggests that schools are placing more emphasis on children’s multiplication fact recall – and on preparing them for this test.

Teaching union the NAHT (National Association of Head Teachers) has suggested that the test is unnecessary, and that it places too much emphasis on fact recall at a cost to other areas of mathematics. The union has also expressed concerns that it disadvantages some children for reasons such as digital accessibility.

Our research has investigated whether the MTC is a good way of testing children’s recall of multiplication facts. We have found that children perform just as well on a more traditional paper-and-pencil timed fact test as on a computer test equivalent to the MTC. However, having a time limit per question – which is only possible with a computerised test – is essential to assess recall, rather than fast calculation.

Pupils taking part in the research project. Lisa Gilligan-Lee/University of Nottingham, Author provided (no reuse)

There was no evidence that any children were particularly disadvantaged by the computerised test. However, we did find that children’s attention skills and how quickly they could enter numbers into the tablet they were using did influence their scores.

This suggests that, for it to be a fair test, it is important that children are familiar with the technology they are using to complete the test. Given that there are stark differences in access to technology in schools, this may pose an issue for some children.

The purpose of introducing the MTC was to improve children’s broader mathematics attainment by improving their multiplication fact recall. But performance in the year six Sats tests, which assess a range of mathematical skills, shows little change.

Crucially, improving children’s multiplication fact recall through retrieval practice doesn’t equate to improving their ability to use the multiplication facts they know. If posed a question such as “Tara has seven books. Ravi has four times as many. How many books do they have altogether?”, children who can recall that 5 x 7 = 35 may still not be able to solve the problem.
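The gap between recall and application in that word problem can be made concrete: the recalled fact only helps once the structure of the problem is recognised.

```python
# The word problem above: Tara has 7 books, Ravi has four times as many.
tara = 7
ravi = 4 * tara      # 28 books
total = tara + ravi  # 35 books

# The recalled fact 5 x 7 = 35 answers it in one step, but only if the child
# sees that "my books plus four times my books" is five times my books.
assert total == 5 * 7 == 35
print(total)  # 35
```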

Time pressure

What’s more, because the MTC is a timed test, teachers and parents may use similar time-pressured approaches to prepare children and help them improve their multiplication fact recall. But our research showed that while practice with a computerised game can support children’s fact recall, the benefits to learning are the same whether or not children are encouraged to answer as quickly as possible.

In research not yet published in a peer-reviewed journal, we found that children who were anxious about mathematics learnt less when practising with time pressure compared to children without mathematics anxiety. Without time pressure, anxiety levels were not related to the amount of learning. Doing some regular multiplication fact retrieval practice is more important than the type of practice, for all learners.

Even though the MTC is a timed assessment, it doesn’t mean that children only need to do timed practice to prepare for this. Some children may benefit more from less time pressure when practising.

Multiplication fact recall is just one element of mathematics and so having a good balance is important. Fact recall and testing should go hand in hand with other areas of mathematics learning such as understanding concepts, choosing strategies and solving applied problems.

Recalling multiplication facts doesn’t automatically help children to apply their knowledge. So, although working towards the multiplication tables check can support fact recall, children will need extra support in knowing how to use and apply these facts.


*Credit for article given to Camilla Gilmore, Lucy Cragg & Natasha Guy*