Making sports statistics more scientific

Whether it is the sprinter who finished first or the team that scored more points, it’s usually easy to determine who won a sporting event. But finding the statistics that explain why an athlete or team wins is more difficult — and major figures at the intersection of sports and numbers are determined to crack this problem.

Especially in team sports, many statistics explain part of the picture: the number of points scored by a point guard, a quarterback’s passing yards, or a slugger’s batting average. But many of these numbers, some of them sacred among sports fans, don’t directly address a player’s contribution to winning. This was a primary topic of discussion last weekend at the Sloan Sports Analytics Conference in Boston.

Organised by students from the MIT Sloan School of Management and sponsored by several sports-related companies, including media outlet ESPN, the conference brought together over 2,200 people to discuss player evaluation and other factors important to the business of sports.

Many of the research presentations and panel discussions described efforts to remove subjective judgments from sports statistics, and to define new statistics that more directly explain a player’s value.

“We have huge piles of statistics now,” said Bill James, Boston Red Sox official and baseball statistics pioneer, at a panel discussion about adding modern statistics to box scores. “What you have to do is reduce it to significant but small concepts,” said James.

New technology and analysis are only now making it possible to learn more about many fundamental events in several sports, events that traditional sports statistics rarely address.

“We’re going to talk about stats that work and stats that don’t work,” said John Walsh, executive vice president of ESPN, who moderated the box score panel discussion.

The panel, which also included three other experts, cited several examples of statistics that didn’t work: a receiver might drop a pass for one of several reasons — but rarely are drops broken down into categories; an assist in basketball is a judgment call with room for different interpretations; and fielding percentage in baseball only generally describes a defensive player’s ability.

In another session, Greg Moore, the director of baseball products for the sports graphics and visualization company Sportvision, described recent data-collection advances in baseball. When all of the company’s systems are fully deployed in Major League Baseball stadiums, Sportvision plans to track the trajectory of each pitch thrown, the movement of all the players on the field and the speed of every swing and batted ball. The systems, already fully installed in some ballparks, will collect over a million data points at every game. Some of this data is publicly available.

The data will make it possible to say not just that a player hit a double or that he hit a hard line drive, but that the ball left the bat at a certain speed and launch angle and a certain number of degrees from the foul line. No scout or official scorer can contaminate those kinds of measures with subjectivity. On the other hand, a string of objective data is not inherently more useful than a flawed statistic, which may contain useful wisdom.
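To make the idea concrete, here is a minimal Python sketch of how the three measurements named above (speed off the bat, launch angle and angle from the foul line) together pin down the ball's initial velocity vector. The record format, class and field names are hypothetical, not Sportvision's, and the sketch ignores spin and air resistance.

```python
import math
from dataclasses import dataclass


@dataclass
class BattedBall:
    """Hypothetical record of one batted ball; field names are illustrative only."""
    exit_speed_mph: float    # speed of the ball as it leaves the bat
    launch_angle_deg: float  # angle above the horizontal
    spray_angle_deg: float   # horizontal angle measured from the foul line

    def velocity_components(self) -> tuple[float, float, float]:
        """Split exit speed into (along foul line, across field, upward) parts, in mph."""
        launch = math.radians(self.launch_angle_deg)
        spray = math.radians(self.spray_angle_deg)
        horizontal = self.exit_speed_mph * math.cos(launch)
        return (
            horizontal * math.cos(spray),
            horizontal * math.sin(spray),
            self.exit_speed_mph * math.sin(launch),
        )


line_drive = BattedBall(exit_speed_mph=100.0, launch_angle_deg=15.0, spray_angle_deg=30.0)
print(line_drive.velocity_components())
```

None of these numbers depends on a scorer's judgment; whether they are more useful than a traditional statistic depends on what an analyst does with them.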

During the box-score panel discussion, Dean Oliver, ESPN’s sports analytics director, said that collecting information this way opens a new frontier.

“It’s an immense amount of data, but you have to know what to do with it,” said Oliver.

The winner of the conference’s research paper competition found one way to make new data useful. Using SportVU, a basketball database collected by the company STATS, a team from the University of Southern California’s computer science department studied basketball rebounding from first principles. The data track the movement of all the players and the ball, including rebounds, passes and other game events.

The research team showed empirically what had previously been accessible only through inference and experience. They found that almost all rebounds have dropped below eight feet of elevation, easy reaching distance for a basketball player, by the time they have traveled 14 feet from the hoop. The researchers were also able to compare shot distance with rebound distance and to identify where strategic changes could alter offensive rebounding success.
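The rebound finding can be framed as a simple computation on tracking data. The sketch below is a hypothetical illustration, not the USC team's code: it assumes each rebound is available as a time-ordered list of (distance from hoop, height) samples in feet, interpolates the ball's height where it first reaches 14 feet, and reports the share of rebounds below eight feet at that point. The toy trajectories are invented.

```python
from typing import List, Optional, Sequence, Tuple

# One sample of a tracked ball: (horizontal distance from the hoop, height), both in feet.
# This format is a hypothetical stand-in for real player-tracking output.
Sample = Tuple[float, float]


def height_at_distance(track: Sequence[Sample], distance_ft: float) -> Optional[float]:
    """Height of the ball when it first moves out past distance_ft from the hoop.

    Linearly interpolates between the two samples that straddle distance_ft.
    Returns None if the ball never gets that far (e.g. it was grabbed earlier)."""
    for (d0, h0), (d1, h1) in zip(track, track[1:]):
        if d0 < distance_ft <= d1:
            t = (distance_ft - d0) / (d1 - d0)
            return h0 + t * (h1 - h0)
    return None


def share_below(tracks: List[Sequence[Sample]], distance_ft: float = 14.0,
                height_ft: float = 8.0) -> float:
    """Among rebounds that reach distance_ft, the fraction lower than height_ft there."""
    heights = [h for h in (height_at_distance(t, distance_ft) for t in tracks) if h is not None]
    if not heights:
        return 0.0
    return sum(h < height_ft for h in heights) / len(heights)


# Invented toy trajectories, not real data.
falling = [(0.0, 10.0), (10.0, 9.0), (20.0, 5.0)]  # about 7.4 ft high at 14 ft
long_arc = [(0.0, 10.0), (16.0, 12.0)]             # still rising at 14 ft
short = [(0.0, 10.0), (5.0, 6.0)]                  # caught before reaching 14 ft
print(share_below([falling, long_arc, short]))
```

Rebounds that never reach the chosen distance are left out of the denominator, which is one reasonable convention among several.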

Rajiv Maheswaran, the researcher who presented the paper, compared the effort to find new insights about sports to astronomy. Once you start looking at the stars, he said, you make discoveries, which lead to new hypotheses and more research.

For more such insights, log into our website https://international-maths-challenge.com

Credit of the article given to Chris Gorski, Inside Science News Service


Flight of the bumblebee decoded by mathematicians


Bumblebees use complex flying patterns to avoid predators, according to new research from Queen Mary, University of London.

Writing in the journal Physical Review Letters, Dr Rainer Klages from Queen Mary’s School of Mathematical Sciences, Professor Lars Chittka from the School of Biological and Chemical Sciences, and their teams, describe how they carried out a statistical analysis of the velocities of foraging bumblebees. They found that bumblebees respond to the presence of predators in a much more intricate way than was previously thought.

Bumblebees visit flowers to collect nectar, often visiting multiple flowers in a single patch. There is an ongoing debate as to whether they employ an ‘optimal foraging strategy’, and what such a strategy might look like.

Dr Klages explains: “In mathematical theory we treat a bumblebee as a randomly moving object hitting randomly distributed targets. However, bumblebees in the wild are under the constant risk of predators, such as spiders, so the question we wanted to answer is how such a threat might modify their foraging behaviour.”
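Dr Klages's description corresponds to the classic random-search setup. A minimal sketch, assuming a square arena, uniformly scattered targets, fixed-length steps in random directions and a fixed detection radius (all illustrative choices, not the parameters of the Queen Mary study), counts how many distinct "flowers" a memoryless, predator-free searcher encounters.

```python
import math
import random


def random_search(n_targets: int = 50, arena: float = 100.0, step: float = 1.0,
                  detect_radius: float = 2.0, n_steps: int = 10_000, seed: int = 0) -> int:
    """Number of distinct targets found by a memoryless 2-D random walker.

    Targets ("flowers") are placed uniformly at random in a square arena; the
    searcher starts at the centre, takes fixed-length steps in uniformly random
    directions (positions are clamped at the walls) and "finds" any target
    within detect_radius. All parameters are illustrative, not fitted to bees."""
    rng = random.Random(seed)
    targets = [(rng.uniform(0, arena), rng.uniform(0, arena)) for _ in range(n_targets)]
    x = y = arena / 2
    found = set()
    for _ in range(n_steps):
        angle = rng.uniform(0.0, 2 * math.pi)
        x = min(max(x + step * math.cos(angle), 0.0), arena)
        y = min(max(y + step * math.sin(angle), 0.0), arena)
        for k, (tx, ty) in enumerate(targets):
            if math.hypot(tx - x, ty - y) <= detect_radius:
                found.add(k)
    return len(found)


print(random_search())
```

The fixed seed makes runs reproducible; the research question is how a threat such as a spider changes the walker's rules.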

The team used experiments that tracked real bumblebees visiting replenishing nectar sources under threat from artificial spiders, simulated by a trapping mechanism that grabs the bumblebee for two seconds.

They found that, in the absence of spiders, the bumblebees foraged more systematically and travelled directly from flower to flower. When predators were present, however, the bumblebees turned around more often, indicating a more careful approach to avoiding the spiders.

PhD student Friedrich Lenz, who did the key analysis, explains: “We learned that the bumblebees display the same statistics of velocities irrespective of whether predators are present or not. Surprisingly, however, the way the velocities change with time during a flight is characteristically different under predation threat.”
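The distinction Lenz draws, between the statistics of velocities and the way they change with time, can be illustrated with an autocorrelation function, a standard way to measure how long a velocity "remembers" its past values. This sketch assumes synthetic data rather than the bumblebee recordings and is not the team's analysis: it builds a correlated velocity series, shuffles it so the distribution of values is unchanged, and compares the lag-1 autocorrelations.

```python
import random
from statistics import fmean, pvariance


def autocorrelation(v, lag):
    """Normalised autocorrelation of series v at the given lag (1 at lag 0)."""
    mu = fmean(v)
    var = pvariance(v, mu)
    n = len(v) - lag
    return sum((v[t] - mu) * (v[t + lag] - mu) for t in range(n)) / (n * var)


def ar1_series(n, phi, seed=0):
    """Synthetic velocity component with memory: v[t] = phi * v[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    v = [0.0]
    for _ in range(n - 1):
        v.append(phi * v[-1] + rng.gauss(0.0, 1.0))
    return v


persistent = ar1_series(5000, phi=0.9)
shuffled = persistent[:]
random.Random(1).shuffle(shuffled)  # same values, hence the same velocity distribution

print(sorted(persistent) == sorted(shuffled))
print(autocorrelation(persistent, 1), autocorrelation(shuffled, 1))
```

The first autocorrelation should come out close to phi = 0.9 and the second close to zero, even though a histogram of either series would look identical.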

The team’s analysis indicates that, when foraging in the wild, factors such as bumblebee sensory perception, memory, and even the individuality of different bumblebees should be taken into account in addition to the presence of predators. All of this may cause deviations from predictions of more simplistic foraging theories.


Credit of the article given to Queen Mary, University of London


Researchers create first large-scale model of human mobility that incorporates human nature

For more than half a century, many social scientists and urban geographers interested in modeling the movement of people and goods between cities, states or countries have relied on a statistical formula called the gravity law, which measures the “attraction” between two places. Introduced in its contemporary form by linguist George Zipf in 1946, the law is based on the assumption that the number of trips between two cities is dependent on population size and the distance between the cities. (The name comes from an analogy with Newton’s Law of Gravity, which describes the attraction of two objects based on mass and distance.)

Though widely used in empirical studies, the gravity model isn’t very accurate in making predictions. Researchers must adapt the model to each data set, adding variables specific to each study in order to force the results to match reality. And with much more data now being generated by new technologies such as cellphones and the Internet, researchers in many fields are eyeing the study of human mobility with a desire to increase its scientific rigor.

To this end, researchers from MIT, Northeastern University and Italy’s University of Padua have identified an underlying flaw in the gravity model: The distance between two cities is far less important than the population size in the area surrounding them. The team has now created a model that considers human motives rather than simply assuming that a larger city attracts commuters. They then tested their “radiation model” on five types of mobility studies and compared the results to existing data. In each case, the radiation model’s predictions were far more accurate than the gravity model’s, which are sometimes off by an order of magnitude.

“Using a multidisciplinary approach, we came up with a simple formula that works better in all situations and shows that population distribution is the key factor in determining mobility fluxes, not distance,” says Marta González, the Gilbert Winslow Career Development Assistant Professor in MIT’s Department of Civil and Environmental Engineering and Engineering Systems Division, and co-author of a paper published Feb. 26 in the online edition of Nature. “I wanted to see if we could find a way to make the gravity model work more accurately without having to change it to fit each situation.”

Physics professor Albert-László Barabási of Northeastern is lead author and principal investigator on the project. Filippo Simini of Northeastern and Amos Maritan of the University of Padua are co-authors.

“I think this paper is a major advance in our understanding of human behaviour,” says Dirk Brockmann, an associate professor of engineering sciences and applied mathematics at Northwestern University who was not involved in the research project. “The key value of the work is that they propose a real theory of mobility making a few basic assumptions, and this model is surprisingly consistent with empirical data.”

The gravity law states that the number of people in a city who will commute to a larger city is based on the population of the larger city. (The larger the population of the big city, the more trips the model predicts.) The number of trips will decrease as the distance between cities grows. One obvious problem with this model is that it will predict trips to a large city without taking into account that the population size of the smaller city places a finite limit on how many people can possibly travel.
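A common way to write the gravity law is T = k · m^α · n^β / r^γ, where m and n are the origin and destination populations and r is the distance between them. The Python sketch below uses that form with hypothetical parameter values (k = 1e-4, α = β = 1, γ = 2; real studies fit these to their own data) to show the problem described above: nothing in the formula caps trips at the origin's population.

```python
def gravity_trips(pop_origin: float, pop_dest: float, distance_km: float,
                  k: float = 1e-4, alpha: float = 1.0, beta: float = 1.0,
                  gamma: float = 2.0) -> float:
    """Gravity-law trip estimate T = k * pop_origin**alpha * pop_dest**beta / distance_km**gamma.

    k, alpha, beta and gamma are hypothetical defaults; studies fit them to their own data."""
    return k * pop_origin ** alpha * pop_dest ** beta / distance_km ** gamma


town, metropolis = 1_000, 10_000_000
trips = gravity_trips(town, metropolis, distance_km=10.0)
print(trips, trips > town)  # the estimate exceeds the town's entire population
```

With γ = 2, doubling the distance cuts the predicted trips to a quarter, whatever lies between the two cities.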

The radiation model accounts for this and other limitations of the gravity model by focusing on the population of the surrounding area, which is defined by the circle whose center is the point of origin and whose radius is the distance to the point of attraction, usually a job. It assumes that job availability is proportional to the population size of the entire area and rates a potential job’s attractiveness based on population density and travel distance. (People are willing to accept longer commutes in sparsely populated areas that have fewer job opportunities.)
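The radiation model can be stated as a short formula. As we understand the published version, the expected number of trips from origin i to destination j is T_ij = T_i · m_i · n_j / ((m_i + s_ij)(m_i + n_j + s_ij)), where T_i is the total number of trips leaving i, m_i and n_j are the origin and destination populations, and s_ij is the population inside the circle of radius r_ij around i, excluding i and j. Population stands in for job opportunities, as described above. The sketch below uses an invented toy layout, and treating the circle's edge as outside is our assumption; check the paper before relying on the details.

```python
import math
from typing import List, Tuple

# A location is (x, y, population); x and y in any consistent distance unit.
Location = Tuple[float, float, float]


def radiation_flux(locations: List[Location], i: int, j: int, trips_from_i: float) -> float:
    """Expected trips from location i to location j under the radiation model.

    T_ij = T_i * m * n / ((m + s) * (m + n + s)), where m and n are the populations
    of i and j and s is the population of all other locations lying inside the
    circle centred on i whose radius is the i-j distance (edge treated as outside)."""
    xi, yi, m = locations[i]
    xj, yj, n = locations[j]
    r_ij = math.hypot(xj - xi, yj - yi)
    s = sum(
        pop
        for k, (xk, yk, pop) in enumerate(locations)
        if k not in (i, j) and math.hypot(xk - xi, yk - yi) < r_ij
    )
    return trips_from_i * m * n / ((m + s) * (m + n + s))


# Invented toy layout: an origin town and three destinations on a line.
towns = [(0.0, 0.0, 1_000), (1.0, 0.0, 500), (2.0, 0.0, 2_000), (5.0, 0.0, 100_000)]
fluxes = [radiation_flux(towns, 0, j, trips_from_i=100.0) for j in (1, 2, 3)]
print(fluxes, sum(fluxes))  # total never exceeds the 100 trips leaving the origin
```

Because s grows with the population living between origin and destination, the same destination draws fewer trips when many people live closer to the origin, which matches the point about longer commutes being acceptable in sparsely populated areas.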

To demonstrate the radiation model’s accuracy in predicting the number of commuters, the researchers selected two pairs of counties in Utah and Alabama — each with a set of cities with comparable population sizes and distances between them. In this instance, the gravity model predicts that one person will commute between each set of cities. But according to census data, 44 people commuted in Utah and six in the sparsely populated area of Alabama. The radiation model predicts 66 commuters in Utah and two in Alabama, a result well within the acceptable limit of statistical error, González says.

The co-authors also tested the model on other indices of connectedness, including hourly trips measured by phone data, commuting between U.S. counties, migration between U.S. cities, intercity telephone calls made by 10 million anonymous users in a European country, and the shipment of goods by any mode of transportation among U.S. states and major metropolitan areas. In all cases, the model’s results matched existing data.

“What differentiates the radiation model from other phenomenological models is that Simini et al. assume that an individual’s migration or move to a new location is determined by what ‘is offered’ at the location — e.g., job opportunities — and that this employment potential is a function of the size of a location,” Brockmann says. “Unlike the gravity model and other models of the same nature, the radiation model is thus based on a plausible human motive. Gravity models just assume that people move to large cities with high probability and that also this movement probability decreases with distance; they are not based on an underlying first principle.”


Credit of the article given to Denise Brehm, Massachusetts Institute of Technology