Calls For a Posthumous Pardon … But Who was Alan Turing?

Momentum is gathering behind calls to pardon the father of computer science. BinaryApe

You may have read that the British Government is being petitioned to grant a posthumous pardon to one of the world’s greatest mathematicians and most successful codebreakers, Alan Turing. You may also have read that Turing was convicted of gross indecency in 1952 and died tragically two years later.

But who, exactly, was he?

Born in London in 1912, Turing helped lay the foundations of the “information age” we live in.

He did his first degree at King’s College, Cambridge, and then became a Fellow there. His first big contribution was his development of a mathematical model of computation in 1936. This became known as the Turing Machine.

It was not the first time a computer had been envisaged: that distinction belongs to Charles Babbage, a 19th-century mathematician who designed a mechanical computer and built parts of it (some of which may be seen at the Science Museum in London or the Powerhouse Museum in Sydney, for example).

But Babbage’s design was necessarily complicated, as he aimed for a working device using specific technology. Turing’s design was independent of any particular technology, and was not intended to be built.

It was very simple, and would be very inefficient and impractical as a device for doing real computations. But its simplicity meant it could be used to do mathematical reasoning about computation.

Turing used his abstract machines to investigate what kinds of things could be computed. He found some tasks which, although perfectly well defined and mathematically precise, are uncomputable. The first of these is known as the halting problem, which asks, for any given computation, whether it will ever stop. Turing showed that this was uncomputable: there is no systematic method that always gives the right answer.

So, if you have ever wanted a program that can run on your laptop and test all your other software to determine which of them might cause your laptop to “hang” or get stuck in a never-ending loop, the bad news is such a comprehensive testing program cannot be written.
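
The reasoning behind this result can be sketched in modern programming terms. This is a paraphrase rather than Turing’s original machine-based construction, and the names `claimed_halts` and `make_contrarian` are hypothetical. Suppose someone offers a function that claims to predict whether a program stops. We can always build a program that does the opposite of whatever is predicted about it:

```python
def make_contrarian(claimed_halts):
    """Build a program that does the opposite of what claimed_halts predicts for it."""
    def contrarian():
        if claimed_halts(contrarian):
            # Predicted to halt, so loop forever instead.
            while True:
                pass
        # Predicted to run forever, so halt immediately.
        return "halted"
    return contrarian


def always_says_loops(program):
    """A (wrong) candidate tester: claims every program runs forever."""
    return False


contrarian = make_contrarian(always_says_loops)
prediction = always_says_loops(contrarian)  # False, i.e. "it never stops"
outcome = contrarian()                      # ...yet it stops straight away
print(prediction, outcome)                  # False halted
```

A tester that answered "yes, it halts" would be refuted the other way, because the contrarian program would then loop forever (which is why that case is not run here). Whatever the tester says about its own contrarian, it is wrong.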

Uncomputability is not confined to questions about the behaviour of computer programs. Since Turing’s work, many problems in mainstream mathematics have been found to be uncomputable. For example, the Russian mathematician and computer scientist, Yuri Matiyasevich, showed in 1970 that determining if a polynomial equation with several variables has a solution consisting only of whole numbers is also an uncomputable problem.

Turing machines have been used to define measures of the efficiency of computations. They underpin formal statements of the P vs NP problem, one of the Millennium Prize problems.

Another important feature of Turing’s model is its capacity to treat programs as data. This means the programs that tell computers what to do can themselves, after being represented in symbolic form, be given as input to other programs. Turing Machines that can take any program as input, and run that program on some input data, are called Universal Turing Machines.
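
The idea of treating programs as data can be illustrated with a small simulator. This is a minimal sketch, not Turing’s own formulation: the function name `run_turing_machine`, the rule-table format and the example machine are all assumptions made for illustration. The point is that the machine’s rules are passed in as an ordinary table, so one simulator can run any such machine:

```python
def run_turing_machine(program, tape_input, start_state, halt_states,
                       blank="_", max_steps=10_000):
    """Run a Turing machine whose rules are supplied as ordinary data.

    program maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is "L" or "R".
    """
    tape = dict(enumerate(tape_input))  # sparse tape: position -> symbol
    head, state = 0, start_state
    for _ in range(max_steps):
        if state in halt_states:
            break
        symbol = tape.get(head, blank)
        write, move, state = program[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    else:
        # In general we cannot know whether the machine would ever stop.
        raise RuntimeError("step limit reached without halting")
    if not tape:
        return ""
    cells = (tape.get(i, blank) for i in range(min(tape), max(tape) + 1))
    return "".join(cells).strip(blank)


# Example program (data, not code): add 1 to a binary number.
INCREMENT = {
    ("right", "0"): ("0", "R", "right"),  # scan to the right-hand end
    ("right", "1"): ("1", "R", "right"),
    ("right", "_"): ("_", "L", "carry"),  # past the end: start adding
    ("carry", "1"): ("0", "L", "carry"),  # 1 + 1 = 0, carry continues
    ("carry", "0"): ("1", "L", "done"),   # absorb the carry and stop
    ("carry", "_"): ("1", "L", "done"),   # carry past the leftmost digit
}

result = run_turing_machine(INCREMENT, "1011", "right", {"done"})
print(result)  # 1100
```

The step limit is a practical safeguard only. As the halting problem shows, no simulator can decide in general whether a given machine will eventually stop.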

These are really conceptual precursors of today’s computers, which are stored-program computers, in that they can treat programs as data in this sense. The oldest surviving intact computer in the world, in this most complete sense of the term, is CSIRAC at Melbourne Museum.

CSIRAC was Australia’s first digital computer, and the fourth “stored program” computer in the world. Melbourne Museum

It seems a mathematical model of computation was an idea whose time had come. In 1936, the year of Turing’s result, another model of computation was published by Alonzo Church of Princeton University. Although Turing and Church took quite different routes, they ended up at the same place, in that the two models give exactly the same notion of computability.

In other words, the classification of tasks into computable and uncomputable is independent of which of these two models is used.

Other models of computation have been proposed, but mostly they seem to lead to the same view of what is and is not computable. The Church-Turing Thesis states that this class of computable functions does indeed capture exactly those things which can be computed in principle (say by a human with unlimited time, paper and ink, who works methodically and makes no mistakes).

It implies Turing Machines give a faithful mathematical model of computation. This is not a formal mathematical result, but rather a working assumption which is now widely accepted.

Turing went to Princeton and completed his PhD under Church, returning to Britain in 1938.

Early in the Second World War, Turing joined the British codebreaking operation at Bletchley Park, north-west of London. He became one of its most valuable assets. He was known by the nickname “Prof” and was described by colleague Jack Good as “a deep rather than a fast thinker”.

One of the famous Enigma machines decrypted at Bletchley Park. Keir David

At the time, Germany was using an encryption device known as Enigma for much of its communications. This was widely regarded as completely secure. The British had already obtained an Enigma machine, from the Poles, and building on their work, Turing and colleague Gordon Welchman worked out how the Enigma-encrypted messages collected by the British could be decrypted.

Turing designed a machine called the Bombe, named after a Polish ice cream, which worked by testing large numbers of combinations of Enigma machine configurations, in order to help decrypt secret messages. These messages yielded information of incalculable value to the British. Winston Churchill described the Bletchley Park codebreakers as “geese that laid the golden eggs but never cackled”.

In 1945, after the war, Turing joined the National Physical Laboratory (NPL), where he wrote a report on how to construct an electronic computer: this time a general-purpose one, unlike the machines dedicated to cryptanalysis that he had helped to design at Bletchley Park.

This report led to the construction of an early computer (Pilot ACE) at NPL in 1950. By then, Turing had already moved on to Manchester University, where he worked on the first general-purpose stored-program computer in the world, the Manchester “Baby”.

The remade Bombe machine at Bletchley Park, England, features miles of circuitry. Keir David

In their early days, computers were often called “electronic brains”. Turing began to consider whether a computer could be programmed to simulate human intelligence. This question remains a major research challenge today, and his work on it helped to initiate the field of artificial intelligence.

A fundamental issue in such research is: how do you know if you have succeeded? What test can you apply to a program to determine if it has intelligence? Turing proposed that a program be deemed intelligent if, in its interaction with a human, the human is unable to detect whether he or she is communicating with another human or a computer program. (The test requires a controlled setting, for example where all communication with the human tester is by typed text.)

His paper on this topic – Computing Machinery and Intelligence – was published in 1950. The artificial intelligence community holds regular competitions to see how good researchers’ programs are at the Turing test.

The honours Turing received during his lifetime included an OBE in 1945 and becoming a Fellow of the Royal Society in 1951.

His wartime contributions remained secret throughout his life and for many years afterwards.

In 1952 he was arrested for homosexuality, which was illegal in Britain at the time. Turing was found guilty and required to undergo “treatment” with drugs. This conviction also meant he lost his security clearance.

In 1954 he ingested some cyanide, probably via an apple, and died. An inquest classified his death as suicide, and this is generally accepted today. But some at the time, including his mother, contended his death was an accidental consequence of poor handling of chemicals during some experiments he was conducting at home in his spare time.

The irony of Turing losing his security clearance – after the advantage his work had given Britain in the war, in extraordinary secrecy – is clear.

The magnitude of what was done to him has become increasingly plain over time, helped by greater availability of information about the work at Bletchley Park and changing social attitudes to homosexuality.

Next year, 2012, will be the centenary of Turing’s birth – with events planned globally to celebrate the man and his contribution. As this year approached, a movement developed to recognise Turing’s contribution and atone for what was done to him. In 2009, British Prime Minister, Gordon Brown, responding to a petition, issued a formal apology on behalf of the British government for the way Turing was treated.

For more such insights, log into www.international-maths-challenge.com.

*Credit for article given to Graham Farr*

Everything You Need To Know About Statistics (But Were Afraid To Ask)

Does the thought of p-values and regressions make you break out in a cold sweat? Never fear – read on for answers to some of those burning statistical questions that keep you up 87.9% of the night.

  • What are my hypotheses?

There are two types of hypothesis you need to get your head around: null and alternative. The null hypothesis always states the status quo: there is no difference between two populations, there is no effect of adding fertiliser, there is no relationship between weather and growth rates.

Basically, nothing interesting is happening. Generally, scientists conduct an experiment seeking to disprove the null hypothesis. We build up evidence, through data collection, against the null, and if the evidence is sufficient we can say with a degree of probability that the null hypothesis is not true.

We then accept the alternative hypothesis. This hypothesis states the opposite of the null: there is a difference, there is an effect, there is a relationship.

  • What’s so special about 5%?

One of the most common numbers you stumble across in statistics is alpha = 0.05 (or in some fields 0.01 or 0.10). Alpha denotes the fixed significance level for a given hypothesis test. Before starting any statistical analyses, along with stating hypotheses, you choose a significance level you’re testing at.

This states the threshold at which you are prepared to accept the possibility of a Type I Error – otherwise known as a false positive – rejecting a null hypothesis that is actually true.

  • Type what error?

Most often we are concerned primarily with reducing the chance of a Type I Error over its counterpart (Type II Error – accepting a false null hypothesis). It all depends on what the impact of either error will be.

Take a pharmaceutical company testing a new drug; if the drug actually doesn’t work (a true null hypothesis) then rejecting this null and asserting that the drug does work could have huge repercussions – particularly if patients are given this drug over one that actually does work. The pharmaceutical company would be concerned primarily with reducing the likelihood of a Type I Error.

Sometimes, a Type II Error could be more important. Environmental testing is one such example; if the effect of toxins on water quality is examined, and in truth the null hypothesis is false (that is, the presence of toxins does affect water quality) a Type II Error would mean accepting a false null hypothesis, and concluding there is no effect of toxins.

The down-stream issues could be dire, if toxin levels are allowed to remain high and there is some health effect on people using that water.

  • What is a p-value, really?

Because p-values are thrown about in science like confetti, it’s important to understand what they do and don’t mean. A p-value expresses the probability of getting a given result from a hypothesis test, or a more extreme result, if the null hypothesis were true.

Given we are trying to reject the null hypothesis, what this tells us is the probability of getting data at least as extreme as ours if the null hypothesis is correct. If that probability is sufficiently low we feel confident in rejecting the null and accepting the alternative hypothesis.

What is sufficiently low? As mentioned above, the typical fixed significance level is 0.05. So if the probability portrayed by the p-value is less than 5% you reject the null hypothesis. But a fixed significance level can be deceiving: if 5% is significant, why is 6% not?

It pays to remember that such probabilities are continuous, and any given significance level is arbitrary. In other words, don’t throw your data away simply because you get a p-value of 6-10%.
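
To make this concrete, here is a small worked example using a made-up experiment (the coin-tossing scenario and the function name `one_sided_p_value` are illustrative assumptions, not part of any particular study). The null hypothesis is that a coin is fair, and the p-value is the probability of seeing at least as many heads as we observed if that were true:

```python
from math import comb


def one_sided_p_value(heads, flips):
    """P(at least `heads` heads in `flips` tosses) if the coin is fair."""
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips


p_15 = one_sided_p_value(15, 20)
p_14 = one_sided_p_value(14, 20)
print(round(p_15, 4))  # 0.0207 -> below 0.05
print(round(p_14, 4))  # 0.0577 -> just above 0.05
```

One fewer head moves the result from “significant” to “not significant” at the 5% level, even though the evidence has barely changed. That is the arbitrariness described above.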

  • How much replication do I have?

This is probably the biggest issue when it comes to experimental design, in which the focus is on ensuring the right type of data, in large enough quantities, is available to answer given questions as clearly and efficiently as possible.

Pseudoreplication refers to the over-inflation of degrees of freedom (a mathematical restriction put in place when we calculate a parameter – e.g. a mean – from a sample). How would this work in practice?

Say you’re researching cholesterol levels by taking blood from 20 male participants.

Each male is tested twice, giving 40 test results. But the level of replication is not 40, it’s actually only 20 – a requisite for replication is that each replicate is independent of all others. In this case, two blood tests from the same person are intricately linked.

If you were to analyse the data with a sample size of 40, you would be committing the sin of pseudoreplication: inflating your degrees of freedom (which incidentally helps to create a significant test result). Thus, if you start an experiment understanding the concept of independent replication, you can avoid this pitfall.
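
The difference shows up directly in the degrees of freedom. The sketch below uses invented readings, and averaging each participant’s two tests is just one simple way of getting back to independent replicates, not the only valid analysis:

```python
from statistics import mean

# Hypothetical cholesterol readings (mmol/L): two blood tests per participant.
readings = {f"P{i:02d}": [4.0 + 0.1 * i, 4.2 + 0.1 * i] for i in range(1, 21)}

# Pseudoreplicated view: treat all 40 test results as independent.
all_tests = [value for pair in readings.values() for value in pair]
naive_df = len(all_tests) - 1

# Independent view: one value (the mean of two tests) per participant.
per_person = [mean(pair) for pair in readings.values()]
correct_df = len(per_person) - 1

print(naive_df, correct_df)  # 39 19
```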

  • How do I know what analysis to do?

There is a key piece of prior knowledge that will help you determine how to analyse your data: what kind of variable are you dealing with? The two most common types of variable are:

1) Continuous variables. These can take any value. Were you to measure the time until a reaction was complete, the results might be 30 seconds, two minutes and 13 seconds, or three minutes and 50 seconds.

2) Categorical variables. These fit into – you guessed it – categories. For instance, you might have three different field sites, or four brands of fertiliser. All continuous variables can be converted into categorical variables.

With the above example we could categorise the results into less than one minute, one to three minutes, and greater than three minutes. Categorical variables cannot be converted back to continuous variables, so it’s generally best to record data as “continuous” where possible to give yourself more options for analysis.

Deciding which to use between the two main types of analysis is easy once you know what variables you have:

ANOVA (Analysis of Variance) is used to compare a categorical variable with a continuous variable – for instance, fertiliser treatment versus plant growth in centimetres.

Linear Regression is used when comparing two continuous variables – for instance, time versus growth in centimetres.

Though there are many analysis tools available, ANOVA and linear regression will get you a long way in looking at your data. So if you can start by working out what variables you have, it’s an easy second step to choose the relevant analysis.
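
As a rough illustration of what the two analyses compute, the sketch below fits a least-squares line and calculates a one-way ANOVA F statistic by hand. The data and the function names `fit_line` and `one_way_anova_f` are invented for this example; in practice you would normally use a statistics package, which also converts the F statistic into a p-value.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for two continuous variables."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def one_way_anova_f(groups):
    """F statistic: variation between group means relative to variation within groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Linear regression: time (weeks) versus growth (cm), both continuous.
weeks = [1, 2, 3, 4, 5]
growth = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(weeks, growth)

# ANOVA: fertiliser brand (categorical) versus growth in cm (continuous).
brands = [[10, 12, 11], [14, 15, 16], [20, 19, 21]]
f_stat = one_way_anova_f(brands)

print(round(slope, 2), round(intercept, 2), round(f_stat, 1))
```

With these made-up numbers the fitted line rises by about 1.97 cm per week, and the F statistic is about 61, meaning the brand means differ far more than plants within a brand do.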

Ok, so perhaps that’s not everything you need to know about statistics, but it’s a start. Go forth and analyse!

*Credit for article given to Sarah-Jane O’Connor*

Protecting confidential data with math

Statistical databases (SDBs) are collections of data that are used to gather and analyse information from a variety of sources. The data may be derived from sales transactions, customer files, voter registrations, medical records, employee rosters, product inventories, or other compilations of facts and figures.

Because database security requires multiple processes and controls, it presents huge security challenges to organizations. With the computerization of databases in healthcare, forensics, telecommunications, and other fields, ensuring this kind of security has become increasingly important.

In a paper published Thursday in the SIAM Journal on Discrete Mathematics, authors Rudolf Ahlswede and Harout Aydinian analyse a security-control model for statistical databases.

“Providing privacy and confidentiality in SDBs is not a new issue,” Aydinian points out. “Privacy interests have evolved from the very first census in the United States. Recorded protests until the mid-20th century reflect constitutional issues resulting from the requirement for U.S. residents to provide sensitive personal information. Questions on census forms about diseases, mortgage values, and other items have raised many concerns.”

While such databases are very helpful in aggregating data, there is a risk that confidential information about an individual’s record may be deliberately compromised. “Since such data sets also contain sensitive information, such as the disease of an individual, or the salary of an employee, it is necessary to provide security against the disclosure of confidential information,” says Aydinian. “Even in cases where a user has no direct access to sensitive information, sometimes confidential data about an individual can be inferred by correlating enough statistics.”

Typically, statistical databases are designed to only accept queries that involve specific statistical functions (such as sum, average, count, min, max, etc.). However, the use of these queries may render databases susceptible to compromise. For instance, it may be possible to infer information about specific individuals by putting together data from a sequence of statistical queries, using prior knowledge of an individual, or through collusion among users.
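
A toy example shows how two innocent-looking queries can be combined. Everything here is invented for illustration (the records, the field names and the `sum_salaries` helper), and the attacker is assumed to know in advance that exactly one person in the department is over 60:

```python
# Hypothetical records; individual salaries are meant to be confidential.
EMPLOYEES = [
    {"name": "Avery", "age": 62, "dept": "Sales", "salary": 91_000},
    {"name": "Blake", "age": 45, "dept": "Sales", "salary": 68_000},
    {"name": "Casey", "age": 38, "dept": "Sales", "salary": 72_000},
    {"name": "Devon", "age": 51, "dept": "Sales", "salary": 80_000},
]


def sum_salaries(condition):
    """The only operation users may run: a SUM over matching records."""
    return sum(r["salary"] for r in EMPLOYEES if condition(r))


# Two statistical queries, neither of which names an individual.
all_sales = sum_salaries(lambda r: r["dept"] == "Sales")
under_60 = sum_salaries(lambda r: r["dept"] == "Sales" and r["age"] < 60)

# Their difference is the salary of the one person aged 60 or over.
inferred = all_sales - under_60
print(inferred)  # 91000
```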

An SDB is considered secure if no protected data can be inferred from available queries. “In the literature, many scenarios of compromise and inference control methods have been proposed to protect SDBs,” Aydinian says. “However, to date no one security control method is capable of completely preventing compromise.”

Query restriction is one of several general approaches used for security control. A “query request” retrieves a subset of data from a database that meets a set of conditions. In query restriction, the kind and amount of data that can be retrieved by such queries is limited: for example, by the size of the data returned, or by the amount of overlap between the data returned by different queries.

In one type of query restriction method, only certain sums of individual records (called “SUM queries”) that meet a minimum specified size or number, and satisfy a specified set of conditions, are available to users.
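
A minimal sketch of a size-based restriction is shown below. It is not the model analysed in the paper; the `restricted_sum` function, the `min_size` parameter and the records are assumptions made for illustration. A SUM query is answered only if enough records match its conditions:

```python
def restricted_sum(records, condition, min_size):
    """Answer a SUM query only if at least min_size records match."""
    matching = [r["salary"] for r in records if condition(r)]
    if len(matching) < min_size:
        return None  # query refused: too few records to hide an individual
    return sum(matching)


# Hypothetical records; individual salaries are confidential.
records = [
    {"age": 62, "salary": 91_000},
    {"age": 45, "salary": 68_000},
    {"age": 38, "salary": 72_000},
    {"age": 51, "salary": 80_000},
]

refused = restricted_sum(records, lambda r: r["age"] > 60, min_size=3)
allowed = restricted_sum(records, lambda r: r["age"] > 40, min_size=3)
print(refused, allowed)  # None 239000
```

A threshold like this does not by itself rule out inference from combinations of permitted queries, which is consistent with Aydinian’s remark that no single control method completely prevents compromise.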

Aydinian explains with an example. “Consider a company with a large number of employees. Suppose that for each member of the company, the sex, age, rank, length of employment, salary etc. is recorded. The salaries of individual employees are confidential. Suppose that only SUM queries are allowed, i.e. the sum of the salaries of the specified people is returned. Then one might pose the query: What is the sum of salaries for males, above 50, and during the last 10 years?”

The task addressed in the paper is to provide an optimal collection of SUM queries that prevents compromise of confidential information, such as individual salaries. A natural solution is to maximize the number of available SUM queries. The authors obtain tight bounds for the maximum number of such queries that return subsets of data without compromising groups of entries.

“Future work in the query-restriction approach includes evaluation of new security-control mechanisms, which are easy to implement and guarantee absolute security,” says Aydinian. “At the same time, it is desirable that these methods satisfy other criteria like richness of available queries, consistency, cost etc. It also seems promising to develop methods combining different security control mechanisms.”

Credit of the article given to Society for Industrial and Applied Mathematics