Chance News 72

From ChanceWiki
Revision as of 16:39, 27 March 2011 by BillJefferys (Talk | contribs)



"Self-selected samples are not much more informative than a list of correct predictions by a psychic."

John Allen Paulos, in Innumeracy

(Hill & Wang, 1998), p. 113

Submitted by Paul Alper

"[S]tatistical analysis is being used, and not always to your benefit, by everyone from your cable company to your real estate broker. Consider this your chance to fight back."

Nate Silver, in How to beat the salad bar

New York Times Magazine, 17 March 2011

Submitted by Bill Peterson


P-value revealed

The typical elementary statistics textbook is dry as dust and mostly useful as a doorstop. Andrew Vickers’ What is a p-value anyway? 34 Stories To Help You Actually Understand Statistics consists of only 210 pages and is too thin to stop many doors, but it is extremely amusing and full of insight and good advice. His aim was to “write something that (a) focused on how to understand statistics, (b) avoided formulas and (c) was fun, at least in places.” And that “is how he came with the idea of stories.” The 34 chapters are stories that are intended to illustrate statistical concepts but are “short and fun to read.” Some of his stories can be viewed on Medscape once you sign on (at no cost).

Despite the title, p-value does not appear until Chapter (i.e., Story) 13, page 55, although the related concept, confidence interval, is first mentioned on page 25. Here is what he has to say about the inappropriate use of confidence intervals:

You might say that the indiscriminant use of confidence intervals in scientific papers is because the authors don’t have a firm idea of what it is that they want to find out. And you might be right--I couldn’t possibly comment.

Likewise, when it comes to p-value and the mindless use of stats packages:

[I]t is all too easy to generate endless lists of p-values using statistical software, regardless of whether any of them address a question you actually want to answer.

As good as his book is, the reader will probably need some auxiliary help, perhaps by way of an organic doorstopper.


1. Vickers has not yet entered into the controversy regarding Bem’s ESP article which rests more or less solely on a p-value of less than .05. However, on page 172, Vickers writes

On the other hand, there is an idea that all cancers are caused by a parasitic infection and can be cured by a special “zapper”. (You can’t make this stuff up) If you showed me a medical study showing that these zappers cured cancer with a p-value of .04, I’d probably say something like, “Well, that is surprising, but it is a ridiculous hypothesis, there is no reason to believe it is true other than this one measly p-value. So thanks but no thanks, I am not going to believe in this hypothesis for now.”

Instead of a “measly” p-value of .04, suppose the p-value were far, far smaller such as 10^-35 which has been alleged in previous ESP studies. What might his reaction now be?

2. On page 208 he criticizes the use of weasel words such as “may,” “might,” and “could” which “are often found in the conclusion of scientific studies.” He speculates that “The reason words like ‘may,’ ‘might’ and ‘could’ are so popular is that it absolves the author from any responsibility whatsoever.” He then employs the following satirical comment to justify his criticism:

Students may learn more statistics from reading What is a p-value anyway? than from any competing statistics textbook.

Take a (possibly non-random) sample of recently published articles in any field and determine the prevalence of those weasel words.

3. Vickers points out that “missing data is a big problem in medical research.” Imputing the value of the missing data, as might be imagined, is tricky and fraught with difficulties. One of his contributions was “to reduce the rate of missing data in the first place” by telephoning “patients at home and ask[ing] them just two questions in place of a long questionnaire. In this way, we reduced the rate of missing data in a trial from 25% to 6%, which made the use of complex missing data rather redundant.” Comment on the ease of doing such pre-trial contact in this age of privacy and security. Comment on the ease of doing a post-trial contact in this age of privacy and security.

4. His definition of p-value is the standard one: “The p-value is the probability that the data would be at least as extreme as those observed if the null hypothesis were true.” Bayesians are unhappy with the phrase “at least as extreme,” as reflected in the famous quote of Harold Jeffreys: “What the use of P[-value] implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred.” A related, equally famous criticism is the so-called optional stopping problem, in which the p-value for the same data depends on the experimenter's stopping rule and so is not unique. For some other shortcomings and possible misinterpretations of the p-value, see this. If these criticisms are valid, why then is p-value so ubiquitous?

5. On page 204 Vickers writes,

I generally take the position “never attribute to conspiracy what you can attribute to a simple screw-up.” Nevertheless, when you see bad statistics, it is worth wondering who stands to gain.

On the other hand, read (A) White Coat Black Hat: Adventures on the Dark Side of Medicine by Carl Elliott or (B) Bad Science: Quacks, Hacks and Big Pharma Flacks by Ben Goldacre to see just how prevalent they claim conspiracies are in the medical world.

6. On page 148, Vickers looks askance at “what is perhaps the most typical approach to statistics:”

• Load up the data into the statistics software.
• Press a few buttons.
• Cut and paste the results in a word processing document.
• Look at the p-value. If p is less than .05, that is a good thing. If p >= .05, your study was a failure and probably isn’t worth sending to a scientific journal.

Why is he so scornful of this approach?

7. On page 143, Vickers invokes what he christens “J-Com’s Law in honor of the worst typing mistake in history:”

Many of the research papers you read will be wrong not as a result of scientific flaws, poor design or inappropriate statistics, but because of typing errors.

Go to this web site to see what happened to Mizuho Securities Co. and the Tokyo Stock Exchange in December, 2005 due to a typing error.

8. He puts forward on page 193 a “rule of thumb: if you have the whole population, rather than a sample, don’t report confidence intervals and p-values.” If “we have all the data that we could get, that is, the whole population,” then “[a]ccordingly, we say these things with confidence and leave out the confidence interval.” Give several examples in which the investigator might have the entire population.
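The remark in question 4 that the p-value is not unique under optional stopping can be made concrete with a standard textbook illustration that does not come from Vickers' book: a fair-coin null hypothesis and an assumed data set of 9 heads and 3 tails. The Python sketch below computes the one-sided p-value twice, once assuming the experimenter planned exactly 12 tosses and once assuming the plan was to toss until the third tail. The function names and the data are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def binomial_p_value(heads: int, tosses: int) -> Fraction:
    """P(at least `heads` heads in a fixed number of fair-coin tosses)."""
    tail = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return Fraction(tail, 2 ** tosses)


def negative_binomial_p_value(heads: int, tails: int) -> Fraction:
    """P(at least `heads` heads before the `tails`-th tail of a fair coin).

    That event occurs exactly when the first heads + tails - 1 tosses
    contain at most tails - 1 tails.
    """
    n = heads + tails - 1
    tail = sum(comb(n, k) for k in range(tails))
    return Fraction(tail, 2 ** n)


# Assumed data for illustration: 9 heads and 3 tails, null hypothesis p = 1/2.
fixed_n = binomial_p_value(9, 12)                      # plan: toss 12 times
stop_at_third_tail = negative_binomial_p_value(9, 3)   # plan: stop at 3rd tail

print("fixed 12 tosses:   ", fixed_n, float(fixed_n))
print("stop at third tail:", stop_at_third_tail, float(stop_at_third_tail))
```

The same 12 tosses give 299/4096 (about 0.073) under the fixed-sample plan and 67/2048 (about 0.033) under the stop-at-third-tail plan, so the result clears the .05 threshold under one stopping rule and not the other.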

Submitted by Paul Alper

Pi day probability puzzle

Numberplay: Pi in the sky
by Pradeep Mutalik, New York Times Wordplay Blog, 14 March 2011

Among the three puzzles posed for Pi Day (3/14) is this:

2. Notice that in the decimal expansion of pi, zero is the last digit to appear, and does not appear till the 32nd decimal place. This seems to be a long time for the last digit to appear. What is the expected place for the last digit to appear in a truly random series of digits, as the decimal expansion of transcendental numbers like pi is known to be?

As posed, this becomes an application of the Coupon Collector's problem. However, it is still unknown whether π is a normal number, even though mathematicians suspect that this is true. For a class activity on empirical digit frequencies, see How normal is pi? The question of whether the digits are "truly random" is even more subtle; for discussion see this article by Stan Wagon.
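Under the puzzle's idealization of independent, uniformly distributed digits (an assumption, since the normality of π is unproven), the coupon collector calculation can be sketched in Python as follows. The function names are hypothetical. The code computes the exact expected position at which the tenth distinct digit first appears, the chance that the wait is at least as long as the 32 places observed for π, and a small simulation as a cross-check.

```python
import random
from fractions import Fraction
from math import comb


def expected_draws(n_symbols: int = 10) -> Fraction:
    """Expected number of draws to see every symbol: n * (1 + 1/2 + ... + 1/n)."""
    return sum(Fraction(n_symbols, k) for k in range(1, n_symbols + 1))


def prob_not_complete(draws: int, n_symbols: int = 10) -> Fraction:
    """P(at least one symbol is still missing after `draws` draws).

    Inclusion-exclusion over the sets of symbols that could be missing.
    """
    return sum(
        (-1) ** (j + 1) * comb(n_symbols, j)
        * Fraction(n_symbols - j, n_symbols) ** draws
        for j in range(1, n_symbols + 1)
    )


def simulate_mean(trials: int = 5000, n_symbols: int = 10, seed: int = 314) -> float:
    """Average waiting time over simulated sequences of uniform random digits."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n_symbols:
            seen.add(rng.randrange(n_symbols))
            draws += 1
        total += draws
    return total / trials


mean_exact = expected_draws()
# The last digit appears at place 32 or later exactly when
# some digit is still missing after 31 places.
tail_32 = prob_not_complete(31)

print("exact expectation:", mean_exact, float(mean_exact))
print("P(last digit appears at place 32 or later):", float(tail_32))
print("simulated mean:", simulate_mean())
```

The exact expectation is 7381/252, a little over 29 places, so a wait of 32 places is close to what the idealized model predicts.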

Submitted by Bill Peterson

Coping with bad medical news

Matthew Hayat recommended the following article on the Isolated Statisticians list.

After a diagnosis, wishing for a magic number
by Peter B. Bach, M.D., New York Times Well Blog, 21 March 2011

The article contrasts the optimistic point of view expressed in Stephen Jay Gould's classic The median isn't the message with a more sobering essay, Letting go, by Atul Gawande in the New Yorker last year (8 August 2010).

Submitted by Bill Peterson

A nonsignificant result won't protect you in a court of law

Supreme Court Rules Against Zicam Maker, Adam Liptak, The New York Times, March 22, 2011.

Investors in a company called Matrixx Initiatives got angry when they weren't told about side effect reports for that company's biggest product, Zicam.

The case involved Zicam, a nasal spray and gel made by Matrixx Initiatives and sold as a homeopathic medicine. From 1999 to 2004, the plaintiffs said, the company received reports that the products might have caused some users to lose their sense of smell, a condition called anosmia. Matrixx did not disclose the reports and in 2003, the company said it was “poised for growth” and had “very strong momentum” though, by the plaintiffs’ calculations, Zicam accounted for about 70 percent of its sales.

The company defended itself by pointing out that

it should not have been required to disclose small numbers of unreliable reports, which were the only ones available in 2004. They added that the company should face liability for securities fraud only if the reports had been collectively statistically significant.

The Supreme Court ruled against Matrixx. One comment by Justice Sonia Sotomayor was quite interesting.

Given that medical professionals and regulators act on the basis of evidence of causation that is not statistically significant, it stands to reason that in certain cases reasonable investors would as well.

She goes on to say that just any old set of adverse reports wouldn't meet this standard, but that the courts must look at

the source, content and context of the reports

Submitted by Steve Simon


1. If statistical significance is not a standard for establishing causation, what would be the alternate standard?

2. How does the responsibility of Matrixx to its investors differ from its responsibility to its customers and to the FDA?

3. What action should a company take in response to a small number of reports of serious side effects when those reports fail to meet the criteria of statistical significance?
