Chance News 23

From ChanceWiki

Quotations

Light a Lucky and you’ll never miss sweets that make you fat.

Constance Talmadge, Charming Motion Picture Star, 1930.

When I talk to people about statistics, I find that they usually are quite willing to criticize dubious statistics--as long as the numbers come from people with whom they disagree.

Joel Best, More Damned Lies and Statistics, page XI

Forsooth

This forsooth is from the Jan 2007 RSS News.

Carl Griffths' feet have grown to a massive size 18 - double the average for adult men in Britain.

The Times

6 October 2006


The "From the President" column of the March 2007 issue of Consumer Reports (page 5) discusses how CR uses statistics in its testing. The President states that in response to an article in the January issue about contamination in chickens, "The U. S. Department of Agriculture, whose job it is to keep our cacciatore clean, labeled our study "junk science," without even learning our methodology: 'There's virtually nothing or any conclusion that anyone could draw from 500 samples,' said a USDA spokesman."

Submitted by Jerry Grossman

A Challenge

The mathematics department at Dartmouth has just moved to a new building and the previous math building is being demolished. The students called this building "Shower Towers," a name suggested by this picture of one wall of the building.

http://www.dartmouth.edu/~chance/forwiki/bradley2.jpg

For at least 30 years we walked by this wall assuming that the tiles were randomly placed. One day, as we were walking by it, our colleague John Finn said "I see they are not randomly placed." What did he see?

Submitted by Laurie Snell

Statz 4 life

Statz 4 life, homies!, Da Statz Krew, Google video.

This is a hilarious 5-minute hip-hop video about an introductory statistics course for psychology students at the University of Oregon last summer. Graduate student Chuck Tate enlisted the help of other psychology graduate students to get students to enjoy statistics as much as they enjoy hip-hop. From ANOVA to correlation, from Pearson to Fisher, the whole syllabus is mentioned. Let's hope Da Statz Krew enjoy their real stats courses as much as they seemed to enjoy making the video.

(Note: This video was previously mentioned briefly in Chance News 18.)

Submitted by John Gavin.

Hot streaks rarely last

The Man Who Shook Up Vegas, by Sam Walker, January 5, 2007; Page W1.

Since last fall (Autumn), Las Vegas has had a problem each Thursday morning at precisely 10 a.m. Nevada time. Casino sports betting operations around the world were being simultaneously pounded by thousands of bettors wagering millions of dollars on the same few college football games. Odder still, most of these lock step bets were turning out to be winners, costing the casinos a fortune. The global business of sports betting was being jolted every week by an obscure 41-year-old statistician from San Francisco, using the alias Dr. Bob.

The article explains the background:

Gamblers wagering against a point spread must win more than half their bets (about 53%) to make a profit and must be closer to 55% to make a comfortable living. This is no small feat. Experts say there may be fewer than 100 people who can sustain these rates over time. Most of them belong to professional betting syndicates that hire teams of statisticians, wager millions every week and keep their operations secret.

Since 1999, Bob Stoll has recommended 658 bets on college football, or about 81 per season. Here are his results. (For comparison, when betting against a point spread in Las Vegas, bettors must win 52.4% of their wagers to make a profit.)

YEAR  WIN/LOSS/TIE  %  
1999  49-31-1  61  
2000  47-25-0  65  
2001  35-28-0  56  
2002  49-44-3  53  
2003  46-55-2  46  
2004  55-34-1  62  
2005  51-21-2  71  
2006  45-34-3  57  
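The 52.4% break-even figure and the percentages in the table can be checked with a few lines of arithmetic. The sketch below assumes the conventional 11-to-10 odds (risk 11 units to win 10), which the article does not state explicitly, and it assumes that ties are excluded when the win percentage is computed, which is what the table's figures suggest.

```python
from fractions import Fraction

# Win-loss-tie records copied from the table above.
RECORDS = {
    1999: (49, 31, 1), 2000: (47, 25, 0), 2001: (35, 28, 0), 2002: (49, 44, 3),
    2003: (46, 55, 2), 2004: (55, 34, 1), 2005: (51, 21, 2), 2006: (45, 34, 3),
}

def break_even(risk, win):
    """Win rate p at which p * win equals (1 - p) * risk."""
    return Fraction(risk, risk + win)

def win_percentage(wins, losses):
    """Percentage of decided bets (ties excluded) that were won."""
    return 100 * wins / (wins + losses)

# Assumed odds: risk 11 to win 10.
print(f"break-even rate: {float(break_even(11, 10)):.1%}")
for year, (w, l, t) in RECORDS.items():
    print(year, f"{w}-{l}-{t}", round(win_percentage(w, l)))
```

Under these assumptions the break-even rate is 11/21, and the rounded yearly percentages agree with the table.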

The article claims that in the last three months, Mr. Stoll has emerged to become one of the world's most influential sports handicappers. And when it comes to predicting the outcomes of college football games, he is peerless.

What separates Mr. Stoll from other professionals, and makes him so frightening to bookmakers, is that he distributes his bets to the public, for a fee. All that pandemonium on Thursdays was no coincidence: that's the day Mr. Stoll sends an email to his subscribers telling them which college football teams to bet on the following weekend. This makes it very difficult for bookmakers to maintain a balanced book.

His website discusses the tools he uses to analyze football games: a mathematical model to project how many points each team was likely to score in a coming matchup. He makes unapologetic use of terms like variances, square roots, binomials and standard distributions. Much of his time is spent making tiny adjustments. If a team lost 12 yards on a running play, he checks the game summary to make sure it wasn't a botched punt. He compensates for the strength of every team's opponents. It takes him eight hours just to calculate a rating he invented to measure special teams. Trivial as this seems, Mr. Stoll says the extra work makes his predictions 4% better.

He does not follow the standard business model. He has no employees and he declines to advertise or swap links with other handicapping sites. In online essays, Dr. Bob says:

I have a very realistic approach to handicapping and consider sports betting an investment rather than a gamble. In case you haven't figured it out by now, there is no such thing as a sure thing and I don't respect anyone who does. But, in the long run, if you follow my Best Bet advice and use a disciplined money management strategy you will win.

Bob Stoll's handicapping career began at Berkeley when he entered a $2 NFL pool and, after doing a few minutes of simple math, won $100. From then on, his statistics classes became excuses to feed football data through campus mainframes. After winning 63% of his bets in three years, he quit school to become a tout.

Hot streaks rarely last. One handicapper says:

He (Bob Stoll) needs to enjoy this while it's going on right now.

In 2005, Mr. Stoll noticed that a few minutes after he sent his advice, the lines on those games would shift slightly. By the beginning of the 2006 college football season, within 30 seconds of the moment he pressed "send" on his Thursday picks, every major casino in the world would fall into line.

The bookmakers had clearly subscribed, and were trying to change the lines before his clients could make bets. When a stock analyst moves the market with a recommendation, investors who get in early can make money on it regardless of its merits. It's just the opposite in my business. When he makes picks, it's as if brokers and traders collude to drive down the price.

It's a story Mr. Stoll says he's heard thousands of times from clients who don't look at the long term.

Even good bets lose 40% of the time but some clients don't grasp that. They think I'm either hot or I'm cold.

As for what motivates him, Stoll says:

I'm not flashy by nature. I don't need three houses and a boat. I just like to handicap. For me, it's about problem solving.

Questions

  • How likely is it that his past performance table could have happened by chance?
  • Dr. Bob advises clients to bet in a disciplined pattern that leaves less than a 1% chance of exhausting their bankrolls. Is this an acceptable performance statistic? What other information would you like to know about how much you might lose?
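One way to start on the first question is an exact binomial tail probability. The sketch below uses the totals from the table (377 wins and 272 losses, with the 12 ties set aside) and asks how often a bettor with a fixed win probability would do at least this well. It assumes the bets are independent, and it ignores the selection effect that we are looking at Mr. Stoll precisely because his record is good, so the result is only a rough guide.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

wins, losses = 377, 272          # totals summed from the table, ties excluded
n = wins + losses

for p in (0.5, 0.524):           # a coin flipper, and a break-even bettor
    print(f"p = {p}: P(at least {wins} wins in {n}) = {binomial_tail(n, wins, p):.2e}")
```

Comparing the two values of p shows how sensitive the answer is to what we take "chance" to mean.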

Submitted by John Gavin.

Amazon's Statistically Improbable Phrases

About a year ago, Amazon.com, a popular site for the online purchase of books and other items, listed a group of phrases for certain books with the label, Statistically Improbable Phrases (SIP). These were phrases identified from the full text of a book that were common in that book relative to other books.

Amazon describes how it selects the SIPs in very vague terms on one of its help pages. I presume that it is vague because Amazon considers its approach to be a trade secret. The August 23, 2006 entry on S Anand's blog outlines how you might compute SIPs and offers an example using the Calvin and Hobbes comic strip.

One use of SIPs is clustering. You could measure the similarity between books based on the number of common SIPs and then cluster the data using that similarity matrix. Another approach to clustering that is used for RSS feeds is available here.
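As a rough illustration of both ideas, here is a minimal sketch that scores two-word phrases by how much more frequent they are in one book than in a set of other books, and then counts shared phrases between books. The scoring rule (a smoothed frequency ratio), the restriction to two-word phrases, and all function names are assumptions made for this example; Amazon has not published its method, and this is not a reconstruction of it.

```python
from collections import Counter

def bigrams(text):
    """Two-word phrases in the text, lower-cased."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

def improbable_phrases(book, other_books, top=3):
    """The book's phrases ranked by frequency in the book relative to other books."""
    book_counts = Counter(bigrams(book))
    other_counts = Counter(bg for text in other_books for bg in bigrams(text))
    book_total = sum(book_counts.values())
    other_total = sum(other_counts.values())

    def score(bg):
        p_book = book_counts[bg] / book_total
        # Add-one smoothing so phrases unseen elsewhere do not divide by zero.
        p_other = (other_counts[bg] + 1) / (other_total + 1)
        return p_book / p_other

    return set(sorted(book_counts, key=score, reverse=True)[:top])

def similarity_matrix(sips):
    """Map each pair of titles to the number of phrases they share."""
    titles = sorted(sips)
    return {(a, b): len(sips[a] & sips[b]) for a in titles for b in titles}
```

The resulting matrix of shared-phrase counts could then be handed to any standard clustering routine.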

Questions

1. Find a well known statistics book on the Amazon web site that lists SIPs. Do these SIPs give you a good idea of the content of the book?

2. Would SIPs be valuable for a work of fiction?

3. Speculate on what book would have the highest number of SIPs.

Submitted by Steve Simon

Can Google replace your doctor?

Googling for a diagnosis—use of Google as a diagnostic aid: internet based study Hangwi Tang, Jennifer Hwee Kwoon Ng. BMJ 2006: 333; 1143-1145.

An article published in BMJ argues that Google searches can sometimes aid with developing an appropriate diagnosis of disease. The researchers selected a convenience sample of diagnostic cases presented in the New England Journal of Medicine in 2005. They extracted three to five search terms from these case studies, using "statistically improbable phrases" (see above) whenever possible. They then reviewed roughly the top thirty links suggested by Google (never more than the top fifty links) and extracted a diagnosis from the pages. The diagnoses were correct in 15 out of 26 cases (58%, 95% CI 38% to 77%).
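To see how wide an interval 26 cases can support, the simple normal-approximation (Wald) interval can be computed directly. This is only a sketch: the paper's exact method for the interval is not described here, and the function name is our own.

```python
from math import sqrt

def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = wald_interval(15, 26)
print(f"{15 / 26:.0%} correct, approximate 95% interval {low:.1%} to {high:.1%}")
```

The result is close to the published interval, and it makes plain that with so few cases the true success rate could plausibly lie anywhere from well under half to about three quarters.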

The authors admit that the success of a Google diagnosis depends on what you are looking for.

We suspect that using Google to search for a diagnosis is likely to be more effective for conditions with unique symptoms and signs that can easily be used as search terms.

and also note that

Searches are less likely to be successful in complex diseases with non-specific symptoms or common diseases with rare presentations.

The BMJ offers "Rapid Responses," a system that allows interested readers to offer their own comments on any article published. The Rapid Responses to this article include a number of criticisms as well as some suggestions for improvement.

Questions

1. Is a 58% rate of correct diagnoses good?

2. The authors used blinding--the authors were unaware of the correct diagnosis during the search phase. Comment on whether this blinding is needed and whether it is effective.

3. The authors acknowledge the importance of skill in extracting information from the pages that Google identifies. There is also skill in selecting the "statistically improbable phrases" used as search terms. How would you redesign this experiment so that the skill of the authors did not influence the results?

Submitted by Steve Simon

What can you do with 100 words?

Parrot's oratory stuns scientists Alex Kirby, BBC News, January 26, 2004.

An article about N'Kisi, a parrot with a vocabulary of 950 words, makes a rather dubious statistical claim.

About 100 words are needed for half of all reading in English, so if N'kisi could read he would be able to cope with a wide range of material.

There is a story about Dr. Seuss writing his famous book "The Cat in the Hat" using a limited vocabulary list and coming in at 220 unique words. His publisher wagered $50 that he could not write a book using only 50 words. Dr. Seuss did indeed accomplish this with "Green Eggs and Ham" which uses exactly 50 words. See the Snopes.com entry on Green Eggs and Ham for details. So if 100 words are needed for half of all reading, then the book with a median level of complexity is bracketed below and above by "Green Eggs and Ham" and "The Cat in the Hat".

Another interpretation is that the 100 most common words represent 50% of the words used in a typical book. You can find a list of these words on the web, and if you remove everything except those 100 words, the text would be rather difficult to read. Here is an example of a paragraph taken from a previous Chance News.

When a ? ? for a ? ? ?, he or she ? an ? ? with the ?. The ? may ?, but ? if he does not, others will. That is ? the ? will ? ? ?. But if the ? is ? ?, then the ? ? and ? are for ?.
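The masking shown above, and the fraction of a text that a word list covers, can both be computed with a short script. The sketch below is illustrative only: the small `COMMON` set is a stand-in chosen for the example, not the actual list of the 100 most common English words, and the function names are our own.

```python
import re

WORD = re.compile(r"[A-Za-z']+")

# Stand-in for a real list of the 100 most common English words.
COMMON = {"the", "a", "an", "he", "or", "she", "for", "with", "may", "but",
          "if", "does", "not", "will", "that", "is", "then", "and", "are",
          "when", "others"}

def mask_uncommon(text, common=COMMON):
    """Replace every word not in the common list with a question mark."""
    return WORD.sub(lambda m: m.group() if m.group().lower() in common else "?", text)

def coverage(text, common=COMMON):
    """Fraction of the words in the text that appear in the common list."""
    words = [w.lower() for w in WORD.findall(text)]
    return sum(w in common for w in words) / len(words)
```

Running `coverage` on a long sample of ordinary prose with a genuine 100-word list is one way to check the "half of all reading" claim under this interpretation.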

A separate critique of the claims about N'Kisi published at the Skeptic's Dictionary web page comments on the problems with confirmation bias.

Questions

1. How would you interpret the phrase "100 words are needed for half of all reading"? How would you verify the accuracy of this statement?

Submitted by Steve Simon

Read before you cite

Significance, Dec. 2006, Vol. 3 issue 4.
Mikhail Simkin, Vwani Roychowdhury

This is a popular account of work the authors carried out under the title "Copied citations create renowned papers".

This article was suggested by Norton Starr, who was enchanted by the authors' story, which might be called "What determines Great Generals?"

During the “Manhattan project” (the making of nuclear bomb), Fermi asked Gen. Groves, the head of the project, what is the definition of a “great” general. Groves replied that any general who had won five battles in a row might safely be called great. Fermi then asked how many generals are great. Groves said about three out of every hundred. Fermi conjectured that, considering that opposing forces for most battles are roughly equal in strength, the chance of winning one battle is 1/2 and the chance of winning five battles in a row is 1/32. “So you are right, General, about three out of every hundred. Mathematical probability, not genius.”
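Fermi's arithmetic is easy to reproduce. The sketch below assumes, as he does in the story, that battles are independent and that each is won with probability 1/2; the function name is our own.

```python
def chance_of_streak(p_win, battles):
    """Probability of winning every one of a run of independent battles."""
    return p_win ** battles

p_great = chance_of_streak(0.5, 5)
print(f"chance of five straight wins: {p_great} (1 in {round(1 / p_great)})")
print(f"expected 'great' generals per hundred: {100 * p_great}")
```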

The authors give as reference Deming's 1986 book "Out of the Crisis." But Deming says that a student sent him the story and seems to suggest that it can be found in "The Face of Battle" by John Keegan. We could not find it there. It is in Carl Sagan's "The Demon-Haunted World" but without a reference. So we don't know if this is a true story.

Now just as generals might be great generals by chance so might great scientists be great by chance. The authors comment that "a commonly accepted measure of 'greatness' for scientists is the number of citations to their papers."

Now most of us would admit that we often do not read all the papers we cite in our articles. We would also admit that we probably make occasional mistakes in our citations: the date is wrong, the volume is wrong, an author's name is misspelled, and so on. Of course these errors get propagated when others copy our citations.

To get an idea of how often this might occur, the authors chose a renowned paper that had 4300 citations and found that 196 of these citations contained misprints, of which only 45 were distinct. The most popular misprint, in a page number, appeared 78 times.

The authors develop a model to measure the effect of citation copying on the distribution of the number of citations. This model uses a "random-citing scientist" who, when writing an article, picks m random articles, cites them, and also copies some of their references, each with probability p. So m and p are parameters. They say that a good agreement between this model and actual citation data is achieved with m = 3 and p = 1/4. They illustrate this with the following figure:

http://www.dartmouth.edu/~chance/forwiki/citations.jpg
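The random-citing scientist is simple enough to simulate. The sketch below is our own reading of the description above, not the authors' code: papers are written one at a time, each new paper picks m earlier papers uniformly at random, cites them, and copies each of their references with probability p. The m starting papers with no references are an assumption needed to get the process going.

```python
import random

def simulate_citations(n_papers, m=3, p=0.25, seed=0):
    """Return the number of citations received by each of n_papers papers."""
    rng = random.Random(seed)
    references = [set() for _ in range(m)]   # assumed starting papers, no references
    citations = [0] * m
    for new in range(m, n_papers):
        refs = set()
        for source in rng.sample(range(new), m):   # m distinct earlier papers
            refs.add(source)
            for ref in sorted(references[source]):
                if rng.random() < p:               # copy this reference unread
                    refs.add(ref)
        for ref in refs:
            citations[ref] += 1
        references.append(refs)
        citations.append(0)
    return citations

counts = simulate_citations(2000)
print("mean citations:", sum(counts) / len(counts))
print("most cited paper:", max(counts))
```

Even though every paper in this model is equally "good," a few papers end up with far more citations than the average, which is the authors' point.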

Submitted by Laurie Snell


To live longer, choose fame over fortune

Nobel's greatest prize, The Economist, 20 Jan 2007.

Winning a Nobel prize not only brings fame and fortune to the holder but it also brings two extra years of life, according to Matthew Rablen and Andrew Oswald at the University of Warwick.

The paper provides evidence that an increase in status, rather than wealth alone, raises a person's lifespan, based on data on the lives of about 520 people: Nobel Prize winners (135) and nominees (389).

This idea was first proposed by Michael Marmot, of University College London, when he studied a large cohort of people, British civil servants, and found, against all expectations, that top civil servants were far healthier and less stressed than lower ranked civil servants. Other studies have confirmed this result and support the assertion that better health is not a result of higher salary. The Rablen-Oswald paper refines the approach by analysing people who are at the top of their profession, by virtue of being nominated for a Nobel, to measure the value of winning the prize, relative to merely being nominated.

The authors correct for various biases, for example by grouping the data by country. Based on the empirical data, American winners live over two years longer than nominees, German winners just over a year longer, and other European winners 0.7 years longer. The fitted model suggests a two-year difference overall.

What causes the increase in longevity is not clear but it is not the cash that comes with a Nobel prize, as the inflation-adjusted purchasing power of the prize is not correlated with longevity. So status, rather than money, appears to be responsible for the effect, the authors claim.

Marmot and others have previously suggested that stress hormones may be a potential factor: those at the bottom of the pile are more stressed than those at the top, even though the latter have to make decisions with more wide ranging impacts. Rablen and Oswald's paper goes further by suggesting a positive effect from having a high status, rather than the absence of a negative effect, as unsuccessful nominees never know that they were being considered. In the case of Oscar winners (see previous article), the winner may live longer but the other failed nominees know that they have failed to win.

Questions

  • This result is based only on data from the first half of the 20th century, due to the secrecy surrounding the nomination of potential prize winners. Are results based on historical data still applicable today? Speculate on what adjustments might be needed.
  • The data is based on men only, to avoid differences in life-span between the sexes. Do you think that the underlying idea can be extrapolated to women?
  • If the idea that social status improves lifespan is truly correct, might the size of the effect be larger in a more normal population of people?
  • Oddly, Oscar-winning actresses and actors live 3.6 years longer than those who are merely nominated, but Oscar-winning scriptwriters live 3.6 years less than other nominees. Why might this be?

Further reading

Mortality and Immortality, Matthew D. Rablen and Andrew J. Oswald, University of Warwick, Jan 2007.

Submitted by John Gavin

Comments on: To live longer, choose fame over fortune

When we discussed the Oscar studies in Chance News 10.05 and Chance News 10.06 we, like the media, accepted this as a serious study and did not even raise the possibility that it might be nonsense. This was in May of 2001. When Peter Doyle read our account he was skeptical, and shortly thereafter he and Mark Mixer did their experiments, which showed that the study was indeed nonsense.

It took us five years and the article by James Hanley and his colleagues for Chance News to report that the Oscar study was nonsense.

Our experience with the Oscar winners study makes us skeptical that Nobel Prize winners live longer than nominees, and we feel that our new Chance Wiki should do better than our old Chance News did in critiquing this study. We will see if we can get the data and, if so, make it available. Perhaps the new Wiki can do better this time.

Submitted by Laurie Snell

Momentous modelling

Momentous modelling, Economics focus, The Economist, Feb 1st 2007.

This article highlights a growing trend in economics to focus on the uncertainty surrounding an economic forecast, rather than the forecast level itself.

Shocking is what economists do. They start with a model of the economy, administer a 'shock' to it - a sudden rise in the oil price - and work out what happens to output, prices, employment and so forth.

Such models consider changes in a model's mean or expected value, such as what happens if the oil price doubles. In contrast, economists have focussed much less on variation around the forecast, such as working out what will happen if the oil price is likely to range between, say, $20 and $100 rather than between $50 and $60. The Economist's tentative explanation is that the latter question requires more difficult maths.

In a recent paper, based on his PhD thesis, Stanford University's Nick Bloom claims such models are important if people's behaviour changes as a result of the world suddenly becoming a less (or more) certain place. Sudden big second-moment shocks, measured by the volatility of American share prices, are also fairly frequent: the terrorist attacks of September 11th 2001, the assassination of John Kennedy and the collapse of big companies such as Worldcom and Enron.

Bloom's model allows firms to choose how much to invest and how many workers to employ. The world in which they operate is uncertain because their revenues can vary. He shocks the model by suddenly increasing the variability of firms' revenues, based on data from shocks over the past 45 years. He does this by doubling the standard deviation of revenues, a common measure of variability, before it returns to its old level a few months later. The model predicts that firms wait and see what happens because the value of waiting increases. So expanding firms defer hiring new workers and failing firms tend to delay sacking employees in the hope of a turnaround in their circumstances. As a result, workers are no longer being shuffled from less productive to more productive firms, which is bad for the economy as a whole, a concern for policymakers.
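To make the idea of a second-moment shock concrete, the sketch below generates revenues for a cross-section of firms whose mean never changes but whose standard deviation doubles for a few months and then returns to its old level. This is only an illustration of the shock itself, not of Bloom's model: the normal distribution, the parameter values and the function names are all assumptions made for the example, and nothing here models firms' hiring or investment decisions.

```python
import random
import statistics

def volatility_schedule(months, shock_start, shock_length, base_sd):
    """Standard deviation for each month: doubled during the shock, else base_sd."""
    return [2 * base_sd if shock_start <= t < shock_start + shock_length else base_sd
            for t in range(months)]

def simulate_revenues(n_firms, schedule, mean=100.0, seed=0):
    """Monthly revenues for each firm; the mean is fixed, only the spread changes."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, sd) for sd in schedule] for _ in range(n_firms)]

def cross_section_sd(revenues, month):
    """Spread of revenues across firms in a given month."""
    return statistics.pstdev([firm[month] for firm in revenues])

schedule = volatility_schedule(months=12, shock_start=4, shock_length=3, base_sd=10.0)
revenues = simulate_revenues(5000, schedule)
for month in (0, 5, 11):
    print(f"month {month}: spread across firms = {cross_section_sd(revenues, month):.1f}")
```

A first-moment shock would instead shift `mean` while leaving the schedule of standard deviations flat.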

So Bloom claims that for policymakers it is important to tell second-moment shocks, which seem not to last long, from the first-moment variety, where the effects endure for longer.

Questions

  • Is it plausible that people's perceptions might be more influenced by the uncertainty of a forecast rather than the forecast itself? Can you think of common examples where this is the case? For example, when you hear a weather forecast which attribute of the forecast do you tend to recall before venturing outside? If you are going on holiday for a week, does that change what you look for, from the weather forecast for your destination?
  • Is standard deviation an appropriate measure of the shocks that are mentioned in the article? Would higher moments be more helpful?
  • The data used to calibrate the model covers a period of 45 years. Is data from so long ago still relevant to today's economy? What adjustments might be applied to standardise the data across time?
  • Do you think that the duration of shocks might be an influential factor to consider? How might this be measured and subsequently simulated? What other information would you like to have at your disposal?

Submitted by John Gavin.