Chance News 46
"I find all novels lacking in probability."
"Probability is the bane of the age. Every Tom, Dick and Harry thinks he knows what is probable. The fact is most people have not the smallest idea what is going on round them. Their conclusions about life are based on utterly irrelevant--and usually inaccurate--premises."
Spoken by a music critic and a musician, respectively, on page 212 of Casanova's Chinese Restaurant by Anthony Powell, the fifth novel in his twelve-volume series, A Dance to the Music of Time.
Submitted by Paul Alper
Life is a gamble, at terrible odds;
If it were a bet, you wouldn't take it.
Rosencrantz and Guildenstern are Dead
Submitted by Laurie Snell
...[Y]ou stated that it is better to pick higher numbers in a lottery because people pick their birthdays, which just go to 31. I picked numbers over 31, and while it didn't make me a millionaire, I won about a hundred dollars.
So, you see, I have saved enough money to subscribe to your wonderful magazine for some time to come - and want to thank you for all the ... great information.
Submitted by Margaret Cibes
"In the last five months, according to the Federal Reserve Board, the money supply in the United States has increased by 271 percent. It has almost tripled."
3 March 2009
The following two Forsooths are from the April 2009 issue of the RSS News:
Europe's particle physics lab, CERN, is losing ground rapidly in the race to discover the elusive Higgs boson, or 'God particle', its US rival claims ... the US Fermilab says the odds of its Tevatron accelerator detecting the famed particle first are now 50-50 at worst and up to 96% at best.
BBC News, Science & Environment
12 February 2009
Experts disagree over the issue, with some saying there is no proof light drinking harms the baby, while others believe the evidence is inconclusive.
The Independent
21 January 2009
Does a screening test do more harm than good?
Screen or Not? What Those Prostate Studies Mean. Tara Parker-Pope. The New York Times, March 23, 2009.
The Impossible Calculus of PSA Testing. Dana Jennings. The New York Times, March 23, 2009.
A screening test gives an indication of whether you have a certain disease. Every screening test (other than autopsy, perhaps) is imperfect, leading to some false positive results and some false negative results. Still, you're better off knowing the results, even if they are imperfect, aren't you? Well, maybe not. If the value of the imperfect information is less than the price of the screening test, then obviously you shouldn't get screened.
Even if the test is free, though, you may still be better off not knowing. This is a classic example of where ignorance truly is bliss.
The problem with screening tests is that a positive finding leads to some sort of intervention, with costs and risks associated with it. If that intervention was done on a false positive screen, then you endured some level of cost and risk without any compensating benefit.
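To see how quickly false positives can pile up, here is a minimal sketch of the arithmetic. The prevalence, sensitivity and specificity below are hypothetical round numbers chosen for illustration; they are not estimates for PSA testing or mammography.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected counts of true/false positives and negatives for one screen."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = diseased * sensitivity      # diseased and flagged
    false_neg = diseased - true_pos        # diseased but missed
    true_neg = healthy * specificity       # healthy and cleared
    false_pos = healthy - true_neg         # healthy but flagged
    return {"true_pos": true_pos, "false_pos": false_pos,
            "true_neg": true_neg, "false_neg": false_neg}


def positive_predictive_value(counts):
    """Share of positive screens that correspond to real disease."""
    return counts["true_pos"] / (counts["true_pos"] + counts["false_pos"])


# Hypothetical inputs: 2% prevalence, 90% sensitivity, 90% specificity.
counts = screening_outcomes(population=10_000, prevalence=0.02,
                            sensitivity=0.90, specificity=0.90)
ppv = positive_predictive_value(counts)
print(counts)
print(f"Share of positive screens that are real: {ppv:.1%}")
```

Under these assumed numbers, most people who screen positive do not have the disease, yet all of them face whatever follow-up intervention a positive result triggers.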
Two screening tests, mammography for breast cancer in women, and PSA (prostate specific antigen) testing for prostate cancer in men, have been in a storm of controversy in the past few years because some critics argue that they cause more harm than good. The controversy over PSA testing has been rekindled by a pair of studies recently published in the New England Journal of Medicine: an American study, and a European study.
Tara Parker-Pope offers a valuable summary of these articles.
"The news was unsettling and confusing to many middle-age men, particularly those who already have diagnoses of prostate cancer as a result of P.S.A. testing. Doctors say some men are reconsidering surgery or radiation treatment they have planned. Others, convinced that their lives were saved by P.S.A. screening, wonder how anyone could question the value of early detection of prostate cancer."
These were very large studies: over 77,000 patients in the American study and 182,000 in the European study. The follow-up time was 7-10 years in the American study and a median of 9 years in the European study.
"The bottom line of both studies is that P.S.A. screening does find more prostate cancers — but finding those cancers early doesn’t do much to reduce the risk of dying from the disease. The American study showed no statistical difference in prostate cancer death rates between a group of men who had the screening and a control group who did not. The European researchers found that P.S.A. screening does reduce the risk of dying from prostate cancer by about 20 percent."
But even that 20% improvement comes at a serious cost.
"The European study found that for every man who was helped by P.S.A. screening, at least 48 received unnecessary treatment that increased risk for impotency and incontinence. Dr. Otis Brawley, chief medical officer of the American Cancer Society, summed up the European data this way: 'The test is about 50 times more likely to ruin your life than it is to save your life.'"
There were important limitations to the studies.
"The American study found no benefit in P.S.A. screening over a period of 7 to 10 years. But so far, only about 170 men out of 77,000 studied have died of prostate cancer. Prostate cancer is slow-growing, so it's possible that in the next few years, meaningful differences in mortality rates between the two groups will emerge."
Also, the control group in the American study was not a pure control group.
"A larger concern is what statisticians call “contamination” in the unscreened control group. Because it would have been unethical to tell men in the control group that they could not be screened, many either sought the test or were offered it by their doctors. Investigators initially estimated that 20 percent of the control group would fit in this category, but the numbers ended up being far higher —38 to 52 percent. As a result, the study doesn’t really compare the risks and benefits of screening and no screening. It compares aggressive screening and some screening."
There are some limitations to the European study as well.
"The European research has its own set of problems. Although the finding that P.S.A. screening reduces cancer deaths by 20 percent is statistically significant, experts say it's on the borderline, and a few more years of data could weaken the result. Finally, parts of the study were not 'blinded,' meaning that biases could have crept into the interpretation of the data."
Amazingly, Tara Parker-Pope says that these two large studies have failed to resolve the issue of whether it is better to test.
"Before the studies were released, most major medical groups said P.S.A. testing was a personal decision that a man should discuss with his doctor. The two new studies are unlikely to change that advice, experts say; instead, they give men and their doctors more information with which to make the decision."
Dana Jennings has a personal stake in the PSA testing controversy.
"I'm confused because I'm the statistical exception. I'm the one man in 49 whose life may have been saved because I had the PSA blood test. Most prostate cancers are slow and lazy. But my doctors and I learned after I had my prostate surgically removed last July that my cancer was shockingly aggressive. There's a good chance that it would've killed me if I hadn't been screened. And, to be blunt, it might yet."
Mr. Jennings argues that these studies tend to dehumanize the people involved.
"My biggest problem with the studies – and, of course, this is the nature of such studies – is that they reduce me and all my brothers-in-disease to abstractions, to cancer-bearing ciphers. Among those dry words, we are not living, breathing and terrified men, but merely our prostate cancers, whether slow or bold."
He also notes that
"The researchers counted 'deaths,' not men who had died. As Charlie Brown once said to Lucy as she detailed his baseball team’s shortcomings: 'Tell your statistics to shut up.'"
The two studies stirred strong emotions in Mr. Jennings.
"So, I sit here in limbo. And I wonder whether I'll be that rare man who ducks death from a cancer that would've killed him – because I got screened. But all I can confess to you, in all honesty, is this: I'm still angry and confused."
1. If two studies with a quarter of a million patients between them are not enough to resolve the controversy over PSA testing, what would it take?
2. Why is it impossible to get a pure control group when men are randomly assigned to a screening group and a control group? What ethical principle would be violated if you forced the control group to forgo screening?
3. What is the expected impact of contamination on the results of the American study?
4. Do large statistical studies tend to dehumanize the participants by reducing their personal tragedies to a set of statistics? What can be done to avoid this?
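Question 3 can be explored with a little arithmetic. The sketch below is a toy model, not the investigators' analysis. It assumes that screening truly cuts prostate cancer deaths by 20 percent (the European estimate), that everyone in the screening arm is actually screened, and that a control subject who gets tested receives the full benefit. All three are assumptions made for illustration; only the 20, 38 and 52 percent figures come from the article.

```python
def observed_relative_reduction(true_reduction, screened_in_trial_arm,
                                screened_in_control_arm):
    """Apparent relative reduction in deaths when both arms are partly screened.

    Risks are expressed relative to the risk of a never-screened man (= 1.0).
    """
    risk_trial = 1.0 - true_reduction * screened_in_trial_arm
    risk_control = 1.0 - true_reduction * screened_in_control_arm
    return 1.0 - risk_trial / risk_control


TRUE_REDUCTION = 0.20  # assumed true effect of screening

for contamination in (0.0, 0.20, 0.38, 0.52):
    seen = observed_relative_reduction(TRUE_REDUCTION, 1.0, contamination)
    print(f"control arm screened {contamination:.0%}: "
          f"apparent reduction {seen:.1%}")
```

In this toy model contamination pulls the apparent benefit toward zero, so a real effect becomes harder to detect; it does not manufacture a benefit that is not there.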
Submitted by Steve Simon
Coming next year: Obama's inflation
The Hill, 3 March 2009
Dick Morris, a former advisor to President Bill Clinton and Senator Trent Lott, is now actively blogging as a political commentator. The present column, however, is an object lesson in either sloppy reading or how to lie with statistics. Morris asserts that "In the last five months, according to the Federal Reserve Board, the money supply in the United States has increased by 271 percent. It has almost tripled." An increase of 271 percent would leave the money supply at 3.71 times its starting level, so it would have almost quadrupled, not tripled. This is a common slip-up in interpreting percent change, though one we might expect someone writing on finance to avoid.
That, however, is the least of the distortions--and ironically the only one to err on the small side. Of course, it will be obvious to informed readers that the US money supply could not have nearly tripled (or quadrupled) in the last few months. But what then to make of the reference to Federal Reserve Board data?
Here is a link to the March 5 Federal Reserve Statistical Release that was current at the time of Morris's posting. At the bottom of the first table there, we read that for the three months from October 2008 to January 2009 the M1 money stock grew at a seasonally adjusted annualized rate of 27.1 percent.
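The two pieces of arithmetic involved here, turning a percent increase into a multiplier and turning an annualized rate back into growth over a shorter period, can be sketched as follows. The compounding convention used for the second step is an assumption; the Federal Reserve release may annualize differently, and simply dividing 27.1 by 4 gives a slightly larger figure.

```python
def growth_factor(percent_increase):
    """Multiplier implied by a percent increase (200% -> 3.0, a tripling)."""
    return 1.0 + percent_increase / 100.0


def period_growth_from_annualized(annual_percent, months):
    """Percent growth over `months`, assuming compound annualization."""
    factor = growth_factor(annual_percent) ** (months / 12.0)
    return (factor - 1.0) * 100.0


print(f"A 271% increase multiplies by {growth_factor(271):.2f}")
print(f"A tripling is a {(3.0 - 1.0) * 100:.0f}% increase")

three_month = period_growth_from_annualized(27.1, 3)
print(f"27.1% annualized is roughly {three_month:.1f}% over three months")
```

Either convention puts the actual three-month growth in M1 at single-digit percentages, nowhere near a tripling.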
(1) In view of the actual Federal Reserve report, find (at least) 3 errors in Morris's statement. For each, do you think it more likely represents simple innumeracy or a deliberate distortion?
(2) On the About The Hill page, we read "In an environment filled with political agendas, The Hill stands alone in delivering solid, non-partisan and objective reporting on the business of Washington...". Comment.
Submitted by Bill Peterson
From the New York Times comes this triumph of statistics. The graphic below summarizes why foul play was suspected in Luzerne County, PA on the part of two greedy judges who lacked a moral compass.
1. Why is the graph so incriminating?
2. However, many statistics textbooks caution, "The data never speaks for itself." What possible mitigating facts regarding variability are missing?
3. As interesting as the statistical data is, read the article itself and listen to the audio recordings of victims to see the non-statistical evidence unearthed by the prosecution. Which do you find more compelling?
Submitted by Paul Alper
An Older Look at PSA Screening
An earlier item above discussed the two recent large studies (77,000 men in the U.S. and 182,000 in Europe) of prostate cancer and the efficacy of PSA screening. Some history may nevertheless be useful. Back in the late 1990s a major issue was the effectiveness of radioactive seed implantation (SI) compared to the supposed “gold standard” of radical prostatectomy (RP). Because prostate cancer is slow growing, urologists insisted that the seven years since the inception of SI treatments was not long enough to compare the two treatments properly. This issue has largely been resolved in the ensuing decade, to the point where urologists no longer invoke the phrase “gold standard” and now offer either SI or RP to their patients.
The PSA test was originally conceived as a monitoring method to judge the condition of those who were already diagnosed with prostate cancer, as opposed to acting as a screening test to determine if prostate cancer was present. Like many blood tests, it is merely an indicator; for the PSA test particularly, the result can vary according to the size of the gland, presence of infection or inflammation, how recently an ejaculation occurred, and, of course, whether cancer is present. Men started to demand the PSA test, and an expensive industry took off, precisely because of the plausible but ultimately empirically unwarranted belief that prompt intervention saves lives.
However, even back then it was suspected that whatever the treatment (SI, RP, cryotherapy, watchful waiting, electron beam radiation), the mortality rate was more or less the same. Said another way, (mass) “screening” as opposed to “testing” was of doubtful value because if the prostate cancer was indolent the patient would die with it rather than of it, and if the prostate cancer was aggressive, none of the treatments would do much good.
1. One often hears the term, “five-year survival rate.” Some researchers consider this term completely misleading especially when applied to prostate cancer treatments. What is misleading about it? Why is mortality rate a better way of comparing treatments?
2. Consider the following statement: “The U.S. Preventive Services Task Force (USPSTF) concludes that the evidence is insufficient to recommend for or against screening asymptomatic persons for lung cancer with either low dose computerized tomography (LDCT), chest x-ray (CXR), sputum cytology, or a combination of these tests.” Go here for the rationale. How does this accord with screening for prostate cancer?
3. Find a friendly librarian to determine the percentage of watchful waiting treatment of prostate cancer in western European countries compared to the U.S. Do likewise for RP and SI. Assuming that RP is the most aggressive and watchful waiting the least aggressive, where does the United States stand?
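As a starting point for question 1, here is a small sketch of what is often called lead-time bias. The six patients below are entirely made up. The sketch assumes that screening moves every diagnosis four years earlier and changes nothing else, so each man dies at exactly the same age either way.

```python
def five_year_survival(diagnosis_ages, death_ages):
    """Fraction of patients still alive five years after diagnosis."""
    survivors = sum(1 for dx, death in zip(diagnosis_ages, death_ages)
                    if death - dx >= 5)
    return survivors / len(diagnosis_ages)


# Hypothetical cohort: age at death is the same with or without screening.
death_ages = [70, 72, 74, 76, 78, 80]
symptom_dx_ages = [67, 68, 71, 70, 75, 74]   # diagnosed when symptoms appear
LEAD_TIME = 4                                # assumed years gained by screening
screen_dx_ages = [age - LEAD_TIME for age in symptom_dx_ages]

without_screening = five_year_survival(symptom_dx_ages, death_ages)
with_screening = five_year_survival(screen_dx_ages, death_ages)
print(f"Five-year survival, diagnosed by symptoms:  {without_screening:.0%}")
print(f"Five-year survival, diagnosed by screening: {with_screening:.0%}")
print("Ages at death are identical in both scenarios.")
```

The survival rate rises only because the clock starts earlier; a mortality rate, which counts deaths in the whole group over a fixed period, would not move in this example.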
Poker Showdown Between Luck and Skill
The Numbers Guy
March 27, 2009
The Wall Street Journal has a number of blogs, one of which is called "The Numbers Guy". It is managed by Carl Bialik, who writes on the "way numbers are used, and abused". Many of his articles appear in his column for the Wall Street Journal.
In this article Bialik writes: Is Texas Hold'Em poker more a game of chance or of skill? That question has figured in several legal tests of playing the card game for money: Games of chance are considered gambling under U.S. law. Now a major poker Web site has sponsored a study it claims demonstrates that it takes skill to win — which would help the site’s legal standing. But several poker experts question that claim.
Texas Hold'Em, like other poker games, starts with the players being dealt a number of cards and putting a prescribed amount of money in the "pot". Then, in a number of stages, the players make bets which other players must match or drop out of the game. When there are no more bets there is a "showdown" among the players still in the game. The player with the best hand wins the money in the pot, with the money shared if there are ties.
Bialik writes: "PokerStars paid Cigital, a software consulting firm, to analyze 103,273,484 hands played on the site last December, for real money — usually at least $1 blind bets. (For the uninitiated, here's a guide to the rules.) Three quarters of the hands analyzed ended without a showdown, meaning that the winner never had to show his or her cards — everyone else eventually folded during the rounds of betting. And half the time that hands did end in a showdown, a player who would have won had already folded."
Paco Hope, technical manager at Cigital and co-author of the study, argues that the paucity of showdowns shows poker is a game of skill: The winner could have won by making identical bets no matter which cards he or she had drawn. “Most people think, you get your cards, and the best hand wins,” Hope said. He added, “Whether or not you go to a showdown is determined by the decisions you make, which are determined entirely by your skill.”
However, several researchers who have studied poker in the past said they were skeptical of the conclusions of the study. For one thing, players’ decisions are determined by the cards they draw, which is entirely a matter of luck. Also, there’s no way to know that PokerStars poker hands are representative of all poker hands. The study doesn’t track individual players, so there’s no indication that success in the past — a possible indicator of skill — predicts future triumphs. A study chartered by an interested party is bound to raise more questions than one that’s entirely independent. And the study doesn’t answer the question of how showdowns and best-hand wins would look in a game of pure skill, or of pure chance.
“You have to identify players across hands to identify players who are more skillful than others,” said Joseph Kadane, a professor emeritus of statistics at Carnegie Mellon University.
Bialik continues: “Who folds is determined to a huge degree by the value of the cards!” Peter Winkler, a Dartmouth College mathematician who has studied games of skill and chance, said in an email. “The player who picks up AA [two aces] and stays in while the rest fold is the lucky one; the player who picks up 32 [a three and a two] and folds before the 332 [three, three, two] flop comes down is the unlucky one. That the AA player wins with an ultimately inferior hand does not prove poker is a game of skill. If anything, it shows the opposite: an unskillful player holding the 32 hole cards might have stayed in.”
Hope counters that skill dominates luck in decision-making: “The same information is available to all players (the values of the cards), but it is skill in interpreting that information — not the presence of that information — that determines whether a player folds.”
As for the failure to track individual players, Hope argues that it doesn’t matter whether skillful competitors are identified. “I don’t care who won or why they won,” he said. “What I care about is the decisions they made. The fact they decided to fold indicates it was decisions that determined the hand.”
Like other bloggers, Bialik invites readers to comment on these arguments. Particularly interesting comments were made by Patrick Fleming, who writes:
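Kadane's point about identifying players across hands can be illustrated with a toy simulation. Nothing here comes from the Cigital data. The sketch assumes each hypothetical player has a fixed skill level (expected winnings per hand) plus independent luck on every hand, and then asks whether winnings in one block of hands predict winnings in the next block.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_correlation(skill_sd, players=500, hands_per_half=200, seed=1):
    """Correlate each player's winnings in two separate blocks of hands.

    skill_sd = 0 models pure chance: every player has the same expectation.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(players):
        skill = rng.gauss(0.0, skill_sd)   # hypothetical per-hand edge
        first.append(sum(skill + rng.gauss(0.0, 1.0)
                         for _ in range(hands_per_half)))
        second.append(sum(skill + rng.gauss(0.0, 1.0)
                          for _ in range(hands_per_half)))
    return pearson(first, second)


chance_world = split_half_correlation(skill_sd=0.0)
skill_world = split_half_correlation(skill_sd=0.2)
print(f"Pure chance: correlation between halves = {chance_world:+.2f}")
print(f"With skill:  correlation between halves = {skill_world:+.2f}")
```

In the pure-chance world past results say almost nothing about future ones, while in the skill world they persist. A count of showdowns alone cannot separate the two worlds, which is why the critics want individual players tracked.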
Your analysis of the Cigital study is much too simplistic. Mr. Hope is not saying that because the majority of poker hands are resolved without a showdown that ALONE means poker is a game of skill. What it shows is that the vast majority of poker hands are determined by the way people play the cards, not the actual deal of the cards. IF it is also true that the way people play the cards is an exercise of skill, THEN it follows that the exercise of skill is the predominant factor in determining the outcome of poker. First, Isn’t knowing that your AA hand is likely to be the winner in and of itself an application of knowledge and therefore a skill? But even if you want more than that to qualify as “skill,” there is more, indeed much more.
Your statement “For one thing, players’ decisions are determined by the cards they draw, which is entirely a matter of luck.” is patently false, and I am surprised you of all folks made it. I know from previous blogs that you are familiar with the rules of poker. A player’s decision is NEVER “determined” by the cards he holds (except for a few very minor points, like who makes the first mandatory bet in some version of stud poker). The cards a player holds may influence his decision, indeed sometimes they may be the biggest factor in his decision, but they never (except as noted above) DETERMINE his decision. A player in hold-em is just as free to raise with 3-2 as he is to fold 3-2. What he actually decides to do with that 3-2 will depend on a number of other factors. If he is a skilled player, he may realize that the other players are “scared” of him. He may raise knowing that they will think he has much better cards. If he does that and the other players fold, he wins. His actual cards had nothing to do with his win in that situation.
Likewise, the comment by Mr. Winkler that “Who folds is determined to a huge degree by the value of the cards!” shows a lack of understanding of poker. I doubt Mr. Winkler has played much poker. A player who bases his decisions solely or “to a huge degree” on what cards he holds is at best a beginning player, and will not be a successful player until he starts to consider other factors before making his decisions. What cards you hold, as any poker pro will tell you, is only the beginning of the thought process. As Kenny Rogers sang, every hand's a winner, and every hand's a loser. Most of the time which one your 2 cards will turn out to be depends on many, many other factors.
This comments section is not the place I want to begin a discussion of all of the factors that go into deciding how to play a poker hand, whole books have been written on the subject, so I will just list a view: your image as a player, your position at the table, the size of the bet, the size of your bankroll, the styles of the other players, physical tells of the other players, and mathematical probability. Analyzing and applying all of these and other concerns, plus the actual cards you hold, is a very complex thought process. It is a learned skill, and a skill that can be improved. And that proves the second half of the equation, and that’s why this study HELPS to prove that the results in poker are MOSTLY the product of player skill. And please note my use of the word “mostly,” No one denies that chance determines some results in poker. When you raise “all in” early with A-A and an opponent calls with 9-10 cause he mistakenly thinks you are bluffing, and 2 remaining 10s are dealt (with no ace), chance has determined that outcome for the most part.
But the question is not whether chance determines EVERY outcome in poker, nor whether skill determines EVERY outcome. The question is which is determining the MAJORITY of outcomes. Since in the majority of outcomes the cards, as shown by this study, are not even consulted, that is strong proof that the majority of outcomes fall into the skilled category. And that becomes conclusive proof when one looks at the actual decision making process required in MOST poker situations, and realizes that it involves complex analysis frequently unrelated to the actual cards held.
And one final point, to the expert who asserts a better test would be to show that skilled players win more often, has he ever heard of Doyle Brunson? 50 years of being a successful poker player must have more to do with skill than with Doyle just being the luckiest person on the planet, don’t you think? In sum, the Cigital study is only one piece of a larger analytic framework, but it is an important piece. And when all of the pieces are put together, the evidence that poker is a game of PREDOMINANTLY skill is, as the judge in South Carolina recently stated, “overwhelming.”
Thank you, Patrick Fleming
Full disclosure - I am the Poker Players Alliance Litigation Support Director and helped craft the argument that has been presented in the courts
(1) Does all this convince you that poker is predominately a game of skill?
(2) Do you think that this study would convince a court that poker is predominately a game of skill?
Submitted by Laurie Snell
Profit, Thy Name Is ... Woman?
Since 2001 Roy Douglas Adler, Professor of Marketing at Pepperdine University, has conducted annual studies of Fortune 500 companies with strong records of promoting women to executive positions. This article reports that these companies consistently outperformed industry medians on profitability as a percentage of revenue, assets, and equity.
In 2008, the study was restricted to companies on Fortune's list of "100 Most Desirable MBA Employers," in order to focus on women with MBAs who could be presumed to be on executive fast tracks, and the 2008 results were consistent with earlier ones.
"For profits as a percent of revenue, the results showed 55 percent of the companies were higher than the median, 36 percent were lower and 11 percent were tied. For profits as a percent of assets, the results showed 50 percent were higher than the median, 28 percent were lower and 23 percent were tied. For profits as a percent of equity, the results showed 59 percent were higher than the median, 30 percent were lower and 11 percent were tied."
A blogger commented here:
"Did your study hold pay constant? I hope these companies are not making a profit because they are paying those promoted women less than their male counterparts."
MILLER-McCUNE, March-April 2009
Submitted by Margaret Cibes
Particularly in this age of Google, Wikipedia, YouTube, and the internet, you can’t believe everything you read, hear or see. Moreover, according to Ben Goldacre even memory can be dramatically wrong. Goldacre's column refers to an article by de Vito et al. and should be read in its entirety. "On an abstract level, there's a good short report in the journal Cortex, where researchers in Bologna demonstrate the spectacular hopelessness of memory. One morning in 1980 a bomb exploded in Bologna station: 85 people died, and the clock stopped ominously showing 10.25, the time of the explosion. This image became a famous symbol for the event, but the clock was repaired soon after and worked perfectly for the next 16 years. When it broke again in 1996, it was decided to leave the clock showing 10.25 permanently, as a memorial."
The clock at the Bologna railroad station, stopped at 10.25 to mark the time of the terrorist massacre
"The researchers asked 180 people familiar with the station, or working there, with an average age of 55, about the clock: 173 knew it was stopped, and 160 said it had been since 1980. What's more, 127 claimed they had seen it stuck on 10.25 ever since the explosion, including all 21 railway employees. In a similar study published last year, 40% of 150 UK participants claimed to remember seeing closed circuit television footage of the moment of the explosion on the bus in Tavistock Square on 7 July 2005. No such footage exists."
1. Goldacre is an excellent writer who specializes in pointing out how statistics is misused in science. Go here for his 2009 contributions. To get to previous years merely change the "9," to an "8," or "7," etc.
2. One would imagine that those closest to an incident would have the best memory but de Vito points out: “From the 173 people who knew that at the time of testing the clock was stopped, a subgroup of 56 citizens who regularly take part in the annual official commemoration of the event has been further considered: only six (11%) of them correctly remember that the clock had been working in the past.”
3. Memories can be repressed as well as false. Go here for a discussion of repressed memory relating to child abuse; specifically, you will find "1. An event that you cannot remember can be psychologically equivalent to an event that never happened. 2. An event that you falsely remember can be psychologically equivalent to an event that really did happen." The clock in Bologna worked for another 16 years; adults report abuse which supposedly took place, often decades earlier. Use Google to see a discussion of some of these legal cases and their outcomes.
4. Is there anything you used to recall vividly that you now doubt actually occurred?
Submitted by Paul Alper
Smile Intensity and Divorce
Is there a connection between positive expressive behavior, as seen in facial photographs, and success in avoiding divorce? According to Moskowitz, “If you want to know whether your marriage will survive, look at your spouse's yearbook photos. Psychologists have found that how much people smile in old photographs can predict their later success in marriage.” This claim is based on a study headed by Matthew Hertenstein, published in the April 5 issue of the journal Motivation and Emotion under the title “Smile intensity in photographs predicts divorce later in life.”
The table below is taken from Hertenstein’s publication. Study 1, Sample 1 involves alumni who were psychology majors at a particular university; Study 1, Sample 2 involves alumni from the same university who were not psychology majors. In each instance, the participants submitted yearbook photos which were judged for smile intensity and whether or not the participants were divorced or still married. Study 2 is similar but involved older members of the community who submitted photos from decades past.
1. In the above table, a two sample t-test of means (“M”) is carried out for each row. Pick a row and verify degrees of freedom (“df”), the value of t and the p-value (“p”). Note that “p values are one-tailed given the directional hypothesis of the studies.”
2. While p-value is useful to some extent, “effect size” may be of more interest. The last column, “r,” signifies that some sort of correlation is taking place; it is sometimes called the point-biserial correlation coefficient. This correlation is between the independent dichotomous variable (divorced or still married) and the dependent variable (smile intensity). According to the psychology literature, it is given by
$r = \sqrt{\dfrac{t^2}{t^2 + df}}$
Use this formula and pick a row to verify the value in the table for “r”—ignore the sign.
3. Clearly, these results are for a sample. Speculate on what you would need to know about the characteristics of the participants in order to generalize to a larger population.
4. Although the table is informative as to means and standard deviations, why would boxplots for each row be useful? For each row, why would a comparison of histograms be useful?
5. According to Moskowitz, but not mentioned in the paper itself by Hertenstein, “Overall, the results indicate that people who frown in photos are five times more likely to get a divorce than people who smile.” This comes about by looking at the highest scorers--who turn out to be mostly still married--compared to the lowest scorers who turn out usually to be those who are divorced. Why is this quotation featured rather than the above table?
6. For those who would like to improve the “smile intensity” of their photos, here is what the paper itself says the measurement process is:
“two muscle action units, AU6 and AU12, were analyzed for each photo. The combination of these actions units are used to reflect positive facial expression because AU6 (orbicularis oculi) causes one’s cheeks to raise as well as bagging around the eyes while AU12 (zygomatic major) causes the corners of the mouth to move upward forming a smile. The intensity of each action unit was scored utilizing a 5-point intensity scale (ranging from a 1-minimal to 5-extreme). A smile intensity score was created by adding together the scores of Action Unit 12 and Action Unit 6 (2 meaning no smile and 10 being the highest smile intensity score available…Once all photos for an individual subject were scored, all of the that participant’s smile intensity scores were averaged to provide a total smile intensity score.”
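For questions 1 and 2, the checks can be done from summary statistics alone. The sketch below is one way to set them up. It assumes the pooled (equal-variance) form of the two-sample t-test, which may or may not be what the paper used, and the conversion r = sqrt(t^2 / (t^2 + df)) that is common in the psychology literature. The sample sizes, means and standard deviations are hypothetical placeholders, not values from Hertenstein's table; substitute a row of the table to do the exercise.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic and df, assuming equal variances (pooled)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def one_tailed_p(t, df, steps=2000):
    """Tail area beyond |t|, using Simpson's rule on the interval [0, |t|]."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3


def effect_size_r(t, df):
    """Effect size r from a t statistic: sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))


# Hypothetical summary statistics (married group first, divorced second).
t, df = pooled_t(n1=40, mean1=5.0, sd1=1.5, n2=20, mean2=4.0, sd2=1.5)
p = one_tailed_p(t, df)
r = effect_size_r(t, df)
print(f"t({df}) = {t:.2f}, one-tailed p = {p:.4f}, r = {r:.2f}")
```

The p-value is computed by numerical integration only to keep the sketch free of third-party libraries; a statistics package or a t table would serve equally well.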
Submitted by Paul Alper