Chance News 41

September 2, 2008 to November 14, 2008

Quotation

We never like to say zero in statistics.

Andrew Gelman

Forsooth

http://www.pmean.com/images/ForgottenMissingValue.jpg

This graphical forsooth was submitted by Steve Simon.

The following Forsooth was suggested by Paul Alper:

According to an article by Nicholas Kristof of the New York Times, Christopher Ruhm, an economist at the University of North Carolina, Greensboro, claims that "each one-percentage-point drop in unemployment in the United States is associated with an extra 3,900 deaths from heart attacks." More generally, "Ruhm argues that death rates go down during economic slowdowns. Professor Ruhm's research indicates that suicides rise but total mortality rates drop, as do deaths from heart attacks, car accidents, pneumonia and most other causes."

John Lamperti suggested this forsooth:

By 2023, more than half of all American children will be minority, the Census Bureau projects.

The Party of Yesterday

New York Times, October 26, 2008
Timothy Egan

The following forsooths are from the November 2008 RSS News. It is nice to see that even their News can contribute a forsooth.

The RSS has global standing. Of its 7000 members, about one in four is drawn from over 50 countries

Statistics & Society

Royal Statistical Society
2008

Blondes are said to have more fun but it seems brunettes steal the hearts of billionaires.

Brunettes such as Microsoft boss Bill Gates's wife, Melinda French, are more likely to marry a successful man than their blonde sisters, a study today said. Experts checked the hair colour of the wives and girlfriends of the world's top 100 billionaires. Most - 62 per cent - were brunettes.

Fair-haired women came in a poor second with 22 per cent of the world's top billionaires marrying blonds.

Raven-haired women entice just 16 per cent of the world's wealthiest men, while not one of the top billionaires is married to a redhead.

Sunday Metro

6 April 2008

The next two forsooths were suggested by Paul Alper, who wonders how a name (in this instance, two names) can be written in and still obtain zero votes. The Minnesota Secretary of State's website gives the recount of the Minnesota Senate race. As of Nov. 9, 2008, it was:

Party	Candidate	Totals
Independence	Dean Barkley	437385
Republican	Norm Coleman	1211556
Democratic-Farmer-Labor	Al Franken	1211335
Libertarian	Charles Aldrich	13916
Constitution	James Niemackl	8906
Write-In	Write-In**	2340
Write-In	Michael Cavlan**	1
Write-In	John H. Evan**	0
Write-In	Anthony Keith Price**	12
Write-In	Jack Shepard**	0

The counts have changed since then. As of November 13th, the Coleman lead over Franken is now 206 instead of 221, and Michael Cavlan has soared to 13 write-in votes from just 1; Evan and Shepard, however, remain rock-steady at zero write-in votes each.

Note: The FiveThirtyEight website is maintained by Nate Silver and is one of the more original electoral prediction websites. Stephanie Clifford wrote an interesting article about him here in The New York Times. Silver is an expert in baseball predictions and applies some of his baseball techniques to predicting elections. See Clifford's article for a fascinating story about him.

Note added by Laurie Snell

Is the Bradley effect real?

Do polls lie about race? Kate Zernike, The New York Times, October 12, 2008.

There has been a lot written about the "Bradley effect." This is a phenomenon first noted in the race for governor of California in 1982, where the Los Angeles mayor, Tom Bradley, polled far ahead of his competition but lost by a small margin. This phenomenon was also noted in elections involving Harold Washington, David Dinkins, and Douglas Wilder. All of these candidates were black men and in these elections the results of the polls were more favorable to the black candidates than the election results. The belief is that people who are polled don't want to appear bigoted to the pollster by opposing the black candidate, but feel no such social pressure when casting their ballots.

In recent days, nervous Obama supporters have traded worry about a survey — widely disputed by pollsters yet voraciously consumed by the politically obsessed — that concluded racial bias would cost Mr. Obama six percentage points in the final outcome.

Is that true? Perhaps there is a Bradley effect, but perhaps not.

But pollsters and political scientists say concern about a Bradley effect — some call it a Wilder effect or a Dinkins effect, and plenty call it a theory in search of data — is misplaced. It obscures what they argue is the more important point: there are plenty of ways that race complicates polling. Considered alone or in combination, these factors could produce an unforeseen Obama landslide with surprise victories in the South, a stunningly large Obama loss, or a recount-thin margin. In a year that has already turned expectations upside down, it is hard to completely reassure the fretters.

The article notes situations where there may be a reverse Bradley effect. This occurs when

polls understate support for a black candidate, particularly in regions where it is socially acceptable to express distrust of blacks.

More critical than social expectations, perhaps, is an even more fundamental issue about polling.

Research shows that those who refuse to participate in surveys tend to be less likely to vote for a black candidate.

One survey researcher, Andrew Kohut, got at this indirectly by comparing people who responded immediately with those who required some extra effort.

Mr. Kohut conducted a study in 1997 looking at differences between people who readily agreed to be polled and those who agreed only after one or more callbacks. Reluctant participants were significantly more likely to have negative attitudes toward blacks — 15 percent said they had a “very favorable” attitude toward them, as opposed to 24 percent of the ready respondents. “The kinds of people suspicious of surveys are also more intolerant,” Mr. Kohut said.

The article discusses some of the issues involving the race of the person conducting the survey interview.

A further complication is the race of the person who asks the questions. Talking to a white interviewer, blacks or whites are more likely to say that they are supporting the white candidate; talking to a black interviewer, people are more likely to support the black candidate. This holds true whether the surveys are in person, or on the phone.

It is unclear, however, which type of interviewer is more likely to produce an accurate response.

Submitted by Steve Simon

Questions

1. If there is indeed a Bradley effect, is there any statistical adjustment that could be made to produce more accurate election polling results?

2. Does the study by Andrew Kohut produce a valid conclusion about the racial attitudes of non-respondents?

Ghost Writing

Conspiracy theorists sometimes turn to statistics to prove their case. Jack Cashill writing here compares Barack Obama's book, Dreams From My Father, with Bill Ayers' book, Fugitive Days, in order to show that Ayers is the true author of Dreams From My Father.

Cashill writes, "To add a little science to the analysis, I identified two similar 'nature' passages in Obama's and Ayers' respective memoirs, the first from Fugitive Days:

'I picture the street coming alive, awakening from the fury of winter, stirred from the chilly spring night by cold glimmers of sunlight angling through the city.'

The second from Dreams:

'Night now fell in midafternoon, especially when the snowstorms rolled in, boundless prairie storms that set the sky close to the ground, the city lights reflected against the clouds.'

These two sentences are alike in more than their poetic sense, their length and their gracefully layered structure. They tabulate nearly identically on the Flesch Reading Ease Score (FRES), something of a standard in the field. The 'Fugitive Days' excerpt scores a 54 on reading ease and a 12th grade reading level. The 'Dreams' excerpt scores a 54.8 on reading ease and a 12th grade reading level. Scores can range from 0 to 121, so hitting a nearly exact score matters."

Cashill continues, "A more reliable data-driven way to prove authorship goes under the rubric 'cusum analysis' or QSUM. This analysis begins with the measurement of sentence length, a significant and telling variable. To compare the two books, I selected thirty-sentence sequences from Dreams and Fugitive Days, each of which relates the author's entry into the world of 'community organizing.' 'Fugitive Days' averaged 23.13 words a sentence. 'Dreams' averaged 23.36 words a sentence. By contrast, the memoir section of [Cashill's] 'Sucker Punch' averaged 15 words a sentence."

Further, "Interestingly, the 30-sentence sequence that I pulled from Obama's conventional political tract, Audacity of Hope, averages more than 29 words a sentence and clocks in with a 9th grade reading level, three levels below the earlier cited passages from 'Dreams' and 'Fugitive Days.' The differential in the Audacity numbers should not surprise. By the time it was published in 2006, Obama was a public figure of some wealth, one who could afford editors and ghost writers."
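The sentence-length averages Cashill reports, and the cusum idea he mentions, can be illustrated with a minimal sketch. It assumes a cusum here means the running sum of each sentence's deviation from the mean sentence length; QSUM as usually described also charts a second series (for example, counts of short words), which is omitted. The function names, the sentence-splitting rule, and the sample text are illustrative assumptions, not Cashill's procedure.

```python
import re
from itertools import accumulate

def sentence_lengths(text):
    """Return the number of words in each sentence of `text`.

    Assumption: sentences end at '.', '!' or '?', and words are runs of
    letters, digits or apostrophes. Real prose needs a better splitter.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(re.findall(r"[A-Za-z0-9']+", s)) for s in sentences]

def cusum(values):
    """Running sum of each value's deviation from the overall mean."""
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))

# Made-up sample text, for illustration only.
sample = "I came. I saw the city. I left it again at dawn."
lengths = sentence_lengths(sample)
print(lengths)                       # words per sentence
print(sum(lengths) / len(lengths))   # average words per sentence
print(cusum(lengths))                # returns to zero at the end by construction
```

Because the deviations are taken from the sample's own mean, the final cusum value is always zero; only the shape of the path in between carries any information, which is one reason the choice of passage matters so much.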

Discussion

1. Go to the indispensable Wikipedia site here to find

Flesch Reading Ease
In the Flesch Reading Ease test, higher scores indicate material that is easier to read; lower numbers mark more-difficult-to-read passages. The formula for the Flesch Reading Ease Score (FRES) test is

       206.835 - 1.015*(total words/total sentences) - 84.6*(total syllables/total words)

Here's the breakdown,

Score	Notes
90.0-100.0	easily understandable by an average 11-year old student
60-70	easily understandable by 13- to 15-year old students
0-30	best understood by college graduates

Determine the FRES score for this Chance News wiki. Do likewise for Cashill's article.

Wikipedia goes on to state:

"Reader's Digest magazine has a readability index of about 65, Time Magazine scores about 52, and the Harvard Law Review has a general readability score in the low 30s. The highest (easiest) readability score possible is 121 (every sentence consisting of only one-syllable words); theoretically there is no lower bound on the score -- this sentence, for example, taken as a reading passage unto itself, has a readability score of ~21.9. This paragraph has a readability score of ~53.93."

Verify the value of 53.93 which is very close to the 54 stated for each book. Does this lend credence to Bill Ayers having written this paragraph?

2. The same Wikipedia site has this to say about how FRES relates to grade level:

An obvious use for readability tests is in the field of education. The "Flesch-Kincaid Grade Level Formula" translates the 0-100 score to a U.S. grade level, making it easier for teachers, parents, librarians, and others to judge the readability level of various books and texts. It can also mean the number of years of education generally required to understand this text, relevant when the formula results in a number greater than 12. The grade level is calculated with the following formula:

            0.39*(total words/total sentences) + 11.8*(total syllables/total words) - 15.59

The result is a number that corresponds with a grade level. For example, a score of 8.2 would indicate that the text is expected to be understandable by an average student in 8th grade (usually aged 13-15 in the U.S.).

Determine the grade level for this Chance News wiki. Do likewise for Cashill's article. Likewise for the Wikipedia paragraph.

3. Go here for information about QSUM and how to perform the calculation. Determine the QSUM for this Chance News wiki. Do likewise for Cashill's article. For a stinging critique of QSUM, go here.

4. Speculate on how similar any two literary works would be as viewed by FRES and QSUM if the investigator had completely free rein over what segments to use.

5. According to this, Dr. Peter Millican of Hertford College, Oxford, has a computer program "that can detect when works are by the same author." He was offered $10,000 by "Robert Fox, a Californian businessman and brother-in-law of Chris Cannon, a Republican congressman from Utah" to prove, as contended by Cashill, that Ayers wrote Obama's book. Fox "believed that if 'proof' of Ayers involvement was provided by an Oxford academic it would be political dynamite." Millican "took a preliminary look and found the charges 'very implausible'." Interest waned "when Millican said the results had to be made public, even if no link to Ayers was proved." Defend and criticize the actions of Fox and Cannon.
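For the exercises in items 1 and 2, here is a minimal sketch of how the two Flesch formulas quoted above might be applied to a piece of text. The sentence splitter and the syllable counter are crude assumptions (syllables are approximated by counting vowel groups), so the scores will differ somewhat from those of any published tool; all function names are hypothetical.

```python
import re

def count_syllables(word):
    """Rough syllable count: number of vowel groups, at least one.

    Heuristic assumption, not a dictionary lookup; it miscounts words
    such as 'time' (silent e) or 'idea'.
    """
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_counts(text):
    """Return (sentences, words, syllables) under simple splitting rules."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables

def flesch_reading_ease(sentences, words, syllables):
    """Flesch Reading Ease Score, as in the formula quoted in item 1."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(sentences, words, syllables):
    """Flesch-Kincaid grade level, as in the formula quoted in item 2."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

counts = text_counts("The cat sat. The dog ran.")
print(counts)
print(round(flesch_reading_ease(*counts), 2))
print(round(flesch_kincaid_grade(*counts), 2))

# Upper bound of about 121: one-word sentences of one syllable each.
print(round(flesch_reading_ease(1, 1, 1), 2))
```

The last line reproduces the ceiling of roughly 121 mentioned in the Wikipedia passage: it requires every sentence to be a single one-syllable word, since the sentence-length term also lowers the score.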

Submitted by Paul Alper

SMOG (Simple Measure of Gobbledygook)

Strange as it may seem to the general public, even statisticians want to write properly in order to communicate to the reader. A previous wiki (Ghost Writing) mentioned several websites which calculate readability and grade level using regression analysis; as you will see, there are others. According to Wikipedia, SMOG (Simple Measure of Gobbledygook) "is a readability formula that estimates the years of education needed to completely understand a piece of writing. SMOG is widely used, particularly for checking health messages. The precise SMOG formula yields an outstandingly high 0.985 correlation with the grades of readers who had 100% comprehension of test materials. SMOG was published by G. Harry McLaughlin in 1969 as a more accurate and more easily calculated substitute for the Gunning-Fog Index."

To calculate SMOG:

1. Count a number of sentences (at least 10 from the start of a text, 10 from the middle, and 10 from the end).
2. In those sentences, count the polysyllables (words of 3 or more syllables).
3. Calculate using

grade = 1.0430*SQRT(30*(number of polysyllables/number of sentences)) + 3.1291
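As a small worked example, the sketch below evaluates the SMOG grade from two counts. It assumes the standard published form of McLaughlin's formula, in which the polysyllable count is rescaled to a 30-sentence sample before the square root is taken; the function name and the sample counts are hypothetical.

```python
import math

def smog_grade(polysyllables, sentences):
    """SMOG grade, with the polysyllable count rescaled to 30 sentences."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

# Hypothetical sample: 30 sentences containing 36 words of 3+ syllables.
print(round(smog_grade(36, 30), 2))
```

With exactly 30 sentences the rescaling factor drops out, so the grade depends only on the square root of the polysyllable count.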

For the Gunning-Fog Index, go here where you will find

The Gunning-Fog index can be calculated with the following algorithm

1. Take a full passage that is around 100 words (do not omit any sentences).
2. Find the average sentence length (divide the number of words by the number of sentences).
3. Count words with three or more syllables (complex words), not including proper nouns (for example, Djibouti), compound words, or common suffixes such as -es, -ed, or -ing as a syllable, or familiar jargon.
4. Add the average sentence length and the percentage of complex words (ex., +13.37%, not simply + 0.1337).
5. Multiply the result by 0.4.

The complete formula is as follows:

0.4*((words/sentences) + 100*(complex words/words))
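The five steps above can be sketched as a single function of three counts. Deciding which words count as complex (step 3) is the hard part and is left to the caller here; the function name and the sample counts are hypothetical.

```python
def gunning_fog(words, sentences, complex_words):
    """Gunning-Fog index from raw counts.

    The caller is assumed to have already excluded proper nouns,
    compound words and familiar jargon from `complex_words`.
    """
    average_sentence_length = words / sentences
    percent_complex = 100 * complex_words / words
    return 0.4 * (average_sentence_length + percent_complex)

# Hypothetical passage: 100 words in 5 sentences, 10 of them complex.
print(round(gunning_fog(100, 5, 10), 1))
```

Note that the complex-word share enters as a percentage (10, not 0.10), which is the point of the warning in step 4.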

This wonderful website "is an interactive web page for checking a sample of writing. It is modeled after the ancient Unix utilities style and diction." One can "enter or copy text into the first box below. The scores to the right give the readability of the text according to various formulas" including all the ones mentioned thus far. "Words of three or more syllables are underlined. You should check the words or phrases in red to see if they should be re-written according to the suggestion in the brackets."

"Click the 'Demo' button one, two, or three times to see different samples of text. Check the scores for each sample; do you think the scores match the abilities of students in those grades? The different formulas give different estimates of grade level [required to understand the text]. Which formula is the most accurate? Click the 'Submit' button to look for problems and to see the more complex words underlined."

For example, enter the entire contents of Chance News Wiki #40 to obtain the following:

Flesch reading ease score: 62.1
Automated readability index: 11.1
Flesch-Kincaid grade level: 9.1
Coleman-Liau index: 11.9
Gunning fog index: 14.1
SMOG index: 12.7

Counts: 15658 characters, 12913 non-space characters, 12291 letters/numbers, 2464 words, 425 complex words, 3681 syllables, 136 sentences.
Averages: 4.99 chars per word, 1.49 syllables per word, 18.12 words per sentence.

Oh, I forgot to mention this, which explains the Coleman-Liau Index.

To calculate the Coleman-Liau Index:

 1. Divide the number of characters by the number of words, and multiply by 5.89. Call this A.
 2. Take the number of sentences in a fragment of 100 words, and multiply by 0.3. Call this B.
 3. Subtract B from A, then subtract 15.8.

CLI = 5.89*(characters/words) - 0.3*(sentences per 100 words) - 15.8

And use this to calculate the Automated Readability Index:

 1. Divide the number of characters by the number of words, and multiply by 4.71.
 2. Divide the number of words by the number of sentences, and multiply by 0.5.
 3. Add #1 and #2 together, and subtract 21.43.

ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43
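As a check on the formulas, the sketch below recomputes the six scores from the counts reported above for Chance News 40. Two choices are assumptions rather than anything documented by the web page: "characters" is taken to be the letters/numbers count (the one that reproduces 4.99 chars per word), and the SMOG line uses McLaughlin's simplified rule (3 plus the square root of the polysyllable count per 30 sentences) instead of the precise regression formula.

```python
import math

# Counts reported above for Chance News 40.
words, sentences, syllables = 2464, 136, 3681
complex_words, letters = 425, 12291   # "letters/numbers" count (assumption)

wps = words / sentences     # words per sentence
spw = syllables / words     # syllables per word
cpw = letters / words       # characters per word

scores = {
    "Flesch reading ease": 206.835 - 1.015 * wps - 84.6 * spw,
    "Automated readability": 4.71 * cpw + 0.5 * wps - 21.43,
    "Flesch-Kincaid grade": 0.39 * wps + 11.8 * spw - 15.59,
    # Sentences per 100 words, as in step 2 of the Coleman-Liau recipe.
    "Coleman-Liau": 5.89 * cpw - 0.3 * (100 * sentences / words) - 15.8,
    "Gunning fog": 0.4 * (wps + 100 * complex_words / words),
    # Simplified SMOG; an assumption about what the page computes.
    "SMOG (simplified)": 3 + math.sqrt(30 * complex_words / sentences),
}

for name, value in scores.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the rounded values agree with the six scores listed for Chance News 40. The precise SMOG formula would give about 13.2 from the same counts, which suggests, without proving, that the page uses the simplified rule.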

Discussion

1. If possible, randomly sample material you have written and use this to see how your writing has changed over the years, thus obtaining a longitudinal view. Which index shows the most change in absolute value and/or relative value?

2. Do the same for Chance News to see how it has changed over the years.

3. Ask some teachers of English what they think of these figures of merit.

Submitted by Paul Alper

Odds: You are not the deciding vote

Odds of single vote deciding presidential election
1 in 60 million, better in swing states
Seth Borenstein, AP Science Writer
November 2, 2008

In this article we read

"If you're in New Mexico, you have a better chance of having your vote matter than winning the New York Lottery," according to Aaron Edlin, a professor of economics and law at the University of California, Berkeley.

The article includes the following graphic to show the chance our vote matters for the various states.

http://www.dartmouth.edu/~chance/forwiki/election2.jpg

Borenstein remarks:

The odds of a District of Columbia resident casting the vote that decides the election are 1 in 490 billion.

That's essentially zero, but Andrew Gelman, professor of statistics and political science at Columbia University, said: "We never like to say zero in statistics."

The article referred to is What is the Probability your vote will make a difference? by Andrew Gelman, professor of statistics and political science at Columbia University, Aaron Edlin, professor of economics and law at the University of California, Berkeley, and Nate Silver, prominent baseball statistician, who also maintains a political prediction website.

Here is the abstract for the study:

One of the motivations for voting is that one vote can make a difference. In a presidential election, the probability that your vote is decisive is equal to the probability that your state is necessary for an electoral college win, times the probability the vote in your state is tied in that event. We compute these probabilities for each state in the 2008 presidential election, using state-by-state election forecasts based on the latest polls. The states where a single vote is most likely to matter are New Mexico, Virginia, New Hampshire, and Colorado, where your vote has an approximate 1 in 10 million chance of determining the national election outcome. On average, a voter in America has a 1 in 60 million chance of being decisive in the presidential election.

The article gives a detailed discussion on how they estimate these probabilities which is too technical to include here.
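The product rule stated in the abstract can at least be illustrated without the technical details. The sketch below multiplies the two probabilities named there; the input values are made up for illustration and are NOT taken from the paper, and the function name is hypothetical.

```python
def decisive_probability(p_state_necessary, p_tie_given_necessary):
    """P(your vote decides the election), per the abstract's product rule:
    P(state is necessary for an electoral college win)
    times P(state vote is tied, given that it is necessary)."""
    return p_state_necessary * p_tie_given_necessary

# Hypothetical inputs: the state is pivotal 20% of the time, and given
# that, an exact tie among its voters has probability 1 in 2 million.
p = decisive_probability(0.2, 5e-7)
print(f"about 1 in {1 / p:,.0f}")
```

The example shows why swing states come out ahead: both factors are larger there, since a close state is more likely to be needed for an electoral college majority and more likely to be tied.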

Discussion

(1) The authors comment: An objection sometimes arises about this sort of calculation that one vote never makes a difference, because if the election were decided by one vote, there would be a recount anyway. They say this argument is wrong. Do you agree?

(2) The authors suggest that a vote is like a lottery ticket with a 1 in 10 million chance of winning, but the payoff is the chance to change national policy. What do you think of this argument for voting?

Submitted by Laurie Snell and Paul Alper.

How were Princeton Consortium predictions?

In Chance News 39 we described the Princeton Election Consortium, run by Princeton faculty member Sam Wang. For a presidential election, Wang estimates for each day the number of electoral votes each candidate would have if the election were held on that day. The prediction on the day of the election is a measure of the success of his method. On his website, on the day after the election, we read:

Electoral vote (EV): The final polling snapshot is Obama 352 EV, McCain 186 EV.

The actual outcome that day was Obama 349 EV, McCain 162 EV, so his estimates were pretty good. You can see much more about the success of their predictions by going to the consortium website.

As mentioned above, Nate Silver is an expert in baseball prediction who now has here one of the more interesting electoral prediction websites. There is also a fascinating story about Silver here in the New York Times.

Submitted by Laurie Snell

An unusual amendment in Florida

In an article in the Nov. 5 New York Times reporting on the various amendments in Tuesday's voting we read:

Among the more unusual measures on this year's ballots was one in Florida that would repeal an old clause in the state constitution that allows legislators to bar Asian immigrants from owning land. The repeal would be symbolic, as equal protection laws would prevent lawmakers from applying the ban. With 78 percent of precincts reporting just before 11 p.m. Tuesday, the vote was close, with 52 percent voting to preserve the clause.

A more complete discussion of this can be found in this Nov. 6 New York Times article. The amendment to remove the clause, which barred immigrants from owning land, confused the voters. Many thought the clause referred to barring ILLEGAL immigrants rather than to barring Asian-AMERICANS from owning land.

Residual Votes in the Minnesota Senate Race.

Residual Votes in the 2008 Minnesota Senate Race
Jonathan W. Chipman, Michael C. Herron, Jeffrey B. Lewis
November 13, 2008

Dan Rockmore suggested this paper.

Here is the authors' abstract:

The 2008 United States Senate race in Minnesota is one of the closest electoral contests in recent history: as of this writing, out of over 2.9 million ballots cast only 206 votes separate incumbent Republican Senator Norm Coleman and his Democratic challenger, Al Franken. The Minnesota Senate race is slated to be recounted starting on November 19, 2008, and a key issue in the recount will be the approximately 34 thousand residual votes associated with it. A Senate residual vote is, roughly speaking, the product of a ballot that lacks a recorded Senate vote, and in the Minnesota Senate race there is no doubt that the number of residual votes dwarfs the margin that separates Coleman from Franken. We show using a combination of precinct voting returns from the 2006 and 2008 General Elections that patterns in Senate race residual votes are consistent with, one, the presence of a large number of Democratic-leaning voters, in particular African-American voters, who appear to have deliberately skipped voting in the Coleman-Franken Senate contest and, two, the presence of a smaller number of Democratic-leaning voters who almost certainly intended to cast a vote in the Senate race but for some reason did not do so. Ultimately, the anticipated recount may clarify the relative proportions of intentional versus unintentional residual votes. At present, though, the data available suggest that the recount will uncover many of the former and that, of the latter, a majority will likely prove to be supportive of Franken.

Discussion

Read the article and make your own prediction for the winner.
