Chance News 88

From ChanceWiki

Quotations

Forsooth

“Odds of becoming a top ranked NASCAR driver: 1 in 125 billion.”

from an advertisement by Autism Speaks in Sports Illustrated

(There are only about 7 billion people in the world, so if there are only two “top ranked drivers” then the odds are only 1 in 3.5 billion or so.)
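The arithmetic in the parenthetical is easy to check. This is a minimal sketch: the figure of two "top ranked drivers" is the comment's own hypothetical, and `one_in` is an illustrative helper name, not anything from the advertisement.

```python
# Rough check of the Forsooth arithmetic, using the round figures quoted above.
WORLD_POPULATION = 7_000_000_000   # "about 7 billion people"
TOP_DRIVERS = 2                    # hypothetical count of "top ranked drivers"
AD_ODDS = 125_000_000_000          # the advertisement's "1 in 125 billion"

def one_in(population: int, winners: int) -> float:
    """Return N such that a randomly chosen person has odds of 1 in N of being a winner."""
    return population / winners

best_case = one_in(WORLD_POPULATION, TOP_DRIVERS)
print(f"1 in {best_case:,.0f}")                         # 1 in 3,500,000,000
print(f"Ad overstates by {AD_ODDS / best_case:.1f}x")   # Ad overstates by 35.7x
```

Even with a single top driver the odds could be no longer than 1 in 7 billion, so the advertised figure cannot describe a randomly chosen person.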

Submitted by Marc Hurwitz

Impact and retract

As unlikely as it may seem, many thousands (!) of health/medical journals are published each month. Obviously, some carry more clout than others when it comes to the promotion and reputation of contributing authors. Those journals are said to have high “impact factors” (IF). According to Wikipedia, the de facto and default measure, the IF, “was devised by Eugene Garfield, the founder of the Institute for Scientific Information (ISI), now part of Thomson Reuters. Impact factors are calculated yearly for those journals that are indexed in Thomson Reuters Journal Citation Reports.”

The calculation of IF is a bit involved:

In a given year, the impact factor of a journal is the average number of citations received per paper published in that journal during the two preceding years. For example, if a journal has an impact factor of 3 in 2008, then its papers published in 2006 and 2007 received 3 citations each on average in 2008. The 2008 impact factor of a journal would be calculated as follows:

A = the number of times articles published in 2006 and 2007 were cited by indexed journals during 2008.
B = the total number of "citable items" published by that journal in 2006 and 2007. ("Citable items" are usually articles, reviews, proceedings, or notes; not editorials or Letters-to-the-Editor.)
2008 impact factor = A/B.

(Note that 2008 impact factors are actually published in 2009; they cannot be calculated until all of the 2008 publications have been processed by the indexing agency.)
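The A/B definition translates directly into code. This is a minimal sketch with invented journal counts, chosen so that the result matches the "impact factor of 3" example above; `impact_factor` is a hypothetical helper, not a Thomson Reuters tool.

```python
def impact_factor(citations_to_prior_two_years: int, citable_items_prior_two_years: int) -> float:
    """Impact factor for year Y = A / B, where
    A = citations received in year Y by items published in Y-1 and Y-2,
    B = citable items published in Y-1 and Y-2."""
    if citable_items_prior_two_years <= 0:
        raise ValueError("need at least one citable item")
    return citations_to_prior_two_years / citable_items_prior_two_years

# Hypothetical journal: 120 citable items in 2006 and 130 in 2007,
# cited 750 times in total by indexed journals during 2008.
A = 750
B = 120 + 130
print(impact_factor(A, B))  # 3.0
```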

Of course, whenever there is an “A over B,” some journals may be tempted to inflate A and/or shrink B to obtain a higher IF.

A journal can adopt editorial policies that increase its impact factor. For example, journals may publish a larger percentage of review articles, which generally are cited more than research reports. Therefore review articles can raise the impact factor of the journal, and review journals will therefore often have the highest impact factors in their respective fields. Journals may also attempt to limit the number of "citable items", i.e., the denominator of the IF equation, either by declining to publish articles (such as case reports in medical journals) which are unlikely to be cited or by altering articles (by not allowing an abstract or bibliography) in hopes that Thomson Scientific will not deem them "citable items". (As a result of negotiations over whether items are "citable", impact factor variations of more than 300% have been observed.)
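To see why the denominator matters, here is a toy calculation with invented numbers. It assumes, for illustration, that citations to items reclassified as "non-citable" still count toward A while the items themselves drop out of B; the helper name is hypothetical.

```python
def impact_factor(a: int, b: int) -> float:
    """IF = A / B (citations in year Y to items from Y-1, Y-2, over citable items from Y-1, Y-2)."""
    return a / b

A = 750             # hypothetical citations received in 2008 by 2006-2007 items
B_before = 250      # hypothetical citable items published in 2006-2007
reclassified = 100  # hypothetical items re-labelled "non-citable" (e.g. no abstract)

before = impact_factor(A, B_before)
# Assumption: A is unchanged, only B shrinks.
after = impact_factor(A, B_before - reclassified)
print(before, after)                           # 3.0 5.0
print(f"{(after / before - 1):.0%} increase")  # 67% increase
```

Nothing about the journal's research changed between the two lines; only the bookkeeping did.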

Then there is “coercive citation,” in which an editor forces an author to add spurious self-citations to an article before the journal will agree to publish it, in order to inflate the journal's impact factor.

According to Björn Brembs, the pressure on researchers to publish in high-IF journals is extremely high:

As a scientist today, it is very difficult to find employment if you cannot sport publications in high-ranking journals. In the increasing competition for the coveted spots, it is starting to be difficult to find employment with only few papers in high-ranking journals: a consistent record of ‘high-impact’ publications is required if you want science to be able to put food on your table. Subjective impressions appear to support this intuitive notion: isn’t a lot of great research published in Science and Nature while we so often find horrible work published in little-known journals? Isn’t it a good thing that in times of shrinking budgets we only allow the very best scientists to continue spending taxpayer funds?

Ah, but Brembs then points out that, however plausible the above argument for the superiority of high-IF journals may be, the data do not support it. He cites an article by Fang and Casadevall, from which he obtains this stunning regression graph:

[Figure: Brembs.png, Brembs's regression graph of retraction index against journal impact factor]

The retraction index is the number of retractions in the journal from 2001 to 2010, multiplied by 1000 and divided by the number of published articles with abstracts. The p-value for the slope is exceedingly small, and the coefficient of determination is 0.77. Thus, “at least with the current data, IF indeed seems to be a more reliable predictor of retractions than of actual citations.” He reasons that

If your livelihood depends on this Science/Nature paper, doesn’t the pressure increase to maybe forget this one crucial control experiment, or leave out some data points that don’t quite make the story look so nice? After all, you know your results are solid, it’s only cosmetics which are required to make it a top-notch publication! Of course, in science there never is certainty, so such behavior will decrease the reliability of the scientific reports being published. And indeed, together with the decrease in tenured positions, the number of retractions has increased at about 400-fold the rate of publication increase.
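The retraction index, and the coefficient of determination for a straight-line fit, can be sketched as follows. The journal counts and the (impact factor, retraction index) pairs below are invented for illustration only; they are not Fang and Casadevall's data, and the function names are hypothetical.

```python
def retraction_index(retractions: int, articles_with_abstracts: int) -> float:
    """Retractions (2001-2010) times 1000, divided by articles with abstracts."""
    return retractions * 1000 / articles_with_abstracts

def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination for an ordinary least-squares line y = a + b*x.
    For simple linear regression with an intercept, R^2 equals the squared correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Hypothetical journal: 12 retractions among 24,000 articles with abstracts.
print(retraction_index(12, 24_000))  # 0.5

# Invented pairs, purely to exercise the function.
impact_factors = [2.0, 4.0, 6.0, 10.0, 30.0]
retraction_indices = [0.0, 0.1, 0.15, 0.2, 0.6]
print(round(r_squared(impact_factors, retraction_indices), 2))
```

Note that R² is symmetric in x and y, so it cannot by itself say which variable is the "cause."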

Discussion

1. Obtain a (very) good dictionary to see how the grammatical uses of the word “impact” have changed over the centuries, with a shift taking place somewhere in the post-World-War-II world. Ask an elderly person for his or her view of “impact” as a verb, let alone as an adjective. Do the same for the word “contact,” which underwent a grammatical shift in the 1920s.

2. The Fang and Casadevall paper had the graph presented this way:

[Figure: FangCasadevallFig1.png, Figure 1 of the Fang and Casadevall paper]

Why is Brembs’ version more suggestive of a cause (IF) and effect (retraction index) relationship?

3. Give a plausibility argument for why many low-IF journals might have a retraction index of virtually zero.

4. For an exceedingly interesting interview with Fang and Casadevall, see Carl Zimmer’s NYT article:

Several factors are at play here, scientists say. One may be that because journals are now online, bad papers are simply reaching a wider audience, making it more likely that errors will be spotted. “You can sit at your laptop and pull a lot of different papers together,” Dr. Fang said.

But other forces are more pernicious. To survive professionally, scientists feel the need to publish as many papers as possible, and to get them into high-profile journals. And sometimes they cut corners or even commit misconduct to get there.

Each year, every laboratory produces a new crop of Ph.D.’s, who must compete for a small number of jobs, and the competition is getting fiercer. In 1973, more than half of biologists had a tenure-track job within six years of getting a Ph.D. By 2006 the figure was down to 15 percent.

The article is packed with intriguing discussion points about funding and ends with Fang’s pessimistic/realistic lament:

“When our generation goes away, where is the new generation going to be?” he asked. “All the scientists I know are so anxious about their funding that they don’t make inspiring role models. I heard it from my own kids, who went into art and music respectively. They said, ‘You know, we see you, and you don’t look very happy.’ ”

Submitted by Paul Alper

Where are the 47%?

The geography of the 47%
by Richard Florida, TheAtlanticCities.com, 19 September 2012

The article included the scatterplot shown below. Each point represents a state. The full version from the Atlantic (available here) is interactive: you can click on points to identify the state.

[Figure: Nonpayers.png, scatterplot with one point per state]


The tax data are available from TaxFoundation.org. Note that non-payers are defined here as those who filed tax returns indicating no liability. As explained in the article, there are other non-payers who are not required to file at all (which is why no point on the plot is at 47% or more!).

Suggested by Margaret Cibes

The sexiest job

Data scientist: The sexiest job of the 21st century
by Thomas H. Davenport and D.J. Patil, Harvard Business Review, October 2012

According to the article, the job title “data scientist” was "coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook."

Thanks to Nick Horton, who sent this link to the Isolated Statisticians list.

Skewed polling?

The skewed polls issue and why it is important
by Dean Chambers, Examiner.com, 25 September 2012

Chambers has a website, UnskewedPolls, where he reanalyzes polls published by other organizations to adjust for what he sees as inherent bias. A number of recent polls have shown President Obama with a lead in key swing states. Chambers challenges these results on the grounds that respondents who self-identify as Democrats make up too large a proportion of the sample. By reweighting the results to reflect what he asserts are the true party proportions among all voters, Chambers finds that most polling data actually indicate that Romney is leading. Here is one example from the article:

The Gallup tracking poll, which has been over-sampled Democrats in the past, has released its latest numbers today showing President Obama leading 48 percent to 45 percent for Mitt Romney. But the non-skewed uses a sample weighted by the expected partisan makeup of the electorate, the QStarNews Daily Tracking poll [Chambers's organization], shows Romney leading over Obama by a 53 percent to 45 percent margin.
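Reweighting by party works by recombining within-party support using a different party mix. The sketch below uses entirely hypothetical numbers (none come from Gallup or QStarNews) and a hypothetical `topline` helper; it only shows how the same within-group answers can yield different headline figures.

```python
# Hypothetical numbers throughout.
# Share of each self-identified party group supporting the candidate.
SUPPORT = {"Democrat": 0.90, "Republican": 0.05, "Independent": 0.40}

# Party mix as measured in the (hypothetical) sample ...
SAMPLE_SHARES = {"Democrat": 0.38, "Republican": 0.30, "Independent": 0.32}
# ... and an assumed "true" electorate mix imposed by the re-weighter.
TARGET_SHARES = {"Democrat": 0.33, "Republican": 0.36, "Independent": 0.31}

def topline(shares: dict[str, float], support: dict[str, float]) -> float:
    """Overall support = sum over groups of (group share x support within group)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    return sum(shares[g] * support[g] for g in shares)

print(f"as sampled:  {topline(SAMPLE_SHARES, SUPPORT):.3f}")
print(f"re-weighted: {topline(TARGET_SHARES, SUPPORT):.3f}")
```

Within-group support is identical in both lines; only the assumed party mix differs. If the sampled mix itself reflects shifting preferences, forcing it to a fixed target removes real movement rather than correcting a bias.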

Gallup's editor-in-chief, Frank Newport, responds to this issue in a recent post, The recurring -- and misleading -- focus on party Identification (27 September 2012). He explains that Gallup measures party identification as part of its surveys, asking, “In politics, as of today, do you consider yourself a Republican, a Democrat, or an independent?" Party affiliation is therefore not a fixed percentage of the electorate but is itself dynamic. In other words, what Chambers interprets as an over-sampling of Democrats may instead reflect increasing support for the Democratic candidate.

The birthday problem

It’s my birthday too, yeah
by Steven Strogatz, New York Times, 1 October 2012

We are happy to report that Steven Strogatz has returned to the Times with a new Opinionator series entitled Me, Myself and Math (his earlier series, The Elements of Math, appeared in 2010).

For the present piece, he has unearthed some wonderful archival video of a Tonight Show episode from 1980, in which Johnny Carson and Ed McMahon attempt to validate the famous birthday problem using the studio audience. Alas, Ed inadvertently causes Johnny to confuse it with the "birthmate problem" (how many people do you need to match a particular birthday?). They wind up asking for the birthday of an audience member from the front row, and are then puzzled when no one else shares that birthday. But do watch the video: a verbal description can't do justice to Johnny's inimitable delivery!

The surprising revelation here, as Steven describes, is that it was Carson himself who brought up the birthday problem. Various retellings of the story over the years inserted a guest mathematician/statistician whose attempt to explain the problem was derailed by the host.
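The gap between the two problems explains the puzzled studio. A minimal sketch, assuming 365 equally likely birthdays and ignoring February 29 (function names are illustrative): it finds the smallest group for which each probability reaches one half.

```python
from math import prod
from typing import Callable

DAYS = 365  # assumption: ignore Feb 29, all birthdays equally likely

def p_shared_birthday(n: int) -> float:
    """Birthday problem: P(at least two of n people share some birthday)."""
    if n > DAYS:
        return 1.0
    return 1 - prod((DAYS - k) / DAYS for k in range(n))

def p_birthmate(n: int) -> float:
    """Birthmate problem: P(at least one of n other people has one fixed birthday)."""
    return 1 - ((DAYS - 1) / DAYS) ** n

def smallest_n(p: Callable[[int], float], target: float = 0.5) -> int:
    """Smallest n with p(n) >= target."""
    n = 1
    while p(n) < target:
        n += 1
    return n

print(smallest_n(p_shared_birthday))  # 23
print(smallest_n(p_birthmate))        # 253
```

A match to the front-row guest's birthday is thus far less likely in a studio audience than a match among any pair of audience members.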

Submitted by Bill Peterson