Chance News 54

From ChanceWiki
Revision as of 16:21, 27 August 2009

Quotations

Do not put your faith in what statistics say until
you have carefully considered what they do not say.

William W. Watt

Forsooth

Maynard Keynes' game and the Efficient Market Hypothesis

Jeff Norman told us about an interesting game and its relation to investment theory. The game was described in terms of professional investment by the famous British economist John Maynard Keynes in his book The General Theory of Employment, Interest and Money (1936), where he writes:

Professional investment may be likened to those newspaper competitions in which the competitors have to pick out the six prettiest faces from a hundred photographs, the prize being awarded to the competitor whose choice most nearly corresponds to the average preference of the competitors as a whole; so that each competitor has to pick, not those faces which he himself finds prettiest, but those which he thinks likeliest to catch the fancy of the other competitors, all of whom are looking at the problem from the same point of view. It is not a case of choosing those which, to the best of one’s judgment, are really prettiest, nor even those which average opinion genuinely thinks the prettiest. We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be. And there are some, I believe, who practice the fourth, fifth and higher degrees.

Keynes used this game in his argument against the efficient-market hypothesis (EMH), which Answers.com defines as:

An investment theory that states that it is impossible to "beat the market" because stock market efficiency causes existing share prices to always incorporate and reflect all relevant information. According to the EMH, this means that stocks always trade at their fair value on stock exchanges, and thus it is impossible for investors to either purchase undervalued stocks or sell stocks for inflated prices. Thus, the crux of the EMH is that it should be impossible to outperform the overall market through expert stock selection or market timing, and that the only way an investor can possibly obtain higher returns is by purchasing riskier investments.

The efficient market hypothesis is a controversial subject, discussed on many websites. One example is an article by John Mauldin, president of Millennium Wave Advisors, LLC, a registered investment advisor. There you will also find more about Keynes' game and its relation to the EMH.

We read there:

Keynes' game can easily be replicated by asking people to pick a number between 0 and 100, and telling them the winner will be the person who picks the number closest to two-thirds the average number picked. The chart below shows the results from the largest incidence of the game that I have played - in fact the third largest game ever played, and the only one played purely among professional investors.

[Chart of the responses: http://www.investorsinsight.com/cfs-file.ashx/__key/CommunityServer.Blogs.Components.WeblogFiles/thoughts_5F00_from_5F00_the_5F00_frontline/jm080709image010_5F00_2F080074.jpg]

The highest possible correct answer is 67. To go for 67 you have to believe that every other muppet in the known universe has just gone for 100. The fact we got a whole raft of responses above 67 is more than slightly alarming.

You can see spikes which represent various levels of thinking. The spike at fifty reflects what we (somewhat rudely) call level zero thinkers. They are the investment equivalent of Homer Simpson, 0, 100, duh 50! Not a vast amount of cognitive effort expended here!

There is a spike at 33 - of those who expect everyone else in the world to be Homer. There's a spike at 22, again those who obviously think everyone else is at 33. As you can see there is also a spike at zero. Here we find all the economists, game theorists and mathematicians of the world. They are the only people trained to solve these problems backwards. And indeed the only stable Nash equilibrium is zero (two-thirds of zero is still zero). However, it is only the 'correct' answer when everyone chooses zero.

The final noticeable spike is at one. These are economists who have (mistakenly...) been invited to one dinner party (economists only ever get invited to one dinner party). They have gone out into the world and realised the rest of the world doesn't think like them. So they try to estimate the scale of irrationality. However, they end up suffering the curse of knowledge (once you know the true answer, you tend to anchor to it). In this game, which is fairly typical, the average number picked was 26, giving a two-thirds average of 17. Just three people out of more than 1000 picked the number 17.

I play this game to try to illustrate just how hard it is to be just one step ahead of everyone else - to get in before everyone else, and get out before everyone else. Yet despite this fact, it seems to be that this is exactly what a large number of investors spend their time doing.
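The spikes described in the quotation follow a simple pattern that a few lines of Python can reproduce. The sketch below assumes a hypothetical "level-k" model, used here only for illustration and not taken from the article: a level-0 player picks 50, and each higher level multiplies the previous level's guess by two-thirds. The function names are made up for this example.

```python
# Level-k reasoning in the "pick a number from 0 to 100, closest to
# two-thirds of the average wins" game.
# Assumption (hypothetical model): a level-0 player picks the midpoint, 50,
# and a level-k player best-responds to a crowd of level-(k-1) players.


def level_k_guess(k: int, level0: float = 50.0, factor: float = 2 / 3) -> float:
    """Return the guess of a level-k thinker under the simple model above."""
    guess = level0
    for _ in range(k):
        guess *= factor  # best response if everyone else guesses `guess`
    return guess


def winning_target(average: float, factor: float = 2 / 3) -> float:
    """The winning number: two-thirds of the average of all guesses."""
    return factor * average


if __name__ == "__main__":
    for k in range(6):
        print(f"level {k}: {level_k_guess(k):.1f}")
    # Even if everyone picks 100, the target is only two-thirds of 100.
    print(f"highest possible target: {winning_target(100):.1f}")
    # The reported game: an average of 26 gives a target of about 17.
    print(f"target for average 26: {winning_target(26):.1f}")
```

Run as a script, it prints guesses of 50.0, 33.3, 22.2, 14.8, 9.9 and 6.6 for levels 0 to 5, matching the spikes at 50, 33 and 22, followed by a highest possible target of 66.7 and a target of 17.3 for the reported average of 26. Repeating the multiplication drives the guess toward 0, the Nash equilibrium mentioned in the quotation.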

See also "Efficient Market Hypothesis on Trial: A Survey" by Philip S. Russel and Violet M. Torbey.

Submitted by Laurie Snell

Measuring Emotion on the Web

Mining the Web for Feelings, Not Facts. Alex Wright, The New York Times, August 23, 2009.

There's a lot of data on the web, but it isn't data in the numeric sense.

"The rise of blogs and social networks has fueled a bull market in personal opinion: reviews, ratings, recommendations and other forms of online expression."

There are serious reasons to sift through this data.

"For many businesses, online opinion has turned into a kind of virtual currency that can make or break a product in the marketplace. Yet many companies struggle to make sense of the caterwaul of complaints and compliments that now swirl around their products online."

A new methodology, sentiment analysis, attempts to summarize the positive and negative emotions associated with these reviews and ratings.

"Jodange, based in Yonkers, offers a service geared toward online publishers that lets them incorporate opinion data drawn from over 450,000 sources, including mainstream news sources, blogs and Twitter. Based on research by Claire Cardie, a Cornell computer science professor, and her students, the service uses a sophisticated algorithm that not only evaluates sentiments about particular topics, but also identifies the most influential opinion holders."

"In a similar vein, The Financial Times recently introduced Newssift, an experimental program that tracks sentiments about business topics in the news, coupled with a specialized search engine that allows users to organize their queries by topic, organization, place, person and theme. Using Newssift, a search for Wal-Mart reveals that recent sentiment about the company is running positive by a ratio of slightly better than two to one. When that search is refined with the suggested term “Labor Force and Unions,” however, the ratio of positive to negative sentiments drops closer to one to one."

This work isn't easy.

"Translating the slippery stuff of human language into binary values will always be an imperfect science, however. 'Sentiments are very different from conventional facts,' said Seth Grimes, the founder of the suburban Maryland consulting firm Alta Plana, who points to the many cultural factors and linguistic nuances that make it difficult to turn a string of written text into a simple pro or con sentiment. ' "Sinful" is a good thing when applied to chocolate cake,' he said."

"The simplest algorithms work by scanning keywords to categorize a statement as positive or negative, based on a simple binary analysis ('love' is good, 'hate' is bad). But that approach fails to capture the subtleties that bring human language to life: irony, sarcasm, slang and other idiomatic expressions. Reliable sentiment analysis requires parsing many linguistic shades of gray."
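The keyword approach described in the article can be sketched in a few lines. The word lists below are hypothetical and deliberately tiny; they are not taken from Jodange, Newssift or any other real product, whose methods are far more sophisticated. The helper that computes a positive-to-negative ratio is likewise only an illustration of the kind of figure Newssift reports; the article does not say how Newssift computes it. The example also shows the failures the article warns about: "sinful" is scored as negative even when applied to chocolate cake, and sarcasm is scored at face value.

```python
import re

# Hypothetical, deliberately tiny word lists; not from any real product.
POSITIVE = {"love", "great", "excellent", "good"}
NEGATIVE = {"hate", "awful", "bad", "sinful"}


def keyword_sentiment(text: str) -> str:
    """Label text 'positive', 'negative' or 'neutral' by counting keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def positive_negative_ratio(texts: list[str]) -> float:
    """Ratio of positive to negative labels (hypothetical helper)."""
    labels = [keyword_sentiment(t) for t in texts]
    negatives = labels.count("negative")
    return labels.count("positive") / negatives if negatives else float("inf")


if __name__ == "__main__":
    print(keyword_sentiment("I love this store"))              # positive
    print(keyword_sentiment("This chocolate cake is sinful"))  # negative: nuance lost
    print(keyword_sentiment("Oh great, another delay"))        # positive: sarcasm lost
```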

Submitted by Steve Simon

Questions

1. No algorithm is going to be perfect, but some may provide sufficient accuracy to be useful. How would you measure the accuracy of a sentiment algorithm?
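One possible starting point, sketched below under the assumption that a sample of statements has also been labeled by human readers, is to compare the algorithm's labels with the human labels. The sketch reports the proportion of agreement (accuracy), a confusion matrix showing which kinds of errors occur, and Cohen's kappa, which adjusts the agreement for what chance alone would produce. The labels in the example are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter


def agreement_stats(human: list[str], machine: list[str]) -> dict:
    """Compare algorithm labels with human labels for the same statements."""
    if len(human) != len(machine) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Confusion matrix: counts of (human label, machine label) pairs.
    confusion = Counter(zip(human, machine))
    observed = sum(h == m for h, m in zip(human, machine)) / n
    # Agreement expected by chance if the two labelers were independent.
    h_freq, m_freq = Counter(human), Counter(machine)
    expected = sum(h_freq[c] * m_freq[c] for c in h_freq) / n**2
    # Kappa is undefined when chance agreement is already 1 (one shared label).
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return {"accuracy": observed, "kappa": kappa, "confusion": dict(confusion)}


if __name__ == "__main__":
    human = ["pos", "pos", "neg", "neg", "pos", "neg"]
    machine = ["pos", "neg", "neg", "neg", "pos", "pos"]
    stats = agreement_stats(human, machine)
    print(f"accuracy = {stats['accuracy']:.2f}, kappa = {stats['kappa']:.2f}")
```

With these made-up labels it prints accuracy = 0.67 and kappa = 0.33. Kappa matters because raw accuracy can flatter a useless algorithm: if 80% of statements are positive, labeling everything positive scores 80% accuracy but a kappa of 0.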