Archive for the ‘Stories by Numbers’ Category

It has been inspiring to watch Hans Rosling give impressive talks about numbers and statistics. If you haven’t seen any of his great presentations, here is one example:

Chances are that you haven’t seen his wild side before. I just saw this article, “Hans Rosling: the man who makes statistics sing”, in which he is referred to as “the ‘Jedi master’ of data”, and not just because of his magical power with data. The fact is that the professor’s main hobby is sword swallowing.

What? Sword swallowing? Yes! There is a video on YouTube showing him doing so (at around 8 min 30 sec).

Wow! This is eye-opening.


With the 2013-2014 NFL preseason games underway, the business of expert game predictions is about to start again. I cannot wait…

Here are the ESPN expert picks for week 1.

CBSsports has also joined the expert pick business this year with its own collection of experts:

Fans must be eager to know who the best expert in this NFL prediction game is, and there are already questions posted in the comment section of ESPN Expert picks.

Based on the records I collected from ESPN over the last two years, we clearly have a winner: Seth Wickersham, who correctly predicted 69.9% and 65.2% of games, the best among ESPN experts, in the last two NFL seasons (most recent first). Here are the overall prediction accuracy records of each expert in the last two seasons, with more details here.

Picks  Allen     Golic      Hoge        Jaworski  Mortensen
2013   60.2%     63.3%      66.8%       65.6%     69.5%
2012   65.0%     62.9%      63.5%       64.6%     60.5%

Picks  Schefter  Schlereth  Wickersham  Jackson   Johnson
2013   62.5%     64.5%      69.9%       62.9%     60.2%
2012   61.7%     65.2%      65.2%       N/A       N/A

Picks  Ditka     Carter     Accuscore   Pick’em
2013   64.8%     66.0%      64.1%       65.9%
2012   N/A       N/A        68.0%       68.0%

Adam Schefter has the lowest two-season average (62.1%) among the experts who made picks in both seasons, and Keyshawn Johnson had the lowest accuracy last season (60.2%, level with Allen in the table above).

Chris Mortensen’s results are the most curious ones, making him the winner of the most improved expert award in 2013. He did really well in the 2013 season, but his predictions were the worst of all in 2012 (large variability?). Let’s see if he keeps it up this year 🙂

Some additional background information: Accuscore is based on simulations (algorithms and data) by accuscore.com and Pick’em is the average of all predictions by NFL fans who submitted their picks on ESPN.com before the game (kind of a “crowd prediction” by non-experts).

Unlike in the last two years, the ESPN expert pick page no longer includes the Accuscore prediction this year. I wish ESPN still included this algorithm-based (statistical) prediction in the game.

We also had fun comparing expert picks, algorithmic prediction and crowd prediction for the 2011-2012 season.

For this year, more experts, more fun! Now, let the games begin! Are you ready for football (and those experts)?


The 2012-2013 NFL regular season games are in the books now. Following the fun of comparing expert picks, algorithmic prediction and crowd prediction for the last (2011-2012) season, let’s check how well they predicted this time. Some background information: Accuscore is based on simulations (algorithms and data) by accuscore.com, and Pick’em is the average of all predictions by NFL fans who submitted their picks on ESPN.com before the game (kind of a “crowd prediction” by non-experts).

The first noticeable difference is that ESPN added four experts to its pool, going from 8 to 12 by adding Jackson, Johnson, Ditka and Carter (to get a better crowd of experts?).


For the 2012-2013 season, the twelve experts’ prediction accuracies range from 60.2% to 69.9% with a median around 64.6%, roughly the same as the median accuracy of 64.1% for the eight experts in the 2011-2012 season.

Picks  Allen     Golic      Hoge        Jaworski  Mortensen
2013   60.2%     63.3%      66.8%       65.6%     69.5%
2012   65.0%     62.9%      63.5%       64.6%     60.5%

Picks  Schefter  Schlereth  Wickersham  Jackson   Johnson
2013   62.5%     64.5%      69.9%       62.9%     60.2%
2012   61.7%     65.2%      65.2%       N/A       N/A

Picks  Ditka     Carter     Accuscore   Pick’em
2013   64.8%     66.0%      64.1%       65.9%
2012   N/A       N/A        68.0%       68.0%

Pick’em tied Accuscore at 68% accuracy, better than all experts, in 2011-2012, but both clocked in much lower for the 2012-2013 season. Pick’em achieved 65.9%, slightly beating 8 out of 12 experts, while Accuscore (64.1%) was worse than 7 experts. Now what do we say about crowd prediction and algorithm prediction?
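For anyone who wants to double-check the medians and rankings, here is a small sketch in Python. The numbers are copied from the table above; the variable names are mine, and I am assuming the row labelled "2013" is the 2012-2013 season and the row labelled "2012" is the 2011-2012 season.

```python
from statistics import median

# Accuracies (%) copied from the table above. Assumption: the "2013" row is
# the 2012-2013 season and the "2012" row is the 2011-2012 season.
experts_2013 = {
    "Allen": 60.2, "Golic": 63.3, "Hoge": 66.8, "Jaworski": 65.6,
    "Mortensen": 69.5, "Schefter": 62.5, "Schlereth": 64.5, "Wickersham": 69.9,
    "Jackson": 62.9, "Johnson": 60.2, "Ditka": 64.8, "Carter": 66.0,
}
experts_2012 = {
    "Allen": 65.0, "Golic": 62.9, "Hoge": 63.5, "Jaworski": 64.6,
    "Mortensen": 60.5, "Schefter": 61.7, "Schlereth": 65.2, "Wickersham": 65.2,
}
ACCUSCORE_2013 = 64.1
PICKEM_2013 = 65.9

# Median accuracy of the expert pool in each season
median_2013 = median(experts_2013.values())
median_2012 = median(experts_2012.values())

# How many experts finished below Pick'em, and how many above Accuscore?
experts_below_pickem = sum(acc < PICKEM_2013 for acc in experts_2013.values())
experts_above_accuscore = sum(acc > ACCUSCORE_2013 for acc in experts_2013.values())

print(f"median 2012-2013: {median_2013:.2f}%")
print(f"median 2011-2012: {median_2012:.2f}%")
print(f"experts below Pick'em: {experts_below_pickem} of {len(experts_2013)}")
print(f"experts above Accuscore: {experts_above_accuscore} of {len(experts_2013)}")
```

Since the table only gives percentages rounded to one decimal, the medians can differ from the quoted ones in the last digit.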

By the way, it seems Wickersham is the best expert for prediction and did his homework. Way to go!

For statisticians: do these percentages differ significantly?
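One rough way to approach that question is a two-proportion z-test. The sketch below is only an illustration under loud assumptions: I assume every expert picked all 256 regular season games (the tables only give percentages, and some experts may have skipped weeks), and I rebuild the counts from the rounded percentages. The function name is my own. The test also treats the two sets of picks as independent, which they are not, since everyone picks the same games; a paired test on game-by-game picks would be the more careful choice.

```python
import math

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

GAMES = 256  # assumption: each expert picked every regular season game

# Counts rebuilt from rounded percentages (assumption, not ESPN's raw data)
best = round(0.699 * GAMES)     # Wickersham, 69.9%
middle = round(0.645 * GAMES)   # Schlereth, 64.5%
worst = round(0.602 * GAMES)    # Allen / Johnson, 60.2%

z_best_worst, p_best_worst = two_proportion_z_test(best, GAMES, worst, GAMES)
z_best_middle, p_best_middle = two_proportion_z_test(best, GAMES, middle, GAMES)

print(f"best vs worst:  z = {z_best_worst:.2f}, p = {p_best_worst:.3f}")
print(f"best vs middle: z = {z_best_middle:.2f}, p = {p_best_middle:.3f}")
```

Under these assumptions, only the best-versus-worst gap falls below the usual 5% level, and even that comparison is cherry-picked from twelve experts, so it should be read with a multiple-comparison grain of salt.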


Between finishing a satellite data workshop in the afternoon and catching my red-eye flight on the new Dreamliner, I took a one-hour break and watched the documentary “Chasing Ice” in sunny Pasadena, CA.

This is a stunning film, a must see! Check whether it is showing in your city and go watch it if you can.

Without giving too much of the film away, let me say that it follows photographer James Balog and his colleagues as they share their journey with the Extreme Ice Survey project. With astonishing photos and videos, the project conveys what’s happening on Earth far better than any data or analysis can.

Some of the pictures and ideas are shown in James’s TED talk

and check out the trailer of the film

or James Balog’s photo gallery



Surprisingly (or unsurprisingly), I spotted a familiar face in the film: Jason Box! Of course, he led them to Greenland 🙂

For Columbus viewers, the film is currently showing at the Gateway Film Center. On December 13th, there is a special event:

Join us on Thursday, December 13 at 7:00 PM for a special screening of CHASING ICE followed immediately by a panel discussion and question and answer session led by members of the Byrd Polar Research Center. Panelists include:

Dr. Ellen Mosley-Thompson, Distinguished University Professor, Senior Research Scientist and Director of Byrd Polar Research Center

Dr. Lonnie Thompson, Distinguished University Professor of Earth Sciences and Senior Research Scientist, Byrd Polar Research Center

Santiago de la Peña, Byrd Post-Doctoral Research Fellow, Byrd Polar Research Center

Jeff La Frenierre, Ph.D. Candidate, Department of Geography and Byrd Polar Research Center


Are you in a relaxed mood with spare time to kill over the long Thanksgiving weekend? I have a small collection of videos about statistics I’d like to share. First, let me give my special thanks to the speakers who delivered those great stat talks in the past few years.

If you enjoy these illustrations of the power of data and proper analysis, spread the word!

Please, please … Let me know if you run into other nice related videos I have missed (I’m 99.99% sure that I did). I’d love to add them to this collection.

  • Arthur Benjamin: Teach statistics before calculus!
  • Hans Rosling: The Joy of Stats
  • Sebastian Wernicke: Lies, damned lies and statistics (about TEDTalks)
  • Hans Rosling: Stats that reshape your worldview
  • Hans Rosling: Let my dataset change your mindset


On the way to school late last night, I happened to tune in to Urban Meyer’s call-in show for the first time. I was pleasantly surprised by a question from the first caller, Adam, and was even more impressed by Urban’s answer.

Adam asked about the origin and evolution of the coach’s football philosophy and “To what extents, statistics, football statistics, play a role in your football philosophy?” Here is Urban’s answer:

Adam, that’s a. Congratulations. I appreciate your phone call. That’s well thought out and researched question. It is the essence of everything we do. You say what role statistics play in the management of a football game, a football program. In my world, it is everything.

And you will hear people say statistics are for losers. Usually losers are the one who’s saying that (made my day :)).

Statistics are very important and we. There are times, there is one thing that is not statistically analyzed. It is momentum. Momentum, to me, when you deal with young people. They are maybe inexperienced teams at some positions, it is even greater. So the higher level, think about this, the higher level football you get, momentum is not quite as much as a factor. […]

So we do play a game of statistics where we try to manage the game, try to force the team to drive the length of the field, take care of the football. Once you cross the fifty, that’s where you get more aggressive in play calling. However, there is times that we run a faked punt against Nebraska. That was not a statistically well-thoughtout play. However, we were in the quick sand and we were heading into the bad way against a good offense. So you need that momentum shift for your team. [……]

It was an interesting description of what statistics can and cannot do. Statistics and Momentum.

It reminded me of a post, Winning with numbers, that I wrote about Russ Rose around this time last year. Russ is the head coach of Penn State women’s volleyball. By coaching with numbers, he has become the coach with the highest winning percentage among all NCAA sports. Interestingly, he also holds a master’s degree from Nebraska, where he wrote his thesis on volleyball statistics.

By the way, Ohio State is playing at Penn State this weekend. Go Buckeyes!

Now, what I really wonder is whether momentum has anything to do with numbers that statisticians can measure and study as well 🙂 A quick search on the internet turned up this: (more…)


I heard two stories about unemployment rates on NPR today.

The first report is about the hottest topic of the day: “One Jobs Report, Two Different Political Spins”, which reports that the US unemployment rate fell to 7.8 percent in September. As the title suggests, one number, two interpretations:

The Obama administration got good news Friday: Jobs are indeed growing.

But, as Republicans noted, the pace remains well below the level needed to provide paychecks for the 12.1 million people seeking them.


Even when staring at the same number, people feel very differently about it.

Meanwhile, another story covered the unemployment rate in China: “No One Trusts China’s Unemployment Rate”. The official unemployment rate stands at 6.5%, but the report says no one believes it. 🙂

“The unemployment rate in China is one of the most useless and ridiculous statistics out there,” says macroeconomic researcher Arthur Kroeber of Dragonomics. “No one pays any attention to it, because everyone knows it’s a complete fiction.”

It’s not like China isn’t trying. It has a national statistics office that works very hard. But the country is so big, and changing so quickly, that it is actually really hard to keep track of what is going on.

I’m not an economist, but I do see reasons why the reported rate in China might be off target.

At the end of the day, statistics is still labeled as something beyond the damned lies. The value of those numbers really depends on how we get them, how we use them, and for what purposes.

Let the spin start ……


My memory is bad enough that remembering all the PINs for my different cards, accounts, and doors is close to impossible. Yet PINs are exactly the numbers you don’t want to share with others. Here is an interesting analysis.

Robert Siegel at NPR talked to Nick Berry, president of the data mining consulting company Data Genetics, about his findings, including the least popular PIN and why the password “1234” isn’t such a great idea. Nick analyzed a database of 3.4 million exposed passwords, mostly for websites, condensed from released/exposed/discovered password tables and security breaches.

The first big finding:

The most popular password is  1234 …… it’s staggering how popular this password appears to be. Utterly staggering at the lack of imagination …… nearly 11% of the 3.4 million passwords are  1234  !!!

Following 1234, here come 1111, 0000, 1212, 7777, ……, the most frequently used PINs. The complete list can be found in Nick’s blog post PIN analysis, which is a very nice statistical analysis worth reading.

The high frequency list comes with puzzles as well. For example, the number 2580 is in position #22. What is the significance of these digits? Nick figured out that it is due to the layout of a telephone keypad: the four digits run straight down the middle column!

The least used PIN in the database, 8068, has only 25 occurrences in 3.4 million (0.000744%). Be careful about using it, since it may not be the least used any longer 🙂
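To put these shares in perspective, here is a quick back-of-the-envelope sketch. It assumes the analysis concerns four-digit PINs (10,000 possibilities) and uses the rounded total of 3.4 million from the post, so the result for 8068 lands close to, but not exactly on, the 0.000744% quoted above, which was presumably computed from the unrounded total.

```python
TOTAL_PASSWORDS = 3_400_000   # rounded database size quoted in the post
POSSIBLE_PINS = 10_000        # assumption: four-digit PINs, 0000 to 9999

uniform_share = 1 / POSSIBLE_PINS        # share if every PIN were equally popular
share_1234 = 0.11                        # "nearly 11%" according to the post
share_8068 = 25 / TOTAL_PASSWORDS        # 25 occurrences of the least used PIN

# Count each PIN would have if choices were uniform
expected_count_uniform = TOTAL_PASSWORDS * uniform_share

print(f"uniform share per PIN: {uniform_share:.4%}")
print(f"expected count per PIN if uniform: {expected_count_uniform:.0f}")
print(f"share of 8068: {share_8068:.6%}")
print(f"1234 is over-represented by a factor of about {share_1234 / uniform_share:.0f}")
print(f"8068 is under-represented by a factor of about {uniform_share / share_8068:.1f}")
```

In other words, if people chose PINs uniformly at random, each one would show up a few hundred times in a database of this size, rather than hundreds of thousands of times (1234) or a couple of dozen times (8068).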

See more interesting analysis at Nick’s post, from which (and xkcd) I stole another nice cartoon.


After investigating the results of expert picks and algorithmic prediction for the 2011-2012 NFL season in earlier posts, we may have been convinced that a systematic study of relevant data can lead to a better “expert”. Machine beats human by a wide margin (68% vs. 65% accuracy)?

Now if you move your eyes to the last column of the prediction image, you will notice the word “Pick’em”. It is the average of all predictions by NFL fans who submitted their picks on ESPN.com before the game: a kind of “crowd prediction” by non-experts.

Like Accuscore, Pick’em prediction scored 10-6 in the first week of 2011-2012 NFL season, no more, no less.


However, at the end of the 17 weeks of regular season games, Pick’em tied with Accuscore at 174 right picks (68% accuracy). Isn’t it amazing? The next table contains the performance of the best four experts, Accuscore and Pick’em.

          Allen  Jaworski  Schlereth  Wickersham  Accuscore  Pick’em
Correct   165    155       167        167         174        174
Wrong     89     85        89         89          82         82
Accuracy  .65    .65       .65        .65         .68        .68

Aha!  Wisdom of the crowd kicks in nicely. A classic example of this phenomenon is often mentioned as:

At a 1906 country fair in Plymouth, eight hundred people participated in a contest to estimate the weight of a slaughtered and dressed ox. Statistician Francis Galton observed that the mean of all eight hundred guesses, at 1197 pounds, was closer than any of the individual guesses to the true weight of 1198 pounds.

This year, Lior Zoref actually brought a live ox on stage at TED.

How well did the mind of that crowd do? There were 500 estimates, and the results were:

-The lowest guess was 308 lbs.

-The highest was more than 8000 pounds.

-The average was 1792 pounds.

And the real weight? The ox weighs 1795 pounds. Three pounds off.
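Why does averaging work so well? A tiny simulation gives some intuition. This is a toy sketch, not a model of the actual TED audience: I assume each guess is the true weight plus independent, unbiased noise, and the spread of 400 pounds is a made-up number. Real crowds are messier (the guesses above ranged from 308 to more than 8000 pounds).

```python
import random
from statistics import mean, median

def simulate_crowd(true_weight, n_guessers, spread, seed=0):
    """Compare the crowd average's error with a typical individual's error.

    Assumption: guesses are independent draws from a normal distribution
    centred on the true weight (no shared bias).
    """
    rng = random.Random(seed)
    guesses = [rng.gauss(true_weight, spread) for _ in range(n_guessers)]
    crowd_error = abs(mean(guesses) - true_weight)
    individual_errors = [abs(g - true_weight) for g in guesses]
    typical_individual_error = median(individual_errors)
    return crowd_error, typical_individual_error

# 1795 pounds and 500 guessers come from the story; the spread is hypothetical
crowd_error, typical_error = simulate_crowd(true_weight=1795, n_guessers=500, spread=400)

print(f"error of the crowd average:    {crowd_error:.1f} lbs")
print(f"typical individual error:      {typical_error:.1f} lbs")
```

With independent errors, the average of n guesses has a standard error of roughly spread divided by the square root of n, so 500 noisy guessers behave like one guesser who is about 22 times more precise. The catch is the independence and no-bias assumption: if everyone is wrong in the same direction, averaging does not help.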

People rock! (How about random forests and boosting, if you know what I’m talking about.)


Following my previous post on expert picks of NFL games, one may wonder if someone can do better than those experts in picking winning teams ahead of the game. If you direct your eagle eyes to the second-to-last column of the prediction image, you will notice the word “Accuscore”. What is it? ESPN describes it as:

AccuScore has powered more than 10,000 simulations for every NFL game on ESPN.com, calculating how each team’s performance changes in response to game conditions and opponent’s abilities. Each game is simulated and the game is replayed a minimum of 10,000 times to generate forecasted winning percentages.
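To get a feel for what "replaying a game 10,000 times" means, here is a toy Monte Carlo sketch. It is emphatically not AccuScore's method, which is not described in detail here: I simply assume each team's score is an independent normal draw around a hypothetical average, and count how often team A comes out ahead. The function name, the team averages and the score spread are all made up for illustration.

```python
import random

def simulate_win_probability(mean_a, mean_b, score_sd=10.0, n_sims=10_000, seed=0):
    """Estimate P(team A beats team B) by replaying a toy game n_sims times.

    Assumption: each team's score is an independent normal draw; real
    simulators model game conditions and opponents in far more detail.
    """
    rng = random.Random(seed)
    wins_a = 0
    for _ in range(n_sims):
        score_a = rng.gauss(mean_a, score_sd)
        score_b = rng.gauss(mean_b, score_sd)
        if score_a > score_b:
            wins_a += 1
    return wins_a / n_sims

# Hypothetical matchup: team A averages 24 points, team B averages 21
win_prob_a = simulate_win_probability(mean_a=24.0, mean_b=21.0)
print(f"forecasted winning percentage for team A: {win_prob_a:.1%}")
```

The pick is then simply the team with the higher forecasted winning percentage. With 10,000 replays, the simulation noise in that percentage is only about half a percentage point, so the quality of the forecast depends almost entirely on how good the underlying game model is.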

In short, it is a prediction based on simulations (algorithms and data). As a company, Accuscore runs simulations on almost every major sport: NFL, NBA, NCAA FB, NCAA BB, … Of course, they sell their predictions through paid memberships. It seems that they are trying their best to use massive amounts of data to generate a fortune. So, how well are they doing?

In the first week of the 2011-2012 season, the Accuscore prediction scored 10-6, just like one extra expert, no better, no worse.

However, at the end of the 17 weeks of regular season games, Accuscore did pretty well compared to any of those experts.

          Allen  Golic  Hoge  Jaworski  Mortensen  Schefter  Schlereth  Wickersham  Accuscore
Correct   165    161    162   155       155        158       167        167         174
Wrong     89     95     93    85        101        98        89         89          82
Accuracy  .65    .63    .64   .65       .61        .62       .65        .65         .68

Would we call it the power of data mining? (… to be continued …)

