
It has been inspiring to watch Hans Rosling give impressive talks about numbers and statistics. If you haven’t seen any of his great presentations, here is one example:

Chances are that you haven’t seen him show his wild side before. I just saw this article, “Hans Rosling: the man who makes statistics sing”, in which he is referred to as “the ‘Jedi master’ of data”, and not just because of his magical power with data. The fact is that the professor’s main hobby is sword swallowing.

What? Sword swallowing? Yes! There is a video on YouTube showing him doing so (at around 8 min 30 sec).

Wow! This is eye-opening.

Big congratulations to Terry Speed, who won the 2013 Prime Minister’s Prize for Science.


When he joins ABC News Breakfast, he talks about the prize, the pride, and the role statistics played in the O.J. Simpson murder case in which he testified as an expert witness.

Listen to him and you will also figure out where the prize money, or at least half of it, will go 🙂

Here is an old post about Terry’s Stuff.

With the 2013-2014 NFL preseason games underway, the business for experts to predict games is about to start again. Cannot wait…

Here comes the ESPN expert pick for week 1.

CBSsports also joined the expert pick business this year with its collection of experts:

Fans must be eager to know who is the best expert in this NFL prediction game and there are already questions posted in the comment section of ESPN Expert picks.

Based on the records I collected from ESPN over the last two years, we clearly have a winner: Seth Wickersham, who correctly predicted 69.9% and 65.2% of games, the best among ESPN experts, for the last two NFL seasons, respectively. Here are the overall prediction accuracy records of each expert in the last two seasons, with more details here.

Picks   Allen     Golic      Hoge        Jaworski  Mortensen
2013    60.2%     63.3%      66.8%       65.6%     69.5%
2012    65.0%     62.9%      63.5%       64.6%     60.5%

Picks   Schefter  Schlereth  Wickersham  Jackson   Johnson
2013    62.5%     64.5%      69.9%       62.9%     60.2%
2012    61.7%     65.2%      65.2%       N/A       N/A

Picks   Ditka     Carter     Accuscore   Pick’em
2013    64.8%     66.0%      64.1%       65.9%
2012    N/A       N/A        68.0%       68.0%

Adam Schefter has the worst prediction average among the ones who made picks for the last two seasons, and Keyshawn Johnson was the worst for the last season.
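The two-season comparison can be checked with a few lines of Python. This is only a sketch: it copies the percentages from the table, keeps the eight experts who have a figure in both rows, and takes a plain unweighted mean of the two seasons (which assumes both seasons had about the same number of picks).

```python
# Accuracy (%) copied from the table: (2013 row, 2012 row).
# Experts with N/A in either row are left out, as are Accuscore and Pick'em.
records = {
    "Allen": (60.2, 65.0),
    "Golic": (63.3, 62.9),
    "Hoge": (66.8, 63.5),
    "Jaworski": (65.6, 64.6),
    "Mortensen": (69.5, 60.5),
    "Schefter": (62.5, 61.7),
    "Schlereth": (64.5, 65.2),
    "Wickersham": (69.9, 65.2),
}


def two_season_average(rec):
    """Unweighted mean of the two seasonal accuracies."""
    return sum(rec) / len(rec)


averages = {name: two_season_average(rec) for name, rec in records.items()}
best = max(averages, key=averages.get)
worst = min(averages, key=averages.get)

# Print experts from highest to lowest two-season average.
for name, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<11}{avg:6.2f}%")
print("best:", best, "| worst:", worst)
```

Under these assumptions the ordering agrees with the text: Wickersham on top and Schefter at the bottom.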

Chris Mortensen’s results are the most curious ones, making him the winner of the most improved expert award in 2013. He did really well for the 2013 season, but his predictions were the worst of the worst for 2012 (large variability?). Let’s see if he keeps it up this year 🙂
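How large is “large variability”? A rough way to gauge it is to treat each pick as an independent coin flip and compare Mortensen's two seasons with a two-proportion z statistic. The sketch below is only a back-of-the-envelope check: the number of games per season (`N_GAMES = 256`) is my assumption and is not stated in the records above, and picks within a week are not truly independent.

```python
import math

N_GAMES = 256  # assumption: picks per season; not given in the post


def two_proportion_z(p1, p2, n1=N_GAMES, n2=N_GAMES):
    """z statistic for the difference of two independent proportions (unpooled SE)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# Mortensen's accuracies from the table: 69.5% and 60.5%.
z = two_proportion_z(0.695, 0.605)

# Standard error of a single season's accuracy for an expert near 65%.
se_single = math.sqrt(0.65 * 0.35 / N_GAMES)

print(f"SE of one season's accuracy: {100 * se_single:.1f} points")
print(f"z for Mortensen's change: {z:.2f}")
```

With these assumed numbers, one season's accuracy carries a standard error of roughly three percentage points, so a nine-point swing is about two standard errors. Since we picked out the largest swing among eight experts after the fact, chance alone may well explain it.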

Some additional background information: Accuscore is based on simulations (algorithms and data) by accuscore.com and Pick’em is the average of all predictions by NFL fans who submitted their picks on ESPN.com before the game (kind of a “crowd prediction” by non-experts).

Unlike in the last two years, the ESPN expert pick page shows that the Accuscore prediction is no longer included this year. I wish ESPN still included this algorithm-based (statistical) prediction in the game.

We also had fun comparing expert picks, algorithmic prediction, and crowd prediction for the 2011-2012 season.

For this year, more experts, more fun! Now, let the game start! Are you ready for the football (and those experts)?

The Crimson Tide and the Buckeyes are ranked first and second in the Associated Press preseason poll for the 2013 college football season. Alabama is going for its third consecutive national championship, which has not happened before. Meanwhile, Ohio State has never posted consecutive undefeated/untied seasons. Are they going to beat the odds?
Prof. Mark Berliner and Prof. Bill Notz share their thoughts in Rob Oller’s commentary, “Let’s crunch numbers for college football”, published in the Columbus Dispatch today.

The Crimson Tide is going for its third consecutive national championship. The last time a team went back-to-back-to-back during the modern poll era (1936-present) was never. [……]

“The law of averages doesn’t mean something becomes more likely as time goes on,” said Mark Berliner, a professor in the Department of Statistics at Ohio State. “Just because you’ve never seen three (national titles) in a row doesn’t make it more probable. In my mind it becomes even less likely. It’s an indication of how hard it is to do.” [……]

Ohio State, meanwhile, has never posted consecutive undefeated/untied seasons. The Buckeyes finished 12-0 last year. Are they due for another round of perfection? It might help if the football program was birthed 10 million years ago.

“If an event has some positive probability of occurring, if given enough opportunities, eventually it will occur,” said Bill Notz, who also works in the OSU Department of Statistics. “So it’s possible, but the chance is very small.”

My thoughts: Alabama winning a third time and Ohio State going undefeated both have a positive probability of happening, and one of them might indeed happen. However, they are not going to happen together, for sure 🙂
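Both professors' remarks fit in one small calculation. If an event has the same small probability p every season, independently of the past, then it is never “due” (the chance next season is still p), yet the chance of seeing it at least once in n seasons, 1 - (1 - p)^n, creeps toward 1 as n grows. The sketch below uses a made-up p = 0.01 purely for illustration; nobody in the article gives an actual number.

```python
import math


def prob_at_least_once(p, n):
    """P(event occurs at least once in n independent tries, each with probability p)."""
    return 1 - (1 - p) ** n


def tries_needed(p, target=0.5):
    """Smallest n for which prob_at_least_once(p, n) reaches the target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


p = 0.01  # hypothetical per-season chance, chosen only for illustration
for n in (1, 10, 100, 1000):
    print(n, round(prob_at_least_once(p, n), 4))
print("seasons for a 50% chance:", tries_needed(p))
```

The independence and constant-p assumptions are strong ones for football, but they are exactly what makes the “law of averages” intuition fail: waiting longer raises the chance of eventually seeing the event, not the chance of seeing it this year.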

Go Bucks!

Old Statistics Books

Thanks to the retirement of some of our faculty members, a large collection of old statistics books appeared in the lunch room.

All right, can you guess what the oldest book I found among them is? Of course it is called “Mathematical Statistics”, written by Henry Lewis Rietz (Professor of Mathematics, The University of Iowa).

How about another guess: when was this book published? It was 19xy, where x is one more than the first digit of the year and y is two less than the second digit. The price of the book was marked on the front page as $2.00 at the time.

I was curious about how Doug Wolfe, the former department chair and the previous owner of the book, got it into his collection. He told me that he picked it up in the used book section of a bookstore when he was a graduate student at the University of Iowa. OH! That was a long time ago 🙂

Out of respect for history, I read the whole book. It is puzzling to me that the book covers topics so similar to those in most “modern mathematical statistics” books. The chapter titles read:

  • I. THE NATURE OF THE PROBLEMS AND UNDERLYING CONCEPTS OF MATHEMATICAL STATISTICS
  • II. RELATIVE FREQUENCIES IN SIMPLE SAMPLING
  • III. FREQUENCY FUNCTION OF ONE VARIABLE
  • IV. CORRELATION
  • V. ON RANDOM SAMPLING FLUCTUATIONS
  • VI. THE LEXIS THEORY
  • VII. A DEVELOPMENT OF GRAM-CHARLIER SERIES

The first five chapters are pretty much what we teach in MathStat classes these days, and the last two chapters reflect the emphasis of the field at the time. On one side, we can think that we are teaching very old stuff in our classes right now. On the other side, it also shows that these concepts are fundamental and long-lasting, just like the concept of the integral in calculus.

From an education point of view, delivering these concepts to the public and getting people used to thinking in these frameworks are good contributions from statisticians.

From a research point of view, the field has evolved so much with the recent dramatic explosion of data collection and computing power. Where does the new math come in to help us in this data age? The law of large numbers? Asymptotic upper bounds? Or just building deeper and deeper networks (learning)?

Anyway, if we are tired of thinking, we may put the discussion aside and enjoy some neat book cover designs:

Introduction to Statistical Analysis by Wilfrid J. Dixon and Frank J. Massey Jr. (1951)

A Sampler on Sampling by Bill Williams (1978)

Applied Statistics, Principles and Examples by D.R. Cox and E.J. Snell (1981)

By the way, in case you are curious about how much “Mathematical Statistics” by Professor Henry Lewis Rietz is worth now, you may find it on Amazon.com. When I checked, it was being sold at


The latest issue of the ICSA Bulletin published my interview, “A conversation with Professor Bin Yu”. It is quite long, but informative. Here I have picked out some short paragraphs based on my personal bias.

[Before College]

A math book from a cousin gave me my first boost into math when I was in 3rd and 4th grade. I enjoyed taking exponentials and logarithms using a table in the book.  I  believe doing the math problems provided a refuge of certainty and safety for me during a very turmoil time in China.

Another big boost in my interest in mathematics occurred when I was in the Lab School of Normal University in Harbin.  There I had a wonderful and extremely talented sub math teacher, Jianye  Chen (陈建业) in my second year in junior high. [……] Under his strong influence and, in some sense, fulfilling his unrealized dream of going to the math department at Peking University, I chose to do math at Peking University after receiving a very good score on the national college entrance examination in 1980.

[PKU]

The first math analysis discussion class was hard for me since I didn’t know how to do the problems. But you know, I really liked math and we had good professors. We didn’t interact a lot with the professors, because that was not the norm.

In the entrance exam to graduate school in Peking University, I came first in the math subject exams. However, the professor I wanted to work with did not take me after the oral exam. So I switched into Probability and Statistics, although I originally wanted to do Functional Analysis. That was actually a very good move, a forced one, but it has benefited me tremendously.

[Qualifying Exam at Berkeley]

Shi: Is it the same format as we took it? 10 questions?

Yu: Yes. If you do three, I think, you pass.

[Marriage]

In the summer of 1987, I went back to China and got married to my boyfriend who went to graduate school in China in 1985 in architectural history. He was able to join me a year later in Berkeley and went to Berkeley’s School of Architecture. My American friends were a bit shocked to hear that I married someone that I hadn’t seen for two years. It was a bit risky, but looking back, it was the best decision in my life.

[Suggestion for Young Researchers]

So I would say to junior people who just started their career: take more risks, instead of being more careful. If you work in a very desirable field like Statistics, you could not go too wrong. Ultimately, whether you enjoy your life or not is because whether you are happy, not because you make the system happy. And the system actually becomes happy because you are happy.

[Current Status of Statistics]

I think we are in a golden area for Statistics as an intellectual field. But this field has to be broadly interpreted. Basically a lot of people trained in other fields are also doing this type of work we do.

I think if we rise up to the challenge, we will be the leading data scientists. With our great traditions of critical thinking with us, at the same time, embracing machine learning, database, and computing challenges.

You take some risks, and you cannot really “fail” too much. You have a safe net. You have a Ph.D. in Statistics. How wrong could it go, right?

[Statistics in China]

Shi: By talking with people in China, I do feel industry, especially the high-tech companies, has a huge need for people who can analyze their growing volume of data. Meanwhile, in more scientific area like Biology and Physics, they do have the same need to find people who can work with them in designing and analyzing their experiments and do better science. Is there anything universities in China can do to help foster this type of collaboration?

Yu: I think it is kind of happening already. Peking University is talking about a data science center. You have to have cross discipline centers. Any culture change is going to be a slow process. But when there is a need, especially for economic reasons, things just happen in the end. The statistics majors in China, and here too, have to get on top of computing. At senior level, it is easy to find collaborators because you have ideas and a record. If you are a beginner and you cannot even touch the data, who’s going to hire a statistics undergraduate to give advice to a CS undergraduate? It is a constant struggle that we should keep up with computing training of our students. Eventually I hope we will be just as good as computer science majors. That would be the goal, then we will have both the critical thinking and computing skills. I’m not worried about the mathematical part as much not because it is not important. We have been giving our students that, so it is not the urgent need.  The weaker point is the cross-field critical thinking and computing for statistics students.

[Statistics and Data Science]

Yu: [……] Lots of people think of statistics as counting numbers, but they don’t know all the exciting things we do. That’s a misconception. Either we go all the way out as a community to change it, which is an uphill battle, or we just embrace data science. Just start saying that we do data science. It is psychology. This is a personal opinion, not representing the view of IMS. I’m just wondering and I think it is a discussion worth having because of the popular unfavorable misconception of statistics.

Shi: Yes. I have colleagues who seldom read the Annals of Statistics. They think the journal mainly concerns about theoretical results and mainly about asymptotic, but they are not.

Yu: It is a dilemma in China. Statistics (统计) is 一级学科. Data science is not one of the 学科 yet. But in certain occasions, we can say that we do data science. We are statisticians and we do data science. At least we should go that far.

[Statistics and Critical Thinking]

Yu: That’s a gradual process. As I feel being the chair is confronting different opinions. As you said, you cannot form critical thinking without people counter you, even just playing the devil’s advocate. If it is all “great”, it is not critical thinking. Critical thinking is not the most natural thing in the Chinese culture because we tend to want to agree with each other, which has strength in lots of situations, but not in Science. It is something I think the western culture has an edge. In the Chinese culture, there are things called “思辨”and “承传”, but it is more about listening to others than questioning.

I’m not disapproving by critiquing, but some students might take that way. So the challenge to me is how to train those students to become critical thinkers. It is almost like they have to establish confidence first somehow.

[Data Collection and Quality in China]

Shi: I found it amazing to see on the Internet that comments about any data or any article written by Bureau of Statistics of China are usually like people don’t trust any of them. It seems don’t matter what the report is about. When it says something is good, they don’t trust it; when it says something is bad, they don’t trust it.

Yu: Yeah, that’s a big problem you bring up that is data quality. It is not unrelated to plagiarism in doing research at every level. For statistics, if we cannot trust the data, we are done. Maybe theoretical statistics will develop further first before data analysis or data science. But companies care a lot more about good quality of data. They cannot fake their data as much because it is related with their revenue. That’s why I say industry would play a huge role in pushing the development of statistics or data science, whatever it is called, in China.

Again, the full interview can be found here: A conversation with Professor Bin Yu

Came across a Software Advice blog post, “Google+ Hangout with Thomas Davenport: The Future of Working with Data”, posted at Plotting Success. In the video, Thomas Davenport chatted about the future of working with data and other topics from his latest book, Keeping Up with the Quants.

Some quick takeaways [with my quick comments]:

  • Analytics Is (Regularly) Creative [very true]

“Many think that analytics is very cut and dry, that it’s just a matter of letting the computer crunch through numbers and that creativity isn’t required,” he said. In fact, it’s quite the opposite: Davenport argued that creativity is important throughout the entire process of analytical thinking, particularly in the first and last stages.

  • The Perfect Analytical Correlation: Great Companies Have Great Analysts [might be true]

“I do hear predictions all the time that we’re creating a ‘data-scientist-in-a-box’ or even going back as far as the mid-90s, people were talking about data mining replacing a data analyst,” he noted. But he doesn’t envision a day in our near future where machines will be able to replicate humans’ ability to tell “data narratives.”

  • Hire “Ph.Ds with Personality” [hard to find]

When recruiting premier quantitative analysts, Davenport advised that organizations hire “Ph.Ds with personality,” meaning data scientists should have as much of an appetite for success in business as for research.

  • Encourage Everyone to Code [Yes, Yes!]

“You don’t have to be the world’s greatest programmer,” said Davenport. “But you should have some exposure to programming.” Davenport pointed to open source scripting languages such as Python as valuable resume builders.

I think the full video is worth watching. What do you think?

After I spent the last few months in China, unable to see or post to my own blog, this site seems dead.

For a long while the famous Great Firewall of China (GFW) has been blocking all traffic to wordpress.com. Computers in China have a hard time accessing webpages under domain names such as youtube.com, facebook.com, or even google.com. This is not new, but it is the first time I have spent a long period using the internet under such constraints. To be fair, there are ways to get around the Great Firewall, but they take serious effort and the connection is too slow.

The effect of blocked access on each person is not that dramatic in terms of daily life, at least during my time in China, but it did make my brain exercise less. I would like to think one makes better judgements with more useful information. From my naive user’s point of view, the GFW seems counterproductive for the development of Chinese society. If the government trusted its people to make the right judgement with the information, it would not spend such effort blocking it in the first place.

Anyway, my simple statistical question is how one might estimate the percentage of pages that have been blocked by the Great Firewall. First, we need a way to test whether a website is blocked by the GFW or not. Looking at my site statistics, I realized that for a long time my blog has had visitors from all over the world except China. But that may just be because my content has nothing to interest Chinese readers. Or maybe I was blocked.

Checking whether a site has been blocked by the GFW seems like an easy test, but it is hard to do from outside China. In other words, it takes extra effort for any website outside China to know whether it reaches users in China. Fortunately, a simple Google search found me a nice webpage called greatfirewallofchina.org, where you can test whether a site can be reached from China.

It comes in handy: if we can sample a selection of webpages and test them there, we may get a better sense of the percentage of webpages that are hidden from users in China. Since the GFW blocked blogs on wordpress.com, I thought it would be better to change the domain of my site from taoshistat.wordpress.com to something else so it would not be blocked. I paid the fee and changed it to statisticsforfun.com as a test. It is not blocked by the GFW now. Cheers!

Even more surprising, a few days after I changed my domain address, I found that my old address had also become accessible.

It turns out that other wordpress.com blogs have also become accessible from China. What a pleasant and unpleasant surprise!

The GFW is no longer blocking wordpress.com blogs, such as Larry Wasserman’s Normal Deviate.

Now I have to wonder if this change on GFW has anything to do with the 10 bucks I paid or not 🙂
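Coming back to the statistical question in this post: once we have a random sample of sites and a blocked / not blocked result for each, the estimate itself is a plain binomial proportion. The sketch below is one way to do it, using a Wilson score interval. The function names are mine, the 200 outcomes are made up for illustration (they are not real measurements), and the hard part in practice, choosing a list of sites to sample from, is not addressed here.

```python
import math


def wilson_interval(blocked, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = blocked / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def estimate_blocked_share(results):
    """results: list of booleans, True if a sampled site tested as blocked."""
    n = len(results)
    blocked = sum(results)
    return blocked / n, wilson_interval(blocked, n)


# Hypothetical outcomes for 200 randomly sampled sites (not real measurements).
sample = [True] * 30 + [False] * 170
share, (low, high) = estimate_blocked_share(sample)
print(f"estimated share blocked: {share:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The Wilson interval is used instead of the simpler normal approximation because it behaves better when the blocked share is small or the sample is modest. Each `True`/`False` would have to come from a check made inside China or through a testing service such as the one mentioned above.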

March Madness

The Buckeyes are in the Sweet 16 for the fourth year in a row. After the first weekend, my bracket is ranked 6933rd (with 48 points) among more than 1 million brackets on CBSsports.com, and the score translates to 99.5% on ESPN.com. The bracket was actually ranked 2129th after the first round and improved to 1470th after Saturday’s games before dropping to its current place on Monday.

It feels good to make good guesses and the better feeling is that I still have all my final four teams and 7 out of 8 elite eight teams alive.

Let’s go, Bucks! We will see where my bracket stands after another round.

At 7pm on Wednesday, 2/6/2013, the weatherman is predicting a historic storm that is expected to hit the Northeastern U.S. over the weekend. People in Boston, New York, and other northeastern areas should get ready for it. In this video, in which meteorologist Chad Myers talks with host Erin Burnett, he uses a football play to describe the weather system.

Chad did a great job explaining weather (and storm) forecasting in such a simple and understandable way. At the end, Erin and Chad have an even more interesting chat about our weather forecasting models:

Chad: This is 36 hours before the storm even starts. That’s why I’m so wishy-washy. I can’t say 1 to 12, because that was what the computer are. The computers are literally for New York City could be 1 inch, mostly rain, and I’m saying. I will show you what is scary in one second. I will just hold on to the graphics here. Boston on this computer, 21 inches of snow. I think that’s a pretty good number, especially for the burghers. Look at this thing, this computer says, OMG, you know what, it is not going to warm up at all. The rain is not going to happen, it is going to be all snow, 23. So literally we have 1 to 23 inches for snow possible for New York City. That’s why we have to wait for another computer run before we really understand what is going to happen here. I hate to be wishy-washy about it, but it is a big storm for somebody anyway!

Erin: But, but, … I guess I’m so confused that you seemed to me that we always knew what the weather would be, right? A few days in advance. Nowadays, every time there is a big storm, we have no idea. Could be big, could be little, what is going on? Why is this because … Climate change? I don’t know. What?

Chad (with smile): It is because over population of models and over thinking it. Now we have so many models. Look at this one, oh, look at this one. Before we had two, Elephant and GM that’s all we had. We look at one and we decide one or the other. Now we had like nine, so we don’t know which one to pick.

Erin: You know what, sometime the plenty of choices is not so great.

Chad: That’s right!

We know less and we are less certain about our predictions because of “over population of models and over thinking”. NICE!
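One way to live with nine models instead of two is to stop picking a single one and summarize the spread. The sketch below does that with the median and quartiles. The nine snowfall numbers are made up for illustration: only the 1-inch and 23-inch extremes echo what Chad said about New York, and nothing here reflects how forecasters actually combine model runs.

```python
import statistics

# Hypothetical snowfall forecasts (inches) from nine models for one city.
# Only the 1 and 23 inch extremes come from the transcript; the rest are invented.
forecasts = [1, 3, 6, 8, 10, 12, 15, 19, 23]

median = statistics.median(forecasts)
# quantiles(n=4) returns the three quartile cut points (Python 3.8+).
q1, q2, q3 = statistics.quantiles(forecasts, n=4)

print(f"range: {min(forecasts)} to {max(forecasts)} inches")
print(f"median: {median} inches, middle half: {q1} to {q3} inches")
```

Seen this way, a wide range across models is less a sign that we know less and more an honest statement of how uncertain the forecast is 36 hours out.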

Watch out for the storm of computational power, good or bad.