JCL Blog

My Version of Big Data

There is an article in the NY Times today about big data by Steve Lohr.  It has all of the parts of a newspaper article including a headline, quotes from experts, references to other articles... but I have read it twice and I can't find any actual description of what big data is.  And the headline is "How Big Data Became So Big".

Yes, everyone is into Big Data these days and it is getting bigger every day -- but what is it?

Wikipedia says:  "a collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization."

Not so very helpful.  Aren't definitions supposed to avoid referencing themselves?  Yes indeed, big data is, well, big data.

Network World quotes AWS:  "Any amount of data that's too big to be handled by one computer."


Here is my definition: Big Data is the complete set of all information associated with a topic or subject.

Here is why I think this is interesting:  the data world is a completely different place when you have all of the information.  When I say ALL I mean every single thing you have ever purchased at a grocery store, every single trade on the stock market, every single temperature reading at a weather station... you know:  ALL.

Until very recently, it has not been possible to put all of the data into one database and analyze it, so we have always sampled data.  Sampling is like polling.  A small amount of data is captured and then broad generalizations are made.  In some cases the broad generalizations turn out to be somewhat accurate.  People who buy butter also buy bread.

Knowing that people buy butter is completely different from knowing when you are next going to buy butter.  And that is why big data is a big deal.

We know that 100,000 cars per day drive over the HWY 520 bridge, but that does not say when you are going to drive over it next.
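
To make the difference concrete, here is a rough Python sketch.  Everything in it is made up for illustration: the shoppers, their butter-buying cycles, and the function names are my own assumptions, not real grocery data.  The first function answers the polling question (how often do people buy butter?) from a sample.  The second answers the big data question (when will this one shopper buy butter next?) from that shopper's complete history.

```python
import random

def make_purchases(seed=0, shoppers=1000, days=90):
    """Build a made-up purchase log of (shopper_id, day, item) records."""
    rng = random.Random(seed)
    log = []
    for shopper in range(shoppers):
        # Assumption: each hypothetical shopper buys butter on a fixed cycle.
        cycle = rng.randint(5, 30)
        start = rng.randint(0, cycle - 1)
        for day in range(start, days, cycle):
            log.append((shopper, day, "butter"))
    return log

def sampled_rate(log, shoppers, days, sample_size, seed=1):
    """Polling-style answer: butter purchases per shopper per day,
    estimated from a random sample of shoppers."""
    rng = random.Random(seed)
    sample = set(rng.sample(range(shoppers), sample_size))
    count = sum(1 for shopper, _, _ in log if shopper in sample)
    return count / (sample_size * days)

def next_purchase_day(log, shopper):
    """Complete-data answer: guess one shopper's next butter day
    by repeating the gap between their last two purchases."""
    days = sorted(day for s, day, _ in log if s == shopper)
    if len(days) < 2:
        return None  # not enough history to say anything
    return days[-1] + (days[-1] - days[-2])

log = make_purchases()
print("sampled rate:", sampled_rate(log, shoppers=1000, days=90, sample_size=50))
print("shopper 7 buys butter next on day:", next_purchase_day(log, 7))
```

The sampled rate says something about shoppers in general.  Only the complete log can say anything about shopper 7.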

The thing that I find so amazing about the article in today's paper is that the reference to artificial intelligence really waters down the whole movement.  It sounds like these awesome computer scientists have figured out how to take data sets that used to be too big to analyze and generalize things about them.  Why would you ever want to do that?  The benefit in building a space ship is in the going to space, not in building a better space ride at the park!  We already generalize -- by polling.

Here are a few cool things I think could happen with big data:

  1. My personal dataset:  An ever-growing database of everything I do, that I can analyze however I want.  All of my friends, activities, purchases, pictures, work output, healthcare, even my emotions... all in a format that I can use to figure things out.  I could figure out what activities lead me to do healthy things.  Sounds goofy, I know, but my happiness could be mapped against the things I did or the stuff I bought.  Who knows what I could learn?
  2. My next hire:  What if LinkedIn could give me a list of the top 10 people I should hire?  Not people that matched job descriptions I posted.  Instead, it would analyze all of my employees, my competition, and all of the millions of people in LinkedIn -- and help me target the people that will change my business the most.
  3. My next vacation:  Take all of my travel history, every book I have read, my business travel schedule, my kids' interests (their books and experiences), and put that all together and give me a top ten list of places to go and maybe even which of my friends to invite.

Here are a few not so cool things that could happen with big data:

  1. My insurance gets cancelled right before I get diagnosed with something terrible.
  2. I get audited every year by a fully automated IRS.
  3. Telemarketers figure out what to say to keep me on the phone longer.

All up, I am a believer in big data -- no matter how everyone else defines it --  and I think it is going to be a great next ten years.