JCL Blog

I'll Take My Data Just Right

The amusing thing about the trade press, or any journalists actually, is that they are all looking for the next scoop.  They want something new to say that no one else has said before so they can stand out from the crowd.  Strangely, these new things usually turn out to be just slight variations on the things everyone else has said -- thereby propelling the reader and the industry further in the direction they were already going.  Until all of a sudden, someone breaks away from the lemmings and sends the herd back whence they came.

This is how we get a string of "economy is improving" stories, each with a unique spin, and then all of a sudden an "economy is not improving" story hits, sticks, and sends everyone back the other direction.

We are seeing this right now in the data and analytics field.  For the last 5 years it has been all about Big Data.  I challenge anyone in tech to get through a day, even at this late date, without someone saying something about how amazing Big Data is and how Big Data is going to change everything.  Yes, it is nice to know that someone out there is collecting all of the data about everything (insert your favorite joke about a three letter agency here), and people are finding new and better ways to put that big data to work.

In the middle of last year, however, articles started appearing about Small Data, and how it was going to change everything.  By this summer it will all be about Small Data.  The articles are going to say that the industry is going back to Small Data because not black is white, not up is down, not east is west, and not Big Data is Small Data.  I propose that Small Data is not what came before Big Data.  Small Data is some other color, some other axis, and some other point on the compass that we have not seen before.  So to make it just a bit more natural, because in nature small almost never comes after big, let's call this next new thing Just Right Data.

This is what I mean by Just Right Data: 

  1. The Data I Care About:  Clearly, getting just the right data is what Goldilocks was thinking about when she said "just right".  Big data is awesome because it means that all of the data is being collected (instead of sampled, here is my post about sampling from 2012), making it possible for me to get all of the data I care about.  
  2. Properly Adjusted:  Not all data points are of equal value.  The ones that mean more to my analysis should be amplified.  In some cases the most recent data points are more valuable; in other cases clusters of data points are more valuable.  
  3. Action Enabling:  We cannot lose track of the reason we analyze data -- to make better decisions.  We do not analyze data to create cool looking graphics.  We analyze data to enable better decision making.  Timing is the biggest part of this, but noise is also important.  No use getting great analysis after it is too late to use it, or mixed in with so much other stuff that it is impossible to absorb.

To illustrate, here is an example from the channel marketing industry:

Let's say we have 100,000 channel partners enrolled in our channel partner program.  We have their profiles, their certifications, their competencies and a bunch of other pre-big data stuff.  We add in the amount of sales they generated for us last year, another pre-big data element.  Now we add in the big data stuff:  every lead we have ever sent to every partner, the outcome of every lead, who from our company has worked with them, everything we know about each employee who works for each of the partners and their history, how much revenue was generated from each sale by each partner, each customer from each sale, and when each of these events happened.  Big data is indeed named accurately.

Now in comes a new lead and my Just Right Data experience begins.  At the start I get just the data I want to analyze (just the 5 partners that are in the right location and that have achieved sufficient status, for example), which is pretty much a pre-big data thing.  And I get all of the big data stuff that is relevant to those partners.  This is the data I care about.

Now I rank the partners by their relative status to the others, or the status of other leads already delivered, or the fine points of capabilities or personnel ratings.  This is the data adjusted.

Now the lead hand off to the selected partner (hopefully algorithmically selected, but manually works too) happens and it must happen before the lead expires.  As we know from being customer focused and customers ourselves, leads are perishable and must be acted upon in a timely manner.  This is action enabling.
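The three steps above can be sketched in a few lines of Python.  This is only an illustration under loud assumptions:  the field names, the scoring weights, the 24 hour expiry, the date, and the four partners are all made up for the example, and a real partner program would have far more data behind each step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Partner:
    name: str
    region: str
    status: int        # hypothetical program tier, higher is better
    open_leads: int    # leads already delivered and still open
    win_rate: float    # share of past leads that closed, 0.0 to 1.0


@dataclass
class Lead:
    region: str
    min_status: int
    expires_at: datetime


def data_i_care_about(partners, lead):
    """Step 1: keep only partners in the right location with sufficient status."""
    return [p for p in partners
            if p.region == lead.region and p.status >= lead.min_status]


def adjusted_score(p):
    """Step 2: amplify the data points that matter more (weights are invented)."""
    return 2.0 * p.win_rate + 0.5 * p.status - 0.1 * p.open_leads


def hand_off(partners, lead, now):
    """Step 3: select a partner, but only while the lead is still fresh."""
    if now >= lead.expires_at:
        return None  # too late to act on this lead
    candidates = data_i_care_about(partners, lead)
    if not candidates:
        return None
    return max(candidates, key=adjusted_score)


now = datetime(2014, 1, 15, 9, 0)  # arbitrary timestamp for the example
lead = Lead(region="WA", min_status=2, expires_at=now + timedelta(hours=24))
partners = [
    Partner("Acme", "WA", 3, 4, 0.30),
    Partner("Birch", "WA", 2, 0, 0.45),
    Partner("Cedar", "OR", 3, 1, 0.60),   # wrong location
    Partner("Dune", "WA", 1, 0, 0.80),    # status too low
]
chosen = hand_off(partners, lead, now)
print(chosen.name if chosen else "no partner")
```

The filter is the cheap, pre-big data part; the score is where the detailed lead history would actually earn its keep; and the expiry check is a reminder that a perfect ranking delivered a day late is worth nothing.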

Thanks for staying awake to the end (unlike Goldilocks).  And thanks to the Big Data people who have set the stage for us to do Data Just Right.



IBM Gets It

It seems that just about every week I see something that reinforces how IBM is way out front in the customer centric-ness of big data.  Here is a great video they posted on YouTube showing what they are talking about when they say Smarter Marketing:

If you want a bit more of the IBM Smarter Marketing juice, they have a whole bunch of great content on this web site: IBM Smarter Planet: Marketing

Three Big Data Articles Today

There are several good articles in the NY Times Sunday Business section today that serve to illustrate the coming world of Big Data.  

30% of customers opt in to driver monitoring.  This is Facebook meets car insurance.  I am amazed that this many people willingly subject themselves to this kind of monitoring.  Here is my post about how insurance companies have detached themselves from the basic concept of insurance.  In short, insurance companies are increasingly able to exit the insurance business.  They have always wanted to collect premiums and not pay claims -- now they can do it.

Building snow skis from a skier's DNA.  For $1,750 you can get custom skis made to your skiing DNA (not your biological DNA, thank goodness).  It would be very interesting to know how unique the 1,000 pairs of skis this guy made last year are.  I would not be surprised if they all boil down to a dozen or fewer basic designs.  This kind of short run (run of 1 in this case) manufacturing brings to light IP that is actually protectable - the design process and the distribution of actual designs.  Very interesting.

Dr. Langer's Lab at MIT succeeds at tech transfer.  This one is a bit more of a stretch, but any new medical product involves a mountain of testing data and data proficiency, and the crossover from one product to the next is indeed changing very fast due to better data management techniques.

Happy reading.

How Airlines Use Big Data

I cannot remember the last time I was on a plane with a noticeable number of empty seats.  I also have not seen overbooked planes and crews working to buy back seats.  I have also been impressed with the on time performance of the planes I have been flying on.  If you are interested in this kind of thing, there is a great web site tracking this (in the US anyway) and it turns out the numbers support my experience.  Load factor up, on time performance up, and guess what else - prices are up too.

There was a good article in the NY Times today about how Delta is doing this -- with better data management.  There is so much hype about big data, but this is a good reminder that through better data management practices -- everyone can win.  Unless you were counting on a few empty seats around you on your next flight.

Interesting Trends

We have arrived at that season where the list of predictions for 2013 will start to pile up.  I find them interesting reading, but I have not felt that I have much to add to the pile, so mostly I just read instead of making a list of my own.  

This year, however, I will be taking note of a handful of trends that seem to capture my interest.  I tend to read articles about these things and I just might have some thoughts congealing into a theory that brings them together.  

For now though, just the trends:


  1. Education:  One of our greatest exports to the rest of the world is educated people from our university system.  Why do we get that right while K-12 seems to fall farther and farther behind?
  2. Big Data:  Technically big data is just a lot of data.  Specifically, it is the ability for systems to capture and save everything.  Before big data we used to keep track of the closing price of a stock.  Then we stored the closing price and the high and low price for the day.  Big data is storing every single trade, who made the trades, their sequence…
  3. Internet of Things:  There are between one and two billion people connected to the Internet.  Devices and sensors are being added to the network by the billions and probably already outnumber the people.  Soon the number of connected machines will dwarf people and the Internet will change significantly.
  4. Vendor Relationship Management:  The relationship between the makers of things and their customers has been mostly one way and managed by the manufacturer, with CRM systems.  This relationship dynamic has been evolving through 1:1 marketing to an inversion of CRM where the customer is in charge and the vendor is managed.  The Berkman Center at Harvard is defining a new industry called Vendor Relationship Management (VRM).
  5. Digital Divide:  The people at the top of the economic ladder will advance ahead of the rest in earning capacity, lifespan, leisure time, and as a result will desire many new services. Those not at the top will have to serve the others or live off of charity or government assistance.  The gulf between the haves and the have nots is getting bigger in our country and around the world.  Right now the unemployment rate for white college graduates in the US is 4%.  Other social classes or ethnicities are much worse -- some over 25%.  It is hard to think about things getting even worse.

Of course the current year always feels like the one that is moving faster than ever before and 2013 will certainly feel the speediest ever.  In this context, and considering this list, it will be an interesting exercise to do the Gretzky thing and skate to where the puck is going to be.  It will be even more interesting to take a shot -- because the other famous Gretzky quote is: "You miss 100% of the shots you don't take."



Big Data in Big Companies

We work with big technology companies.  If there is anyone that is really doing Big Data, I would think it would be big technology companies.  After all, they believe in technology, have plenty of computing horsepower, and have people that have the necessary skills to do it.  

The reality is quite the opposite however.  Most of the time we are working to overcome very simple problems like duplicates or obviously incorrect entries.  The real data industry came up with ways to deal with these problems decades ago.  Nevertheless, our clients have such low confidence in their data that they often retain us to start over.
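To show how simple these problems are, here is a minimal sketch of the kind of cleanup I mean.  The records, the field names, and the two validity rules are all invented for the example; real cleanup needs fuzzier matching and many more rules than this.

```python
# Made-up partner records with one duplicate and two obviously wrong entries.
records = [
    {"company": "Acme Corp", "email": "Sales@Acme.com ", "employees": 120},
    {"company": "ACME CORP", "email": "sales@acme.com", "employees": 120},
    {"company": "Birch LLC", "email": "info@birch.example", "employees": -5},
    {"company": "Cedar Inc", "email": "not-an-email", "employees": 40},
]


def dedupe_key(rec):
    """Treat records as the same if their emails match after normalizing."""
    return rec["email"].strip().lower()


def problems(rec):
    """Return a list of obviously incorrect values found in a record."""
    found = []
    if "@" not in rec["email"]:
        found.append("bad email")
    if rec["employees"] < 0:
        found.append("negative employee count")
    return found


def clean(rows):
    """Split rows into kept records and rejected (company, issues) pairs."""
    seen, kept, rejected = set(), [], []
    for rec in rows:
        issues = problems(rec)
        if issues:
            rejected.append((rec["company"], issues))
        elif dedupe_key(rec) not in seen:
            seen.add(dedupe_key(rec))
            kept.append(rec)
    return kept, rejected


kept, rejected = clean(records)
print(len(kept), "kept;", len(rejected), "rejected")
```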

Here are a few of the things we see preventing big companies from truly using Big Data:

  1. Legal Departments:  The legal department does not play to win; it plays to not lose.  It would much rather prevent the collection of data than otherwise.  After all, a company that has not collected any data does not have to worry about losing data in a breach and then getting sued. 
  2. Poor Planning:  Good data handling takes time and effort.  Data initiatives invariably take longer than a quarter to implement, and longer than that to produce returns.  Almost all companies are looking to hit the number this quarter.  
  3. Internal Competition:  Competition between departments can cause them to hoard data (at best) or go underground with their data (at worst), creating silos of data that are riddled with duplicates and inaccuracies.
  4. Turnover:  The people in charge of these data initiatives have their eyes on bigger and more important (more visible) jobs -- so they change often.  The person taking over the job is just as uninterested in long term data health, so the problems go unaddressed.

As with many promising technologies, Big Data's biggest challenge is not in the technology but in the way people work together inside companies.  There are enormous gains to be made by the companies that realize what can be done with these new tools and organize themselves in such a way to take advantage of it.

My Version of Big Data

There is an article in the NY Times today about big data by Steve Lohr.  It has all of the parts of a newspaper article including a headline, quotes from experts, references to other articles... but I have read it twice and I can't find any actual description of what big data is.  And the headline says it is "How Big Data Became So Big".

Yes, everyone is into Big Data these days and it is getting bigger every day -- but what is it?

Wikipedia says:  "a collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization."

Not so very helpful.  Aren't definitions supposed to avoid referencing themselves? Yes indeed, big data is, well, big data.

Network World quotes AWS:  "Any amount of data that's too big to be handled by one computer."


Here is my definition: Big Data is the complete set of all information associated with a topic or subject.

Here is why I think this is interesting:  the data world is a completely different place when you have all of the information.  When I say ALL I mean every single thing you have ever purchased at a grocery store, every single trade on the stock market, every single temperature reading at a weather station... you know:  ALL.

Until very recently, it has not been possible to put all of the data into one database and analyze it, so we have always sampled data.  Sampling is like polling.  A small amount of data is captured and then broad generalizations are made.  In some cases the broad generalizations turn out to be somewhat accurate.  People who buy butter also buy bread.  

Knowing that people buy butter is completely different from knowing when you are next going to buy butter.  And that is why big data is a big deal.

We know that 100,000 cars per day drive over the HWY 520 bridge, but that does not say when you are going to drive over it next.
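Here is a small sketch of the difference, using made-up grocery receipts.  The shoppers, dates, and baskets are invented, and guessing the next purchase from the average gap between past purchases is just the simplest assumption I could pick, not a serious forecasting method.

```python
from datetime import date, timedelta

# Made-up receipts: (shopper, purchase date, items in the basket).
receipts = [
    ("ann", date(2012, 1, 2), {"butter", "bread"}),
    ("ann", date(2012, 1, 16), {"butter", "milk"}),
    ("ann", date(2012, 1, 30), {"butter", "bread"}),
    ("bob", date(2012, 1, 5), {"bread"}),
    ("bob", date(2012, 1, 25), {"butter", "bread"}),
]


def bread_given_butter(rows):
    """The generalization: share of butter baskets that also contain bread."""
    butter = [items for _, _, items in rows if "butter" in items]
    return sum("bread" in items for items in butter) / len(butter)


def next_butter_purchase(rows, shopper):
    """The individual answer: estimate when one shopper buys butter next.

    Needs that shopper's complete history; returns None with fewer than
    two butter purchases, since there is no gap to measure.
    """
    days = sorted(d for who, d, items in rows
                  if who == shopper and "butter" in items)
    if len(days) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    return days[-1] + timedelta(days=round(sum(gaps) / len(gaps)))


print(bread_given_butter(receipts))           # what a poll can tell us
print(next_butter_purchase(receipts, "ann"))  # what ALL of ann's data tells us
```

The first function would work about as well on a sample.  The second one only works if every one of the shopper's receipts is in the database.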

The thing that I find so amazing about the article in today's paper is that the reference to artificial intelligence really waters down the whole movement.  It sounds like these awesome computer scientists have taken data sets that used to be too big to analyze and figured out how to generalize things about them.  Why would you ever want to do that?  The benefit in building a space ship is in the going to space, not in building a better space ride at the park!  We already generalize -- by polling.

Here are a few cool things I think could happen with big data:

  1. My personal dataset:  An ever growing database of everything I do, that I can analyze however I want.  All of my friends, activities, purchases, pictures, work output, healthcare, even my emotions... all in a format that I can use to figure things out.  I could figure out what activities lead me to do healthy things.  Sounds goofy I know, but my happiness could be mapped against the things I did or the stuff I bought.  Who knows what I could learn.
  2. My next hire:  What if LinkedIn could give me a list of the top 10 people I should hire.  Not people that matched job descriptions I posted, but analyze all of my employees, my competition, and all of the millions of people in LinkedIn -- and help me target the people that will change my business the most.
  3. My next vacation:  Take all of my travel history, every book I have read, my business travel schedule, my kids' interests (their books and experiences), and put that all together and give me a top ten list of places to go and maybe even which of my friends to invite.

Here are a few not so cool things that could happen with big data:

  1. My insurance gets cancelled right before I get diagnosed with something terrible.
  2. I get audited every year by a fully automated IRS.
  3. Telemarketers figure out what to say to keep me on the phone longer.

All up, I am a believer in big data -- no matter how everyone else defines it --  and I think it is going to be a great next ten years.