JCL Blog

I'll Take My Data Just Right

The amusing thing about the trade press, or any journalists really, is that they are all looking for the next scoop.  They want something new to say that no one else has said before so they can stand out from the crowd.  Strangely, these new things usually turn out to be just slight variations on the things everyone else has said -- thereby propelling the reader and the industry further in the direction they were already going.  Until, all of a sudden, someone breaks away from the lemmings and sends the herd back whence they came.

This is how we get a string of "economy is improving" stories, each with a unique spin, and then all of a sudden an "economy is not improving" story hits, sticks, and sends everyone back in the other direction.

We are seeing this right now in the data and analytics field.  For the last 5 years it has been all about Big Data.  I challenge anyone in tech to get through a day, even at this late date, without someone saying something about how amazing Big Data is and how Big Data is going to change everything.  Yes, it is nice to know that someone out there is collecting all of the data about everything (insert your favorite joke about a three-letter agency here), and that people are finding new and better ways to put that Big Data to work.

In the middle of last year, however, articles started appearing about Small Data, and how it was going to change everything.  By this summer it will all be about Small Data.  The articles are going to say that the industry is going back to Small Data because "not black" is white, "not up" is down, "not east" is west, and so "not Big Data" must be Small Data.  I propose that Small Data is not what came before Big Data.  Small Data is some other color, some other axis, and some other point on the compass that we have not seen before.  So to make it just a bit more natural, because in nature small almost never comes after big, let's call this next new thing Just Right Data.

This is what I mean by Just Right Data: 

  1. The Data I Care About:  Clearly, getting just the right data is what Goldilocks was thinking about when she said "just right".  Big data is awesome because it means that all of the data is being collected (instead of sampled, here is my post about sampling from 2012), making it possible for me to get all of the data I care about.  
  2. Properly Adjusted:  Not all data points are of equal value.  The ones that mean more to my analysis should be amplified.  In some cases the most recent data points are more valuable; in other cases clusters of data points are more valuable.
  3. Action Enabling:  We cannot lose track of the reason we analyze data -- to make better decisions.  We do not analyze data to create cool looking graphics.  We analyze data to enable better decision making.  Timing is the biggest part of this, but noise is also important.  No use getting great analysis after it is too late to use it, or mixed in with so much other stuff that it is impossible to absorb.

To illustrate, here is an example from the channel marketing industry:

Let's say we have 100,000 channel partners enrolled in our channel partner program.  We have their profiles, their certifications, their competencies and a bunch of other pre-big data stuff.  We add in the amount of sales they generated for us last year, another pre-big data element.  Now we add in the big data stuff:  every lead we have ever sent to every partner, the outcome of every lead, who from our company has worked with them, everything we know about each employee who works for each of the partners and their history, how much revenue was generated from each sale by each partner, each customer from each sale, and when each of these events happened.  Big data is indeed named accurately.

Now in comes a new lead and my Just Right Data experience begins.  At the start I get just the data I want to analyze (just the 5 partners that are in the right location and that have achieved sufficient status, for example), which is pretty much a pre-big data thing.  And I get all of the big data stuff that is relevant to those partners.  This is the data I care about.

Now I rank the partners by their status relative to the others, or the status of other leads already delivered, or the fine points of capabilities or personnel ratings.  This is the data properly adjusted.

Now the lead handoff to the selected partner (hopefully algorithmically selected, but manually works too) happens, and it must happen before the lead expires.  As we know from being customer focused and customers ourselves, leads are perishable and must be acted upon in a timely manner.  This is action enabling.
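
For readers who like to see this kind of thing in code, here is a rough Python sketch of the three steps in the example above.  It is only an illustration under some loud assumptions: the class names (Partner, Lead, LeadOutcome), the fields, the minimum status level, and the 180-day half-life used to favor recent lead outcomes are all made up for this post and do not come from any real partner system.  Recency weighting is just one of several ways the data could be "properly adjusted".

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


# All names and fields below are hypothetical, invented for illustration.
@dataclass
class LeadOutcome:
    closed_at: datetime
    won: bool


@dataclass
class Partner:
    name: str
    location: str
    status_level: int
    outcomes: List[LeadOutcome] = field(default_factory=list)


@dataclass
class Lead:
    location: str
    expires_at: datetime


def data_i_care_about(partners: List[Partner], lead: Lead,
                      min_status: int) -> List[Partner]:
    """Step 1: keep only partners in the right location with enough status."""
    return [p for p in partners
            if p.location == lead.location and p.status_level >= min_status]


def adjusted_score(partner: Partner, now: datetime,
                   half_life_days: float = 180.0) -> float:
    """Step 2: recency-weighted win rate; each outcome's weight halves
    every `half_life_days` (an assumed value, tune to taste)."""
    weighted_wins = 0.0
    total_weight = 0.0
    for outcome in partner.outcomes:
        age_days = (now - outcome.closed_at).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)
        total_weight += weight
        if outcome.won:
            weighted_wins += weight
    return weighted_wins / total_weight if total_weight else 0.0


def select_partner(partners: List[Partner], lead: Lead, now: datetime,
                   min_status: int = 2) -> Optional[Partner]:
    """Step 3: pick a partner only while the lead can still be acted on."""
    if now >= lead.expires_at:
        return None  # the lead has perished; analysis is no longer useful
    candidates = data_i_care_about(partners, lead, min_status)
    if not candidates:
        return None
    return max(candidates, key=lambda p: adjusted_score(p, now))
```

Calling select_partner with the full partner list, the new lead, and the current time returns the best-scoring eligible partner, or None if the lead has already expired or no partner qualifies.  The point is not the particular scoring formula but the order of operations:  narrow first, weight second, and act before the clock runs out.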

Thanks for staying awake to the end (unlike Goldilocks).  And thanks to the Big Data people who have set the stage for us to do Data Just Right.