Big Data to Smart Data: How to Structure Your Analysis


Big data is often seen as the enabler of continual competitive advantage.  But it’s no big surprise that big data can be unwieldy.  The challenge organizations often face is how to create an ongoing framework that truly unlocks the power of big data so it can continually drive strategic initiatives with confidence.  To help you convert big data to smart data, here are some recommendations and guidelines you can adapt with your own knowledge and industry experience.

Find out what Big Data can answer for you

  • Identify Challenges: A good starting point for converting big data to smart data is to first identify and evaluate the key challenges your organization is facing.  These challenges create a directional focus for evaluating which data assets are relevant and could add value toward solving the defined problem.

  • Identify key data assets: Once you know what you are trying to solve, the next step is to identify the data assets, internal or external to the organization, that have the potential to solve the problem at hand.

The key in any kind of data analysis is to identify potential relationships between data sets and look for cause-and-effect style correlations between two seemingly different assets.  An example would be an insight like “Every one-degree rise in temperature has a negative correlation with public transport usage.”  Once you have an insight like this, you can drill into the variables and deduce the cause; perhaps the expansion of train tracks on hot days has something to do with it.

To break it down, this insight would have been derived from weather and public transport data: two seemingly unrelated assets that might have an intrinsic correlation which gives you the edge you are trying to extract.
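In practice, a first pass at this kind of check could be as simple as the sketch below, which merges the two assets on their shared date field and computes a correlation.  The file and column names (daily_weather.csv, daily_ridership.csv, max_temp_c, trips) are hypothetical placeholders, not real data sources.

```python
# A minimal sketch of testing the weather/ridership insight described above.
# File names and column names are hypothetical placeholders.
import pandas as pd

# Load two seemingly unrelated data assets
weather = pd.read_csv("daily_weather.csv", parse_dates=["date"])   # columns: date, max_temp_c
transit = pd.read_csv("daily_ridership.csv", parse_dates=["date"])  # columns: date, trips

# Align them on the shared dimension (date) and measure the relationship
merged = weather.merge(transit, on="date", how="inner")
correlation = merged["max_temp_c"].corr(merged["trips"])  # Pearson correlation by default

print(f"Temperature vs. ridership correlation: {correlation:.2f}")
# A clearly negative value would support the insight and justify drilling
# into possible causes (e.g., heat-related rail restrictions).
```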

 

Data Lifecycle | DIKW

DIKW stands for DATA → INFORMATION → KNOWLEDGE → WISDOM, a basic but structured way of observing the transition from raw data to applied wisdom.  Every step in DIKW can get quite detailed, but for a high-level view we can stick to the following:

Data

This is where the assemblage of data assets occurs: you choose the candidate data sets and make a high-level selection of what might be relevant to the challenge you are trying to solve.  You should also think slightly outside the box and include correlating data sets that may not have an obvious relation to the problem.  Data at this stage is always considered raw.

 

Information

Information is the first step in making sense of the raw data.  These are the preliminary observations we can make from a high-level analysis of the data assets, normally performed only on the data assets chosen for the analysis.  What we ultimately try to achieve at this stage is “meaning”: a bit more context beyond the basic raw data.  For example, the raw data is “it is daytime,” and the information is “the sun rises at 5 am every morning, making it day.”

 

Knowledge

Information becomes, or qualifies as, knowledge when it becomes contextual.  This is where we move into the more actionable realm with the data we started with, or at a bare minimum start thinking about the action.  Continuing the example: “If I wake up at 5 am, I can utilize most of the day and get more usable time out of my typical day.”

 

Wisdom

This is the stage of derived action, where the action is in context with the original problem and has a good chance of leading to a favorable outcome.  The knowledge is that “trying to utilize the day by waking up early could make life more efficient,” so the deduced action, or applied wisdom, could be adopting an earlier sleep routine to increase the likelihood of capitalizing on the additional daytime gained by waking early.

Another interesting observation is that raw data and the information derived from it are normally historical facts.  As we progress through the knowledge and wisdom transitions, however, they become more relevant in the context of time as well: the present and the immediate future.

 

Data Integrity is Key

Common sense would have us ask why this step is not a prerequisite before diving into analysis of any sort.  The answer is simple: in the exploratory phase, where we are experimenting with different data sets and identifying potential insights, fully validating data accuracy takes a lot of time, since everything discussed here is in the context of big data.

The key is an agile approach where data validation is revisited as needed, as we evaluate which findings are worth pursuing and diving into deeply.  However, this should in no way be taken as undermining the criticality of this phase: you could have the smartest algorithms and the best statisticians coming up with breakthroughs, yet they are rendered unusable if the quality of the underlying data is compromised.
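As a rough illustration of what revisiting validation "as needed" might look like, the sketch below runs a few lightweight quality checks on a data asset before deeper analysis.  The file name, key column, and checks chosen are illustrative assumptions, not part of the article.

```python
# A lightweight sketch of the kind of integrity checks worth running once a
# data asset proves worth pursuing. File and column names are illustrative.
import pandas as pd

def basic_quality_report(df: pd.DataFrame, key_column: str) -> dict:
    """Summarize common data-quality problems before deeper analysis."""
    return {
        "rows": len(df),
        "duplicate_keys": int(df[key_column].duplicated().sum()),
        "missing_values_per_column": df.isna().sum().to_dict(),
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=True) <= 1],
    }

transit = pd.read_csv("daily_ridership.csv")
print(basic_quality_report(transit, key_column="date"))
```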

 

Make it Comprehensible

Until now, the focus has been on generating insights and making sense of the data that shapes the derived strategic initiatives.  One of the more important aspects of your strategy should be communicating the discovered insights and making them easy for the broader organization to consume.  This gives you the leverage to align the efforts of different areas of your organization around the insights that have been discovered.

This is why “analysis needs the analytics” to serve as the basis for competitive advantage.  A typical example from a digital marketing initiative is building a single view of the customer that is easily accessible and flexible enough to analyze the behavioral and attitudinal aspects of the customer profile.  That view can then fuel the creativity of the marketing and content teams with real-time, usable data and analytics capabilities.

 

Derive Actions with Awareness

At this stage, we expect to have proven confidence in the knowledge and wisdom generation capabilities of the data analysis structure that is in place.  However, the vital element, the secret sauce of the entire strategy, is combining your industry-specific expertise with the knowledge learned from analyzing the developed data assets.

As mentioned in the opening statement, big data is a very powerful enabler that can create tremendous value when applied in close conjunction with an understanding of the specific industry.  This ensures the initiative never loses the context in which the power of big data is unleashed.

Data is just a heap of information that can be leveraged to extract a lot of value, but it is purposeless without the right leadership and industry experience directing the efforts, just like “a loaded M16 without a trained Marine to pull the trigger” (to quote The Wolf of Wall Street).

 

Test and Scale

The final assessment of the data cycle is to test the hypothesis behind the analytically derived knowledge by actioning it in a real environment.  There are different testing methodologies, such as designing comparative A/B tests and then evaluating the results.  But no matter how you test, the first few tests should run on relatively small sets or more containable environments, so that some risk mitigation is built into your strategy and you keep the flexibility to revisit any of the previous stages as necessary.  If the results continue to be as favorable as expected, the scale can be increased in subsequent phases.
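For instance, a small pilot A/B test could be concluded with a simple two-sample comparison like the sketch below.  The metric files, the choice of test, and the 0.05 significance threshold are illustrative assumptions rather than a prescribed methodology.

```python
# A minimal sketch of evaluating a small-scale A/B test before scaling up.
# The metric files and significance threshold are illustrative choices.
import pandas as pd
from scipy import stats

# Per-customer outcome metric for each variant of the pilot
control = pd.read_csv("pilot_control.csv")["metric"]
variant = pd.read_csv("pilot_variant.csv")["metric"]

# Welch's two-sample t-test: is the difference between variants likely real?
t_stat, p_value = stats.ttest_ind(variant, control, equal_var=False)

print(f"Control mean: {control.mean():.3f}, variant mean: {variant.mean():.3f}")
print(f"p-value: {p_value:.4f}")
if p_value < 0.05 and variant.mean() > control.mean():
    print("Favorable result: consider scaling the change in the next phase.")
else:
    print("Inconclusive: revisit earlier stages before scaling.")
```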

A favorable outcome in these preliminary real-environment tests builds a sense of validity and confidence in the hypothesis originally derived from your analysis cycle.

 

Conclusion

Applied in the right agile way, DIKW can help you generate a great deal of expertise from correlations and internal data mining, and transition your big data to smart data.  This process can contribute invaluably to a deeper understanding and to formulating your strategic initiatives.

My key learning from following the DIKW methodology is not to waste time at the beginning trying to get the data perfectly organized and validated.  Instead, look for preliminary patterns, identify the correlations that support your organization's strategic goals, and find the data sources that add the most value.  Once you have identified those data sources, invest the time in integrating and validating them.  I hope you find this article useful in turning your big data into smart data.  Subscribe to our e-newsletter for more technical articles and updates delivered directly to your inbox.


Next Steps

If you have any questions or would like PMsquare to provide guidance and support for your analytics solution, contact us today.

Pankit Dhawan