Generally speaking, data analysis refers to the process of mining, inspecting, cleansing, and modeling data in order to reveal useful insights and information that would otherwise have been unobtainable. Most organizations that make analysis an integral part of their day-to-day operations use the conclusions suggested by the data to make informed business decisions. For this reason, it is imperative that the whole analysis process is done right. But what does “right” mean?
The whole purpose of reporting is to reveal accurate insights that can then be used to answer pressing questions, test hypotheses, and disprove theories. If your data analysis process is faulty, so too will your conclusions be. Unfortunately, this process looks different from person to person, and depending on who is doing the analyzing, it can encompass a variety of techniques and approaches. But which technique is correct?
For the purpose of this post, we’re going to focus on three main approaches to data analysis:
- Exploratory Data Analysis
- Confirmatory Data Analysis
- Grounded Theory
While no one technique is categorically “better” than the others, there are some best practices that each theory follows. Your organization can use these as a guide to ensure that you really make the most of your analysis and reporting efforts.
Exploratory Data Analysis
Before we delve too deeply into John Tukey’s theory of Exploratory Data Analysis (EDA), it’s important to note that exploratory data analysis and confirmatory data analysis complement each other, and it would not make sense to perform one method without the other. That said, during this stage of analysis, you would evaluate your data as a whole and look for clues and patterns, much like a detective would look at all of the evidence available to her and try to make sense of it. You may establish which questions to ask, how you’re going to frame them, and the best way to manipulate the information to draw out important insights.
As its name suggests, you’re exploring the information at hand for clues that hint at a bigger meaning. You’re also using visual representations, like dashboards, to tease out patterns from the data. These could include trends, unexpected results, and deviations from the model. What you find during this phase of EDA will help you establish the right questions to ask and, more importantly, which areas of the data deserve further exploration.
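To make that a little more concrete, here is a minimal sketch in Python of what visual exploration might look like. The file name sales.csv and the columns order_date, revenue, and region are hypothetical placeholders, not a prescribed schema.

```python
# A minimal visual-exploration sketch. "sales.csv", "order_date", "revenue",
# and "region" are hypothetical placeholders for your own data source.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv", parse_dates=["order_date"])

# Trends: aggregate revenue by month and plot it over time.
monthly = df.set_index("order_date")["revenue"].resample("M").sum()
monthly.plot(title="Monthly revenue")

# Deviations: a boxplot quickly surfaces outliers worth a closer look.
df.boxplot(column="revenue", by="region")
plt.show()
```

Even a couple of quick plots like these will often surface the unexpected results that tell you where to dig next.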
EDA relies on a number of techniques to get to that point of, “What’s next?” A typical journey might include the following pit stops (a rough sketch of a few of them in code follows the list):
- Establishing a data “structure,” since you can’t explore something that has no foundation.
- Establishing key variables.
- Looking for errors, missing data, and anomalies.
- Checking assumptions (assuming you’ve made some already) and testing hypotheses.
- Establishing confidence intervals and margins of error.
- Estimating parameters.
- Determining how best to explain the data with the fewest possible predictor variables.
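Here is a hedged sketch of a few of those pit stops in Python. The dataset and column names carry over from the hypothetical example above; this is one illustration under those assumptions, not a prescribed workflow.

```python
# A rough sketch of a few EDA pit stops; "sales.csv" and "revenue" are
# hypothetical names, not part of any particular product or schema.
import pandas as pd
from scipy import stats

df = pd.read_csv("sales.csv")

# Errors, missing data, and anomalies: count nulls and summarize each column.
print(df.isna().sum())
print(df.describe())

# Estimating parameters: the sample mean of revenue and its standard error.
revenue = df["revenue"].dropna()
mean = revenue.mean()
sem = stats.sem(revenue)

# Confidence intervals and margins of error: a 95% interval around that mean.
ci_low, ci_high = stats.t.interval(0.95, len(revenue) - 1, loc=mean, scale=sem)
print(f"95% CI for mean revenue: ({ci_low:.2f}, {ci_high:.2f})")

# Key variables: a correlation matrix hints at which predictors matter most.
print(df.corr(numeric_only=True))
```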
The last of those steps, explaining the data with the fewest possible predictor variables, is especially important, and it’s where CDA comes in.
Confirmatory Data Analysis
Next up: What is Confirmatory Data Analysis? Think of Confirmatory Data Analysis (CDA) as you would the discovery phase of a trial. The detective can put in all the hard work she wants to come up with her own hypotheses based on the information she has at hand, but she won’t make anything stick if she can’t prove that each piece of evidence is accurate. Data works in much the same way.
In order for data sets to be accurate, each piece of information within them must also be accurate. CDA focuses on traditional statistical tools such as confidence intervals, inference, and significance testing to evaluate the data and challenge any assumptions you made during EDA. Not only are you looking for bad data during this stage, but you are also looking for answers as to why anomalies present themselves as anomalies, and whether any deviation from the norm could simply be a coincidence.
CDA relies on the following techniques to be successful (a brief sketch follows the list):
- Regression analysis.
- Variance analysis (analysis of variance).
- Testing hypotheses.
- Developing estimates that meet a defined level of precision.
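As a rough illustration of the first three of those techniques, here is a minimal sketch using pandas, scipy, and statsmodels. The column names (revenue, ad_spend, region) and the 10,000 benchmark are invented for the example.

```python
# A minimal CDA sketch: regression, analysis of variance, and a hypothesis test.
# "sales.csv", "revenue", "ad_spend", and "region" are hypothetical names.
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

df = pd.read_csv("sales.csv")

# Regression analysis: does ad spend explain revenue, and how confidently?
model = smf.ols("revenue ~ ad_spend", data=df).fit()
print(model.summary())  # coefficients, confidence intervals, and p-values

# Variance analysis: one-way ANOVA comparing revenue across regions.
groups = [g["revenue"].dropna() for _, g in df.groupby("region")]
print(stats.f_oneway(*groups))

# Testing hypotheses: is mean revenue significantly different from a
# hypothetical benchmark of 10,000?
print(stats.ttest_1samp(df["revenue"].dropna(), popmean=10_000))
```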
Grounded Theory of Analysis
Grounded theory is a slightly more eccentric approach to data analysis in that it involves collecting and analyzing information at the same time. In utilizing this approach, analysts hope to reveal insights as they gather more and more data. For this to happen, analysis starts the moment data collection begins and continues until the research ends. Only once data miners believe that they have collected sufficient evidence will they move on to the next stage: building a report, or, in your case, a dashboard.
Once a sufficient amount of information has been collected, the second half of the Grounded Theory approach begins. That second stage looks something like this:
- An exhaustive analysis of the text is performed.
- The text is indexed and explored for recurring topics.
- Those topics are labeled according to their relevance.
- Eventually, a series of concepts or trends emerges that starts to explain the singularities seen in the study.
This process repeats itself as more data is gathered and analyzed. Along the way, certain concepts or anomalies will be removed or added, depending on how they fit in with the overall picture. If too many changes are made, the analyst may have to rethink their initial hypothesis and change it entirely.
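Real grounded-theory coding is a manual, interpretive process, but for a very crude sense of the loop described above, here is a hedged sketch. The sample comments and candidate themes are invented, and simple keyword counting is only a stand-in for genuine qualitative coding.

```python
# A crude stand-in for the grounded-theory loop: analyze text as it arrives,
# tally candidate themes, and keep only the concepts that keep recurring.
# The batches and themes below are invented for illustration.
from collections import Counter

theme_counts = Counter()

def code_batch(documents, themes):
    """Index a new batch of text and tally mentions of each candidate theme."""
    for doc in documents:
        text = doc.lower()
        for theme in themes:
            if theme in text:
                theme_counts[theme] += 1

candidate_themes = ["pricing", "support", "delivery", "quality"]
batches = [
    ["Support was slow, but the quality was great.",
     "Pricing felt fair; delivery took a week."],
    ["Quality issues again, and support never replied."],
]

for batch in batches:  # in practice this loop runs for as long as data arrives
    code_batch(batch, candidate_themes)
    # Concepts that keep recurring start to explain the picture; the rest are pruned.
    emerging = {t: n for t, n in theme_counts.items() if n >= 2}
    print("Emerging concepts so far:", emerging)
```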
As you may have already gathered, the grounded theory approach is not all that reliable. For our purposes at iDashboards, we prefer to use the EDA approach, with a little bit of help from CDA.
Why We Prefer the EDA Approach
In our experience, Grounded Theory is too unpredictable, and while CDA undoubtedly has its place, it would require far too much time to perform for businesses that have massive amounts of data, as most mid- to large-sized corporations do. When you’re dealing with enormous data pools, you need to be able to look at whole data sets and identify patterns, trends, and insights in real time. If it’s not in real time, you risk making key business decisions based on dated information, which is almost the same as relying on bad data.
As we mentioned above, EDA and CDA go hand in hand, and once you build your charts and graphs, or your “case,” so to speak, you need to be able to prove why the information and metrics you chose to cite are relevant and accurate. If this seems like a daunting task, you’re not alone. Fortunately, dashboards can take the legwork out of the sleuthing.
Dashboards give you a platform on which you can visualize and interrogate results (EDA), and they can also ingest new information as it flows in, automatically updating your model and results to present you with the most accurate, real-time information available. With dashboards, you can effectively cover all your bases of mining, presenting, and challenging your evidence so that you can reach genuinely insightful conclusions.
Barriers to Data Analysis
Of course, there are some barriers that you must overcome. In our most recent post, How to Spot and Stop Bad Data, we provide tips for negating the most common data reporting mistakes, which include:
- Confusing fact with opinion
- Approaching data reporting with confirmation bias
- Approaching data without a higher purpose
- Failing to appoint a data steward
Though common, these barriers aren’t so difficult to overcome. Identify trustworthy sources (consider where the information came from, where it was found, and who funded the research). Ask the right questions before you begin your detective work. Neutralize biases so that you’re not intentionally seeking out information that supports a certain theory, belief, or action. And appoint a data steward who can ensure the quality and credibility of your data. Do all of that and you’ll overcome these barriers and, as a bonus, knock out half of the requisite CDA detective work.
Of course, you don’t have to do all the hard work on your own. Dashboards were created with EDA in mind and can help you pool large amounts of data from multiple sources to build compelling, accurate, and insightful visuals that can lead to more informed decision-making.