Data Discovery

**Sample Data - Country Car Sales**

**Sample Data - Customer Response**

**Analytics means a set of techniques that give Structure to large amounts of Information to produce actionable Insights.** Structure comes from Charts, Graphs, Aggregations, Groupings, Inferences, Combinations, Trends, etc. Insights come from showing What happened, Why it happened, What can happen, and What should be done about it.

In the real world we usually encounter massive amounts of data which, in raw form, are not suitable for analysis. We first need to give them Structure and, most importantly, DISCOVER vital information. These discoveries are extremely useful by themselves and also prepare us for more advanced Analytics. In this section we apply **Descriptive Analytics** as one of the means to **Discover** data.

Please download the **Sample Data** and use the **Self-service Tool** to get vital clues from the raw data. **Descriptive Analytics refers to Numerical and Graphical Techniques**. We will use **Summary Statistics** and **Query** as two Numerical Techniques under the Descriptive Analytics option.

Please use the **Summary Statistics Option**: select any Numerical variable and the Statistics of your choice (Mean, Median, Variance, etc.). This summarizes a huge amount of data by means of a handful of numerical measures.
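As a rough illustration of what such a summary computes, here is a minimal Python sketch using only the standard library. The `assets` values are hypothetical and are not taken from the Sample Data, and this is not how the Self-service Tool itself is implemented.

```python
import statistics

# Hypothetical values for a numerical variable such as "Asset";
# these are NOT taken from the sample data.
assets = [12000, 8500, 15000, 9700, 30000, 11000]

# A handful of numerical measures that summarize the whole column.
summary = {
    "count": len(assets),
    "mean": statistics.mean(assets),
    "median": statistics.median(assets),
    "variance": statistics.variance(assets),  # sample variance (n - 1 denominator)
    "min": min(assets),
    "max": max(assets),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note how the single large value (30000) pulls the mean above the median, which is one of the clues a summary like this can reveal.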

Please use the **Query Option** to filter out a subset of the data based on certain conditions. Use the drop-down menu to generate the subset.

The options can be read as a language:

* Select __Fields__ (Var1, Var2, ...) where __Field__ (Var3, Var4, ...) meets __Condition__ (>, <, =, ...) __Value__ (Number or Category).

Hence we can query "Age, Income, Asset, House" where "Asset" > 10000.

In fact, it is indeed a language, and the de facto standard for querying data: **Structured Query Language (SQL)**. If you have not heard of this term, it is worth getting familiar with it, as it is one of the most important tools in Data Discovery.
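To make the connection concrete, the example query above can be written as an actual SQL statement. The sketch below runs it against an in-memory SQLite database from Python. The table name `customers` and all row values are assumptions made for illustration; only the column names and the `Asset > 10000` condition come from the example.

```python
import sqlite3

# Hypothetical rows: column names follow the example query,
# but the values are made up for illustration.
rows = [
    (34, 52000, 8000, "Rent"),
    (45, 88000, 25000, "Own"),
    (29, 41000, 12000, "Rent"),
    (61, 67000, 9500, "Own"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (Age INTEGER, Income INTEGER, Asset INTEGER, House TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)

# "Select Fields where Field meets Condition Value" expressed in SQL.
query = "SELECT Age, Income, Asset, House FROM customers WHERE Asset > ?"
result = conn.execute(query, (10000,)).fetchall()
conn.close()

for row in result:
    print(row)
```

Only the rows whose `Asset` exceeds 10000 are returned. Passing the threshold as a `?` parameter rather than pasting it into the string is the usual way to keep values separate from the query text.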

We can use **Histogram, Pie Chart, Scatter Plot, Bubble Chart**, etc. for **Graphical Analytics**. Histogram and Pie Chart are Univariate Plots, used for Continuous and Categorical data respectively. The following pictures show a Histogram of Assets and a Pie Chart of Response for the sample data.
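The numbers behind these two plots are simple to compute: a Histogram counts how many continuous values fall into each bin, and a Pie Chart shows each category's share of the total. The following is a minimal sketch with hypothetical `assets` and `responses` values and an arbitrarily chosen bin width of 5000; none of these come from the Sample Data.

```python
from collections import Counter

# Hypothetical values; NOT taken from the sample data.
assets = [2500, 7400, 12000, 8100, 15500, 9900, 21000, 4300]
responses = ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No"]

# Histogram: count continuous values falling into equal-width bins.
bin_width = 5000  # assumed bin width, chosen only for illustration
histogram = Counter((value // bin_width) * bin_width for value in assets)
for lower in sorted(histogram):
    print(f"{lower}-{lower + bin_width - 1}: {'#' * histogram[lower]}")

# Pie chart: share of each category in the total.
counts = Counter(responses)
shares = {category: count / len(responses) for category, count in counts.items()}
for category, share in shares.items():
    print(f"{category}: {share:.1%}")
```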

Column, Scatter and Bubble Charts are Multivariate Graphical Techniques. A Column Chart shows a Numerical Measure for given Dimensions (Categories). The following Column Chart shows the Average of Assets across different Response Groups. The Scatter Chart shows Income vs. Spending (or the lack of a pattern, in this case). Please try these and other Data Discovery techniques on the other data set.
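The aggregation behind a Column Chart such as Average of Assets by Response can be sketched as a group-and-average step. The `records` below are hypothetical and are not taken from the Sample Data; the sketch only shows how one value per category is derived.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (Response, Asset) pairs; NOT taken from the sample data.
records = [
    ("Yes", 12000), ("No", 8000), ("Yes", 18000),
    ("No", 10000), ("No", 6000),
]

# Group the Numerical Measure (Asset) by the Dimension (Response).
groups = defaultdict(list)
for response, asset in records:
    groups[response].append(asset)

# One average per category: the height of each column in the chart.
average_assets = {response: mean(values) for response, values in groups.items()}
for response, average in average_assets.items():
    print(f"{response}: {average}")
```

In SQL terms this corresponds to a `GROUP BY` on the Dimension with an `AVG` over the Measure.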