Data Science is hot. Ad-tech is hot. Mar-tech is hot. Underneath each of these is data and the analytics that drive discovery. Although the popularity of the data field is new, the dimensions that support the field are not. In this post we will summarize the Five V’s of Data and some of the tools used to tease out insights.
The Five V’s are Velocity, Validity, Volume, Variability and Voom.
Velocity. Nothing is new under the sun. But we now have more data than ever before, and it is coming at us at an ever-increasing rate. Given how fast data and our ability to analyze it are growing, businesses of all sizes will be using some form of data analytics to impact their business within the next five years.
Validity. Are you seeing what you think you are seeing? For many centuries, people believed the sun revolved around the earth. That is what they saw, and they had no good reason to dispute their limited data set. Then along came Copernicus, Galileo and the heliocentric theory. This rocked the boat a bit, and Galileo spent the final years of his life under house arrest for defending the heliocentric view his observations supported. In our data world, the manner in which data is gathered is central to this question. Bias, whether intentional or unintentional, can easily distort a data set to the point where analytical results are flat out wrong. En garde!
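To make the bias point concrete, here is a minimal Python sketch of selection bias using made-up numbers. It assumes a hypothetical population of 100 order values (1 through 100) and an assumed collection process that only captures values of 50 or more; the function name `biased_sample` and the threshold are illustrative, not taken from any real survey.

```python
from statistics import mean

# Hypothetical population: 100 customers with order values 1..100.
population = list(range(1, 101))


def biased_sample(values, threshold=50):
    """Keep only the values a biased collection process would capture.

    Assumption for illustration: only customers who spent at least
    `threshold` ever answer the survey (selection bias).
    """
    return [v for v in values if v >= threshold]


true_mean = mean(population)
observed = biased_sample(population)
observed_mean = mean(observed)

print(f"true mean:     {true_mean}")      # true mean:     50.5
print(f"observed mean: {observed_mean}")  # observed mean: 75
```

The arithmetic is fine in both cases; the error comes entirely from how the data was gathered, which is why validity has to be checked before any analysis starts.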
Some call it data cleansing, some call it data hygiene; we call it Voom. Many moons ago the useful adage “garbage in, garbage out” was popular. It should still be popular: clean your data before you start working with it. It reminds me of Mark Twain’s great line regarding the press and publishing, usually quoted as “Get your facts first, and then you can distort them as you please.”
In practice, that means normalizing your distributions, identifying outliers and discarding them only when they are genuine errors, verifying nulls, making certain time periods are consistent, and using common formats for phone numbers, addresses, SIC codes, patient record details, ANSI codes and so forth.
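As a rough illustration of a few of these steps, the following Python sketch uses only the standard library to count nulls, put phone numbers and dates into one consistent format, and flag outliers with Tukey's IQR fences. The records, field names, accepted date formats and the US-style 10-digit phone rule are all hypothetical assumptions chosen for the example. It flags outliers rather than deleting them so a person can decide whether each one is an error, and it does not attempt distribution normalization, addresses, SIC or ANSI codes.

```python
from datetime import datetime
from statistics import quantiles

# Hypothetical raw records; field names and values are illustrative only.
RAW_RECORDS = [
    {"id": 1, "phone": "(555) 123-4567", "date": "2023-01-15", "amount": 120.0},
    {"id": 2, "phone": "555.987.6543", "date": "01/16/2023", "amount": 95.5},
    {"id": 3, "phone": "1-555-222-3333", "date": "17 Jan 2023", "amount": 110.0},
    {"id": 4, "phone": "", "date": "2023-01-18", "amount": None},
    {"id": 5, "phone": "555-000-111", "date": "2023-01-19", "amount": 9800.0},
    {"id": 6, "phone": "555 444 5555", "date": "2023-01-20", "amount": 105.0},
    {"id": 7, "phone": None, "date": "", "amount": 101.0},
]

# Assumed input date formats; "%m/%d/%Y" assumes US month-first ordering.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def is_null(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def count_nulls(records):
    """Count missing values per field so gaps are visible before analysis."""
    counts = {}
    for rec in records:
        for field, value in rec.items():
            counts[field] = counts.get(field, 0) + int(is_null(value))
    return counts


def normalize_phone(raw):
    """Return 'AAA-EEE-NNNN', or None if invalid (assumes US 10-digit numbers)."""
    if is_null(raw):
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def normalize_date(raw):
    """Parse any of DATE_FORMATS and return an ISO 8601 date string, or None."""
    if is_null(raw):
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def flag_outliers(values, k=1.5):
    """Return values outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def clean(records):
    """Normalize phones and dates; flag (not delete) outlying amounts."""
    amounts = [r["amount"] for r in records if not is_null(r["amount"])]
    outliers = set(flag_outliers(amounts))
    cleaned = []
    for rec in records:
        cleaned.append({
            "id": rec["id"],
            "phone": normalize_phone(rec["phone"]),
            "date": normalize_date(rec["date"]),
            "amount": rec["amount"],
            "amount_outlier": rec["amount"] in outliers,
        })
    return cleaned


if __name__ == "__main__":
    print(count_nulls(RAW_RECORDS))
    for row in clean(RAW_RECORDS):
        print(row)
```

The IQR rule is used here instead of a z-score because a single extreme value inflates the standard deviation in small samples and can hide itself; either way, the threshold `k` is a judgment call, not a law.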
Volume. It is almost impossible to overstate the volume of data pouring into every area of modern culture. Our fascination with, and need to ingest, digest and search for, more data is not a flash in the pan; data is now an integral part of daily life. We have created more data in the past two years than in the entire previous history of the human race. More significant is the growth rate: we now create 1.7 megabytes of new information each second for every human being on Earth. Within five years there will be 50 billion smart connected devices.
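The per-person figure is easier to grasp when scaled up. This back-of-envelope Python sketch takes the 1.7 MB per second per person stated above, assumes a round world population of 8 billion (an assumption, not a figure from this post) and decimal units (1 MB = 10^6 bytes), and works out the daily totals.

```python
# Figure from the paragraph above: 1.7 MB of new data per person per second.
MB_PER_PERSON_PER_SECOND = 1.7
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumption (not from the post): a round world population of 8 billion.
WORLD_POPULATION = 8_000_000_000


def daily_volume_bytes(mb_per_second=MB_PER_PERSON_PER_SECOND,
                       population=WORLD_POPULATION):
    """Bytes created worldwide per day, using decimal units (1 MB = 10**6 bytes)."""
    per_person = mb_per_second * 10**6 * SECONDS_PER_DAY
    return per_person * population


per_person_gb = MB_PER_PERSON_PER_SECOND * SECONDS_PER_DAY / 1000
total_zb = daily_volume_bytes() / 10**21  # 1 ZB = 10**21 bytes

print(f"per person per day: {per_person_gb:.1f} GB")  # per person per day: 146.9 GB
print(f"worldwide per day:  {total_zb:.2f} ZB")        # worldwide per day:  1.18 ZB
```

Under those assumptions, that works out to roughly 147 GB per person per day and a little over one zettabyte per day worldwide.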
Variability. The data we generate today is highly variable. We get voice, movies, search queries, photos, tweets, posts, transactions, dollar values, unstructured data, distance values, maps, drive times, responses to promotions, responses to page layouts, color schemes and fonts, highly structured data, short- and long-time-frame data, longitudinal data, and the list goes on. There is no one-size-fits-all formula for analyzing it.