The nature of big data suggests a world of possibility, and businesses are applying information derived from big data in ways that are redefining their relationships with their customers and helping them maximize profits. But businesses are lacking a big-data language they can understand. A technology-based language doesn’t help most ‘normal,’ non-technical, business folks ‘get it.’
All businesses can or do generate huge amounts of data. Most do not have a real plan for what to do with that data. If you don’t understand what big data means, how can you apply big-data-generated information to your business? How will you know which big data tools will make a difference?
That’s where we hope to be of service. We wrap a language and a business perspective around the world of big data to help you develop big-data business strategies.
We start with some simple labels to define big data.
Big, Unstructured and Real-time
People usually define big data by three qualities: it’s big; it can come from unstructured sources; you can use it in real-time.
Big. Big data is … big. The-mind-can’t-comprehend-it big. Every two days, we now generate an amount of data equivalent to all the data in the Library of Congress before 2003. In just one minute, 639,800 GB of global IP data are transferred. In that same minute:
135 botnet infections occur
6 new Wikipedia articles are published
20 people become new victims of identity theft
204 million emails are sent
1300 new mobile users are added
47,000 applications are downloaded
$83,000 in sales take place on Amazon
61,141 hours of music are streamed on Pandora
100 new accounts are created on LinkedIn
3000 images are uploaded to Flickr (20 million are viewed)
320 new Twitter accounts are created (100,000 tweets are sent)
277,000 people log in to Facebook (6 million view a Facebook page)
2 million search queries are entered into Google
30 hours of video are uploaded to YouTube (1.3 million videos are viewed)
As soon as you read these numbers, they are out of date; the staggering volume of digital data we are creating is growing at a phenomenal rate.
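To get a feel for the scale, here is a minimal sketch that converts the per-minute traffic figure quoted above into a daily total. It assumes the rate stays constant over the day and uses decimal units (1 PB = 1,000,000 GB); the variable and function names are ours, chosen only for illustration.

```python
# Figure quoted in the text: global IP data transferred per minute, in gigabytes.
GB_PER_MINUTE = 639_800

MINUTES_PER_DAY = 60 * 24
GB_PER_PETABYTE = 1_000_000  # assumption: decimal units, 1 PB = 1,000,000 GB


def gb_per_day(gb_per_minute: int) -> int:
    """Scale a per-minute volume to a full day, assuming a constant rate."""
    return gb_per_minute * MINUTES_PER_DAY


daily_gb = gb_per_day(GB_PER_MINUTE)
daily_pb = daily_gb / GB_PER_PETABYTE

print(f"{daily_gb:,} GB per day (about {daily_pb:.0f} PB)")
```

Under those assumptions, the per-minute figure works out to roughly 921 petabytes a day.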
When you have massive amounts of data coming from multiple channels and across silos, how do you use it? This is an important business question: what you need to know determines which data sets you relate to each other and which formats present the information so that you can understand it quickly and act on it.
Unstructured. The variety of data available to us is impressive. About ten percent of it comes from structured data sources and can be processed by conventional means. Ninety percent comes from unstructured sources, and processing it requires new technologies. As digital technologies for data mining and text analysis develop, we increasingly use data sources that aren’t well suited to relational database formats.
Structured data can come from sources inside or outside a company: utilities, government agencies, GPS-enabled devices, radio-frequency identification (RFID) chips, site search and website clicks.
Unstructured data comes from videos, audio files (for example, telephone conversations and audio recordings of presentations), images, texts, tweets, Facebook posts and more.
One of the most powerful aspects of big data is its ability to accomplish the equivalent of comparing apples and oranges without having to normalize them as fruit across datasets.
Real-time. Ideally, some online transfers should take place almost instantaneously. If someone just used your credit card illegally, you hope your bank discovers that immediately, not at the end of the month. The speed at which you can distill useful information from your data and execute a decision confers agility. An agile business has the advantage of being able to respond immediately, in real-time, as opportunities present themselves. Big data technology makes this possible.
IBM, a leader in creating and providing big data technology, uses Volume for big, Variety for unstructured and Velocity for real-time, meaning the speed at which the technology can stream data into the business so the business can use the data immediately. These are great definitions, but they are aimed at a more technologically literate audience. Business people do not use this language, so we substitute more intuitive terms.
IBM includes Veracity—the accuracy of the data—as an important big data quality. While this is a critical concern, from our point of view, accuracy is an inherent problem associated with all data, not a distinguishing feature of big data.
“Big Data is really about new uses and new insights, not so much the data itself,” says Rod A. Smith, an IBM technical fellow and VP for emerging internet technologies.
Businesses require the technical expertise of data scientists to extract, manage and refine information from data sources. Technical considerations usually fall outside our purview.
We care about how business people make sense of the information big data generates and how you can apply big data information to your business decisions.