How is Big Data Different From Previous Data?

The three qualities that distinguish big data from all previous information-producing methods (big, unstructured and real-time) suggest a world of possibility. Businesses are creating and applying big data solutions in unprecedented ways that not only help them maximize profits, but also redefine their relationships with their customers.

Why get so excited? Is big data really so different from the kind of large-scale data processing that came before?

Sampling versus All the Data

Until recently, businesses analyzed massive amounts of data using statistical sampling techniques. This produced subsets of data the business could then analyze to infer information and make predictions based on those results. With today’s big data toolset, companies can analyze massive volumes of data all at once.

Big data not only puts businesses in the attractive position of being able to use massive quantities of data without sampling; it also renders previous information structures obsolete and requires new perspectives and technologies for turning data into information.
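
As a rough illustration of the difference between sampling and analyzing everything, the following Python sketch estimates an average order value from a 1% random sample and then computes it over the full dataset. The order values are synthetic, and the names (`orders`, `sample_mean`, `full_mean`) are hypothetical; this is a minimal sketch under those assumptions, not a depiction of any particular big data tool.

```python
import random
import statistics

# Synthetic data: 100,000 hypothetical order values (an assumption for illustration).
rng = random.Random(42)
orders = [round(rng.uniform(5.0, 500.0), 2) for _ in range(100_000)]


def sample_mean(values, fraction, rng):
    """Estimate the mean from a simple random sample of the given fraction."""
    k = max(1, int(len(values) * fraction))
    return statistics.fmean(rng.sample(values, k))


def full_mean(values):
    """Compute the mean over every record, with no sampling."""
    return statistics.fmean(values)


estimate = sample_mean(orders, 0.01, rng)
exact = full_mean(orders)

print(f"1% sample estimate: {estimate:.2f}")
print(f"Full-data value:    {exact:.2f}")
print(f"Sampling error:     {abs(estimate - exact):.2f}")
```

The sample gives an estimate with some error attached; the full pass gives the actual figure. On a laptop-sized list the difference is trivial, but the same contrast is what the text describes at business scale.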

Agility

More often than not, companies have collected data and analyzed it later. Today, data analysis can take place in real time. Businesses that can respond in real time have the huge advantage of agility. An agile organization can act immediately on information to expand the business value of its data, no matter the speed at which it comes into the business.

There is always a chance that a dataset is “dirty” (that is, it contains inaccurate data). This is not new; it has long been one of the issues associated with data collection. As a result, businesses have always needed to understand the importance of listening to the data: not just the data providing good information, but also the data generating more problems than it solves. With big data technologies, an agile business can evaluate and correct potentially costly errors quickly, in real time.

Data Sources

Prior to the big data era, businesses were constrained to using only structured data sources. These sources generate data suitable for relational databases. With big data technology, a business can start tapping into the huge pool of unstructured data now becoming available: video files, audio files, images, texts, tweets, Facebook posts and other formats yet to be created.

Costs

Data processing on a massive scale used to be cost-prohibitive for most businesses. Big data can now become the great equalizer for businesses large and small. The cost structure and growing availability of big data solutions make them accessible to a greater number of businesses. As the technology continues to develop, big data can offer a solution for a Global 5 million, not just a Fortune 100.

Business Structures

Big data does not change how we need to apply information from new data sources to long-standing business issues, but it does give companies a far more finely tuned edge for honing business intelligence. This is the arena in which business efforts to apply the information make all the difference.

Because big data is new and requires business cultures to rethink how their organizations operate, businesses that want to benefit from big data solutions must have an organized way to integrate all areas of data usage and information application (e.g., conversion rate marketing) across channels and silos. Accomplishing this through experimentation, creative development and application requires buy-in from senior managers and C-level officers.

New Possibilities

Big data creates new possibilities for how we process data to generate useful information. Previously, analytics was a way of discovering information that already fit into the framework of how companies processed what they collected. Big data changes everything.

Explaining how it views big data, IBM writes, “Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach. Until now, there was no practical way to harvest this opportunity.”

What is Big Data?

The nature of big data suggests a world of possibility, and businesses are applying information derived from big data in ways that are redefining their relationships with their customers and helping them maximize profits. But businesses are lacking a big-data language they can understand. A technology-based language doesn’t help most ‘normal,’ non-technical, business folks ‘get it.’

All businesses can or do generate huge amounts of data. Most do not have a real plan for what to do with that data. If you don’t understand what big data means, how can you apply big-data-generated information to your business? How will you know which big data tools will make a difference?

That’s where we hope to be of service. We wrap a language and a business perspective around the world of big data to help you develop big-data business strategies.

We start with some simple labels to define big data.

Big, Unstructured and Real-time

People usually define big data by three qualities: it’s big; it can come from unstructured sources; you can use it in real-time.

Big. Big data is … big. The-mind-can’t-comprehend-it big. Every two days, we now generate an amount of data equivalent to all the data in the Library of Congress before 2003. In just one minute, 639,800 GB of global IP data are transferred:

135 botnet infections
6 new Wikipedia articles are published
20 new victims of identity theft
204 million emails are sent
1,300 new mobile users are added
47,000 applications are downloaded
$83,000 in sales take place on Amazon
61,141 hours of music are streamed on Pandora
100 new accounts are created on LinkedIn
3,000 images are uploaded to Flickr (20 million are viewed)
320 new Twitter accounts are created (100,000 tweets are sent)
277,000 people log in to Facebook (6 million view a Facebook page)
2 million search queries are entered into Google
30 hours of video are uploaded to YouTube (1.3 million videos are viewed)

As soon as you read these numbers, they are out of date; the staggering volume of digital data we are creating is growing at a phenomenal rate.
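
To put the per-minute figures above in perspective, a back-of-the-envelope calculation can scale them to a full day. The Python sketch below uses the 639,800 GB and 204 million emails per minute quoted above. It assumes, purely for illustration, that the rate stays constant all day and that 1 PB equals 1,000,000 GB (decimal units).

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes

# Per-minute figures quoted in the text above.
gb_per_minute = 639_800
emails_per_minute = 204_000_000

# Assumptions: constant rate all day; decimal units (1 PB = 1,000,000 GB).
gb_per_day = gb_per_minute * MINUTES_PER_DAY
pb_per_day = gb_per_day / 1_000_000
emails_per_day = emails_per_minute * MINUTES_PER_DAY

print(f"{gb_per_day:,} GB per day (about {pb_per_day:.0f} PB)")
print(f"{emails_per_day:,} emails per day")
```

Under those assumptions, the quoted rates work out to roughly 921 PB of IP traffic and about 294 billion emails per day.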

When you have massive amounts of data coming from multiple channels and across silos, how do you use it? This is an important business question: what you need to know influences which data sets you relate to each other and the best formats for presenting the information so that you can understand it quickly and act on it.

Unstructured. The variety of data available to us is impressive. About ten percent of it comes from structured data sources and can be processed by conventional means. Ninety percent comes from unstructured sources, and processing it requires new technologies. As digital technologies for data mining and text analysis develop, we increasingly use these data sources that aren’t well-suited to relational database formats.

Structured data comes primarily from sources inside or outside a company, such as utilities, government agencies, GPS-enabled devices, radio-frequency identification (RFID) chips, site search and website clicks.

Unstructured data comes from videos, audio files (for example, telephone conversations and audio recordings of presentations), images, texts, tweets, Facebook posts and more.

One of the most powerful aspects of big data is its ability to accomplish the equivalent of comparing apples and oranges without having to normalize them as fruit across datasets.

Real-time. Ideally, some online transfers should take place almost instantaneously. If someone just used your credit card illegally, you hope your bank discovers that immediately, not at the end of the month. The speed at which you can distill useful information from your data and execute a decision confers agility. An agile business has the advantage of being able to respond immediately, in real-time, as opportunities present themselves. Big data technology makes this possible.

IBM, a leader in creating and providing big data technology, describes big as Volume, unstructured as Variety and real-time as Velocity, the speed at which the technology can stream data into the business so the business can use the data immediately. These are great definitions; they are just aimed at a more technologically literate audience. Business people do not use this language, so we substitute more intuitive terms.

IBM includes Veracity—the accuracy of the data—as an important big data quality. While this is a critical concern, from our point of view, accuracy is an inherent problem associated with all data, not a distinguishing feature of big data.

“Big Data is really about new uses and new insights, not so much the data itself,” says Rod A. Smith, an IBM technical fellow and VP for emerging internet technologies.

Businesses require the technical expertise of data scientists to extract, manage and refine information from data sources. Technical considerations usually fall outside our purview.

We care about how business people make sense of the information big data generates and how you can apply big data information to your business decisions.