Team Data Science

The Fix Is In

photo: JD Hancock

Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

So say the opening sentences of the “Big data” article in Wikipedia. The people primarily responsible for conquering those challenges are data scientists. Being a data scientist these days is rather like being a Renaissance person; one must possess knowledge of and competency in a wide variety of subjects related directly and indirectly to the fields of mathematics and science.

Fortunately, data science has a number of sub-specialties to share the load. Understanding–defining–who does what (capturing, curating, storing, searching, sharing, transferring, analyzing, visualizing, processing) and why they do it means companies building data science teams can intelligently choose the areas of specialization that will best serve their goals.

Five Roles You Need on Your Big Data Team

Of course there’s the data scientist, the coveted knight in shining armor who visualizes models and creates (and continuously optimizes) sophisticated algorithms to transform data into something useful. But she could not do her part to fulfill corporate expectations without the support of the equally coveted:

  • Data hygienists, who deal with the “dirty data” problems inherent in collecting data so the data is clean now and stays clean in the future.
  • Data explorers, who burrow into all the data a company collects to determine what, if anything, can be done with it, including how data originally collected for a different reason might be repurposed.
  • Business solution architects, who structure and organize data so it’s properly updated and where it needs to be within the necessary timeframe of every query–a critical feature of today’s data science when queries are ‘answered’ in real-time.
  • Campaign experts, who, with an in-depth understanding of both the technology and marketing, can turn the knowledge derived from data into insight and then into advice.

Assembling a powerful data science team, whether that team is internal or third-party, is necessary for applying big data tools. However, success rests as heavily on the right corporate culture as it does on the right specialized people. The best solution? Welcome reevaluation, innovation, and experimentation, and keep the focus on the end game.

Scarce and growing scarcer

Being able to use the knowledge derived from data, and to achieve the insights to which data can lead, is the centerpiece of a marketer’s requirements in the big data era. But before you can acquire knowledge, you must understand the data itself: how its patterns fit together and suggest other patterns, and how to work with it to produce useful, meaningful knowledge. Enter data scientists.

Employment opportunities for data scientists are growing. They will continue to grow, and some institutions are putting educational programs in place to help meet future demand. However, projections suggest the demand for data scientists will soon exceed their availability. A compelling graphic synthesizes the problem.

Top skill set Requirements to be a Data Scientist

Data scientists aren’t data analysts. While the two roles may start with a grounding in scientific and mathematical skills, a data scientist is far more a “Renaissance individual who really wants to learn and bring change to an organization,” says Anjul Bhambhri of IBM. About a data scientist’s skill set, Mark van Rijmenam writes,

They need to have statistical, mathematical, predictive modelling as well as business strategy skills to build the algorithms necessary to ask the right questions and find the right answers. They also need to be able to communicate their findings, orally and visually. They need to understand how the products are developed and even more important, as big data touches the privacy of consumers, they need to have a set of ethical responsibilities.

Often, related fields of study pair with a breadth of programming, managing, processing, and curating skills to shape the qualities of individuals who will guide a business’s effective use of data. Van Rijmenam suggests an ideal data scientist would have the following skills.

  • Strong written and verbal communication skills;
  • Being able to work in a fast-paced multidisciplinary environment as in a competitive landscape new data keeps flowing in rapidly and the world is constantly changing;
  • Having the ability to query databases and perform statistical analysis;
  • Being able to develop or program databases;
  • Being able to advise senior management in clear language about the implications of their work for the organisation;
  • Having an, at least basic, understanding of how a business and strategy works;
  • Being able to create examples, prototypes, demonstrations to help management better understand the work;
  • Having a good understanding of design and architecture principles.

We would add that, while an effective data scientist requires latitude to consider and experiment (work autonomously), she must also be able to work cooperatively. Data scientists are members of teams that aren’t simply made up of senior leaders. Plenty of other employees work in the trenches and have ideas about which situations require solutions and how those solutions would fit into the goals of other departments. Failures in cooperation and communication can lead to costly disasters.

Few data scientists are likely to possess all the above qualities, so a business should prioritize the ones most important to it.

In planning to apply new technologies, businesses must also plan for how they will apportion responsibilities for critical data science needs: through third-party applications, data-science-specific internal departments, or perhaps both. At present, we are gazing at the tip of the big-data, data-scientist iceberg. Demand for big data solutions is increasing. So is the demand for the innovators behind the solutions.