Big data technologies are maturing to a point at which more organizations are prepared to pilot and adopt big data as a core component of the information management and analytics infrastructure. Big data, as a compendium of emerging disruptive tools and technologies, is positioned as the next great step in enabling integrated analytics in many common business scenarios.
As big data wends its inexorable way into the enterprise, information technology (IT) practitioners and business sponsors alike will bump up against a number of challenges that must be addressed before any big data program can be successful. Five of those challenges are:
The Uncertainty of the Data Management Landscape
One disruptive facet of big data is the use of a variety of innovative data management frameworks designed to support both operational and, to a greater extent, analytical processing. These approaches are generally lumped into a category referred to as NoSQL (that is, “not only SQL”) frameworks. They differ from the conventional relational database management system paradigm in storage model and data access methodology, and they are largely designed to meet the performance demands of big data applications (such as managing massive amounts of data and delivering rapid response times).
There are many competing technologies, and within each technical area, there are numerous rivals. Our first challenge is making the best choices while not introducing additional unknowns and risk to big data adoption.
The Big Data Talent Gap
The excitement around big data applications seems to imply that there is a broad community of experts available to help in implementation. However, this is not yet the case, and the talent gap poses our second challenge.
And the talent gap is real. Consider these statistics: According to analyst firm McKinsey & Company, “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”2 And in a report from 2012, “Gartner analysts predicted that by 2015, 4.4 million IT jobs globally will be created to support big data with 1.9 million of those jobs in the United States. … However, while the jobs will be created, there is no assurance that there will be employees to fill those positions.”
Getting Data into the Big Data Platform
The scale and variety of data to be absorbed into a big data environment can overwhelm the unprepared data practitioner, making data accessibility and integration our third challenge.
This actually implies two challenges for any organization starting a big data program. The first involves both cataloging the numerous data source types expected to be incorporated into the analytical framework and ensuring that there are methods for universal data accessibility, while the second is to understand the performance expectations and ensure that the tools and infrastructure can handle the volume transfers in a timely manner.
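The second of these questions, whether the infrastructure can move the expected volumes in time, can be roughed out with simple arithmetic before any tooling is chosen. The following is a minimal sketch, assuming each source can be summarized by a daily volume and a sustained transfer rate; the `DataSource` fields, the decimal unit conversion, and the idea of a fixed load window are illustrative assumptions, not part of any particular platform.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DataSource:
    """Hypothetical catalog entry for one source feeding the platform."""
    name: str
    daily_volume_gb: float  # data produced per day, in gigabytes
    link_mbps: float        # sustained transfer rate, in megabits per second


def transfer_hours(source: DataSource) -> float:
    """Hours needed to move one day's volume at the sustained rate."""
    # Decimal units: 1 GB = 8,000 megabits.
    megabits = source.daily_volume_gb * 8 * 1000
    seconds = megabits / source.link_mbps
    return seconds / 3600


def sources_exceeding_window(
    sources: Iterable[DataSource], window_hours: float
) -> List[str]:
    """Names of sources whose daily transfer will not fit in the load window."""
    return [s.name for s in sources if transfer_hours(s) > window_hours]
```

For example, a source producing 450 GB per day over a sustained 1,000 Mbps link needs about one hour per day, so it fits a four-hour load window, while the same volume over a 100 Mbps link needs about ten hours and does not. Real transfers also pay for protocol overhead, contention, and transformation time, so an estimate like this is a lower bound.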
Synchronization across the Data Sources
As more data sets from diverse sources are incorporated into an analytical platform, the potential for time lags to impact data currency and consistency becomes our fourth challenge.
Once you have figured out how to get data into the big data platform, you begin to realize that data copies migrated from different sources on different schedules and at different rates can rapidly get out of synchronization with the originating systems. There are different aspects of synchrony. From a data currency perspective, synchrony implies that the data coming from one source is not out of date with data coming from another source. From a semantics perspective, synchronization implies a commonality of data concepts, definitions, metadata, and the like.
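The data currency aspect can be illustrated with a small check. This is a minimal sketch, assuming the platform records a last-load timestamp for each source; the function name `stale_sources`, the source names, and the choice to measure lag against the most recently loaded source are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_sources(
    last_loaded: Dict[str, datetime], max_lag: timedelta
) -> List[str]:
    """Return sources whose copy lags the freshest source by more than max_lag."""
    if not last_loaded:
        return []
    # Use the most recently loaded source as the reference point.
    newest = max(last_loaded.values())
    return sorted(
        name for name, loaded_at in last_loaded.items()
        if newest - loaded_at > max_lag
    )
```

Called with hypothetical load times of 12:00 for a CRM extract, 11:30 for billing, and 06:00 for web logs, and a tolerated lag of one hour, the function reports only the web logs as out of step. A check like this addresses currency only; semantic synchronization (shared definitions and metadata) has to be handled separately.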
Getting Useful Information out of the Big Data Platform
Lastly, using big data for different purposes ranging from storage augmentation to enabling high-performance analytics is impeded if the information cannot be adequately provisioned back within the other components of the enterprise information architecture, making big data syndication our fifth challenge.
Most of the practical use cases for big data involve data availability: augmenting existing data storage as well as providing access to end users employing business intelligence (BI) tools for the purpose of data discovery. These BI tools must not only be able to connect to one or more big data platforms, but must also provide transparency to the data consumers to reduce or eliminate the need for custom coding. At the same time, as the number of data consumers grows, we can anticipate a need to support a rapidly expanding number of simultaneous user accesses. That demand may spike at different times of the day or in reaction to different aspects of business process cycles. Ensuring right-time data availability to the community of data consumers becomes a critical success factor.
Considering the business impacts of these challenges suggests some serious risks to successfully deploying a big data program.