Scope of testing in Big Data

The year 2016 has been described by many research analysts and industry observers as the year of Big Data. It is believed that those who are already leveraging big data are sure to surge ahead, while those who have not considered it will lag behind in the digital world. IDC predicts that revenue from the sales of big data and business analytics applications, tools, and services will increase more than 50%, from nearly $122 billion in 2015 to more than $187 billion in 2019. The analyst firm breaks down revenue by technology, industry, and geography in its Worldwide Semiannual Big Data and Analytics Spending Guide (2015).
According to another recent study, 76% of organizations in the US are planning to increase or maintain their investments in Big Data implementation over the next 2-3 years. Data mounts every second from social networks, mobile devices, CRM applications, and so on, providing organizations with highly valuable inputs: hidden patterns in the data that can immensely help organizations shape their success. The volume of this data is sometimes expected to reach zettabytes. Such high volumes of data from various sources must be processed at a speed that is relevant to the organization and presented to the respective users via applications.
As in the case of many other applications, QA has an important role in Big Data applications as well. Testing of Big Data is more about verification of data than about testing individual features of an application. When it comes to testing an enterprise Big Data system, there are a few initial hiccups that need to be addressed.
Since data is fetched from different sources, it needs to be integrated and standardized to become useful. This calls for end-to-end testing of the data sources to make sure the user data is clean. Testers should check that data sampling and data-processing techniques are correctly connected, and that the application has no scalability issues. Only a thoroughly tested application should go to live deployment.
It is of utmost importance for testers working on Big Data applications to understand that testing a Big Data application itself generates data. While testing a Big Data system, the tester needs to dig deep into unstructured or semi-structured data with changing schemas, as these systems cannot be tested with "sampling" the way DWH applications are. Big Data applications have very large data sets, and testing has to be done with an R&D approach. This is not an easy task for every tester to handle.
Testing Big Data systems requires testers to verify large volumes of data from various sources using a clustering method. The data needs to be processed systematically, in real time or in batches. Quality checks of the data become critically important: accuracy, duplication, validity, consistency, and so on. Based on the different areas of testing, we can categorize testing of enterprise applications into three buckets:
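To make the quality checks above concrete, here is a minimal sketch in Python. The record layout (an `id` field and an `amount` field) is a hypothetical example, not from any particular system:

```python
# Illustrative data-quality checks on a batch of records.
# The field names ("id", "amount") are hypothetical examples.

def check_quality(records):
    """Return a summary of quality issues found in a batch of records."""
    issues = {"duplicates": 0, "missing_id": 0, "invalid_amount": 0}
    seen_ids = set()
    for rec in records:
        rec_id = rec.get("id")
        if rec_id is None:                 # validity: the key field must exist
            issues["missing_id"] += 1
            continue
        if rec_id in seen_ids:             # duplication check
            issues["duplicates"] += 1
        seen_ids.add(rec_id)
        amount = rec.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            issues["invalid_amount"] += 1  # consistency: amounts must be non-negative numbers
    return issues

batch = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 5.0},    # duplicate id
    {"id": 2, "amount": -3.0},   # invalid amount
    {"amount": 7.0},             # missing id
]
print(check_quality(batch))  # {'duplicates': 1, 'missing_id': 1, 'invalid_amount': 1}
```

In a real cluster such checks would run per partition (for example inside a MapReduce or Spark job) rather than in a single loop, but the rules themselves stay the same.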
Data Validation:
Data validation makes sure that the right data is gathered from the right sources. The data is then pushed into the Big Data system and compared with the source data to make sure it matches the new system and has been pushed into the right location.
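A common way to perform this source-to-target comparison is to reconcile row counts and an order-independent checksum. A minimal sketch, where `record_checksum` and `reconcile` are hypothetical helpers:

```python
import hashlib

def record_checksum(records):
    """Order-independent checksum over a collection of records."""
    digest = 0
    for rec in records:
        line = "|".join(str(rec[k]) for k in sorted(rec))
        # XOR the per-record hashes so row order does not matter.
        digest ^= int(hashlib.sha256(line.encode()).hexdigest(), 16)
    return digest

def reconcile(source, target):
    """Compare row counts and checksums between the source and the target system."""
    return len(source) == len(target) and record_checksum(source) == record_checksum(target)

source = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Mumbai"}]
target = [{"id": 2, "city": "Mumbai"}, {"id": 1, "city": "Pune"}]  # same rows, different order
print(reconcile(source, target))  # True
```

The XOR-of-hashes trick makes the check insensitive to the row ordering that distributed loads typically scramble, while still catching a changed, missing, or extra record.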
Business Logic Validation:
During business logic validation, the tester needs to verify the business logic on a single node and then verify it again across multiple nodes. This is repeated to make sure the data segregation and data aggregation rules work correctly and key values are generated accurately.
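One way to sketch this node-by-node verification: apply the aggregation rule to each node's partition, combine the per-node results, and check that they match the rule applied to the full data set. This is an illustrative sketch, assuming the rule under test is a per-key sum:

```python
from collections import defaultdict

def aggregate(records):
    """Sum amounts per key (the segregation/aggregation rule under test)."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)

def verify_across_nodes(node_partitions):
    """Verify the rule on each node, then check the merged result matches a global run."""
    # Global run over all records from all nodes.
    merged = aggregate([rec for part in node_partitions for rec in part])
    # Per-node runs, combined.
    combined = defaultdict(float)
    for part in node_partitions:
        for key, total in aggregate(part).items():
            combined[key] += total
    return merged == dict(combined)

nodes = [
    [("A", 1.0), ("B", 2.0)],   # node 1's partition
    [("A", 3.0), ("C", 4.0)],   # node 2's partition
]
print(verify_across_nodes(nodes))  # True
```

A mismatch between the combined per-node results and the global run would indicate that the segregation or aggregation rule is not behaving consistently when distributed.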
Output Validation:
During output validation, the output data files are generated and then moved to the target system or DWH. The tester then checks data integrity to make sure the data has loaded successfully into the target system, and checks for any data corruption.
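A simple integrity check at this stage is to compare file sizes and checksums before and after the move. A minimal sketch, using temporary files to stand in for the real output and target locations:

```python
import hashlib
import os
import tempfile

def file_sha256(path):
    """Checksum a file to detect corruption during the move to the target system."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def validate_load(source_path, target_path):
    """Verify the output file arrived intact: same size and same checksum."""
    return (os.path.getsize(source_path) == os.path.getsize(target_path)
            and file_sha256(source_path) == file_sha256(target_path))

# Demonstration with temporary files standing in for the real output and target.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "output.csv")
    dst = os.path.join(d, "loaded.csv")
    data = b"id,amount\n1,10.0\n2,5.0\n"
    for p in (src, dst):
        with open(p, "wb") as f:
            f.write(data)
    print(validate_load(src, dst))  # True
```

For database or DWH targets the same idea applies with row counts and column-level checksums instead of file hashes.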
From all the predictions discussed so far, it is clearly evident that Big Data systems hold big promise in today's business environment. However, to unlock their full potential, testers have to employ the right strategies, improve test quality, and identify bugs at early stages. It is a tedious effort, but systematic and successful execution pays fairly large dividends.