The Four V’s of Big Data Testing: Variety, Volume, Velocity, and Veracity
The expression “garbage in, garbage out” emphasizes the need for thorough testing in any Big Data and analytics implementation. Big Data testing means ensuring the correctness and completeness of voluminous, often heterogeneous, data as it moves through the stages of ingestion, storage, analytics, and visualization to produce actionable insights. What should our testing focus be? Which of the 4 V’s (variety, volume, velocity, and veracity) matters most at which stage? For example, in the ingestion stage, testing needs to focus on the variety of data rather than its volume. As the data moves on to the storage stage, testing needs to focus on veracity rather than velocity. Jaya Bhallamudi presents a unique approach for analyzing a typical Big Data implementation architecture to identify its testing interfaces and highlight the specific V’s that should be the focus of testing at each one. The focus is determined by the context of the data flow (the type of source from which the data originates and the type of target to which it moves) and the context of the data (source data format, target data format, and the business, filter, and transformation rules applied to the data), which are then mapped to different testing strategies. Take back testing strategies and a test automation approach that align with the 4 V’s of Big Data testing.