Test data management (TDM) refers to the function that creates, manages, and delivers test data to application teams. Historically, application teams have manufactured data for development and testing in a siloed, unstructured fashion.

Current State of Test Data Management

In today’s digital era, every company must bring high-quality applications to market at an increasingly competitive pace. While companies have adopted agile and DevOps methodologies in pursuit of this goal, many have significantly underinvested in test data, which has emerged as a constraint in the race to innovate.

The TDM market has shifted to a new set of strategies, largely driven by an increased focus on application uptime, faster time-to-market, and lower costs. TDM is rapidly maturing alongside other IT initiatives such as DevOps and cloud. Once viewed as a back-office function, TDM has become a critical business enabler for enterprise agility, security, and cost efficiency.

As the volume of application projects increases, many large IT organizations are recognizing the opportunity to gain economies of scale by consolidating TDM functions into a single group or department. Consolidation lets them take advantage of innovative tools to create test data and operate much more efficiently than siloed, decentralized, and unstructured TDM teams. As centralization has begun to yield large efficiency gains, the scope of TDM has expanded to include subsetting and synthetic data generation and, most recently, masking to protect sensitive production data.

Common Test Data Challenges

Application development teams need fast, reliable test data for their projects, but many are constrained by the speed, quality, security, and costs of moving data across software development lifecycle (SDLC) environments. Below are the most common challenges that organizations face when managing test data.
Test environment provisioning is a slow, manual, and high-touch process

Most IT organizations rely on a request-fulfill model, in which developers and testers find their requests queuing behind others. Because it takes significant time and effort to create a copy of test data, it can take days or even weeks to provision updated data for a test environment. Often, the time to turn around a new environment is directly correlated with how many people are involved in the process. Enterprises typically have four or more administrators involved in setting up and provisioning data for a non-production environment. Not only does this process strain operations teams, it also creates time sinks during test cycles, slowing the pace of application delivery.

Development teams lack high-fidelity data

Development teams often lack access to test data that is fit for purpose. For example, depending on the release version being tested, a developer might require a data set as of a specific point in time. But all too often, he or she is forced to work with a stale copy of data because refreshing an environment is complex. The result is lost productivity, as developers spend time resolving data-related issues, and a greater risk of data-related defects escaping into production.

Data masking adds friction to release cycles

For many applications, such as those processing credit card numbers, patient records, or other sensitive information, data masking is critical to ensuring regulatory compliance and protecting against data breaches. According to the Ponemon Institute, the cost of a data breach, including the costs of remediation, customer churn, and other losses, averages $3.92 million. However, masking sensitive data often adds operational overhead. An end-to-end masking process may take an entire week, which can prolong test cycles.
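Masking must be consistent if masked data is to remain usable. When a card number appears in several tables, every occurrence has to map to the same fictitious value, or joins on that column break. The following is a minimal sketch of deterministic masking, with several assumptions:

- The sensitive value is a payment card number.
- A keyed hash (HMAC-SHA256) is acceptable for deriving replacement digits. This is a swapped-in technique, not the algorithm of any particular masking product.
- The function names and the example key are hypothetical.

The output keeps the original digit count and a valid Luhn check digit, so application-side format checks still pass.

```python
import hashlib
import hmac


def luhn_check_digit(payload: str) -> str:
    """Return the Luhn check digit for a string of digits that lacks one."""
    total = 0
    # Walk right to left. The rightmost payload digit is doubled, then every
    # second digit after it (their positions once the check digit is appended).
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def is_luhn_valid(number: str) -> bool:
    """True if the last digit of `number` is its correct Luhn check digit."""
    return len(number) >= 2 and luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(card: str, key: bytes) -> str:
    """Deterministically replace a card number with a fictitious, Luhn-valid one.

    The same (card, key) pair always yields the same output, so a card number
    stored in several tables is masked identically everywhere and joins still work.
    Hypothetical helper for illustration; the key must never be available in
    test environments, because anyone holding it could brute-force the mapping.
    """
    digits = "".join(ch for ch in card if ch.isdigit())  # drop spaces and dashes
    digest = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
    if not 2 <= len(digits) <= len(digest) + 1:
        raise ValueError("card number length out of supported range")
    # Map each digest byte to a decimal digit (slightly biased, acceptable for a
    # sketch) and keep the original length minus one, leaving room for the check digit.
    body = "".join(str(b % 10) for b in digest)[: len(digits) - 1]
    return body + luhn_check_digit(body)
```

Real masking tools often preserve the issuer prefix or force values into a test-only range. This sketch randomizes every digit instead, so its only guarantees are format, Luhn validity, and consistency across tables.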
Storage costs are continually on the rise

IT organizations create multiple, redundant copies of test data, resulting in inefficient use of storage. To meet concurrent demands within the confines of storage capacity, operations teams must coordinate test data availability across multiple teams, applications, and release versions. As a result, development teams often contend for limited, shared environments. Critical application projects then have to run one after another instead of in parallel.

Common Types of Test Data

No single technology fulfills all TDM requirements. Rather, teams must build an integrated solution that provides all the data types required to meet a diverse set of testing needs. Once test data requirements have been identified, a successful TDM approach should aim to provide the appropriate types of test data, weighing the pros and cons of each.

Production data provides the most complete test coverage, but it usually comes at the expense of agility and storage costs. For some applications, it can also mean exposing sensitive data.

Subsets of production data are significantly more agile than full copies. They can provide some savings on hardware, CPU, and licensing costs, but sufficient test coverage can be difficult to achieve.

Masked production data (either full sets or subsets) makes it possible for development teams to use real data without introducing unsafe levels of risk. However, masking processes can lengthen environment provisioning. Masking also requires staging environments with additional storage, and staff to ensure referential integrity after data is transformed.

Synthetic data circumvents security issues, but the space savings are limited. Synthetic data might be required to test new features, but such tests account for only a relatively small percentage of test cases.
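A subset is only useful if it stays referentially intact. Keeping a customer while dropping the orders that point to it, or keeping orders whose customer was dropped, produces data the application cannot use. The following is a minimal sketch of the technique, assuming a hypothetical three-table schema (`customers`, `orders`, `order_items`) held in memory as lists of dicts. It filters a driving table first, then follows foreign keys from each parent table to its children so that every kept child row points at a kept parent.

```python
from typing import Callable

Row = dict
Tables = dict[str, list[Row]]
# (child_table, foreign_key_column, parent_table, primary_key_column)
Relation = tuple[str, str, str, str]


def subset(tables: Tables, relations: list[Relation], root: str,
           keep: Callable[[Row], bool]) -> Tables:
    """Return a referentially consistent subset driven by one root table.

    `relations` must be ordered so every parent table is processed before its
    children. Each child keeps only rows whose foreign key points at a kept parent.
    """
    result: Tables = {root: [row for row in tables[root] if keep(row)]}
    for child, fk, parent, pk in relations:
        kept_keys = {row[pk] for row in result.get(parent, [])}
        result[child] = [row for row in tables[child] if row[fk] in kept_keys]
    return result


# Hypothetical sample data standing in for production tables.
tables = {
    "customers": [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"},
                  {"id": 3, "region": "EU"}],
    "orders": [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 2},
               {"id": 12, "customer_id": 3}, {"id": 13, "customer_id": 3}],
    "order_items": [{"id": 100, "order_id": 10}, {"id": 101, "order_id": 11},
                    {"id": 102, "order_id": 13}],
}
relations = [("orders", "customer_id", "customers", "id"),
             ("order_items", "order_id", "orders", "id")]

# Keep only EU customers: orders 10, 12, 13 and items 100, 102 follow them.
eu = subset(tables, relations, "customers", lambda c: c["region"] == "EU")
```

Following keys only from parent to child is a simplification. Kept rows may also reference lookup tables (for example, a products table), and the referenced parent rows would need to be pulled in as well. Cyclic relationships need more than a single ordered pass.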
If performed manually, creating test data is also prone to human error. It requires an in-depth understanding of data relationships, both those defined in the database schema or file system and those implicit in the data itself.

Best Practices for Test Data Management: How to Effectively Prepare Your Test Data

A comprehensive approach should seek to improve TDM in each of the following areas:
Data delivery

Creating a copy of production data for development or testing is often a time-consuming, labor-intensive process that usually lags demand. Organizations must build a solution that streamlines this process and creates a path towards fast, repeatable data delivery. Specifically, application team leaders should look for solutions that feature:
Data quality

Operations teams go to great lengths to make the right types of test data, such as masked production data or synthetic datasets, available to software development teams. As TDM teams balance requirements for different types of test data, they must also ensure data quality is preserved across three key dimensions:
Data security

Masking tools have emerged as an effective and reliable method of protecting test data. By irreversibly replacing sensitive data with fictitious yet realistic values, masking can ensure regulatory compliance and neutralize the risk of a data breach in test environments. But to make masking practical and effective, organizations should consider the following requirements:
Infrastructure costs

With the rapid proliferation of test data, TDM teams must build a toolset that uses infrastructure resources as efficiently as possible. Specifically, a TDM toolset should meet the following criteria:
The Modern Approach to Test Data Management

By building a Data Platform for TDM, companies can transform how they manage and consume data. IT operations teams can mask and deliver data one hundred times faster while using one-tenth of the space. The net result? More projects can be completed in less time using less infrastructure.
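The space figures above do not say how the savings arise. One common mechanism behind storage-efficient copies is block sharing with copy-on-write. Each copy holds pointers to shared blocks and consumes new storage only for the blocks it changes.

The following is an illustration of that general technique, not a description of how Delphix or any other product implements it. It assumes an in-memory store, SHA-256 content addressing, and an arbitrary 4 KiB block size. The names `BlockStore`, `VirtualCopy`, and `compare_storage` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; an arbitrary block size chosen for illustration


class BlockStore:
    """Content-addressed store: each distinct block is kept exactly once.

    This sketch never frees blocks that are no longer referenced.
    """

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        block_id = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(block_id, data)
        return block_id

    def used_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


class VirtualCopy:
    """A copy that shares blocks with its source until it writes to them."""

    def __init__(self, store: BlockStore, block_ids: list[str]) -> None:
        self.store = store
        self.block_ids = list(block_ids)  # copies pointers, not data

    def read_block(self, index: int) -> bytes:
        return self.store.blocks[self.block_ids[index]]

    def write_block(self, index: int, data: bytes) -> None:
        # Copy-on-write: only the changed block consumes new storage;
        # the source and other copies keep pointing at the original block.
        self.block_ids[index] = self.store.put(data)


def compare_storage(num_blocks: int, num_copies: int,
                    writes_per_copy: int) -> tuple[int, int]:
    """Return (bytes for full physical copies, bytes for shared virtual copies)."""
    store = BlockStore()
    # Source data set made of distinct blocks.
    source = [store.put(i.to_bytes(4, "big") * (BLOCK_SIZE // 4))
              for i in range(num_blocks)]
    copies = [VirtualCopy(store, source) for _ in range(num_copies)]
    for c, vc in enumerate(copies):
        for j in range(writes_per_copy):
            vc.write_block(j, f"copy{c}-block{j}".encode().ljust(BLOCK_SIZE, b"\0"))
    full_copies = (num_copies + 1) * num_blocks * BLOCK_SIZE  # source + each copy in full
    return full_copies, store.used_bytes()
```

Consider 100 source blocks and 10 copies that each change 2 blocks. Full physical copies need 11 × 100 blocks, while the shared layout stores 100 + 20 = 120 blocks. The actual savings depend entirely on how far each copy diverges from its source.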
Explore how Delphix can help you test faster with greater confidence.