Tamer Salman
How does an insurance company, financial institution, healthcare organization, or government body obtain personal and confidential data to develop and test new software? The challenges of managing personal and confidential data are enormous, especially under increasingly stringent data privacy regulations. Some data is private and confidential, some may have been redesigned and transformed, and some may not exist at all. Typically, project leaders or database administrators set up a separate environment for development and testing. The big challenge is how to populate it with data.
Drawing on expertise in constraint satisfaction and automatic test generation, IBM researchers in Haifa developed the Data Fabrication Platform (DFP), a solution that efficiently creates high-quality test data while eliminating potential data security and privacy concerns. The platform is already helping a large insurance company revamp its processes around test data.
Generating masses of personal (but fabricated) data
In most situations, generating the mass of data needed involves in-house scripting, simple data-generation techniques, manual insertions and updates, and a lot of masking and data scrubbing. Even after the test data is ready, requirements can change during development, rendering the current data useless and forcing a repeat of some of those steps. The result is a tedious, costly, and time-consuming process that doesn't necessarily deliver results.
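To make the traditional approach concrete, here is a minimal sketch of the kind of ad-hoc masking script teams often write by hand. The function name and fields are illustrative, not from DFP; it replaces sensitive values with irreversible hashed placeholders, which preserves privacy but destroys the statistical and relational properties that realistic testing needs.

```python
import hashlib

def mask_record(record, sensitive_fields):
    """Replace sensitive values with irreversible hashed placeholders."""
    masked = dict(record)
    for field in sensitive_fields:
        if field in masked:
            digest = hashlib.sha256(str(masked[field]).encode()).hexdigest()[:8]
            masked[field] = f"{field}_{digest}"
    return masked

customer = {"name": "Jane Doe", "ssn": "123-45-6789", "balance": 1042.50}
print(mask_record(customer, ["name", "ssn"]))
```

Scripts like this must be maintained per table and rerun whenever source data or requirements change, which is exactly the overhead fabricated data avoids.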
To accommodate distributed and outsourced development and testing, our client needed test data that would not be susceptible to leaks or breaches in security and privacy. They also needed the ability to transform and evolve the data as business needs changed. DFP addresses this by allowing rule sharing and migration; it also minimizes test-data generation effort, eliminates security and privacy concerns, and supports early development and regression testing.
Data rules
The logic of what's needed in these secure, confidential instances can be described using rules that define the relationships between different columns in your databases, resources for populating new data columns, or transformations from archived data. DFP lets companies enter these rules into the system and get the needed data as output. The platform consumes the provided rules and generates the requested data, which can be automatically inserted into the target databases or exported in a variety of formats, such as XML, CSV, and DML files.
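The DFP rule language itself isn't shown here, but the idea of column rules with dependencies can be sketched in a few lines of Python. The rule names and generators below are hypothetical: each column's generator may read columns generated earlier in the same row, and the fabricated rows can then be exported as CSV.

```python
import csv, io, random

# Hypothetical column rules (illustrative only, not the DFP rule syntax):
# each generator may depend on columns produced earlier in the same row.
rules = {
    "policy_type": lambda row: random.choice(["auto", "home", "life"]),
    "premium":     lambda row: {"auto": 800, "home": 1200, "life": 450}[row["policy_type"]]
                               + random.randint(0, 200),
    "deductible":  lambda row: 0 if row["policy_type"] == "life"
                               else random.choice([250, 500, 1000]),
}

def fabricate_rows(rules, n):
    """Generate n rows, evaluating rules in order so dependencies resolve."""
    rows = []
    for _ in range(n):
        row = {}
        for column, generator in rules.items():
            row[column] = generator(row)
        rows.append(row)
    return rows

def to_csv(rows):
    """Serialize fabricated rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(fabricate_rows(rules, 3)))
```

Because the rules, not the data, are the artifact that teams share and maintain, changing a requirement means editing one rule and regenerating, rather than re-scrubbing production data.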
At the heart of DFP lies a powerful Constraint Satisfaction Problem (CSP) solver, also developed in Haifa. A CSP typically involves so many possibilities that straightforward algorithms cannot solve it within an acceptable amount of time. A form of artificial intelligence, IBM's CSP solver tackles these complex problems using its ability to arrive at many more buildable solutions than traditional optimization approaches. The solver provides accelerated solutions and helps eliminate errors by generating only data that is valid for the specific requirements.
In summary, the IBM Data Fabrication Platform is an easy-to-use technology that allows rule sharing and migration, minimizes test-data generation effort, eliminates security and privacy concerns, and makes it easier for companies to outsource development and testing.

Labels: constraint satisfaction, CSP solver, data fabrication platform, IBM Research - Haifa, testing