How to use data pipeline to realize test modernization
2022-07-26 11:39:00 【51CTO】

Enterprises need to understand how data synthesis and data pipelines can provide scalable solutions for creating consistent data that meets the real needs of the systems under test.
Many enterprises today are drowning in data. They collect data from many sources and try to find ways to use it to advance business goals. One way to tackle this is to use data pipelines as the connections to data sources, transforming the data along the pipeline into a form the endpoint can consume.
While this is part of every enterprise's ongoing struggle to wrangle its data, there is always a need to provide good data sets for testing. Enterprises need these data sets to test applications and systems across the entire architectural landscape. They also need data sets that target specific concerns, such as security and quality assurance.
Creating synthetic data is a very practical need. In short, it means finding a way to create fictitious or fake data. Enterprises want to create consistent data that resembles the actual requirements of the systems under test. The following sections look at data pipelines and explore how enterprises can use them to start creating their own synthetic data for testing.
PART 01 Data pipelines and testing
A very simple definition of a data pipeline is "a series of data processing elements, where the output of one element is the input to the next." Put more simply, these are the basic connections used to bring data back from a source so it can be analyzed, transformed, and then used by the enterprise.
Data pipelines start by retrieving data. They can use programmable interfaces such as application programming interfaces (APIs), or stream and event processing interfaces, to extract the required data from SQL databases and other platforms.
Once the data has been retrieved, it can be transformed to meet the needs of end users. This can be done through data generation APIs, by cleaning up or restructuring the retrieved data, and finally, for security reasons, the data can be anonymized before it is presented to end users.
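To make that flow concrete, here is a minimal sketch of such a pipeline in Python. It assumes a hypothetical customers table in a local SQLite copy of production data; the table, file names, and anonymization scheme are illustrative only and not tied to any specific platform.

```python
import hashlib
import json
import sqlite3

def extract(db_path: str):
    """Pull raw customer rows from a SQL source (here a local SQLite file)."""
    conn = sqlite3.connect(db_path)
    try:
        yield from conn.execute("SELECT name, email, zip_code FROM customers")
    finally:
        conn.close()

def anonymize(rows):
    """Replace sensitive fields with stable, non-reversible tokens."""
    for _name, email, zip_code in rows:
        token = hashlib.sha256(email.encode()).hexdigest()[:12]
        yield {"name": f"user_{token}", "email": f"{token}@example.test", "zip_code": zip_code}

def load(records, out_path: str):
    """Write the transformed records to the location the test team consumes."""
    with open(out_path, "w") as f:
        json.dump(list(records), f, indent=2)

# The output of each element is the input of the next -- the pipeline definition above.
load(anonymize(extract("prod_copy.db")), "test_dataset.json")
```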
These are just a few examples of how data pipelines can be used as part of the testing process. Figure 1 is a simple example of a data pipeline from a source to its final data warehouse location for further use.

Figure 1
Testing requires enterprises to provide data sets to the system, application, or code snippet under test. These data sets can be created manually, copied from existing data sets, or generated for use by the test team.
Creating test data manually can be useful when working with very small data sets, but it becomes very cumbersome when large data sets are needed. If the data contains sensitive elements, copying data sets from an existing (production-to-test) environment raises security and privacy issues. Generating data based on existing data can produce good results.
So what should an enterprise do if it wants to generate data at scale, address security by producing anonymized results, and stay flexible about what gets generated? This is where data synthesis plays an important role. It lets enterprises generate data with whatever flexibility they may need.
PART 02 Data synthesis for beginners
Generating synthetic data can provide large volumes of data while handling sensitive data elements. Synthetic data can be based on key data dimensions such as names, addresses, phone numbers, account numbers, social security numbers, credit cards, identifiers, driver's license numbers, and so on.
Synthetic data is defined as fake or created data, but it is usually based on real data and is used to build larger, more realistic data sets for testing. The generated test data is then provided to business users and developers across the enterprise in a secure and scalable way.
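As an illustration only (the article does not name a specific tool), the open-source Python Faker library can generate records along the key dimensions listed above:

```python
from faker import Faker

fake = Faker()

def synthetic_customer() -> dict:
    """Build one fictitious record covering the key data dimensions."""
    return {
        "name": fake.name(),
        "address": fake.address(),
        "phone": fake.phone_number(),
        "ssn": fake.ssn(),
        "credit_card": fake.credit_card_number(),
        "identifier": fake.uuid4(),
    }

# Generate a batch of test records at whatever scale the test team needs.
test_data = [synthetic_customer() for _ in range(1000)]
print(test_data[0])
```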
Synthetic data has a wide range of uses in any enterprise, for example in healthcare, finance, manufacturing, and any other area that uses new technology to meet business needs. Its most direct uses are in continuous testing, security, and quality assurance practices, supporting implementation, application development, integration, and data science work.
Through data synthesis, enterprises can not only provide data sets at scale but also ensure data consistency across multiple domains, while delivering usable data in real-world formats. It gives developers, architects, and data architects a consistent way to test with data across the enterprise.
PART 03 Getting started with data synthesis
The best way to see how data synthesis can benefit an enterprise is to explore the most common usage patterns and then dive into an open source project to start gaining experience. There are two simple patterns for getting started with data synthesis: in a cloud-native environment and with cloud-native APIs, as shown in Figure 2.

Figure 2
The first pattern runs the data synthesis platform in a single container on the enterprise's chosen cloud platform and uses APIs to extract data from sources in containers (for example, applications or databases). The second deploys the data synthesis platform on the chosen cloud platform and uses cloud-native APIs to extract data from any source (for example, an external standalone data source).
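A minimal sketch of the second pattern, assuming a hypothetical REST endpoint exposed by an external data source (the URL and fields are placeholders, not taken from the article):

```python
import requests

SOURCE_URL = "https://data-source.example.com/api/customers"  # hypothetical endpoint

def pull_source_records(limit: int = 100) -> list[dict]:
    """Use a cloud-native HTTP API to extract records the synthesis platform will model."""
    resp = requests.get(SOURCE_URL, params={"limit": limit}, timeout=30)
    resp.raise_for_status()
    return resp.json()

# The extracted records become the structural template for generating synthetic data.
records = pull_source_records()
```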
Data synthesis shines in the following use cases:
- Retrieving required data within the platform (SQL)
- Data retrieval (API)
- Data generation (API)
- Building more fictitious data on demand or on a schedule
- Data construction
- Building more structured data on demand or on a schedule, based on fictitious data
- Creating structured or unstructured data that meets specific needs
- Data centered on the streaming industry
- Using data pipelines to process various industry-standard data
- Providing real-world attributes by parsing and populating from live systems, enabling de-identification and anonymization (a sketch follows this list)
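As a minimal sketch of that last use case, the following assumes sensitive identifiers should be replaced with consistent pseudonyms across data sets; keyed hashing is an illustrative choice here, not the article's prescribed method:

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # assumption: a key managed outside the test environment

def de_identify(value: str, prefix: str) -> str:
    """Map a real identifier to a stable pseudonym so joins across data sets still work."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:10]
    return f"{prefix}_{digest}"

record = {"ssn": "078-05-1120", "account": "ACCT-99182"}  # fictitious sample values
anonymized = {
    "ssn": de_identify(record["ssn"], "ssn"),
    "account": de_identify(record["account"], "acct"),
}
print(anonymized)
```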
Covering each of these use cases is beyond the scope of this article, but the list gives a good sense of where data synthesis applies to testing. An overview of the data synthesis data layer is shown in Figure 3:

This is an overview of the data layer and how the platform connects data fields across the United States (postal codes and area codes) as an example. In the center of Figure 3 you can see the loosely coupled data models that can be extended as needed. They form the core foundation built from existing data, implementation data, and industry-standard data. This can be set up with seed data and adjusted to the data structures that already exist in the enterprise. The output is generated data, reference data, and platform-specific data.
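A minimal sketch of what such a loosely coupled model could look like, assuming postal code and area code are kept consistent through a shared reference table (the structure is illustrative, not the platform's actual schema):

```python
from dataclasses import dataclass
import random

# Reference data: a tiny sample linking US postal codes to their area codes.
ZIP_TO_AREA_CODE = {"10001": "212", "60601": "312", "94105": "415"}

@dataclass
class SyntheticContact:
    zip_code: str
    area_code: str
    phone: str

def generate_contact() -> SyntheticContact:
    """Generated data stays consistent with the reference data it was seeded from."""
    zip_code = random.choice(list(ZIP_TO_AREA_CODE))
    area_code = ZIP_TO_AREA_CODE[zip_code]
    number = f"{area_code}-555-{random.randint(0, 9999):04d}"
    return SyntheticContact(zip_code, area_code, number)

print(generate_contact())
```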
After this short tour of data synthesis, the next step is to explore the open source project called Project Herophilus, through which enterprises can start using a data synthesis platform.
Enterprises will find these key starting areas for data synthesis:
- The data layer: designed to be scalable and to support all of the platform's requirements.
- The data layer APIs: user requests are served by the data layer APIs; this API set covers generating data and persisting it to the data layer.
- Web UI(s): intended as a minimum viable product for viewing the data synthesis data layer the enterprise has implemented.
These three modules of the data synthesis project should help enterprises quickly start building test data sets.
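The project's actual API is not shown in the article; as a purely illustrative sketch, a generate-and-persist endpoint of a data layer API could look like the following, using FastAPI and Faker as stand-ins:

```python
from fastapi import FastAPI
from faker import Faker

app = FastAPI()
fake = Faker()
DATA_LAYER: list[dict] = []  # stand-in for the real persistence layer

@app.post("/generate")
def generate(count: int = 10) -> dict:
    """Generate synthetic records and persist them to the data layer."""
    batch = [{"name": fake.name(), "zip_code": fake.postcode()} for _ in range(count)]
    DATA_LAYER.extend(batch)
    return {"generated": len(batch), "total": len(DATA_LAYER)}

@app.get("/records")
def records(limit: int = 10) -> list[dict]:
    """Let a Web UI or test harness read back what was generated."""
    return DATA_LAYER[:limit]
```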