Efficient ETL Testing
2022-07-06 15:38:00 【InfoQ】
- A new ETL language – Easy SQL
- A guide to write elegant ETL
- Neat syntax design of an ETL language (part 1)
- Neat syntax design of an ETL language (part 2)
ETL testing challenges
- Create a production-like environment.
- Copy the database definition and table schema to the environment.
- Prepare test data for the tables used in the ETL and insert it into those tables.
- Run the ETL, which generates a new table containing the result data.
- Compare the generated data with the expected data to find any issues.
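The steps above can be sketched in a few lines of Python. This is only an illustration of the manual workflow, not how Easy SQL runs tests: an in-memory SQLite database stands in for the production-like environment, and the `orders` table, the `user_total` table and the ETL query are all hypothetical.

```python
import sqlite3

# Hypothetical ETL under test: total order amount per user.
ETL_SQL = """
create table user_total as
select user_id, sum(amount) as total
from orders
group by user_id
"""


def run_etl_test():
    # Steps 1-2: an in-memory database stands in for the production-like
    # environment; the table schema is copied into it.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table orders (user_id integer, amount integer)")
    # Step 3: prepare test data and insert it into the input table.
    conn.executemany("insert into orders values (?, ?)", [(1, 10), (1, 5), (2, 7)])
    # Step 4: run the ETL, which generates a new table.
    conn.execute(ETL_SQL)
    # Step 5: read the generated data so it can be compared with the expected data.
    actual = conn.execute(
        "select user_id, total from user_total order by user_id"
    ).fetchall()
    conn.close()
    expected = [(1, 15), (2, 7)]
    return actual, expected


actual, expected = run_etl_test()
print("PASS" if actual == expected else f"FAIL: {actual} != {expected}")
```

Even in this tiny form, every step has to be written by hand for each ETL, which is the overhead a dedicated test file format aims to remove.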
Testing ETL in Easy SQL
- Test case: A single test used to test your code in some specific scenario.
- Test suite: A bundle of unit test cases that can be run together.
Test suite
Test case
Keywords: CASE, VARS, INPUT, OUTPUT
- VARS: A table with a header and exactly one row of data.
- INPUT: The name of the input table in column ‘B’; a table with a header and any number of data rows, starting from column ‘C’ of the same row; a mandatory description of each data row in column ‘B’, starting from the next row.
- OUTPUT: The same format as INPUT, except that the description of each data row is optional.
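To make the three blocks concrete, here is one way to model a test case in memory. It is a sketch under assumptions: the `TableData` and `EtlTestCase` classes and the `validate` helper are hypothetical names, not part of Easy SQL, and allowing several INPUT and OUTPUT tables per case is assumed as well.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TableData:
    name: str                  # table name (column 'B' in the spreadsheet)
    columns: List[str]         # header row
    rows: List[List[Any]]      # data rows
    descriptions: List[Optional[str]] = field(default_factory=list)  # one per row


@dataclass
class EtlTestCase:
    name: str
    vars: Dict[str, Any]       # VARS: a header plus exactly one row of data
    inputs: List[TableData]    # INPUT: row descriptions are mandatory
    outputs: List[TableData]   # OUTPUT: row descriptions are optional


def validate(case: EtlTestCase) -> List[str]:
    """Return a list of rule violations (empty when the case is well formed)."""
    errors = []
    for table in case.inputs:
        # Every INPUT row must carry a non-empty description.
        if len(table.descriptions) != len(table.rows) or not all(table.descriptions):
            errors.append(f"INPUT {table.name}: every row needs a description")
    for table in case.inputs + case.outputs:
        # Every data row must have one value per header column.
        for row in table.rows:
            if len(row) != len(table.columns):
                errors.append(f"{table.name}: row width does not match header")
    return errors
```

A case whose INPUT rows are all described and whose rows match their headers yields an empty list from `validate`; a missing description is reported by table name.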
Column types listed for test data:
- int, bigint, boolean, string
- int, tinyint, bigint, double, float, string, decimal, boolean, date, timestamp, array<string>, array<int>, array<tinyint>, array<bigint>, array<double>, array<float>, array<boolean>, array<date>, array<timestamp>
- int, bigint, boolean, text
- Int8, Boolean, String
INCLUDES
- Column ‘B’ in the same row as the INCLUDES keyword should be filled with the file path of the include command in the ETL.
- Column ‘C’ in the same row as the INCLUDES keyword should be filled with the mocked body of the included file.
- Add another row to specify a second INCLUDE to mock, with columns ‘B’ and ‘C’ filled with the file path and the mocked file body.
-- target=temp.THE_RETURNED_TEMP_TABLE
select * from some_mocked_data
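The idea behind mocking an include can be illustrated with a small stand-alone function. This is a sketch of the concept only, not Easy SQL's implementation: the `-- include=<path>` directive form, the `expand_includes` function and the `snippets/common.sql` path are assumptions made for the example.

```python
import re
from typing import Dict

# Assumed directive form, used for illustration only: a line "-- include=<path>".
INCLUDE_PATTERN = re.compile(r"^-- include=(?P<path>\S+)[ \t]*$", re.MULTILINE)


def expand_includes(etl_sql: str, mocked_includes: Dict[str, str]) -> str:
    """Replace each include line with the mocked body registered for its path."""

    def replace(match: "re.Match[str]") -> str:
        path = match.group("path")
        if path not in mocked_includes:
            # Fail loudly so that a forgotten mock is noticed in the test run.
            raise KeyError(f"no mock provided for include: {path}")
        return mocked_includes[path]

    return INCLUDE_PATTERN.sub(replace, etl_sql)


# Column 'B' (file path) maps to column 'C' (mocked file body).
mocks = {
    "snippets/common.sql": (
        "-- target=temp.THE_RETURNED_TEMP_TABLE\n"
        "select * from some_mocked_data"
    )
}
etl = (
    "-- include=snippets/common.sql\n"
    "-- target=temp.result\n"
    "select * from THE_RETURNED_TEMP_TABLE"
)
print(expand_includes(etl, mocks))
```

Because the mocked body replaces the included file, the test exercises only the ETL under test and does not depend on what the real included file does.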
File naming examples:
- ETL file some_etl.sql, test file some_etl.xlsx
- OUTPUT table some_db.some_table: ETL file some_db.some_table.sql, test file some_db.some_table.xlsx
pip3 install click==6.7 pymongo==3.10.1 xlrd==1.2.0
python3 -m easy_sql.sql_test convert-json -f {YOUR_XLSX_FILE_PATH}
.json
Run test
pip3 install click==6.7 pymongo==3.10.1 xlrd==1.2.0
python3 -m easy_sql.sql_test run-test -f {YOUR_XLSX_FILE_PATH} -b {BACKEND}
python3 -m easy_sql.sql_test --help
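When a project has many test files, the run-test command above can be generated for each of them. The helper below is a minimal sketch: `build_run_test_commands` is a hypothetical name, and it only builds the command lines shown above without executing them.

```python
from pathlib import Path
from typing import List


def build_run_test_commands(test_dir: str, backend: str) -> List[List[str]]:
    """Build one run-test command per .xlsx file found directly in test_dir."""
    commands = []
    for xlsx in sorted(Path(test_dir).glob("*.xlsx")):
        commands.append(
            ["python3", "-m", "easy_sql.sql_test", "run-test",
             "-f", str(xlsx), "-b", backend]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)` to execute the tests; the value to pass as `backend` depends on which backends your Easy SQL installation supports.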
Run test programmatically
import os

from pyspark.sql import SparkSession

from easy_sql.sql_processor.backend import SparkBackend
from easy_sql.sql_tester import SqlTester

# Create a Spark backend for each test case, then run all cases in the test file.
SqlTester(
    env='test',
    backend_creator=lambda case: SparkBackend(SparkSession.builder.enableHiveSupport().getOrCreate()),
    work_dir=os.path.abspath(os.curdir),
).run_tests('path/to/your/test/file')
Summary