Open-source SPL optimizes report applications and copes with endless report requirements
2022-07-05 02:01:00 【Program ape DD_】
Most reports in applications today are developed with reporting tools. Mature reporting tools offer rich display settings, chart types, and export and printing functions that simplify report development and make it very convenient. In real report development, however, we often run into thorny, deep-seated problems that leave even experienced developers who are proficient with reporting tools scratching their heads.
Why do these problems occur even with reporting tools?
On the surface, report development just presents data as a table or chart in a specified format, which is what reporting tools have always been good at. However, raw data is often not suitable for direct presentation and needs some complex processing first. This is the data preparation step.
From the reporting tool's point of view, data preparation lies outside the report, so the tool can legitimately decline to handle it. But declining does not make the work go away; it still has to be done. For lack of good tools, report data preparation is still at a fairly primitive hard-coding stage: SQL statements of hundreds or thousands of lines, stored procedures of tens or hundreds of KB, and large amounts of Java code lurk behind the reports.
Backward tools inevitably lead to low productivity and seriously drag down the whole report development process, hence the head-scratching. And because most reporting tools pay no attention to this area, the problem shows no sign of being solved.
Data preparation is to a report what roots are to a tree: if the problem is not solved at the root, any effort spent on the branches and leaves is wasted.
The emergence of open-source SPL greatly eases the difficulty of report data preparation.
SPL (Structured Process Language) is a professional open-source engine for structured data computing. It provides a rich computing class library, supports procedural computing, and is good at all kinds of complex calculations. It also supports open computing over multiple data sources and hot switching, and provides a standard JDBC interface so that it can be integrated seamlessly with reporting tools.
Reduce the workload of report development
Report development has two main stages:
The first stage prepares data for the report: the raw data is processed with SQL, stored procedures, or Java into data sets the report can use. The second stage uses the prepared data and writes expressions to present the report.
Data presentation is easy with reporting tools. In particular, with the progress of front-end visualization in recent years, more and more reports are presented graphically, which further reduces the difficulty of the presentation stage.
Data preparation, however, has no such tool support. By rough estimate, with a reporting tool in place the presentation stage can be reduced to about 20% of the workload of report development; the remaining 80%, or even more, is data preparation. So any attempt to optimize report development and improve its efficiency should start with data preparation.
At present data preparation is done mainly in SQL (including stored procedures) and Java. Java is much more troublesome than SQL, mainly because it lacks a structured computing class library and is not a specialized set-oriented language. SQL, for its part, lacks a step-by-step mechanism, which makes complex calculations cumbersome, and it works only on databases, so other types of data sources still have to be handled in Java. In addition, some architectures, such as front-end/back-end separation and microservices, require the work to be hard-coded in Java on the application side. All these factors increase the workload of report data preparation and make report development inefficient.
Open-source SPL can assist or replace the existing methods of report data preparation. With its concise syntax and rich class library, data preparation tasks can be completed quickly, which reduces the workload of report development.
Rich computing class library:
Unlike SQL, SPL advocates step-by-step computation: an algorithm can be implemented step by step, following natural thinking, which avoids overly complex SQL (complex SQL is hard not only to write but also to maintain).
Here is a comparison of SPL and SQL (the Java implementation is far more complicated and hardly comparable).
Query target: from the stock record table, find the stocks that have risen for five or more consecutive days, together with the number of rising days (an unchanged price counts as a rise).
The SQL implementation needs three levels of nested subqueries:
select code, max(risenum)-1 maxRiseDays from
(
  select code, count(1) risenum from
  (
    select code, changeSign,
           sum(changeSign) over(partition by code order by ddate) unRiseDays
    from
    (
      select code, ddate,
             case when price >= lag(price) over(partition by code order by ddate)
                  then 0 else 1 end changeSign
      from stock_record
    )
  )
  group by code, unRiseDays
)
group by code
having max(risenum) > 5
The same calculation is much simpler in SPL:
  | A | Comment
1 | =connect@l("orcl").query@x("select * from stock_record order by ddate") |
2 | =A1.group(code) |
3 | =A2.new(code,~.group@i(price<price[-1]).max(~.len())-1:maxrisedays) | Calculate the longest run of consecutive rising days for each stock
4 | =A3.select(maxrisedays>=5) | Select the qualifying records
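To make the step-by-step idea concrete outside SPL, here is a minimal Python sketch of the same algorithm: group by stock, start a new run whenever the price falls, then take the longest run minus one. The function names and the sample data are made up for illustration, and the sketch assumes the records are (code, date, price) tuples with ISO-formatted dates.

```python
from itertools import groupby
from operator import itemgetter


def max_rise_days(records):
    """Return {code: longest run of consecutive rising days} for each stock.

    records: iterable of (code, date, price); an equal price counts as a rise.
    """
    result = {}
    # Step 1: order by code and date, then group by code (like cells A1 and A2).
    ordered = sorted(records, key=itemgetter(0, 1))
    for code, rows in groupby(ordered, key=itemgetter(0)):
        prices = [price for _, _, price in rows]
        # Step 2: a new run starts whenever the price falls below the previous day.
        longest = current = 1
        for prev, cur in zip(prices, prices[1:]):
            current = current + 1 if cur >= prev else 1
            longest = max(longest, current)
        # Step 3: a run of n records contains n - 1 rising days (like cell A3).
        result[code] = longest - 1
    return result


def stocks_rising_at_least(records, min_days=5):
    """Keep only the stocks whose longest rising run reaches min_days (like cell A4)."""
    return {c: d for c, d in max_rise_days(records).items() if d >= min_days}


# Hypothetical sample data: stock A rises six days in a row, stock B only two.
sample = [("A", f"2022-01-{d:02d}", p)
          for d, p in enumerate([10, 11, 12, 12, 13, 14, 15, 9], start=1)]
sample += [("B", f"2022-01-{d:02d}", p)
           for d, p in enumerate([5, 4, 6, 7, 3], start=1)]

print(stocks_rising_at_least(sample))
```

Each step can be inspected on its own, which is the property the SPL version relies on; the nested SQL has to express the same three steps as three layers of subqueries.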
SPL also provides a simple, easy-to-use development environment with step-by-step execution, breakpoints, and a WYSIWYG result preview window.
The result of every step can be viewed in real time, which is much more convenient and efficient than SQL and stored procedures, which are hard to debug.
SPL can be integrated with and embedded in reporting tools. It provides a standard JDBC interface for reporting tools to call, so it can seamlessly replace the original hard-coded data preparation, or coexist with it.
Overcome the disadvantages of stored procedures and Java in data preparation
It is not uncommon in report development to use stored procedures and Java to handle complex data preparation logic. They offer very limited development convenience but bring a great deal of trouble.
Stored procedures are hard to edit and debug, and since they are not portable they are hard to scale out as well. They couple reports tightly to the database. Creating and modifying them requires high database privileges, which brings security risks. They often need a dedicated DBA for maintenance, which pushes up the cost of report development. Moreover, the same stored procedure may be shared by different modules or even different applications, which couples the applications tightly: changing one stored procedure affects everything that uses it.
SPL provides computing power that is independent of the database; from this perspective SPL works like a "stored procedure outside the database". Reports are thus fully decoupled from the database, and applications from one another. The security problem disappears and portability is greatly improved. Thanks to SPL's open support for diverse data sources, when the database is scaled out or replaced, only the data source connection needs to change while the calculation logic stays the same, so migration is smooth.
Because Java lacks a structured computing class library, report data preparation code is hard to write, and the reliance on professional programmers pushes up the cost of report development. Implementing data preparation in Java also couples the report module tightly to other application modules, which makes it hard to maintain the report module, with its heavy query load, separately. As a compiled language, Java also lacks an effective hot-switching mechanism, a serious drawback for report business that changes frequently.
SPL syntax is more concise and the code for the same calculation is shorter. Report developers can learn to use it themselves, which lowers labor costs. SPL can be integrated with the report module, independent of other application modules, and operated and maintained separately, which reduces coupling between applications. Moreover, SPL is interpreted and supports hot switching, so it adapts better to changeable report business.
Reduce the number of intermediate tables in the database
To reduce the difficulty of SQL operations, improve query performance, or cope with multiple data sources, data is often preprocessed: some intermediate results are computed in advance and stored in the database as intermediate tables. Report data preparation is then based on these intermediate tables, which generally simplifies development to some extent and gives good query performance.
But intermediate tables are a double-edged sword: they bring convenience along with many drawbacks. In most cases, once an intermediate table is created it can hardly be deleted (database tables are managed as a flat list, so it is hard to classify intermediate tables and tell who owns them, and no one dares to delete them). The number of intermediate tables therefore keeps growing, sometimes into the tens of thousands.
Intermediate tables occupy database space and can exhaust database capacity. Maintaining them consumes database computing resources and degrades database performance. Too many intermediate tables put the database under pressure to expand.
Since SPL provides computing power independent of the database, intermediate tables can be stored in files outside the database (in open data file formats or SPL's own storage format). SPL processes the data in these files and outputs the results to the report, completing data preparation.
Moving a large number of intermediate tables out to the file system greatly reduces the pressure on the database: they no longer occupy precious database space, and database computing resources no longer need to be spent on them, killing two birds with one stone.
Moreover, "intermediate tables outside the database" can be managed with the tree structure of the file system. Intermediate data in different directories corresponds to reports of different businesses, which is not only convenient to manage but also further reduces coupling between report modules.
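The idea of keeping intermediate results in a directory tree can be sketched in a few lines of Python. This is only an illustration under assumptions: plain CSV stands in for the open file format, a temporary directory stands in for the shared file system, and the helper names, the "sales" directory, and the sample rows are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def save_intermediate(root, business, name, header, rows):
    """Write an intermediate result to <root>/<business>/<name>.csv and return the path."""
    path = Path(root) / business / f"{name}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)  # one directory per business area
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def load_intermediate(path):
    """Read an intermediate file back as a list of dicts for the report to use."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical raw data: (region, amount) order rows.
raw_orders = [("East", 120), ("West", 80), ("East", 30), ("West", 20)]

# Pre-aggregate once, outside the database.
totals = defaultdict(int)
for region, amount in raw_orders:
    totals[region] += amount

root = tempfile.mkdtemp()  # stands in for a shared directory
path = save_intermediate(root, "sales", "region_totals",
                         ["region", "total"], sorted(totals.items()))
report_rows = load_intermediate(path)
print(report_rows)
```

Because each business area has its own directory, removing the intermediate data of a retired report is a matter of deleting one directory, with no need to work out which tables in a shared schema belong to it.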
Realize hot switching of reports
Hot switching (hot swap) means replacing system components without stopping the system. For reports, it means maintaining reports (adding, modifying, deleting) without restarting the reports or the related applications, so that changes take effect in real time.
The presentation templates developed with most reporting tools can already be hot-switched, but data preparation, which is also part of the report, is a different matter. Data preparation done in database SQL can be hot-switched directly, but compiled Java cannot. And with the adoption of newer architectures such as microservices, it is now very common to do report data preparation in Java.
To solve the hot-switching problem, SPL can replace Java for report data preparation in a microservice architecture. SPL is interpreted and supports hot switching naturally, and with its complete computing system and agile syntax it handles data processing tasks easily.
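Why an interpreted script lends itself to hot switching can be shown with a small Python sketch. It does not describe how SPL is implemented; it only illustrates the general mechanism, assuming a hypothetical convention in which each script file defines a `prepare(rows)` function. The class name and file name are made up.

```python
import tempfile
from pathlib import Path


class ScriptRunner:
    """Runs a data-preparation script from disk and reloads it when its text changes."""

    def __init__(self, path):
        self.path = Path(path)
        self._source = None   # text of the script that is currently loaded
        self._prepare = None  # the prepare() function compiled from that text

    def run(self, rows):
        source = self.path.read_text(encoding="utf-8")
        if source != self._source:  # the script was edited: interpret it again
            namespace = {}
            exec(compile(source, str(self.path), "exec"), namespace)
            self._prepare = namespace["prepare"]
            self._source = source
        return self._prepare(rows)


# Hypothetical report script, edited while the "application" keeps running.
script = Path(tempfile.mkdtemp()) / "sales_report.py"
script.write_text("def prepare(rows):\n    return sum(rows)\n", encoding="utf-8")

runner = ScriptRunner(script)
first = runner.run([1, 2, 3])    # original logic: total

script.write_text("def prepare(rows):\n    return max(rows)\n", encoding="utf-8")
second = runner.run([1, 2, 3])   # edited logic is picked up without a restart
print(first, second)
```

The calling process never stops; only the script file changes. Compiled Java classes, by contrast, normally need a rebuild and redeployment before new logic takes effect.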
Address diverse data sources
Reports today draw on a rich variety of data sources: RDB, NoSQL, CSV, Excel, HDFS, RESTful/WebService, Kafka and more can all serve as report data sources. This diversity raises two questions: how to connect to these sources, and how to perform associated calculations across them once connected. Under architectures such as front-end/back-end separation and microservices, almost no report is developed directly on the database, which makes the multi-source problem even more serious.
In the past there were three ways to handle multiple data sources in reports:
The first relies on the reporting tool. Some reporting tools support connections to multiple data sources; the data is fetched from each source separately and the association is computed in the presentation template. However, the computing power of reporting tools is weak, so only very limited multi-source mixed computing is possible.
The second is to ETL the multi-source data into one RDB, turning multiple sources into a single source so that SQL can do the calculation. This approach is cumbersome, the data is not real-time, and large data volumes or complex calculations cause database performance problems. It also seriously violates the principles of microservice architecture.
The third is hard-coding in Java. As mentioned several times, Java code is hard to write and does not support hot switching.
Open-source SPL currently supports dozens of data sources, which can be connected quickly for data retrieval and calculation. Beyond connection and access, SPL's rich computing class library makes mixed computing over heterogeneous sources convenient, including complex calculations such as multi-source association.
SPL performs multi-source calculations in real time and outputs the results directly to the report for presentation. This solves the real-time data problem and also makes up for the weak computing power of reporting tools and for Java's coding difficulty and lack of hot switching, so it is an effective solution to the multi-source problem in reports.
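As a rough illustration of multi-source association done outside the database, the following Python sketch joins a relational table with a CSV source and aggregates the result in one pass. Everything here is a stand-in chosen for the example: an in-memory SQLite database plays the RDB, a string plays the CSV file, and the table, column, and function names are hypothetical.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Source 1: a relational table (in-memory SQLite stands in for the RDB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 100.0), (2, 50.0), (1, 25.0), (3, 10.0)])

# Source 2: a CSV file (an in-memory string stands in for the file).
customers_csv = "id,region\n1,East\n2,West\n3,East\n"


def sales_by_region(conn, customers_csv):
    """Associate orders (RDB) with customers (CSV) and total the amounts per region."""
    # Load the small dimension source into a lookup table keyed by customer id.
    region_of = {int(row["id"]): row["region"]
                 for row in csv.DictReader(io.StringIO(customers_csv))}
    totals = defaultdict(float)
    # Stream the fact rows from the database and associate them on the fly.
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
        totals[region_of[customer_id]] += amount
    return dict(totals)


result = sales_by_region(conn, customers_csv)
print(result)
```

No ETL step copies the CSV data into the database first, so the report sees the current contents of both sources each time it runs.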
Improve report performance
Performance is an unavoidable topic for reports. Reports are a main application scenario of OLAP and may involve large amounts of data; large data volumes and complex calculation logic often lead to performance problems. Reports are presented to business users, so poor performance means a bad user experience. A performance problem shows up as a slow report query, but in most cases the cause lies in data preparation; once the data is ready, presentation is usually fast.
Report data preparation processes raw data into the data sets the report needs. A report usually presents aggregated results, which are not large, but the raw data can be very large, and the processing logic can be complex as well. Both lead to low performance.
Besides the usual SQL tuning, SPL can also be used to speed things up. SQL execution efficiency depends on the optimizer of the database, and for complex SQL the optimizer often fails, resulting in inefficient execution. In this case the calculation logic can be implemented in SPL and accelerated with SPL's high-performance algorithms.
If the query is slow because the database is overloaded, SQL tuning does not help; and when there is no SQL to use at all (non-RDB sources), SPL, which does not rely on database capabilities, is even more effective. Note that for compute-intensive tasks, SPL optimization often requires moving the data out to the file system in advance, in order to cut the I/O time from the RDB to SPL: if the data is fetched from the database and computed in real time, the I/O time may exceed the calculation time. In addition, SPL storage has great advantages in data organization, and computing on SPL storage can achieve higher performance.
Besides data preparation, data transmission is another bottleneck. When a report reads the data it needs from the database through JDBC, a large data volume or a slow JDBC driver (JDBC efficiency differs between databases) makes data transmission take too long and slows the report down.
For data-intensive reports, SPL's parallel data retrieval can speed things up. SPL opens multiple database connections (which requires the database to be relatively idle) and uses multiple threads to read the data the report needs at the same time; the data can come from a single table or from an association of several tables. In theory the transmission time is cut to 1/n of the original (n being the number of threads), which improves report performance.
Besides, the report itself may also compute slowly. For example, a reporting tool may associate multiple data sets in a cell expression such as ds2.select(ID==ds1.ID). When parsing this expression, the report engine performs the association by sequential traversal: it takes one record from ds2 (data set 2) and traverses ds1 (data set 1) looking for records with the same ID, then takes the second record and traverses again, and so on.
The computational complexity is quadratic. This does not matter when the data volume is small, but performance drops sharply once the data grows a little.
The solution is to move the multi-data-set association from the report side to the data preparation stage. If the data sets come from the same database, SQL can be used; but if the SQL is inefficient, or the data comes from multiple sources, SPL can perform the association, again relying on its multi-source capability, high-performance algorithms, and high-performance storage to speed things up.
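The gap between sequential traversal and an association done during data preparation can be made visible by counting steps. The Python sketch below compares a nested-loop join with a hash join, one common way to avoid the quadratic cost; it is not a claim about which algorithm SPL uses internally, and the data sets and field names are invented for the example.

```python
def nested_loop_join(ds1, ds2):
    """For each ds1 row, scan all of ds2: about len(ds1) * len(ds2) comparisons."""
    comparisons = 0
    joined = []
    for left in ds1:
        for right in ds2:
            comparisons += 1
            if left["ID"] == right["ID"]:
                joined.append({**left, **right})
    return joined, comparisons


def hash_join(ds1, ds2):
    """Index ds2 by ID once, then probe the index for each ds1 row:
    about len(ds1) + len(ds2) steps."""
    index = {}
    for right in ds2:
        index.setdefault(right["ID"], []).append(right)
    steps = len(ds2)          # one step per row while building the index
    joined = []
    for left in ds1:
        steps += 1            # one lookup per ds1 row
        for right in index.get(left["ID"], []):
            joined.append({**left, **right})
    return joined, steps


# Two hypothetical data sets of 1000 rows each, sharing the ID field.
ds1 = [{"ID": i, "name": f"item{i}"} for i in range(1000)]
ds2 = [{"ID": i, "qty": i % 7} for i in range(1000)]

slow_rows, slow_cost = nested_loop_join(ds1, ds2)
fast_rows, fast_cost = hash_join(ds1, ds2)
print(slow_cost, fast_cost)
```

Both functions return the same joined rows, but with 1000 rows on each side the nested loop performs a million comparisons while the hash join needs about two thousand steps, and the gap widens as the data grows.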
Coping with endless reports at low cost
Unlike other parts of an enterprise information system, reports keep being added and modified throughout the life cycle of the system, because new report requirements keep arising in the course of production and operation. Endless reports cannot be eliminated, only adapted to, and the adaptation has to be low-cost.
In general, endless reports can be handled efficiently and at low cost in the following steps:
Step 1: introduce a reporting tool to save manpower in the report presentation stage.
Solve the easiest problem first. A professional reporting tool frees up the manpower of the presentation stage and handles the presentation of all kinds of charts. Most users already develop reports with reporting tools, so this step is basically done.
Step 2: introduce a computing tool to save manpower in the report data preparation stage.
As in step 1, data preparation should also be supported by tools in order to thoroughly solve its past inefficiency. This is the stage discussed above: open-source SPL, with its concise coding, multi-source support, and hot switching, serves well as the tool for data preparation. Together with step 1, the whole of report development becomes tool-based, yielding higher development efficiency.
Step 3: make the report module independent and optimize the application structure.
Once report development is fully tool-based, the application structure can be adjusted to decouple the report module from the business system. The report module only shares the data sources of the business system (databases or other storage) instead of being tightly coupled with it. With both presentation and data preparation handled by tools, report calculations are interpreted and executed by middleware, so frequent modification and addition of reports no longer requires restarting the business system, which greatly reduces operation and maintenance complexity. In this process it is particularly important to sort out the data sources: identify the data sources the report module needs separately, so that future report development only has to deal with these sources.
With these three steps, report development becomes fully tool-based and more efficient, and the application structure is optimized at the same time. The independent report module is operated and maintained separately, which is more reasonable in terms of both technical and staffing structure, and endless reports are handled effectively.
Appendix
Reports are never finished, and it all comes down to the data sources; whether report development is fast or slow, data preparation is the key. With SPL, report development efficiency rises to a new level, and endless reports are nothing to fear.