From Stream to Kotlin to SPL
2022-06-21 06:47:00 【Scholar's day 3 WYX】
Java development often runs into scenarios where using a database is inconvenient but structured data still has to be computed. Early Java provided no class library for this; even basic operations such as sorting and grouping had to be hard-coded, and development efficiency was very low. Java 8 later introduced the Stream library, which, with Lambda expressions, a chained programming style, and set functions, finally gave Java a structured data computing library where there had been none.
Stream can simplify structured data computation
Sorting, for example:
Stream<Order> result = Orders
    .sorted((sAmount1, sAmount2) -> Double.compare(sAmount1.Amount, sAmount2.Amount))
    .sorted((sClient1, sClient2) -> CharSequence.compare(sClient2.Client, sClient1.Client));
In the code above, sorted is a set function that makes sorting easy. "(parameters) -> function body" is a Lambda expression, which simplifies the definition of anonymous functions. Chaining the two sorted calls is the chained programming style, which makes multi-step computation intuitive.
Stream's computing power is not strong enough
Take the same sort as an example. In principle the sorted function only needs to know the sort fields and whether each is ascending or descending, as in the SQL "…from Orders order by Client desc, Amount". In practice the data type of each sort field must also be spelled out (Double.compare, CharSequence.compare). Ascending or descending could be expressed with simple symbols such as asc/desc (or +/-), yet here a compare function is required. In addition, the fields must be written in the opposite order to their actual sort priority, which is counterintuitive.
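The reversed writing order follows from how chained sorts work: sorted is stable for ordered streams, so the key sorted last becomes the primary key. The following is a minimal sketch of that behavior, assuming a hypothetical Order class with public Client and Amount fields (the class, the sample data, and the method name are illustrative and not from the original code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ChainedSortSketch {
    // Hypothetical row type; field names follow the article's Order examples.
    static class Order {
        final String Client;
        final double Amount;
        Order(String client, double amount) { Client = client; Amount = amount; }
        @Override public String toString() { return Client + ":" + Amount; }
    }

    // Equivalent of "order by Client desc, Amount".
    // The secondary key (Amount) is sorted first and the primary key (Client) last;
    // the second pass keeps the Amount order inside each Client because the sort is stable.
    static List<Order> sortByClientDescThenAmount(List<Order> orders) {
        return orders.stream()
                .sorted((a, b) -> Double.compare(a.Amount, b.Amount))
                .sorted((a, b) -> b.Client.compareTo(a.Client))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("ACME", 300.0),
                new Order("Zeta", 200.0),
                new Order("ACME", 100.0),
                new Order("Zeta", 50.0));
        System.out.println(sortByClientDescThenAmount(orders));
    }
}
```

Swapping the two sorted calls would instead produce "order by Amount, Client desc", which is why the written order matters.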
Another example is grouping and aggregation:
Calendar cal = Calendar.getInstance();
Map<Object, DoubleSummaryStatistics> c = Orders.collect(Collectors.groupingBy(
    r -> {
        cal.setTime(r.OrderDate);
        return cal.get(Calendar.YEAR) + "_" + r.SellerId;
    },
    Collectors.summarizingDouble(r -> {
        return r.Amount;
    })
));
for (Object sellerid : c.keySet()) {
    DoubleSummaryStatistics r = c.get(sellerid);
    String year_sellerid[] = ((String) sellerid).split("_");
    System.out.println("group is (year):" + year_sellerid[0] + "\t (sellerid):" + year_sellerid[1] + "\t sum is:" + r.getSum() + "\t count is:" + r.getCount());
}
In the code above, every field reference must be prefixed with the table name, as in "table.field"; the table name cannot be omitted as it can in SQL. The anonymous function syntax is complex, and the complexity grows quickly with the amount of code. Two anonymous functions are nested, which makes the code harder to read. One grouping and aggregation takes several functions and classes, including groupingBy, collect, Collectors, summarizingDouble, and DoubleSummaryStatistics, so the learning cost is not low. The result is a Map rather than a structured data type; to continue computing, you usually have to define a new structured data type and convert to it, which is tedious. Grouping by two fields is common in structured data computing, but groupingBy accepts only one grouping variable. To make one variable stand for two fields you need a workaround, such as creating a structured data type with two fields or joining the two fields with an underscore, which makes the code more cumbersome still.
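The other workaround mentioned above, a structured type with two fields used as the grouping key, can be sketched as follows. This is only an illustration under stated assumptions: Order and YearSeller are hypothetical classes, and OrderYear stands in for the year that the original code extracts from OrderDate.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class GroupByTwoFieldsSketch {
    // Hypothetical order row; OrderYear stands in for the year taken from OrderDate.
    static class Order {
        final int OrderYear;
        final int SellerId;
        final double Amount;
        Order(int orderYear, int sellerId, double amount) {
            OrderYear = orderYear; SellerId = sellerId; Amount = amount;
        }
    }

    // Hypothetical two-field grouping key. equals and hashCode are required
    // so that rows with the same year and seller fall into the same group.
    static final class YearSeller {
        final int year;
        final int sellerId;
        YearSeller(int year, int sellerId) { this.year = year; this.sellerId = sellerId; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof YearSeller)) return false;
            YearSeller other = (YearSeller) o;
            return year == other.year && sellerId == other.sellerId;
        }
        @Override public int hashCode() { return Objects.hash(year, sellerId); }
    }

    // Groups by (year, seller) and collects sum, count, min, max of Amount per group.
    static Map<YearSeller, DoubleSummaryStatistics> summarize(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                (Order o) -> new YearSeller(o.OrderYear, o.SellerId),
                Collectors.summarizingDouble((Order o) -> o.Amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order(2021, 1, 100.0),
                new Order(2021, 1, 50.0),
                new Order(2022, 1, 20.0));
        summarize(orders).forEach((k, s) -> System.out.println(
                k.year + "\t" + k.sellerId + "\t" + s.getSum() + "\t" + s.getCount()));
    }
}
```

The key class avoids string splitting, but it is exactly the kind of predefined intermediate structure the paragraph above describes, and the result is still a Map.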
Stream's computing power falls short because its host language, Java, is a compiled language: it cannot provide professional structured data objects and lacks strong support from the ground up.
Because Java is compiled, the structure of every return value must be defined in advance. When there are many intermediate steps, many data structures must be defined, which makes the code cumbersome and parameter handling inflexible, so anonymous function syntax has to be implemented with a complicated set of rules. Interpreted languages naturally support dynamic structures and can easily treat a parameter expression as either a value parameter or a function parameter, which allows simpler anonymous functions.
Against this background, Kotlin emerged. Kotlin is a modern development language based on Java. "Modern" here refers mainly to its improvements on Java syntax, and on Stream in particular: Lambda expressions are more concise and set functions are richer.
Kotlin's computing power is better than Stream's
Sorting, for example:
var result = Orders.sortedBy { it.Amount }.sortedByDescending { it.Client }
The code above does not need to state the data type of the sort field or use a function to express ascending or descending order, and it refers to it directly as the default parameter of the anonymous function instead of declaring one. Overall it is much shorter than the Stream version.
Kotlin improves little, and its computing power is still insufficient
Take sorting again. Kotlin provides the default parameter it, but in theory the field name alone should be enough, with no need for the table name (it). Each sort function sorts by a single field and cannot accept multiple fields dynamically.
Another example is grouping and aggregation:
data class Grp(var OrderYear: Int, var SellerId: Int)
data class Agg(var sumAmount: Double, var rowCount: Int)
var result = Orders.groupingBy { Grp(it.OrderDate.year + 1900, it.SellerId) }
    .fold(Agg(0.0, 0), {
        acc, elem -> Agg(acc.sumAmount + elem.Amount, acc.rowCount + 1)
    })
    .toSortedMap(compareBy<Grp> { it.OrderYear }.thenBy { it.SellerId })
result.forEach { println("group fields:${it.key.OrderYear}\t${it.key.SellerId}\t aggregate fields:${it.value.sumAmount}\t${it.value.rowCount}") }
In the code above, a single grouping and aggregation still takes several functions, including complex nested ones. Fields must still carry the table name. The grouped result is not structured data. The data structures of intermediate results must be defined in advance.
Looking further at set operations, joins, and other computations reveals the same pattern: Kotlin code is indeed shorter than Stream code, but most of the gains are minor quantitative changes rather than a profound qualitative one, and every step that was needed before is still needed.
Kotlin does not support dynamic data structures either. It cannot provide professional structured data objects, so it is hard to truly simplify Lambda syntax, fields cannot be referenced directly without the table name, and dynamic multi-field computations (such as multi-field sorting) are not directly supported.
The arrival of esProc SPL thoroughly changes the predicament of structured data processing in the Java ecosystem.
esProc SPL is an open-source structured data computing language on the JVM. It provides professional structured data objects, a rich set of built-in computing functions, and flexible, concise syntax; it integrates easily through a JDBC interface; and it is good at simplifying complex computations.
SPL has rich built-in functions for basic computations
Sorting, for example: =Orders.sort(-Client, Amount)
SPL does not need the data type of the sort field, does not need a function to indicate ascending or descending order, and does not need the table name in front of a field. A single function can sort by multiple fields dynamically.
Grouping and aggregation: =Orders.groups(year(OrderDate),Client; sum(Amount),count(1))
The result is still a structured data object and can take part in the next computation directly. Grouping or aggregating on two fields requires no data structure to be defined in advance. The code contains no redundant functions, and sum and count are so simple to use that it is hard to even notice they are nested anonymous functions.
Other computations are just as simple:
Deduplication: =Orders.id(Client)
Fuzzy query: =Orders.select(Amount*Quantity>3000 && like(Client,"S"))
Join: =join(Orders:o,SellerId ; Employees:e,EId).groups(e.Dept; sum(o.Amount))
SPL provides a JDBC interface and can be called seamlessly from Java code
import java.sql.*;
// Load the esProc JDBC driver and open a local connection
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
// An SPL expression is passed directly as the statement text
String str = "=T(\"D:/Orders.xls\").groups(year(OrderDate),Client; sum(Amount))";
ResultSet result = statement.executeQuery(str);
SPL's syntax is concise and flexible, with strong computing power
SPL simplifies computations with complex logic, such as stepwise computation, ordered computation, and post-grouping computation. Many computations that are hard to implement in SQL or stored procedures are easy in SPL. For example, find the top n customers whose combined sales make up half of total sales, sorted by sales in descending order:
| | A | B |
|---|---|---|
| 1 | … | / Get the data |
| 2 | =A1.sort(amount:-1) | / Sort by sales in descending order |
| 3 | =A2.cumulate(amount) | / Compute the cumulative sequence |
| 4 | =A3.m(-1)/2 | / The last cumulative value is the total; take half of it |
| 5 | =A3.pselect(~>=A4) | / Position where the cumulative value reaches half |
| 6 | =A2(to(A5)) | / Take records by position |

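To make the logic of cells A2 to A6 explicit, the same steps can be sketched in plain Java. This is not how SPL implements them; ClientSales and topHalf are hypothetical names, the input is an in-memory list, and amounts are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopHalfClientsSketch {
    // Hypothetical row type: one client with its total sales amount.
    static class ClientSales {
        final String client;
        final double amount;
        ClientSales(String client, double amount) { this.client = client; this.amount = amount; }
    }

    // Returns the largest clients whose cumulative sales first reach half of the total,
    // in descending order of sales. Assumes non-negative amounts.
    static List<ClientSales> topHalf(List<ClientSales> sales) {
        // A2: sort by amount, descending
        List<ClientSales> sorted = new ArrayList<>(sales);
        sorted.sort(Comparator.comparingDouble((ClientSales s) -> s.amount).reversed());
        if (sorted.isEmpty()) return sorted;

        // A3: cumulative sums over the sorted list
        double[] cumulative = new double[sorted.size()];
        double running = 0;
        for (int i = 0; i < sorted.size(); i++) {
            running += sorted.get(i).amount;
            cumulative[i] = running;
        }

        // A4: the last cumulative value is the total; take half of it
        double half = cumulative[cumulative.length - 1] / 2;

        // A5: first position whose cumulative value is at least half
        int pos = 0;
        while (pos < cumulative.length - 1 && cumulative[pos] < half) pos++;

        // A6: take the records up to and including that position
        return new ArrayList<>(sorted.subList(0, pos + 1));
    }

    public static void main(String[] args) {
        List<ClientSales> sales = Arrays.asList(
                new ClientSales("C", 20), new ClientSales("A", 40),
                new ClientSales("D", 10), new ClientSales("B", 30));
        for (ClientSales s : topHalf(sales)) System.out.println(s.client + "\t" + s.amount);
    }
}
```

With the sample data the total is 100, so half is 50; the cumulative sums are 40, 70, 90, 100, and the first two clients (A and B) are returned.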
Beyond computing power, SPL also has distinctive advantages in system architecture, data sources, intermediate data storage, and computing performance, all of which help it compute structured data outside the database.
SPL supports hot swapping of computations and externalized code, which reduces system coupling
For example, save the SPL code above as a script file, then call it by file name from Java as if it were a stored procedure:
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
// getClient is the name of the saved script file
ResultSet result = statement.executeQuery("call getClient()");
SPL is an interpreted language: modified code runs directly, with no compilation and no restart of the Java service. SPL code lives outside Java and is called by file name, so it does not depend on Java code and coupling is low.
SPL supports multiple data sources and can perform cross-source and cross-database computation
SPL supports all kinds of databases; files such as txt, csv, and xls; NoSQL sources such as MongoDB, Hadoop, redis, ElasticSearch, Kafka, and Cassandra; and, in particular, data such as WebService XML and RESTful JSON:
| | A |
|---|---|
| 1 | =json(file("d:/Orders.json").read()) |
| 2 | =json(A1).conj() |
| 3 | =A2.select(Amount>p_start && Amount<=p_end) |

| | A |
|---|---|
| 1 | =T("Employees.csv") |
| 2 | =mysql1.cursor("select SellerId, Amount from Orders order by SellerId") |
| 3 | =joinx(A2:O,SellerId; A1:E,EId) |
| 4 | =A3.groups(E.Dept;sum(O.Amount)) |

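For readers more familiar with Java, the join-then-group logic of the second script can be sketched in memory as follows. This is an illustration under assumptions, not an equivalent of the cross-source SPL script: Order and Employee are hypothetical classes, both inputs are plain lists rather than a CSV file and a MySQL cursor, and the join is assumed to be an inner join on SellerId = EId.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JoinGroupSketch {
    // Hypothetical rows mirroring the fields used in the script.
    static class Order {
        final int SellerId;
        final double Amount;
        Order(int sellerId, double amount) { SellerId = sellerId; Amount = amount; }
    }
    static class Employee {
        final int EId;
        final String Dept;
        Employee(int eId, String dept) { EId = eId; Dept = dept; }
    }

    // Joins orders to employees on SellerId = EId, then sums Amount per Dept.
    static Map<String, Double> amountByDept(List<Order> orders, List<Employee> employees) {
        // Index employees by EId so each order can find its department
        Map<Integer, String> deptByEId = new HashMap<>();
        for (Employee e : employees) deptByEId.put(e.EId, e.Dept);

        // TreeMap keeps departments in alphabetical order for stable output
        Map<String, Double> result = new TreeMap<>();
        for (Order o : orders) {
            String dept = deptByEId.get(o.SellerId);
            if (dept == null) continue;              // assumed inner join: drop unmatched orders
            result.merge(dept, o.Amount, Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(new Employee(1, "Sales"), new Employee(2, "R&D"));
        List<Order> orders = Arrays.asList(
                new Order(1, 100.0), new Order(2, 50.0), new Order(1, 25.0), new Order(9, 999.0));
        System.out.println(amountByDept(orders, employees));
    }
}
```

Even this in-memory version needs two row classes and an explicit lookup index before any data source handling is added.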
SPL provides its own storage formats, which can store data temporarily or permanently and support high-performance computation
SPL supports the btx storage format, which suits temporary storage of data from slow data sources such as CSV:
| | A | B |
|---|---|---|
| 1 | =[T("d:/orders1.csv"), T("d:/orders2.csv")].merge@u() | / Merge records |
| 2 | file("d:/fast.btx").export@b(A1) | / Write to a btx file |

=T("D:/fast.btx").sort(Client,-Amount)
If the btx file is stored in order, it also supports high-performance computations such as parallel computing and binary search. SPL also supports the higher-performance ctx storage format, which supports compression, columnar storage, row storage, distributed computing, and high-concurrency computing, and suits persistent storage of large volumes of data with high-performance computation.
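Why ordered storage enables binary search can be shown with a small Java sketch. It illustrates the general technique only, not the btx format or any SPL API; findClient and the sample list are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OrderedLookupSketch {
    // Looks up a client in a list that is already sorted in ascending order.
    // Returns the index of the match, or -1 when the client is absent.
    // Binary search needs O(log n) comparisons, but only works on sorted data.
    static int findClient(List<String> sortedClients, String client) {
        int idx = Collections.binarySearch(sortedClients, client);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args) {
        List<String> clients = Arrays.asList("ACME", "Beta", "Zeta");
        System.out.println(findClient(clients, "Beta"));
        System.out.println(findClient(clients, "Nope"));
    }
}
```

On unsorted data the same lookup would have to scan every record, which is the cost that ordered storage avoids.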
In structured data computation outside the database, Stream made a breakthrough contribution. Kotlin strengthened that capability, but the nature of a compiled language keeps it from going further. To solve out-of-database computation thoroughly, a professional structured data computing language such as SPL is needed.