
From Stream to Kotlin to SPL

2022-06-21 06:47:00 【Scholar's day 3 WYX】

In Java development it is often inconvenient to use a database, yet structured data still has to be computed. Early Java provided no class library for this, so even basic calculations such as sorting and grouping had to be hand-coded, and development efficiency was very low. Java 8 later introduced the Stream library which, relying on Lambda expressions, a chained programming style, and set functions, finally gave Java a class library for structured data computation where there had been none.

Stream can simplify structured data computation

Take sorting as an example:

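// Orders is assumed to be a Stream<Order> (e.g. orderList.stream()); CharSequence.compare needs Java 11+
Stream<Order> result = Orders
    .sorted((sAmount1, sAmount2) -> Double.compare(sAmount1.Amount, sAmount2.Amount))
    // sorted() is stable, so this later sort becomes the primary key
    .sorted((sClient1, sClient2) -> CharSequence.compare(sClient2.Client, sClient1.Client));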

In the code above, sorted is a set function that makes sorting easy. The form "(parameters) -> function body" is a Lambda expression, which simplifies the definition of anonymous functions. Chaining two sorted calls together is the chained programming style, which makes a multi-step calculation read intuitively.

Stream's computing power is not strong enough

Take the sort above as an example again. sorted only needs to know the sort fields and whether each is ascending or descending; following SQL, one would write "…from Orders order by Client desc, Amount". In practice, however, you also have to supply the data type of each sort field (Double.compare versus CharSequence.compare). Ascending and descending order could be expressed simply with symbols such as asc/desc (or +/-), but here you have to use a compare function. In addition, the fields are actually sorted in the opposite order to the one in which the code is written, which is somewhat counterintuitive.
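The reversed field order follows from the fact that Stream.sorted is stable for ordered streams, so the last sorted call decides the primary key. The self-contained sketch below (its Order class is a hypothetical stand-in with only the two sort fields) shows that the two chained sorts give the same result as a single comparator written in SQL order, Client descending and then Amount.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortOrderDemo {
    // Hypothetical stand-in for the article's Order class, with only the two sort fields
    static final class Order {
        final String Client;
        final double Amount;

        Order(String client, double amount) {
            this.Client = client;
            this.Amount = amount;
        }

        @Override
        public String toString() {
            return Client + ":" + Amount;
        }
    }

    // The article's style: two chained sorted() calls, Amount first, then Client descending.
    // Stream.sorted is stable for ordered streams, so the LAST call decides the primary key.
    static List<Order> twoPass(List<Order> orders) {
        return orders.stream()
                .sorted((a, b) -> Double.compare(a.Amount, b.Amount))
                .sorted((a, b) -> CharSequence.compare(b.Client, a.Client))
                .collect(Collectors.toList());
    }

    // The same ordering as one comparator, written in SQL order: "order by Client desc, Amount"
    static List<Order> oneComparator(List<Order> orders) {
        Comparator<Order> byClientDesc = Comparator.comparing((Order o) -> o.Client).reversed();
        return orders.stream()
                .sorted(byClientDesc.thenComparingDouble(o -> o.Amount))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order("A", 5), new Order("B", 3),
                new Order("A", 1), new Order("B", 9));
        System.out.println(twoPass(orders));       // [B:3.0, B:9.0, A:1.0, A:5.0]
        System.out.println(oneComparator(orders)); // [B:3.0, B:9.0, A:1.0, A:5.0]
    }
}
```

Even the single-comparator form still needs a typed key extractor per field and an explicit reversed() call, which is the point the paragraph above makes.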
Another example is grouping and aggregation:

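// Orders is assumed to be a Stream<Order>; OrderDate is a java.util.Date
Calendar cal = Calendar.getInstance();
Map<Object, DoubleSummaryStatistics> c = Orders.collect(Collectors.groupingBy(
        // grouping key: year and SellerId concatenated with "_"
        r -> {
            cal.setTime(r.OrderDate);
            return cal.get(Calendar.YEAR) + "_" + r.SellerId;
        },
        // aggregate Amount per group (sum, count, min, max, average)
        Collectors.summarizingDouble(r -> r.Amount)
));
for (Object sellerid : c.keySet()) {
    DoubleSummaryStatistics r = c.get(sellerid);
    // split the concatenated key back into its two fields
    String[] year_sellerid = ((String) sellerid).split("_");
    System.out.println("group is (year):" + year_sellerid[0] + "\t (sellerid):" + year_sellerid[1]
            + "\t sum is:" + r.getSum() + "\t count is:" + r.getCount());
}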

In the code above, every field reference has to be qualified with a table name (here the lambda parameter r), as in "r.Amount"; the name cannot be omitted as it can in SQL. Anonymous-function syntax is complex, and its complexity grows quickly as the code grows; here two anonymous functions are nested inside the collect call, which makes the code even harder to read. A single grouping-and-aggregation operation requires several functions and classes, including groupingBy, collect, Collectors, summarizingDouble, and DoubleSummaryStatistics, so the learning cost is not low. The result of the grouping is a Map rather than a structured data type, so continuing the calculation usually means defining a new structured data type and converting to it, which is cumbersome. Grouping by two fields is common in structured data computation, but groupingBy accepts only one grouping key. To make one key represent two fields you need a workaround, such as creating a structured data type with two fields or, as here, concatenating the two fields with an underscore, which makes the code even more cumbersome.
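The first workaround, a dedicated two-field key type, looks roughly like this in modern Java (records need Java 16+). This is a sketch with a hypothetical Order record that uses java.time.LocalDate in place of java.util.Date; note that the key type still has to be declared in advance and the result is still a Map rather than a table.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupByTwoFields {
    // Hypothetical Order shape; LocalDate avoids the shared Calendar of the original code
    record Order(LocalDate OrderDate, int SellerId, double Amount) {}

    // The two-field grouping key must be declared as its own type before it can be used
    record YearSeller(int year, int sellerId) {}

    static Map<YearSeller, DoubleSummaryStatistics> summarize(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                o -> new YearSeller(o.OrderDate().getYear(), o.SellerId()),
                Collectors.summarizingDouble(Order::Amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(LocalDate.of(2021, 3, 1), 1, 100.0),
                new Order(LocalDate.of(2021, 7, 9), 1, 50.0),
                new Order(LocalDate.of(2022, 1, 5), 2, 80.0));
        // YearSeller is not Comparable, so ordered output needs yet another explicit comparator
        Map<YearSeller, DoubleSummaryStatistics> sorted = new TreeMap<>(
                Comparator.comparingInt(YearSeller::year).thenComparingInt(YearSeller::sellerId));
        sorted.putAll(summarize(orders));
        sorted.forEach((k, s) -> System.out.println(
                k.year() + "\t" + k.sellerId() + "\tsum=" + s.getSum() + "\tcount=" + s.getCount()));
    }
}
```

A record key gets equals and hashCode for free, which is what makes it safe as a Map key, but each new combination of grouping fields still needs its own declared type.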

Stream's computing power is limited because its underlying language, Java, is a compiled language that cannot provide professional structured data objects, so Stream lacks strong support from below.

Because Java is compiled, the structure of a return value must be defined in advance. When there are many intermediate steps, you have to define many data structures. This not only makes the code cumbersome but also makes parameter handling inflexible, so anonymous syntax has to be implemented with a complex set of rules. Interpreted languages naturally support dynamic structures; they can also easily treat a parameter expression either as a value parameter or as a function parameter, which allows simpler anonymous functions.

Against this background, Kotlin emerged. Kotlin is a modern development language built on the Java platform. "Modern" here mainly means improvements to Java syntax, especially to Stream: Lambda expressions are more concise and set functions are richer.

Kotlin's computing power is better than Stream's

Take sorting as an example:

val result = Orders
    .sortedBy { it.Amount }
    .sortedByDescending { it.Client }   // stable sort: Client becomes the primary key

The code above does not need to specify the data type of the sort field, nor does it need a function to express ascending or descending order. It refers directly to it, the default parameter of an anonymous function, without defining it explicitly. Overall it is much shorter than the Stream version.

Kotlin does not improve much; its computing power is still insufficient

Taking sorting as an example again: although Kotlin provides the default parameter it, in principle knowing the field name should be enough, with no need to prefix it with a table name (it). In addition, sortedBy sorts on only one field and cannot accept multiple fields dynamically.

Another example is grouping and aggregation:

// OrderDate is assumed to be a java.util.Date; its deprecated getYear() returns year - 1900
data class Grp(val OrderYear: Int, val SellerId: Int)
data class Agg(val sumAmount: Double, val rowCount: Int)
val result = Orders.groupingBy { Grp(it.OrderDate.year + 1900, it.SellerId) }
    .fold(Agg(0.0, 0)) { acc, elem ->
        Agg(acc.sumAmount + elem.Amount, acc.rowCount + 1)
    }
    .toSortedMap(compareBy<Grp> { it.OrderYear }.thenBy { it.SellerId })
result.forEach { println("group fields:${it.key.OrderYear}\t${it.key.SellerId}\t aggregate fields:${it.value.sumAmount}\t${it.value.rowCount}") }

In the code above, a single group-and-aggregate operation requires multiple functions, including complex nested functions. Fields must still be qualified with the table name (it), the grouping result is not a structured data object, and the data structures for the intermediate results (Grp and Agg) must be defined in advance.

If we go on to examine more calculations, such as set operations and joins, we find the same pattern: Kotlin code is indeed shorter than Stream code, but most of the gains are minor quantitative changes rather than a deep qualitative change, and many steps are still required.

Kotlin does not support dynamic data structures either, so it cannot provide professional structured data objects. As a result it is hard for it to truly simplify Lambda syntax: fields cannot be referenced directly without a table name, and dynamic multi-field calculations (such as multi-field sorting) are not directly supported.

The arrival of esProc SPL completely changes this dilemma of structured data processing in the Java ecosystem.

esProc SPL is an open-source structured data computing language for the JVM. It provides professional structured data objects, rich built-in calculation functions, flexible and concise syntax, and an easy-to-integrate JDBC interface, and it is good at simplifying complex calculations.

SPL's rich built-in functions handle basic calculations

Take sorting as an example: =Orders.sort(-Client, Amount)

SPL does not need the data types of the sort fields, nor a function to indicate ascending or descending order; fields are used without a table-name prefix; and a single function can sort by multiple fields dynamically.
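For contrast, getting similar "field name plus optional minus sign" sorting in Java means building the comparator yourself. The sketch below is a hypothetical helper, not part of any library: the spec strings, the FIELDS registry, and the Order record are illustrative assumptions, and the registry exists because Java cannot look fields up by name without reflection.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DynamicSort {
    // Hypothetical record with the two fields used in the article's sort example
    record Order(String Client, double Amount) {}

    // Hypothetical registry: each sortable field name is mapped to a comparator by hand
    static final Map<String, Comparator<Order>> FIELDS = Map.of(
            "Client", Comparator.comparing(Order::Client),
            "Amount", Comparator.comparingDouble(Order::Amount));

    // Builds one comparator from specs such as "-Client", "Amount" (leading '-' = descending),
    // imitating the shape of SPL's Orders.sort(-Client, Amount)
    static Comparator<Order> comparatorFor(String... specs) {
        Comparator<Order> result = null;
        for (String spec : specs) {
            boolean desc = spec.startsWith("-");
            String name = desc ? spec.substring(1) : spec;
            Comparator<Order> c = FIELDS.get(name);
            if (c == null) throw new IllegalArgumentException("unknown field: " + name);
            if (desc) c = c.reversed();
            // the first spec is the primary key, later specs break ties
            result = (result == null) ? c : result.thenComparing(c);
        }
        if (result == null) throw new IllegalArgumentException("no sort fields given");
        return result;
    }

    static List<Order> sort(List<Order> orders, String... specs) {
        return orders.stream().sorted(comparatorFor(specs)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order("A", 5), new Order("B", 3), new Order("A", 1));
        System.out.println(sort(orders, "-Client", "Amount"));
    }
}
```

Unlike the chained sorted calls, the fields here are listed in priority order, but every new field still has to be registered in code.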

Grouping and aggregation: =Orders.groups(year(OrderDate),Client; sum(Amount),count(1))

The result is still a structured data object and can take part directly in the next calculation. Grouping or aggregating on two fields requires no predefined data structure. The code contains no redundant functions, and sum and count are simple and easy to understand; it is hard even to notice that these are nested anonymous functions.

Other calculations are just as simple:

Deduplication: =Orders.id(Client)

Fuzzy query: =Orders.select(Amount*Quantity>3000 && like(Client,"*S*"))

Join: =join(Orders:o,SellerId ; Employees:e,EId).groups(e.Dept; sum(o.Amount))
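For comparison, here is a rough Java Stream counterpart of the three one-liners above. It is a sketch under stated assumptions: the Order and Employee records are hypothetical shapes for the article's tables, the fuzzy match is taken to mean "Client contains S", and the join is an inner hash join through a Map.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class MoreStreamCalcs {
    // Hypothetical shapes for the article's Orders and Employees tables
    record Order(String Client, int SellerId, double Amount, int Quantity) {}
    record Employee(int EId, String Dept) {}

    // Deduplication: distinct client names (sorted only to make the output deterministic)
    static List<String> distinctClients(List<Order> orders) {
        return orders.stream().map(Order::Client).distinct().sorted().collect(Collectors.toList());
    }

    // Fuzzy query: Amount*Quantity > 3000 and Client contains "S" (assumed meaning of the pattern)
    static List<Order> fuzzy(List<Order> orders) {
        return orders.stream()
                .filter(o -> o.Amount() * o.Quantity() > 3000 && o.Client().contains("S"))
                .collect(Collectors.toList());
    }

    // Join Orders.SellerId = Employees.EId, then sum Amount per Dept
    static Map<String, Double> sumByDept(List<Order> orders, List<Employee> employees) {
        // index employees by EId (assumes EId is unique)
        Map<Integer, Employee> byId = employees.stream()
                .collect(Collectors.toMap(Employee::EId, Function.identity()));
        return orders.stream()
                .filter(o -> byId.containsKey(o.SellerId()))   // inner join: drop unmatched orders
                .collect(Collectors.groupingBy(o -> byId.get(o.SellerId()).Dept(),
                        TreeMap::new, Collectors.summingDouble(Order::Amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("SAS", 1, 1000, 4),
                new Order("ABC", 2, 2000, 2),
                new Order("SAS", 2, 10, 1));
        List<Employee> employees = List.of(new Employee(1, "Sales"), new Employee(2, "R&D"));
        System.out.println(distinctClients(orders));
        System.out.println(fuzzy(orders));
        System.out.println(sumByDept(orders, employees));
    }
}
```

Each SPL line turns into a method with its own types and collectors, which is the gap in expressiveness the article is describing.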

SPL provides a JDBC interface and can be called seamlessly from Java code:

// Requires the esProc JDBC driver on the classpath; uses java.sql.*
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
// An SPL expression is passed directly as the query text
String str = "=T(\"D:/Orders.xls\").groups(year(OrderDate),Client; sum(Amount))";
ResultSet result = statement.executeQuery(str);

SPL's syntax is concise and flexible, and it has strong computing power.

SPL simplifies calculations with complex logic, such as step-by-step computation, ordered computation, and computation after grouping. Many calculations that are hard to implement in SQL or stored procedures are easy in SPL. For example, find the top n major customers whose cumulative sales reach half of the total sales, sorted by sales in descending order:

	A	B
1	…	/ Fetch the data
2	=A1.sort(amount:-1)	/ Sort by sales in descending order
3	=A2.cumulate(amount)	/ Running (cumulative) totals of sales
4	=A3.m(-1)/2	/ The last running total is the overall total; take half of it
5	=A3.pselect(~>=A4)	/ First position where the running total reaches half
6	=A2(to(A5))	/ Take the records up to that position
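The same steps translate directly into plain Java, which makes it easier to see how much bookkeeping each SPL cell hides. This is a sketch assuming a hypothetical Client record with a name and a non-negative sales amount; positions are 0-based here, whereas the SPL grid is 1-based.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopHalfClients {
    // Hypothetical record: one row per customer with its total sales
    record Client(String name, double amount) {}

    // Mirrors the SPL grid: A2 sort descending, A3 running totals, A4 half of the total,
    // A5 first position reaching A4, A6 the records up to that position
    static List<Client> topHalf(List<Client> data) {
        List<Client> sorted = new ArrayList<>(data);
        sorted.sort(Comparator.comparingDouble(Client::amount).reversed());   // A2
        if (sorted.isEmpty()) return List.of();

        double[] cumulative = new double[sorted.size()];                      // A3
        double running = 0;
        for (int i = 0; i < sorted.size(); i++) {
            running += sorted.get(i).amount();
            cumulative[i] = running;
        }

        double half = cumulative[cumulative.length - 1] / 2;                  // A4

        int pos = 0;                                                          // A5 (0-based)
        while (pos < cumulative.length - 1 && cumulative[pos] < half) pos++;

        return new ArrayList<>(sorted.subList(0, pos + 1));                   // A6
    }

    public static void main(String[] args) {
        List<Client> data = List.of(new Client("a", 20), new Client("b", 40),
                new Client("c", 30), new Client("d", 10));
        System.out.println(topHalf(data));
    }
}
```

With totals of 40, 30, 20, and 10, the running sums are 40 and then 70, so the first two customers after sorting (b and c) are returned.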

Beyond computing power, SPL also has distinctive advantages in system architecture, data sources, intermediate data storage, and computing performance, all of which help it compute structured data outside the database.

SPL supports hot-swapping of computation logic and keeping code outside the application, which reduces system coupling.

For example, save the SPL code above as a script file, then call it by file name from Java as if it were a stored procedure:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
// "getClient" is the name of the SPL script file saved above
ResultSet result = statement.executeQuery("call getClient()");

SPL is an interpreted language: modified code runs directly, with no compilation and no need to restart the Java service. SPL code lives outside Java and is called by file name, so it does not depend on the Java code and coupling stays low.

SPL supports multiple data sources and can compute across sources and across databases.

SPL supports all kinds of databases; files such as txt, csv, and xls; NoSQL and big data sources such as MongoDB, Hadoop, Redis, ElasticSearch, Kafka, and Cassandra; and, in particular, data formats such as WebService XML and RESTful JSON:

	A
1	=json(file("d:/Orders.json").read())
2	=json(A1).conj()
3	=A2.select(Amount>p_start && Amount<=p_end)

Cross-source join between a text file and a database:

	A
1	=T("Employees.csv")
2	=mysql1.cursor("select SellerId, Amount from Orders order by SellerId")
3	=joinx(A2:O,SellerId; A1:E,EId)
4	=A3.groups(E.Dept;sum(O.Amount))
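The "order by SellerId" in A2 suggests that joinx works on inputs ordered by the join key. As a hedged illustration of that general idea, not of SPL's actual implementation, here is a merge join in Java; the records are hypothetical, and the employee list is assumed to be sorted by a unique EId.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MergeJoinByDept {
    // Hypothetical shapes for Employees.csv and the Orders query result
    record Employee(int EId, String Dept) {}
    record Order(int SellerId, double Amount) {}

    // Merge join: both lists must already be sorted by the join key
    // (orders by SellerId, employees by a unique EId). Each input is scanned once, in order.
    static Map<String, Double> sumByDept(List<Order> orders, List<Employee> employees) {
        Map<String, Double> result = new TreeMap<>();
        int e = 0;
        for (Order o : orders) {
            // advance the employee pointer until its key is >= the order's key
            while (e < employees.size() && employees.get(e).EId() < o.SellerId()) e++;
            if (e < employees.size() && employees.get(e).EId() == o.SellerId()) {
                result.merge(employees.get(e).Dept(), o.Amount(), Double::sum);
            }
            // orders with no matching employee are skipped (inner join)
        }
        return result;
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(new Employee(1, "Sales"), new Employee(3, "R&D"));
        List<Order> orders = List.of(new Order(1, 10), new Order(1, 5),
                new Order(2, 7), new Order(3, 4));
        System.out.println(sumByDept(orders, employees));
    }
}
```

Because neither side is rescanned, this kind of join suits data that arrives as an ordered stream, which is one plausible reason for sorting in the SQL of A2.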

SPL provides its own storage formats, in which data can be stored temporarily or permanently for high-performance computation.

SPL supports the btx storage format, which is suitable for temporarily holding data from slow data sources such as CSV:

	A	B
1	=[T("d:/orders1.csv"), T("d:/orders2.csv")][email protected]()	/ Merge records
2	file("d:/fast.btx").export@b(A1)	/ Write the data to a btx file

btx files are small and fast to read and write, and can be computed just like ordinary text files:

=T("D:/fast.btx").sort(Client,-Amount)

If btx data is stored in order, higher computing performance is possible, for example through parallel computing and binary search. SPL also supports the higher-performance ctx storage format, which supports compression, columnar storage, row storage, distributed computing, and high-concurrency computing, and is suited to persistent storage of large volumes of data and high-performance computation.
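Why ordered storage helps: when records are sorted by a key, a lookup can use binary search instead of a full scan. The in-memory sketch below illustrates only that general idea; it makes no claim about how btx or ctx files are actually laid out or searched, and the Order record is hypothetical.

```java
import java.util.List;

class OrderedLookup {
    // Hypothetical record, stored sorted by OrderId
    record Order(int OrderId, String Client) {}

    // Binary search: halves the candidate range each step, O(log n) comparisons
    // instead of scanning every record. Returns null when the id is absent.
    static Order findById(List<Order> sortedById, int id) {
        int lo = 0, hi = sortedById.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;   // unsigned shift avoids int overflow
            int key = sortedById.get(mid).OrderId();
            if (key < id) lo = mid + 1;
            else if (key > id) hi = mid - 1;
            else return sortedById.get(mid);
        }
        return null;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order(2, "A"), new Order(5, "B"),
                new Order(8, "C"), new Order(13, "D"));
        System.out.println(findById(orders, 8));
        System.out.println(findById(orders, 6));
    }
}
```

The same search on unsorted data would have to inspect every record, which is why ordered storage is what makes this optimization available at all.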

In structured data computation outside the database, Stream made a breakthrough contribution; Kotlin strengthened this capability, but the characteristics of a compiled language keep it from going further. To fully solve the problem of computing outside the database, a professional structured data computing language such as SPL is still needed.



Copyright notice
This article was created by [Scholar's day 3 WYX]; please include a link to the original when reprinting. Thanks.
https://yzsam.com/2022/172/202206210634083350.html