Flink: custom functions

2022-07-03 10:48:00 【Samooyou】

This article describes how to write the business code for user-defined scalar functions (UDF), user-defined aggregate functions (UDAF), and user-defined table-valued functions (UDTF) in Realtime Compute for Apache Flink, and how to put them into production.

Custom scalar functions (UDF)

Definition

A user-defined scalar function (UDF) maps zero, one, or more scalar values to a new scalar value.

To define a scalar function, extend the base class ScalarFunction in org.apache.flink.table.functions and implement one or more evaluation (eval) methods. The behavior of a scalar function is determined by its evaluation methods. An evaluation method must be declared public and named eval (declared directly, for example with def in Scala, without override). The parameter types and return type of the evaluation methods determine the parameter types and return type of the scalar function.

Business code

A UDF must implement the eval method in a ScalarFunction class. The open and close methods are optional.

Note: By default, a UDF is assumed to return the same output for the same input. If the UDF cannot guarantee this, for example because it calls an external service that may return different results for the same input value, override the isDeterministic() method and return false. Otherwise, the output may not be as expected under certain conditions, for example when the UDF operator is moved earlier in the execution plan.
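To see why the determinism flag matters, here is a plain-Java sketch that does not use Flink. SketchFunction, LengthFunction, ExternalLookupFunction, and the caching rule in callTwice are hypothetical stand-ins for an optimizer that is allowed to reuse the result of a deterministic call; the real planner is more involved.

```java
// Hypothetical stand-in for a UDF; this is not a Flink class.
abstract class SketchFunction {
    abstract long eval(String input);

    // Mirrors the idea of isDeterministic(): true means "same input, same output".
    boolean isDeterministic() {
        return true;
    }
}

// Deterministic: the result depends only on the argument.
class LengthFunction extends SketchFunction {
    @Override
    long eval(String input) {
        return input == null ? 0 : input.length();
    }
}

// Non-deterministic: simulates an external service whose answer changes per call.
class ExternalLookupFunction extends SketchFunction {
    private long calls = 0;

    @Override
    long eval(String input) {
        calls++;
        return calls;
    }

    @Override
    boolean isDeterministic() {
        return false;
    }
}

class DeterminismDemo {
    // Assumed optimizer rule: a deterministic call may be evaluated once and reused.
    static long[] callTwice(SketchFunction f, String input) {
        if (f.isDeterministic()) {
            long cached = f.eval(input);
            return new long[] {cached, cached};
        }
        return new long[] {f.eval(input), f.eval(input)};
    }

    public static void main(String[] args) {
        long[] a = callTwice(new LengthFunction(), "flink");
        long[] b = callTwice(new ExternalLookupFunction(), "flink");
        System.out.println(a[0] + "," + a[1]);
        System.out.println(b[0] + "," + b[1]);
    }
}
```

If ExternalLookupFunction kept the default flag, the sketch would reuse the first answer and hide the second call, which is the kind of unexpected output the note warns about.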

Taking Java as an example, the sample code is as follows.

import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.ScalarFunction;

public class StringLengthUdf extends ScalarFunction {

    // Optional: the open method can be omitted.
    // If you implement open, you must import org.apache.flink.table.functions.FunctionContext.
    @Override
    public void open(FunctionContext context) {
    }

    // Returns the length of the string, or 0 for a null input.
    public long eval(String a) {
        return a == null ? 0 : a.length();
    }

    // Returns the combined length of two strings.
    public long eval(String b, String c) {
        return eval(b) + eval(c);
    }

    // Optional: the close method can be omitted.
    @Override
    public void close() {
    }
}
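Because eval methods are ordinary Java methods, their logic can be checked without a running job. The sketch below copies the two eval overloads into a hypothetical plain class, StringLengthLogic, that has no Flink dependency. It only illustrates the null handling and the delegation between overloads.

```java
// Hypothetical plain-Java copy of the eval logic above; it does not extend ScalarFunction.
class StringLengthLogic {
    // A null input counts as length 0.
    long eval(String a) {
        return a == null ? 0 : a.length();
    }

    // The two-argument overload delegates to the one-argument overload.
    long eval(String b, String c) {
        return eval(b) + eval(c);
    }

    public static void main(String[] args) {
        StringLengthLogic f = new StringLengthLogic();
        System.out.println(f.eval("flink"));
        System.out.println(f.eval("flink", null));
    }
}
```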

Registration and use

For registration details, see Register and use a UDF.

Write the SQL statement

After the UDF is registered, you can use it. The function must be called as database.udf (database name, then the UDF name).
An example SQL statement that uses the custom function is as follows.

create table sls_stream(
  a int,
  b int,
  c varchar
) with (
  type='sls',
  endPoint='<yourEndpoint>',
  accessKeyId='<yourAccessId>',
  accessKeySecret='<yourAccessSecret>',
  startTime='2017-07-04 00:00:00',
  project='<yourProjectName>',
  logStore='<yourLogStoreName>',
  consumerGroup='consumerGroupTest1'
);

create table rds_output(
  id int,
  len bigint,
  content VARCHAR
) with (
  type='rds',
  url='yourDatabaseURL',
  tableName='<yourDatabaseTableName>',
  userName='<yourDatabaseUserName>',
  password='<yourDatabasePassword>'
);

insert into rds_output
select
  a,
  databasename.stringLengthUdf(c),
  c as content
from sls_stream;

Custom aggregate functions (UDAF)

Definition

User-defined aggregate functions (UDAGGs) aggregate the data of a table into a scalar value. They are implemented by extending the AggregateFunction abstract class.

AggregateFunction works as follows.

First, it needs an accumulator, the data structure that holds the intermediate result (state) of the aggregation. An empty accumulator is created by calling the createAccumulator() method of AggregateFunction.
Then, the accumulate() method of the function is called for each input row to update the accumulator.
After all rows are processed, the getValue() method is called to compute and return the final result.
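The three steps above can be sketched in plain Java. This is not Flink code: SumAccumulator and SumSketch are hypothetical names, and the aggregate loop only imitates the order in which a runtime would call the methods.

```java
// Hypothetical accumulator holding the intermediate state of a SUM aggregate.
class SumAccumulator {
    long sum = 0;
}

// Stand-in for an aggregate function; it does not extend Flink's AggregateFunction.
class SumSketch {
    // Step 1: create an empty accumulator.
    SumAccumulator createAccumulator() {
        return new SumAccumulator();
    }

    // Step 2: update the accumulator for one input row.
    void accumulate(SumAccumulator acc, long value) {
        acc.sum += value;
    }

    // Step 3: compute the final result from the accumulator.
    long getValue(SumAccumulator acc) {
        return acc.sum;
    }

    // Imitates the runtime: create the accumulator, feed every row, read the result.
    static long aggregate(long[] rows) {
        SumSketch f = new SumSketch();
        SumAccumulator acc = f.createAccumulator();
        for (long row : rows) {
            f.accumulate(acc, row);
        }
        return f.getValue(acc);
    }

    public static void main(String[] args) {
        System.out.println(aggregate(new long[] {1, 2, 3}));
    }
}
```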

AggregateFunction requires the following methods to be implemented:

createAccumulator()
accumulate()
getValue()

In addition to the methods above, there are some optional methods. Some of them let the system execute queries more efficiently, while others are required in certain scenarios. For example, if the aggregate function is applied in the context of a session group window, the merge() method is required.

retract()
merge()
resetAccumulator()

The core interface methods of AggregateFunction are shown below:

createAccumulator and getValue methods

/*
 * @param <T>   The type of the UDAF output result.
 * @param <ACC> The type of the UDAF accumulator. The accumulator is the data structure that the UDAF
 *              uses to store intermediate results during computation. You can design the accumulator
 *              of each UDAF according to your needs.
 */
public abstract class AggregateFunction<T, ACC> extends UserDefinedFunction {

    /*
     * Initializes the accumulator of the AggregateFunction.
     * The system calls this method once before the first aggregate computation.
     */
    public abstract ACC createAccumulator();

    /*
     * The system calls this method after each aggregate computation.
     */
    public abstract T getValue(ACC accumulator);
}

Note

createAccumulator and getValue are defined in the AggregateFunction abstract class.
A UDAF must contain at least one accumulate method.

accumulate method

public void accumulate(ACC accumulator, ...[input parameters you specify]...);

Note

You need to implement an accumulate method that describes how to process the input data and update the accumulator with it.
The first parameter of the accumulate method must be the accumulator of the ACC type declared in AggregateFunction. While the system is running, the runtime code passes the current state of the accumulator and the upstream data you specified (any number of values of any type) to the accumulate method as parameters.
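As an illustration of an accumulate method with more than one input parameter, here is a plain-Java sketch of a weighted average. It does not depend on Flink; WeightedAvgAccumulator and WeightedAvgSketch are hypothetical names, and the choice of a long value with an int weight is an assumption made for the example.

```java
// Hypothetical accumulator for a weighted average.
class WeightedAvgAccumulator {
    long weightedSum = 0;
    long weightTotal = 0;
}

// Stand-in for a UDAF; it does not extend Flink's AggregateFunction.
class WeightedAvgSketch {
    WeightedAvgAccumulator createAccumulator() {
        return new WeightedAvgAccumulator();
    }

    // The first parameter is the accumulator; the remaining ones are the user-chosen inputs.
    void accumulate(WeightedAvgAccumulator acc, long value, int weight) {
        acc.weightedSum += value * weight;
        acc.weightTotal += weight;
    }

    // Returns null when nothing has been accumulated, to avoid dividing by zero.
    // Uses integer division, so the result is truncated.
    Long getValue(WeightedAvgAccumulator acc) {
        return acc.weightTotal == 0 ? null : acc.weightedSum / acc.weightTotal;
    }

    public static void main(String[] args) {
        WeightedAvgSketch f = new WeightedAvgSketch();
        WeightedAvgAccumulator acc = f.createAccumulator();
        f.accumulate(acc, 10, 1);
        f.accumulate(acc, 20, 3);
        System.out.println(f.getValue(acc));
    }
}
```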

retract and merge methods

Together, createAccumulator, getValue, and accumulate are enough to design a basic UDAF. However, some special scenarios in Realtime Compute for Apache Flink require you to provide the retract and merge methods.
Usually, the computation is an early firing over an unbounded stream. Because of early firing, results that were already emitted may later be modified; this operation is called retraction (retract). The SQL optimizer automatically determines which cases produce retracted data and which operations must process data that carries retraction flags. However, you need to implement a retract method to process the retracted data.

public void retract(ACC accumulator, ...[input parameters you specify]...);

Note

The retract method is the reverse operation of the accumulate method. For example, in a UDAF that implements the count function, the accumulate method adds 1 for each record, and the retract method subtracts 1.
Like the accumulate method, the first parameter of the retract method must be the accumulator of the ACC type declared in AggregateFunction. While the system is running, the runtime code passes the current state of the accumulator and the upstream data you specified (any number of values of any type) to the retract method.
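The count example in the note can be sketched as follows. This is plain Java without Flink; RetractCountAccumulator and RetractableCountSketch are hypothetical names, and the sketch only shows that retract undoes what accumulate did.

```java
// Hypothetical accumulator for a retractable COUNT.
class RetractCountAccumulator {
    long total = 0;
}

// Stand-in for a UDAF with a retract method; not a Flink class.
class RetractableCountSketch {
    RetractCountAccumulator createAccumulator() {
        return new RetractCountAccumulator();
    }

    // Each accumulated record adds 1.
    void accumulate(RetractCountAccumulator acc, Object value) {
        acc.total++;
    }

    // Each retracted record subtracts 1, reversing one accumulate call.
    void retract(RetractCountAccumulator acc, Object value) {
        acc.total--;
    }

    long getValue(RetractCountAccumulator acc) {
        return acc.total;
    }

    public static void main(String[] args) {
        RetractableCountSketch f = new RetractableCountSketch();
        RetractCountAccumulator acc = f.createAccumulator();
        f.accumulate(acc, "a");
        f.accumulate(acc, "b");
        f.retract(acc, "a");
        System.out.println(f.getValue(acc));
    }
}
```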

Some scenarios in Realtime Compute for Apache Flink require the merge method, for example session windows. Because Flink supports out-of-order data, a record that arrives later may fall between two originally separate sessions, which merges the two sessions into one. In that case, the merge method is needed to combine multiple accumulators into one accumulator.

public void merge(ACC accumulator, Iterable<ACC> its);

Note

The first parameter of the merge method must be the accumulator of the ACC type declared in AggregateFunction. This first accumulator is where the state is stored after the merge method completes.
The second parameter of the merge method is an iterable of ACC-type accumulators, which may contain one or more accumulators.
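A minimal plain-Java sketch of the merge contract described above, assuming a count-style accumulator. SessionCountAccumulator and MergeSketch are hypothetical names and nothing here depends on Flink.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical per-session accumulator.
class SessionCountAccumulator {
    long total;

    SessionCountAccumulator(long total) {
        this.total = total;
    }
}

class MergeSketch {
    // Folds every accumulator in 'others' into 'target'; 'target' holds the merged state afterwards.
    static void merge(SessionCountAccumulator target, Iterable<SessionCountAccumulator> others) {
        for (SessionCountAccumulator other : others) {
            target.total += other.total;
        }
    }

    public static void main(String[] args) {
        SessionCountAccumulator target = new SessionCountAccumulator(2);
        List<SessionCountAccumulator> others =
                Arrays.asList(new SessionCountAccumulator(3), new SessionCountAccumulator(4));
        merge(target, others);
        System.out.println(target.total);
    }
}
```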

Write the business logic code

Taking Java as an example, the sample code is as follows.

import org.apache.flink.table.functions.AggregateFunction;

public class CountUdaf extends AggregateFunction<Long, CountUdaf.CountAccum> {

    // Defines the data structure of the accumulator that stores the state of the count UDAF.
    public static class CountAccum {
        public long total;
    }

    // Initializes the accumulator of the count UDAF.
    public CountAccum createAccumulator() {
        CountAccum acc = new CountAccum();
        acc.total = 0;
        return acc;
    }

    // getValue computes the result of the count UDAF from the accumulator that stores the state.
    public Long getValue(CountAccum accumulator) {
        return accumulator.total;
    }

    // accumulate updates the accumulator of the count UDAF based on the input data.
    public void accumulate(CountAccum accumulator, Object iValue) {
        accumulator.total++;
    }

    // merge adds the totals of the other accumulators to the first accumulator.
    public void merge(CountAccum accumulator, Iterable<CountAccum> its) {
        for (CountAccum other : its) {
            accumulator.total += other.total;
        }
    }
}

Registration and use

For registration details, see Register and use a UDF.

Write the SQL statement

After the UDF is registered, you can use it. The function must be called as database.udf (database name, then the UDF name).
An example SQL statement that uses the custom aggregate function is as follows.

create table sls_stream(
  a int,
  b bigint,
  c varchar
) with (
  type='sls',
  endPoint='yourEndpoint',
  accessKeyId='yourAccessId',
  accessKeySecret='yourAccessSecret',
  startTime='2017-07-04 00:00:00',
  project='<yourProjectName>',
  logStore='stream-test2',
  consumerGroup='consumerGroupTest3'
);

create table rds_output(
  len1 bigint,
  len2 bigint
) with (
  type='rds',
  url='yourDatabaseURL',
  tableName='<yourDatabaseTableName>',
  userName='<yourDatabaseUserName>',
  password='<yourDatabasePassword>'
);

insert into rds_output
select
  count(a),
  databasename.countUdaf(a)
from sls_stream;

Custom table-valued functions (UDTF)

Definition

User-defined table aggregate functions (UDTAGGs) aggregate the data of a table into a result table with multiple rows and columns. They are very similar to AggregateFunction, except that the aggregation result is now a table instead of a scalar value. Similar to a user-defined scalar function, a user-defined table-valued function (UDTF) takes zero, one, or more scalar values as input parameters (variable-length parameters are allowed). Unlike a scalar function, a table-valued function can return any number of rows as output, not just one value. A returned row can consist of one or more columns.

User-defined table aggregate functions are implemented by extending the TableAggregateFunction abstract class.

TableAggregateFunction works as follows.

First, it also needs an accumulator, the data structure that holds the intermediate result of the aggregation. An empty accumulator is created by calling the createAccumulator() method of TableAggregateFunction.
Then, the accumulate() method of the function is called for each input row to update the accumulator.
After all rows are processed, the emitValue() method is called to compute and return the final result.

TableAggregateFunction requires the following methods to be implemented:

createAccumulator()
accumulate()

In addition to the methods above, there are some optional methods.

retract()
merge()
resetAccumulator()
emitValue()
emitUpdateWithRetract()
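The table aggregate workflow described above can be imitated with a small "top 2" sketch in plain Java. Top2Accumulator and Top2Sketch are hypothetical names, and a java.util.List stands in for the collector that emitValue would write to; this is an illustration of the call order, not the Flink API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical accumulator that remembers the two largest values seen so far.
class Top2Accumulator {
    Integer first;
    Integer second;
}

// Stand-in for a table aggregate function; it does not extend TableAggregateFunction.
class Top2Sketch {
    Top2Accumulator createAccumulator() {
        return new Top2Accumulator();
    }

    // Updates the two largest values for one input row.
    void accumulate(Top2Accumulator acc, int value) {
        if (acc.first == null || value > acc.first) {
            acc.second = acc.first;
            acc.first = value;
        } else if (acc.second == null || value > acc.second) {
            acc.second = value;
        }
    }

    // Emits up to two rows of (value, rank); 'out' stands in for a collector.
    void emitValue(Top2Accumulator acc, List<int[]> out) {
        if (acc.first != null) {
            out.add(new int[] {acc.first, 1});
        }
        if (acc.second != null) {
            out.add(new int[] {acc.second, 2});
        }
    }

    public static void main(String[] args) {
        Top2Sketch f = new Top2Sketch();
        Top2Accumulator acc = f.createAccumulator();
        for (int v : new int[] {3, 9, 5}) {
            f.accumulate(acc, v);
        }
        List<int[]> out = new ArrayList<>();
        f.emitValue(acc, out);
        for (int[] row : out) {
            System.out.println(row[0] + " rank " + row[1]);
        }
    }
}
```

Unlike getValue, which returns one scalar, emitValue here produces several rows from one accumulator, which is the difference between an aggregate and a table aggregate.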

Business code

A UDTF must implement the eval method in a TableFunction class. The open and close methods are optional. Taking Java as an example, the sample code is as follows.

import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.TableFunction;

public class SplitUdtf extends TableFunction<String> {

    // Optional: the open method can be omitted. If you implement it, you must import
    // org.apache.flink.table.functions.FunctionContext.
    @Override
    public void open(FunctionContext context) {
        // ... ...
    }

    // Splits the input on '|' and emits one output row per piece.
    public void eval(String str) {
        String[] split = str.split("\\|");
        for (String s : split) {
            collect(s);
        }
    }

    // Optional: the close method can be omitted.
    @Override
    public void close() {
        // ... ...
    }
}

Multi-row return

A UDTF can call collect() multiple times to convert one row of data into multiple rows and return them.
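The one-row-to-many-rows behavior can be imitated without Flink. In the sketch below, SplitSketch is a hypothetical class whose collect method simply appends to a list, standing in for the collect() that a table function would call; the null check is an added assumption, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a table function; it does not extend Flink's TableFunction.
class SplitSketch {
    private final List<String> rows = new ArrayList<>();

    // Stand-in for collect(): each call emits one output row.
    void collect(String row) {
        rows.add(row);
    }

    // One input value can produce zero, one, or many output rows.
    void eval(String str) {
        if (str == null) {
            return; // assumption: emit nothing for a null input
        }
        for (String s : str.split("\\|")) {
            collect(s);
        }
    }

    List<String> rows() {
        return rows;
    }

    public static void main(String[] args) {
        SplitSketch f = new SplitSketch();
        f.eval("a|b|c");
        System.out.println(f.rows());
    }
}
```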

Multi-column return

A UDTF can convert not only one row into multiple rows but also one column into multiple columns. If you need a UDTF to return multiple columns, declare the return value as a Tuple or a Row. Tuple and Row are explained as follows:

The return value is a Tuple
Realtime Compute for Apache Flink supports Tuple1 through Tuple25, which define 1 to 25 fields. The following example UDTF uses Tuple3 to return three fields.

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.table.functions.TableFunction;

// When you use a Tuple as the return value, you must explicitly declare its generic types,
// for example String, Long, and Integer.
public class ParseUdtf extends TableFunction<Tuple3<String, Long, Integer>> {

    public void eval(String str) {
        String[] split = str.split(",");
        // The following code is an example only. Real business code needs more validation logic.
        String first = split[0];
        long second = Long.parseLong(split[1]);
        int third = Integer.parseInt(split[2]);
        Tuple3<String, Long, Integer> tuple3 = Tuple3.of(first, second, third);
        collect(tuple3);
    }
}

Note: When you use a Tuple, field values cannot be null, and at most 25 fields are supported.
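The comment in the Tuple example says that real business code needs more validation. One possible shape for that validation is sketched below in plain Java. ParsedRecord and ParseSketch.parse are hypothetical names, and silently skipping malformed input by returning an empty Optional is an assumption; a real job might log or count such records instead.

```java
import java.util.Optional;

// Hypothetical holder for the three parsed fields.
class ParsedRecord {
    final String first;
    final long second;
    final int third;

    ParsedRecord(String first, long second, int third) {
        this.first = first;
        this.second = second;
        this.third = third;
    }
}

class ParseSketch {
    // Returns the parsed record, or Optional.empty() when the input is malformed.
    static Optional<ParsedRecord> parse(String str) {
        if (str == null) {
            return Optional.empty();
        }
        // Limit -1 keeps trailing empty fields, so "a,1,2," is rejected as four fields.
        String[] split = str.split(",", -1);
        if (split.length != 3) {
            return Optional.empty();
        }
        try {
            long second = Long.parseLong(split[1].trim());
            int third = Integer.parseInt(split[2].trim());
            return Optional.of(new ParsedRecord(split[0], second, third));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("a,1,2").isPresent());
        System.out.println(parse("a,x,2").isPresent());
    }
}
```

Inside an eval method, the caller would emit a row only when the Optional is present, so no null field ever reaches the Tuple.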

The return value is a Row
The following example UDTF uses Row to return three fields.

import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.DataTypes;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;

public class ParseUdtf extends TableFunction<Row> {

    public void eval(String str) {
        String[] split = str.split(",");
        String first = split[0];
        long second = Long.parseLong(split[1]);
        int third = Integer.parseInt(split[2]);
        Row row = new Row(3);
        row.setField(0, first);
        row.setField(1, second);
        row.setField(2, third);
        collect(row);
    }

    // If the return value is a Row, you must override the getResultType method
    // to explicitly declare the returned field types.
    @Override
    public DataType getResultType(Object[] arguments, Class[] argTypes) {
        return DataTypes.createRowType(DataTypes.STRING, DataTypes.LONG, DataTypes.INT);
    }
}

Note: Field values of a Row can be null. However, if you use Row, you must override the getResultType method.

SQL syntax

UDTFs support cross join and left join. When you use a UDTF, you need to add the lateral and table keywords.

cross join

Each row of the left table is joined with each row that the UDTF outputs for it. If the UDTF produces no output for a row, that row is not output.

select S.id, S.content, T.a, T.b, T.c
from input_stream as S,
lateral table(parseUdtf(content)) as T(a, b, c);

left join

Each row of the left table is joined with each row that the UDTF outputs for it. If the UDTF produces no output for a row, the UDTF fields of that row are filled with null values.

select S.id, S.content, T.a, T.b, T.c
from input_stream as S
left join lateral table(parseUdtf(content)) as T(a, b, c) on true;
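The difference between the two join types can be imitated in plain Java. In this sketch, LateralJoinSketch, its join method, and the keepUnmatched flag are hypothetical; the split helper is an assumed table function that produces no rows for an empty string. It models only the row-level semantics described above, not how Flink executes the join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class LateralJoinSketch {
    // Joins each left row with every row the table function produces for it.
    // keepUnmatched = false imitates the cross join; true imitates the left join (null padding).
    static List<String[]> join(List<String> left,
                               Function<String, List<String>> udtf,
                               boolean keepUnmatched) {
        List<String[]> result = new ArrayList<>();
        for (String row : left) {
            List<String> produced = udtf.apply(row);
            if (produced.isEmpty() && keepUnmatched) {
                result.add(new String[] {row, null});
            }
            for (String p : produced) {
                result.add(new String[] {row, p});
            }
        }
        return result;
    }

    // Hypothetical table function: splits on '|' and produces no rows for an empty string.
    static List<String> split(String s) {
        if (s.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(s.split("\\|"));
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a|b", "");
        System.out.println(join(left, LateralJoinSketch::split, false).size());
        System.out.println(join(left, LateralJoinSketch::split, true).size());
    }
}
```

With the two left rows in main, the cross join variant drops the empty row, while the left join variant keeps it with a null in the UDTF column.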

Registration and use

For registration details, see Register and use a UDF.

Write the SQL statement

create table sls_stream(
  a INT,
  b BIGINT,
  c VARCHAR
) with (
  type='sls',
  endPoint='yourEndpoint',
  accessKeyId='yourAccessKeyId',
  accessKeySecret='yourAccessSecret',
  startTime='2017-07-04 00:00:00',
  project='yourProjectName',
  logStore='yourLogStoreName',
  consumerGroup='consumerGroupTest2'
);

-- Pass field c to splitUdtf. The split result is a single-column, multi-row table T(s), where s is the field name.
create view v1 as
select a,b,c,s
from sls_stream,
lateral table(databasename.splitUdtf(c)) as T(s);

create table rds_output(
  id INT,
  len BIGINT,
  content VARCHAR
) with (
  type='rds',
  url='yourDatabaseURL',
  tableName='yourDatabaseTableName',
  userName='yourDatabaseUserName',
  password='yourDatabasePassword'
);

insert into rds_output
select
  a,b,s
from v1;

Original site

Copyright notice
This article was written by [Samooyou]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/184/202207030930209114.html
