SparkSQL in Spark
2022-06-13 03:14:00 【Yiliubei】
1、Introduction to SparkSQL
Hive is the predecessor of Shark, and Shark is the predecessor of SparkSQL. The fundamental reason SparkSQL was created is to break completely free of Hive's limitations.
2、Ways to create a DataFrame in SparkSQL
With Spark 2.0+, SparkSQL is used through a SparkSession, which is created as follows:
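import org.apache.spark.sql.SparkSession

// Build (or reuse) the SparkSession, the entry point for SparkSQL
val session: SparkSession = SparkSession
  .builder()
  .appName("test")
  .enableHiveSupport() // only needed when Hive tables are accessed
  .getOrCreate()
// Reduce console noise: only log errors
session.sparkContext.setLogLevel("ERROR")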
2.1、Create a DataFrame by reading a JSON file
session.read.format("json").load("./data/json")
2.2、Create a DataFrame from an RDD of JSON strings
This requires an implicit conversion, so import session.implicits._ before reading.
The RDD of JSON strings is first converted to a Dataset, which is then read as JSON.
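The conversion described above can be sketched roughly as follows. This is a minimal example, not the author's original code: the sample records, the object name JsonRddToDataFrame and the local[*] master are assumptions made so the snippet runs on its own.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object JsonRddToDataFrame {
  // Builds a DataFrame from an RDD whose elements are JSON strings.
  def build(session: SparkSession): DataFrame = {
    import session.implicits._ // brings toDS() into scope for RDD[String]

    // Hypothetical sample data: one JSON object per RDD element
    val jsonRdd = session.sparkContext.parallelize(Seq(
      """{"name":"zhangsan","age":18}""",
      """{"name":"lisi","age":19}"""
    ))

    // RDD[String] -> Dataset[String] -> DataFrame (schema is inferred)
    val jsonDs: Dataset[String] = jsonRdd.toDS()
    session.read.json(jsonDs)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running outside a cluster
    val session = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
    build(session).show()
    session.stop()
  }
}
```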
2.3、Create a DataFrame from a non-JSON RDD
A non-JSON RDD can be converted to a DataFrame by reflection (not recommended).
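In Scala, the reflection approach usually means mapping each record to a case class and letting Spark derive the schema from its fields. A minimal sketch, assuming comma-separated lines and a hypothetical PersonRecord case class (neither appears in the original post):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type; defined at top level so Spark can derive its schema
case class PersonRecord(id: Int, name: String, age: Int)

object ReflectionToDataFrame {
  def build(session: SparkSession): DataFrame = {
    import session.implicits._ // brings toDF() into scope for RDD[PersonRecord]

    // Hypothetical non-JSON input: "id,name,age" per line
    val lines = session.sparkContext.parallelize(Seq("1,zhangsan,18", "2,lisi,19"))

    val people = lines.map { line =>
      val fields = line.split(",")
      PersonRecord(fields(0).toInt, fields(1), fields(2).toInt)
    }

    // Column names and types come from the case class fields
    people.toDF()
  }
}
```

The schema is fixed at compile time by the case class, which is one reason this approach is less flexible than building the schema dynamically.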
2.4、Convert a non-JSON RDD to a DataFrame with a dynamically created schema
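One way to do this is to map each record to a Row, describe the columns with a StructType, and pass both to createDataFrame. The sketch below assumes the same hypothetical "id,name,age" input lines; the column names and the object name are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DynamicSchemaToDataFrame {
  def build(session: SparkSession): DataFrame = {
    // Hypothetical non-JSON input: "id,name,age" per line
    val lines = session.sparkContext.parallelize(Seq("1,zhangsan,18", "2,lisi,19"))

    // Each line becomes a generic Row; field order must match the schema below
    val rows = lines.map { line =>
      val fields = line.split(",")
      Row(fields(0).toInt, fields(1), fields(2).toInt)
    }

    // The schema is an ordinary value, so it could also be built at runtime
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))

    session.createDataFrame(rows, schema)
  }
}
```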

2.5、Create a DataFrame by reading a Parquet file
To have a Parquet file to read, first read a JSON file, write it out in Parquet format, and then load it with session.read.format("parquet").load("./data/parquet")
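The write-then-read sequence might look like the sketch below. To keep it self-contained it builds the source DataFrame from an in-memory JSON string instead of the ./data/json file, and the output path is passed in as a parameter; both are assumptions, as is the object name.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object ParquetRoundTrip {
  // Writes a small DataFrame as Parquet to parquetPath, then reads it back.
  def roundTrip(session: SparkSession, parquetPath: String): DataFrame = {
    import session.implicits._

    // Stand-in for session.read.format("json").load("./data/json")
    val source = session.read.json(Seq("""{"name":"zhangsan","age":18}""").toDS())

    // Overwrite so the example can be re-run against the same path
    source.write.mode(SaveMode.Overwrite).format("parquet").save(parquetPath)

    // Parquet files carry their own schema, so no inference is needed here
    session.read.format("parquet").load(parquetPath)
  }
}
```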
Whether the source data is JSON or not, it ends up as a Dataset; a DataFrame is the Row version of a Dataset, that is, Dataset[Row].
3、Spark on Hive
3.1、Configuration
On the Spark client node, create a hive-site.xml file in Spark's conf directory with the following configuration. Note that the node named in the URI must also be the Hive server side.
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://mynode1:9083</value>
  </property>
</configuration>
4、SparkSQL: the difference between UDF and UDAF
UDF: user-defined function.
You can write a custom class that implements one of the UDFX interfaces (UDF1 to UDF22, where X is the number of arguments).
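As an illustration, a one-argument UDF can be written against UDF1 and registered under a SQL name. The function name str_len, the StrLen class and the sample table are hypothetical; this is a sketch rather than the author's code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.types.IntegerType

// Hypothetical UDF: returns the length of a string (one input, hence UDF1)
class StrLen extends UDF1[String, Integer] {
  override def call(s: String): Integer = Integer.valueOf(s.length)
}

object UdfExample {
  def build(session: SparkSession): DataFrame = {
    import session.implicits._

    // The return type must be given explicitly for the UDFX interfaces
    session.udf.register("str_len", new StrLen, IntegerType)

    // Hypothetical sample table
    Seq("zhangsan", "lisi").toDF("name").createOrReplaceTempView("people_udf")
    session.sql("select name, str_len(name) as len from people_udf")
  }
}
```

In Scala the same result can also be had by registering a plain function, for example session.udf.register("str_len", (s: String) => s.length).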
UDAF: user-defined aggregate function.
To implement a UDAF with a custom class, the class must extend the UserDefinedAggregateFunction class.
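A minimal sketch of such a class is shown below: a hypothetical name_count aggregate that counts the rows in each group. The class, function and table names are assumptions. Note that UserDefinedAggregateFunction is deprecated from Spark 3.0 onward in favour of Aggregator, so this compiles with a deprecation warning on newer versions.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, LongType, StringType, StructField, StructType}

// Hypothetical UDAF: counts how many rows fall into each group
class NameCount extends UserDefinedAggregateFunction {
  // Type of the input column(s)
  override def inputSchema: StructType = StructType(Seq(StructField("name", StringType)))
  // Type of the intermediate buffer kept per group
  override def bufferSchema: StructType = StructType(Seq(StructField("count", LongType)))
  // Type of the final result
  override def dataType: DataType = LongType
  // Same input always gives the same output
  override def deterministic: Boolean = true

  // Start every group's buffer at zero
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
  }
  // Called once per input row within a partition
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer(0) = buffer.getLong(0) + 1L
  }
  // Combines partial buffers from different partitions
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
  }
  // Produces the final value for the group
  override def evaluate(buffer: Row): Any = buffer.getLong(0)
}

object UdafExample {
  def build(session: SparkSession): DataFrame = {
    import session.implicits._
    session.udf.register("name_count", new NameCount)
    // Hypothetical sample table
    Seq("zhangsan", "zhangsan", "lisi").toDF("name").createOrReplaceTempView("people_udaf")
    session.sql("select name, name_count(name) as cnt from people_udaf group by name")
  }
}
```

The key difference from a UDF is visible in the shape of the class: a UDF maps one row to one value, while a UDAF keeps a buffer per group and folds many rows into one value.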
Window functions
row_number() over (partition by x1 order by x2): groups the rows of the table by x1 and sorts each group by x2, then numbers the rows in each group. Numbering starts at 1 within each group, and the numbers are consecutive and never repeat.
rank() over (partition by x1 order by x2): groups the rows of the table by x1 and sorts each group by x2, then numbers the rows in each group. Numbering starts at 1 within each group; rows with equal x2 get the same number, and the numbers are not consecutive (a gap follows each tie).
dense_rank() over (partition by x1 order by x2): groups the rows of the table by x1 and sorts each group by x2, then numbers the rows in each group. Numbering starts at 1 within each group; rows with equal x2 get the same number, and the numbers are consecutive (no gaps).
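The difference between the three functions is easiest to see on a group that contains a tie. The sketch below uses a hypothetical table t with one group "a" and x2 values 80, 80 and 90; the table, column values and object name are assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object WindowFunctionExample {
  def build(session: SparkSession): DataFrame = {
    import session.implicits._

    // Hypothetical data: one group "a" with a tie on x2 = 80
    Seq(("a", 80), ("a", 80), ("a", 90)).toDF("x1", "x2").createOrReplaceTempView("t")

    session.sql(
      """select x1, x2,
        |       row_number() over (partition by x1 order by x2) as rn,
        |       rank()       over (partition by x1 order by x2) as rk,
        |       dense_rank() over (partition by x1 order by x2) as drk
        |from t""".stripMargin)
  }
}
```

For this input the two rows with x2 = 80 get rn 1 and 2 (in no guaranteed order), rk 1 and 1, drk 1 and 1, while the row with x2 = 90 gets rn 3, rk 3 and drk 2.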