SparkSQL in Spark
2022-06-13 03:14:00 【Yiliubei】
1、Introduction to SparkSQL
Hive is the predecessor of Shark, and Shark is the predecessor of SparkSQL. The fundamental reason SparkSQL was created was to break free of Hive's limitations completely.
2、Ways to create a DataFrame in SparkSQL
With Spark 2.0+, SparkSQL is used through a SparkSession, which is created as follows:
import org.apache.spark.sql.SparkSession

val session: SparkSession = SparkSession
  .builder()
  .appName("test")
  .enableHiveSupport()
  .getOrCreate()
// Only log errors, to keep the console output readable
session.sparkContext.setLogLevel("Error")
2.1、Create a DataFrame by reading a JSON file
session.read.format("json").load("./data/json")
2.2、Create a DataFrame from a JSON-format RDD
An implicit conversion is required here: add import session.implicits._ before reading.
The JSON data must first be converted to a Dataset, which is then read into a DataFrame.
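A minimal sketch of this approach, assuming a local session (master("local[*]") is not in the original setup) and made-up sample records. The object name JsonDatasetExample is hypothetical. The JSON strings are turned into a Dataset[String] with toDS(), which is what needs session.implicits._, and then parsed with session.read.json.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JsonDatasetExample {
  // Builds a DataFrame from JSON strings held in memory (sample data is made up).
  def build(session: SparkSession): DataFrame = {
    import session.implicits._ // needed for toDS()
    val jsonLines = Seq(
      """{"name":"zhangsan","age":18}""",
      """{"name":"lisi","age":19}"""
    )
    val jsonDs = jsonLines.toDS()  // Dataset[String], one JSON object per element
    session.read.json(jsonDs)      // schema is inferred from the JSON fields
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying the example outside a cluster
    val session = SparkSession.builder().appName("json-ds").master("local[*]").getOrCreate()
    session.sparkContext.setLogLevel("Error")
    val df = build(session)
    df.printSchema()
    df.show()
    session.stop()
  }
}
```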
2.3、Create a DataFrame from a non-JSON RDD
A non-JSON RDD can be converted to a DataFrame by reflection (not recommended).
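The reflection approach might look like the sketch below. The Person case class, the comma-separated sample lines, and the local master are all assumptions made for illustration. Spark derives the column names and types from the case class fields, so the schema is fixed at compile time.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type: its field names and types become the DataFrame schema.
case class Person(id: Int, name: String, age: Int)

object ReflectionExample {
  def build(session: SparkSession): DataFrame = {
    import session.implicits._ // provides toDF() on RDDs of case classes
    val lines = session.sparkContext.parallelize(Seq("1,zhangsan,18", "2,lisi,19"))
    lines.map { line =>
      val f = line.split(",")
      Person(f(0).toInt, f(1), f(2).toInt)
    }.toDF()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying the example outside a cluster
    val session = SparkSession.builder().appName("reflection").master("local[*]").getOrCreate()
    session.sparkContext.setLogLevel("Error")
    build(session).show()
    session.stop()
  }
}
```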
2.4、Create a Schema dynamically to convert a non-JSON RDD to a DataFrame
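A sketch of the dynamic-schema approach, under the same assumptions as before (made-up comma-separated lines, a local master, a hypothetical object name). The RDD is mapped to Row objects and the schema is built at runtime as a StructType, so the columns could equally come from a config file or a database.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DynamicSchemaExample {
  def build(session: SparkSession): DataFrame = {
    val lines = session.sparkContext.parallelize(Seq("1,zhangsan,18", "2,lisi,19"))
    // The order of values in each Row must match the order of fields in the schema.
    val rows = lines.map { line =>
      val f = line.split(",")
      Row(f(0).toInt, f(1), f(2).toInt)
    }
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))
    session.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying the example outside a cluster
    val session = SparkSession.builder().appName("dynamic-schema").master("local[*]").getOrCreate()
    session.sparkContext.setLogLevel("Error")
    build(session).printSchema()
    session.stop()
  }
}
```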
2.5、Create a DataFrame by reading a parquet file
To have a parquet file to read, first read a JSON file, write it out in parquet format, and then load the result with session.read.format("parquet").load("./data/parquet")
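The JSON-to-parquet round trip described above could be sketched as follows. The paths are passed in as parameters; the object name, the use of SaveMode.Overwrite, and the local master are assumptions rather than part of the original.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object ParquetRoundTrip {
  // Reads JSON, writes it as parquet, then reads the parquet output back.
  def roundTrip(session: SparkSession, jsonPath: String, parquetPath: String): DataFrame = {
    val jsonDf = session.read.format("json").load(jsonPath)
    // Overwrite is an assumption, so that the example can be re-run
    jsonDf.write.mode(SaveMode.Overwrite).format("parquet").save(parquetPath)
    session.read.format("parquet").load(parquetPath)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying the example outside a cluster
    val session = SparkSession.builder().appName("parquet").master("local[*]").getOrCreate()
    session.sparkContext.setLogLevel("Error")
    roundTrip(session, "./data/json", "./data/parquet").show()
    session.stop()
  }
}
```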
Whether the source is JSON or non-JSON data, it ends up as a Dataset. A DataFrame is the Row version of a Dataset, that is, Dataset[Row].
3、Spark on Hive
3.1、Configuration
On the Spark client node, create a hive-site.xml file in Spark's conf directory and write the following configuration into it. Note that the node named in the URI must be the Hive server side.
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://mynode1:9083</value>
  </property>
</configuration>
4、The difference between UDF and UDAF in SparkSQL
UDF: user-defined function.
You can write a custom class that implements one of the UDFX interfaces (for example UDF1 or UDF2, where X is the number of arguments).
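In Scala, a UDF is often registered from a plain function rather than a UDFX class. The sketch below takes that route: the function name strLen, the people view, and the sample names are all made up for illustration, and the local master is an assumption.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UdfExample {
  def build(session: SparkSession): DataFrame = {
    import session.implicits._
    Seq("zhangsan", "lisi").toDF("name").createOrReplaceTempView("people")
    // Register a one-argument UDF under the (hypothetical) name strLen
    session.udf.register("strLen", (s: String) => s.length)
    session.sql("SELECT name, strLen(name) AS len FROM people")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying the example outside a cluster
    val session = SparkSession.builder().appName("udf").master("local[*]").getOrCreate()
    session.sparkContext.setLogLevel("Error")
    build(session).show()
    session.stop()
  }
}
```

A UDF is applied row by row: each input row produces exactly one output value.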
UDAF: user-defined aggregate function.
To implement a UDAF, the custom class must extend the UserDefinedAggregateFunction class.
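A sketch of a UDAF that counts non-null values per group by extending UserDefinedAggregateFunction. The class name CountValues, the registered name countValues, and the sample data are hypothetical. Note that UserDefinedAggregateFunction is deprecated since Spark 3.0 in favour of Aggregator, so newer versions will emit a deprecation warning for this code.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, LongType, StringType, StructField, StructType}

// Hypothetical UDAF: counts the non-null values in each group.
class CountValues extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(Seq(StructField("value", StringType)))
  override def bufferSchema: StructType = StructType(Seq(StructField("count", LongType)))
  override def dataType: DataType = LongType
  override def deterministic: Boolean = true

  // Called once per group (per partition) to reset the buffer
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer.update(0, 0L)
  }
  // Called for every input row of the group
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) buffer.update(0, buffer.getLong(0) + 1L)
  }
  // Combines partial counts computed on different partitions
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1.update(0, buffer1.getLong(0) + buffer2.getLong(0))
  }
  override def evaluate(buffer: Row): Any = buffer.getLong(0)
}

object UdafExample {
  def build(session: SparkSession): DataFrame = {
    import session.implicits._
    Seq(("A", "x"), ("A", "y"), ("B", "z")).toDF("grp", "value").createOrReplaceTempView("t")
    session.udf.register("countValues", new CountValues)
    session.sql("SELECT grp, countValues(value) AS cnt FROM t GROUP BY grp")
  }
}
```

Unlike a UDF, a UDAF consumes many rows per group and returns a single value for the group, which is why it needs a buffer plus the initialize, update, merge, and evaluate steps.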
Window functions
row_number() over (partition by x1 order by x2): groups the rows of the table by x1 and sorts each group by x2, then numbers the rows in each group. The numbering starts from 1 within each group and is continuous; rows with equal values still get different numbers.
rank() over (partition by x1 order by x2): groups the rows of the table by x1 and sorts each group by x2, then numbers the rows in each group. The numbering starts from 1 within each group; rows with equal values get the same number, and the numbering is not continuous (it skips after ties).
dense_rank() over (partition by x1 order by x2): groups the rows of the table by x1 and sorts each group by x2, then numbers the rows in each group. The numbering starts from 1 within each group; rows with equal values get the same number, and the numbering is continuous.
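The difference between the three functions shows up only when there are ties. The sketch below runs all three over the same made-up table t with columns x1 and x2, sorting in descending order; the object name and the local master are assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object WindowExample {
  def ranked(session: SparkSession): DataFrame = {
    import session.implicits._
    // Group A contains a tie on x2 (90, 90) so the three functions diverge
    Seq(("A", 90), ("A", 90), ("A", 80), ("B", 70))
      .toDF("x1", "x2")
      .createOrReplaceTempView("t")
    session.sql(
      """SELECT x1, x2,
        |       row_number() OVER (PARTITION BY x1 ORDER BY x2 DESC) AS rn,
        |       rank()       OVER (PARTITION BY x1 ORDER BY x2 DESC) AS rk,
        |       dense_rank() OVER (PARTITION BY x1 ORDER BY x2 DESC) AS drk
        |FROM t""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying the example outside a cluster
    val session = SparkSession.builder().appName("window").master("local[*]").getOrCreate()
    session.sparkContext.setLogLevel("Error")
    ranked(session).orderBy("x1", "rn").show()
    session.stop()
  }
}
```

For group A with values 90, 90, 80, row_number gives 1, 2, 3, rank gives 1, 1, 3, and dense_rank gives 1, 1, 2.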