Spark SQL Wife-Chasing Series (A First Look)
2022-07-06 20:36:00 【Several storehouses of cabbage white】
Today is another day, another day of liking you.
Small talk
I haven't written an article for what feels like a long time. Only four or five days have actually passed, but my hands were itching to write, and today's article is a simple one. It starts a series on Spark SQL. As for the earlier wife-chasing series on RDD programming, I haven't given it up and will keep writing it. The basic action operators and transformation operators are already covered; accumulators, partitioning, and other data types are still to come.
Today's update is about Spark SQL. Why not keep writing about RDDs? Simple: if you can write SQL well, it is easier to find a job.
What is Spark SQL?
If nothing unexpected happens, Spark SQL will appear on this blog for a long time.
Let's first introduce what Spark SQL is.
The predecessor of Spark SQL was Shark. At that time Hadoop had Hive, which let you use HiveQL (HQL) instead of hand-written MR programs for data analysis. That was very convenient and greatly lowered the difficulty of development. The Spark ecosystem had nothing comparable, so Shark was created. Later Shark was no longer maintained, and its functionality was reimplemented entirely on top of Spark. That is the Spark SQL we have today.
Although Spark has its own ecosystem, most Spark workloads still run on top of HDFS. As I said before, MR (MapReduce) is out of date, but HDFS as a storage system is not. Spark simply does its work on top of HDFS.
What Spark SQL can do
Since I'm a SQL boy after all, I'll explain what Spark SQL can do from the ETL point of view.
- Extract: Spark SQL can read data from file systems (HDFS, the local file system), relational databases, or non-relational data stores. Supported file types include CSV, JSON, XML, Parquet, ORC, Avro, and others (some of these, such as XML and Avro, may need an additional package depending on the Spark version).
- Transform: this is what is usually called data cleaning.
- Load: the processed data can be written to different data sources.
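The three ETL steps above can be sketched roughly as follows. This is only an illustration, not the author's pipeline: the object name `EtlSketch`, the helper `clean`, the input path `input/people.json`, the output path `output/people_parquet`, and the `name` column are all assumptions made for the example.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object EtlSketch {
  // Transform: drop rows whose (assumed) "name" column is null,
  // then remove exact duplicate rows.
  def clean(raw: DataFrame): DataFrame =
    raw.na.drop(Seq("name")).dropDuplicates()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("EtlSketch")
      .getOrCreate()

    // Extract: read a JSON file (hypothetical path).
    val raw = spark.read.json("input/people.json")

    // Transform: apply the cleaning step.
    val cleaned = clean(raw)

    // Load: write the result to another data source, here Parquet files
    // (hypothetical path), overwriting any previous output.
    cleaned.write.mode(SaveMode.Overwrite).parquet("output/people_parquet")

    spark.stop()
  }
}
```

Keeping the transform step in its own function makes it easy to try on a small in-memory DataFrame before pointing it at real files.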
Spark SQL is mainly used to process structured data. What is structured data?
Structured data is a data set in which the content of each record has clear structural information and every record conforms to that structure, so the data set can be logically expressed and implemented as a two-dimensional table. For example, think of a relational database table with its fields, attributes, and types.
Key points of Spark SQL
DataFrame. You can analyze data through the DataFrame API, which is known as the DSL.
You can also register a DataFrame as a table and then analyze the data with SQL or Hive SQL. People who are already fluent in SQL can get started more quickly this way.
Since we have mentioned DataFrame, let's introduce it.
When we learned RDDs earlier, an RDD was a collection of data, but Spark did not know what was inside each record. A DataFrame is different: it clearly specifies that each record consists of several named fields. To make the comparison concrete:
An RDD only knows that it stores objects of type Person, while a DataFrame stores the information inside each Person object. A DataFrame is like swimming naked, whereas an RDD is fully clothed: one glance and you can't see anything.
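The two styles mentioned above can be put side by side in a small sketch. The object name `DslVsSqlSketch`, the view name `people`, and the sample rows are made up for illustration; only the Spark calls themselves (`filter`, `select`, `createOrReplaceTempView`, `spark.sql`) are real API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DslVsSqlSketch {
  // DSL style: the query is expressed through DataFrame API calls.
  def adultsDsl(people: DataFrame): DataFrame =
    people.filter(people("age") >= 18).select("name")

  // SQL style: register the DataFrame as a temporary view,
  // then query that view with a SQL string.
  def adultsSql(spark: SparkSession, people: DataFrame): DataFrame = {
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age >= 18")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DslVsSqlSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows.
    val people = Seq(("Ann", 30), ("Bob", 12)).toDF("name", "age")

    adultsDsl(people).show()
    adultsSql(spark, people).show()

    spark.stop()
  }
}
```

Both functions return a DataFrame with the same rows, so which style to use is mostly a matter of what the team reads more comfortably.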
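A short sketch may help show the difference in what Spark can see. The `Person` case class here, with its `name` and `age` fields, is a hypothetical stand-in, as are the object name `RddVsDataFrameSketch` and the helper `fieldNames`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type, used only for illustration.
case class Person(name: String, age: Int)

object RddVsDataFrameSketch {
  // The column names Spark knows about for a DataFrame.
  def fieldNames(df: DataFrame): Seq[String] = df.schema.fieldNames.toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RddVsDataFrameSketch")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ann", 30), Person("Bob", 12))

    // RDD[Person]: Spark only knows that each element is a Person object.
    val rdd: RDD[Person] = spark.sparkContext.parallelize(people)

    // DataFrame: the named fields and their types are visible to Spark.
    val df: DataFrame = rdd.toDF()
    df.printSchema()
    println(fieldNames(df).mkString(", "))

    spark.stop()
  }
}
```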
SparkSession
The starting point of Spark SQL programming is SparkSession. When we first learned Spark Core, the programming entry point was the SparkContext, which is created from a SparkConf. For Spark SQL the entry point has changed.
With a SparkSession you can create DataFrame objects, read external files, and run query analysis with SQL.
SparkConf: used to create the context object (SparkContext)
SparkSession: the session object created for Spark SQL
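As a rough sketch of how the two entry points relate: a SparkSession can be built directly, and the Spark Core context is still reachable from it through `sparkContext`. The object name `EntryPointSketch` and the helper `build` are assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession

object EntryPointSketch {
  // Build (or reuse) a local SparkSession.
  def build(): SparkSession =
    SparkSession.builder()
      .master("local[*]")
      .appName("EntryPointSketch")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = build()

    // The session carries the Spark Core context, so RDD code keeps working.
    val sc = spark.sparkContext
    println(sc.appName)
    println(sc.parallelize(1 to 10).sum())

    spark.stop()
  }
}
```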
Creating a DataFrame
A df object can be created by reading a JSON file. Take a look at the JSON file first, then start running the code.
Read the people.json file to create a DataFrame object:
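import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Run locally, using all available cores
val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Spark_Sql")
val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
// Bring the session's implicit conversions into scope
import sparkSession.implicits._
// Read the JSON file into a DataFrame
val frame = sparkSession.read.json("date/people.json")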
How do we display the data in this file?
frame.show()
As you can see, the output looks just like the result of a table query.
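To try `show()` without a file on disk, here is a minimal sketch that builds the DataFrame from in-memory JSON lines instead. The contents of people.json are not reproduced in this article, so the two sample records, the object name `InspectJsonSketch`, and the helper `load` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object InspectJsonSketch {
  // Hypothetical JSON lines standing in for the contents of a JSON file.
  val sampleLines: Seq[String] = Seq(
    """{"name":"Ann","age":30}""",
    """{"name":"Bob","age":12}"""
  )

  // Parse the in-memory JSON lines into a DataFrame.
  def load(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.read.json(sampleLines.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("InspectJsonSketch")
      .getOrCreate()

    val frame = load(spark)
    // Print the inferred column names and types.
    frame.printSchema()
    // Print the rows as a text table.
    frame.show()

    spark.stop()
  }
}
```

`printSchema()` is a handy companion to `show()`: it prints the inferred field names and types rather than the rows.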
Summary
Tomorrow's update will cover Spark SQL's DSL and SQL statements.
That's all for today.
I hope I can successfully book my Subject 4 exam.