Spark SQL Wife-Chasing Series (A First Look)
2022-07-06 20:36:00 【Several storehouses of cabbage white】
Today is another day, another day of liking you
Small talk
I haven't written an article in a while. It has only been four or five days, but my hands are itching to write, and writing today comes easily. Today's article starts a series on Spark SQL. I have not given up on the earlier wife-chasing series on RDD programming and will keep writing it. The basic action and transformation operators are already covered; accumulators, partitioning, and other data types are still to come.
Why is today's update about Spark SQL rather than RDDs? Simple: being good at SQL makes it easier to find a job.
What is Spark SQL?
If nothing unexpected happens, Spark SQL will show up on this blog for a long time.
Let's first introduce what Spark SQL is.
The predecessor of Spark SQL was Shark. At that time Hadoop had Hive, which let you use HiveQL (HQL) instead of MapReduce programs for data analysis. This was very convenient and greatly lowered the difficulty of development. The Spark ecosystem had nothing comparable, so Shark was created. Later Shark was no longer maintained, and its functionality was reimplemented entirely on top of Spark. That is the Spark SQL we have today.
Although Spark has its own ecosystem, it mostly runs on top of HDFS. As I said before, MR is outdated, but HDFS as a storage system is not; Spark simply does its work on top of HDFS.
What Spark SQL can do
After all, I'm a SQL boy, so let's explain what Spark SQL can do from the ETL point of view.
- Extract: Spark SQL can get data from file systems (HDFS, the local file system), relational databases, or non-relational data stores. Supported file types include csv, json, xml, Parquet, ORC, Avro, and so on.
- Transform: this is what is usually called data cleaning.
- Load: the processed data can be stored in different data sources.
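The three ETL steps above can be sketched in a few lines. This is only a minimal illustration, not a production pipeline: the object name EtlSketch, the helper clean, the column names name and age, and both file paths are assumptions made up for this example.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.col

object EtlSketch {
  // Transform step: drop rows that have no name and keep only adults.
  // The column names "name" and "age" are assumptions for illustration.
  def clean(df: DataFrame): DataFrame =
    df.na.drop(Seq("name")).filter(col("age") >= 18)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("EtlSketch")
      .getOrCreate()

    // Extract: read a JSON file (hypothetical path).
    val raw = spark.read.json("data/people.json")

    // Transform: clean the data.
    val cleaned = clean(raw)

    // Load: write the result to a different data source, here Parquet files
    // (hypothetical output path).
    cleaned.write.mode(SaveMode.Overwrite).parquet("output/people_parquet")

    spark.stop()
  }
}
```

The transform step is kept as a separate function so it can be tried on any DataFrame without touching the file system.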
Spark SQL is mainly used to process structured data. What is structured data?
Structured data is a data set in which every record has clear structural information and conforms to a structural specification; logically it is expressed and implemented as a two-dimensional table. For example, think of the fields, attributes, and types of a table in a relational database.
Key points of Spark SQL
DataFrame. You can analyze data through the DataFrame API, which is called the DSL.
You can also register a DataFrame as a table and then analyze the data with SQL or Hive SQL. People who are already fluent in SQL can get started more quickly.
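As a rough sketch of the two styles, the same query can be written once with the DSL and once with SQL. The object name DslVsSql, the helper names, the view name people, and the sample rows are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DslVsSql {
  // DSL style: call DataFrame methods directly.
  def adultsDsl(people: DataFrame): DataFrame =
    people.select("name", "age").filter(col("age") > 18)

  // SQL style: register the DataFrame as a temporary view, then query it.
  def adultsSql(spark: SparkSession, people: DataFrame): DataFrame = {
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name, age FROM people WHERE age > 18")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DslVsSql")
      .getOrCreate()
    import spark.implicits._

    // Small in-memory DataFrame; the rows are invented for this example.
    val people = Seq(("Ann", 30), ("Bob", 17)).toDF("name", "age")

    adultsDsl(people).show()
    adultsSql(spark, people).show()

    spark.stop()
  }
}
```

Both functions return a DataFrame, so the choice between them is mostly a matter of which style you find easier to read.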
Since we've mentioned DataFrame, let's introduce it.
When we learned about RDDs earlier, an RDD was a collection of data, but Spark did not know what was inside each record. A DataFrame is different: it explicitly specifies that each record consists of several named fields. To picture the comparison:
An RDD only knows that it stores objects of type Person; a DataFrame stores the information of each Person object. A DataFrame is swimming naked, while an RDD is wearing clothes: at a glance, you can't see anything inside.
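A small sketch may make the difference concrete. The Person case class here, with the fields name and age, is a hypothetical stand-in for the Person type mentioned above, and the sample values are invented.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type used only for this illustration.
case class Person(name: String, age: Int)

object RddVsDataFrame {
  // Builds a DataFrame from an RDD[Person] so the schema becomes visible.
  def toFrame(spark: SparkSession, people: Seq[Person]): DataFrame = {
    import spark.implicits._
    // An RDD[Person]: Spark only knows that each element is a Person object.
    val rdd = spark.sparkContext.parallelize(people)
    // A DataFrame: Spark knows the named, typed columns of every row.
    rdd.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RddVsDataFrame")
      .getOrCreate()

    val df = toFrame(spark, Seq(Person("Ann", 30), Person("Bob", 17)))

    // Prints the column names and types that the DataFrame exposes.
    df.printSchema()

    spark.stop()
  }
}
```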
SparkSession
The starting point of Spark SQL programming is SparkSession. When we first learned Spark Core, the entry point was the SparkContext, created from a SparkConf. For Spark SQL the entry point has changed.
With a SparkSession you can create DataFrame objects, read external files, and run query analysis with SQL.
SparkConf: used to create the context object
SparkSession: the object to create for Spark SQL
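A minimal sketch of how the two entry points relate: the SparkSession wraps a SparkContext, so RDD code can still be written from the same session. The object name EntryPoints and the helper buildSession are assumptions for this example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object EntryPoints {
  // Builds (or reuses) a local SparkSession from a SparkConf.
  def buildSession(appName: String): SparkSession = {
    val conf = new SparkConf().setMaster("local[*]").setAppName(appName)
    SparkSession.builder().config(conf).getOrCreate()
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession("EntryPoints")

    // The Spark Core entry point is still reachable through the session.
    val sc = spark.sparkContext

    // Ordinary RDD code works as before.
    val rdd = sc.parallelize(1 to 5)
    println(rdd.sum())

    spark.stop()
  }
}
```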
Creating a DataFrame
We create a DataFrame object by reading a JSON file. First take a look at the JSON file.
Let's get started.
Read the people.json file to create a DataFrame object:
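import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
// Run locally, using all available cores
val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Spark_Sql")
val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
// Brings in implicit conversions such as toDF and the $"col" syntax
import sparkSession.implicits._
// Read the JSON file into a DataFrame
val frame = sparkSession.read.json("date/people.json")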
How do we display the data in this file?
frame.show()
As you can see, the result looks just like the output of a table query.
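If you do not have a people.json at hand, the sketch below writes a tiny one to a temporary file and reads it back. The records are invented for illustration and are not the contents of the file used above; the object name ReadJsonSketch and the helper readJsonLines are also assumptions. Note that by default spark.read.json expects one JSON object per line.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadJsonSketch {
  // Writes the given JSON lines to a temporary file and reads them back
  // as a DataFrame. Each line must be one complete JSON object.
  def readJsonLines(spark: SparkSession, lines: Seq[String]): DataFrame = {
    val path = Files.createTempFile("people", ".json")
    Files.write(path, lines.mkString("\n").getBytes("UTF-8"))
    spark.read.json(path.toString)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ReadJsonSketch")
      .getOrCreate()

    // Made-up sample records.
    val frame = readJsonLines(spark, Seq(
      """{"name":"Ann","age":30}""",
      """{"name":"Bob","age":17}"""
    ))

    // Show the inferred schema, then the rows as a table.
    frame.printSchema()
    frame.show()

    spark.stop()
  }
}
```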
summary
Tomorrow's update will cover Spark SQL's DSL and SQL statements.
That's all for today.
I hope I can successfully book my Subject 4 test.