Spark SQL Wife-Chasing Series (A First Look)
2022-07-06 20:36:00 【Several storehouses of cabbage white】
Today is another day, another day of liking you.
Small talk
It feels like I haven't written an article for a long time, although only four or five days have passed. Writing today's article is simple: my hands were itching to write. Today's article starts the Spark SQL series. As for the previous wife-chasing series on RDD programming, I have not given up on it and will keep writing. The basic action operators and transformation operators are already covered; accumulators, partitioning, and other data types come next.
Today's update is about Spark SQL. Why not write about RDDs? The reason is simple: if you write SQL well, finding a job gets easier.
What is Spark SQL?
Unless something unexpected happens, Spark SQL will be showing up on this blog for a long time.
Let's first introduce what Spark SQL is.
The predecessor of Spark SQL was Shark. At that time Hadoop had Hive, which let you use HiveQL instead of MapReduce programs to do data analysis. That was very convenient and greatly reduced the difficulty of development. The Spark ecosystem had nothing comparable, so Shark was created. Later Shark was no longer maintained, and its functionality was reimplemented entirely on top of Spark. That is the Spark SQL we have now.
Although Spark has its own ecosystem, most Spark workloads still run on top of HDFS. As I said before, MapReduce is out of date, but HDFS as a storage system is not. Spark simply does its work on top of HDFS.
What Spark SQL can do
Since I am a SQL boy after all, let me explain what Spark SQL can do from an ETL perspective.
- Extract: Spark SQL can get data from file systems (HDFS, the local file system), relational databases, or non-relational data stores. Supported file types include CSV, JSON, XML, Parquet, ORC, and Avro (XML and Avro may require an additional module, depending on the Spark version).
- Transform: this step is commonly called data cleaning.
- Load: the processed data can be stored in different data sources.
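The three ETL steps above can be sketched roughly as follows. This is only a minimal illustration, not the author's code: the object name EtlSketch, the temporary directory, the sample CSV contents, and the name column are all assumptions made up for the example.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

object EtlSketch {
  // Runs a tiny extract/transform/load round trip and returns the row count
  // of the data that was written. All file names here are hypothetical.
  def run(spark: SparkSession): Long = {
    // Prepare a small sample CSV so the sketch is self-contained.
    val dir = Files.createTempDirectory("etl_sketch")
    val input = dir.resolve("people.csv")
    Files.write(input, "name,age\nAlice,30\n,25\nBob,17\nBob,17\n".getBytes("UTF-8"))

    // Extract: read the CSV file, treating the first line as a header.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(input.toString)

    // Transform (data cleaning): drop rows without a name, remove duplicates.
    val cleaned = raw.na.drop(Seq("name")).dropDuplicates()

    // Load: store the cleaned data in a different format (Parquet).
    val output = dir.resolve("people_parquet").toString
    cleaned.write.mode(SaveMode.Overwrite).parquet(output)

    // Read the result back to confirm what was stored.
    spark.read.parquet(output).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("EtlSketch")
      .getOrCreate()
    println(s"Rows written: ${run(spark)}")
    spark.stop()
  }
}
```

In a real job the input would more likely come from HDFS or a database, and the cleaning rules would depend on the data.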
Spark SQL is mainly used to process structured data. What is structured data?
Structured data is a data set in which the content of every record has clear structural information and every record conforms to that structure. Logically, it is expressed and implemented as a two-dimensional table. For example, think of the fields, attributes, and types of a table in a relational database.
Key points of Spark SQL
The DataFrame. You can analyze data through the DataFrame API, a style known as the DSL.
You can also register a DataFrame as a table and then use SQL or Hive SQL to analyze the data. People who are already fluent in SQL can get started more quickly this way.
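As a rough preview of the two styles, here is a minimal sketch. The Person case class, the sample rows, the view name people, and the helper names adultsDsl and adultsSql are all assumptions made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DslAndSqlSketch {
  // Hypothetical record type used only for this illustration.
  case class Person(name: String, age: Int)

  // DSL style: chain DataFrame API methods.
  def adultsDsl(people: DataFrame): DataFrame =
    people.filter(col("age") >= 18).select("name")

  // SQL style: register the DataFrame as a temporary view, then query it.
  def adultsSql(spark: SparkSession, people: DataFrame): DataFrame = {
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age >= 18")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DslAndSqlSketch")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Alice", 30), Person("Bob", 17)).toDF()
    adultsDsl(people).show()
    adultsSql(spark, people).show()
    spark.stop()
  }
}
```

Both helpers express the same query, so which one to use is mostly a matter of taste and team habit.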
Now that we have mentioned the DataFrame, let's introduce it.
When we learned about RDDs, an RDD was a collection of data, but Spark did not know what was inside each record. A DataFrame is different: it clearly specifies that each record consists of several named fields. For a more vivid comparison, see the picture.
An RDD only knows that it stores objects of type Person, whereas a DataFrame stores the information inside each Person object. A DataFrame is like swimming naked, while an RDD is like wearing clothes: at a glance, you cannot see anything.
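The difference can be made visible with a small sketch. The Person case class, its fields, and the helper name peopleFrame are hypothetical; the point is only that the DataFrame carries column names and types while the RDD does not.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RddVsDataFrameSketch {
  // Hypothetical record type used only for this illustration.
  case class Person(name: String, age: Int)

  // Builds an RDD of Person objects and converts it to a DataFrame.
  def peopleFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // RDD: Spark only knows that the elements are Person objects.
    val rdd = spark.sparkContext.parallelize(Seq(Person("Alice", 30), Person("Bob", 17)))
    // DataFrame: Spark knows every column name and type.
    rdd.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RddVsDataFrameSketch")
      .getOrCreate()
    // Prints the named fields and their types.
    peopleFrame(spark).printSchema()
    spark.stop()
  }
}
```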
SparkSession
The starting point of Spark SQL programming is the SparkSession. When we first learned Spark Core, the entry point was the SparkContext, built from a SparkConf. With Spark SQL the entry point has changed.
A SparkSession can create DataFrame objects, read external files, and run query analysis through SQL.
SparkConf: used to create the context object (SparkContext)
SparkSession: create a SparkSession object
Creating a DataFrame
We create a DataFrame object by reading a JSON file. First, take a look at the JSON file.
Now let's run it.
Read the people.json file to create a DataFrame object:
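import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Configure a local Spark application that uses all available cores
val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Spark_Sql")
val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
// Bring in implicit conversions such as toDF and the $"column" syntax
import sparkSession.implicits._
// Read the JSON file into a DataFrame
val frame = sparkSession.read.json("date/people.json")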
How do we display the data in this file?
frame.show()
As you can see, the output looks just like the result of a table query.
Summary
Tomorrow I will cover the Spark SQL DSL and SQL statements.
That's all for today's update.
I hope I can book my Subject 4 test successfully.