Pandas or SQL: which is your number-one choice for data analysis?
2022-07-07 03:56:00 【Game programming】
author | Junxin
source | About data analysis and visualization
Today we are going to look at the syntactic differences between Pandas and SQL. For many data analysts, both the Pandas module and SQL are tools they use constantly in daily study and work. We can also run SQL statements from within Pandas by calling the read_sql() method.
Building a database
First we create a new database with SQL statements. I'm sure everyone knows the basic syntax:
CREATE TABLE table_name (column_name data_type, ...)
Let's take a look at the specific code:
import pandas as pd
import sqlite3

connector = sqlite3.connect('public.db')
my_cursor = connector.cursor()
my_cursor.executescript("""
CREATE TABLE sweets_types
(
    id integer NOT NULL,
    name character varying NOT NULL,
    PRIMARY KEY (id)
);
... (omitted for space; refer to the source code for details) ...
""")
We also insert data into these new tables:
my_cursor.executescript("""
INSERT INTO sweets_types(name) VALUES
    ('waffles'),
    ('candy'),
    ('marmalade'),
    ('cookies'),
    ('chocolate');
... (omitted for space; refer to the source code for details) ...
""")
We can read a new table with the following code, which returns it as a DataFrame:
df_sweets = pd.read_sql("SELECT * FROM sweets;", connector)
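For readers who want to try this without the original public.db file, here is a minimal self-contained sketch. It assumes an in-memory SQLite database and recreates only the sweets_types table from the script above; the variable names conn and df_types are illustrative and not part of the original code.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory database, so nothing is written to disk
# (the article itself uses a file named public.db).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sweets_types
(
    id integer NOT NULL,
    name character varying NOT NULL,
    PRIMARY KEY (id)
);
INSERT INTO sweets_types(name) VALUES
    ('waffles'), ('candy'), ('marmalade'), ('cookies'), ('chocolate');
""")

# read_sql runs the query and returns the result set as a DataFrame.
df_types = pd.read_sql("SELECT * FROM sweets_types", conn)
print(df_types)
```

Every later query in the article follows the same pattern: a SQL string plus the open connection go in, a DataFrame comes out.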
We have built five datasets in total, covering desserts, dessert types, and processing and storage data. The desserts dataset, for example, mainly includes each dessert's weight, sugar content, production and expiration dates, cost, and so on. The others are:
df_manufacturers = pd.read_sql("SELECT * FROM manufacturers", connector)
The manufacturers dataset holds each factory's main contact person and contact information, while the storehouses dataset holds each warehouse's detailed address, city, and so on.
df_storehouses = pd.read_sql("SELECT * FROM storehouses", connector)
And the dessert types dataset:
df_sweets_types = pd.read_sql("SELECT * FROM sweets_types;", connector)
Data filtering
Filtering on simple conditions
Next, let's filter some data, for example to get the names of desserts whose weight equals 300. In Pandas the code looks like this:
# Convert the data type
df_sweets['weight'] = pd.to_numeric(df_sweets['weight'])
# Output the result
df_sweets[df_sweets.weight == 300].name
output
1      Mikus
6     Soucus
11     Macus
Name: name, dtype: object
Of course, we can also run the equivalent SQL statement through pandas' read_sql() method:
pd.read_sql("SELECT name FROM sweets WHERE weight = '300'", connector)
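To compare the two filters side by side, here is a small runnable sketch. The table contents are invented for illustration, and it assumes weight is stored as text (which the pd.to_numeric call in the article suggests). It also shows read_sql's params argument as one way to avoid pasting values into the SQL string.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical rows; weight is stored as text, mirroring the article's schema style.
conn.executescript("""
CREATE TABLE sweets (name character varying, weight character varying);
INSERT INTO sweets VALUES
    ('Milty', '100'), ('Mikus', '300'), ('Mivi', '250'), ('Macus', '300');
""")

df = pd.read_sql("SELECT * FROM sweets", conn)
df["weight"] = pd.to_numeric(df["weight"])  # text -> number before comparing

# Pandas: boolean mask on the numeric column.
pandas_names = df[df.weight == 300].name.tolist()

# SQL: WHERE clause; the value is passed through a qmark placeholder.
sql_names = pd.read_sql(
    "SELECT name FROM sweets WHERE weight = ?", conn, params=("300",)
)["name"].tolist()

print(pandas_names, sql_names)
```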
Let's look at a similar case: selecting the names of desserts whose cost equals 100. The code is as follows:
# Pandas
df_sweets['cost'] = pd.to_numeric(df_sweets['cost'])
df_sweets[df_sweets.cost == 100].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE cost = '100'", connector)
output
Milty
For text columns we can filter on the content of the string as well, for example to keep only names that start with "M":
# Pandas
df_sweets[df_sweets.name.str.startswith('M')].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE name LIKE 'M%'", connector)
output
Milty
Mikus
Mivi
Mi
Misa
Maltik
Macus
In SQL's LIKE patterns, the wildcard % matches any sequence of zero or more characters, while _ matches exactly one character. The difference looks like this:
# SQL
pd.read_sql("SELECT name FROM sweets WHERE name LIKE 'M%'", connector)
pd.read_sql("SELECT name FROM sweets WHERE name LIKE 'M_'", connector)
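The two wildcards can be checked with a short sketch. The names are a subset of those appearing in the article's outputs, loaded into a throwaway in-memory table; str.fullmatch with the regular expression "M." is used here as an assumed Pandas counterpart of LIKE 'M_', which the original does not show.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sweets (name character varying)")
conn.executemany(
    "INSERT INTO sweets VALUES (?)",
    [(n,) for n in ["Milty", "Mikus", "Mi", "Misa", "Sor", "Co"]],
)

df = pd.read_sql("SELECT name FROM sweets", conn)

# % : "M" followed by zero or more characters.
any_length_sql = pd.read_sql(
    "SELECT name FROM sweets WHERE name LIKE 'M%'", conn
)["name"].tolist()
# _ : "M" followed by exactly one character.
one_char_sql = pd.read_sql(
    "SELECT name FROM sweets WHERE name LIKE 'M_'", conn
)["name"].tolist()

# Pandas counterparts: a prefix test and a full regular-expression match.
any_length_pd = df[df.name.str.startswith("M")].name.tolist()
one_char_pd = df[df.name.str.fullmatch("M.")].name.tolist()

print(any_length_sql, one_char_sql)
```

One difference to keep in mind: SQLite's LIKE is case-insensitive for ASCII letters by default, while str.startswith is case-sensitive.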
Filtering on complex conditions
Now let's filter on several conditions at once. For example, to get the names of desserts whose weight equals 300 and whose cost equals 150:
# Pandas
df_sweets[(df_sweets.cost == 150) & (df_sweets.weight == 300)].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE cost = '150' AND weight = '300'", connector)
output
Mikus
Or the names of desserts whose cost lies between 200 and 300 (both bounds included):
# Pandas
df_sweets[df_sweets['cost'].between(200, 300)].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE cost BETWEEN '200' AND '300'", connector)
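A caveat worth illustrating: if cost is stored as text, as the pd.to_numeric calls in the article suggest, then BETWEEN '200' AND '300' compares strings rather than numbers. The sketch below uses invented rows to show where the two can disagree and uses CAST as one possible workaround; with values that all have the same number of digits the text comparison happens to give the same answer.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical data; cost is deliberately stored as text.
conn.executescript("""
CREATE TABLE sweets (name character varying, cost character varying);
INSERT INTO sweets VALUES
    ('A', '25'), ('B', '200'), ('C', '250'), ('D', '300'), ('E', '1000');
""")

# Text comparison: '25' sorts between '200' and '300' character by character.
text_cmp = pd.read_sql(
    "SELECT name FROM sweets WHERE cost BETWEEN '200' AND '300'", conn
)["name"].tolist()

# Numeric comparison after an explicit cast.
num_cmp = pd.read_sql(
    "SELECT name FROM sweets WHERE CAST(cost AS INTEGER) BETWEEN 200 AND 300", conn
)["name"].tolist()

# Pandas: convert first, then use between(), which includes both bounds by default.
df = pd.read_sql("SELECT * FROM sweets", conn)
df["cost"] = pd.to_numeric(df["cost"])
pandas_cmp = df[df["cost"].between(200, 300)].name.tolist()

print(text_cmp, num_cmp, pandas_cmp)
```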
For sorting, SQL uses the ORDER BY clause:
# SQL
pd.read_sql("SELECT name FROM sweets ORDER BY id DESC", connector)
In Pandas, the counterpart is the sort_values() method:
# Pandas
df_sweets.sort_values(by='id', ascending=False).name
output
11     Macus
10    Maltik
9        Sor
8         Co
7     Soviet
6     Soucus
5     Soltic
4       Misa
3         Mi
2       Mivi
1      Mikus
0      Milty
Name: name, dtype: object
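The equivalence of the two sorts can be checked on a toy table. The rows below are invented and inserted out of order on purpose, so the sort actually has something to do.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical rows, deliberately not in id order.
conn.executescript("""
CREATE TABLE sweets (id integer, name character varying);
INSERT INTO sweets VALUES (3, 'Mivi'), (1, 'Milty'), (4, 'Mi'), (2, 'Mikus');
""")

df = pd.read_sql("SELECT * FROM sweets", conn)

# Pandas: sort on the id column, largest first.
pandas_order = df.sort_values(by="id", ascending=False).name.tolist()

# SQL: the same ordering is requested inside the query.
sql_order = pd.read_sql(
    "SELECT name FROM sweets ORDER BY id DESC", conn
)["name"].tolist()

print(pandas_order)
```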
To select the name of the dessert with the highest cost, the Pandas code looks like this:
df_sweets[df_sweets.cost == df_sweets.cost.max()].name
output
11    Macus
Name: name, dtype: object
In SQL, we first need a subquery that finds the highest cost, and then filter the table on that value:
pd.read_sql("SELECT name FROM sweets WHERE cost = (SELECT MAX(cost) FROM sweets)", connector)
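Both versions return every row that shares the maximum, not just one. The following sketch makes that visible with an invented tie; it assumes cost is a numeric column, because MAX over a text column would compare strings.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical rows with a tie for the highest cost; cost is an integer here.
conn.executescript("""
CREATE TABLE sweets (name character varying, cost integer);
INSERT INTO sweets VALUES
    ('Milty', 100), ('Mikus', 150), ('Macus', 400), ('Sor', 400);
""")

df = pd.read_sql("SELECT * FROM sweets", conn)

# Pandas: compare every row against the column maximum.
pandas_top = df[df.cost == df.cost.max()].name.tolist()

# SQL: the subquery computes the maximum, the outer query filters on it.
sql_top = pd.read_sql(
    "SELECT name FROM sweets WHERE cost = (SELECT MAX(cost) FROM sweets)", conn
)["name"].tolist()

print(pandas_top, sql_top)
```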
Suppose we want to see which cities have warehouses. In Pandas we call the unique() method:
df_storehouses['city'].unique()
output
array(['Moscow', 'Saint-petersburg', 'Yekaterinburg'], dtype=object)
In SQL, the corresponding keyword is DISTINCT:
pd.read_sql("SELECT DISTINCT city FROM storehouses", connector)
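As a quick check that the two approaches agree, here is a sketch on a toy storehouses table. The city names come from the output above; the address column and its values are placeholders invented for the example.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical addresses; only the city column matters here.
conn.executescript("""
CREATE TABLE storehouses (address character varying, city character varying);
INSERT INTO storehouses VALUES
    ('addr 1', 'Moscow'), ('addr 2', 'Saint-petersburg'),
    ('addr 3', 'Moscow'), ('addr 4', 'Yekaterinburg');
""")

df = pd.read_sql("SELECT * FROM storehouses", conn)

# Pandas: unique() returns a NumPy array of the distinct values.
pandas_cities = df["city"].unique().tolist()

# SQL: DISTINCT removes duplicate rows from the result set.
sql_cities = pd.read_sql(
    "SELECT DISTINCT city FROM storehouses", conn
)["city"].tolist()

print(pandas_cities, sql_cities)
```

Note the difference in return type: unique() gives an array, while the SQL route still yields a one-column DataFrame.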
Grouping and aggregation
In Pandas, grouped statistics are usually computed by calling groupby() and then chaining an aggregation such as mean() for the average or sum() for the total. For example, suppose we want the names of manufacturers that produce and process desserts in more than one city, that is, names that occur more than once in the manufacturers dataset:
name_counts = df_manufacturers.groupby('name').name.count()
name_counts[name_counts > 1]
output
name
Mishan    2
Name: name, dtype: int64
In SQL, grouping is done with GROUP BY, and a condition on the aggregated result goes in a HAVING clause:
pd.read_sql("""
SELECT name, COUNT(name) as 'name_count' FROM manufacturers
GROUP BY name HAVING COUNT(name) > 1
""", connector)
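The group-then-filter pattern can be sketched on a small invented manufacturers table. Only the name 'Mishan' comes from the article's output; the other name and the city column are assumptions made for the example.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical rows: one manufacturer appears in two cities.
conn.executescript("""
CREATE TABLE manufacturers (name character varying, city character varying);
INSERT INTO manufacturers VALUES
    ('Mishan', 'Moscow'), ('Mishan', 'Saint-petersburg'), ('Other', 'Moscow');
""")

df = pd.read_sql("SELECT * FROM manufacturers", conn)

# Pandas: count rows per name, then keep groups with more than one row.
counts = df.groupby("name").name.count()
pandas_result = counts[counts > 1].to_dict()

# SQL: GROUP BY builds the groups, HAVING filters on the aggregate.
sql_df = pd.read_sql(
    """
    SELECT name, COUNT(name) AS name_count FROM manufacturers
    GROUP BY name HAVING COUNT(name) > 1
    """,
    conn,
)
sql_result = dict(zip(sql_df["name"], sql_df["name_count"]))

print(pandas_result, sql_result)
```

WHERE cannot be used for this condition, because it is evaluated before the groups are formed, whereas HAVING is evaluated afterwards.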
Data merging
When two or more datasets need to be combined, in Pandas we can call the merge() method. For example, let's merge the df_sweets dataset with the df_sweets_types dataset, where the sweets_types_id column in df_sweets is the foreign key into the types table. First a look at both:
df_sweets.head()
df_sweets_types.head()
The merge code is as follows:
df_sweets_1 = df_sweets.merge(df_sweets_types, left_on='sweets_types_id', right_on='id')
Next we filter for the chocolate desserts. Because both datasets have a name column, merge() renames them name_x and name_y:
df_sweets_1.query('name_y == "chocolate"').name_x
output
10    Misa
11     Sor
Name: name_x, dtype: object
The SQL statement is comparatively simple:
# SQL
pd.read_sql("""
SELECT sweets.name FROM sweets
JOIN sweets_types ON sweets.sweets_types_id = sweets_types.id
WHERE sweets_types.name = 'chocolate';
""", connector)
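Here is a compact, runnable version of the merge and the JOIN on toy data. The two tables carry only the columns needed for the join, and the rows are invented apart from the names 'Misa' and 'Sor' taken from the output above.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced schema: just the key columns and the names.
conn.executescript("""
CREATE TABLE sweets_types (id integer, name character varying);
CREATE TABLE sweets (name character varying, sweets_types_id integer);
INSERT INTO sweets_types VALUES (1, 'waffles'), (5, 'chocolate');
INSERT INTO sweets VALUES ('Milty', 1), ('Misa', 5), ('Sor', 5);
""")

df_sweets = pd.read_sql("SELECT * FROM sweets", conn)
df_types = pd.read_sql("SELECT * FROM sweets_types", conn)

# Pandas: inner join on the foreign key; the clashing "name" columns
# receive the default suffixes _x (left) and _y (right).
merged = df_sweets.merge(df_types, left_on="sweets_types_id", right_on="id")
pandas_names = merged.query('name_y == "chocolate"').name_x.tolist()

# SQL: the same join, with table-qualified column names instead of suffixes.
sql_names = pd.read_sql(
    """
    SELECT sweets.name FROM sweets
    JOIN sweets_types ON sweets.sweets_types_id = sweets_types.id
    WHERE sweets_types.name = 'chocolate'
    """,
    conn,
)["name"].tolist()

print(pandas_names, sql_names)
```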
The structure of the data set
Let's take a look at the structure of the dataset. In Pandas, checking the shape attribute is enough:
df_sweets.shape
output
(12, 10)
In SQL, the following query returns the number of rows (it does not report the number of columns):
pd.read_sql("SELECT count(*) FROM sweets;", connector)
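If the column count is needed on the SQL side too, one option in SQLite specifically is PRAGMA table_info, which returns one row per column. The sketch below uses an invented three-column table and is an assumption about how one might reproduce shape, not something the original article does.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical three-column table with two rows.
conn.executescript("""
CREATE TABLE sweets (id integer, name character varying, cost integer);
INSERT INTO sweets VALUES (1, 'Milty', 100), (2, 'Mikus', 150);
""")

df = pd.read_sql("SELECT * FROM sweets", conn)

# Rows: an ordinary aggregate query.
n_rows = conn.execute("SELECT count(*) FROM sweets").fetchone()[0]
# Columns: SQLite-specific pragma, one result row per table column.
n_cols = len(conn.execute("PRAGMA table_info(sweets)").fetchall())

print(df.shape, (n_rows, n_cols))
```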
author :AI Technology base