[Hardcore tutorial] Pandas or SQL: which is the better choice for data analysis?
2022-07-05 19:33:00 【Xinyi 2002】
Another week has gone by. Today I want to talk about the syntax differences between Pandas and SQL. For many data analysts, both Pandas and SQL are tools used heavily in everyday study and work. We can also run SQL statements from within Pandas by calling the read_sql() method.
To get the source code for this tutorial, reply 【20220704】 in the backend of the official account.
Building a database
First we create new tables in a database with SQL statements. I'm sure everyone knows the basic syntax:
CREATE TABLE table_name (
column_name data_type ...
)
Let's take a look at the actual code
import pandas as pd
import sqlite3
connector = sqlite3.connect('public.db')
my_cursor = connector.cursor()
my_cursor.executescript("""
CREATE TABLE sweets_types
(
id integer NOT NULL,
name character varying NOT NULL,
PRIMARY KEY (id)
);
-- ... remaining tables omitted for space; refer to the source code for details ...
""")
We also insert data into these new tables. The code is as follows
my_cursor.executescript("""
INSERT INTO sweets_types(name) VALUES
('waffles'),
('candy'),
('marmalade'),
('cookies'),
('chocolate');
-- ... remaining inserts omitted for space; refer to the source code for details ...
""")
We can view a new table with the following code, which loads it into a DataFrame. The code is as follows
df_sweets = pd.read_sql("SELECT * FROM sweets;", connector)
output
We have built five datasets in total, mainly covering desserts, dessert types, manufacturing, and storage. For example, the desserts dataset includes each dessert's weight, sugar content, production date, expiration date, cost, and other fields. Next is the manufacturers dataset:
df_manufacturers = pd.read_sql("SELECT * FROM manufacturers", connector)
output
The manufacturers dataset holds each factory's main contact person and contact information, while the storehouses dataset holds each warehouse's detailed address, city, and so on:
df_storehouses = pd.read_sql("SELECT * FROM storehouses", connector)
output
And finally the dessert types dataset:
df_sweets_types = pd.read_sql("SELECT * FROM sweets_types;", connector)
output
Data filtering
Filtering with simple conditions
Next, let's filter some data, for example the names of desserts whose weight equals 300. In Pandas, the code looks like this
# Convert data type
df_sweets['weight'] = pd.to_numeric(df_sweets['weight'])
# Output results
df_sweets[df_sweets.weight == 300].name
output
1 Mikus
6 Soucus
11 Macus
Name: name, dtype: object
Of course, we can also run the equivalent SQL statement through the read_sql() method in pandas
pd.read_sql("SELECT name FROM sweets WHERE weight = '300'", connector)
output
Let's look at a similar case: filtering for the names of desserts whose cost equals 100. The code is as follows
# Pandas
df_sweets['cost'] = pd.to_numeric(df_sweets['cost'])
df_sweets[df_sweets.cost == 100].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE cost = '100'", connector)
output
Milty
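The two approaches can be compared side by side on a toy table. This is only a sketch: the four rows, their weights and costs, and the choice to store the numeric columns as TEXT are all assumptions made for illustration (the article's real sweets schema is not shown). It also passes the value through `params` with a `?` placeholder instead of embedding it in the SQL string, which is a swapped-in variation, not the article's code.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical toy schema; numeric columns stored as TEXT (assumption)
CREATE TABLE sweets (id INTEGER PRIMARY KEY, name TEXT, weight TEXT, cost TEXT);
INSERT INTO sweets(name, weight, cost) VALUES
    ('Milty',  '100', '100'),
    ('Mikus',  '300', '150'),
    ('Soucus', '300', '250'),
    ('Macus',  '300', '400');
""")

# Pandas: convert the column to numbers, then filter with a boolean mask
df = pd.read_sql("SELECT * FROM sweets ORDER BY id", conn)
df["weight"] = pd.to_numeric(df["weight"])
pandas_names = df[df["weight"] == 300]["name"].tolist()

# SQL: the same filter, with the value bound through a ? placeholder
sql_names = pd.read_sql(
    "SELECT name FROM sweets WHERE weight = ? ORDER BY id",
    conn,
    params=("300",),
)["name"].tolist()

print(pandas_names, sql_names)
```

Binding values through `params` avoids building SQL strings by hand when the filter value comes from user input.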
For text data, we can also filter for the rows we want, for example names that start with a given letter. The code is as follows
# Pandas
df_sweets[df_sweets.name.str.startswith('M')].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE name LIKE 'M%'", connector)
output
Milty
Mikus
Mivi
Mi
Misa
Maltik
Macus
SQL statements also support wildcards in LIKE patterns: % matches any sequence of zero or more characters, while _ matches exactly one character. The difference is shown below
# SQL
pd.read_sql("SELECT name FROM sweets WHERE name LIKE 'M%'", connector)
output
pd.read_sql("SELECT name FROM sweets WHERE name LIKE 'M_'", connector)
output
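The difference between the two wildcards can be checked with a small sketch that uses only the standard-library sqlite3 module. The table here is a toy one holding just a name column (an assumption for illustration); the names themselves are taken from the outputs shown in this article.

```python
import sqlite3

names = ["Milty", "Mikus", "Mivi", "Mi", "Misa", "Sor", "Co", "Maltik", "Macus"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sweets (name TEXT)")  # toy table, name column only
conn.executemany("INSERT INTO sweets(name) VALUES (?)", [(n,) for n in names])

# 'M%': an M followed by any number of characters (including none)
any_length = [row[0] for row in conn.execute(
    "SELECT name FROM sweets WHERE name LIKE 'M%' ORDER BY rowid")]

# 'M_': an M followed by exactly one character
two_chars = [row[0] for row in conn.execute(
    "SELECT name FROM sweets WHERE name LIKE 'M_' ORDER BY rowid")]

print(any_length)
print(two_chars)
```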
Filtering with complex conditions
Let's look at filtering on multiple conditions. For example, we want the names of desserts whose weight equals 300 and whose cost equals 150. The code is as follows
# Pandas
df_sweets[(df_sweets.cost == 150) & (df_sweets.weight == 300)].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE cost = '150' AND weight = '300'", connector)
output
Mikus
Or we may want the names of desserts whose cost is between 200 and 300. The code is as follows
# Pandas
df_sweets[df_sweets['cost'].between(200, 300)].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE cost BETWEEN '200' AND '300'", connector)
output
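One caveat about quoted bounds such as '200' and '300': if the cost column happens to be stored as text, SQLite compares the values as strings, character by character, and not as numbers. The sketch below illustrates this under that assumption. The article's real schema is not shown, so the TEXT column, the three rows, and their costs are all hypothetical. Casting the column to INTEGER restores numeric comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy table with cost stored as TEXT (assumption)
conn.execute("CREATE TABLE sweets (name TEXT, cost TEXT)")
conn.executemany(
    "INSERT INTO sweets(name, cost) VALUES (?, ?)",
    [("Co", "25"), ("Sor", "250"), ("Macus", "1000")],
)

# Text comparison: '25' sorts between '200' and '300', so it matches too
text_matches = [row[0] for row in conn.execute(
    "SELECT name FROM sweets WHERE cost BETWEEN '200' AND '300' ORDER BY name")]

# Numeric comparison after an explicit cast
numeric_matches = [row[0] for row in conn.execute(
    "SELECT name FROM sweets "
    "WHERE CAST(cost AS INTEGER) BETWEEN 200 AND 300 ORDER BY name")]

print(text_matches)
print(numeric_matches)
```

On the Pandas side, the earlier pd.to_numeric() conversion plays the same role as the cast.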
When it comes to sorting, SQL uses the ORDER BY clause. The code is as follows
# SQL
pd.read_sql("SELECT name FROM sweets ORDER BY id DESC", connector)
output
In Pandas, we call the sort_values() method instead. The code is as follows
# Pandas
df_sweets.sort_values(by='id', ascending=False).name
output
11 Macus
10 Maltik
9 Sor
8 Co
7 Soviet
6 Soucus
5 Soltic
4 Misa
3 Mi
2 Mivi
1 Mikus
0 Milty
Name: name, dtype: object
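To confirm that the two sorts agree, here is a small sketch on a three-row toy table (the table contents are an assumption for illustration). Note that sort_values() keeps the original index labels, which is why the article's output above counts down from 11 to 0.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical three-row version of the sweets table
CREATE TABLE sweets (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO sweets(name) VALUES ('Milty'), ('Mikus'), ('Mivi');
""")

df = pd.read_sql("SELECT * FROM sweets ORDER BY id", conn)

# Pandas: descending sort by id; index labels travel with their rows
sorted_df = df.sort_values(by="id", ascending=False)
pandas_sorted = sorted_df["name"].tolist()
index_labels = sorted_df.index.tolist()

# SQL: the same ordering with ORDER BY ... DESC
sql_sorted = pd.read_sql(
    "SELECT name FROM sweets ORDER BY id DESC", conn)["name"].tolist()

print(pandas_sorted, index_labels, sql_sorted)
```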
To select the name of the dessert with the highest cost, the Pandas code looks like this
df_sweets[df_sweets.cost == df_sweets.cost.max()].name
output
11 Macus
Name: name, dtype: object
In SQL, we first use a subquery to find the highest cost and then select the desserts with that cost. The code is as follows
pd.read_sql("SELECT name FROM sweets WHERE cost = (SELECT MAX(cost) FROM sweets)", connector)
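Both the boolean mask and the subquery return every dessert that shares the maximum cost, which matters when there are ties. The sketch below uses a toy table with a deliberate tie (the rows and costs are assumptions for illustration) and contrasts this with idxmax(), which returns only the first matching row.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical rows; 'Sor' and 'Macus' tie for the highest cost
CREATE TABLE sweets (id INTEGER PRIMARY KEY, name TEXT, cost INTEGER);
INSERT INTO sweets(name, cost) VALUES ('Milty', 100), ('Sor', 400), ('Macus', 400);
""")

df = pd.read_sql("SELECT * FROM sweets ORDER BY id", conn)

# Mask against the maximum: keeps all tied rows
mask_names = df[df["cost"] == df["cost"].max()]["name"].tolist()

# Subquery: also keeps all tied rows
subquery_names = pd.read_sql(
    "SELECT name FROM sweets "
    "WHERE cost = (SELECT MAX(cost) FROM sweets) ORDER BY id",
    conn,
)["name"].tolist()

# idxmax(): label of the first row holding the maximum, so one name only
first_only = df.loc[df["cost"].idxmax(), "name"]

print(mask_names, subquery_names, first_only)
```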
Suppose we want to see which cities have storehouses. In Pandas, we call the unique() method
df_storehouses['city'].unique()
output
array(['Moscow', 'Saint-petersburg', 'Yekaterinburg'], dtype=object)
The SQL counterpart is the DISTINCT keyword
pd.read_sql("SELECT DISTINCT city FROM storehouses", connector)
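A short sketch of the two forms on a toy storehouses table (the rows are assumptions for illustration; the city names come from the output above). unique() returns a NumPy array in order of first appearance, while SELECT DISTINCT guarantees no particular order unless ORDER BY is added, so the SQL query here sorts explicitly.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical rows with one repeated city
CREATE TABLE storehouses (id INTEGER PRIMARY KEY, city TEXT);
INSERT INTO storehouses(city) VALUES
    ('Moscow'), ('Saint-petersburg'), ('Moscow'), ('Yekaterinburg');
""")

df = pd.read_sql("SELECT * FROM storehouses ORDER BY id", conn)

# Pandas: distinct values in order of first appearance
pandas_cities = df["city"].unique().tolist()

# SQL: distinct values, sorted explicitly for a stable result
sql_cities = pd.read_sql(
    "SELECT DISTINCT city FROM storehouses ORDER BY city", conn)["city"].tolist()

print(pandas_cities, sql_cities)
```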
Grouped statistics
In Pandas, grouped statistics generally call the groupby() method followed by an aggregation function, such as mean() for the average or sum() for the total. For example, we want to find the manufacturer names that appear more than once, that is, manufacturers operating in more than one city. The code is as follows
name_counts = df_manufacturers.groupby('name').name.count()
name_counts[name_counts > 1]
output
name
Mishan 2
Name: name, dtype: int64
In SQL, grouping is done with GROUP BY, and conditions on the grouped results are applied with the HAVING keyword. The code is as follows
pd.read_sql("""
SELECT name, COUNT(name) as 'name_count' FROM manufacturers
GROUP BY name HAVING COUNT(name) > 1
""", connector)
Data merging
When two or more datasets need to be merged, in Pandas we can call the merge() method. For example, we merge the df_sweets and df_sweets_types datasets, where sweets_types_id in df_sweets is the foreign key that references the id column of the dessert types table
df_sweets.head()
output
df_sweets_types.head()
output
The merge code is as follows
df_sweets_1 = df_sweets.merge(df_sweets_types, left_on='sweets_types_id', right_on='id')
output
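Both tables have a name column, so the merged result holds them as name_x (the dessert) and name_y (the type). Next, we filter for desserts of the chocolate type. The code is as follows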
df_sweets_1.query('name_y == "chocolate"').name_x
output
10 Misa
11 Sor
Name: name_x, dtype: object
The SQL statement is relatively straightforward. The code is as follows
# SQL
pd.read_sql("""
SELECT sweets.name FROM sweets
JOIN sweets_types ON sweets.sweets_types_id = sweets_types.id
WHERE sweets_types.name = 'chocolate';
""", connector)
output
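As a self-contained sketch of the same join, the snippet below builds two toy tables (the rows and the reduced column set are assumptions for illustration) and runs both versions. It also passes the `suffixes` argument to merge() so the clashing columns get descriptive names in place of the default _x and _y; that is an optional variation, not the article's code.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, reduced versions of the two tables
CREATE TABLE sweets_types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sweets (id INTEGER PRIMARY KEY, name TEXT, sweets_types_id INTEGER);
INSERT INTO sweets_types(name) VALUES ('candy'), ('chocolate');
INSERT INTO sweets(name, sweets_types_id) VALUES ('Milty', 1), ('Misa', 2), ('Sor', 2);
""")

df_sweets = pd.read_sql("SELECT * FROM sweets", conn)
df_types = pd.read_sql("SELECT * FROM sweets_types", conn)

# Pandas: inner join on the foreign key, with explicit suffixes
merged = df_sweets.merge(
    df_types,
    left_on="sweets_types_id",
    right_on="id",
    suffixes=("_sweet", "_type"),
)
pandas_names = sorted(merged[merged["name_type"] == "chocolate"]["name_sweet"])

# SQL: the equivalent JOIN ... ON with a WHERE filter
sql_names = pd.read_sql("""
    SELECT sweets.name FROM sweets
    JOIN sweets_types ON sweets.sweets_types_id = sweets_types.id
    WHERE sweets_types.name = 'chocolate'
    ORDER BY sweets.name
""", conn)["name"].tolist()

print(pandas_names, sql_names)
```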
Dataset structure
Let's look at the structure of the dataset. In Pandas, checking the shape attribute is enough. The code is as follows
df_sweets.shape
output
(12, 10)
In SQL, count(*) returns the number of rows, which corresponds to the first element of shape
pd.read_sql("SELECT count(*) FROM sweets;", connector)
output
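Since count(*) covers only the rows, the column count has to come from the table's metadata. One way to do that in SQLite, sketched below with the standard-library sqlite3 module, is the `PRAGMA table_info` statement, which returns one row per column. The toy table is an assumption for illustration, and PRAGMA is specific to SQLite; other databases expose column metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table with 3 columns and 2 rows
CREATE TABLE sweets (id INTEGER PRIMARY KEY, name TEXT, cost INTEGER);
INSERT INTO sweets(name, cost) VALUES ('Milty', 100), ('Mikus', 150);
""")

# Rows: a plain count(*)
n_rows = conn.execute("SELECT count(*) FROM sweets").fetchone()[0]

# Columns: PRAGMA table_info yields one record per column (SQLite only)
n_cols = len(conn.execute("PRAGMA table_info(sweets)").fetchall())

shape = (n_rows, n_cols)
print(shape)
```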