当前位置:网站首页>[hard core dry goods] which company is better in data analysis? Choose pandas or SQL
[hard core dry goods] which company is better in data analysis? Choose pandas or SQL
2022-07-05 19:33:00 【Xinyi 2002】
Another week , Today, Xiaobian is going to talk about Pandas
and SQL
Grammatical differences between , I believe for many data analysts , Whether it's Pandas
Module or SQL
, They are all very many tools used in daily study and work , Of course, we can also be in Pandas
From the module SQL
sentence , By calling read_sql()
Method
Want to get the source code of this tutorial , It can be answered in the background of official account 【20220704】 Can get
Building a database
First we pass SQL
Statement is creating a new database , I'm sure everyone knows the basic grammar ,
CREATE TABLE Table name (
Field name data type ...
)
Let's take a look at the specific code
import pandas as pd
import sqlite3
connector = sqlite3.connect('public.db')
my_cursor = connector.cursor()
my_cursor.executescript("""
CREATE TABLE sweets_types
(
id integer NOT NULL,
name character varying NOT NULL,
PRIMARY KEY (id)
);
... Limited space , Refer to the source code for details ...
""")
At the same time, we also insert data into these new tables , The code is as follows
my_cursor.executescript("""
INSERT INTO sweets_types(name) VALUES
('waffles'),
('candy'),
('marmalade'),
('cookies'),
('chocolate');
... Limited space , Refer to the source code for details ...
""")
We can view the new table through the following code , And convert it to DataFrame
Data set in format , The code is as follows
df_sweets = pd.read_sql("SELECT * FROM sweets;", connector)
output
We have built a total of 5 Data sets , It mainly involves desserts 、 Types of desserts and data of processing and storage , For example, the data set of desserts mainly includes the weight of desserts 、 Sugar content 、 Production date and expiration time 、 Cost and other data , as well as
df_manufacturers = pd.read_sql("SELECT * FROM manufacturers", connector)
output
The data set of processing involves the main person in charge and contact information of the factory , The warehouse data set involves the detailed address of the warehouse 、 City location, etc
df_storehouses = pd.read_sql("SELECT * FROM storehouses", connector)
output
And the dessert category data set ,
df_sweets_types = pd.read_sql("SELECT * FROM sweets_types;", connector)
output
Data screening
Screening of simple conditions
Next, let's do some data screening , For example, the weight of desserts is equal to 300 The name of dessert , stay Pandas
The code in the module looks like this
# Convert data type
df_sweets['weight'] = pd.to_numeric(df_sweets['weight'])
# Output results
df_sweets[df_sweets.weight == 300].name
output
1 Mikus
6 Soucus
11 Macus
Name: name, dtype: object
Of course, we can also pass pandas
In the middle of read_sql()
Method to call SQL
sentence
pd.read_sql("SELECT name FROM sweets WHERE weight = '300'", connector)
output
Let's look at a similar case , The screening cost is equal to 100 The name of dessert , The code is as follows
# Pandas
df_sweets['cost'] = pd.to_numeric(df_sweets['cost'])
df_sweets[df_sweets.cost == 100].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE cost = '100'", connector)
output
Milty
For text data , We can also further screen out the data we want , The code is as follows
# Pandas
df_sweets[df_sweets.name.str.startswith('M')].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE name LIKE 'M%'", connector)
output
Milty
Mikus
Mivi
Mi
Misa
Maltik
Macus
Of course. SQL
Wildcards in statements ,%
Means to match any number of letters , and _
Means to match any letter , The specific differences are as follows
# SQL
pd.read_sql("SELECT name FROM sweets WHERE name LIKE 'M%'", connector)
output
pd.read_sql("SELECT name FROM sweets WHERE name LIKE 'M_'", connector)
output
Screening of complex conditions
Let's take a look at data filtering with multiple conditions , For example, we want the weight to be equal to 300 And the cost price is controlled at 150 The name of dessert , The code is as follows
# Pandas
df_sweets[(df_sweets.cost == 150) & (df_sweets.weight == 300)].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE cost = '150' AND weight = '300'", connector)
output
Mikus
Or the cost price can be controlled within 200-300 Dessert name between , The code is as follows
# Pandas
df_sweets[df_sweets['cost'].between(200, 300)].name
# SQL
pd.read_sql("SELECT name FROM sweets WHERE cost BETWEEN '200' AND '300'", connector)
output
If it comes to sorting , stay SQL
It uses ORDER BY
sentence , The code is as follows
# SQL
pd.read_sql("SELECT name FROM sweets ORDER BY id DESC", connector)
output
And in the Pandas
What is called in the module is sort_values()
Method , The code is as follows
# Pandas
df_sweets.sort_values(by='id', ascending=False).name
output
11 Macus
10 Maltik
9 Sor
8 Co
7 Soviet
6 Soucus
5 Soltic
4 Misa
3 Mi
2 Mivi
1 Mikus
0 Milty
Name: name, dtype: object
Select the dessert name with the highest cost price , stay Pandas
The code in the module looks like this
df_sweets[df_sweets.cost == df_sweets.cost.max()].name
output
11 Macus
Name: name, dtype: object
And in the SQL The code in the statement , We need to first screen out which dessert is the most expensive , Then proceed with further processing , The code is as follows
pd.read_sql("SELECT name FROM sweets WHERE cost = (SELECT MAX(cost) FROM sweets)", connector)
We want to see which cities are warehousing , stay Pandas
The code in the module looks like this , By calling unique()
Method
df_storehouses['city'].unique()
output
array(['Moscow', 'Saint-petersburg', 'Yekaterinburg'], dtype=object)
And in the SQL
The corresponding sentence is DISTINCT
keyword
pd.read_sql("SELECT DISTINCT city FROM storehouses", connector)
Data grouping Statistics
stay Pandas
Group statistics in modules generally call groupby()
Method , Then add a statistical function later , For example, it is to calculate the mean value of scores mean()
Method , Or summative sum()
Methods, etc. , For example, we want to find out the names of desserts produced and processed in more than one city , The code is as follows
df_manufacturers.groupby('name').name.count()[df_manufacturers.groupby('name').name.count() > 1]
output
name
Mishan 2
Name: name, dtype: int64
And in the SQL
The grouping in the statement is also GROUP BY
, If there are other conditions later , It's using HAVING
keyword , The code is as follows
pd.read_sql("""
SELECT name, COUNT(name) as 'name_count' FROM manufacturers
GROUP BY name HAVING COUNT(name) > 1
""", connector)
Data merging
When two or more datasets need to be merged , stay Pandas
Modules , We can call merge()
Method , For example, we will df_sweets
Data set and df_sweets_types
Merge the two data sets , among df_sweets
In the middle of sweets_types_id
Is the foreign key of the table
df_sweets.head()
output
df_sweets_types.head()
output
The specific data consolidation code is as follows
df_sweets_1 = df_sweets.merge(df_sweets_types, left_on='sweets_types_id', right_on='id')
output
We will further screen out chocolate flavored desserts , The code is as follows
df_sweets_1.query('name_y == "chocolate"').name_x
output
10 Misa
11 Sor
Name: name_x, dtype: object
and SQL
The sentence is relatively simple , The code is as follows
# SQL
pd.read_sql("""
SELECT sweets.name FROM sweets
JOIN sweets_types ON sweets.sweets_types_id = sweets_types.id
WHERE sweets_types.name = 'chocolate';
""", connector)
output
The structure of the data set
Let's take a look at the structure of the data set , stay Pandas
View directly in the module shape
Attribute is enough , The code is as follows
df_sweets.shape
output
(12, 10)
And in the SQL
In the sentence , It is
pd.read_sql("SELECT count(*) FROM sweets;", connector)
output
NO.1
Previous recommendation
Historical articles
8 Cool visual charts , Quickly write the visual analysis report that the boss likes to see
【 Hard core dry goods 】Pandas Data type conversion in modules
use Python among Plotly.Express The module draws several charts , I was really amazed !!
Share 、 Collection 、 give the thumbs-up 、 I'm looking at the arrangement ?
边栏推荐
- The binary string mode is displayed after the value with the field type of longtext in MySQL is exported
- 【AI 框架基础技术】自动求导机制 (Autograd)
- HiEngine:可媲美本地的云原生内存数据库引擎
- 【FAQ】华为帐号服务报错 907135701的常见原因总结和解决方法
- Inventory of the most complete low code / no code platforms in the whole network: Jiandao cloud, partner cloud, Mingdao cloud, Qingliu, xurong cloud, Jijian cloud, treelab, nailing · Yida, Tencent clo
- Xaas trap: all things serve (possible) is not what it really needs
- PHP利用ueditor实现上传图片添加水印
- Reflection and imagination on the notation like tool
- Microwave radar induction module technology, real-time intelligent detection of human existence, static micro motion and static perception
- How MySQL queries and modifies JSON data
猜你喜欢
【AI 框架基础技术】自动求导机制 (Autograd)
How about testing outsourcing companies?
Fuzor 2020 software installation package download and installation tutorial
Hiengine: comparable to the local cloud native memory database engine
webuploader文件上传 拖拽上传 进度监听 类型控制 上传结果监听控件
众昂矿业:2022年全球萤石行业市场供给现状分析
【FAQ】华为帐号服务报错 907135701的常见原因总结和解决方法
成功入职百度月薪35K,2022Android开发面试解答
Debezium系列之:记录mariadb数据库删除多张临时表debezium解析到的消息以及解决方法
Hiengine: comparable to the local cloud native memory database engine
随机推荐
Debezium系列之:IDEA集成词法和语法分析ANTLR,查看debezium支持的ddl、dml等语句
Hiengine: comparable to the local cloud native memory database engine
aggregate
测试外包公司怎么样?
okcc呼叫中心有什么作用
How to convert word into PDF? Word to PDF simple way to share!
14、用户、组和权限(14)
Inventory of the most complete low code / no code platforms in the whole network: Jiandao cloud, partner cloud, Mingdao cloud, Qingliu, xurong cloud, Jijian cloud, treelab, nailing · Yida, Tencent clo
Postman core function analysis - parameterization and test report
Apprentissage du projet MMO I: préchauffage
Go语言 | 02 for循环及常用函数的使用
Fundamentals of shell programming (Part 8: branch statements -case in)
flume系列之:拦截器过滤数据
40000 word Wenshuo operator new & operator delete
Worthy of being a boss, byte Daniel spent eight months on another masterpiece
开源 SPL 消灭数以万计的数据库中间表
C application interface development foundation - form control (6) - menu bar, toolbar and status bar controls
C# 语言的基本语法结构
Android interview, Android audio and video development
爬虫练习题(二)