R for Data Science (notes) -- data transformation (using filter)
2022-06-24 19:24:00 【Shengxin Xiaopeng】

Tidy-style data processing is becoming more and more popular. I think this has a lot to do with the pipe operator %>% and with expressing data-processing steps as verbs.
Solving the most important and most common problems in the least amount of time is what I call efficiency; working through the remaining difficulties is what I call improvement.
Using the filter verb
The first thing to be clear about is:
filter operates on rows; select operates on columns.
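The row/column distinction can be sketched on a small made-up data frame. The `toy` data frame below is hypothetical (it is not part of nycflights13) and only stands in for `flights`:

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table
toy <- data.frame(
  month   = c(1, 1, 2, 12),
  day     = c(1, 15, 1, 25),
  carrier = c("UA", "AA", "UA", "DL")
)

# filter() keeps the rows that satisfy the condition; every column remains
january <- filter(toy, month == 1)

# select() keeps the named columns; every row remains
month_carrier <- select(toy, month, carrier)

dim(january)        # rows drop, columns stay
dim(month_carrier)  # columns drop, rows stay
```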
With that in mind, let's practice.
We use the flights data from the nycflights13 package (load nycflights13 and dplyr first).
###1. Extracting specific rows
First, look at the data
flights
#> # A tibble: 336,776 x 19
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#> <int> <int> <int> <int> <int> <dbl> <int> <int>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> # … with 336,770 more rows, and 11 more variables: arr_delay <dbl>,
#> # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
#> # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
#### Extract the rows where the month is 1 and the day is 1
filter(flights, month == 1, day == 1)
#> # A tibble: 842 x 19
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#> <int> <int> <int> <int> <int> <dbl> <int> <int>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> # … with 836 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
#> # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
Or assign the result directly to a variable
jan1 <- filter(flights, month == 1, day == 1)
A handy trick: wrap the whole assignment in parentheses to save the result and print it at the same time
(jan1 <- filter(flights, month == 1, day == 1))
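The trick works because an assignment returns its value invisibly, and the surrounding parentheses make that value visible again. A minimal sketch, using a hypothetical `toy` data frame instead of `flights`:

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table
toy <- data.frame(month = c(1, 1, 2), day = c(1, 2, 1))

# Plain assignment: the result is stored, nothing is printed
jan1 <- filter(toy, month == 1, day == 1)

# Wrapped in parentheses: the result is stored and printed
(jan1 <- filter(toy, month == 1, day == 1))
```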
###2. Using logical operators to extract rows by multiple conditions
“&” means “and”, “|” means “or”, and “!” means “not”.
Combining logical operators, optionally together with the pipe, lets you filter on several conditions at once.
For example, to select the observations whose month is 11 or 12, there are two ways to write it
filter(flights, month == 11 | month == 12)
or
nov_dec <- filter(flights, month %in% c(11, 12))
%in% is a matching operator: it tests whether each value on the left appears in the vector on the right, and it is often used in conditions
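The operators above can be combined with the pipe. The sketch below again uses a hypothetical `toy` data frame rather than `flights`, and the 120-minute threshold is chosen only for illustration:

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table
toy <- data.frame(
  month     = c(1, 11, 12, 12, 6),
  dep_delay = c(5, 130, -3, 45, 200)
)

# "|": the month is 11 or 12
by_or <- toy %>% filter(month == 11 | month == 12)

# %in%: selects the same rows, and stays short as the list of values grows
by_in <- toy %>% filter(month %in% c(11, 12))

# "&" and "!": in November or December AND not delayed by more than 120 minutes
not_late <- toy %>% filter(month %in% c(11, 12) & !(dep_delay > 120))

not_late
```

Separating conditions with a comma inside filter(), as in the month == 1, day == 1 example earlier, has the same effect as joining them with “&”.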
#### Reference
https://r4ds.had.co.nz/transform.html