Beyond-the-syllabus exercises
2022-06-27 23:45:00 【Shengxin skill tree】
Our introductory bioinformatics course and our online live data mining course have been running for more than three years, and we have trained wave after wave of excellent students. The content shared in this issue is not something we covered in class. Instead, I set a beyond-the-syllabus exercise that students could reach by standing on tiptoe, to encourage them to learn actively instead of just waiting to be fed.
Here is what Xuqian, one of our outstanding students, shared:
R language beyond-the-syllabus exercises
(Xuqian, an outstanding student of Shengxin skill tree)
This is an extension question from the R language part of our current marathon-style online live courses. The next session starts next Monday:
- Data mining (GEO, TCGA, single-cell), June 2022 session: a quick tour of common bioinformatics application charts
- Introduction to bioinformatics, June 2022 session: your first bioinformatics lesson
I saw that some classmates have already written up a solution to this beyond-the-syllabus question. It is a great idea, but it no longer counts as beyond the syllabus, because teacher Xiaojie will certainly emphasize this routine operation in the upcoming GEO lessons. Here are two new approaches for reference:
1. merge
Whether or not the IDs in soft and exp line up with each other, merge the two tables directly on their shared IDs. There is no need for %in% or row operations. The code is as follows:
exp <- read.csv("exp.csv")
anno <- read.table("soft.txt", header = TRUE, sep = "\t")
# Rename the probe-ID column of exp so that it matches the ID column of the annotation file
colnames(exp)[1] <- "ID"
# Keep only the probes present in both tables
exp_new <- merge(exp, anno, by = "ID")
# Same idea as before: keep only the first row for each gene name
exp_new <- exp_new[!duplicated(exp_new$GeneName), ]
rownames(exp_new) <- exp_new$GeneName  # gene names become the row names
exp_new <- as.matrix(exp_new[, 2:7])   # keep the sample columns (2 to 7 here) and convert to a matrix
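To see what the merge approach keeps and drops, here is a minimal sketch on toy data. The probe IDs, gene names, sample columns `s1`/`s2`, and values are all invented for illustration and are not from the course data.

```r
# Toy data: every ID, gene name and value below is hypothetical.
exp_toy <- data.frame(
  ID = c("p1", "p2", "p3", "p4"),
  s1 = c(1, 5, 3, 9),
  s2 = c(2, 6, 4, 8)
)
anno_toy <- data.frame(
  ID = c("p3", "p2", "p1", "p5"),   # different order; p4 is missing, p5 is extra
  GeneName = c("GeneB", "GeneA", "GeneA", "GeneC")
)

# merge() keeps only the IDs found in both tables and sorts the result by ID
merged <- merge(exp_toy, anno_toy, by = "ID")

# Keep the first row encountered for each gene name
merged <- merged[!duplicated(merged$GeneName), ]
rownames(merged) <- merged$GeneName

# Select the sample columns by name instead of by position
mat <- as.matrix(merged[, c("s1", "s2")])
print(mat)
```

Probes `p4` and `p5` disappear because each exists in only one table. For `GeneA`, the row that survives is simply the one that comes first after the merge, which depends on the ID ordering and not on the expression values.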
2. Bring out our secret weapon: tidyverse
When handling duplicated genes, in practice I prefer to keep the row with the largest mean. This can be done with powerful packages such as dplyr; it is fair to say that Hadley has changed the R ecosystem. The code is below, and running it step by step shows how the result is reached. Select the code up to a pipe operator and press Ctrl+Enter so that the unselected code does not run, then add the steps one at a time to see what each stage of the pipe does:
library(dplyr)
library(tibble)
colnames(exp)[1] <- "ID"
exp_new <- exp %>%
  # Merge in the probe annotation
  inner_join(anno, by = "ID") %>%
  # Drop the extra columns; select() accepts column names and column numbers together
  select(c(GeneName, 2:7)) %>%
  # Add a column holding the mean of each row
  mutate(rowMean = rowMeans(.[, -1])) %>%
  # Sort by mean expression, largest first
  arrange(desc(rowMean)) %>%
  # Remove duplicates, keeping the first row for each GeneName
  distinct(GeneName, .keep_all = TRUE) %>%
  # Turn GeneName into the row names
  column_to_rownames(var = "GeneName") %>%
  # Drop the helper mean column
  select(-rowMean)
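The same pipeline can be tried on a tiny table to confirm which row survives for a duplicated gene. This is only a sketch: the probe IDs, gene names, sample columns `s1`/`s2`, and values are invented, and the helper column is dropped before the row names are set, which is a small reordering of the steps above.

```r
library(dplyr)
library(tibble)

# Toy data: every ID, gene name and value below is hypothetical.
exp_toy <- data.frame(
  ID = c("p1", "p2", "p3"),
  s1 = c(1, 5, 3),
  s2 = c(2, 6, 4)
)
anno_toy <- data.frame(
  ID = c("p1", "p2", "p3"),
  GeneName = c("GeneA", "GeneA", "GeneB")
)

res <- exp_toy %>%
  inner_join(anno_toy, by = "ID") %>%
  select(GeneName, s1, s2) %>%
  mutate(rowMean = rowMeans(.[, -1])) %>%    # row means: 1.5, 5.5, 3.5
  arrange(desc(rowMean)) %>%                 # the p2 row moves to the top
  distinct(GeneName, .keep_all = TRUE) %>%   # GeneA keeps the p2 row
  select(-rowMean) %>%                       # drop the helper column
  column_to_rownames(var = "GeneName")
print(res)
```

Here `GeneA` is measured by probes `p1` and `p2`; because the rows are sorted by mean before `distinct()`, the `p2` row with the larger mean is the one that is kept.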
After a pipe operator, `.` stands for the data passed in from the left-hand side. For tidyverse functions it can be omitted, because the data is passed as the first argument by default. When calling other functions, write `.` wherever the data should go.
The result is as follows:
[Figure: output of the code above]
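The role of `.` can be checked with a few one-liners. This is a small sketch using the `%>%` operator that dplyr makes available; the vectors and the "probe" label are made up for illustration.

```r
library(dplyr)  # makes the %>% pipe available

x <- c(4, 9, 16)

# The left-hand side becomes the first argument, so `.` can be left out
a <- x %>% sqrt()

# The same call with the placeholder written explicitly
b <- x %>% sqrt(.)

# When the data is not the first argument, `.` marks where it goes
d <- 1:3 %>% paste("probe", ., sep = "_")

print(a)
print(d)
```

`a` and `b` are identical, while in the last call the piped vector fills the second position of `paste()` and the string "probe" stays first.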
The idea comes from teacher Guozi.