iMeta | Gu Zuguang of the German National Cancer Center publishes a method for complex heatmap visualization

2022-08-04 10:46:00 Liu Yongxin Adam

Complex Heatmap Visualization

https://doi.org/10.1002/imt2.43

PROTOCOL

● In August 2022, Gu Zuguang of the German Cancer Research Center published a method article titled "Complex heatmap visualization" online in iMeta.

● The article systematically introduces the features and functionality of the ComplexHeatmap R package for complex heatmap visualization.

● First/corresponding author: Gu Zuguang ([email protected])

Abstract

Heatmaps are a widely used statistical visualization method for matrix data, used to reveal similar patterns present in the matrix. The R programming language has many packages for drawing heatmaps. Among them, the ComplexHeatmap package provides the richest toolset for building highly customizable heatmaps. ComplexHeatmap can automatically concatenate and adjust multiple heatmaps and add complex annotations, which makes it easy to connect multiple sources of information. It is therefore widely used for data analysis in many fields, especially bioinformatics, to uncover key structures hidden in data. In this article, we give a comprehensive overview of the current state of ComplexHeatmap, including its modular design, rich functionality, and broad applications.

Keywords: heatmap, visualization, clustering, Bioconductor, R package

Highlights

● Complex heatmaps are a powerful visualization method used to reveal complex relationships between multiple pieces of information;

● We developed an R package named ComplexHeatmap, which provides a large number of heatmap visualization tools and is widely used in bioinformatics;

● In this article, we systematically introduce the current features and functionality of the ComplexHeatmap package.

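Video walkthrough

Bilibili: https://www.bilibili.com/video/BV1AW4y117kU

YouTube: https://youtu.be/Y6l29nUIrAs

Download supplementary materials, including the Chinese translation, slides, and Chinese/English video walkthroughs

Please visit the journal website: http://www.imeta.science/

Full-text walkthrough

Introduction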

A heatmap is a widely used method for visualizing matrix data that uses color as its main graphical element. There are two main categories of heatmap visualization: spatial heatmaps and grid heatmaps [1]. The first kind visualizes how data is distributed in space, such as the global temperature distribution or a user's click activity on a web page. A choropleth map is one such type, using a heatmap to visualize statistical characteristics of geographic areas. The second kind is simply a two-dimensional rectangular layout that uses color as the graphical element, with the two dimensions corresponding to two types of variables. In most cases, the rows and columns of the heatmap are reordered by some method so that rows and columns with similar patterns are grouped together. The reordering is most often done by hierarchical clustering, so a grid heatmap is also known as a cluster heatmap. In this article, we discuss only grid heatmaps.

Heatmap visualization can be traced back to the 19th century, when it was used to visualize various sociological statistics across different districts of Paris [2]. As a statistical visualization method, however, it was not widely used until it was applied to bioinformatics research in the 1990s. Since an early paper on heatmap visualization of gene expression data was published in 1998 [3], heatmaps have been a standard tool for visualizing various omics data, such as gene expression profiles or DNA methylation data. Today, heatmaps are also used in a variety of genomics studies, for example to visualize three-dimensional (3D) regulation at the genome scale [4], genome-wide DNA methylation signals [5], or the distribution pattern of genomic signals around specific genomic intervals [6]. The rectangular layout is the most common layout for heatmap visualization, but other layouts exist, such as circular layouts [7], spiral layouts [8], and Hilbert curve layouts [9]. They are useful in specific scenarios.

R is a popular programming language for data analysis and visualization, and it has many packages for making heatmaps. The heatmap() function in the stats package provides the most basic, but limited, functionality. The heatmap.2() function in the gplots package is an enhanced version of heatmap(); it supports adding more graphics to the heatmap, such as a legend showing the data distribution and a display of the difference from the column or row median. The geom_tile() function in the ggplot2 package [10] also provides a simple heatmap implementation. Other packages offer more flexible control over heatmaps, for example the pheatmap() function in the pheatmap package and the aheatmap() function in the NMF package [11].

With the rapid growth in the size and dimensionality of data, especially in genomics, there is an urgent need for effective visualization tools that can relate multiple types of data in integrative or multi-omics analyses, so that the relationships among them can be revealed easily. From the perspective of heatmap visualization, this need shows up in two ways. The first is support for heatmap annotations. Heatmap annotations contain additional information associated with the main heatmap. For example, in a typical heatmap of gene expression data, rows correspond to genes and columns correspond to patients. Patients often have additional clinical metadata, such as age, gender, or whether they carry certain DNA mutations. With annotations attached to the heatmap, it is easy to see, for example, whether a set of highly expressed genes is associated with a certain age range or with a specific type of DNA mutation. heatmap() and heatmap.2() support only a single annotation given as a numeric or character vector. pheatmap() and aheatmap() allow multiple annotations. The superheat [12] and heatmap3 [13] packages support more types of annotation graphics, such as points or lines, which map the data visually with greater precision. The second is "complex heatmap visualization", achieved directly by integrating multiple heatmaps, which allows correlated patterns to be compared directly across heatmaps. For example, in our previous study [6], we applied complex heatmap visualization to gene expression, DNA methylation, and various histone modification data to reveal patterns of transcriptional regulation across multiple human tissues. For complex annotation and heatmap visualization, we developed a heatmap package named ComplexHeatmap [14]. It supports not only the basic annotation graphics found in other packages but also a large number of additional complex annotation graphics, and it even allows users to define their own. ComplexHeatmap provides a simple syntax for integrating multiple heatmaps and annotations, with the rows or columns of all heatmaps adjusted automatically. Its ease of use and comprehensive functionality have made ComplexHeatmap widely used in bioinformatics.

The ComplexHeatmap project started in 2015, and the corresponding paper was published in 2016 [14]. Since then, it has gradually become a popular tool in bioinformatics. It has been downloaded more than 500,000 times, and 104 other CRAN/Bioconductor packages depend on it directly (data collected on June 22, 2022). ComplexHeatmap has been widely used in biological research, for example in cancer [15], COVID-19 [16], single-cell [17], and immunology [18] studies, as well as in other fields such as oceanography [19] and ecology [20]. Over the past six years, we have actively maintained ComplexHeatmap and added many new features. We have also reworked the documentation into a comprehensive book (https://jokergoo.github.io/ComplexHeatmap-reference/book/). In this article, we give a comprehensive overview of the current state of ComplexHeatmap, including its modular design, rich functionality, and broad applications.

Results and discussion

Modular design

ComplexHeatmap is designed in a modular, object-oriented way. It defines three main classes: the Heatmap class, which defines a complete heatmap with multiple components; the HeatmapAnnotation class, which defines heatmap annotations with specific graphics; and the HeatmapList class, which manages multiple heatmaps and heatmap annotations.

A single heatmap consists of the heatmap body and various heatmap components (Figure 1A). The heatmap body is a two-dimensional arrangement of cells, each filled with a single color, where each cell corresponds to a specific value in the input matrix. Heatmap components include titles, dendrograms, text labels for matrix rows and columns, and heatmap annotations. These components can be placed on any of the four sides of the heatmap body, and each is managed by a specific method defined for the Heatmap object. In addition, the heatmap body can be split by rows and by columns.

Heatmap annotations contain additional information related to the rows or columns of the heatmap. ComplexHeatmap provides rich support for configuring different annotation graphics and for defining new ones. Heatmap annotations can be placed on the four sides of a heatmap as heatmap components, or they can stand alone and be concatenated with heatmaps. A HeatmapAnnotation object contains a set of single annotations defined by the SingleAnnotation class (Figure 1B), where each single annotation contains a specific type of graphic that is in turn defined by the AnnotationFunction class. The AnnotationFunction class provides a flexible way to define new annotation graphics and, more importantly, custom annotation graphics are automatically reordered and split to match the main heatmap.

The main feature of ComplexHeatmap is that it supports concatenating a set of heatmaps and annotations horizontally or vertically, which makes it easier to visualize associations between different data sources. The HeatmapList class is a container for a set of heatmaps and annotations (Figure 1C); it automatically adjusts the correspondence of rows or columns across the heatmaps and annotations.

Figure 1. Modular design of the ComplexHeatmap package

(A) A single heatmap with multiple components. (B) A column of heatmap annotations. (C) A list of heatmaps.

A single heatmap

ComplexHeatmap provides rich functionality for configuring a single heatmap. The constructor function Heatmap() generates a single heatmap and returns an object of the Heatmap class. The only mandatory argument of Heatmap() is a matrix, which can be numeric or character. Heatmap() provides a large number of additional arguments for customizing the heatmap. Besides the common features also available in other heatmap packages, Heatmap() has the unique features listed below.

⬤ Flexible clustering and matrix sorting

In routine data analysis, a heatmap is usually accompanied by hierarchical clustering of its matrix, so that features with similar patterns are placed close together and can be easily identified from the colors on the heatmap. In Heatmap(), hierarchical clustering can be specified in several ways: 1. with a predefined distance method, such as "euclidean" or "pearson"; 2. with a distance function, which either computes the distance between two vectors or computes distances directly from an input matrix; 3. with a clustering function that takes a matrix as input and returns a dendrogram object; 4. with a clustering object, such as an hclust object or a dendrogram object. The last way is especially useful because the user can supply dendrogram objects generated or edited by other packages. For example, the dendextend package [21] can render dendrogram branches in different colors to highlight subtrees, or add specific symbols to dendrogram nodes, and the rendered dendrogram object can then be used directly in Heatmap() (Figure 2A).
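The four ways of specifying clustering can be sketched as follows. This is a minimal illustration on a random matrix, not code from the article; the matrix `m`, the heatmap names, and the particular distance and linkage choices are assumptions made for the example.

```r
library(ComplexHeatmap)

set.seed(123)
m <- matrix(rnorm(200), nrow = 20,
            dimnames = list(paste0("r", 1:20), paste0("c", 1:10)))

# 1. A predefined distance method
ht1 <- Heatmap(m, name = "m1", clustering_distance_rows = "pearson")

# 2. A distance function that takes two vectors (illustrative choice)
ht2 <- Heatmap(m, name = "m2",
               clustering_distance_rows = function(x, y) 1 - cor(x, y))

# 3. A clustering function: takes a matrix, returns a dendrogram
ht3 <- Heatmap(m, name = "m3",
               cluster_rows = function(mat) {
                 as.dendrogram(hclust(dist(mat), method = "average"))
               })

# 4. A precomputed clustering object (hclust or dendrogram)
row_dend <- as.dendrogram(hclust(dist(m)))
ht4 <- Heatmap(m, name = "m4", cluster_rows = row_dend)
```

The same arguments exist for columns (`clustering_distance_columns`, `cluster_columns`). A dendrogram edited by another package can be passed in the same way as `row_dend` in the fourth call.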

Figure 2. Using a single heatmap

(A) A heatmap with row and column annotations. Columns are split by k-means clustering with three groups, and rows are split by the combination of a categorical variable and k-means clustering with two groups. (B) A customized heatmap. The data in panels A and B are randomly generated. (C) A heatmap without row or column splitting. (D) A heatmap with row and column splitting. Panels C and D use the same matrix.

A dendrogram is usually represented as a binary tree, and the order of the two branches at a node is arbitrary. Rotating the two branches at a node does not change the mathematical representation of the dendrogram, but it does affect the global ordering of its elements. Choosing an appropriate way to rotate the branches, in other words reordering the dendrogram, therefore helps to place matrix rows or columns with similar patterns close to each other and improves the visualization. By default, Heatmap() uses the reorder.dendrogram() function to reorder the dendrogram according to the mean of the submatrix under each branch; for example, at each node of the dendrogram, the branch with the smaller mean is always placed on the left. Heatmap() also accepts dendrogram objects, so other dendrogram reordering methods can be integrated easily. The user can first generate a dendrogram object, then apply a specific reordering method, for example from the dendsort package [22], and finally pass the result to Heatmap().

Note that hierarchical clustering is just one particular way of reordering the rows and columns of a heatmap. Other methods of computing the row and column order of a matrix can of course be used as well. Heatmap() allows users to set numeric or character indices to reorder the heatmap. Popular packages for matrix ordering include seriation [23] and biclust [24].
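As a rough sketch of both ideas, the example below reorders a dendrogram with base R's `reorder()` before handing it to Heatmap(), and separately sets an explicit row order without any clustering. The matrix, the weights, and the ordering criterion (row means) are assumptions chosen for illustration; a method from dendsort or seriation could be substituted at the same place.

```r
library(ComplexHeatmap)

set.seed(1)
m <- matrix(rnorm(60), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("s", 1:10)))

# Reorder a dendrogram outside Heatmap(), then pass it in.
# Here branches are rotated by the mean of the row means under each branch.
row_dend <- as.dendrogram(hclust(dist(m)))
row_dend <- reorder(row_dend, wts = rowMeans(m), agglo.FUN = mean)
ht_dend <- Heatmap(m, name = "dend", cluster_rows = row_dend)

# Skip clustering and give an explicit order instead
# (setting row_order turns row clustering off).
ord <- order(rowMeans(m), decreasing = TRUE)
ht_ord <- Heatmap(m, name = "ord", row_order = ord)

ht_ord <- draw(ht_ord)
row_order(ht_ord)   # the row indices in drawing order
```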

⬤ Heatmap splitting

Splitting a heatmap is an efficient way to highlight grouping patterns. During hierarchical clustering, when a new leaf or sub-dendrogram is added to the current dendrogram, the calculation is based only on the elements already in the dendrogram, not on all elements of the matrix. If the groups in a dataset differ only moderately, this weakens their visual separation. Heatmap splitting can greatly improve how distinguishable the grouping patterns are. ComplexHeatmap provides several ways to split a heatmap into "slices" by rows and by columns (Figure 2A-B): 1. Set k-means clustering, which supports running k-means several times to obtain a consensus k-means result and reduce the effect of randomness. 2. Set a categorical variable containing predefined grouping information. The variable can be a vector or a data frame, and the heatmap is then split by the combinations of all levels of the categorical variables. 3. If hierarchical clustering has been applied to the heatmap, specify a single number, and the cutree() function is used to cut the dendrogram. For the first two methods, if clustering is enabled, hierarchical clustering is first performed within each heatmap slice, and a second clustering is then applied to the slice means to show the hierarchy at the slice level.

For example, the heatmap in Figure 2C shows genes that are significantly differentially expressed in a glioblastoma dataset with four subgroups [25] (shown as the top annotation). The four subgroups were predicted by consensus clustering, and the analysis showed that the classification is very stable [26]. The stable classification is also supported by t-SNE analysis, in which the four groups are well separated (Figure S1). However, when hierarchical clustering is applied directly to all samples, the four groups are not separated as well as expected: some samples of group 3 (blue) and group 4 (purple) are mixed. In Figure 2D, the same matrix is used, but the heatmap is first split according to the classification, and hierarchical clustering is then applied within each column slice. By comparison, this clearly improves how the grouping pattern is displayed. In addition, in Figure 2D the rows are also split by k-means clustering. Group-specific gene expression patterns can now be observed easily.
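The three splitting methods described above can be tried on random data as follows. This is a sketch under assumptions: the matrix, the group labels, and the numbers of slices are invented for the example and have nothing to do with the glioblastoma dataset.

```r
library(ComplexHeatmap)

set.seed(42)
m <- matrix(rnorm(24 * 12), nrow = 24)
group <- rep(c("A", "B"), each = 6)   # hypothetical predefined column groups

# Rows: k-means with 3 groups, repeated 10 times for a consensus result.
# Columns: split by a categorical variable.
ht_km <- Heatmap(m, name = "mat",
                 row_km = 3, row_km_repeats = 10,
                 column_split = group)

# Rows: a single number cuts the row dendrogram into that many slices.
ht_cut <- Heatmap(m, name = "mat2", row_split = 2)

ht_km <- draw(ht_km)
names(column_order(ht_km))   # one element per column slice
```

With repeated k-means runs the final number of row slices can be smaller than the requested `row_km`, so it is worth checking `row_order()` on the drawn object rather than assuming the count.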

⬤ Rendering heatmaps as raster images

When we make so-called "high-quality graphics", we usually save them as vector graphics, for example in pdf or svg format. Vector graphics store detailed information about every graphical element, so a heatmap generated from a huge matrix and saved as a vector graphic produces a very large file, and an image viewer takes a long time to render the full image. Because the size and resolution of graphics devices are limited, adjacent cells of a large heatmap are in practice merged into single pixels. An effective method is therefore needed to reduce the original image, since there is no need to keep every detail of a large heatmap.

Rasterization is a method of converting an image into a matrix of red-green-blue (RGB) color values. Suppose the heatmap matrix has nr rows and nc columns. When it is drawn on a graphics device (such as a screen device), the heatmap body occupies pr and pc pixels for its rows and columns, respectively. When nr > pr and/or nc > pc, multiple values in the matrix are mapped to a single pixel, so nr and/or nc can be reduced in some way to pr and/or pc. ComplexHeatmap provides three methods for reducing the heatmap graphic by rasterization: 1. First write the heatmap to a temporary png image with a resolution of pr × pc, then read the temporary image back as a raster object and fill it into the heatmap. This approach in effect performs the image reduction on the png device. 2. First reduce the original matrix to a size of pr × pc, so that each value in the reduced matrix corresponds to one pixel. The matrix can be reduced in several ways, for example by taking the mean or a random value from each submatrix. 3. First generate a temporary image with a resolution of nr × nc, then use the magick package to shrink the image to pr × pc, and finally read the reduced image as a raster object and fill it into the heatmap. The magick package provides numerous methods for resizing images, and ComplexHeatmap supports them. In "Section 2.8 Heatmap as raster image" of the ComplexHeatmap book, readers can find a detailed visual comparison of the different image reduction methods.
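To make the second method concrete, the sketch below first reduces a matrix by block averaging in base R, then shows how the three methods might be selected through Heatmap() arguments. The helper `reduce_matrix()` is hypothetical and written only for this illustration; it is not how ComplexHeatmap implements the reduction internally. The pairing of arguments with methods reflects our reading of the package documentation.

```r
library(ComplexHeatmap)

# Hypothetical helper: reduce an nr x nc matrix to pr x pc by summarizing
# the block of values that falls onto each target "pixel".
reduce_matrix <- function(mat, pr, pc, fun = mean) {
  row_bin <- ceiling(seq_len(nrow(mat)) * pr / nrow(mat))
  col_bin <- ceiling(seq_len(ncol(mat)) * pc / ncol(mat))
  out <- matrix(NA_real_, nrow = pr, ncol = pc)
  for (i in seq_len(pr)) {
    for (j in seq_len(pc)) {
      out[i, j] <- fun(mat[row_bin == i, col_bin == j])
    }
  }
  out
}

small <- reduce_matrix(matrix(1:16, nrow = 4), pr = 2, pc = 2)

set.seed(7)
big <- matrix(rnorm(5000 * 20), nrow = 5000)

# Method 1: rasterize through a temporary png at device resolution
ht_png <- Heatmap(big, name = "png", cluster_rows = FALSE,
                  use_raster = TRUE, raster_by_magick = FALSE)

# Method 2: reduce the matrix first (TRUE summarizes each block by its mean)
ht_mat <- Heatmap(big, name = "resize", cluster_rows = FALSE,
                  use_raster = TRUE, raster_resize_mat = TRUE)

# Method 3: shrink a full-resolution image with magick, if it is installed
ht_magick <- Heatmap(big, name = "magick", cluster_rows = FALSE,
                     use_raster = TRUE,
                     raster_by_magick = requireNamespace("magick", quietly = TRUE))
```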

⬤ Heatmap customization

By default, the heatmap body is composed of cells with different colors. ComplexHeatmap allows users to customize the heatmap by adding new layers. The cell_fun and layer_fun arguments of Heatmap() can be used to add custom graphics to the heatmap cells while the heatmap is being drawn (Figure 2A). The two arguments do essentially the same thing, but layer_fun can be regarded as a vectorized version of cell_fun. If the heatmap is very large, using layer_fun makes drawing faster. ComplexHeatmap also provides the decorate_*() family of functions, such as decorate_annotation(), which can add graphics to any heatmap component after the heatmap has been drawn. In ComplexHeatmap, every heatmap component has its own plotting region, and these regions are still recorded after the heatmap is drawn. The decorate_*() functions can return to a specific plotting region and add custom graphics there.

As later sections describe, 3D heatmaps, oncoPrint, and UpSet plots are all implemented with layer_fun. Density heatmaps and enriched heatmaps use decorate_heatmap_body() to enhance parts of their visualization.
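A short sketch of the three customization hooks, assuming a small random matrix `m` and invented heatmap names. Both callbacks print the cell value as text; `cell_fun` is called once per cell, while `layer_fun` receives vectors for a whole slice and uses `pindex()` to pick the matching matrix entries.

```r
library(ComplexHeatmap)
library(grid)

set.seed(3)
m <- matrix(round(rnorm(40), 1), nrow = 5)

# cell_fun: called once per cell, i and j are single indices
ht_cell <- Heatmap(m, name = "cell",
  cell_fun = function(j, i, x, y, width, height, fill) {
    grid.text(sprintf("%.1f", m[i, j]), x, y, gp = gpar(fontsize = 8))
  })

# layer_fun: called once per slice, i and j are index vectors
ht_layer <- Heatmap(m, name = "layer",
  layer_fun = function(j, i, x, y, width, height, fill) {
    grid.text(sprintf("%.1f", pindex(m, i, j)), x, y, gp = gpar(fontsize = 8))
  })

# decorate_*(): go back to a component after drawing and add graphics
draw(ht_layer)
decorate_heatmap_body("layer", {
  grid.rect(gp = gpar(col = "black", lwd = 2, fill = NA))
})
```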

⬤ Flexibility to set colors and legends

In a heatmap, color is the main element that is mapped to the data. ComplexHeatmap allows an exact mapping between values in the matrix and colors by setting a color mapping function with breakpoints and their corresponding colors. For example, the user can define a color mapping function that is symmetric around zero, which helps to identify up- and down-regulated genes, or define the same color mapping function for different heatmaps so that colors are comparable between them. ComplexHeatmap also allows flexible configuration of heatmap legends, such as legends with multiple color schemes and legends with custom graphics. Readers are referred to "Chapter 5. Legends" in the ComplexHeatmap book for more information.

Heatmap annotations
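A color mapping function of this kind is typically built with `colorRamp2()` from the circlize package. The sketch below assumes breakpoints at -2, 0, and 2 and two random matrices on different scales; those values are illustrative choices, not recommendations from the article.

```r
library(ComplexHeatmap)
library(circlize)

# Symmetric around zero: blue for negative, white at 0, red for positive.
# Values outside [-2, 2] are mapped to the colors of the end breakpoints.
col_fun <- colorRamp2(c(-2, 0, 2), c("blue", "white", "red"))

set.seed(6)
m1 <- matrix(rnorm(50), nrow = 10)
m2 <- matrix(rnorm(50, sd = 3), nrow = 10)

# Sharing one mapping makes colors comparable between the two heatmaps
ht1 <- Heatmap(m1, name = "set1", col = col_fun)
ht2 <- Heatmap(m2, name = "set2", col = col_fun)
```

Because the mapping is defined on fixed breakpoints rather than on the range of each matrix, an outlier in one dataset does not shift the colors of every other cell.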

Heatmap annotations are an important part of a heatmap. They not only show additional information related to heatmap rows and columns but also allow more types of graphics to be visualized. ComplexHeatmap provides flexible support for both built-in annotation graphics and new custom annotations. Figure 3A shows some of the annotation graphics that ComplexHeatmap supports by default (from left to right):

1. Heatmap-like annotations. In ComplexHeatmap they are called "simple annotations". They can visualize numeric or character vectors or matrices.

2. Image annotation. It supports images in many formats, such as png, svg, pdf, or jpg.

3. Points annotation. It supports a single numeric vector or a numeric matrix.

4. Lines annotation. It supports a single numeric vector or a numeric matrix.

5. Smoothed lines annotation. One or more sets of points are smoothed with the loess method.

6. Barplot annotation. Stacked barplots are also supported.

7. Percent annotation. It contains both text and bars.

8. Boxplot annotation.

9. Text annotation. It supports building custom styled text with the gridtext package.

10. Histogram annotation.

11. Violin plot annotation. It is used to visualize a set of distributions. Distributions can also be visualized as density plots or heatmaps.

12. Joy plot annotation, in which the peaks of a distribution can extend into the neighboring plotting region.

13. Horizon chart annotation [27].

Figure 3. Various heatmap annotations

(A) Some of the heatmap annotation graphics supported by ComplexHeatmap. (B) Mark annotation. (C) Link annotation. (D) Text box annotation. The data in panels A to D are randomly generated.

All built-in annotation graphics are implemented by functions named with the anno_ prefix, for example anno_points() for the points annotation. Besides the annotations listed above, ComplexHeatmap also supports more complex ones. For example, anno_mark() supports the so-called "mark annotation", which adds text labels for a subset of rows or columns (Figure 3B). anno_link() supports the so-called "link annotation", which connects a set of independent plotting regions to rows or columns of the heatmap. Link annotations provide a general solution for associating additional custom graphics with heatmap rows or columns. In Figure 3C, we created three ggplot2 plots to show the distribution of values in the three column groups, but only for the selected rows in each group. In Figure 3D, a list of words is associated with each row group, where the font size corresponds to the importance of the word. This text box annotation is implemented by the function anno_textbox() and has been used in the simplifyEnrichment package [28] to summarize the general biological functions of a set of genes.

The constructor function HeatmapAnnotation() accepts multiple annotations as name-value pairs. Simple annotations can be set directly as vectors, matrices, or data frames. Other, complex annotations should be specified with the anno_*() functions. The following example shows a heatmap annotation with four different annotation graphics.

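library(ComplexHeatmap)

# Column annotation with four tracks, for a heatmap with 10 columns
ha = HeatmapAnnotation(
    foo = runif(10),                                  # simple annotation (numeric)
    bar = sample(letters[1:4], 10, replace = TRUE),   # simple annotation (character)
    pt  = anno_points(runif(10)),                     # points annotation
    txt = anno_text(month.name[1:10])                 # text annotation
)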

Row annotations for a heatmap take the extra argument which = "row" or are set with the helper function rowAnnotation(). ComplexHeatmap provides plenty of annotation graphics, and it also provides an interface for creating custom ones. Readers can refer to "Section 3.20 Implement new annotation functions" in the ComplexHeatmap documentation for more details.

A list of heatmaps

An important feature of ComplexHeatmap is that it supports concatenating multiple heatmaps and heatmap annotations in order to visualize associations between multiple pieces of information. ComplexHeatmap provides a simple syntax for concatenating heatmaps with the + operator. The expression returns a HeatmapList object, and evaluating a HeatmapList object directly draws the heatmaps. An example usage is as follows:
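The two equivalent ways of building a row annotation might look like the sketch below. The data are random, and the annotation names (`score`, `bars`, `mark`) and the marked rows and labels are made up for the example.

```r
library(ComplexHeatmap)

set.seed(11)
m <- matrix(rnorm(50), nrow = 10)

# Way 1: the general constructor with which = "row"
ra1 <- HeatmapAnnotation(
    score = runif(10),
    bars  = anno_barplot(runif(10)),
    which = "row"
)

# Way 2: the helper function, here with a mark annotation for two rows
ra2 <- rowAnnotation(
    score = runif(10),
    mark  = anno_mark(at = c(2, 7), labels = c("geneA", "geneB"))
)

# Attach a row annotation as a component of a heatmap
ht <- Heatmap(m, name = "mat", left_annotation = ra1, right_annotation = ra2)
```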

Heatmap(...) +
    Heatmap(...) +
    rowAnnotation(...)

We previously introduced heatmap annotations as components of a single heatmap. As the code above shows, row annotations can also be concatenated to a heatmap list on their own. A list of heatmaps can also be concatenated vertically with the %v% operator. Any number of heatmaps and annotations can be concatenated. The ordering and splitting of all heatmaps are adjusted to match the main heatmap. The main heatmap is by default the first numeric heatmap, or it can be specified by the user.

Heatmap(...) %v%
    Heatmap(...) %v%
    HeatmapAnnotation(...)
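A runnable version of the two schematic expressions, under the assumption of two random matrices with matching dimensions. The names `expr`, `meth`, `type`, and `batch` are placeholders, not data from the article.

```r
library(ComplexHeatmap)

set.seed(5)
expr  <- matrix(rnorm(60), nrow = 12)
meth  <- matrix(runif(60), nrow = 12)
type  <- sample(c("x", "y"), 12, replace = TRUE)   # one value per row
batch <- sample(c("b1", "b2"), 5, replace = TRUE)  # one value per column

# Horizontal list: all components share the same rows
ht_h <- Heatmap(expr, name = "expr", row_km = 2) +
    Heatmap(meth, name = "meth") +
    rowAnnotation(type = type)

# Row order and row splitting of every component follow the main heatmap
draw(ht_h, main_heatmap = "expr")

# Vertical list: all components share the same columns
ht_v <- Heatmap(expr, name = "expr") %v%
    Heatmap(meth, name = "meth") %v%
    HeatmapAnnotation(batch = batch)
```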

⬤ Visualizing the association between DNA methylation and gene expression

Figure 4A demonstrates a complex heatmap visualization on a randomly generated dataset that is based on patterns found in an unpublished study. It visualizes the complex associations among DNA methylation, gene expression, enhancers, and gene-related information. In the heatmaps, each row corresponds to a differentially methylated region (DMR, a genomic region with a significant methylation difference between tumor and control samples) or to other genomic properties associated with that DMR. From left to right, Figure 4A contains the following heatmaps and annotations:

1. A heatmap of methylation levels in the DMRs.

2. A single-column heatmap showing the direction of differential methylation. "Hyper" indicates higher methylation in tumor samples, and "hypo" indicates lower methylation in tumor samples.

3. A heatmap of gene expression. The genes are those nearest to the DMRs.

4. A single-column heatmap of p-values from Pearson correlation tests between methylation in the DMRs and expression of the corresponding genes.

5. A single-column heatmap of gene type, for example protein-coding gene or lincRNA.

6. A single-column heatmap of the genomic location of the DMRs, for example promoter or intergenic region.

7. The distance from the DMR to the TSS of the associated gene, shown as a points annotation.

8. A heatmap of the overlap between enhancers and DMRs. The value measures the fraction of each enhancer that is covered by the DMR.

In Figure 4A, the heatmap list is split by the direction of differential methylation combined with k-means clustering. The k-means grouping serves to distinguish the highly methylated groups from the lowly methylated ones. The complex heatmap shows that highly methylated DMRs are enriched in intergenic and intragenic regions but rarely overlap with enhancers (row groups "2,hypo" and "2,hyper"), whereas lowly methylated DMRs are enriched in promoters and enhancers (row groups "1,hypo" and "1,hyper"). This may imply that enhancers are associated with low methylation and that methylation changes at enhancers could affect the transcriptional activity of the associated genes.

Figure 4. Complex heatmap visualization

(A) Visualization of associations among DNA methylation, gene expression, and related genomic information. For simplicity, panel A includes only DMRs whose methylation changes are negatively correlated with gene expression. (B) Visualization of associations among various descriptive statistics in a multi-omics study.

⬤ Visualizing associations between multiple descriptive statistics in multi-omics studies

Multi-omics studies integrate data from genomics, transcriptomics, and epigenomics to find new connections at different levels of a biological system. It is therefore very important to visualize the potential connections between these data types correctly and efficiently. Figure 4B shows a typical global (landscape) visualization of statistics, in which different statistics based on a single data type or on multiple data types are displayed as a set of heatmaps and annotation graphics. Figure 4B is based on a study of glioblastoma data [29], which used DNA methylation, gene expression, and histone modification data to study epigenomic differences among four subtypes (indexed 1 to 4 in Figure 4B). The study produced four groups of DMRs, each obtained by comparing methylation in one subtype with normal samples. Figure 4B uses heatmaps and annotations to visualize various genomic properties of the DMRs, and the heatmaps are additionally split by the direction of methylation change. From left to right, Figure 4B contains the following graphics:

1. A heatmap of the mean methylation of DMRs in tumor samples and normal samples.

2. The number of DMRs in each category, shown as a barplot.

3. The percentage of DMRs significantly correlated with the expression of the nearest gene, shown as a stacked barplot.

4. The distance from the DMR to the transcription start site (TSS) of the nearest gene, shown as a stacked barplot.

5. The percentage of DMRs overlapping with genes or intergenic regions, shown as a stacked barplot.

6. The percentage of DMRs overlapping with CpG islands (CGIs) or CGI shores, shown as a stacked barplot.

7. A heatmap of the enrichment of DMRs in genomic feature regions. Positive values mean over-representation. The Jaccard coefficient is used as the statistic, and its distribution under random conditions is obtained by randomly shuffling the positions of DMRs in the genome. The z-score is the final statistic shown on the heatmap, calculated as (observed value - expected value) / standard deviation.

8. The percentage of DMRs overlapping with various chromatin states, shown as a stacked barplot.

9. A heatmap of the enrichment of DMRs in chromatin states. Again, the z-score is the statistic shown on the heatmap.

The heatmap list in Figure 4B reveals the different characteristics of the group-specific DMRs. For example, hyper-DMRs show more negative correlations between methylation and gene expression, whereas hypo-DMRs are more often located in intergenic regions and in inactive chromatin states. In summary, this kind of visualization can quickly reveal potential associations in a complex study.

High-level graphics
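The z-score used for the enrichment heatmaps in items 7 and 9 can be illustrated with a toy permutation test in base R. This sketch makes strong simplifying assumptions: the "genome" is a vector of 10,000 positions, regions are sets of single positions, and shuffling means drawing random positions. The study itself shuffles genomic intervals, which this example does not attempt to reproduce.

```r
# Jaccard coefficient of two sets of positions
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

set.seed(2022)
genome  <- 1:10000
feature <- 1:2000                 # hypothetical genomic feature
dmr     <- sample(1:3000, 500)    # hypothetical DMRs, concentrated near the feature

observed <- jaccard(dmr, feature)

# Null distribution: place the same number of DMRs at random positions
null <- replicate(1000, jaccard(sample(genome, length(dmr)), feature))

# z = (observed - expected) / standard deviation
z <- (observed - mean(null)) / sd(null)
z > 0   # a positive z-score indicates over-representation
```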

The flexibility of ComplexHeatmap allows users to implement new high-level graphics on data with a matrix-like structure. ComplexHeatmap already implements several high-level graphics functions, which we cover in the following subsections. All of these functions are essentially specific forms of heatmaps and are in essence Heatmap objects, so they can be concatenated with ordinary heatmaps and annotations to form complex visualizations.

⬤ Density heatmap

To visualize the distributions of data in a matrix or a list, we usually use boxplots or violin plots. When there are a large number of distributions, however, boxplots and violin plots are no longer effective. The densityHeatmap() function maps the density values of each distribution to color, so it can visualize a large number of distributions (Figure 5A). In densityHeatmap(), the similarity between distributions can be measured by the Kolmogorov-Smirnov distance.
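A minimal density heatmap might be produced as below. The data layout (10 normal columns followed by 10 uniform columns) mirrors the description of Figure 5A, but the sample size, the uniform range, and the column names are assumptions.

```r
library(ComplexHeatmap)

set.seed(8)
m <- cbind(matrix(rnorm(1000), ncol = 10),            # 10 normal columns
           matrix(runif(1000, -2, 2), ncol = 10))     # 10 uniform columns
colnames(m) <- paste0("s", 1:20)

# One column per distribution; columns are clustered by the
# Kolmogorov-Smirnov distance between distributions
dh <- densityHeatmap(m, cluster_columns = TRUE,
                     clustering_distance_columns = "ks")
```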

⬤ 3D heatmap

Three-dimensional (3D) visualization is generally not recommended [30], but it can be very useful in certain scenarios. ComplexHeatmap supports converting an ordinary heatmap into a 3D heatmap. A 3D heatmap is drawn with the Heatmap3D() function, which accepts the same set of arguments as Heatmap(), except that the 2D cells become 3D bars (projected onto two-dimensional space). The density distribution in Figure 5A can also be visualized as a 3D barplot (Figure 5B). In a 3D heatmap, we recommend mapping both the color and the height of the bars to the data.
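As a small sketch, a matrix of non-negative counts can be drawn as a 3D heatmap like this. The count matrix and the title are invented; we assume here that the input should be non-negative because bar height is mapped to the values.

```r
library(ComplexHeatmap)

set.seed(9)
counts <- matrix(rpois(30, lambda = 5), nrow = 5,
                 dimnames = list(letters[1:5], LETTERS[1:6]))

# Same interface as Heatmap(); each cell is drawn as a bar whose
# height and color both follow the value
ht3d <- Heatmap3D(counts, name = "count", column_title = "3D heatmap of counts")
```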

⬤ oncoPrint

The oncoPrint() function visualizes genomic alteration events across a set of genes and multiple patients, such as single-nucleotide variants (SNVs), insertions or deletions (indels), or copy number variations (CNVs). oncoPrint() provides a general solution in which users can define the graphics for the various genomic alterations themselves (Figure 5C). By default, genes are ordered by their total number of alterations across patients, and patients are reordered to show the mutual exclusivity of altered genes across patients. Because oncoPrint() returns a Heatmap object, it can be concatenated directly with heatmaps of other genomic datasets, such as gene expression, to reveal more complex genomic associations.
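A toy oncoPrint with user-defined graphics could be set up as follows. The genes, patients, and alteration types are fabricated for illustration; multiple alterations in one cell are separated by ";", and each alteration type gets its own drawing function in `alter_fun`.

```r
library(ComplexHeatmap)
library(grid)

# Rows are genes, columns are patients; "" means no alteration
mat <- matrix(c("snv", "",      "snv;indel", "",
                "",    "indel", "snv",       "",
                "",    "",      "snv",       "indel"),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), paste0("p", 1:4)))

col <- c(snv = "red", indel = "blue")

# One drawing function per alteration type, plus a background
alter_fun <- list(
  background = function(x, y, w, h) {
    grid.rect(x, y, w * 0.9, h * 0.9, gp = gpar(fill = "#EEEEEE", col = NA))
  },
  snv = function(x, y, w, h) {
    grid.rect(x, y, w * 0.9, h * 0.9, gp = gpar(fill = col[["snv"]], col = NA))
  },
  indel = function(x, y, w, h) {
    grid.rect(x, y, w * 0.9, h * 0.4, gp = gpar(fill = col[["indel"]], col = NA))
  }
)

op <- oncoPrint(mat, alter_fun = alter_fun, col = col)
```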

⬤ UpSet plot

Compared with the traditional approach (the Venn diagram), the UpSet plot [31] provides a more efficient way to visualize relationships among a large number of sets. The UpSet() function in ComplexHeatmap is an enhanced implementation of the original UpSet plot tool [32]. In addition, UpSet() can handle lists of genomic regions, which helps to reveal, for example, tissue-specific patterns of chromatin modifications (Figure 5D).
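A minimal UpSet example on three small sets, assuming invented gene identifiers. The sets are first turned into a combination matrix with `make_comb_mat()`, which is then passed to `UpSet()`.

```r
library(ComplexHeatmap)

sets <- list(
  a = c("g1", "g2", "g3", "g4"),
  b = c("g3", "g4", "g5"),
  c = c("g1", "g4", "g6")
)

# Default mode is "distinct": each element is counted in exactly one combination
cm <- make_comb_mat(sets)

set_size(cm)    # size of each input set
comb_size(cm)   # size of each combination of sets

up <- UpSet(cm)
```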

⬤ Genome-level graphics

Genome-level heatmaps are frequently used in genomics research, for example to visualize global copy number variation profiles [33]. The key to making a genome-level heatmap is to divide the genome into bins and to normalize the various genomic signals to those bins to form a matrix, which can then be visualized as an ordinary heatmap. Figure 5E shows a genome-level visualization that contains two heatmaps and several graphics added as heatmap annotations.

Figure 5. High-level graphics supported by ComplexHeatmap

(A) Density heatmap. The random data in the first 10 columns come from a normal distribution, and those in the last 10 columns come from a uniform distribution. (B) 3D frequency heatmap. The data are the same as in panel A. (C) oncoPrint. Lung adenocarcinoma data from cBioPortal are used. Because of image size limitations, only some of the genes and patients are shown. (D) UpSet plot. The H3K4me3 ChIP-seq data for six human tissues come from the Roadmap project. (E) Genome-level graphics. The data are randomly generated.

Integration with other packages

⬤ Enriched heatmap

The enriched heatmap is a specialized heatmap for visualizing the enrichment of a specific type of genomic signal over a set of genomic features [34]. Examples include studying how chromatin modifications are enriched around gene TSSs, or how DNA methylation is reduced around CGIs. The EnrichedHeatmap package [6] is built on top of ComplexHeatmap and provides a general solution for displaying the spatial relationship between two types of genomic features. It also implements a special heatmap annotation function, anno_enriched(), which plots the mean enrichment over all genomic features. Compared with similar tools, EnrichedHeatmap is unique in that it can handle discrete genomic signals, such as a genome-wide chromatin segmentation by chromatin state. More importantly, an enriched heatmap is also a Heatmap object, so it supports all features of the Heatmap class, such as heatmap splitting and concatenation with other heatmaps.

Figure 6 shows a complex visualization of the distribution of chromatin states, the distribution of DNA methylation around gene TSSs, and the expression of the associated genes. The data come from the Roadmap project [35]. The heatmaps are split by row into three groups, in which the TSSs are in the active, bivalent, and inactive states, respectively. From the complex heatmap it is easy to see that active TSSs are associated with hypomethylation and that the corresponding genes are highly expressed. Bivalent TSSs, although hypomethylated, show low gene expression. By comparison, inactive TSSs are almost entirely methylated, and the expression of the corresponding genes is usually silenced.


Figure 6. A list of enriched heatmaps and normal heatmaps

From left to right: the heatmap of chromatin states, the heatmap of DNA methylation, and the heatmap of gene expression. The data come from the Roadmap project.

⬤ InteractiveComplexHeatmap

ComplexHeatmap generates only static graphics. The R package InteractiveComplexHeatmap [36] can very easily convert a static heatmap into an interactive Shiny application, in which users interact with the heatmap directly by clicking or by selecting regions. The conversion from a static heatmap to an interactive one is done with the htShiny() function: once the static heatmap has been drawn, htShiny() can be called directly without any arguments. The function works for all heatmaps generated by ComplexHeatmap. In addition, InteractiveComplexHeatmap supports flexible customization of the user interface of the interactive heatmap and of the responses to user actions on the heatmap.

Summary

Complex heatmaps are a powerful visualization method for revealing complex relationships among multiple sources of information. In this paper, we systematically demonstrate the various functionalities of the ComplexHeatmap package. We believe that ComplexHeatmap will continue to be a powerful tool in bioinformatics, and in data science more broadly, for effectively displaying the structures hidden in massive data.

Acknowledgments

This study was supported by the National Center for Tumor Diseases (NCT) Molecular Precision Oncology Program (Germany).

Conflict of interest

The author declares no conflict of interest.

Author contributions

Gu Zuguang: conceptualization and design of the study, software development, visualization, data analysis, manuscript writing, revision and review.

Code and data availability

The stable version of ComplexHeatmap is released at https://bioconductor.org/packages/ComplexHeatmap/

The development version is released at https://github.com/jokergoo/ComplexHeatmap

The documentation is published at

https://jokergoo.github.io/ComplexHeatmap-reference/book/

The code for Figures 1 to 6 in the paper is available at

https://github.com/jokergoo/ComplexHeatmap_v2_paper_code

ORCID

0000-0002-7395-8709 (Zuguang Gu)

About the author

Gu Zuguang (first and corresponding author)

● Staff scientist at the German Cancer Research Center

He has published more than 30 papers in bioinformatics journals such as Bioinformatics, Nucleic Acids Research, Briefings in Bioinformatics, and Genomics Proteomics & Bioinformatics, and in other high-impact journals such as Nature Immunology, Nature Communications, and Cancer Discovery, with more than 7,000 citations in total. He has independently developed numerous bioinformatics software packages, with more than 3 million downloads in total.

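About the journal

iMeta is an open-access journal published jointly by Wiley, the Gut Microbiota Society (肠菌分会), and hundreds of Chinese scientists in the field. Its editors-in-chief are Professor Liu Shuangjiang of the Institute of Microbiology, Chinese Academy of Sciences, and Professor Fu Jingyuan of the University of Groningen, the Netherlands. The journal aims to publish original research, methods, and reviews that advance metagenomics, microbiome research, and bioinformatics, with the goal of publishing high-impact papers in the top 10% (IF > 15). Its features include video submission, reproducible analysis, figure polishing, a youth editorial board, no publication fees for the first 3 years, and promotion on social media reaching 500,000 users. The journal was officially launched in February 2022.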

Contact us

iMeta homepage: http://www.imeta.science

Publisher: https://onlinelibrary.wiley.com/journal/2770596x
Submission: https://mc.manuscriptcentral.com/imeta
Email: [email protected]

WeChat official account

iMeta

Editor in charge

微微

Copyright notice
This article was created by [Liu Yongxin Adam]. Please include the original link when reprinting. Thank you.
https://yzsam.com/2022/216/202208040958488938.html