Paper Study Notes: LIFT for Multi-Label Learning
2022-07-01 05:51:00 【LTA_ ALBlack】
These notes share my understanding of the paper. See the original: Zhang, M.-L., & Wu, L. (2015). LIFT: Multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 107–120.
Contents
1. Understanding the abstract
2. Basic symbols
3. Details of the LIFT method
4. LIFT procedure and prediction
1. Understanding the abstract
The previous essay on LSML already touched on the idea of "Label-Specific Features" in multi-label learning. Tracing this line of work back to its source, the LIFT algorithm proposed by Min-Ling Zhang in this paper is clearly where "Label-Specific Features" entered multi-label learning.
The paper notes that in multi-label learning, each data row is represented by a single instance (feature vector) that is associated with a set of labels. Most existing multi-label algorithms learn all labels from this same shared feature representation, but the authors argue that this is suboptimal.
However, this popular strategy might be suboptimal as each label is supposed to possess specific characteristics of its own
Since each label is supposed to possess characteristics of its own, a single shared representation inevitably ignores the possibility of exploiting these label-specific features.
The paper therefore proposes a multi-label algorithm that uses different features for different labels: it first clusters the positive and negative instances of each label, then trains and tests on features derived from the clustering results.
That said, LIFT arguably over-emphasizes label-specific characteristics while ignoring the possible correlations between labels; it is a first-order multi-label method, which is its main weakness. Many later algorithms have supplemented it by introducing label correlations on top of label-specific features.
First-order strategy: multi-label classification is handled label-by-label, ignoring the coexistence of the other labels, e.g., by decomposing multi-label learning into a set of independent binary classification problems. The main value of the first-order strategy lies in its conceptual simplicity and efficiency; on the other hand, its results may not be optimal because label correlations are discarded.
2. Basic symbols
| Symbol | Dimension | Meaning |
| --- | --- | --- |
| \(n\) |  | Number of instances |
| \(d\) |  | Dimension of the original feature vectors |
| \(q\) |  | Number of labels |
| \(X\) | \(n \times d\) | Data matrix |
| \(Y\) | \(n \times q\) | Label matrix |
| \(\phi_{k}(\boldsymbol{x})\) | \(1 \times 2m_k\) | New feature representation of an instance, constructed from the positive and negative cluster centers of label \(l_k\), for \(k \in [1,q]\) |
A few more symbols describe data sets: \(\mathcal{B}_{k}\) denotes the binary data set of size \(n \times 2m_k\) constructed for label \(l_k\), \(\mathcal{D}\) denotes the original data set, \(\mathfrak{L}\) is a binary learner, and \(\boldsymbol{u}\) denotes an unseen test instance.
3. Details of the LIFT method
First pick a label \(l_k\) and look at its \(n \times 1\) label column, which splits the instances into positive and negative samples. The positive samples form the set \(\mathcal{P}_{k}\) and the negative samples form the set \(\mathcal{N}_{k}\):
\[\begin{aligned}
\mathcal{P}_{k} &=\left\{\boldsymbol{x}_{i} \mid\left(\boldsymbol{x}_{i}, Y_{i}\right) \in \mathcal{D}, l_{k} \in Y_{i}\right\} \\
\mathcal{N}_{k} &=\left\{\boldsymbol{x}_{i} \mid\left(\boldsymbol{x}_{i}, Y_{i}\right) \in \mathcal{D}, l_{k} \notin Y_{i}\right\}
\end{aligned} \tag{1}\]
These two sets are usually highly imbalanced (\(|\mathcal{P}_{k}| \neq |\mathcal{N}_{k}|\)), which follows from the imbalance of the multi-label matrix itself.
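To make Eq. (1) concrete, here is a minimal Python sketch of the positive/negative split, assuming `X` is an \(n \times d\) NumPy array and `Y` is an \(n \times q\) 0/1 label matrix; the function name `split_by_label` is mine, not from the paper's code.

```python
import numpy as np

def split_by_label(X, Y, k):
    """Eq. (1): split instances into P_k (label l_k present) and N_k (absent)."""
    pos_mask = Y[:, k] == 1      # rows whose label set contains l_k
    P_k = X[pos_mask]            # positive instances for label l_k
    N_k = X[~pos_mask]           # negative instances for label l_k
    return P_k, N_k
```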
The two sets are then clustered separately; this is how LIFT mines further information from \(\mathcal{P}_{k}\) and \(\mathcal{N}_{k}\).
To gain insights on the properties of \(P_k\) and \(N_k\), LIFT chooses to employ clustering techniques which have been widely used as stand-alone tools for data analysis
The original authors use k-means. Using k-means naturally requires choosing a suitable number of clusters \(k\), which is always a troublesome choice; the authors adopt a simple heuristic:
\[m_{k}^{+}=m_{k}^{-}=\left\lceil r \cdot \min \left\{\left|\mathcal{P}_{k}\right|,\left|\mathcal{N}_{k}\right|\right\}\right\rceil \tag{2}\]
Here \(m_{k}^{+}\) is the number of k-means clusters for \(\mathcal{P}_k\), \(m_{k}^{-}\) is the number for \(\mathcal{N}_k\), and \(r\) is a ratio parameter that controls how many clusters are retained. Since the number of cluster centers is the same for positive and negative samples, the class-imbalance problem is handled gracefully.
Multi-label learning tasks usually encounter the issue of class-imbalance [59], where the number of positive instances for each class label is much smaller than the number of negative ones, i.e. \(|P_{k}| ≪ |N_{k}|\). To mitigate potential risks brought by the class-imbalance problem, LIFT sets equivalent number of clusters for \(P_{k}\) and \(N_{k}\), i.e. \(m_{k}^{+}=m_{k}^{-}=m_{k}\). In this way, clustering information gained from positive instances as well as negative instances are treated with equal importance.
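As a sketch of Eq. (2) plus the clustering step, the snippet below uses scikit-learn's `KMeans` as a stand-in for the paper's k-means; `cluster_centers` is an illustrative name of my own, and `r` is the ratio parameter from Eq. (2) (a small value such as 0.1 is typical).

```python
from math import ceil

from sklearn.cluster import KMeans

def cluster_centers(P_k, N_k, r=0.1):
    """Eq. (2): cluster P_k and N_k with the same number of centers m_k."""
    m_k = max(1, ceil(r * min(len(P_k), len(N_k))))   # guard against m_k = 0
    pos_centers = KMeans(n_clusters=m_k, n_init=10).fit(P_k).cluster_centers_
    neg_centers = KMeans(n_clusters=m_k, n_init=10).fit(N_k).cluster_centers_
    return pos_centers, neg_centers                   # each has shape (m_k, d)
```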
Clustering yields two sets of cluster centers of equal size, \(\{\boldsymbol{p}^{k}_{1},\boldsymbol{p}^{k}_{2},\dots,\boldsymbol{p}^{k}_{m^{+}_k}\}\) and \(\{\boldsymbol{n}^{k}_{1},\boldsymbol{n}^{k}_{2},\dots,\boldsymbol{n}^{k}_{m^{-}_k}\}\), where each center is a \(d\)-dimensional vector. For any instance \(\boldsymbol{x}\) from the original data set, we can compute its distances to these \(2m_k\) cluster centers, and the distances form a new \(1 \times 2m_k\) representation \(\phi_{k}(\boldsymbol{x})\) with respect to label \(l_k\):
\[\phi_{k}(\boldsymbol{x})=\left[d\left(\boldsymbol{x}, \boldsymbol{p}_{1}^{k}\right), \cdots, d\left(\boldsymbol{x}, \boldsymbol{p}_{m_{k}}^{k}\right), d\left(\boldsymbol{x}, \boldsymbol{n}_{1}^{k}\right), \cdots, d\left(\boldsymbol{x}, \boldsymbol{n}_{m_{k}}^{k}\right)\right] \tag{3}\]
This constructed feature set is the "label-specific features" for label \(l_k\). The original paper describes the cluster centers as structures "which can be served as appropriate building blocks (prototypes) for the construction of label-specific features."
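A sketch of the mapping in Eq. (3): every instance is re-represented by its distances to the \(2m_k\) cluster centers. I use Euclidean distance for \(d(\cdot,\cdot)\) here, and the name `lift_features` is illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def lift_features(X, pos_centers, neg_centers):
    """Eq. (3): phi_k(x) = distances of x to the positive and negative centers."""
    d_pos = euclidean_distances(X, pos_centers)        # shape (n, m_k)
    d_neg = euclidean_distances(X, neg_centers)        # shape (n, m_k)
    return np.hstack([d_pos, d_neg])                   # shape (n, 2 * m_k)
```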
Note: searching for this sentence online turns up a few hints. My current reading of "appropriate building blocks" is that the original features may be mapped block by block, with some special handling when the last "block" does not contain enough data to fill a full block. (This detail probably needs to be confirmed against the code later.)
After all this work, the original \(n \times d\) matrix \(\mathbf{X}\) has turned into an \(n \times 2m_k\) matrix. Since only label \(l_k\) was selected, the new label matrix is only \(n \times 1\). Seen this way, we have constructed a binary classification problem: more precisely, a temporary \(n \times 2m_k\) data set in which every original instance (data row) \(\boldsymbol{x}_{i}\) is classified with respect to label \(l_k\).
The paper appears to use an SVM as the binary learner at this point.
In this new binary problem, the \(2m_{k}\) dimensions arguably carry more information than the original \(d\) dimensions. The original dimensions only record the raw attribute values of a chosen instance \(\boldsymbol{x}\), whereas each of the \(2m_{k}\) columns records the distance of the instance, taken as a whole, to one cluster center, and such distance measures are a classic kind of feature. In other words, a data row that originally only described itself is expanded into features describing its relationship (distance) to all the feature prototypes (the set of cluster centers). Unfortunately, the prototypes used here belong only to the same label \(l_k\), so label correlations cannot be reflected at all; at the same time, LIFT seems to me to over-emphasize the relationship between a single label and its group of cluster centers while discarding the basic characteristics of the instance itself (the original \(d\)-dimensional information).
4. LIFT procedure and prediction

The inputs of the procedure are easy to understand: the training set, the ratio \(r\) used by the heuristic that determines the number of clusters, the binary learner, and the test data. The goal is to predict the labels of the test data.
The two for-loops in the procedure correspond to the two steps of LIFT:
- label-specific features construction
- classifier induction
The first step was just described. In short, loop over all labels \(k = 1, \dots, q\); for each label \(l_k\), take out the positive and negative samples to form the two sets \(\mathcal{P}_{k}\) and \(\mathcal{N}_{k}\), then cluster each set with k-means using the \(m_k\) obtained from Eq. (2) as the number of clusters. Then, for any \(\boldsymbol{x}_i \in \mathcal{D}\), compute its distances to the \(2m_k\) positive and negative cluster centers, which gives \(\phi_{k}(\boldsymbol{x}_i) = \left[d\left(\boldsymbol{x}_i, \boldsymbol{p}_{1}^{k}\right), \dots, d\left(\boldsymbol{x}_i, \boldsymbol{p}_{m_k}^{k}\right), d\left(\boldsymbol{x}_i, \boldsymbol{n}_{1}^{k}\right), \dots, d\left(\boldsymbol{x}_i, \boldsymbol{n}_{m_k}^{k}\right)\right]\), exactly as in Eq. (3).
The second step takes all the \(\phi_{k}\) information obtained in the first step, loops over all labels \(k = 1, \dots, q\) again, and for each label uses the \(n\) mapped instances \(\phi_{k}(\boldsymbol{x}_i)\) to build a binary data set \(\mathcal{B}_{k}\). Macroscopically, one can also say that \(\mathcal{B}_{k}\) is constructed from \(\mathcal{D}\) based on \(l_k\). The binary data set satisfies:
\[\begin{array}{r}
\mathcal{B}_{k}=\left\{\left(\phi_{k}\left(\boldsymbol{x}_{i}\right), Y_{i}(k)\right) \mid\left(\boldsymbol{x}_{i}, Y_{i}\right) \in \mathcal{D}\right\} \text { where } \\
Y_{i}(k)=+1 \text { if } l_{k} \in Y_{i} ; \text { otherwise } Y_{i}(k)=-1
\end{array} \tag{4}\]
The binary learner \(\mathfrak{L}\) is then trained on each data set \(\mathcal{B}_{k}\), i.e. \(g_{k} \leftarrow \mathfrak{L}\left(\mathcal{B}_{k}\right)\). This \(g_k\) is a fully trained classification model for label \(l_k\) and can be used for prediction. (The data set \(\mathcal{B}_{k}\) for any label \(l_k\) is still built from the same \(n\) training samples.)
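Putting Eq. (4) and the classifier induction together for one label, here is a sketch that reuses the helper functions above and trains a linear SVM (scikit-learn's `SVC`) as a stand-in for the binary learner \(\mathfrak{L}\); the paper's exact learner and settings may differ, and `train_label_model` is my own name.

```python
import numpy as np
from sklearn.svm import SVC

def train_label_model(X, Y, k, r=0.1):
    """Eq. (4): build B_k from phi_k(x_i) and Y_i(k), then induce g_k."""
    P_k, N_k = split_by_label(X, Y, k)               # Eq. (1)
    pos_c, neg_c = cluster_centers(P_k, N_k, r)      # Eq. (2) + k-means
    B_k_X = lift_features(X, pos_c, neg_c)           # phi_k(x_i) for all i
    B_k_y = np.where(Y[:, k] == 1, 1, -1)            # Y_i(k) = +1 / -1
    g_k = SVC(kernel="linear").fit(B_k_X, B_k_y)     # g_k <- L(B_k)
    return g_k, pos_c, neg_c                         # centers are needed at test time
```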
For any unseen instance \(\boldsymbol{u} \in \mathcal{X}\), all \(q\) classifier models are invoked to make predictions. Note that before a test instance enters the model \(g_k\), it must go through the same mapping process: only after \(\phi_{k}(\boldsymbol{u})\) has been computed (from the distances to the cluster centers learned on the training set) can it be fed into the model. The final predicted label set is therefore:
\[Y=\left\{l_{k} \mid g_{k}\left(\phi_{k}(\boldsymbol{u})\right)>0, \; 1 \leq k \leq q\right\} \tag{5}\]
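A matching prediction sketch for Eq. (5): an unseen instance \(\boldsymbol{u}\) is first mapped through each \(\phi_k\) (using the centers learned on the training set) before the corresponding \(g_k\) votes; `predict_labels` is an illustrative name, and `models` is assumed to hold the tuples returned by the training sketch above.

```python
def predict_labels(u, models):
    """Eq. (5): models is a list of (g_k, pos_centers, neg_centers), one per label."""
    u = u.reshape(1, -1)                              # treat u as a 1-row matrix
    Y_pred = []
    for k, (g_k, pos_c, neg_c) in enumerate(models):
        phi_u = lift_features(u, pos_c, neg_c)        # phi_k(u)
        if g_k.decision_function(phi_u)[0] > 0:       # g_k(phi_k(u)) > 0
            Y_pred.append(k)                          # label l_k is predicted relevant
    return Y_pred
```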
(The details of running the code, the paper's Experiments section, and the metrics it uses still need to be filled in by further reading.)
Simple summary
LIFT is noticeably easier to read than LSML from the previous essay. It deserves its status as the opening work on label-specific features: the characteristics and theory of label-specific features are laid out very clearly, and it is the kind of paper whose overall direction you grasp after one reading. The implementation also does not look complicated; it is the kind of algorithm whose code structure you can see at a glance.
LIFT mines the specific characteristics of each label, uses k-means, and transforms the multi-label problem into multiple binary classification problems, which was bold and groundbreaking. Although it falls short by only achieving first-order multi-label learning, this embedding idea should be the most valuable part of the paper and a reference point for later work.
When I first read the paper I did worry about how well k-means would perform here, but the heuristic based on \(r\) seems to work well enough. I hope to compare it against more clustering algorithms once I have studied them.