Time series analysis: has the era of representation learning arrived?
2022-07-29 05:07:00 【fareise】
WeChat official account "Round Algorithm Notes": continuously updated notes and interpretations of cutting-edge industry work on NLP, CV, and search/recommendation.
Reply "communication" in the background to join the "Round Algorithm Notes" discussion group; reply "time series", "multimodal", "transfer learning", "NLP", "graph learning", "representation learning", "meta learning", etc. to get algorithm notes in the corresponding field.
Representation learning is at the heart of deep learning, and recently it has been applied more and more in the time series field: the era of representation learning for time series analysis has arrived. This article presents 5 core works on time series representation learning published at top conferences in recent years.
1. Unsupervised Scalable Representation Learning for Multivariate Time Series (NeurIPS'19)
The representation learning method in this paper is inspired by the classic word-embedding model CBOW. CBOW assumes that the representation of a word's context should be close to the representation of the word itself, and far from randomly sampled other words. This paper applies the same idea to time series representation learning. First, a CBOW-style context and random negative samples are constructed, as shown in the figure below. A time series x_ref is selected, along with a subsequence x_pos of it; x_ref can be viewed as the context of x_pos. Meanwhile, multiple negative samples x_neg are randomly drawn from other time series, or from other segments of the current series. This yields a CBOW-like loss that pulls x_ref and x_pos together while pushing x_ref away from the negative samples x_neg.

[Figure: constructing x_ref, its subsequence x_pos, and randomly sampled negatives x_neg]
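To make the objective concrete, below is a minimal PyTorch sketch of this triplet-style loss, assuming an encoder f that maps a (batch, channels, length) series to a fixed-size vector; the function and tensor names are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def triplet_loss(f, x_ref, x_pos, x_negs):
    """CBOW-style loss: pull f(x_ref) toward f(x_pos),
    push it away from each negative sample in x_negs.

    x_ref, x_pos: (batch, channels, length) tensors
    x_negs:       (batch, K, channels, length), K negatives per anchor
    """
    z_ref = f(x_ref)                      # (batch, dim)
    z_pos = f(x_pos)                      # (batch, dim)
    # Positive term: maximize sigma(<z_ref, z_pos>)
    pos = -F.logsigmoid((z_ref * z_pos).sum(dim=-1))
    # Negative terms: maximize sigma(-<z_ref, z_neg_k>) for each k
    neg = 0.0
    for k in range(x_negs.size(1)):
        z_neg = f(x_negs[:, k])           # (batch, dim)
        neg = neg - F.logsigmoid(-(z_ref * z_neg).sum(dim=-1))
    return (pos + neg).mean()
```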
For the model structure, the paper uses stacked dilated causal convolutions. This architecture was covered in detail in a previous article; interested readers can refer to: 12 Top Papers: A Summary of Classic Deep Learning Approaches to Time Series Forecasting.

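The encoder can be sketched as a stack of dilated causal 1-D convolutions whose dilation doubles at each layer; this is a simplified illustration (the paper's network additionally uses residual connections and other details omitted here):

```python
import torch.nn as nn

class CausalDilatedEncoder(nn.Module):
    """Stack of dilated causal Conv1d layers; dilation doubles per layer
    so the receptive field grows exponentially with depth."""
    def __init__(self, in_channels, hidden, out_dim, depth=4):
        super().__init__()
        layers = []
        ch = in_channels
        for i in range(depth):
            d = 2 ** i
            # Left-pad so the convolution never looks into the future.
            layers += [nn.ConstantPad1d((2 * d, 0), 0.0),
                       nn.Conv1d(ch, hidden, kernel_size=3, dilation=d),
                       nn.ReLU()]
            ch = hidden
        self.net = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x):                  # x: (batch, channels, length)
        h = self.net(x)                    # (batch, hidden, length)
        h = h.max(dim=-1).values           # global max pooling over time
        return self.head(h)                # (batch, out_dim)
```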
2. Unsupervised representation learning for time series with temporal neighborhood coding (ICLR'21)
The method proposed in this paper differs from the previous one in how positive and negative samples are selected and how the loss function is designed. First, sample selection: for a window centered at time t, the paper uses a Gaussian distribution to define the sampling range of positive samples. The Gaussian is centered at t, and its other parameter is set by the size of the time window. To choose the window size, the paper uses the ADF (Augmented Dickey-Fuller) test to select the optimal span: if the window is too long, sampled positives may be unrelated to the anchor; if it is too small, positives will overlap too much with the anchor. The ADF test detects the time window over which the series stays stationary, which gives the most appropriate sampling range.

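Below is a sketch of the window selection and positive sampling, using the adfuller test from statsmodels; the doubling search over window sizes and the significance threshold are illustrative choices, not necessarily the paper's exact procedure:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def stationary_window(series, t, min_w=8, max_w=256, alpha=0.05):
    """Grow the window around t while the ADF test still rejects
    non-stationarity (p-value below alpha); return the last good size."""
    w = min_w
    while w * 2 <= max_w:
        seg = series[max(0, t - w): t + w]
        if adfuller(seg)[1] >= alpha:      # segment no longer stationary
            break
        w *= 2
    return w

def sample_positive_center(t, window, length, rng=np.random):
    """Draw the center of a positive sample from a Gaussian around t,
    with std tied to the stationary window size."""
    c = int(rng.normal(loc=t, scale=window / 2))
    return int(np.clip(c, 0, length - 1))
```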
For the loss function, this paper mainly addresses the problem of false negatives. If all samples outside the window selected above are treated as negatives, false negatives are likely: samples that are actually related to the anchor but lie far from it get mislabeled as negative. For example, with yearly-seasonal data and a one-month window, the same period of last year could be treated as a negative sample. This hurts training and makes convergence difficult. To solve this, the paper treats out-of-window samples not as negatives but as unlabeled samples. In the loss, each such sample gets a weight representing the probability that it is positive. This approach is known as Positive-Unlabeled (PU) learning. With D a discriminator judging whether two representations are neighbors, N_t the neighborhood of t, and \bar{N}_t its complement, the final loss can be written as:

$$\mathcal{L} = -\mathbb{E}\Big[\mathbb{E}_{z_l \sim N_t}\big[\log D(z_t, z_l)\big] + \mathbb{E}_{z_k \sim \bar{N}_t}\big[w_t \log D(z_t, z_k) + (1 - w_t)\log\big(1 - D(z_t, z_k)\big)\big]\Big]$$
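A minimal PyTorch sketch of this PU-style objective, with a bilinear discriminator D that judges whether two representations are neighbors; treating the weight w as a fixed hyperparameter is a simplification (the paper estimates the probability that a sample is positive):

```python
import torch
import torch.nn as nn

class TNCLoss(nn.Module):
    """Neighbors get log D(z_t, z_l); out-of-window (unlabeled) samples get
    a w-weighted mix of the positive and negative log-likelihoods."""
    def __init__(self, dim, w=0.2):
        super().__init__()
        self.D = nn.Bilinear(dim, dim, 1)  # discriminator: are z1, z2 neighbors?
        self.w = w

    def forward(self, z_t, z_nbr, z_unl):
        eps = 1e-7
        p_nbr = torch.sigmoid(self.D(z_t, z_nbr)).clamp(eps, 1 - eps)
        p_unl = torch.sigmoid(self.D(z_t, z_unl)).clamp(eps, 1 - eps)
        loss_nbr = -torch.log(p_nbr)
        loss_unl = -(self.w * torch.log(p_unl)
                     + (1 - self.w) * torch.log(1 - p_unl))
        return (loss_nbr + loss_unl).mean()
```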
3. A transformer-based framework for multivariate time series representation learning (KDD'21)
This paper borrows ideas from Transformer-based pre-trained language models, hoping to learn good multivariate time series representations in an unsupervised way using a Transformer architecture. The focus of the paper is the unsupervised pre-training task designed for multivariate time series. As shown on the right side of the figure below, for an input multivariate series, a certain proportion of subsequences is masked (they cannot be too short), and each variable is masked independently rather than masking all variables at the same positions. The pre-training objective is to reconstruct the entire multivariate series. When filling in a masked segment, the model can thus use both the preceding and following parts of the same variable, as well as the unmasked variables over the same time span.

[Figure: the pre-training framework; the right side shows per-variable masking of subsequences]
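A minimal sketch of the per-variable masking and reconstruction objective: contiguous segments are masked independently for each variable, and the loss is computed only over masked positions. The mask ratio and segment length are illustrative hyperparameters:

```python
import torch

def mask_per_variable(x, ratio=0.15, seg_len=5):
    """x: (batch, length, n_vars). Mask contiguous segments independently
    per variable; returns the masked input and the boolean mask."""
    b, t, v = x.shape
    mask = torch.zeros(b, t, v, dtype=torch.bool, device=x.device)
    n_segs = max(1, int(ratio * t / seg_len))
    for i in range(b):
        for j in range(v):                         # each variable separately
            for _ in range(n_segs):
                s = torch.randint(0, t - seg_len + 1, (1,)).item()
                mask[i, s:s + seg_len, j] = True
    return x.masked_fill(mask, 0.0), mask

def reconstruction_loss(model, x):
    """Pre-training objective: MSE on masked positions only."""
    x_masked, mask = mask_per_variable(x)
    x_hat = model(x_masked)                        # (batch, length, n_vars)
    return ((x_hat - x) ** 2)[mask].mean()
```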
The figure below shows the effect of the unsupervised pre-trained model on the forecasting task. The left panel compares RMSE with and without unsupervised pre-training under different amounts of labeled data: no matter how much labeled data is available, adding unsupervised pre-training improves forecasting. The right panel shows that the more unlabeled pre-training data is used, the better the final forecasting fit.

[Figure: left, RMSE with vs. without unsupervised pre-training across labeled-data sizes; right, forecasting quality vs. amount of pre-training data]
4. Time-series representation learning via temporal and contextual contrasting (IJCAI'21)
This paper uses contrastive learning for time series representation. First, for the same time series, a strong and a weak data augmentation generate two views of the original sequence. Strong augmentation splits the original sequence into multiple segments, shuffles them, and adds random perturbations; weak augmentation scales or shifts the original sequence.

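A numpy sketch of the two augmentations for a univariate series; the segment count, jitter scale, and scaling range are illustrative values:

```python
import numpy as np

def strong_augment(x, n_segments=8, jitter_std=0.05, rng=np.random):
    """Permutation-and-jitter: split into segments, shuffle them,
    then add Gaussian noise."""
    segs = np.array_split(x, n_segments)
    rng.shuffle(segs)
    out = np.concatenate(segs)
    return out + rng.normal(0.0, jitter_std, size=out.shape)

def weak_augment(x, scale_range=(0.9, 1.1), shift_std=0.1, rng=np.random):
    """Scale-and-shift: multiply by a random factor and add a random offset."""
    scale = rng.uniform(*scale_range)
    shift = rng.normal(0.0, shift_std)
    return x * scale + shift
```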
Next, the strongly and weakly augmented sequences are both fed into a convolutional temporal network to obtain a representation at each time step. The paper then applies two contrastive objectives: Temporal Contrasting and Contextual Contrasting. Temporal Contrasting uses the context of one view to predict future time steps of the other view, pulling the prediction closer to the other view's true representation at those steps; a Transformer serves as the prediction model. In the formula below, c denotes the Transformer output (context) of the strong view, W_k is a mapping that projects c to a prediction k steps ahead, and z is the weak view's representation at the future step:

$$\mathcal{L}_{TC}^{s} = -\frac{1}{K}\sum_{k=1}^{K}\log\frac{\exp\big((W_k c_t^{s})^{\top} z_{t+k}^{w}\big)}{\sum_{n\in\mathcal{N}_{t,k}}\exp\big((W_k c_t^{s})^{\top} z_n^{w}\big)}$$
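A simplified PyTorch sketch of this cross-view predictive loss for a single prediction step k, using the other samples in the batch as negatives; c_strong and z_weak_future are illustrative names for the quantities c and z above:

```python
import torch
import torch.nn.functional as F

def temporal_contrast(c_strong, z_weak_future, W_k):
    """c_strong:       (batch, dim) context from the strong view's Transformer
    z_weak_future:  (batch, dim) weak view's true representation k steps ahead
    W_k:            nn.Linear(dim, dim), the step-k prediction head.
    Other samples in the batch serve as negatives."""
    pred = W_k(c_strong)                         # (batch, dim)
    logits = pred @ z_weak_future.t()            # (batch, batch) similarity
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)      # diagonal = true future
```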
Contextual Contrasting is contrastive learning over whole sequences: the two views generated from the same sequence are pulled together, while views generated from different sequences are pushed apart. This mirrors image contrastive learning (an NT-Xent-style loss):

$$\mathcal{L}_{CC} = -\log\frac{\exp\big(\mathrm{sim}(c_i, c_j)/\tau\big)}{\sum_{k=1}^{2N}\mathbb{1}_{[k\neq i]}\exp\big(\mathrm{sim}(c_i, c_k)/\tau\big)}$$
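And a sketch of this NT-Xent-style loss over the 2N context vectors of a batch (both views concatenated); the temperature value is illustrative:

```python
import torch
import torch.nn.functional as F

def contextual_contrast(c1, c2, temperature=0.2):
    """c1, c2: (batch, dim) context vectors of the two views of each series.
    The positive for row i is the same series' other view, batch rows apart."""
    n = c1.size(0)
    c = F.normalize(torch.cat([c1, c2], dim=0), dim=-1)   # (2n, dim)
    sim = c @ c.t() / temperature                          # (2n, 2n)
    sim.fill_diagonal_(float('-inf'))                      # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets.to(sim.device))
```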
5. TS2Vec: Towards Universal Representation of Time Series (AAAI'22)
The core idea of TS2Vec is also unsupervised learning: construct positive pairs through data augmentation, and use a contrastive objective to pull positive pairs close while pushing negative pairs apart. The paper's contribution lies in two aspects: first, a positive-pair construction method and contrastive objective designed for the characteristics of time series; second, hierarchical contrastive learning that exploits the multi-scale structure of time series.
For positive-pair construction, the paper proposes a method suited to time series: Contextual Consistency. Its core idea is that two different augmented views of a time series should be close at the same time step. Two ways of constructing such pairs are proposed. The first is Timestamp Masking: after a fully connected projection, the vector representations at some randomly chosen time steps are masked, and a CNN then re-extracts the representation of each time step. The second is Random Cropping: two subsequences with a common overlap are selected as a positive pair. Both methods make the vector representations of the same time step closer.

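A sketch of the two pair-construction tricks; the mask probability and crop bounds are illustrative, and the cropping below is simplified to two crops that share the same right endpoint:

```python
import torch

def timestamp_mask(h, p=0.5):
    """Randomly zero out the projected representation at some time steps.
    h: (batch, length, dim), output of the fully connected projection."""
    keep = (torch.rand(h.shape[:2], device=h.device) > p).unsqueeze(-1)
    return h * keep                                        # (batch, length, dim)

def random_crops(x, min_overlap=16):
    """Take two overlapping crops of the same series as a positive pair.
    x: (batch, length, n_vars); assumes length > min_overlap."""
    t = x.size(1)
    a1, a2 = sorted(torch.randint(0, t - min_overlap, (2,)).tolist())
    b = torch.randint(a2 + min_overlap, t + 1, (1,)).item()
    # crop1 = [a1, b), crop2 = [a2, b): they share the span [a2, b).
    return x[:, a1:b], x[:, a2:b]
```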
The other core point of TS2Vec is hierarchical contrastive learning. An important difference between time series and images or natural language is that aggregating at different frequencies yields series of different granularities: a daily series aggregated by week gives a weekly series, and aggregated by month gives a monthly series. To bring this hierarchy into contrastive learning, TS2Vec proposes hierarchical contrastive learning, which proceeds as follows. For two series that form a positive pair, a CNN first generates a vector representation for each time step; then max pooling repeatedly aggregates along the time dimension (the paper uses an aggregation window of 2). After each aggregation step, the contrastive loss is computed between the aggregated vectors at corresponding time steps, pulling the same time step closer. The granularity keeps getting coarser until the whole series is aggregated into a single vector, gradually realizing instance-level contrastive learning.
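Finally, a sketch of the hierarchical loss: at each level a per-time-step contrastive loss is computed, then both views are max-pooled along time with window 2 and the process repeats; step_loss stands in for TS2Vec's combined temporal and instance contrastive terms:

```python
import torch.nn.functional as F

def hierarchical_loss(z1, z2, step_loss):
    """z1, z2: (batch, length, dim) per-time-step representations of the
    two views; step_loss(z1, z2) computes a contrastive loss per level."""
    total, levels = 0.0, 0
    while z1.size(1) > 1:
        total += step_loss(z1, z2)
        levels += 1
        # Aggregate the time dimension with max pooling, window 2.
        z1 = F.max_pool1d(z1.transpose(1, 2), kernel_size=2).transpose(1, 2)
        z2 = F.max_pool1d(z2.transpose(1, 2), kernel_size=2).transpose(1, 2)
    total += step_loss(z1, z2)        # instance level: one vector per series
    return total / (levels + 1)
```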
【Past algorithm notes】
12 Top Papers: A Summary of Classic Deep Learning Approaches to Time Series Forecasting
How to Build a Transformer Model for Time Series Forecasting?
A Summary of Spatial-Temporal Modeling Methods for Time Series Forecasting
A Roundup of Recent NLP Prompt Work! Analysis of ACL 2022 Papers on Prompting
Classic Works in Graph Representation Learning: The Basics
An Extensive Review of 14 Pre-trained Language Models
Vision-Language Multimodal Modeling Methods
From ViT to Swin: The Development of Transformers in CV through 10 Top Papers