Paper study notes: Analysis and Research on Similarity Query of Hydrological Time Series

2022-07-01 07:38:00 Graduate students are not late

Foreword: the paper appeared in the journal Hydrology, 2009.
Authors: Li Wei, Sun Honglin

1 Abstract

  1. Similarity queries over hydrological time series can be used for rainfall-flood process prediction, environmental evolution analysis, analysis of the laws of hydrological processes, and other tasks.
  2. The most direct application is answering a question often asked in flood control command: "Which historical period does the current hydrological process most resemble?"
  3. The paper introduces the theory and techniques of data warehousing and data mining.

2 Introduction

 [Figure: image not preserved in the source]

3 Problem description

Traditional time series similarity search mainly emphasizes exact matching. In data mining applications, however, the amount of data is so large that the search is generally an "approximate search" based on approximate matching.

The key tasks in similarity mining of hydrological time series are:

  1. Division into subsequences. In the National Hydrological Database, flood processes have already been divided according to runoff generation theory, forming excerpts of the various elements.
    For daily-value data, however, the division depends on the type of problem to be solved. The partition rules must both conform to hydrological theory and be suitable for computer processing.

  2. Sequence feature extraction. The sequence is generally transformed, for example by a Fourier transform, a wavelet transform, or a piecewise average, to map it into a feature space.

  3. Choice of similarity measure. Different hydrological processes have different characteristics, so an appropriate similarity measure must be chosen according to the characteristics of the process at hand.
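
The piecewise average mapping mentioned in step 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `piecewise_average`, the parameter `n_segments`, and the sample values in `daily_flow` are all assumptions made for the example.

```python
def piecewise_average(series, n_segments):
    """Map a series to n_segments mean values (piecewise average mapping)."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    features = []
    for k in range(n_segments):
        # Integer boundaries split the series into nearly equal segments.
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        segment = series[start:end]
        features.append(sum(segment) / len(segment))
    return features


# Hypothetical daily values, used only to show the call.
daily_flow = [10.0, 12.0, 30.0, 50.0, 40.0, 20.0, 15.0, 11.0]
print(piecewise_average(daily_flow, 4))  # [11.0, 40.0, 30.0, 13.0]
```

Eight daily values are reduced to four features here, which is the kind of dimension reduction that makes approximate search over large archives practical.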

4 Theoretical methods

A similarity query over hydrological time series works on hydrological data, and the process can be divided into two main stages: the query preparation stage and the similarity query stage.

  1. Query preparation stage. This includes data preprocessing and feature extraction of the time series.
    ① Data preprocessing is an essential step in any data mining task. In this model it covers data integration, data cleaning, data selection, and normalization of the sequences;
    ② The pattern representation of a time series is a prerequisite for time series data mining and one of the key problems in similarity mining of hydrological time series. Its quality directly affects the mining results.

  2. Similarity query stage. The user submits a query request. The system matches patterns on top of the pattern representation according to the similarity measure and displays the results to the user visually.
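
As an illustration of the sequence normalization step in the preparation stage, here is a minimal sketch. It assumes z-score normalization, which is a common choice; the notes do not say which transformation the paper actually uses, and the name `z_normalize` is hypothetical.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0.0:
        # A constant series has no shape to preserve; map it to zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]


# Hypothetical water-level readings, used only to show the call.
print(z_normalize([1.0, 2.0, 3.0]))
```

Normalizing before matching lets two processes with the same shape but different magnitudes be compared on equal terms.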

Pattern matching (the similarity measure) and the pattern representation of the time series are known as the two cornerstones of time series similarity query.

5 Piecewise linear representation based on feature points

  • Time series pattern representation:
    This paper uses a piecewise linear representation (PLR) based on feature points as the pattern representation of the time series.

  • For time series with obvious periodicity and frequent short-term fluctuations, it compresses the data effectively while capturing how the overall pattern of the series changes.

  • An example of segmentation is shown in the figure below:
     [Figure: image not preserved in the source]

5.1 Piecewise linear representation

 [Figure: image not preserved in the source]

5.2 Definition of characteristic points

 [Figure: image not preserved in the source]
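
A minimal sketch of a piecewise linear representation built on feature points follows. It assumes that feature points are the two endpoints plus every strict local extremum; the paper's exact definition may differ. The names `feature_points` and `to_segments` and the sample values in `stage` are hypothetical.

```python
def feature_points(series):
    """Return indices of the two endpoints and of strict local extrema."""
    n = len(series)
    if n < 3:
        return list(range(n))
    idx = [0]
    for i in range(1, n - 1):
        left = series[i] - series[i - 1]
        right = series[i + 1] - series[i]
        # A sign change of the slope marks a local peak or trough.
        if left * right < 0:
            idx.append(i)
    idx.append(n - 1)
    return idx


def to_segments(series, idx):
    """Connect consecutive feature points; each segment is (length, slope)."""
    segments = []
    for a, b in zip(idx, idx[1:]):
        length = b - a
        slope = (series[b] - series[a]) / length
        segments.append((length, slope))
    return segments


# Hypothetical water-stage readings, used only to show the calls.
stage = [1.0, 2.0, 4.0, 3.0, 2.0, 2.5, 5.0, 1.0]
points = feature_points(stage)
print(points)                      # [0, 2, 4, 6, 7]
print(to_segments(stage, points))  # [(2, 1.5), (2, -1.0), (2, 1.5), (1, -4.0)]
```

Note that this simple test ignores flat plateaus. A practical version would also need a threshold to skip small fluctuations, which is left out here.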

6 Similarity measure of time series

  • A similarity measure for time series should meet the following conditions:
    (1) It allows imprecise matching and supports multiple deformations of the time series;
    (2) It can be computed efficiently;
    (3) It supports fast indexing;
    (4) It can be applied to other data mining tasks, such as clustering and classification of time series, frequent pattern discovery, and anomaly discovery.

  • Common similarity measures include the Minkowski distance, the dynamic time warping (DTW) distance, and the longest common substring.
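
Two of these measures can be sketched in a few lines. This is a textbook-style illustration rather than code from the paper; the function names `minkowski` and `dtw` are chosen for the example, and the DTW version uses the absolute difference as its point cost with no warping window.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def dtw(a, b):
    """Dynamic time warping distance with absolute-difference point cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


print(minkowski([0.0, 0.0], [3.0, 4.0]))          # 5.0
print(dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))  # 0.0
```

The second call shows why DTW meets condition (1): the two series differ in length and timing, yet the stretched alignment matches them exactly, whereas the Minkowski distance is not even defined for them.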

6.1 Dynamic pattern matching distance (DPM)

  1. The DPM distance is computed by matching patterns rather than by matching individual points.
  2. Advantages: the definition of a pattern is very flexible, and the average pattern length is generally much greater than 1, so the time series is reduced in dimension (the number of patterns in a series is much smaller than the length of the series).

6.2 Algorithm steps

  1. Define the patterns. Extract pattern features from the time series and transform the series into the feature space to obtain its pattern representation.
    For a piecewise linear representation, a pattern is an interpolated line segment of the time series, which can be characterized by features such as the segment's length and slope;

  2. Define the distance between patterns.
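
The two steps might be sketched as follows. The notes do not give the paper's distance formula, so everything here is an assumption made for illustration: each pattern is a `(length, slope)` pair, the pattern distance is a weighted sum of absolute differences with a hypothetical `length_weight`, and the patterns are aligned with a DTW-style recurrence, which may not be the paper's exact matching rule.

```python
def pattern_distance(p, q, length_weight=0.1):
    """Assumed distance between two (length, slope) patterns."""
    (len_p, slope_p), (len_q, slope_q) = p, q
    return abs(slope_p - slope_q) + length_weight * abs(len_p - len_q)


def dynamic_pattern_distance(patterns_a, patterns_b, length_weight=0.1):
    """Align two pattern sequences with a DTW-style recurrence over patterns."""
    inf = float("inf")
    n, m = len(patterns_a), len(patterns_b)
    # cost[i][j] is the best cost of matching the first i and first j patterns.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = pattern_distance(patterns_a[i - 1], patterns_b[j - 1], length_weight)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical pattern sequences: a rise followed by a recession.
flood_a = [(2, 1.5), (2, -1.0)]
flood_b = [(2, 1.5), (3, -1.0)]
print(dynamic_pattern_distance(flood_a, flood_b))
```

Because the recurrence runs over patterns instead of points, its table has one cell per pair of patterns, which is where the dimension reduction noted in 6.1 pays off.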
