Highly comparative time-series analysis

Last update: Dec 21, 2022

Overview

〰️ hctsa 〰️ : highly comparative time-series analysis

hctsa is a software package for running highly comparative time-series analysis using Matlab (full support for versions R2018b or later).

The software provides a code framework that enables the extraction of thousands of time-series features from a time series (or a time-series dataset). It also provides a range of tools for visualizing and analyzing the resulting time-series feature matrix, including:

Normalizing and clustering the data,
Producing low-dimensional representations of the data,
Identifying and interpreting discriminating features between different classes of time series,
Learning multivariate classification models.

Feel free to email me for help with real-world applications of hctsa 🤓

Acknowledgement 👍

If you use this software, please read and cite these open-access articles:

B.D. Fulcher and N.S. Jones. hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems 5, 527 (2017).
B.D. Fulcher, M.A. Little, N.S. Jones. Highly comparative time-series analysis: the empirical structure of time series and their methods. J. Roy. Soc. Interface 10, 83 (2013).

Feedback, whether by email, GitHub issues, or pull requests, is much appreciated.

For commercial use of hctsa, including licensing and consulting, contact Engine Analytics.

Getting Started 😊

Documentation 📖

Comprehensive documentation for hctsa, from getting started through to more advanced analyses, is on GitBook.

Downloading the repository ⬇️

For users unfamiliar with git, the current version of the repository can be downloaded by simply clicking the green Code button, and then clicking Download ZIP.

It is recommended to use the repository with git. To do this, fork the repository, clone your fork to your local machine, and then set an upstream remote to keep it synchronized with the main repository, e.g., using the following command:

git remote add upstream git@github.com:benfulcher/hctsa.git

(Make sure that you have generated an SSH key and associated it with your GitHub account.)

You can then update to the latest stable version of the repository by pulling the master branch to your local repository:

git pull upstream master

For analyzing specific datasets, we recommend working outside of the repository so that incremental updates can be pulled from the upstream repository. Details on how to merge the latest version of the repository with the local changes in your fork can be found here.

Related resources

CompEngine 💥

CompEngine is an accompanying web resource for this project. It is a self-organizing database of time-series data that allows users to upload, explore, and compare thousands of diverse types of time-series data. This vast and growing collection of time-series data can also be downloaded. Go have a play, read more about it in our 📙 paper, or watch a talk on YouTube.

catch22 2️⃣ 2️⃣

Is over 7000 just a few too many features for your application? Do you not have access to a Matlab license? catch22 has all of your faux-rhetorical questions covered. This reduced set of 22 features, determined through a combination of classification performance and mutual redundancy as explained in this paper, is available here as an efficiently coded C implementation with wrappers for Python, R, and Julia.

hctsa datasets and example workflows 💾

There is a range of open datasets with pre-computed hctsa features, as well as some examples of hctsa workflows.

C. elegans movement speed data and associated analysis code.
Drosophila movement speed and associated analysis code.
1000 empirical time series

(If you have data to share and host, let me know and I'll add it to this list)

Running hctsa on a cluster 💻

Matlab code for computing features for an initialized HCTSA.mat file by distributing the computation across a large number of cluster jobs (using PBS or Slurm schedulers) is here.

Publications 📕

hctsa has been used by us and others to do new science in neuroscience, engineering, and biomedicine. An updated list of publications using hctsa is on this wiki page.

hctsa licenses

Internal licenses

There are two licenses applied to the core parts of the repository:

The framework for running hctsa analyses and visualizations is licensed as the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. A license for commercial use is available from Engine Analytics.
Code for computing features from time-series data is licensed as GNU General Public License version 3.

A range of external code packages are provided in the Toolboxes directory of the repository, and each has its own associated license (as outlined below).

External packages and dependencies

Many features in hctsa rely on external packages and Matlab toolboxes. In the case that some of them are unavailable, hctsa can still be used, but only a reduced set of time-series features will be computed.

hctsa uses the following Matlab Add-On Toolboxes: Statistics and Machine Learning, Signal Processing, Curve Fitting, System Identification, Wavelet, and Econometrics.

The following external time-series analysis code packages are provided with the software (in the Toolboxes directory), and are used by our main feature-extraction algorithms to compute meaningful structural features from time series:

TISEAN package for nonlinear time-series analysis, version 3.0.1 (GPL license).
TSTOOL package for nonlinear time-series analysis, version 1.2 (GPL license).
Joseph T. Lizier's Java Information Dynamics Toolkit (JIDT) for studying information-theoretic measures of computation in complex systems, version 1.3 (GPL license).
Time-series analysis code developed by Michael Small (unlicensed).
Max Little's Time-series analysis code (GPL license).
Sample Entropy code from Physionet (GPL license).
ARFIT Toolbox for AR model estimation (unlicensed).
gpml Toolbox for Gaussian Process regression model estimation, version 3.5 (FreeBSD license).
Danilo P. Mandic's delay vector variance code (GPL license).
Cross Recurrence Plot Toolbox (GPL license).
Zoubin Ghahramani's Hidden Markov Model (HMM) code (MIT license).
Danny Kaplan's Code for embedding statistics (GPL license).
Two-dimensional histogram code from Matlab Central (BSD license).
Various histogram and entropy code by Rudy Moddemeijer (unlicensed).

Other time-series analysis resources

A collection of good resources for time-series analysis (including in other programming languages like Python and R) is on the wiki.

Acknowledgements 👋

Many thanks go to Romesh Abeysuriya for helping with the MySQL database set-up and install scripts, and Santi Villalba for lots of helpful feedback and advice on the software.

Comments

nn_prepare.m as a function is not supported and a whole host of other errors
I am using Ethoscope velocity data to distinguish between genotypes. I don't have a license for the Econometrics toolbox, so I get a few errors about that. Additionally, I get errors about functions not being supported. Are these issues with my installation, or is it because the data is not amenable to that particular analysis?

Execution of script nn_prepare as a function is not supported: /home/luca/Toolboxes/OpenTSTOOL/tstoolbox/mex/nn_prepare.m
opened by posttenebre 8

Stochasticity

Some operations output results that depend on the random seed, and thus running the same operation on the same time series can produce different results if run multiple times. A solution to this is required, and could be done by allowing a random seed input to each non-deterministic function, to allow reproducible results. If none is provided, a default could be rng('default') at the start of each function. I should implement this as a priority going forward.
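The proposal above can be sketched in a few lines. This is an illustrative Python sketch rather than hctsa code: the function `noisy_mean`, its `seed` argument, and the constant `DEFAULT_SEED` are hypothetical, with the fixed default seed standing in for the `rng('default')` fallback suggested above.

```python
import random

DEFAULT_SEED = 0  # hypothetical stand-in for a fixed default such as rng('default')


def noisy_mean(x, n_samples=100, seed=None):
    """Estimate the mean of x from a random subsample (a non-deterministic feature).

    A private generator is seeded on every call, so the result depends only on
    the inputs and never on the global random state.
    """
    rng = random.Random(DEFAULT_SEED if seed is None else seed)
    sample = [rng.choice(x) for _ in range(n_samples)]
    return sum(sample) / len(sample)
```

Calling `noisy_mean(x)` twice on the same data returns the same value, and passing an explicit `seed` lets a user vary the randomization while keeping each run reproducible.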
bug

opened by benfulcher 6

TS_GetIDs returns brace indexing error

Hi Ben,

I get this error when using TS_GetIDs to match part of a name in the Operations.Name table.

When I leave the 'Name' flag out it searches the keywords field and it works fine.

>> OperationIDs = TS_GetIDs('mystring', myFile, 'ops', 'Name');
Loading data from....mat... Done.
Brace indexing is not supported for variables of this type.

Error in TS_GetIDs (line 114)
            cmatch = find(contains(theDataTable.Name,theMatchString{i}));

However, it works when I use 'contains' as below (partially copying the method used for keywords). I'm not sure why the Name case uses a loop with cmatch.

    case {'name','Name'}
        % The cell of names to search:
        theNameCell = theDataTable.Name;

        % Find objects whose name contains the input string:
        matches = find(contains(theNameCell, theMatchString));

        % Return the IDs of the matches:
        IDs = theDataTable.ID(matches);

        % Warn if there are no matches:
        if isempty(IDs)
            warning('No matches to ''%s'' found in %s',theMatchString,theDataFile)
        end

opened by LJGz 5

TISEAN d2 leaves temporary files undeleted

On a side note, I think there is an unlikely but possible danger of temporary filename collisions when running in parallel (it has happened to me once already). We should probably create those files in the system temporary directory and make sure their names are unique.
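One way to get unique temporary files in the system temporary directory is sketched below. This is a generic Python illustration, not the TISEAN wrapper code in hctsa: the helper names and the `tisean_d2_` prefix are hypothetical.

```python
import os
import tempfile


def make_unique_temp_file(prefix="tisean_d2_", suffix=".dat"):
    """Create an empty, uniquely named file in the system temporary directory."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix)
    os.close(fd)  # only the path is needed; an external tool would write the file
    return path


def run_with_temp_file(worker):
    """Call worker(path) and delete the temporary file even if worker raises."""
    path = make_unique_temp_file()
    try:
        return worker(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Because `tempfile.mkstemp` creates the file atomically under a name that does not yet exist, two parallel workers cannot be handed the same path, and the `finally` clause addresses the undeleted-files problem in the issue title.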

opened by sdvillal 5

Multivariate time series analysis

Hello,

Thank you for such an amazing tool. I'm wondering about how to approach multivariate time series using hctsa. Is there a special way of assigning the keywords before using TS_LabelGroups?

Thank you in advance, Konstantin

opened by smetanadvorak 4

database connection error

mysql_dbopen.m throws error even after including the appropriate connector via

javaaddpath('/home/philip/work/CompEngineMatlab/Database/mysql-connector-java-5.1.34-bin.jar')

% -- Error -- 
Error using mysql_dbopen (line 24)
Error with java database connector: Java exception occurred:
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Unknown Source)

opened by Philiphorst 4

Not able to skip User input "y" in TS_Init

Hi,

I am trying to extract temporal features from a large number of BOLD resting-state fMRI signals using the hctsa toolbox. To do this, I have written a script that loads keyword matrices, labels, and time series in a loop and passes them sequentially to TS_Init to create HCTSA.mat files. However, on each iteration TS_Init requires a user input "y" to confirm the time series. I could not find the corresponding line of code in the "TS_Init.m" file to modify it and skip this step, and I was wondering if you could please guide me towards solving this issue.

Thank you for the great toolbox. Regards, Ali

opened by javanray 3

Wrong_TS_Classify

Hi Ben,

I have been using your hctsa toolbox for some time and have a question. hctsa is an outstanding piece of work. When getting started with the "hctsa_phenotypingWorm-master" and "hctsa_phenotypingFly-master" projects, I get the following error from plotconfusion(realLabels,predictLabels):

    plotconfusion (line 111)
        update_args = standard_args(args{:});
    plotconfusion>standard_args (line 255)
    Value is not a matrix or cell array.

I do not understand how to solve this problem. Can you give me some suggestions?

opened by ozone521 3

Multivariate TS classification

Dear Ben, first of all thank you for all the work and effort you put into the hctsa package. It has been really helpful so far!

In the beginning I was dealing with a single sensor to monitor a pressure time series. The classification is mostly a two-class problem but can also be a multi-class problem. Now, I have multiple sensors available that monitor one and the same process. I would like to include possible relations and dependencies between the sampled variables in my analysis. Therefore, I would like to ask whether you have any experience with implementing a multivariate time-series analysis in a smart way, still using all the beautiful functions within this package.

Thank you in advance and best regards, Alex

opened by zeisal 3

Re: Forecasting

Hi,

I'm generating features for a forecasting application. I require "causal" features, i.e. features that at any point in time do not use future information. I noted that there is a subset of features with the keyword 'forecasting.' Are these all causal? Are any features in other subsets also causal? Is there a simple way to extract only those sorts of features?

Thanks for any help on this matter, Gavin.

opened by xenmind 3

Update TS_plot_timeseries.m

Hi Ben, added the legend into the time series plot. The code may seem a little clunky so feel free to edit/make it a bit nicer :)

Also went through and added further options to TS_classify so classification rates are returned, and there is the option to suppress the confusion matrix.

Nic

opened by nic-barbara 3

Error using compile_mex

When I try to compile the install.m file I get this error:

    Error using compile_mex
    An error occurred while compiling ML_Fastdfa_core C code. It appears that mex is not set up to work on this system (cf. 'doc mex' and 'mex -setup'). Get 'mex ML_fastdfa_core.c' to work, and then re-run compile_mex.m

The code warns you about this ('Please make sure that mex is set up with the right compilers for this system.'), but I do not know how to configure it.

opened by Carlosrrn 1

compile_mex error from range_search.cpp file

Hi Ben,

I have encountered an error when compiling MEX as part of the installation process. When compiling the OpenTSTOOL, I get a series of warnings followed by an error. This is generated from line 116 of the range_search.cpp file.

I suspect that the error arises because the code is trying to compare a range to an integer, which is an invalid operation. When justifying that part of the code, the MEX is compiled successfully.

I just wanted to share this and I would be grateful for any suggestions. FYI - I am using macOS Catalina v10.15.7 and Matlab R2021a.

Many thanks! Irene

opened by faimai00 1

TS_Classify produces an "Unrecognized function or variable 'foldLosses'" error when trying to save the classifier

Hi Ben

I'm just trying to save the classifier produced by TS_Classify, and I'm getting an error saying both Unrecognized function or variable 'foldLosses', and Unrecognized function or variable 'whatLoss'. It looks like these variables are no longer produced by TS_Classify (when I use the find function to look for the variables within TS_Classify, they are not produced as outputs on any line). When I comment out the lines that require these variables, the function works.

It looks like the aspect of the TS_Classify function that utilizes these variables might have been removed at some stage, because I've noticed the following in the output description (I was also interested in the doPCs option, but couldn't work out how to perform it):

    %---OUTPUTS:
    % Text output on classification rate using all features, and if doPCs = true, also
    % shows how this varies as a function of reduced PCs (text and as an output plot)
    % foldLosses, the performance metric across repeats of cross-validation
    % nullStats, the performance metric across randomizations of the data labels
    % jointClassifier, details of the saved all-features classifier

Kind regards,

Neil

opened by NeilwBailey 1

Consider a cloud CI service

Given that public open-source projects get free support from cloud CI services, consider using one of them here. 🎉

https://blogs.mathworks.com/developer/2020/12/15/cloud-ci-services/

opened by acampbel 1

Analysis methods mistreat missing values

Analysis methods, like TS_TopFeatures, assume that there are no errors in the data matrix (i.e., that all bad values have been filtered out of the dataset using TS_normalize). There should be better checks on this, to avoid the zeros in TS_DataMat being treated as actual zeros (rather than as error symbols flagged in TS_Quality). The best solution would be to use the data in TS_Quality to restrict the computation to good values (where meaningful analysis is possible), e.g., in the case of TS_TopFeatures.
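The idea of restricting a computation to good values can be sketched as follows. This is a generic Python illustration rather than hctsa code: the function names are hypothetical, and the sketch assumes that a quality label of 0 marks a good value and any other label marks an error.

```python
def good_values(data_column, quality_column, good_code=0):
    """Return only the entries whose quality label equals good_code (assumed 0)."""
    return [v for v, q in zip(data_column, quality_column) if q == good_code]


def feature_mean(data_column, quality_column, min_good=2):
    """Mean of one feature over good values only; None if too few are available."""
    values = good_values(data_column, quality_column)
    if len(values) < min_good:
        return None  # not enough good values for a meaningful statistic
    return sum(values) / len(values)
```

For the data `[1.0, 0.0, 3.0, 0.0]` with quality labels `[0, 1, 0, 2]`, `feature_mean` returns 2.0, whereas treating the two placeholder zeros as real values would give a mean of 1.0.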
bug

opened by benfulcher 0

Operations -> Features

Perhaps it's time to update the terminology, since Feature is more common usage than Operation. Could consider changing the name of the Operations data object to Features...

opened by benfulcher 0

Releases(v1.07)

v1.07(Jul 22, 2022)
TS_SimSearch supports interactive plots to swap between raw and ranked values in scatter plots

Better error messaging for computation

install script checks for required toolboxes

'svmBeta' option to score individual features on SVM beta weights: added to TS_TopFeatures

Better checks on input data type (e.g., singles or integers) in feature computation

Improved reporting of in-fold and out-of-fold accuracies in k-fold CV classification

Logistic regression added to classifiers, as well as svm-linear (high-dim) default (original available as svm-linear-lowdim)

TS_ClassifyLowDim expanded functionality to check for in-sample over-fitting

Null testing improved with option for simple shuffle-based nulls (under assumption that random in -> random out; rather than full model-based nulls)

Improved visualization settings in TS_TopFeatures (including Spearman correlations now default)

signed p-value test statistic from rank-sum test added in TS_TopFeatures

Improved syntax for TS_Init for selecting a feature set by name.

Progress bar option for new minimal-output versions of TS_Compute, as 'fast' and 'minimal' (for cases like catch22, where full output makes the command line unreadable).

Some feature keywords improved.

v1.06(Aug 5, 2021)
Interactive plotting (annotations added in response to mouse clicks) in TS_LowDimInspect and TS_FeatureSummary

Histogram binning now done in a simpler and more transparent way, to match C implementation in catch22 (in DN_HistogramMode).

New histogram asymmetry features added via DN_HistogramAsymmetry.

v1.05(Jan 8, 2021)
Clearer naming of the CO_HistogramAMI features

New nsadf feature

New features for the first maximum of various self-correlation functions (rather than just the first minimum) added.

Various additional fixes

v1.04(Jul 22, 2020)
Clearer naming of CO_HistogramAMI features

Ability to specify reduced feature subsets in cfnParams (that will be applied in TS_Classify, TS_ClassifyLowDim, TS_TopFeatures, TS_PlotLowDim)

Clearer keyword labeling of 'locDep' -> 'locationDependent', etc.

v1.03(Jul 2, 2020)

Added continuous analogues of first crossings

CO_FirstZero and CO_f1ecac are combined into CO_FirstCrossing, which provides both a discrete estimate (the first time lag at which the ACF exceeds the threshold) and a continuous estimate (linear interpolation of the crossing between the previous and subsequent time points). Also addresses an issue with BF_PreProcess consistency, and an issue with 'absclose' being incorrectly set to 'saturate', which caused a problem for DN_RemovePoints.
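The difference between the discrete and continuous crossing estimates can be illustrated with a short sketch. This is a generic Python illustration and not the CO_FirstCrossing implementation: the function names are hypothetical, and the sketch assumes the autocorrelation function starts above the threshold and crosses it from above.

```python
def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    var = sum(c * c for c in centred)
    return [
        sum(centred[t] * centred[t + lag] for t in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]


def first_crossing(acf, threshold=0.0):
    """Return (discrete, continuous) first crossing of threshold from above.

    discrete: first lag whose ACF value is at or below the threshold.
    continuous: linear interpolation between that lag and the previous one.
    Returns (None, None) if the ACF never reaches the threshold.
    """
    if acf[0] <= threshold:
        return 0, 0.0
    for lag in range(1, len(acf)):
        if acf[lag] <= threshold:
            prev, curr = acf[lag - 1], acf[lag]
            # Fraction of the step from lag-1 to lag where the line meets the threshold
            frac = (prev - threshold) / (prev - curr)
            return lag, (lag - 1) + frac
    return None, None
```

For an autocorrelation sequence `[1.0, 0.5, -0.5, -0.2]` and a threshold of 0, the discrete estimate is lag 2 while the interpolated estimate is 1.5, halfway between lags 1 and 2.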
v1.02(May 17, 2020)

Consistent capitalization across functions, and clearer convention around z-scoring as x_z rather than y.
v1.01(May 16, 2020)
DN_RemovePoints has some new options, with either 'remove' or 'saturate'.

Improved behavior and statistics from CO_AutoCorrShape

v1.0(Jul 11, 2019)

Tweaked some features, fixed some errors. Semantic versioning practices are not quite valid for hctsa's feature library (where every slight change to any feature makes hctsa results inconsistent with every previous version). But it's about time we had a v1.0!
v0.99(Feb 15, 2019)

Cleaning of poor-performing features, FourierPowerSpectrum keyword used, TS_plot_timeseries -> TS_PlotTimeSeries for consistent naming convention, improvement in Gaussian mutual information computation.
0.98(Sep 3, 2018)
Metadata for TimeSeries, Operations, and MasterOperations are now stored in tables (rather than clunkier cell arrays).

TS_plot_pca renamed TS_PlotLowDim and now supports tSNE projections.

Added very simple function for reducing the number of features as TS_ReduceFeatureSet.

v0.96(Mar 5, 2018)

Major working version.
v0.92(Mar 5, 2018)

hctsa version used in: Fulcher, B. D., & Jones, N. S. (2017). hctsa: A Computational Framework for Automated Time-Series Phenotyping Using Massive Feature Extraction. Cell Systems, 5(5), 527–531.e3.

Owner

Ben Fulcher

I use methods from physics and statistics to understand the structure and dynamics of complex systems like the brain

GitHub Repository https://hctsa-users.gitbook.io/hctsa-manual
