Baseline for indoor user time series data classification in the 2020 CCF BDCI training competition
2020-11-10 11:27:00 【osc-u 2koojuzp】
Indoor user time series data classification
Introduction to the competition task
Competition title: Indoor user movement time series data classification
Track: Training track
Background: As data accumulates, the demand for processing massive amounts of time series information is becoming increasingly prominent. Time series classification, one of the important tasks in time series analysis, has wide and varied applications. Its goal is to assign a discrete label to a time series. Traditional feature extraction algorithms use statistical information from the time series as the basis for classification. In recent years, time series classification based on deep learning has made great progress: with end-to-end feature extraction, deep learning can avoid tedious manual feature design. Classifying time series effectively, that is, grouping sequences that share a common pattern in a complex dataset into the same class, is of great significance for both academic research and industrial applications.
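As a rough illustration of the "statistical information" that traditional feature extraction relies on, the sketch below turns one time series into a handful of summary features. The function name `summary_features` and the choice of statistics are assumptions made for this example; they are not taken from the competition or from the baseline.

```python
import statistics


def summary_features(series):
    """Return a few simple statistical features for one time series.

    Hypothetical helper: the selection of statistics is illustrative only.
    """
    lowest, highest = min(series), max(series)
    return {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),  # population standard deviation
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
    }


# A toy series standing in for one row of the training data
example_series = [1.0, 2.0, 3.0, 4.0]
features = summary_features(example_series)
print(features)
```

A classifier would then be trained on these few features instead of on the raw sequence values.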
Task: Based on the practical needs above and the progress of deep learning, this training competition aims to build a general time series classification algorithm. By building an accurate time series classification model for this task, participants are encouraged to explore more robust representations of time series features.
Competition link: https://www.datafountain.cn/competitions/484
Data overview
The data is compiled from the open UCI datasets available on the Internet (desensitized). The dataset covers 2 classes of time series; this kind of dataset is widely used in business scenarios of time series classification.
File category | File name | Contents |
---|---|---|
Training set | train.csv | Training data file, with the label column CLASS |
Test set | test.csv | Test data file, without labels |
Field description | Field description.xlsx | Detailed description of the XXX fields of the training and test sets |
Sample submission | Ssample_submission.csv | Contains only two fields: ID and CLASS |
Data analysis
This is a binary classification problem. Inspecting the training set shows that the amount of data is very small (210 samples) while the number of features is large (240), and that the labels 0 and 1 are evenly distributed (about half each). Given this, applying a neural network directly would mean too many parameters to train on so little data, which tends to give unsatisfactory results. A tree model would need some hyperparameter tuning to fit the data, which is also cumbersome. Based on this analysis, this article uses the simplest option, a support vector machine, and the results show that it performs well.
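To make the "too many parameters" argument concrete, the sketch below counts the trainable parameters of a small fully connected network and compares that with the number of training samples. The hidden layer size of 64 and the helper `mlp_param_count` are assumptions chosen for illustration; only the 210 samples and 240 features come from the data described above.

```python
def mlp_param_count(layer_sizes):
    """Count the weights and biases of a fully connected network.

    layer_sizes lists the units per layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


n_samples = 210    # training samples, from the data analysis
n_features = 240   # features per sample, from the data analysis
hidden_units = 64  # hypothetical hidden layer width

n_params = mlp_param_count([n_features, hidden_units, 1])
params_per_sample = n_params / n_samples
print(n_params, round(params_per_sample, 1))
```

Even this modest network has about 15,000 parameters, roughly 74 per training sample, which is why a simpler model is a reasonable first choice here.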
Baseline Program
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.svm import SVR

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Separate features and labels
X_train_c = train.drop(['ID', 'CLASS'], axis=1).values
y_train_c = train['CLASS'].values
X_test_c = test.drop(['ID'], axis=1).values

nfold = 5
kf = KFold(n_splits=nfold, shuffle=True, random_state=2020)
prediction1 = np.zeros(len(X_test_c))

for i, (train_index, valid_index) in enumerate(kf.split(X_train_c, y_train_c)):
    print("\nFold {}".format(i + 1))
    X_train, label_train = X_train_c[train_index], y_train_c[train_index]
    X_valid, label_valid = X_train_c[valid_index], y_train_c[valid_index]
    # SVR is a regressor: it is fitted on the 0/1 labels and its
    # continuous outputs are rounded to class labels afterwards.
    clf = SVR(kernel='rbf', C=1, gamma='scale')
    clf.fit(X_train, label_train)
    # Validation accuracy of this fold after rounding
    x1 = clf.predict(X_valid)
    print("valid acc: {:.4f}".format(np.mean(np.round(x1) == label_valid)))
    # Accumulate the fold-averaged test prediction
    y1 = clf.predict(X_test_c)
    prediction1 += y1 / nfold

# Round the averaged outputs; clipping guards against values outside [0, 1]
result1 = np.clip(np.round(prediction1), 0, 1).astype(int)
# Test IDs run from 210 to 313
id_ = range(210, 314)
df = pd.DataFrame({'ID': id_, 'CLASS': result1})
df.to_csv("baseline.csv", index=False)
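The last step of the baseline averages the regression outputs of the five folds and rounds them into class labels. The sketch below isolates that step in plain Python. The function name `average_and_threshold` and the sample numbers are made up for illustration, and it uses an explicit 0.5 threshold rather than rounding, so an averaged output above 1.5 still maps to class 1 instead of rounding to 2.

```python
def average_and_threshold(fold_predictions, threshold=0.5):
    """Average per-fold regression outputs and map them to 0/1 labels.

    fold_predictions: one list of predictions per fold, all of equal length.
    Hypothetical helper, not part of the original baseline.
    """
    n_folds = len(fold_predictions)
    n_samples = len(fold_predictions[0])
    averaged = [
        sum(fold[j] for fold in fold_predictions) / n_folds
        for j in range(n_samples)
    ]
    labels = [1 if value >= threshold else 0 for value in averaged]
    return averaged, labels


# Made-up outputs of two folds for three test samples
fold_outputs = [
    [0.2, 0.9, 1.7],
    [0.4, 0.7, 1.5],
]
averaged, labels = average_and_threshold(fold_outputs)
print(labels)
```

The third sample averages to about 1.6, which plain rounding would turn into the invalid label 2; thresholding keeps it in class 1.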
Submission results
Submitting the baseline gives a score of 0.83653846154.
Because the data is split into five folds, the score of a submission may fluctuate slightly.
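For a sense of scale, the sketch below relates the reported score to the size of the test set. It assumes that the leaderboard metric is plain accuracy, which the article does not state; the ID range 210 to 313 is taken from the baseline code.

```python
# Test IDs used in the baseline submission: 210 .. 313
n_test = len(range(210, 314))

reported_score = 0.83653846154

# Assumption: the score is accuracy, i.e. correct predictions / test samples
implied_correct = round(reported_score * n_test)
score_step = 1 / n_test  # change in score per additional correct prediction

print(n_test, implied_correct, round(score_step, 4))
```

Under that assumption the score corresponds to 87 correct predictions out of 104, and each prediction that flips changes the score by a little under 0.01.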
Copyright notice
This article was written by [osc-u 2koojuzp]. Please include a link to the original when reposting. Thank you.