Baseline for indoor user time series data classification, 2020 CCF BDCI training competition
2020-11-10 11:27:00 【osc-u 2koojuzp】
Indoor user time series data classification
Introduction to the competition
Competition title: Indoor user movement time series data classification
Track: Training track
Background: As data accumulates, the demand for processing massive amounts of time series information is becoming increasingly prominent. Time series classification is one of the important tasks in time series analysis and has wide and varied applications. Its goal is to assign a discrete label to a time series. Traditional feature extraction algorithms use statistical information from the time series as the basis for classification. In recent years, time series classification based on deep learning has made great progress: with end-to-end feature extraction, deep learning can avoid tedious manual feature design. Classifying time series effectively, that is, assigning sequences of a given form from a complex data set to the same class, is of great significance for academic research and industrial applications.
Task: Based on the practical needs above and the progress of deep learning, this training competition aims to build a general time series classification algorithm. By building an accurate time series classification model for this problem, participants are encouraged to explore more robust representations of time series features.
Competition link: https://www.datafountain.cn/competitions/484
Data overview
The data is compiled from the open UCI data sets available on the Internet (desensitized). The data set covers 2 classes of time series; data sets of this kind are widely used in business scenarios of time series classification.
File category | File name | Contents |
---|---|---|
Training set | train.csv | Training data, with the label in the CLASS column |
Test set | test.csv | Test data, without labels |
Field description | Field description .xlsx | Detailed description of the XXX fields in the training and test sets |
Submission sample | Ssample_submission.csv | Contains only two fields: ID and CLASS |
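The submission layout in the last row of the table can be checked before uploading. The following is a minimal sketch and not part of the original baseline: `check_submission` is a hypothetical helper, the rows at the bottom are made-up stand-in data, and the strict column order is an assumption. The only facts taken from the article are the two fields ID and CLASS and the two class labels 0 and 1.

```python
import pandas as pd

def check_submission(df):
    """Return a list of problems found in a submission frame (empty if none)."""
    problems = []
    # The sample submission has exactly two fields: ID and CLASS.
    if list(df.columns) != ["ID", "CLASS"]:
        problems.append("columns must be exactly ['ID', 'CLASS']")
        return problems
    if df["ID"].duplicated().any():
        problems.append("duplicate IDs")
    # The task has two classes, labelled 0 and 1.
    if not df["CLASS"].isin([0, 1]).all():
        problems.append("CLASS must be 0 or 1")
    return problems

# Made-up example rows, for illustration only.
good = pd.DataFrame({"ID": [210, 211, 212], "CLASS": [0, 1, 1]})
bad = pd.DataFrame({"ID": [210, 210], "CLASS": [0, 2]})
print(check_submission(good))
print(check_submission(bad))
```

Running such a check on the frame that is about to be written to CSV catches a wrong column name or an out-of-range label before a submission attempt is spent on it.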
Data analysis
This is a binary classification problem. Inspecting the training set shows that the amount of data is very small (210 samples) while the number of features is large (240), and that the training labels 0 and 1 are evenly distributed (about half each). Given this, applying a neural network directly would mean too many parameters to train for so few samples, and the results would likely be unsatisfactory. A tree model would need some hyperparameter tuning to fit the data, which is also cumbersome. Based on this analysis, this article uses the simplest option, a support vector machine, for classification, and it gives good results.
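The observations above (sample count, feature count, class balance) can be reproduced with a few lines of pandas. This is a sketch under stated assumptions: `summarize` is a hypothetical helper, and because train.csv is not bundled here, it is demonstrated on a synthetic frame of random values that only mimics the described shape and the ID and CLASS column names.

```python
import numpy as np
import pandas as pd

def summarize(train):
    """Report sample count, feature count, and class balance of a training frame."""
    features = train.drop(["ID", "CLASS"], axis=1)
    n_samples, n_features = features.shape
    # Fraction of rows in each class, e.g. {0: 0.5, 1: 0.5} for a balanced set.
    balance = train["CLASS"].value_counts(normalize=True).sort_index().to_dict()
    return n_samples, n_features, balance

# Synthetic stand-in with the shape described in the text
# (210 rows, 240 feature columns); the values are random, not competition data.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(210, 240)),
                    columns=[f"f{i}" for i in range(240)])
demo.insert(0, "ID", np.arange(210))
demo["CLASS"] = np.arange(210) % 2  # exactly half 0s and half 1s

n_samples, n_features, balance = summarize(demo)
print(n_samples, n_features, balance)
```

With the real data, `summarize(pd.read_csv('train.csv'))` would be the equivalent call. More features than samples is the situation in which a model with few tunable parameters is usually the safer first choice.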
Baseline Program
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.svm import SVR

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Separate features and labels
X_train_c = train.drop(['ID', 'CLASS'], axis=1).values
y_train_c = train['CLASS'].values
X_test_c = test.drop(['ID'], axis=1).values

nfold = 5
kf = KFold(n_splits=nfold, shuffle=True, random_state=2020)
# Accumulates the test predictions, averaged over the folds
prediction1 = np.zeros((len(X_test_c), ))

for i, (train_index, valid_index) in enumerate(kf.split(X_train_c, y_train_c)):
    print("\nFold {}".format(i + 1))
    X_train, label_train = X_train_c[train_index], y_train_c[train_index]
    X_valid, label_valid = X_train_c[valid_index], y_train_c[valid_index]
    # Support vector regression on the 0/1 labels; outputs are rounded to classes later
    clf = SVR(kernel='rbf', C=1, gamma='scale')
    clf.fit(X_train, label_train)
    # Validation accuracy of this fold after rounding to 0/1
    x1 = clf.predict(X_valid)
    print("valid acc: {:.4f}".format(np.mean(np.round(x1) == label_valid)))
    y1 = clf.predict(X_test_c)
    prediction1 += y1 / nfold

# Round the averaged outputs to the nearest class label
result1 = np.round(prediction1)
# Test IDs are hardcoded as 210..313
id_ = range(210, 314)
df = pd.DataFrame({'ID': id_, 'CLASS': result1})
df.to_csv("baseline.csv", index=False)
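The baseline fits a regressor (SVR) on the 0/1 labels and rounds its output. As a point of comparison, here is a hedged sketch of the same cross-validation loop with a classifier instead. It swaps in `SVC` and `StratifiedKFold`, which the baseline does not use, and it runs on synthetic data from `make_classification` (210 samples, 240 features), so the accuracies it prints say nothing about the competition data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data with the shape described in the article.
X, y = make_classification(n_samples=210, n_features=240, n_informative=20,
                           n_classes=2, random_state=0)

# Stratified folds keep the 0/1 ratio roughly equal in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=2020)
fold_acc = []
for train_idx, valid_idx in skf.split(X, y):
    model = SVC(kernel="rbf", C=1, gamma="scale")
    model.fit(X[train_idx], y[train_idx])
    # For classifiers, score() returns the accuracy on the given fold.
    fold_acc.append(model.score(X[valid_idx], y[valid_idx]))

print("fold accuracy:", np.round(fold_acc, 3))
print("mean accuracy:", np.mean(fold_acc))
```

A classifier predicts the labels 0 and 1 directly, so no rounding step is needed; whether it scores better than the rounded SVR output on the real data would have to be checked by submission.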
Submission result
Submitting the baseline gives a score of 0.83653846154.
Because the data is split into 5 folds, the score of the submitted result may fluctuate slightly.
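How much a cross-validated score moves with the fold split can be estimated by repeating the split with different seeds. The sketch below is an illustration on synthetic data, not a measurement on the competition set: the data comes from `make_classification`, the seed list is arbitrary, and `SVC` is used as a stand-in so that accuracy can be computed directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (210 samples, 240 features, 2 classes).
X, y = make_classification(n_samples=210, n_features=240, n_informative=20,
                           n_classes=2, random_state=0)

seeds = [0, 1, 2, 3, 4]  # arbitrary seeds, chosen for illustration
mean_scores = []
for seed in seeds:
    # A different random_state shuffles the rows into different folds.
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(SVC(kernel="rbf", C=1, gamma="scale"), X, y, cv=kf)
    mean_scores.append(scores.mean())

print("mean accuracy per seed:", np.round(mean_scores, 3))
print("spread (max - min):", max(mean_scores) - min(mean_scores))
```

With only 210 samples, each validation fold holds about 42 rows, so a single changed prediction shifts a fold's accuracy by more than two percentage points; some spread across seeds is expected.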
Copyright notice
This article was written by [osc-u 2koojuzp]. Please include a link to the original when reposting. Thank you.