Industry Correlation of Stock Price Movements
2022-07-24 05:13:00 【手撕易拉罐】
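Preface:
A market overview usually starts from the gains and losses of industry indices and ETFs, because stocks in the same industry tend to move in the same direction and differ mainly in how strongly they move. This article groups A-share listed stocks by the Shenwan (SW) industry classification and computes pairwise correlations between the stocks within each group, as a reference for stock trading.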
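Shenwan industry classification:
The Shenwan classification is Shenwan Hongyuan Securities' grouping of A-share listed companies by their line of business, at three levels of granularity (level 1, 2 and 3). The scheme is revised slightly every few years. The link below points to an already-scraped Shenwan classification of A shares, which can be downloaded without spending CSDN points (C coins):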
https://download.csdn.net/download/weixin_37598719/85238607
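Price data from tushare identifies stocks by `ts_code` (e.g. `600000.SH`), whereas a classification table is typically keyed by the bare six-digit code. The following is a minimal sketch of that join, assuming the downloaded classification has already been loaded into a DataFrame with `code` and `sw3_class` columns (a hypothetical layout; the actual file may differ). Zero-padding matters because reading codes as integers turns `000001` into `1`, and the merge then silently fails to match. The function name `attach_sw_class` and the class labels are illustrative only.

```python
import pandas as pd


def attach_sw_class(prices: pd.DataFrame, sw: pd.DataFrame) -> pd.DataFrame:
    """Left-join Shenwan classes onto price rows via the six-digit stock code.

    Assumes `prices` has a `ts_code` column like "000001.SZ" and `sw` has a
    `code` column (hypothetical layout) that may have been parsed as integers.
    """
    prices = prices.copy()
    # "000001.SZ" -> "000001": keep the part before the exchange suffix
    prices["code"] = prices["ts_code"].str.split(".").str[0]
    sw = sw.copy()
    # Restore leading zeros lost when codes were read as integers (1 -> "000001")
    sw["code"] = sw["code"].astype(str).str.zfill(6)
    return prices.merge(sw, on="code", how="left")


# Placeholder data purely for illustration
prices = pd.DataFrame({"ts_code": ["000001.SZ", "600000.SH"], "close": [10.0, 7.5]})
sw = pd.DataFrame({"code": [1, 600000], "sw3_class": ["sw3_a", "sw3_b"]})
print(attach_sw_class(prices, sw))
```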
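Getting share prices of A-share listed companies:
After installing the tushare module and registering an account, you can retrieve many kinds of listed-company data, including but not limited to share prices. For registration and usage instructions, see the tushare website: https://tushare.pro/register?reg=510562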
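The script further down reads a `daily_price.csv` with `ts_code`, `trade_date` and `close` columns. A minimal sketch of producing that file with tushare's Pro interface is shown here, assuming a valid token and the `pro.daily(ts_code=..., start_date=..., end_date=...)` endpoint; the helper name `download_daily_prices` is hypothetical, and taking the client as a parameter keeps it testable without network access.

```python
import pandas as pd


def download_daily_prices(pro, ts_codes, start_date, end_date, path="daily_price.csv"):
    """Fetch daily bars for each ts_code and save ts_code/trade_date/close to CSV.

    `pro` is expected to be a tushare Pro client, e.g. ts.pro_api("YOUR_TOKEN");
    any object with a compatible .daily(ts_code=..., start_date=..., end_date=...)
    method works, which also makes the function easy to test offline.
    """
    frames = []
    for code in ts_codes:
        # One request per stock; tushare may rate-limit calls depending on the account
        df = pro.daily(ts_code=code, start_date=start_date, end_date=end_date)
        frames.append(df[["ts_code", "trade_date", "close"]])
    result = pd.concat(frames, ignore_index=True)
    result.to_csv(path, index=False)
    return result


# Usage (requires a registered tushare token and network access):
# import tushare as ts
# pro = ts.pro_api("YOUR_TOKEN")
# download_daily_prices(pro, ["600000.SH", "000001.SZ"], "20220101", "20220630")
```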
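Correlation analysis:
The methods used are:
1. Dynamic time warping (DTW): works on sequences of different lengths and yields a distance d; the smaller d is, the more similar the two series are.
2. Spearman correlation coefficient: computed on ranks; its range is [-1, 1], a positive value means positive correlation, and the larger the value, the stronger the correlation (likewise below).
3. Pearson correlation coefficient: sensitive to outliers. The code standardizes the data first, although Pearson's r itself does not change when each column is standardized; standardization matters mainly for the DTW distance.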
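Two properties from the list above can be checked numerically: Spearman's coefficient is Pearson's coefficient computed on ranks, so it only sees the ordering of prices, and Pearson's r is unchanged when each series is z-scored (the same transformation StandardScaler applies). The prices below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up closes for two stocks that mostly rise together; "b" has one outlier spike
prices = pd.DataFrame({
    "a": [10.0, 10.5, 11.0, 11.2, 11.8, 12.0],
    "b": [5.0, 5.2, 5.1, 5.6, 20.0, 5.9],
})

pearson = prices.corr().loc["a", "b"]                     # pulled down by the outlier
spearman = prices.corr(method="spearman").loc["a", "b"]   # uses only the ordering
spearman_via_ranks = prices.rank().corr().loc["a", "b"]   # Pearson on ranks

# Z-score each column; Pearson's r does not change under this rescaling
zscored = (prices - prices.mean()) / prices.std()
pearson_zscored = zscored.corr().loc["a", "b"]

print(f"Pearson {pearson:.3f}, Spearman {spearman:.3f}")
print(np.isclose(spearman, spearman_via_ranks), np.isclose(pearson, pearson_zscored))
```

Here Spearman comes out clearly higher than Pearson: the spike in `b` distorts the Pearson coefficient but moves `b` by only one rank.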
The theory behind these methods is covered in many references; the Python code follows.
Correlation functions:
import itertools

import pandas as pd
from sklearn.preprocessing import StandardScaler
from dtw import accelerated_dtw  # "dtw" package on PyPI (pip install dtw)


def preprocess(df, fill=True, standard=True):
    """Return a copy of df with gaps filled and each price column standardized."""
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified in place
    # Fill missing values (e.g. suspended trading days): forward fill, then back fill leading gaps.
    # DTW does not strictly need this, since it can compare series of different lengths.
    if fill:
        df = df.ffill().bfill()
    # Z-score each column (needed for DTW, harmless for Pearson) so price levels are comparable
    if standard:
        price_cols = [col for col in df.columns if col != "trade_date"]
        df[price_cols] = StandardScaler().fit_transform(df[price_cols])
    return df


def pearsonr_spearman_cor(df, fill=True, standard=True):
    """Print the Pearson and Spearman correlation matrices between the columns of df."""
    df = preprocess(df, fill, standard)
    prices = df[[col for col in df.columns if col != "trade_date"]]
    # Pearson (the default method of DataFrame.corr)
    print(prices.corr())
    # sns.heatmap(prices.corr(), annot=True)
    # plt.show()
    # Spearman rank correlation
    print(prices.corr(method="spearman"))
    # sns.heatmap(prices.corr(method="spearman"), annot=True)
    # plt.show()


def dtw_cor(df, fill=True, standard=True):
    """Print pairwise DTW distances between the columns of df, rescaled to [1, -1]."""
    df = preprocess(df, fill, standard)
    columns = [col for col in df.columns if col != "trade_date"]
    distances = {}
    for col1, col2 in itertools.combinations(columns, 2):
        # DTW accepts sequences of different lengths, so any remaining NaNs are simply dropped
        x = df[col1].dropna().to_numpy().reshape(-1, 1)
        y = df[col2].dropna().to_numpy().reshape(-1, 1)
        d, cost_matrix, acc_cost_matrix, path = accelerated_dtw(x, y, dist="euclidean")
        # plt.imshow(acc_cost_matrix.T, origin="lower", cmap="gray", interpolation="nearest")
        # plt.plot(path[0], path[1], "w")
        # plt.xlabel(col1)
        # plt.ylabel(col2)
        # plt.title(f"DTW Minimum Path with minimum distance: {np.round(d, 2)}")
        # plt.show()
        distances[(col1, col2)] = round(float(d), 2)
    if not distances:  # fewer than two stocks: nothing to compare
        return
    # Rescale distances linearly from [min, max] to [1, -1]: the closest pair maps to 1
    d_min, d_max = min(distances.values()), max(distances.values())
    span = (d_max - d_min) or 1.0  # avoid dividing by zero when all distances are equal
    # Print as a symmetric matrix
    res_df = pd.DataFrame(index=columns, columns=columns, dtype=float)
    for (col1, col2), d in distances.items():
        scaled = 1 - 2 * (d - d_min) / span
        res_df.loc[col1, col2] = scaled
        res_df.loc[col2, col1] = scaled
    print(res_df)
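`dtw_cor` delegates the distance itself to `accelerated_dtw` from the `dtw` package. The recurrence behind a DTW distance can be sketched in plain NumPy as D[i, j] = cost(x_i, y_j) + min(D[i-1, j], D[i, j-1], D[i-1, j-1]) with an absolute-difference cost. This is a textbook O(nm) version, not the library's code, so its values may differ from `accelerated_dtw` (for example by normalization), and it omits the cost matrices and warping path the library also returns. The second helper reproduces the linear rescaling that `dtw_cor` applies (smallest distance to 1, largest to -1). Both function names are hypothetical.

```python
import numpy as np


def dtw_distance(x, y):
    """Classic dynamic-time-warping distance between two 1-D sequences.

    Matching x[i] with y[j] costs |x[i] - y[j]|; each cell adds the cheapest of
    its three predecessor cells (step in x only, step in y only, or both).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost with an infinite border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]


def rescale_distances(distances):
    """Map distances linearly from [min, max] onto [1, -1] (smallest -> 1)."""
    d = np.asarray(distances, dtype=float)
    span = d.max() - d.min()
    if span == 0:  # all distances equal: treat every pair as equally similar
        return np.ones_like(d)
    return 1 - 2 * (d - d.min()) / span


# Sequences of different lengths can be compared; repeating a value costs nothing
print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))
print(rescale_distances([0.5, 1.0, 2.5]))
```

The first call returns 0 because the repeated `1` in the second sequence is absorbed by warping; a Pearson or Spearman correlation could not even be computed on these two series, since their lengths differ.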
The complete script is below (it calls the two functions defined above):
import itertools  # used by dtw_cor() above

import pandas as pd
from sklearn.preprocessing import StandardScaler  # used by the correlation functions above
from dtw import accelerated_dtw  # used by dtw_cor() above
from industry_index import sw_class  # the author's own helper (not shown) that parses sw3.txt

# pearsonr_spearman_cor() and dtw_cor() are the functions defined above

# Read daily price data: one row per stock per trading day
total_daily_df = pd.read_csv("daily_price.csv", usecols=["ts_code", "trade_date", "close"])
# Load the Shenwan classification of each stock
# columns: sw1_class  sw2_class  sw3_class  code  name
sw_filepath = "sw3.txt"
df_sw = sw_class(sw_filepath)
# Merge on the six-digit code: "600000.SH" -> "600000"
total_daily_df["code"] = total_daily_df["ts_code"].str[:-3]
total_daily_df = total_daily_df.merge(df_sw, on="code", how="left")
total_daily_df = total_daily_df.sort_values(["ts_code", "trade_date"])
total_daily_df = total_daily_df.drop_duplicates(subset=["ts_code", "trade_date"])

# Correlation analysis within each industry, grouped by Shenwan level-3 class
for sw_code, daily_df in total_daily_df.groupby("sw3_class"):
    # Long -> wide: one row per trade_date (sorted), one close-price column per stock.
    # trade_date becomes the index, so it is no longer a data column.
    daily_df_T = daily_df.pivot(index="trade_date", columns="ts_code", values="close").sort_index()
    if daily_df_T.shape[1] < 2:  # an industry with a single stock has no pairs to compare
        continue
    # Pearson / Spearman correlation
    pearsonr_spearman_cor(daily_df_T, fill=True, standard=True)
    # DTW similarity
    dtw_cor(daily_df_T, fill=True, standard=True)
    print("*********** end sw3 industry {} ***********".format(sw_code))
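To try the per-industry loop without tushare data or the `sw3.txt` file, the sketch below builds a small synthetic long-format table (made-up stock codes, industries and prices, purely for illustration), reshapes each industry to one column per stock with `DataFrame.pivot`, and computes the Spearman matrix per group. Two of the hypothetical stocks share a common random-walk driver, so their correlation should be high; the lone stock in the second industry is skipped.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-03", periods=60, freq="B").strftime("%Y%m%d")

# Hypothetical stocks: two share one driver ("ind_A"), one moves independently ("ind_B")
common = rng.normal(0, 1, len(dates)).cumsum()
series = {
    ("S1.SH", "ind_A"): 10 + common + rng.normal(0, 0.3, len(dates)),
    ("S2.SZ", "ind_A"): 20 + 2 * common + rng.normal(0, 0.3, len(dates)),
    ("S3.SH", "ind_B"): 15 + rng.normal(0, 1, len(dates)).cumsum(),
}
rows = [
    {"ts_code": code, "sw3_class": ind, "trade_date": d, "close": c}
    for (code, ind), closes in series.items()
    for d, c in zip(dates, closes)
]
long_df = pd.DataFrame(rows)

results = {}
for industry, group in long_df.groupby("sw3_class"):
    # One row per trading day, one close-price column per stock
    wide = group.pivot(index="trade_date", columns="ts_code", values="close").sort_index()
    if wide.shape[1] < 2:  # a single stock has nothing to be compared with
        continue
    results[industry] = wide.corr(method="spearman")
    print(industry)
    print(results[industry].round(2))
```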