Sklearn Feature Engineering (summary)

2022-06-28 05:51:00 【bingbangx】

1. Feature Engineering

Dictionary feature extraction

from sklearn.feature_extraction import DictVectorizer # Feature extracted packages
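
As a minimal sketch of how DictVectorizer might be used: the records below are hypothetical and invented for illustration. String-valued fields are one-hot encoded, while numeric fields are passed through unchanged.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical sample records, invented for illustration
records = [
    {"city": "Beijing", "temperature": 30},
    {"city": "Shanghai", "temperature": 28},
    {"city": "Shenzhen", "temperature": 33},
]

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix
vectorizer = DictVectorizer(sparse=False)
features = vectorizer.fit_transform(records)

# One column per distinct "city=<value>" pair, plus one numeric column
print(vectorizer.feature_names_)
print(features)
```

Leaving sparse at its default of True is usually the better choice when there are many categories, because most entries in the one-hot columns are zero.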

Text feature extraction and jieba word segmentation

Text feature extraction is used in tasks such as document classification, spam classification, and news classification. The text is represented by whether each word occurs and by how probable (how important) each word is.

To count the words in Chinese text, the text must first be segmented into words, because Chinese is written without spaces between words. The jieba library is used for this.
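
A rough sketch of the counting step follows. The sentences are hypothetical and are segmented by hand so that the example runs without jieba installed; with jieba, the same space-separated form would normally come from something like " ".join(jieba.cut(sentence)). CountVectorizer from sklearn.feature_extraction.text then counts the words.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sentences, already segmented by hand into space-separated words.
# With jieba installed, this form would come from " ".join(jieba.cut(sentence)).
segmented = [
    "我 喜欢 机器 学习",
    "机器 学习 需要 特征 工程",
]

# The default token pattern keeps only tokens of two or more characters,
# so the single-character word "我" is dropped from the vocabulary.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(segmented)

print(sorted(vectorizer.vocabulary_))  # the words that became columns
print(counts.toarray())                # one row per sentence, one column per word
```

If single-character words matter for the task, the token_pattern parameter of CountVectorizer can be changed to keep them.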

TF-IDF text feature extraction

TF-IDF is a weighting technique commonly used in information retrieval and text mining. It is a statistical measure of how important a word is to one document within a collection of documents.

from sklearn.feature_extraction.text import TfidfVectorizer
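
A small sketch of TfidfVectorizer on a hypothetical toy corpus (the sentences are invented for illustration). It shows that a word found in fewer documents receives a larger inverse document frequency weight than a word found in more of them.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, invented for illustration
corpus = [
    "machine learning needs feature engineering",
    "feature extraction turns text into numbers",
    "spam classification uses text features",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)  # sparse matrix, one row per document

# "machine" occurs in one document and "feature" in two,
# so "machine" gets the larger IDF weight.
idf_machine = vectorizer.idf_[vectorizer.vocabulary_["machine"]]
idf_feature = vectorizer.idf_[vectorizer.vocabulary_["feature"]]
print(idf_machine, idf_feature)
print(tfidf.toarray().round(2))
```

With the default settings each row is scaled to unit length, so the values are comparable across documents of different lengths.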

Feature Engineering - Normalization

Normalization

X = (x - min) / (max - min)

Here max and min are the maximum and minimum values of a column, and x is the value before normalization.

from sklearn.preprocessing import MinMaxScaler

# Scale each column to the range [0, 1]
scaler = MinMaxScaler()
data = [
    [180, 75, 35], [175, 80, 17], [159, 50, 46], [149, 79, 45]
]
result = scaler.fit_transform(data)
print(result)
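
The formula X = (x - min) / (max - min) can also be checked by hand. The sketch below applies it column by column with NumPy to the same data and compares the result with MinMaxScaler. The names col_min, col_max, and manual are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array(
    [[180, 75, 35], [175, 80, 17], [159, 50, 46], [149, 79, 45]],
    dtype=float,
)

# Minimum and maximum of each column (axis=0 reduces over the rows)
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Apply X = (x - min) / (max - min) to every column
manual = (data - col_min) / (col_max - col_min)

print(manual)
# True if the hand-computed values agree with MinMaxScaler
print(np.allclose(manual, MinMaxScaler().fit_transform(data)))
```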

Standardization

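from sklearn.preprocessing import StandardScaler  # Standardization

# Reuses the data list from the normalization example above;
# each column is rescaled to zero mean and unit variance
scaler = StandardScaler()
result = scaler.fit_transform(data)
print(result)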
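
Standardization rescales each column as X = (x - mean) / std, where mean and std are the mean and standard deviation of that column. As a rough check, the sketch below computes this by hand with NumPy and compares it with StandardScaler. The names mean, std, and manual are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array(
    [[180, 75, 35], [175, 80, 17], [159, 50, 46], [149, 79, 45]],
    dtype=float,
)

mean = data.mean(axis=0)
# Population standard deviation (ddof=0), which is what StandardScaler uses
std = data.std(axis=0)

# Apply X = (x - mean) / std to every column
manual = (data - mean) / std

print(manual)
# True if the hand-computed values agree with StandardScaler
print(np.allclose(manual, StandardScaler().fit_transform(data)))
```

Unlike min-max normalization, the result is not confined to a fixed range, but a single extreme value distorts it less.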

Feature Engineering - Dimensionality reduction

Principal component analysis

Principal component analysis (PCA) is a statistical method. It uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables. The transformed variables are called principal components.

Two things to remember about principal components:

The covariance between the features after dimensionality reduction is 0, which means the features are linearly uncorrelated: no feature varies linearly with any other feature.
The variance of each feature should be as large as possible.

from sklearn.decomposition import PCA

def pca_decomposition():
    # n_components can be:
    #   1. a float between 0 and 1: the fraction of variance to retain
    #   2. an integer: the exact number of dimensions to keep,
    #      at most min(n_samples, n_features)
    pca = PCA(n_components=2)
    result = pca.fit_transform(
        [
            [4, 2, 76, 9],
            [1, 192, 1, 56],
            [34, 5, 20, 90]
        ])
    print(result)

pca_decomposition()
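
The two properties listed earlier can be checked numerically. The sketch below reuses the same three samples, computes the covariance matrix of the reduced features with NumPy, and prints the share of variance each component explains. The names reduced and cov are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array(
    [[4, 2, 76, 9], [1, 192, 1, 56], [34, 5, 20, 90]],
    dtype=float,
)

pca = PCA(n_components=2)
reduced = pca.fit_transform(data)

# Covariance matrix of the reduced features (columns are the features).
# The off-diagonal entries should be zero up to floating-point error.
cov = np.cov(reduced, rowvar=False)
print(cov.round(6))

# Fraction of the total variance captured by each principal component,
# ordered from largest to smallest
print(pca.explained_variance_ratio_)
```

The diagonal of the covariance matrix holds the variance of each component. The first component has the largest variance, which is the sense in which PCA keeps the variance as large as possible.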

Copyright notice
This article was written by [bingbangx]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/179/202206280523024035.html
