sklearn feature_extraction.text: CountVectorizer / TfidfVectorizer
2022-07-06 12:00:00 【Want to be a kite】
sklearn.feature_extraction: feature extraction
The sklearn.feature_extraction module extracts features from raw data. It currently includes methods for extracting features from text and images.
User guide: for more information, see the Feature extraction section.
feature_extraction.DictVectorizer(*[, ...])
Transform lists of feature-value mappings into vectors.
feature_extraction.FeatureHasher([...])
Implement feature hashing, also known as the hashing trick.
From images
The sklearn.feature_extraction.image submodule gathers utilities for extracting features from images.
feature_extraction.image.extract_patches_2d(...)
Reshape a 2D image into a collection of patches.
feature_extraction.image.grid_to_graph(n_x, n_y)
Graph of the pixel-to-pixel connections.
feature_extraction.image.img_to_graph(img, *)
Graph of the pixel-to-pixel gradient connections.
feature_extraction.image.reconstruct_from_patches_2d(...)
Reconstruct the image from all of its patches.
feature_extraction.image.PatchExtractor(*[, ...])
Extract patches from a collection of images.
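The patch utilities listed above can be tried on a tiny synthetic array. This is only a sketch: the 4x4 "image" and the 2x2 patch size are arbitrary values chosen to keep the output small.

```python
import numpy as np
from sklearn.feature_extraction.image import (
    extract_patches_2d,
    reconstruct_from_patches_2d,
)

# A toy 4x4 single-channel "image" with values 0..15.
image = np.arange(16, dtype=float).reshape(4, 4)

# Extract every overlapping 2x2 patch: (4-2+1) * (4-2+1) = 9 patches.
patches = extract_patches_2d(image, (2, 2))

# Rebuild the image by averaging the overlapping patches.
restored = reconstruct_from_patches_2d(patches, (4, 4))

print(patches.shape)
print(np.allclose(image, restored))
```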
From text
The sklearn.feature_extraction.text submodule gathers utilities for building feature vectors from text documents.
feature_extraction.text.CountVectorizer(*[, ...])
Convert a collection of text documents into a token count matrix .
feature_extraction.text.HashingVectorizer(*)
Convert a collection of text documents into a matrix of token occurrences.
feature_extraction.text.TfidfTransformer(*)
Transform a count matrix into a normalized tf or tf-idf representation.
feature_extraction.text.TfidfVectorizer(*[, ...])
Convert a collection of raw documents into a matrix of TF-IDF features.
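The relationship between the classes above can be checked directly: with default settings, running CountVectorizer and then TfidfTransformer should give the same matrix as TfidfVectorizer alone. The sketch below uses a small sample corpus and default parameters only.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

# Two-step route: raw token counts, then tf-idf weighting.
counts = CountVectorizer().fit_transform(corpus)
two_step = TfidfTransformer().fit_transform(counts)

# One-step route: TfidfVectorizer does both internally.
one_step = TfidfVectorizer().fit_transform(corpus)

same = np.allclose(two_step.toarray(), one_step.toarray())
print(same)
```

The two-step route is useful when the count matrix is needed on its own, for example to inspect raw frequencies before weighting.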
sklearn.feature_selection: feature selection
The sklearn.feature_selection module implements feature selection algorithms. It currently includes univariate filter selection methods and the recursive feature elimination algorithm.
User guide: for more information, see the Feature selection section.
feature_selection.GenericUnivariateSelect([...])
Univariate feature selector with configurable strategy.
feature_selection.SelectPercentile([...])
Select features according to a percentile of the highest scores.
feature_selection.SelectKBest([score_func, k])
Select features according to the k highest scores.
feature_selection.SelectFpr([score_func, alpha])
Filter: select the p-values below alpha based on an FPR test.
feature_selection.SelectFdr([score_func, alpha])
Filter: select the p-values for an estimated false discovery rate.
feature_selection.SelectFromModel(estimator, *)
Meta-transformer for selecting features based on importance weights.
feature_selection.SelectFwe([score_func, alpha])
Filter: select the p-values corresponding to the family-wise error rate.
feature_selection.SequentialFeatureSelector(...)
Transformer that performs sequential feature selection.
feature_selection.RFE(estimator, *[, ...])
Feature ranking with recursive feature elimination.
feature_selection.RFECV(estimator, *[, ...])
Recursive feature elimination with cross-validation to select the number of features.
feature_selection.VarianceThreshold([threshold])
Feature selector that removes all low-variance features.
feature_selection.chi2(X, y)
Compute chi-squared statistics between each non-negative feature and the class.
feature_selection.f_classif(X, y)
Compute the ANOVA F-value.
feature_selection.f_regression(X, y, *[, ...])
Univariate linear regression tests returning F-statistics and p-values.
feature_selection.r_regression(X, y, *[, ...])
Compute Pearson's r for each feature and the target.
feature_selection.mutual_info_classif(X, y, *)
Estimate mutual information for a discrete target variable.
feature_selection.mutual_info_regression(X, y, *)
Estimate mutual information for a continuous target variable.
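To make the selectors above more concrete, here is a minimal sketch using two of them. The iris dataset, `k=2`, and the small hand-written matrix are illustrative choices only.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

# Univariate selection: keep the 2 features with the highest chi2 scores.
X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=2)
X_top2 = selector.fit_transform(X, y)
print(X.shape, "->", X_top2.shape)
print(selector.get_support())  # boolean mask of the kept columns

# Variance filter: with the default threshold, only constant columns go.
X_small = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
X_var = VarianceThreshold().fit_transform(X_small)
print(X_var)
```

SelectKBest needs the labels `y` because chi2 scores each feature against the class, whereas VarianceThreshold looks only at `X` and can be used for unsupervised problems.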
sklearn.feature_extraction.text.CountVectorizer
Example:
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> corpus = [
... 'This is the first document.',
... 'This document is the second document.',
... 'And this is the third one.',
... 'Is this the first document?',
... ]
>>> vectorizer = CountVectorizer()
>>> X = vectorizer.fit_transform(corpus)
>>> vectorizer.get_feature_names_out()
array(['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third',
'this'], ...)
>>> print(X.toarray())
[[0 1 1 1 0 0 1 0 1]
[0 2 0 1 0 1 1 0 1]
[1 0 0 1 1 0 1 1 1]
[0 1 1 1 0 0 1 0 1]]
>>> vectorizer2 = CountVectorizer(analyzer='word', ngram_range=(2, 2))
>>> X2 = vectorizer2.fit_transform(corpus)
>>> vectorizer2.get_feature_names_out()
array(['and this', 'document is', 'first document', 'is the', 'is this',
'second document', 'the first', 'the second', 'the third', 'third one',
'this document', 'this is', 'this the'], ...)
>>> print(X2.toarray())
[[0 0 1 1 0 0 1 0 0 0 0 1 0]
[0 1 0 1 0 1 0 1 0 0 1 0 0]
[1 0 0 1 0 0 0 0 1 1 0 1 0]
[0 0 1 0 1 0 1 0 0 0 0 0 1]]
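Once fitted, the vectorizer can be applied to unseen text. The sketch below refits on the same corpus and transforms one new sentence, which is made up for this example; words that were not in the training corpus are ignored.

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]
vectorizer = CountVectorizer()
vectorizer.fit(corpus)

# vocabulary_ maps each learned token to its column index.
doc_col = vectorizer.vocabulary_["document"]

# "a" is dropped by the default token pattern (it needs 2+ characters)
# and "new" is not in the learned vocabulary, so neither is counted.
new_counts = vectorizer.transform(["This is a new document."]).toarray()
print(doc_col)
print(new_counts)
```

Only `document`, `is`, and `this` receive a count of 1 in the new row; every other column stays 0.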

sklearn.feature_extraction.text.TfidfVectorizer
Example:
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> corpus = [
... 'This is the first document.',
... 'This document is the second document.',
... 'And this is the third one.',
... 'Is this the first document?',
... ]
>>> vectorizer = TfidfVectorizer()
>>> X = vectorizer.fit_transform(corpus)
>>> vectorizer.get_feature_names_out()
array(['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third',
'this'], ...)
>>> print(X.shape)
(4, 9)
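The weights behind that (4, 9) matrix can be reproduced by hand. The sketch below assumes the default TfidfVectorizer settings (smooth_idf=True, norm='l2', sublinear_tf=False), under which idf(t) = ln((1 + n) / (1 + df(t))) + 1 and each row is scaled to unit Euclidean length.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

# Raw term counts: rows are documents, columns are vocabulary terms.
counts = CountVectorizer().fit_transform(corpus).toarray()
n_docs = counts.shape[0]

# Document frequency: number of documents containing each term.
df = (counts > 0).sum(axis=0)

# Smoothed inverse document frequency (the default in scikit-learn).
idf = np.log((1 + n_docs) / (1 + df)) + 1

# Weight the counts by idf, then L2-normalize each row.
tfidf = counts * idf
tfidf = tfidf / np.linalg.norm(tfidf, axis=1, keepdims=True)

reference = TfidfVectorizer().fit_transform(corpus).toarray()
matches = np.allclose(tfidf, reference)
print(matches)
```

Terms that appear in every document, such as `is`, `the`, and `this` here, get the minimum idf of 1, so rarer terms like `second` end up with relatively larger weights.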