[Semi-supervised classification] Semi-supervised web page classification based on K-means and label propagation
2022-06-10 03:56:00 【FPGA and MATLAB】
1. Software version
MATLAB 2013b
2. Theoretical knowledge of this algorithm
First, the K-means algorithm and the local and global consistency (LGC) algorithm are integrated. This is not a simple patchwork of the two algorithms; rather, it combines the ideas of both. Following the provided algorithm idea, the basic steps are:
-----------------------------------------------------------------------------------------------------
Input: a data set of images (training and test samples in given proportions) in which only a small number of samples have been labeled with their categories, and each class has at least one labeled training sample.
-----------------------------------------------------------------------------------------------------
Step1: Compute the class means of the small set of labeled samples to obtain c (the number of categories) initial cluster centers;
Step2: Use the Euclidean distance to compute the distance from each unlabeled sample to the c initial centers, and assign each unlabeled sample to the category of its nearest center, splitting the data into c clusters;
Step3: Using the geodesic-distance similarity measure, select in each cluster the samples whose similarity is greater than or equal to 0.9 (the number selected differs from cluster to cluster), compute their mean as the c new center points, and obtain the c average radii;
Step4: Repeat Steps 2 and 3 until the c center points no longer change;
Step5: Label the samples that lie at or within the average radius of each center point;
Step6: Label the remaining unlabeled samples with the local and global consistency algorithm, where the labeled data consists only of the c center points (ready-made code exists for this);
Step7: After all samples are labeled, recompute the c center points.
Step8: For new test data, compute the similarity between each test sample and each center point and assign the label with the highest confidence.
-----------------------------------------------------------------------------------------------------
Output: the data set is split into labeled, unlabeled, and test subsets; the test set accounts for 30% and the labeled plus unlabeled data account for 70%. Using 10-fold cross-validation, output the F1-measure and the other metrics, together with the classified images. The labeled data form the training set, and each category is guaranteed at least one labeled training sample; the training set is then expanded in different proportions. The precision and recall reported for a data set are the means over the unlabeled data and the test data, and the test set is evaluated for different proportions of labeled data.
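As a rough illustration of the F1-measure evaluation described above, the snippet below computes per-class precision, recall, and F1 from a vector of true labels and a vector of predicted labels; the variable names (yTrue, yPred, c) are illustrative assumptions, not the project's actual code:

```matlab
% Per-class precision, recall and F1-measure from true labels yTrue and
% predicted labels yPred, both taking values 1..c (illustrative sketch).
prec = zeros(c, 1);  rec = zeros(c, 1);  f1 = zeros(c, 1);
for k = 1:c
    tp = sum(yPred == k & yTrue == k);   % true positives for class k
    fp = sum(yPred == k & yTrue ~= k);   % false positives
    fn = sum(yPred ~= k & yTrue == k);   % false negatives
    prec(k) = tp / max(tp + fp, 1);
    rec(k)  = tp / max(tp + fn, 1);
    f1(k)   = 2 * prec(k) * rec(k) / max(prec(k) + rec(k), eps);
end
```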
3. Introduction to programming
Here, the corresponding code is written for each step and the final result is output:
Step1: Compute the class means of the small set of labeled samples to obtain c (the number of categories) initial cluster centers;

This step corresponds to the step1 section of the code.
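A minimal sketch of this step, assuming X is an n-by-d feature matrix and y is an n-by-1 label vector where 0 means unlabeled and 1..c are the known classes (the variable names are assumptions, not the post's actual code):

```matlab
% Step 1: initial cluster centers = class means of the few labeled samples
c = max(y);                        % number of categories
centers = zeros(c, size(X, 2));    % one initial center per class
for k = 1:c
    centers(k, :) = mean(X(y == k, :), 1);   % mean of labeled samples of class k
end
```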
Step2: Use the Euclidean distance to compute the distance from each unlabeled sample to the c initial centers, and assign each unlabeled sample to the category of its nearest center, splitting the data into c clusters;

This step gives a preliminary classification of the data.
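A sketch of the nearest-center assignment under the same assumed variables as above (bsxfun is used so the code also runs on older releases such as MATLAB 2013b):

```matlab
% Step 2: assign each unlabeled sample to the nearest initial center (Euclidean)
idxU = find(y == 0);               % indices of the unlabeled samples
cluster = y;                       % cluster id per sample, known labels kept
for ii = idxU'
    d = sqrt(sum(bsxfun(@minus, centers, X(ii, :)).^2, 2));   % distance to each center
    [~, cluster(ii)] = min(d);                                % nearest center wins
end
```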
Step3: Using the geodesic-distance similarity measure, select in each cluster the samples whose similarity is greater than or equal to 0.9 (the number selected differs from cluster to cluster), compute their mean as the c new center points, and obtain the c average radii;
Step4: Repeat Steps 2 and 3 until the c center points no longer change;
Because cosine similarity is used in this implementation, both of these steps are carried out together.
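A sketch of Steps 3 and 4 combined, using cosine similarity with the 0.9 threshold mentioned in the text and the variables from the previous sketches; the stopping tolerance and iteration cap are illustrative assumptions:

```matlab
% Steps 3-4: shrink each cluster to its high-similarity core, recompute the
% centers and average radii, and repeat until the centers stop moving.
tol = 1e-6;  maxIter = 100;
radius = zeros(c, 1);
for it = 1:maxIter
    newCenters = centers;
    for k = 1:c
        Xk = X(cluster == k, :);                     % samples currently in cluster k
        ck = centers(k, :);
        sim = (Xk * ck') ./ (sqrt(sum(Xk.^2, 2)) * norm(ck));   % cosine similarity
        core = Xk(sim >= 0.9, :);                    % core samples (count varies per cluster)
        if ~isempty(core)
            newCenters(k, :) = mean(core, 1);        % new center = mean of the core
            dist = sqrt(sum(bsxfun(@minus, core, newCenters(k, :)).^2, 2));
            radius(k) = mean(dist);                  % average radius of cluster k
        end
    end
    moved = max(abs(newCenters(:) - centers(:)));
    centers = newCenters;
    if moved < tol, break; end                       % centers are fixed -> stop
    for ii = idxU'                                   % re-run the Step 2 assignment
        d = sqrt(sum(bsxfun(@minus, centers, X(ii, :)).^2, 2));
        [~, cluster(ii)] = min(d);
    end
end
```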
Step5: Label the samples that lie at or within the average radius of each center point;
For this step, the radius is calculated in the code; to simplify the later processing, the classified samples are marked with their cluster number and the unclassified ones are numbered 0.
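A sketch of that labeling rule with the assumed variables from the previous steps; `label` stays 0 for samples that remain unclassified:

```matlab
% Step 5: mark samples within the average radius of their center with the
% cluster number; everything else keeps the number 0.
label = zeros(size(X, 1), 1);
for k = 1:c
    idx = find(cluster == k);
    d = sqrt(sum(bsxfun(@minus, X(idx, :), centers(k, :)).^2, 2));
    label(idx(d <= radius(k))) = k;    % inside the radius -> labeled with cluster k
end
label(y > 0) = y(y > 0);               % originally labeled samples keep their labels
```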

Step6: Label the remaining unlabeled samples with the local and global consistency algorithm, where the labeled data consists only of the c center points;
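A minimal sketch of the local and global consistency (LGC) step, following the closed-form solution F = (I - alpha*S)^(-1) * Y with an RBF affinity; the labeled points are just the c center points, as the text states, and sigma and alpha are illustrative hyper-parameters, not the post's actual settings:

```matlab
% Step 6: LGC label propagation from the c center points to the samples
% that are still unlabeled (label == 0).
sigma = 1;  alpha = 0.99;                        % assumed hyper-parameters
Z = [centers; X(label == 0, :)];                 % centers first, then unlabeled samples
n = size(Z, 1);
D2 = zeros(n);                                   % pairwise squared Euclidean distances
for i = 1:n
    D2(i, :) = sum(bsxfun(@minus, Z, Z(i, :)).^2, 2)';
end
W = exp(-D2 / (2 * sigma^2));                    % RBF affinity matrix
W(1:n+1:end) = 0;                                % zero the diagonal
dsum = sum(W, 2);
S = bsxfun(@times, bsxfun(@times, W, 1 ./ sqrt(dsum)), 1 ./ sqrt(dsum)');  % D^(-1/2) W D^(-1/2)
Y = zeros(n, c);
Y(1:c, :) = eye(c);                              % each center carries its own class
F = (eye(n) - alpha * S) \ Y;                    % closed-form LGC solution
[~, pred] = max(F, [], 2);
label(label == 0) = pred(c + 1:end);             % propagate to the unlabeled samples
```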

Step7: After all samples are labeled, recompute the c center points.
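The recomputation itself is a short loop over the now fully labeled data (same assumed variables as in the earlier sketches):

```matlab
% Step 7: recompute the c center points from the fully labeled samples
for k = 1:c
    centers(k, :) = mean(X(label == k, :), 1);
end
```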

Step8: For new test data, compute the similarity between each test sample and each center point and assign the label with the highest confidence.
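A sketch of the test-time rule, assuming Xtest is an m-by-d matrix of test features; cosine similarity to the recomputed centers is used, consistent with the implementation note above:

```matlab
% Step 8: label each test sample with the class of the most similar center
m = size(Xtest, 1);
predTest = zeros(m, 1);
for ii = 1:m
    x = Xtest(ii, :);
    sim = (centers * x') ./ (sqrt(sum(centers.^2, 2)) * norm(x));   % cosine similarity
    [~, predTest(ii)] = max(sim);                                   % highest confidence wins
end
```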

The running results are as follows:


The classification results and classification accuracy of the test set:


First, the images are read in and features are extracted; the features are then classified, and finally the images to be tested are classified and the classification accuracy is computed.
The running results are as follows:


A09-18