Simple example of logistic regression for machine learning
2022-06-11 22:06:00 【Chrn morning】
Logistic regression is a linear classifier and is suited to linearly separable problems. The main idea behind classifying with logistic regression is to fit a regression formula for the classification boundary from the existing data and then use that boundary to classify new samples. The word "regression" here comes from "best fit": the goal is to find the best-fitting set of parameters. Training a logistic regression classifier therefore amounts to finding those best-fitting parameters, and this is done with an optimization method.
For example, in the two-class case the classifier outputs 0 or 1, and the function behind this binary decision is the sigmoid function.


As the figure shows, when x is 0 the sigmoid value is 0.5; as x increases, the sigmoid value approaches 1, and as x decreases, it approaches 0.
So to implement a logistic regression classifier, multiply each feature by a regression coefficient, add up all of these products, and substitute the sum into the sigmoid function to obtain a number in the range 0 to 1. Any result above 0.5 is classified as class 1, and anything below 0.5 as class 0. In this sense, logistic regression can be viewed as a kind of probability estimate.
For example, for the linear function f(x) = a*x + b, to compress the output into the interval (0, 1) we introduce the logistic function, which gives p(x) = 1 / (1 + e^(-(a*x + b))).
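Below is a minimal sketch, not from the original article, of the sigmoid function and the 0.5 decision threshold described above; it assumes numpy, and the coefficient values and the sample are made up purely for illustration.

import numpy as np

def sigmoid(z):
    # The logistic (sigmoid) function: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # exactly 0.5 at x = 0
print(sigmoid(6.0))    # close to 1 for large positive x
print(sigmoid(-6.0))   # close to 0 for large negative x

# Decision rule: weighted sum of the features -> sigmoid -> threshold at 0.5
w = np.array([0.8, -1.2])     # illustrative regression coefficients
b = 0.3                       # illustrative intercept
x = np.array([1.5, 0.4])      # one made-up sample with two features
p = sigmoid(np.dot(w, x) + b)
label = 1 if p > 0.5 else 0
print(p, label)

For this made-up sample the weighted sum is positive, so the sigmoid output exceeds 0.5 and the sample is assigned to class 1.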
The general workflow of logistic regression is: collect the data, prepare the data, analyze the data, train the algorithm, test the algorithm, and use the algorithm.
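As noted above, the training step means finding the best-fitting parameters by optimization. The sketch below is not part of the original article; it fits the coefficients with batch gradient descent on the average log-loss, and the function name fit_logistic, the learning rate, the iteration count, and the toy data are all illustrative assumptions.

import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    # Fit the weights (first entry is the bias) by batch gradient descent on the average log-loss
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a column of 1s for the bias term
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))         # predicted probabilities via the sigmoid
        w -= lr * (X.T @ (p - y)) / len(y)         # step against the gradient of the loss
    return w

# Tiny illustrative data: one feature, class 1 when the feature value is large
X_toy = np.array([[0.5], [1.0], [3.0], [4.0]])
y_toy = np.array([0, 0, 1, 1])
print(fit_logistic(X_toy, y_toy))   # learned [bias, coefficient]

In the practice exercise below, the scikit-learn LogisticRegression class handles this optimization internally.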
The following walks through a benign/malignant breast cancer prediction exercise in Python.
The original data can be downloaded from: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data
import pandas as pd
import numpy as np
# Create the feature list (the column names of the Wisconsin breast cancer dataset)
column_names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
                'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data',
                   names=column_names)
# Descriptive statistics of the raw data
data.describe()
# Data preprocessing
# Replace '?' (the marker this dataset uses for missing entries) with the standard missing value NaN
data = data.replace(to_replace='?', value=np.nan)
# Drop rows that contain missing values
data = data.dropna()
# Descriptive statistics of the cleaned data
data.describe()
# Use 25% of the data as the test set and 75% as the training set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data[column_names[1:10]], data[column_names[10]],
                                                     test_size=0.25, random_state=33)
# Check the class distribution of the training samples
y_train.value_counts()
# Check the class distribution of the test samples
y_test.value_counts()
# Train a logistic regression model on the data prepared above
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# Standardize the data
# StandardScaler rescales each column of the dataset (i.e. each feature) to zero mean and unit variance
s = StandardScaler()
x_train = s.fit_transform(x_train)
# Use transform (not fit_transform) on the test set so it is scaled with the training-set statistics
x_test = s.transform(x_test)
# Initialize the LogisticRegression classifier
lr = LogisticRegression()
# Train the model
lr.fit(x_train, y_train)
# Predict on the test set
lr_pred = lr.predict(x_test)
# Model performance evaluation
from sklearn.metrics import classification_report
# classification_report prints a text report of the main classification metrics
# Use the classifier's built-in scoring function (accuracy)
print('Accuracy of LR Classifier:', lr.score(x_test, y_test))
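The classification_report imported above is not used in the original listing; one possible way to print it is sketched below. The target names assume this dataset's convention of labeling benign tumors as 2 and malignant tumors as 4.

# Precision, recall and F1-score per class (assumes class 2 = benign, class 4 = malignant)
print(classification_report(y_test, lr_pred, target_names=['Benign', 'Malignant']))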