Generalized linear model (logistic regression, Poisson regression)
2022-06-26 04:50:00 【I am a little monster】
Linear regression is not suitable for every problem. Some response variables are binary (for example, positive/negative) or count data. Generalized linear models (GLMs) can model such data while still using a linear combination of the independent variables.
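As a rough illustration of that idea, the sketch below builds one linear predictor and passes it through two inverse link functions. The design matrix and coefficients are made-up values, not estimates from the data used later in this post.

```python
import numpy as np

# Hypothetical design matrix (first column is the intercept) and coefficients.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
beta = np.array([-1.0, 0.5])

# Every GLM starts from the same linear predictor eta = X @ beta.
eta = X @ beta

# Logistic regression: the inverse logit link maps eta to a probability in (0, 1).
prob = 1.0 / (1.0 + np.exp(-eta))

# Poisson regression: the inverse log link maps eta to a positive expected count.
rate = np.exp(eta)

print(eta)
print(prob)
print(rate)
```

Only the link function and the assumed distribution of the response change between the two models; the linear part stays the same.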
Logistic regression
When the response variable is binary, logistic regression is often used to model the data.
The data below is the acs_ny.csv file from the Pandas for Everyone data set. Download it here if necessary: https://download.csdn.net/download/qq_57099024/79301082
import pandas as pd
d=pd.read_csv('D:/pandas Flexible use /pandas_for_everyone-master/data/acs_ny.csv')
print(d.columns)
print('@'*66)# Output special symbols to distinguish between two outputs
print(d.head())
''' Output:
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
       'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
       'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
       'HeatingFuel', 'Insurance', 'Language'],
      dtype='object')
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
   Acres  FamilyIncome   FamilyType  NumBedrooms  NumChildren  NumPeople  \
0   1-10           150      Married            4            1          3
1   1-10           180  Female Head            3            2          4
2   1-10           280  Female Head            4            0          2
3   1-10           330  Female Head            2            1          2
4   1-10           330    Male Head            3            1          2

   NumRooms         NumUnits  NumVehicles  NumWorkers   OwnRent    YearBuilt  \
0         9  Single detached            1           0  Mortgage    1950-1959
1         6  Single detached            2           0    Rented  Before 1939
2         8  Single detached            3           1  Mortgage    2000-2004
3         4  Single detached            1           0    Rented    1950-1959
4         5  Single attached            1           0  Mortgage  Before 1939

   HouseCosts  ElectricBill  FoodStamp  HeatingFuel  Insurance        Language
0        1800            90         No          Gas       2500         English
1         850            90         No          Oil          0         English
2        2600           260         No          Oil       6600  Other European
3        1800           140         No          Oil          0         English
4         860           150         No          Gas        660         Spanish
'''
Next, bin FamilyIncome into two categories:
# Use cut to bin FamilyIncome and create a binary response variable:
# 0 for income up to 150,000 and 1 for income above 150,000
d['income_15w'] = pd.cut(d['FamilyIncome'], [0, 150000, d['FamilyIncome'].max()], labels=[0, 1])
d['income_15w'] = d['income_15w'].astype(int)
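To see how the bin edges behave, here is a small sketch on a hypothetical income series (the values are invented to sit just below, on, and above the 150,000 edge). It also shows one thing to watch for: by default the lowest edge is excluded.

```python
import pandas as pd

# Hypothetical incomes, chosen to sit below, on, and above the 150,000 edge.
income = pd.Series([50, 150000, 150001, 400000])

# Bins are right-inclusive by default: (0, 150000] -> 0, (150000, max] -> 1.
binned = pd.cut(income, [0, 150000, income.max()], labels=[0, 1]).astype(int)
print(binned.tolist())  # prints [0, 0, 1, 1]

# A value equal to the lowest edge falls outside (0, 150000] and becomes NaN
# unless include_lowest=True is passed.
edge = pd.cut(pd.Series([0, 100]), [0, 150000], labels=[0])
print(edge.isna().tolist())  # prints [True, False]
```

If the income column contained a 0, the astype(int) step would fail on the resulting NaN, so it is worth checking the minimum before binning.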
Use statsmodels
import statsmodels.formula.api as smf
model=smf.logit('income_15w~HouseCosts+NumWorkers+OwnRent+NumBedrooms+FamilyType',data=d)
results=model.fit()
print(results.summary())
Optimization terminated successfully.
         Current function value: 0.391651
         Iterations 7
                           Logit Regression Results
==============================================================================
Dep. Variable:             income_15w   No. Observations:                22745
Model:                          Logit   Df Residuals:                    22737
Method:                           MLE   Df Model:                            7
Date:                Sat, 05 Feb 2022   Pseudo R-squ.:                  0.2078
Time:                        08:46:18   Log-Likelihood:                -8908.1
converged:                       True   LL-Null:                       -11244.
Covariance Type:            nonrobust   LLR p-value:                     0.000
===========================================================================================
                                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -5.8081      0.120    -48.456      0.000     -6.043     -5.573
OwnRent[T.Outright]           1.8276      0.208      8.782      0.000      1.420      2.236
OwnRent[T.Rented]            -0.8763      0.101     -8.647      0.000     -1.075     -0.678
FamilyType[T.Male Head]       0.2874      0.150      1.913      0.056     -0.007      0.582
FamilyType[T.Married]         1.3877      0.088     15.781      0.000      1.215      1.560
HouseCosts                    0.0007   1.72e-05     42.453      0.000      0.001      0.001
NumWorkers                    0.5873      0.026     22.393      0.000      0.536      0.639
NumBedrooms                   0.2365      0.017     13.985      0.000      0.203      0.270
===========================================================================================
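The coefficients in the Logit summary are on the log-odds scale, which is hard to read directly. A common way to interpret them is to exponentiate them into odds ratios. The sketch below does this with values copied by hand from the table above, so it runs without the data file.

```python
import numpy as np
import pandas as pd

# Coefficients copied from the Logit summary above (log-odds scale).
coef = pd.Series({
    'OwnRent[T.Outright]': 1.8276,
    'OwnRent[T.Rented]': -0.8763,
    'FamilyType[T.Male Head]': 0.2874,
    'FamilyType[T.Married]': 1.3877,
    'NumWorkers': 0.5873,
    'NumBedrooms': 0.2365,
})

# exp(coef) is the multiplicative change in the odds of income_15w == 1
# for a one-unit increase (or versus the reference category for dummies).
odds_ratios = np.exp(coef)
print(odds_ratios.round(3))
```

With the fitted model in memory, the same numbers should come from np.exp(results.params). For example, each additional worker multiplies the odds of being in the higher income group by roughly 1.8, holding the other predictors fixed.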
Use sklearn
predictors=pd.get_dummies(d[['HouseCosts','NumWorkers','OwnRent','NumBedrooms','FamilyType']],drop_first=True)
from sklearn import linear_model
lr=linear_model.LogisticRegression()
results=lr.fit(X=predictors,y=d['income_15w'])
print(results.coef_)
print('-*-'*10)
print(results.intercept_)
[[ 5.86894916e-04  7.32489391e-01  2.86764784e-01  7.17542587e-02
  -2.13282748e+00 -1.03910262e+00  2.63647146e-01]]
-*--*--*--*--*--*--*--*--*--*-
[-4.86108187]
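The sklearn output is a bare array, and its values need not match statsmodels, because LogisticRegression applies an L2 penalty by default (C=1.0). The sketch below, on synthetic data with hypothetical column names (workers, tenure), shows one way to label the coefficients and to weaken the penalty with a large C. It is an illustration, not a re-run of the fit above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the ACS data; column names and values are hypothetical.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    'workers': rng.integers(0, 4, size=n).astype(float),
    'tenure': rng.choice(['Mortgage', 'Outright', 'Rented'], size=n),
})
log_odds = -1.0 + 0.8 * df['workers'] - 1.5 * (df['tenure'] == 'Rented')
df['y'] = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

# dtype=float keeps the dummy columns numeric rather than boolean.
X = pd.get_dummies(df[['workers', 'tenure']], drop_first=True, dtype=float)

# Default fit: L2 penalty with C=1.0, so coefficients are shrunk toward zero.
default_fit = LogisticRegression().fit(X, df['y'])
# A very large C makes the penalty negligible, closer to a plain maximum-likelihood fit.
weak_fit = LogisticRegression(C=1e9, max_iter=1000).fit(X, df['y'])

# Pair each coefficient with its column name instead of reading a bare array.
coef_table = pd.DataFrame(
    {'default_C': default_fit.coef_[0], 'large_C': weak_fit.coef_[0]},
    index=X.columns,
)
print(coef_table)
```

Applied to the post's own predictors, indexing the coefficients by predictors.columns would make the sklearn array directly comparable with the statsmodels table.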
Poisson regression
Poisson regression is often used to analyze count data.
Use statsmodels
results=smf.poisson('NumChildren~FamilyIncome+FamilyType+OwnRent',data=d).fit()
print(results.summary())
Optimization terminated successfully.
         Current function value: nan
         Iterations 1
                          Poisson Regression Results
==============================================================================
Dep. Variable:            NumChildren   No. Observations:                22745
Model:                        Poisson   Df Residuals:                    22739
Method:                           MLE   Df Model:                            5
Date:                Sat, 05 Feb 2022   Pseudo R-squ.:                     nan
Time:                        09:05:28   Log-Likelihood:                    nan
converged:                       True   LL-Null:                       -30977.
Covariance Type:            nonrobust   LLR p-value:                       nan
===========================================================================================
                                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                        nan        nan        nan        nan        nan        nan
FamilyType[T.Male Head]          nan        nan        nan        nan        nan        nan
FamilyType[T.Married]            nan        nan        nan        nan        nan        nan
OwnRent[T.Outright]              nan        nan        nan        nan        nan        nan
OwnRent[T.Rented]                nan        nan        nan        nan        nan        nan
FamilyIncome                     nan        nan        nan        nan        nan        nan
===========================================================================================
Note that every estimate in this run is nan and the optimizer stopped after a single iteration, so the fit failed numerically and this table cannot be interpreted.
Negative binomial regression
If the assumptions of Poisson regression do not hold (for example, the data is overdispersed, meaning the variance is larger than the mean), negative binomial regression can be used instead.
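A quick, informal way to look for overdispersion is to compare the variance of the counts with their mean, since a Poisson variable has variance equal to its mean. The sketch below uses synthetic counts rather than the ACS data, and dispersion_ratio is a helper name introduced here for illustration.

```python
import numpy as np

# Synthetic counts (not the ACS data): one Poisson sample and one negative
# binomial sample, both with a mean of about 2.
rng = np.random.default_rng(0)
poisson_counts = rng.poisson(lam=2.0, size=5000)
negbin_counts = rng.negative_binomial(n=2, p=0.5, size=5000)

def dispersion_ratio(counts):
    """Return variance / mean; values well above 1 suggest overdispersion."""
    return counts.var(ddof=1) / counts.mean()

print(dispersion_ratio(poisson_counts))  # close to 1 for Poisson data
print(dispersion_ratio(negbin_counts))   # close to 2 for these parameters
```

On the real data the same ratio could be computed as d['NumChildren'].var() / d['NumChildren'].mean(). This is only a marginal check that ignores the predictors, so treat it as a hint rather than a formal test.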
The statsmodels GLM documentation lists the distribution families that can be passed to the GLM family parameter. The link functions for each family can be found under sm.families.<FAMILY>.links:
Binomial (binomial distribution)
Gamma (gamma distribution)
InverseGaussian (inverse Gaussian distribution)
NegativeBinomial (negative binomial distribution)
Poisson (Poisson distribution)
Tweedie (Tweedie distribution)
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Pass a link instance rather than the link class; log is also the default link for NegativeBinomial
model = smf.glm('NumChildren~FamilyIncome+FamilyType+OwnRent', data=d,
                family=sm.families.NegativeBinomial(link=sm.families.links.Log()))
results = model.fit()
print(results.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:            NumChildren   No. Observations:                22745
Model:                            GLM   Df Residuals:                    22739
Model Family:        NegativeBinomial   Df Model:                            5
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -29749.
Date:                Sat, 05 Feb 2022   Deviance:                       20731.
Time:                        10:06:21   Pearson chi2:                 1.77e+04
No. Iterations:                     6
Covariance Type:            nonrobust
===========================================================================================
                                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -0.3345      0.029    -11.672      0.000     -0.391     -0.278
FamilyType[T.Male Head]      -0.0468      0.052     -0.905      0.365     -0.148      0.055
FamilyType[T.Married]         0.1529      0.029      5.200      0.000      0.095      0.211
OwnRent[T.Outright]          -1.9737      0.243     -8.113      0.000     -2.450     -1.497
OwnRent[T.Rented]             0.4164      0.030     13.754      0.000      0.357      0.476
FamilyIncome               5.398e-07   9.55e-08      5.652      0.000   3.53e-07   7.27e-07
===========================================================================================