
Statsmodels Library -- linear regression model

2022-06-26 04:44:00 I am a little monster

Contents

Simple linear regression model

Multiple linear regression model


This post summarizes the linear regression models in the statsmodels library. For comparison, see the companion summary of the linear regression models in the sklearn library: https://blog.csdn.net/qq_57099024/article/details/122324764

Simple linear regression model

import statsmodels.formula.api as smf
import seaborn as sns
import pandas as pd
tips=sns.load_dataset('tips')# load seaborn's example dataset tips
print(tips.head())# show the first five rows of the tips dataset
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
# Specify the model: the response variable goes to the left of the tilde (~), the independent variable to the right
model=smf.ols(formula='tip~total_bill',data=tips)
# Fit the model with the fit method
results=model.fit()
# View the results of the fitted model with the summary method
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    tip   R-squared:                       0.457
Model:                            OLS   Adj. R-squared:                  0.454
Method:                 Least Squares   F-statistic:                     203.4
Date:                Wed, 05 Jan 2022   Prob (F-statistic):           6.69e-34
Time:                        14:34:42   Log-Likelihood:                -350.54
No. Observations:                 244   AIC:                             705.1
Df Residuals:                     242   BIC:                             712.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.9203      0.160      5.761      0.000       0.606       1.235
total_bill     0.1050      0.007     14.260      0.000       0.091       0.120
==============================================================================
Omnibus:                       20.185   Durbin-Watson:                   2.151
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               37.750
Skew:                           0.443   Prob(JB):                     6.35e-09
Kurtosis:                       4.711   Cond. No.                         53.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The results include the Intercept and total_bill coefficients. With these parameters we obtain the linear equation y = 0.105x + 0.920. They can be interpreted as follows: for every one-unit increase in total_bill (that is, for each additional dollar spent), the tip increases by 0.105 units (about 10.5 cents). If you only need the coefficients, you can read them from the params attribute of results.

print(results.params)

Intercept     0.920270
total_bill    0.105025
dtype: float64
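
As a quick check on the interpretation above, the fitted line can be evaluated by hand. The sketch below is plain Python and simply plugs in the coefficients printed by results.params. The function name predict_tip and the example bill of 20 dollars are illustrative assumptions and are not part of the original post.

```python
# Coefficients copied from the results.params output above
INTERCEPT = 0.920270
SLOPE = 0.105025  # coefficient of total_bill


def predict_tip(total_bill):
    """Evaluate the fitted line tip = INTERCEPT + SLOPE * total_bill."""
    return INTERCEPT + SLOPE * total_bill


bill = 20.0  # hypothetical bill in dollars
print(round(predict_tip(bill), 4))  # 3.0208

# One extra dollar on the bill changes the prediction by the slope
print(round(predict_tip(bill + 1) - predict_tip(bill), 6))  # 0.105025
```

With the fitted model itself, results.predict(pd.DataFrame({'total_bill': [20.0]})) should give the same value up to rounding.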

Multiple linear regression model

statsmodels automatically creates dummy variables for categorical predictors and drops one reference level per variable to avoid multicollinearity. For example, sex has two levels, Male and Female. The first level, Male, is chosen as the reference, so no dummy column is created for Male and it does not appear as a separate term in the model. Only sex[T.Female] appears, and its coefficient measures the difference relative to Male.
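
To make the reference-level idea concrete, here is a minimal pure-Python sketch of this kind of dummy coding. It only illustrates the idea and is not the code statsmodels runs internally. The helper name treatment_code is hypothetical, and the level order Male, Female is an assumption chosen to match the sex[T.Female] term in the summary output.

```python
def treatment_code(values, levels):
    """Return one 0/1 column per non-reference level.

    The first entry of `levels` is treated as the reference and gets no column.
    """
    reference, *others = levels
    return {
        f"[T.{level}]": [1 if value == level else 0 for value in values]
        for level in others
    }


# The sex column of the first five rows of tips
sex = ["Female", "Male", "Male", "Male", "Female"]
columns = treatment_code(sex, levels=["Male", "Female"])
print(columns)  # {'[T.Female]': [1, 0, 0, 0, 1]}
```

A row with all dummy columns equal to 0 belongs to the reference level, which is why a separate Male column would add no new information.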

import statsmodels.formula.api as smf
import seaborn as sns
import pandas as pd
tips=sns.load_dataset('tips')
print(tips.head())
print('----'*10)# print a horizontal rule to separate the outputs
# Use plus signs to add multiple independent variables to the formula
model=smf.ols(formula='tip~total_bill+size+sex+smoker+day+time',data=tips)
results=model.fit()
print(results.summary())
print('----'*10)
print(results.params)
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
----------------------------------------
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    tip   R-squared:                       0.470
Model:                            OLS   Adj. R-squared:                  0.452
Method:                 Least Squares   F-statistic:                     26.06
Date:                Wed, 05 Jan 2022   Prob (F-statistic):           1.20e-28
Time:                        16:27:12   Log-Likelihood:                -347.48
No. Observations:                 244   AIC:                             713.0
Df Residuals:                     235   BIC:                             744.4
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept          0.5908      0.256      2.310      0.022       0.087       1.095
sex[T.Female]      0.0324      0.142      0.229      0.819      -0.247       0.311
smoker[T.No]       0.0864      0.147      0.589      0.556      -0.202       0.375
day[T.Fri]         0.1623      0.393      0.412      0.680      -0.613       0.937
day[T.Sat]         0.0408      0.471      0.087      0.931      -0.886       0.968
day[T.Sun]         0.1368      0.472      0.290      0.772      -0.793       1.066
time[T.Dinner]    -0.0681      0.445     -0.153      0.878      -0.944       0.808
total_bill         0.0945      0.010      9.841      0.000       0.076       0.113
size               0.1760      0.090      1.966      0.051      -0.000       0.352
==============================================================================
Omnibus:                       27.860   Durbin-Watson:                   2.096
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               52.555
Skew:                           0.607   Prob(JB):                     3.87e-12
Kurtosis:                       4.923   Cond. No.                         281.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
----------------------------------------
Intercept         0.590837
sex[T.Female]     0.032441
smoker[T.No]      0.086408
day[T.Fri]        0.162259
day[T.Sat]        0.040801
day[T.Sun]        0.136779
time[T.Dinner]   -0.068129
total_bill        0.094487
size              0.175992
dtype: float64
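
The coefficients above combine additively: each dummy term contributes its coefficient only when the row has that level, and reference levels contribute nothing. The sketch below works this out by hand for row 0 of tips (total_bill 16.99, Female, non-smoker, Sun, Dinner, size 2). The dictionary names params and row are illustrative assumptions, and the numbers are copied from the printed output rather than recomputed.

```python
# Coefficients copied from the results.params output above
params = {
    "Intercept": 0.590837,
    "sex[T.Female]": 0.032441,
    "smoker[T.No]": 0.086408,
    "day[T.Fri]": 0.162259,
    "day[T.Sat]": 0.040801,
    "day[T.Sun]": 0.136779,
    "time[T.Dinner]": -0.068129,
    "total_bill": 0.094487,
    "size": 0.175992,
}

# Row 0 of tips expressed in model terms: dummy terms are 1 when the row
# has that level and 0 otherwise; the intercept term is always 1.
row = {
    "Intercept": 1,
    "sex[T.Female]": 1,
    "smoker[T.No]": 1,
    "day[T.Fri]": 0,
    "day[T.Sat]": 0,
    "day[T.Sun]": 1,
    "time[T.Dinner]": 1,
    "total_bill": 16.99,
    "size": 2,
}

predicted_tip = sum(params[name] * value for name, value in row.items())
print(round(predicted_tip, 2))  # 2.74
```

The observed tip for that row is 1.01, so the model overestimates it by roughly 1.7 dollars.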