tetsunosukeのnotebook

tetsunosuke's notes

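[pandas] Regression analysis with pandas

I tried redoing 「Rで回帰分析 - tetsunosukeのnotebook」 (my earlier post on regression analysis in R) with pandas.

The file-loading part in particular can be written in a very R-like way.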

>>> import pandas as pd
>>> data = pd.read_csv("2-1.csv")
>>> data.describe()
          degree       amount
count  12.000000    12.000000
mean   16.391667  1310.833333
std     7.607587   414.355323
min     5.100000   772.000000
25%     9.725000  1127.500000
50%    16.700000  1231.000000
75%    22.750000  1355.000000
max    27.500000  2389.000000

[8 rows x 2 columns]
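
For anyone who does not have 2-1.csv at hand, here is a minimal sketch of the same loading step. The column names degree and amount come from the output above, but the three rows are made up for illustration, and reading from an in-memory string instead of a file is my own substitution.

```python
import io

import pandas as pd

# Hypothetical stand-in for 2-1.csv: the column names match the post,
# but these three rows are invented for illustration only.
csv_text = """degree,amount
10.0,900
20.0,1300
30.0,1700
"""

# read_csv accepts a path or any file-like object; the header row
# becomes the column names, much like read.csv in R.
data = pd.read_csv(io.StringIO(csv_text))

# describe() reports count, mean, std, min, quartiles and max per column.
summary = data.describe()
print(summary)
```

With the real file, the only change should be passing "2-1.csv" to read_csv as in the session above.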

After that, much like nls in R...

>>> model = pd.ols(y=data["amount"], x=data["degree"], intercept=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\site-packages\pandas\stats\interface.py", line 135, in ols
    return klass(**kwargs)
  File "c:\Python27\lib\site-packages\pandas\stats\ols.py", line 53, in __init__
    import scikits.statsmodels.api as sm
ImportError: No module named scikits.statsmodels.api


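Trying to run the regression with the ols function crashed!

Digging into it, it turned out that Patsy was not installed, so I installed it and ran the call again.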

>>> model

-------------------------Summary of Regression Analysis-------------------------


Formula: Y ~ <x> + <intercept>

Number of Observations:         12
Number of Degrees of Freedom:   2

R-squared:         0.6119
Adj R-squared:     0.5731

Rmse:            270.7228

F-stat (1, 10):    15.7685, p-value:     0.0026

Degrees of Freedom: model 1, resid 10

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    42.6066    10.7296       3.97     0.0026    21.5766    63.6365
     intercept   612.4407   192.4569       3.18     0.0098   235.2251   989.6562
---------------------------------End of Summary---------------------------------

This gives the values for x and intercept, 42.6066 and 612.4407. In other words, the fitted line is amount = 42.6066 * degree + 612.4407.
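
For reference, here is a minimal sketch of what a fit with intercept=True works out, written without pd.ols (which, as far as I know, was removed from pandas in 0.20). It uses numpy.linalg.lstsq on made-up data; the arrays and variable names are my own assumptions, not the contents of 2-1.csv.

```python
import numpy as np

# Hypothetical data lying exactly on amount = 40 * degree + 600.
degree = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
amount = 40.0 * degree + 600.0

# Design matrix: one column for x and a column of ones for the intercept.
# This is the role the intercept=True argument plays in the call above.
X = np.column_stack([degree, np.ones_like(degree)])

# Least-squares solution of X @ coef = amount.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, amount, rcond=None)
slope, intercept = coef
print(slope, intercept)
```

Because the made-up points sit exactly on a line, the fit should recover that line's slope and intercept up to floating-point error.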


>>> model.summary_as_matrix
                 x   intercept
beta     42.606567  612.440687
p-value   0.002639    0.009782
std err  10.729551  192.456913
t-stat    3.970955    3.182222

[4 rows x 2 columns]

The values we are after appear in the beta row.
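
As a quick sanity check on how the rows of this matrix relate, the t-stat row is simply beta divided by std err. A small sketch using the numbers copied from the output above; the dictionary names are only for illustration.

```python
# Values copied from the summary_as_matrix output in the post.
beta = {"x": 42.606567, "intercept": 612.440687}
std_err = {"x": 10.729551, "intercept": 192.456913}

# The t-statistic of each coefficient is its estimate over its standard error.
t_stat = {name: beta[name] / std_err[name] for name in beta}
print(t_stat)
```

The results should agree with the t-stat row to about the printed precision.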

R-Style

The model can also be written as an R-style formula, so let's try that.

>>> import statsmodels.formula.api as sm
>>> res = sm.ols(formula="amount ~ degree", data=data).fit()
>>> res.summary2()
<class 'statsmodels.iolib.summary2.Summary'>
"""
               Results: Ordinary least squares
============================================================
Model:                 OLS     AIC:                 170.2930
Dependent Variable:    amount  BIC:                 171.2628
No. Observations:      12      Log-Likelihood:      -83.146
Df Model:              1       F-statistic:         15.77
Df Residuals:          10      Prob (F-statistic):  0.00264
R-squared:             0.612   Scale:               73291.
Adj. R-squared:        0.573
------------------------------------------------------------
           Coef.   Std.Err.   t    P>|t|   [0.025    0.975]
------------------------------------------------------------
Intercept 612.4407 192.4569 3.1822 0.0098 183.6200 1041.2614
degree     42.6066  10.7296 3.9710 0.0026  18.6996   66.5135
------------------------------------------------------------
Omnibus:             4.269      Durbin-Watson:         2.667
Prob(Omnibus):       0.118      Jarque-Bera (JB):      1.424
Skew:                0.700      Prob(JB):              0.491
Kurtosis:            3.943      Condition No.:         44
============================================================
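
Both approaches report the same coefficients, so a prediction can be worked out by hand from either summary. A small sketch using the Intercept and degree values printed above; the function name predict_amount and the input 20.0 are hypothetical and only for illustration.

```python
# Coefficients copied from the regression summaries in the post.
INTERCEPT = 612.4407
SLOPE = 42.6066


def predict_amount(degree):
    """Evaluate the fitted line amount = SLOPE * degree + INTERCEPT."""
    return SLOPE * degree + INTERCEPT


# Predicted amount for a hypothetical degree of 20.0.
prediction = predict_amount(20.0)
print(prediction)
```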