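[pandas] Regression analysis with pandas
I tried redoing "Rで回帰分析 - tetsunosukeのnotebook" (regression analysis in R) with pandas.
The file-loading part in particular can be written in a very R-like way.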
>>> import pandas as pd
>>> data = pd.read_csv("2-1.csv")
>>> data.describe()
          degree       amount
count  12.000000    12.000000
mean   16.391667  1310.833333
std     7.607587   414.355323
min     5.100000   772.000000
25%     9.725000  1127.500000
50%    16.700000  1231.000000
75%    22.750000  1355.000000
max    27.500000  2389.000000

[8 rows x 2 columns]
After that, just as with R's nls...
>>> model = pd.ols(y=data["amount"], x=data["degree"], intercept=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\site-packages\pandas\stats\interface.py", line 135, in ols
    return klass(**kwargs)
  File "c:\Python27\lib\site-packages\pandas\stats\ols.py", line 53, in __init__
    import scikits.statsmodels.api as sm
ImportError: No module named scikits.statsmodels.api
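Running the regression with the ols function crashed!
Tracing the error, it turned out that Patsy was not installed, so I installed it and ran the call again.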
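When chasing this kind of ImportError, it can help to check which modules are importable before re-running. The following is only a minimal sketch using the standard library; the helper name `is_importable` is made up for illustration, and the module names listed are simply the ones mentioned in this post.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the module, without running the analysis."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package of a dotted name is missing or malformed.
        return False


# Module names taken from the traceback and the text of this post.
for name in ("pandas", "patsy", "statsmodels", "scikits.statsmodels.api"):
    print(name, "->", "found" if is_importable(name) else "missing")
```

Anything reported as missing is a candidate for installation before calling ols again.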
>>> model

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         12
Number of Degrees of Freedom:   2

R-squared:         0.6119
Adj R-squared:     0.5731

Rmse:            270.7228

F-stat (1, 10):    15.7685, p-value:     0.0026

Degrees of Freedom: model 1, resid 10

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    42.6066    10.7296       3.97     0.0026    21.5766    63.6365
     intercept   612.4407   192.4569       3.18     0.0098   235.2251   989.6562
---------------------------------End of Summary---------------------------------
This gives the coefficient for x (42.6066) and the intercept (612.4407).
>>> model.summary_as_matrix
                 x   intercept
beta     42.606567  612.440687
p-value   0.002639    0.009782
std err  10.729551  192.456913
t-stat    3.970955    3.182222

[4 rows x 2 columns]
The values we are after appear in the beta row.
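To see where the beta values come from, here is a minimal sketch of the closed-form least-squares fit for a single explanatory variable, written with the standard library only. The function name `fit_simple_ols` and the five data points are made up for illustration; they are not the contents of 2-1.csv.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and the cross-deviation of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data, NOT the values in 2-1.csv.
degree = [5.0, 10.0, 15.0, 20.0, 25.0]
amount = [800.0, 1050.0, 1200.0, 1500.0, 1650.0]

slope, intercept, r_squared = fit_simple_ols(degree, amount)
print("slope:", slope, "intercept:", intercept, "R-squared:", r_squared)
```

The slope plays the role of the x beta, and the intercept the role of the intercept beta, in the matrix above.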
R-Style
The model can also be written with an R-like formula, so let's try that.
>>> import statsmodels.formula.api as sm
>>> res = sm.ols(formula="amount ~ degree", data=data).fit()
>>> res.summary2()
<class 'statsmodels.iolib.summary2.Summary'>
"""
              Results: Ordinary least squares
============================================================
Model:               OLS          AIC:                 170.2930
Dependent Variable:  amount       BIC:                 171.2628
No. Observations:    12           Log-Likelihood:      -83.146
Df Model:            1            F-statistic:         15.77
Df Residuals:        10           Prob (F-statistic):  0.00264
R-squared:           0.612        Scale:               73291.
Adj. R-squared:      0.573
------------------------------------------------------------
             Coef.   Std.Err.    t     P>|t|   [0.025    0.975]
------------------------------------------------------------
Intercept   612.4407 192.4569  3.1822  0.0098 183.6200 1041.2614
degree       42.6066  10.7296  3.9710  0.0026  18.6996   66.5135
------------------------------------------------------------
Omnibus:          4.269     Durbin-Watson:       2.667
Prob(Omnibus):    0.118     Jarque-Bera (JB):    1.424
Skew:             0.700     Prob(JB):            0.491
Kurtosis:         3.943     Condition No.:       44
============================================================
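The coefficients and standard errors agree between the two outputs, but the confidence intervals do not. The sketch below reproduces both sets of bounds from the numbers printed above. It assumes that the pandas summary uses the normal approximation (critical value 1.96), while the statsmodels summary uses the Student-t quantile for 10 residual degrees of freedom (about 2.228139). That is an inference from the printed numbers, not something confirmed against either library's source.

```python
# Coefficient estimates and standard errors copied from summary_as_matrix.
estimates = {
    "degree": (42.606567, 10.729551),
    "intercept": (612.440687, 192.456913),
}

Z_975 = 1.96            # assumption: normal approximation behind "CI 2.5% / CI 97.5%"
T_975_DF10 = 2.228139   # Student-t 97.5% quantile with 10 degrees of freedom


def interval(coef, std_err, critical):
    """Two-sided interval: coef plus/minus critical value times standard error."""
    half_width = critical * std_err
    return coef - half_width, coef + half_width


normal_ci = {name: interval(c, se, Z_975) for name, (c, se) in estimates.items()}
t_ci = {name: interval(c, se, T_975_DF10) for name, (c, se) in estimates.items()}

for name in estimates:
    print(name, "normal:", normal_ci[name], "t:", t_ci[name])
```

With only 12 observations the t-based interval is noticeably wider, so it is the more conservative of the two.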