Machine Learning
Correlation Analysis of the Processed Kaggle Titanic Data
느린 개미
2018. 12. 21. 18:33
Load the data prepared in the earlier EDA step from pickle files.
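The loading cell itself is not shown in this post. A minimal sketch, assuming the EDA step saved its frames with `DataFrame.to_pickle`; the file names `total_set.pkl` and `train.pkl` and the helper `load_eda_outputs` are hypothetical:

```python
from pathlib import Path

import pandas as pd


def load_eda_outputs(data_dir):
    """Load the frames saved by the EDA step (file names are hypothetical)."""
    data_dir = Path(data_dir)
    # Combined train + test features produced by the EDA step.
    total_set = pd.read_pickle(data_dir / "total_set.pkl")
    # Original training frame, which still holds the Survived label.
    train_origin = pd.read_pickle(data_dir / "train.pkl")
    return total_set, train_origin
```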
Out[56]:
| | PassengerId | Pclass | Sex | SibSp | Parch | Embarked | name_code | Age_value | Fare_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 0 | 1 | 0 | 0 | 0.0 | 4 | 0 |
| 1 | 2 | 1 | 1 | 1 | 0 | 1 | 2.0 | 7 | 3 |
| 2 | 3 | 3 | 1 | 0 | 0 | 0 | 1.0 | 5 | 1 |
| 3 | 4 | 1 | 1 | 1 | 0 | 0 | 2.0 | 6 | 3 |
| 4 | 5 | 3 | 0 | 0 | 0 | 0 | 0.0 | 6 | 1 |
Out[57]:
| | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Embarked | name_code | Fare_value |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | 0 | 22.0 | 1 | 0 | 0 | 0.0 | 0 |
| 1 | 2 | 1 | 1 | 1 | 38.0 | 1 | 0 | 1 | 2.0 | 3 |
| 2 | 3 | 1 | 3 | 1 | 26.0 | 0 | 0 | 0 | 1.0 | 1 |
| 3 | 4 | 1 | 1 | 1 | 35.0 | 1 | 0 | 0 | 2.0 | 3 |
| 4 | 5 | 0 | 3 | 0 | 35.0 | 0 | 0 | 0 | 0.0 | 1 |
Split total_set into train_data and test_data.
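A minimal sketch of the split, assuming `total_set` stacks the training rows first and the test rows after them, and that `Survived` is taken from the original training frame (called `train_origin` here, a hypothetical name, as is the helper `split_total_set`). Taking explicit copies with `.copy()` is one common way to avoid pandas' `SettingWithCopyWarning` when a column is later added to a slice:

```python
import pandas as pd


def split_total_set(total_set, train_origin, n_train):
    """Split the combined frame back into its train and test parts."""
    # .copy() makes each part an independent frame rather than a slice of total_set.
    train_data = total_set.iloc[:n_train].copy()
    test_data = total_set.iloc[n_train:].reset_index(drop=True).copy()
    # Attach the label to the training part by position.
    train_data["Survived"] = train_origin["Survived"].to_numpy()[:n_train]
    return train_data, test_data


# Toy frames standing in for the real data (values are made up).
total_set = pd.DataFrame({"PassengerId": [1, 2, 3, 4, 5], "Pclass": [3, 1, 3, 1, 3]})
train_origin = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})
train_data, test_data = split_total_set(total_set, train_origin, n_train=3)
```

With the real data, `n_train` would be 891, since the training rows end at PassengerId 891 and the test rows start at 892.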
Out[58]:
| | PassengerId | Pclass | Sex | SibSp | Parch | Embarked | name_code | Age_value | Fare_value |
|---|---|---|---|---|---|---|---|---|---|
| 886 | 887 | 2 | 0 | 0 | 0 | 0 | 0.0 | 5 | 1 |
| 887 | 888 | 1 | 1 | 0 | 0 | 0 | 1.0 | 3 | 2 |
| 888 | 889 | 3 | 1 | 1 | 2 | 0 | 1.0 | 4 | 2 |
| 889 | 890 | 1 | 0 | 0 | 0 | 1 | 0.0 | 5 | 2 |
| 890 | 891 | 3 | 0 | 0 | 0 | 2 | 0.0 | 6 | 0 |
C:\Users\hyejin\Anaconda2\envs\py36\lib\site-packages\pandas\core\indexing.py:337: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[key] = _infer_fill_value(value)
C:\Users\hyejin\Anaconda2\envs\py36\lib\site-packages\pandas\core\indexing.py:517: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Out[59]:
| | PassengerId | Pclass | Sex | SibSp | Parch | Embarked | name_code | Age_value | Fare_value | Survived |
|---|---|---|---|---|---|---|---|---|---|---|
| 886 | 887 | 2 | 0 | 0 | 0 | 0 | 0.0 | 5 | 1 | 0 |
| 887 | 888 | 1 | 1 | 0 | 0 | 0 | 1.0 | 3 | 2 | 1 |
| 888 | 889 | 3 | 1 | 1 | 2 | 0 | 1.0 | 4 | 2 | 0 |
| 889 | 890 | 1 | 0 | 0 | 0 | 1 | 0.0 | 5 | 2 | 1 |
| 890 | 891 | 3 | 0 | 0 | 0 | 2 | 0.0 | 6 | 0 | 0 |
Out[60]:
| | PassengerId | Pclass | Sex | SibSp | Parch | Embarked | name_code | Age_value | Fare_value |
|---|---|---|---|---|---|---|---|---|---|
| 413 | 1305 | 3 | 0 | 0 | 0 | 0 | 0.0 | 5 | 1 |
| 414 | 1306 | 1 | 1 | 0 | 0 | 1 | 4.0 | 7 | 3 |
| 415 | 1307 | 3 | 0 | 0 | 0 | 0 | 0.0 | 7 | 0 |
| 416 | 1308 | 3 | 0 | 0 | 0 | 0 | 0.0 | 5 | 1 |
| 417 | 1309 | 3 | 0 | 1 | 1 | 1 | 3.0 | 5 | 2 |
The PassengerId column of train_data is just a row index, so drop it.
The PassengerId column of test_data is needed when submitting results to Kaggle, so keep it as is.
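A small sketch of this step on toy frames (the values are made up). Reassigning the result of `drop` avoids modifying a slice in place:

```python
import pandas as pd

# Toy frames standing in for train_data and test_data (values are made up).
train_data = pd.DataFrame({"PassengerId": [1, 2], "Pclass": [3, 1], "Survived": [0, 1]})
test_data = pd.DataFrame({"PassengerId": [892, 893], "Pclass": [3, 3]})

# drop() returns a new frame, so nothing is modified in place on a slice.
train_data = train_data.drop(columns=["PassengerId"])

# test_data is left untouched: PassengerId is needed for the submission file.
```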
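Look at the correlation matrix (corr) of the train data. Features whose correlation with Survived comes out as 0.0XXX are the ones I will try to transform; in the matrix below these are SibSp, Parch, and Age_value.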
Out[62]:
| | Pclass | Sex | SibSp | Parch | Embarked | name_code | Age_value | Fare_value | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Pclass | 1.000000 | -0.131900 | 0.083081 | 0.018443 | 0.045702 | -0.105629 | -0.400822 | -0.644180 | -0.338481 |
| Sex | -0.131900 | 1.000000 | 0.114631 | 0.245489 | 0.116569 | 0.618390 | -0.122521 | 0.239812 | 0.543351 |
| SibSp | 0.083081 | 0.114631 | 1.000000 | 0.414838 | -0.059961 | 0.312732 | -0.250593 | 0.378720 | -0.035322 |
| Parch | 0.018443 | 0.245489 | 0.414838 | 1.000000 | -0.078665 | 0.388093 | -0.185674 | 0.374659 | 0.081629 |
| Embarked | 0.045702 | 0.116569 | -0.059961 | -0.078665 | 1.000000 | 0.036684 | -0.037095 | -0.094764 | 0.106811 |
| name_code | -0.105629 | 0.618390 | 0.312732 | 0.388093 | 0.036684 | 1.000000 | -0.227037 | 0.337250 | 0.466655 |
| Age_value | -0.400822 | -0.122521 | -0.250593 | -0.185674 | -0.037095 | -0.227037 | 1.000000 | 0.123883 | -0.073381 |
| Fare_value | -0.644180 | 0.239812 | 0.378720 | 0.374659 | -0.094764 | 0.337250 | 0.123883 | 1.000000 | 0.306855 |
| Survived | -0.338481 | 0.543351 | -0.035322 | 0.081629 | 0.106811 | 0.466655 | -0.073381 | 0.306855 | 1.000000 |
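Picking out the weakly correlated features can be sketched as below. The numbers are the Survived column of the matrix above; the cutoff of 0.1 is an assumption that reads "0.0XXX" as an absolute correlation below 0.1:

```python
import pandas as pd

# Correlation of each feature with Survived, copied from the matrix above.
corr_with_survived = pd.Series({
    "Pclass": -0.338481,
    "Sex": 0.543351,
    "SibSp": -0.035322,
    "Parch": 0.081629,
    "Embarked": 0.106811,
    "name_code": 0.466655,
    "Age_value": -0.073381,
    "Fare_value": 0.306855,
})

# Assumed cutoff: "0.0XXX" is read here as |r| < 0.1.
threshold = 0.1
weak_features = corr_with_survived[corr_with_survived.abs() < threshold].index.tolist()
print(weak_features)  # ['SibSp', 'Parch', 'Age_value']
```

On the real frame, the same series could be obtained with `train_data.corr()["Survived"].drop("Survived")`.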
Add SibSp and Parch together to create a Family_size column.
Alone : 1 if Family_size is 0, otherwise 0
BigFamily : 1 if SibSp or Parch is 3 or more, otherwise 0
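The three rules above can be sketched as follows. The helper name `add_family_features` is hypothetical; the sample values are the SibSp and Parch columns of the first five training rows shown in this post:

```python
import pandas as pd


def add_family_features(df):
    """Return a copy of df with Family_size, Alone and BigFamily added."""
    out = df.copy()
    out["Family_size"] = out["SibSp"] + out["Parch"]
    # Alone: 1 when the passenger has no siblings, spouse, parents or children aboard.
    out["Alone"] = (out["Family_size"] == 0).astype(int)
    # BigFamily: 1 when either count is 3 or more.
    out["BigFamily"] = ((out["SibSp"] >= 3) | (out["Parch"] >= 3)).astype(int)
    return out


# SibSp / Parch values of the first five training rows.
head = pd.DataFrame({"SibSp": [1, 1, 0, 1, 0], "Parch": [0, 0, 0, 0, 0]})
result = add_family_features(head)
print(result["Family_size"].tolist())  # [1, 1, 0, 1, 0]
print(result["Alone"].tolist())        # [0, 0, 1, 0, 1]
```

The later tables no longer show SibSp, Parch, or Family_size, so the notebook presumably dropped them afterward, for example with `drop(columns=["SibSp", "Parch", "Family_size"])`.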
Out[65]:
| | Pclass | Sex | SibSp | Parch | Embarked | name_code | Age_value | Fare_value | Survived | Family_size |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 1 | 0 | 0 | 0.0 | 4 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 | 1 | 2.0 | 7 | 3 | 1 | 1 |
| 2 | 3 | 1 | 0 | 0 | 0 | 1.0 | 5 | 1 | 1 | 0 |
| 3 | 1 | 1 | 1 | 0 | 0 | 2.0 | 6 | 3 | 1 | 1 |
| 4 | 3 | 0 | 0 | 0 | 0 | 0.0 | 6 | 1 | 0 | 0 |
We can see that passengers travelling alone have a lower survival rate.
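A comparison like this is typically made by grouping on the flag and averaging the 0/1 label. The sketch below only illustrates the call: it uses the Alone and Survived values of the first five training rows, which are far too few to support any conclusion.

```python
import pandas as pd

# First five training rows from this post; only meant to show the mechanics.
sample = pd.DataFrame({"Alone": [0, 0, 1, 0, 1], "Survived": [0, 1, 1, 1, 0]})

# The mean of a 0/1 label within each group is that group's survival rate.
survival_by_alone = sample.groupby("Alone")["Survived"].mean()
print(survival_by_alone)
```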
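Because the correlation between Age_value and Survived was low in the matrix above, I will try regrouping Age_value.
Looking at the survival rate per Age_value code, ages 0 to 5 (code 0) have a high survival rate, while ages 65 and over (codes 13, 14, 15) have a low one. Code 15 contains only one passenger.
The remaining age groups appear to have roughly similar survival rates.
Therefore the values are remapped as 0 -> 0, 1~12 -> 1, 13~15 -> 2.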
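One way to express this remapping, as a sketch: the helper name `regroup_age_value` is hypothetical, and `np.select` is just one of several options (a dictionary passed to `Series.map` would work as well).

```python
import numpy as np
import pandas as pd


def regroup_age_value(age_value):
    """Collapse the 0..15 age codes into 0 (code 0), 1 (codes 1-12), 2 (codes 13-15)."""
    # np.select uses the first matching condition; anything left over gets the default.
    regrouped = np.select([age_value == 0, age_value <= 12], [0, 1], default=2)
    return pd.Series(regrouped, index=age_value.index, name=age_value.name)


codes = pd.Series([0, 4, 12, 13, 15], name="Age_value")
print(regroup_age_value(codes).tolist())  # [0, 1, 1, 2, 2]
```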
Out[77]:
| | Pclass | Sex | Embarked | name_code | Age_value | Fare_value | Survived | Alone | BigFamily |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 0 | 0.0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 2.0 | 1 | 3 | 1 | 0 | 0 |
| 2 | 3 | 1 | 0 | 1.0 | 1 | 1 | 1 | 1 | 0 |
| 3 | 1 | 1 | 0 | 2.0 | 1 | 3 | 1 | 0 | 0 |
| 4 | 3 | 0 | 0 | 0.0 | 1 | 1 | 0 | 1 | 0 |
Finally, save the processed train_data to a pickle file.
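A minimal sketch of the save step on a toy frame. The file name `train_data_processed.pkl` is hypothetical, since the post does not show the path it used, and a temporary directory is used here only so the example cleans up after itself:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame standing in for the processed train_data (values are made up).
train_data = pd.DataFrame({"Pclass": [3, 1], "Alone": [0, 1], "Survived": [0, 1]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the original path is not shown in the post.
    out_path = Path(tmp_dir) / "train_data_processed.pkl"
    train_data.to_pickle(out_path)
    # Reading the file back confirms the frame round-trips unchanged.
    restored = pd.read_pickle(out_path)

print(restored.equals(train_data))  # True
```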
Out[78]:
| | PassengerId | Pclass | Sex | SibSp | Parch | Embarked | name_code | Age_value | Fare_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | 0 | 0 | 0 | 2 | 0.0 | 6 | 0 |
| 1 | 893 | 3 | 1 | 1 | 0 | 0 | 2.0 | 9 | 0 |
| 2 | 894 | 2 | 0 | 0 | 0 | 2 | 0.0 | 12 | 1 |
| 3 | 895 | 3 | 0 | 0 | 0 | 0 | 0.0 | 5 | 1 |
| 4 | 896 | 3 | 1 | 1 | 1 | 0 | 2.0 | 4 | 1 |
Out[85]:
| | PassengerId | Pclass | Sex | Embarked | name_code | Age_value | Fare_value | Alone | BigFamily |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | 0 | 2 | 0.0 | 1 | 0 | 1 | 0 |
| 1 | 893 | 3 | 1 | 0 | 2.0 | 1 | 0 | 0 | 0 |
| 2 | 894 | 2 | 0 | 2 | 0.0 | 1 | 1 | 1 | 0 |
| 3 | 895 | 3 | 0 | 0 | 0.0 | 1 | 1 | 1 | 0 |
| 4 | 896 | 3 | 1 | 0 | 2.0 | 1 | 1 | 0 | 0 |