Data binning: discretizing continuous data

1.

Reference: https://www.kaggle.com/kabure/titanic-eda-model-pipeline-keras-nn

import pandas as pd

# Optional: fill missing ages with a sentinel value.
# Note that -0.5 lies outside the bins below, so pd.cut would still return NaN for it.
#df_train.Age = df_train.Age.fillna(-0.5)

# Create the bin edges used to cut the ages into ranges
interval = (0, 5, 12, 18, 25, 35, 60, 120)

# Set the names to use for the categories (one label per bin)
cats = ['babies', 'Children', 'Teen', 'Student', 'Young', 'Adult', 'Senior']

# Apply pd.cut with the bin edges and labels defined above
df_train["Age_cat"] = pd.cut(df_train.Age, interval, labels=cats)
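
To see how pd.cut treats values on the bin edges, here is a minimal sketch that reuses the same interval and cats. The ages series is made-up sample data, not taken from the Titanic dataset, and the variable names age_cat and age_cat_left are only for illustration.

```python
import pandas as pd

# Hypothetical sample ages (not from the Titanic data), including
# edge values, a missing value, and an out-of-range sentinel
ages = pd.Series([0.5, 5, 5.1, 17, 30, 60, 61, None, -0.5])

interval = (0, 5, 12, 18, 25, 35, 60, 120)
cats = ['babies', 'Children', 'Teen', 'Student', 'Young', 'Adult', 'Senior']

# Default right=True: each bin is (left, right], so 5 is still 'babies'
age_cat = pd.cut(ages, interval, labels=cats)

# right=False: each bin is [left, right), so 5 becomes 'Children'
age_cat_left = pd.cut(ages, interval, labels=cats, right=False)

print(pd.DataFrame({'age': ages, 'right=True': age_cat, 'right=False': age_cat_left}))
```

Missing values and values outside every bin (such as -0.5 here) come back as NaN, so if a sentinel fill is used, the bin edges need to be extended to cover it.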

2.

3.

Tags: train, df, cut, age, cats, interview