intro-to-machine-learning
Pandas
pd: pandas 库
pd.read_csv(path_name) 用于读取 .csv comma-separated-values 格式的文件,返回一个对象。
对于这个对象可以用 .describe() 来粗略浏览。
用 .columns 获得各列名,用 .列名 来获得某一列。
用 .head() 查看前若干行。
对于只认识 list 的普通 python 代码,可以用 .tolist() 来转化。
还有一个很 python 的语法
| Python |
|---|
| # Create the list of features below
feature_names = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
# Select data corresponding to features in feature_names
X = home_data[ feature_names ]
|
DecisionTreeRegressor
决策树回归模型
建立对象:
| Python |
|---|
| from sklearn.tree import DecisionTreeRegressor
# specify the model.
#For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = DecisionTreeRegressor(random_state = 1)
# Fit the model
iowa_model.fit(X, y)
|
其中 .fit(X, y) 就是尝试建立从 \(X\) 到 \(y\) 的“映射”。
然后利用对象:
| Python |
|---|
| predictions = iowa_model.predict(X)
print(predictions)
|
为了防止过拟合,可以考虑在定义对象时限制树深度:
| Python |
|---|
| model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
|
train_test_split
将一组数据划分一下,一些用于训练,一些用于 validation。
| Python |
|---|
| # Import the train_test_split function and uncomment
from sklearn.model_selection import train_test_split
# fill in and uncomment
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state = 1)
|
视方便,有不同的写法来划分训练集和测试集、X 和 y。
| Python |
|---|
1
2
3
4
5
6
7
8
9
10
11
12 | import pandas as pd
concrete = pd.read_csv('../input/dl-course-data/concrete.csv')
df = concrete.copy()
df_train = df.sample(frac=0.7, random_state=0)
df_valid = df.drop(df_train.index)
X_train = df_train.drop('CompressiveStrength', axis=1)
X_valid = df_valid.drop('CompressiveStrength', axis=1)
y_train = df_train['CompressiveStrength']
y_valid = df_valid['CompressiveStrength']
|
mean_absolute_error
缩写 mae。
| Python |
|---|
| from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(val_y, preds_val)
|
RandomForest
原理上来说就是搞好几棵决策树,组成所谓森林,然后每次取一下它们分别预测结果的平均值。
| Python |
|---|
| from sklearn.ensemble import RandomForestRegressor
# Define the model. Set random_state to 1
rf_model = RandomForestRegressor(random_state = 1)
# fit your model
rf_model.fit(train_X, train_y)
# Calculate the mean absolute error of your Random Forest model on the validation data
rf_val_mae = mean_absolute_error(rf_model.predict(val_X), val_y)
|