
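WeChat official account: 尤而小屋
Author: Peter
Editor: Peter

Hi everyone, I'm Peter~

This is the third post in my Kaggle case-study series. The topic is Mushroom Classification: Safe to eat or deadly poison?

The data comes from UCI: https://archive.ics.uci.edu/ml/datasets/mushroom

Kaggle source notebook: https://www.kaggle.com/nirajvermafcb/comparing-various-ml-models-roc-curve-comparison

Ranking

Below is the Kaggle notebook ranking for this dataset. The first-place notebook focuses on feature selection and does not use this dataset's data, which I feel is off topic. The second-place notebook focuses on classification based on Bayesian theory; that is beyond me for now, so I will cover it separately once I have studied Bayesian methods properly.

So I chose the third-place notebook to study. Its author compares 6 supervised learning methods on this dataset, covering modeling and model evaluation.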

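Dataset

UCI donated this dataset to Kaggle. It contains 8124 samples in total, of which 6513 are used for training and 1611 for testing. Of all samples, 4208 are edible (51.8%) and 3916 are poisonous (48.2%). Each sample describes 22 attributes of a mushroom, such as shape and odor.

Poisoning from eating wild mushrooms happens from time to time. Mushrooms vary enormously in form, and a non-expert cannot tell poisonous from edible ones by appearance, shape, or color; there is no simple rule that separates the two. To find out whether a mushroom is edible, we have to collect mushrooms with different attribute values, together with whether each one is poisonous, and analyze them.

By analyzing the 22 attributes we can build a model of mushroom edibility that better predicts whether a mushroom is safe to eat.

Below is the dataset information shown on UCI:

Explanation of the attributes: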

Data EDA

Importing the data

import pandas as pd
import numpy as np

import plotly.express as px
from matplotlib import pyplot as plt
import seaborn as sns

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

The raw data has 8124 records and 23 columns (the class label plus 22 attributes), with no missing values.
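The loading step itself is not shown in the post. Below is a minimal sketch of how the shape and missing-value check might look. The file name `mushrooms.csv` and the three-row stand-in table are assumptions for illustration, not taken from the source notebook.

```python
import io
import pandas as pd

# A three-row stand-in for the real file, so the sketch runs without a download.
# With the actual data this would be something like:
#     data = pd.read_csv("mushrooms.csv")   # assumed file name
sample_csv = io.StringIO(
    "class,cap-shape,odor\n"
    "p,x,p\n"
    "e,x,a\n"
    "e,b,l\n"
)
data_sample = pd.read_csv(sample_csv)

n_rows, n_cols = data_sample.shape                       # (records, columns)
missing_total = int(data_sample.isnull().sum().sum())    # total number of missing cells

print(n_rows, n_cols, missing_total)
```

On the full dataset the same two checks should report 8124 rows, 23 columns, and zero missing cells, matching the statement above.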

Poisonous vs. edible

Count the poisonous and edible samples:
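The counting code is missing from the post. As a rough sketch, `value_counts` gives the per-class totals. The label column here is rebuilt from the counts quoted in the Dataset section (4208 edible, 3916 poisonous) rather than read from the real file, and the codes `e` and `p` are the dataset's class labels.

```python
import pandas as pd

# Rebuild a label column with the counts quoted in the Dataset section.
labels = pd.Series(["e"] * 4208 + ["p"] * 3916, name="class")

counts = labels.value_counts()                     # absolute count per class
shares = (counts / counts.sum() * 100).round(1)    # percentage per class

print(counts.to_dict())
print(shares.to_dict())
```

The percentages come out as 51.8 and 48.2, so the two classes are close to balanced and plain accuracy is a reasonable first metric.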

Visual analysis

Cap color

Let's start with cap color and count how often each color occurs:

fig = px.bar(cap, x="color",
             y="number",
             color="number",
             text="number",
             color_continuous_scale="rainbow")

# fig.update_traces(textposition="outside")
fig.show()

Which colors are most common among poisonous mushrooms? Count the color distribution separately for the poisonous and edible classes:

fig = px.bar(cap_class, x="color",
             y="number",
             color="class",
             text="number",
             barmode="group",
            )

fig.show()
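The `cap` and `cap_class` tables used by the two charts are not defined in the post. One way they might be built is sketched below; the six-row `toy` table is a made-up stand-in, and only the column names `color`, `number`, and `class` are taken from the plotting calls.

```python
import pandas as pd

# Tiny stand-in for the mushroom table; only the two columns used here.
toy = pd.DataFrame({
    "class":     ["p", "e", "e", "p", "e", "p"],
    "cap-color": ["n", "n", "g", "e", "n", "g"],
})

# Overall count per cap color, with the column names the bar chart expects.
cap = (toy["cap-color"]
       .value_counts()
       .rename_axis("color")
       .reset_index(name="number"))

# Count per (class, color) pair for the grouped bar chart.
cap_class = (toy.groupby(["class", "cap-color"])
             .size()
             .reset_index(name="number")
             .rename(columns={"cap-color": "color"}))

print(cap)
print(cap_class)
```

The same pattern with `odor` in place of `cap-color` would give the `odor` and `odor_class` tables used in the next section.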

Summary: colors n, g, and e are the most frequent among poisonous (p) mushrooms.

Odor

Count the samples for each odor:

fig = px.bar(odor,
             x="odor",
             y="number",
             color="number",
             text="number",
             color_continuous_scale="rainbow")

fig.show()

The chart above covers the whole dataset. Next, split it into poisonous and edible:

fig = px.bar(odor_class,
             x="odor",
             y="number",
             color="class",
             text="number",
             barmode="group",
            )

fig.show()

Summary: the two charts show that odor f is the one most strongly associated with poisonous mushrooms.

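Feature correlation

Plot the correlation coefficients between features as a heatmap to see how they are distributed:

# corr() needs numeric columns, so data must already be label-encoded
# (see the Feature transformation step)
corr = data.corr()
sns.heatmap(corr)

plt.show()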

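Feature engineering

Feature transformation

All features in the raw data are text categories. We convert them to numbers to make later analysis easier:

1. Before the conversion

2. Apply the conversion

from sklearn.preprocessing import LabelEncoder  # label encoding
labelencoder = LabelEncoder()

# Encode every column, the class label included, as integer codes
for col in data.columns:
    data[col] = labelencoder.fit_transform(data[col])

# After the conversion
data.head()

3. View the conversion results for some attributes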
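The code for viewing the mapping is not shown. Because the loop above reuses a single `LabelEncoder`, it only remembers the last column after the loop finishes. A sketch that keeps one encoder per column, so each mapping can be inspected, might look like this. The `toy` table and the `encoders` and `mappings` names are illustrative, not from the source notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

toy = pd.DataFrame({
    "class": ["p", "e", "e", "p"],
    "odor":  ["f", "n", "a", "f"],
})

encoders = {}            # one fitted encoder per column, kept for later lookup
encoded = toy.copy()
for col in toy.columns:
    enc = LabelEncoder()
    encoded[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# classes_ is sorted, and each value's position is its integer code.
mappings = {col: {str(label): code for code, label in enumerate(enc.classes_)}
            for col, enc in encoders.items()}
print(mappings)
```

With this encoding `e` becomes 0 and `p` becomes 1, so class 1 in the later confusion matrices and ROC curves is the poisonous class.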

Data distribution

Look at how the data is distributed after encoding:

ax = sns.boxplot(x='class',
                 y='stalk-color-above-ring',
                 data=data)

ax = sns.stripplot(x="class",
                   y='stalk-color-above-ring',
                   data=data,
                   jitter=True,
                   edgecolor="gray")

plt.title("Class w.r.t stalkcolor above ring", fontsize=12)

plt.show()

Separating features and labels

X = data.iloc[:, 1:23]  # features
y = data.iloc[:, 0]  # label

Standardization

# StandardScaler standardizes each column to zero mean and unit variance
# (this is standardization, not min-max normalization)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
X
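As a quick sanity check on what `StandardScaler` does, here is a small sketch on made-up numbers (the `X_toy` array is an assumption, standing in for the encoded features). After scaling, every column should have mean close to 0 and standard deviation close to 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy integer-coded features standing in for the encoded mushroom columns.
X_toy = np.array([[0, 5], [1, 7], [2, 9], [3, 3]], dtype=float)

X_scaled = StandardScaler().fit_transform(X_toy)

col_means = X_scaled.mean(axis=0)   # expected to be ~0 for every column
col_stds = X_scaled.std(axis=0)     # expected to be ~1 for every column
print(col_means, col_stds)
```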

Principal component analysis (PCA)

PCA procedure

Not all 22 attributes in the raw data are necessarily informative, and some are related to each other, so the features overlap. We use principal component analysis to find the key components first:

# 1. Run PCA
from sklearn.decomposition import PCA
pca = PCA()
pca.fit_transform(X)

# 2. Covariance matrix estimated by the fitted model
covariance = pca.get_covariance()

# 3. Variance explained by each principal component
explained_variance = pca.explained_variance_
explained_variance

Plot the variance of each principal component:

with plt.style.context("dark_background"):  # plot style
    plt.figure(figsize=(6,4))  # figure size

    plt.bar(range(22),  # one bar per principal component
            explained_variance,  # variance of each component
            alpha=0.5,  # transparency
            align="center",
            label="individual explained variance"  # legend label
           )
    plt.ylabel('Explained variance')  # explained_variance_ holds raw variances, not ratios
    plt.xlabel('Principal components')
    plt.legend(loc="best")
    plt.tight_layout()  # adjust subplot spacing automatically

Conclusion: the chart shows that the last 4 principal components carry very little variance in total, while the first 17 retain more than 90% of it and can be kept as the principal components.
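The "more than 90%" figure can be checked numerically instead of read off a bar chart. A minimal sketch, using synthetic data in place of the standardized mushroom features (the data, the 0.90 threshold variable, and the names `X_demo` and `n_keep` are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 samples, 22 features with decreasing spread.
X_demo = rng.normal(size=(200, 22)) * np.linspace(3.0, 0.2, 22)

pca_demo = PCA().fit(X_demo)
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches 90%.
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)
print(n_keep, round(float(cumulative[n_keep - 1]), 3))
```

Using `explained_variance_ratio_` rather than `explained_variance_` keeps the numbers on a 0 to 1 scale, which makes a threshold like 90% directly comparable.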

Data distribution under 2 principal components

Next we project the data onto 2 principal components and use that view to look at a K-means clustering:

1. Distribution of the raw data under 2 principal components

N = data.values
pca = PCA(n_components=2)
x = pca.fit_transform(N)

plt.figure(figsize=(5,5))
plt.scatter(x[:,0], x[:,1])
plt.show()

2. Distribution after clustering:

from sklearn.cluster import KMeans
km = KMeans(n_clusters=2, random_state=5)

N = data.values  # as a numpy array
X_clustered = km.fit_predict(N)  # cluster labels, 0 or 1

label_color_map = {0: "g",  # only two clusters, so map each to a color
                   1: "y"}
label_color = [label_color_map[l] for l in X_clustered]

plt.figure(figsize=(5,5))
# x = pca.fit_transform(N)
plt.scatter(x[:,0], x[:,1], c=label_color)
plt.show()

Modeling based on 17 principal components

I did not fully understand this step myself: there are 22 attributes in total and 4 features were picked out above, so why is the analysis here based on 17 principal components??

First, transform the data to 17 principal components:
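The transformation code is missing from the post. A minimal sketch of what it presumably looks like follows; the random `X_std` array stands in for the standardized 22 features, and assigning the result back to `X` before the train/test split is my assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X_std = rng.normal(size=(100, 22))      # stand-in for the standardized 22 features

pca17 = PCA(n_components=17)
X_reduced = pca17.fit_transform(X_std)  # shape becomes (n_samples, 17)

print(X_reduced.shape)
```

The 17 columns are linear combinations of all 22 original attributes, not a subset of them, which may be the source of the confusion noted above.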

Splitting the dataset: training and test sets in an 8:2 ratio

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

The 6 supervised learning methods are walked through below:

Model 1: Logistic regression

from sklearn.linear_model import LogisticRegression  # logistic regression (classification)
from sklearn.model_selection import cross_val_score  # cross-validation score
from sklearn import metrics  # model evaluation

# Build and fit the model
model_LR = LogisticRegression()
model_LR.fit(X_train, y_train)

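Check the prediction performance:

y_prob = model_LR.predict_proba(X_test)[:, 1]  # probability of class 1
y_pred = np.where(y_prob > 0.5, 1, 0)  # threshold at 0.5
model_LR.score(X_test, y_test)  # score against the true labels

# The original call scored against y_pred, i.e. the predictions against
# themselves, which always returns 1.0. Against y_test, the confusion matrix
# below implies (815 + 744) / 1625 ≈ 0.959.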

Confusion matrix for logistic regression:

confusion_matrix = metrics.confusion_matrix(y_test, y_pred)
confusion_matrix

# Result
array([[815,  30],
       [ 36, 744]])

The AUC value:

auc_roc = metrics.roc_auc_score(y_test, y_pred)  # true labels and predicted labels
auc_roc

# Result
0.9591715976331362

True and false positive rates

from sklearn.metrics import roc_curve, auc
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_prob)

roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc

# Result
0.9903474434835382
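The two AUC figures above differ (0.959 from `roc_auc_score` on `y_pred`, 0.990 from `roc_curve` on `y_prob`) because the first is computed from hard 0/1 labels. In that case the ROC curve has a single interior point, and the area under it reduces to the average of the true positive rate and the true negative rate. The arithmetic below uses only the confusion matrix printed above; the variable names are mine.

```python
# Counts from the logistic-regression confusion matrix above:
# rows are true classes (0, 1), columns are predicted classes (0, 1).
tn, fp = 815, 30
fn, tp = 36, 744

tpr = tp / (tp + fn)          # true positive rate (recall of class 1)
tnr = tn / (tn + fp)          # true negative rate (recall of class 0)

# With hard labels the ROC curve has one interior point (FPR, TPR),
# and the trapezoid area under it is the mean of TPR and TNR.
auc_from_labels = (tpr + tnr) / 2
print(auc_from_labels)
```

This reproduces the 0.959 figure, so the value from predicted probabilities (0.990) is the one that reflects the model's ranking quality.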

ROC curve

import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))
plt.title("ROC")  # Receiver Operating Characteristic
plt.plot(false_positive_rate,
         true_positive_rate,
         color="red",
         label="AUC = %0.2f" % roc_auc
        )

plt.legend(loc="lower right")
plt.plot([0,1], [0,1], linestyle="--")  # diagonal reference line
plt.axis("tight")
# True positive: predicted as class 1 (positive) and the prediction is correct
plt.ylabel("True Positive Rate")
# False positive: predicted as class 1 (positive) and the prediction is wrong
plt.xlabel("False Positive Rate")
plt.show()

Next we tune the logistic regression model. Tuning here means using grid search to pick the best parameters and then refitting the model with them. The grid search looks like this:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn import metrics

# Untuned model; liblinear supports both the l1 and l2 penalties
# (the default lbfgs solver rejects l1)
LR_model = LogisticRegression(solver="liblinear")
# Parameters to search over
tuned_parameters = {"C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
                    "penalty": ['l1', 'l2']  # different regularization types, to limit overfitting
                   }
# Grid search module
from sklearn.model_selection import GridSearchCV
# Wrap the model in a grid search with 10-fold cross-validation
LR = GridSearchCV(LR_model, tuned_parameters, cv=10)
# Search, then refit with the best parameters
LR.fit(X_train, y_train)

# Best parameters found
print(LR.best_params_)

{'C': 100, 'penalty': 'l2'}

Check the predictions after tuning:

Confusion matrix and AUC:
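The evaluation code for the tuned model is not shown in the post. A hedged sketch of the likely steps is below, run on synthetic data from `make_classification` with a smaller grid and 3-fold cross-validation so it finishes quickly. All names ending in `_tuned`, the grid values, and the data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn import metrics

# Synthetic binary data standing in for the 17 PCA components.
X_syn, y_syn = make_classification(n_samples=400, n_features=17, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=4)

search = GridSearchCV(
    LogisticRegression(solver="liblinear"),   # liblinear handles both l1 and l2
    {"C": [0.1, 1, 10], "penalty": ["l1", "l2"]},
    cv=3,
)
search.fit(X_tr, y_tr)

# GridSearchCV refits the best model, so it can predict directly.
y_prob_tuned = search.predict_proba(X_te)[:, 1]
y_pred_tuned = search.predict(X_te)

cm = metrics.confusion_matrix(y_te, y_pred_tuned)
auc_tuned = metrics.roc_auc_score(y_te, y_prob_tuned)
print(search.best_params_, cm.tolist(), round(auc_tuned, 3))
```

Passing probabilities rather than hard labels to `roc_auc_score` gives the AUC of the full ROC curve, which is the number to compare across models.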

ROC curve:

from sklearn.metrics import roc_curve, auc
# y_prob should come from the tuned model, e.g. LR.predict_proba(X_test)[:, 1]
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_prob)

# Recompute the area so the legend reflects the tuned model
roc_auc = auc(false_positive_rate, true_positive_rate)

import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))
plt.title("ROC")  # Receiver Operating Characteristic
plt.plot(false_positive_rate,
         true_positive_rate,
         color="red",
         label="AUC = %0.2f" % roc_auc
        )

plt.legend(loc="lower right")
plt.plot([0,1], [0,1], linestyle="--")  # diagonal reference line
plt.axis("tight")
# True positive: predicted as class 1 (positive) and the prediction is correct
plt.ylabel("True Positive Rate")
# False positive: predicted as class 1 (positive) and the prediction is wrong
plt.xlabel("False Positive Rate")
plt.show()

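Model 2: Gaussian naive Bayes

Modeling

from sklearn.naive_bayes import GaussianNB
model_naive = GaussianNB()

# Fit the model
model_naive.fit(X_train, y_train)

# Predicted probability of class 1, thresholded at 0.5
y_prob = model_naive.predict_proba(X_test)[:,1]
y_pred = np.where(y_prob > 0.5, 1, 0)
model_naive.score(X_test, y_test)  # score against the true labels

# The original call scored against y_pred, which always returns 1.
# Against y_test, the 111 errors noted below give 1 - 111/1625 ≈ 0.932.

Number of predictions that differ from the true labels: 111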
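The mismatch count above can be obtained by comparing the two label arrays element by element. A small sketch on made-up labels is below, followed by the same arithmetic with the figures from this post: 111 errors out of 1625 test samples, 1625 being the size of the 20% test split (the sum of the logistic-regression confusion matrix). The `_demo` arrays are assumptions.

```python
import numpy as np

y_true_demo = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_demo = np.array([1, 0, 0, 1, 0, 1, 1, 0])

n_wrong = int((y_true_demo != y_pred_demo).sum())   # misclassified samples
accuracy = 1 - n_wrong / len(y_true_demo)

# The same arithmetic with the post's figures: 111 errors in 1625 test samples.
nb_accuracy = 1 - 111 / 1625
print(n_wrong, accuracy, round(nb_accuracy, 4))
```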

Cross-validation

scores = cross_val_score(model_naive,
                         X,
                         y,
                         cv=10,
                         scoring="accuracy"
                        )
scores

Confusion matrix and AUC

True and false positive rates

# Import the evaluation helpers
from sklearn.metrics import roc_curve, auc

# ROC points
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_prob)

# Area under the ROC curve
roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc

# Result
0.9592201486876043

ROC curve

The AUC is only 0.96.

# Plot
import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))

plt.title("ROC")
plt.plot(false_positive_rate, true_positive_rate, color="red", label="AUC=%0.2f" % roc_auc)

plt.legend(loc="lower right")
plt.plot([0,1], [0,1], linestyle='--')  # diagonal reference line

plt.axis("tight")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()

Model 3: Support vector machine (SVM)

SVM starting from default parameters

Modeling

from sklearn.svm import SVC
svm_model = SVC()

# A dict literal keeps only the last value for a repeated key, so the
# duplicated 'C' and 'kernel' entries reduce to this search space.
tuned_parameters = {
    'C': [1, 10, 100, 500, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
    'kernel': ['rbf']
}

Randomized search with RandomizedSearchCV

from sklearn.model_selection import RandomizedSearchCV

# Set up the randomized search
model_svm = RandomizedSearchCV(
    svm_model,  # model to tune
    tuned_parameters,  # parameter space
    cv=10,  # 10-fold cross-validation
    scoring="accuracy",  # scoring metric
    n_iter=20  # number of parameter settings sampled
    )

# Fit the model
model_svm.fit(X_train, y_train)

# Output
RandomizedSearchCV(cv=10,
                   estimator=SVC(),
                   n_iter=20,
                   param_distributions={'C': [1, 10, 100, 500, 1000],
                                        'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
                                        'kernel': ['rbf']},
                   scoring='accuracy')

# Best cross-validation score
print(model_svm.best_score_)
1.0

Parameters that achieved the best score:
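The output for this step is missing from the post, so the actual winning parameters are unknown. The sketch below only shows how they would be read from a fitted search, using synthetic data and a reduced parameter space; `search_svm`, the data, and the grid values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X_syn, y_syn = make_classification(n_samples=200, n_features=10, random_state=0)

search_svm = RandomizedSearchCV(
    SVC(),
    {"C": [1, 10, 100], "gamma": [0.1, 0.01, 0.001], "kernel": ["rbf"]},
    n_iter=4,          # sample 4 of the 9 combinations
    cv=3,
    scoring="accuracy",
    random_state=0,    # makes the sampled combinations repeatable
)
search_svm.fit(X_syn, y_syn)

best = search_svm.best_params_      # dict with the winning C, gamma, kernel
print(best, round(search_svm.best_score_, 3))
```

Unlike a full grid search, a randomized search only tries `n_iter` combinations, so setting `random_state` is what makes the reported best parameters reproducible.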

# Predict
y_pred = model_svm.predict(X_test)

# Classification accuracy: true labels first, then predictions
metrics.accuracy_score(y_test, y_pred)
# Result
1

Confusion matrix

Look at the confusion matrix and the predictions:

ROC curve

from sklearn.metrics import roc_curve, auc
# SVC has no predict_proba by default, so hard labels are used here;
# model_svm.decision_function(X_test) would give a finer-grained curve
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_pred)
roc_auc = auc(false_positive_rate, true_positive_rate)

import matplotlib.pyplot as plt

plt.figure(figsize=(10,10))
plt.title('ROC')

plt.plot(false_positive_rate, true_positive_rate, color='red', label='AUC = %0.2f' % roc_auc)

plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], linestyle='--')  # diagonal reference line

plt.axis('tight')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

Model 4: Random forest

Fitting

from sklearn.ensemble import RandomForestClassifier

# Build the model
model_RR = RandomForestClassifier()
# Fit it
model_RR.fit(X_train, y_train)

Prediction score
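The scoring code for the random forest is not shown. A sketch of the usual pattern follows, on synthetic data; the `_rf` names, the data, and `n_estimators=50` are assumptions. Note that the score is taken against the true test labels, not against the model's own predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_syn, y_syn = make_classification(n_samples=300, n_features=17, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=4)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

y_prob_rf = rf.predict_proba(X_te)[:, 1]   # probability of class 1, used for the ROC curve
y_pred_rf = rf.predict(X_te)               # hard 0/1 predictions
test_accuracy = rf.score(X_te, y_te)       # compared with the true labels, not y_pred_rf
print(round(test_accuracy, 3))
```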

Confusion matrix

ROC curve

from sklearn.metrics import roc_curve, auc

false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_prob)

roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc  # 1

import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))
plt.title('ROC')

plt.plot(false_positive_rate, true_positive_rate, color='red', label='AUC = %0.2f' % roc_auc)

plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], linestyle='--')  # diagonal reference line

plt.axis('tight')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

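Model 5: Decision tree (CART)

Modeling

from sklearn.tree import DecisionTreeClassifier

# Build and fit the model
model_tree = DecisionTreeClassifier()
model_tree.fit(X_train, y_train)

# Predicted probability of class 1
y_prob = model_tree.predict_proba(X_test)[:,1]

# Turn the probabilities into 0/1 labels
y_pred = np.where(y_prob > 0.5, 1, 0)
model_tree.score(X_test, y_test)  # score against the true labels, not y_pred
# Result (the source scored against y_pred, which always gives 1)
1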

Confusion matrix

The evaluation metrics:

ROC curve

from sklearn.metrics import roc_curve, auc
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc  # 1

import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))  # canvas
plt.title('ROC')  # title

plt.plot(false_positive_rate,  # draw the curve
         true_positive_rate,
         color='red',
         label='AUC = %0.2f' % roc_auc)

plt.legend(loc='lower right')  # legend position
plt.plot([0, 1], [0, 1], linestyle='--')  # diagonal reference line

plt.axis('tight')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()

Model 6: Neural network (ANN)

Modeling
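The modeling code for the neural network is missing from the post. Given that the tuning step further down uses `MLPClassifier`, the baseline was probably fitted along these lines. The hidden layer size, `max_iter`, the synthetic data, and the `_mlp` names are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X_syn, y_syn = make_classification(n_samples=300, n_features=17, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=4)

# One hidden layer of 20 units; the sizes here are arbitrary choices.
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)
mlp.fit(X_tr, y_tr)

y_prob_mlp = mlp.predict_proba(X_te)[:, 1]   # probability of class 1 for the ROC curve
mlp_accuracy = mlp.score(X_te, y_te)         # accuracy against the true labels
print(round(mlp_accuracy, 3))
```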

Confusion matrix

ROC curve

# True and false positive rates
from sklearn.metrics import roc_curve, auc
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc  # 1

# Plot the ROC curve

import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))
plt.title('ROC')
plt.plot(false_positive_rate, true_positive_rate, color='red', label='AUC = %0.2f' % roc_auc)

plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], linestyle='--')  # diagonal reference line

plt.axis('tight')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

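Next, tune the neural network's parameters:

hidden_layer_sizes: number of units in each hidden layer
activation: activation function
alpha: L2 regularization strength (not the learning rate, which is learning_rate_init)
max_iter: maximum number of iterations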

Grid search

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import RandomizedSearchCV

# Instantiate the model
mlp_model = MLPClassifier()
# Parameters to tune
tuned_parameters = {'hidden_layer_sizes': range(1, 200, 10),
                    'activation': ['tanh', 'logistic', 'relu'],
                    'alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10],
                    'max_iter': range(50, 200, 50)
}

# Randomized search: 5 sampled settings, 10-fold cross-validation, all CPU cores
model_mlp = RandomizedSearchCV(mlp_model,
                               tuned_parameters,
                               cv=10,
                               scoring='accuracy',
                               n_iter=5,
                               n_jobs=-1,
                               random_state=5)
model_mlp.fit(X_train, y_train)

Model attributes

Attributes of the tuned model and the parameters it selected:

ROC curve

from sklearn.metrics import roc_curve, auc
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc  # 1

import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))
plt.title('ROC')

plt.plot(false_positive_rate, true_positive_rate, color='red', label='AUC = %0.2f' % roc_auc)

plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], linestyle='--')  # diagonal reference line

plt.axis('tight')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()

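Confusion matrix and ROC

This is a good article explaining the confusion matrix and ROC: https://www.cnblogs.com/wuliytTaotao/p/9285227.html

1. What is a confusion matrix?

2. The 4 basic counts

In TP, FP, TN, and FN, the second letter is the class the sample was predicted as (Positive or Negative), and the first letter says whether that prediction matches the true class (True or False).

3. Accuracy

4. Precision and recall

5. F_1 and F_β

6. ROC curve

AUC stands for Area Under Curve, the area beneath a curve. The AUC of the ROC curve can be used to evaluate a model. The ROC curve is shown in Figure 1: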
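The formulas behind items 3 to 5 did not survive in this copy of the post, so here is a sketch of the standard definitions, applied to the logistic-regression confusion matrix shown earlier (TN = 815, FP = 30, FN = 36, TP = 744). The `f_beta` helper is my own illustration, not code from the referenced article.

```python
tn, fp, fn, tp = 815, 30, 36, 744   # from the logistic-regression confusion matrix

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are right
precision = tp / (tp + fp)                   # of the predicted positives, how many are right
recall = tp / (tp + fn)                      # of the actual positives, how many are found

def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r; beta > 1 favors recall."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

f1 = f_beta(precision, recall)
f2 = f_beta(precision, recall, beta=2.0)
print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4), round(f2, 4))
```

For a poisonous-mushroom classifier, a missed poisonous sample (FN) is the costly error, so recall, or an F_β with β above 1, is arguably the more relevant number than plain accuracy.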

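Summary

After reading through this notebook, these are the points you should take away:

The overall machine learning workflow: choosing a model, fitting, tuning parameters with grid search, model evaluation, and ROC curves (for classification)
Feature engineering techniques: label encoding, standardization, dataset splitting
Evaluation metrics: the confusion matrix and ROC curve are the key ones, and a later post will cover them in detail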


