目录
第一部分 机器学习入门
第1章 机器学习概论3
1.1 欢迎来到机器学习的世界3
1.2 范围、术语、预测和数据4
1.2.1 特征5
1.2.2 目标值和预测值6
1.3 让机器开始机器学习7
1.4 学习系统举例9
1.4.1 预测类别:分类器举例9
1.4.2 预测值:回归器举例10
1.5 评估机器学习系统11
1.5.1 准确率11
1.5.2 资源消耗12
1.6 创建机器学习系统的过程13
1.7 机器学习的假设和现实15
1.8 参考阅读资料17
1.8.1 进一步研究方向17
1.8.2 注释17
第2章 相关技术背景19
2.1 编程环境配置19
2.2 数学语言的必要性19
2.3 用于解决机器学习问题的软件20
2.4 概率21
2.4.1 基本事件22
2.4.2 独立性23
2.4.3 条件概率24
2.4.4 概率分布25
2.5 线性组合、加权和以及点积28
2.5.1 加权平均30
2.5.2 平方和32
2.5.3 误差平方和33
2.6 几何视图:空间中的点34
2.6.1 直线34
2.6.2 直线拓展39
2.7 表示法和加1技巧43
2.8 渐入佳境:突破线性和非线性45
2.9 NumPy与“数学无所不在”47
2.9.1 一维数组与二维数组49
2.10 浮点数问题52
2.11 参考阅读资料53
2.11.1 小结53
2.11.2 注释54
第3章 预测类别:分类入门55
3.1 分类任务55
3.2 一个简单的分类数据集56
3.3 训练和测试:请勿“应试教育”59
3.4 评估:考试评分62
3.5 简单分类器1:最近邻分类器、远距离关系和假设63
3.5.1 定义相似性63
3.5.2 k-最近邻中的k64
3.5.3 答案组合64
3.5.4 k-最近邻、参数和非参数方法65
3.5.5 建立一个k-最近邻分类模型66
3.6 简单分类器2:朴素贝叶斯分类器、概率和违背承诺68
3.7 分类器的简单评估70
3.7.1 机器学习的性能70
3.7.2 分类器的资源消耗71
3.7.3 独立资源评估77
3.8 参考阅读资料81
3.8.1 再次警告:局限性和尚未解决的问题81
3.8.2 小结82
3.8.3 注释82
3.8.4 练习题83
第4章 预测数值:回归入门85
4.1 一个简单的回归数据集85
4.2 最近邻回归和汇总统计87
4.2.1 中心测量:中位数和均值88
4.2.2 构建一个k-最近邻回归模型90
4.3 线性回归和误差91
4.3.1 地面总是不平坦的:为什么需要斜坡92
4.3.2 倾斜直线94
4.3.3 执行线性回归97
4.4 优化:选择最佳答案98
4.4.1 随机猜测98
4.4.2 随机步进99
4.4.3 智能步进99
4.4.4 计算的捷径100
4.4.5 线性回归的应用101
4.5 回归器的简单评估和比较101
4.5.1 均方根误差101
4.5.2 机器学习的性能102
4.5.3 回归过程中的资源消耗102
4.6 参考阅读资料104
4.6.1 局限性和尚未解决的问题104
4.6.2 小结105
4.6.3 注释105
4.6.4 练习题105
第二部分 通用评估技术
第5章 机器学习算法的评估和比较分析109
5.1 评估和大道至简的原则109
5.2 机器学习阶段的术语110
5.2.1 有关机器的重新讨论110
5.2.2 规范的阐述113
5.3 过拟合和欠拟合116
5.3.1 合成数据和线性回归117
5.3.2 手动操控模型的复杂度118
5.3.3 “恰到好处”原则:可视化过拟合、欠拟合和最佳拟合120
5.3.4 简单性124
5.3.5 关于过拟合必须牢记的注意事项124
5.4 从误差到成本125
5.4.1 损失125
5.4.2 成本126
5.4.3 评分127
5.5 (重新)抽样:以少胜多128
5.5.1 交叉验证128
5.5.2 分层抽样132
5.5.3 重复的训练–测试数据集拆分133
5.5.4 一种更好的方法和混排137
5.5.5 留一交叉验证140
5.6 分解:将误差分解为偏差和方差142
5.6.1 数据的方差143
5.6.2 模型的方差144
5.6.3 模型的偏差144
5.6.4 结合所有的因素145
5.6.5 偏差–方差权衡示例145
5.7 图形可视化评估和比较149
5.7.1 学习曲线:到底需要多少数据150
5.7.2 复杂度曲线152
5.8 使用交叉验证比较机器学习模型154
5.9 参考阅读资料155
5.9.1 小结155
5.9.2 注释155
5.9.3 练习题157
第6章 评估分类器159
6.1 基线分类器159
6.2 准确度以外:分类器的其他度量指标161
6.2.1 从混淆矩阵中消除混淆163
6.2.2 错误的方式164
6.2.3 基于混淆矩阵的度量指标165
6.2.4 混淆矩阵编码166
6.2.5 处理多元类别:多元类别平均168
6.2.6 F1分数170
6.3 ROC曲线170
6.3.1 ROC模式173
6.3.2 二元分类ROC174
6.3.3 AUC:(ROC)曲线下的面积177
6.3.4 多元分类机器学习模型、一对其他和ROC179
6.4 多元分类的另一种方法:一对一181
6.4.1 多元分类AUC第二部分:寻找单一值182
6.5 精确率–召回率曲线185
6.5.1 关于精确率–召回率权衡的说明185
6.5.2 构建精确率–召回率曲线186
6.6 累积响应和提升曲线187
6.7 复杂的分类器评估:第二阶段190
6.7.1 二元分类190
6.7.2 一个新颖的多元分类问题195
6.8 参考阅读资料201
6.8.1 小结201
6.8.2 注释202
6.8.3 练习题203
第7章 评估回归器205
7.1 基线回归器205
7.2 回归器的其他度量指标207
7.2.1 创建自定义的评估指标207
7.2.2 其他内置的回归度量指标208
7.2.3 R2209
7.3 残差图214
7.3.1 误差图215
7.3.2 残差图217
7.4 标准化初探221
7.5 使用更复杂的方法评估回归器:第二阶段225
7.5.1 多个度量指标的交叉验证结果226
7.5.2 交叉验证结果汇总230
7.5.3 残差230
7.6 参考阅读资料232
7.6.1 小结232
7.6.2 注释232
7.6.3 练习题234
第三部分 多方法和其他技术
第8章 多分类方法237
8.1 重温分类知识237
8.2 决策树239
8.2.1 树构建算法242
8.2.2 让我们开始:决策树时间245
8.2.3 决策树中的偏差和方差249
8.3 支持向量分类器249
8.3.1 执行支持向量分类器253
8.3.2 SVC中的偏差和方差256
8.4 逻辑回归259
8.4.1 投注几率259
8.4.2 概率、几率和对数几率262
8.4.3 实现操作:逻辑回归版本267
8.4.4 逻辑回归:空间奇异性268
8.5 判别分析269
8.5.1 协方差270
8.5.2 方法282
8.5.3 执行判别分析283
8.6 假设、偏差和分类器285
8.7 分类器的比较:第三阶段287
8.7.1 数字287
8.8 参考阅读资料290
8.8.1 小结290
8.8.2 注释290
8.8.3 练习题293
第9章 多回归方法295
9.1 惩罚框中的线性回归:正则化295
9.1.1 执行正则化回归300
9.2 支持向量回归301
9.2.1 铰链损失301
9.2.2 从线性回归到正则化回归,再到支持向量回归305
9.2.3 实践应用:支持向量回归风格307
9.3 分段常数回归308
9.3.1 实现分段常数回归器310
9.3.2 模型实现的一般说明311
9.4 回归树313
9.4.1 用决策树实现回归313
9.5 回归器比较:第三阶段314
9.6 参考阅读资料318
9.6.1 小结318
9.6.2 注释318
9.6.3 练习题319
第10章 手动特征工程:操作数据的乐趣和意义321
10.1 特征工程的术语和动机321
10.1.1 为什么选择特征工程322
10.1.2 何时开始特征工程323
10.1.3 特征工程是如何发生的324
10.2 特征选择和数据简化:清除垃圾324
10.3 特征缩放325
10.4 离散化329
10.5 分类编码332
10.5.1 编码的另一种方式以及无截距的奇怪情况334
10.6 关系和相互作用341
10.6.1 手动特征构造341
10.6.2 相互作用343
10.6.3 使用转换器添加特征348
10.7 对输入空间和目标的相关操作350
10.7.1 对输入空间的相关操作351
10.7.2 对目标的相关操作353
10.8 参考阅读资料356
10.8.1 小结356
10.8.2 注释356
10.8.3 练习题357
第11章 调整超参数和管道技术359
11.1 模型、参数、超参数360
11.2 调整超参数362
11.2.1 关于计算机科学和机器学习术语的说明362
11.2.2 关于完整搜索的示例362
11.2.3 使用随机性在大海中捞针368
11.3 递归的神奇世界:嵌套交叉验证370
11.3.1 重温交叉验证370
11.3.2 作为模型的网格搜索371
11.3.3 交叉验证中嵌套的交叉验证372
11.3.4 关于嵌套交叉验证的注释375
11.4 管道技术377
11.4.1 简单的管道378
11.4.2 复杂的管道379
11.5 管道和调参相结合380
11.6 参考阅读资料382
11.6.1 小结382
11.6.2 注释382
11.6.3 练习题383
第四部分 主题
第12章 组合机器学习模型387
12.1 集成387
12.2 投票集成389
12.3 装袋法和随机森林390
12.3.1 自举390
12.3.2 从自举到装袋法394
12.3.3 随机森林396
12.4 提升方法398
12.4.1 提升方法的核心理念399
12.5 各种树集成方法的比较401
12.6 参考阅读资料405
12.6.1 小结405
12.6.2 注释405
12.6.3 练习题406
第13章 提供特征工程的模型409
13.1 特征选择411
13.1.1 基于度量特征的“单步筛选”方法412
13.1.2 基于模型的特征选择423
13.1.3 将特征选择与机器学习管道相集成426
13.2 基于核的特征构造428
13.2.1 核激励因子428
13.2.2 手动核方法433
13.2.3 核方法和核选项438
13.2.4 核化支持向量分类器:支持向量机442
13.2.5 关于SVM的建议和示例443
13.3 主成分分析:一种无监督技术445
13.3.1 预热:中心化数据445
13.3.2 寻找不同的最佳线路448
13.3.3 首次执行PCA449
13.3.4 PCA的内部原理452
13.3.5 对一般PCA的评论457
13.3.6 核PCA和流形方法458
13.4 参考阅读资料462
13.4.1 小结462
13.4.2 注释462
13.4.3 练习题467
第14章 领域特征工程:领域特定的机器学习469
14.1 处理文本470
14.1.1 对文本进行编码471
14.1.2 文本学习的示例476
14.2 聚类479
14.2.1 k-均值聚类479
14.3 处理图像481
14.3.1 视觉词袋481
14.3.2 图像数据482
14.3.3 端到端系统483
14.3.4 全局视觉词袋转换器的完整代码491
14.4 参考阅读资料493
14.4.1 小结493
14.4.2 注释494
14.4.3 练习题495
第15章 连接、扩展和进一步研究方向497
15.1 优化497
15.2 基于原始数据的线性回归500
15.2.1 线性回归的可视化视图504
15.3 基于原始数据构建逻辑回归504
15.3.1 采用0-1编码的逻辑回归506
15.3.2 采用加1减1编码的逻辑回归508
15.3.3 逻辑回归的可视化视图509
15.4 基于原始数据的SVM510
15.5 神经网络512
15.5.1 线性回归的神经网络视图512
15.5.2 逻辑回归的神经网络视图515
15.5.3 基本神经网络516
15.6 概率图模型516
15.6.1 抽样518
15.6.2 线性回归的概率图模型视图519
15.6.3 逻辑回归的概率图模型视图523
15.7 参考阅读资料525
15.7.1 小结525
15.7.2 注释526
15.7.3 练习题527
附录A mlwpy.py程序清单529
Contents
I First Steps 1
1 Let’s Discuss Learning 3
1.1 Welcome 3
1.2 Scope, Terminology, Prediction, and Data 4
1.2.1 Features 5
1.2.2 Target Values and Predictions 6
1.3 Putting the Machine in Machine Learning 7
1.4 Examples of Learning Systems 9
1.4.1 Predicting Categories: Examples of Classifiers 9
1.4.2 Predicting Values: Examples of Regressors 10
1.5 Evaluating Learning Systems 11
1.5.1 Correctness 11
1.5.2 Resource Consumption 12
1.6 A Process for Building Learning Systems 13
1.7 Assumptions and Reality of Learning 15
1.8 End-of-Chapter Material 17
1.8.1 The Road Ahead 17
1.8.2 Notes 17
2 Some Technical Background 19
2.1 About Our Setup 19
2.2 The Need for Mathematical Language 19
2.3 Our Software for Tackling Machine Learning 20
2.4 Probability 21
2.4.1 Primitive Events 22
2.4.2 Independence 23
2.4.3 Conditional Probability 24
2.4.4 Distributions 25
2.5 Linear Combinations, Weighted Sums, and Dot Products 28
2.5.1 Weighted Average 30
2.5.2 Sums of Squares 32
2.5.3 Sum of Squared Errors 33
2.6 A Geometric View: Points in Space 34
2.6.1 Lines 34
2.6.2 Beyond Lines 39
2.7 Notation and the Plus-One Trick 43
2.8 Getting Groovy, Breaking the Straight-Jacket, and Nonlinearity 45
2.9 NumPy versus “All the Maths” 47
2.9.1 Back to 1D versus 2D 49
2.10 Floating-Point Issues 52
2.11 EOC 53
2.11.1 Summary 53
2.11.2 Notes 54
3 Predicting Categories: Getting Started with Classification 55
3.1 Classification Tasks 55
3.2 A Simple Classification Dataset 56
3.3 Training and Testing: Don’t Teach to the Test 59
3.4 Evaluation: Grading the Exam 62
3.5 Simple Classifier #1: Nearest Neighbors, Long Distance Relationships, and Assumptions 63
3.5.1 Defining Similarity 63
3.5.2 The k in k-NN 64
3.5.3 Answer Combination 64
3.5.4 k-NN, Parameters, and Nonparametric Methods 65
3.5.5 Building a k-NN Classification Model 66
3.6 Simple Classifier #2: Naive Bayes, Probability, and Broken Promises 68
3.7 Simplistic Evaluation of Classifiers 70
3.7.1 Learning Performance 70
3.7.2 Resource Utilization in Classification 71
3.7.3 Stand-Alone Resource Evaluation 77
3.8 EOC 81
3.8.1 Sophomore Warning: Limitations and Open Issues 81
3.8.2 Summary 82
3.8.3 Notes 82
3.8.4 Exercises 83
4 Predicting Numerical Values: Getting Started with Regression 85
4.1 A Simple Regression Dataset 85
4.2 Nearest-Neighbors Regression and Summary Statistics 87
4.2.1 Measures of Center: Median and Mean 88
4.2.2 Building a k-NN Regression Model 90
4.3 Linear Regression and Errors 91
4.3.1 No Flat Earth: Why We Need Slope 92
4.3.2 Tilting the Field 94
4.3.3 Performing Linear Regression 97
4.4 Optimization: Picking the Best Answer 98
4.4.1 Random Guess 98
4.4.2 Random Step 99
4.4.3 Smart Step 99
4.4.4 Calculated Shortcuts 100
4.4.5 Application to Linear Regression 101
4.5 Simple Evaluation and Comparison of Regressors 101
4.5.1 Root Mean Squared Error 101
4.5.2 Learning Performance 102
4.5.3 Resource Utilization in Regression 102
4.6 EOC 104
4.6.1 Limitations and Open Issues 104
4.6.2 Summary 105
4.6.3 Notes 105
4.6.4 Exercises 105
II Evaluation 107
5 Evaluating and Comparing Learners 109
5.1 Evaluation and Why Less Is More 109
5.2 Terminology for Learning Phases 110
5.2.1 Back to the Machines 110
5.2.2 More Technically 113
5.3 Major Tom, There’s Something Wrong: Overfitting and Underfitting 116
5.3.1 Synthetic Data and Linear Regression 117
5.3.2 Manually Manipulating Model Complexity 118
5.3.3 Goldilocks: Visualizing Overfitting, Underfitting, and “Just Right” 120
5.3.4 Simplicity 124
5.3.5 Take-Home Notes on Overfitting 124
5.4 From Errors to Costs 125
5.4.1 Loss 125
5.4.2 Cost 126
5.4.3 Score 127
5.5 (Re)Sampling: Making More from Less 128
5.5.1 Cross-Validation 128
5.5.2 Stratification 132
5.5.3 Repeated Train-Test Splits 133
5.5.4 A Better Way and Shuffling 137
5.5.5 Leave-One-Out Cross-Validation 140
5.6 Break-It-Down: Deconstructing Error into Bias and Variance 142
5.6.1 Variance of the Data 143
5.6.2 Variance of the Model 144
5.6.3 Bias of the Model 144
5.6.4 All Together Now 145
5.6.5 Examples of Bias-Variance Tradeoffs 145
5.7 Graphical Evaluation and Comparison 149
5.7.1 Learning Curves: How Much Data Do We Need? 150
5.7.2 Complexity Curves 152
5.8 Comparing Learners with Cross-Validation 154
5.9 EOC 155
5.9.1 Summary 155
5.9.2 Notes 155
5.9.3 Exercises 157
6 Evaluating Classifiers 159
6.1 Baseline Classifiers 159
6.2 Beyond Accuracy: Metrics for Classification 161
6.2.1 Eliminating Confusion from the Confusion Matrix 163
6.2.2 Ways of Being Wrong 164
6.2.3 Metrics from the Confusion Matrix 165
6.2.4 Coding the Confusion Matrix 166
6.2.5 Dealing with Multiple Classes: Multiclass Averaging 168
6.2.6 F1 170
6.3 ROC Curves 170
6.3.1 Patterns in the ROC 173
6.3.2 Binary ROC 174
6.3.3 AUC: Area-Under-the-(ROC)-Curve 177
6.3.4 Multiclass Learners, One-versus-Rest, and ROC 179
6.4 Another Take on Multiclass: One-versus-One 181
6.4.1 Multiclass AUC Part Two: The Quest for a Single Value 182
6.5 Precision-Recall Curves 185
6.5.1 A Note on Precision-Recall Tradeoff 185
6.5.2 Constructing a Precision-Recall Curve 186
6.6 Cumulative Response and Lift Curves 187
6.7 More Sophisticated Evaluation of Classifiers: Take Two 190
6.7.1 Binary 190
6.7.2 A Novel Multiclass Problem 195
6.8 EOC 201
6.8.1 Summary 201
6.8.2 Notes 202
6.8.3 Exercises 203
7 Evaluating Regressors 205
7.1 Baseline Regressors 205
7.2 Additional Measures for Regression 207
7.2.1 Creating Our Own Evaluation Metric 207
7.2.2 Other Built-in Regression Metrics 208
7.2.3 R2 209
7.3 Residual Plots 214
7.3.1 Error Plots 215
7.3.2 Residual Plots 217
7.4 A First Look at Standardization 221
7.5 Evaluating Regressors in a More Sophisticated Way: Take Two 225
7.5.1 Cross-Validated Results on Multiple Metrics 226
7.5.2 Summarizing Cross-Validated Results 230
7.5.3 Residuals 230
7.6 EOC 232
7.6.1 Summary 232
7.6.2 Notes 232
7.6.3 Exercises 234
III More Methods and Fundamentals 235
8 More Classification Methods 237
8.1 Revisiting Classification 237
8.2 Decision Trees 239
8.2.1 Tree-Building Algorithms 242
8.2.2 Let’s Go: Decision Tree Time 245
8.2.3 Bias and Variance in Decision Trees 249
8.3 Support Vector Classifiers 249
8.3.1 Performing SVC 253
8.3.2 Bias and Variance in SVCs 256
8.4 Logistic Regression 259
8.4.1 Betting Odds 259
8.4.2 Probabilities, Odds, and Log-Odds 262
8.4.3 Just Do It: Logistic Regression Edition 267
8.4.4 A Logistic Regression: A Space Oddity 268
8.5 Discriminant Analysis 269
8.5.1 Covariance 270
8.5.2 The Methods 282
8.5.3 Performing DA 283
8.6 Assumptions, Biases, and Classifiers 285
8.7 Comparison of Classifiers: Take Three 287
8.7.1 Digits 287
8.8 EOC 290
8.8.1 Summary 290
8.8.2 Notes 290
8.8.3 Exercises 293
9 More Regression Methods 295
9.1 Linear Regression in the Penalty Box: Regularization 295
9.1.1 Performing Regularized Regression 300
9.2 Support Vector Regression 301
9.2.1 Hinge Loss 301
9.2.2 From Linear Regression to Regularized Regression to Support Vector Regression 305
9.2.3 Just Do It — SVR Style 307
9.3 Piecewise Constant Regression 308
9.3.1 Implementing a Piecewise Constant Regressor 310
9.3.2 General Notes on Implementing Models 311
9.4 Regression Trees 313
9.4.1 Performing Regression with Trees 313
9.5 Comparison of Regressors: Take Three 314
9.6 EOC 318
9.6.1 Summary 318
9.6.2 Notes 318
9.6.3 Exercises 319
10 Manual Feature Engineering: Manipulating Data for Fun and Profit 321
10.1 Feature Engineering Terminology and Motivation 321
10.1.1 Why Engineer Features? 322
10.1.2 When Does Engineering Happen? 323
10.1.3 How Does Feature Engineering Occur? 324
10.2 Feature Selection and Data Reduction: Taking out the Trash 324
10.3 Feature Scaling 325
10.4 Discretization 329
10.5 Categorical Coding 332
10.5.1 Another Way to Code and the Curious Case of the Missing Intercept 334
10.6 Relationships and Interactions 341
10.6.1 Manual Feature Construction 341
10.6.2 Interactions 343
10.6.3 Adding Features with Transformers 348
10.7 Target Manipulations 350
10.7.1 Manipulating the Input Space 351
10.7.2 Manipulating the Target 353
10.8 EOC 356
10.8.1 Summary 356
10.8.2 Notes 356
10.8.3 Exercises 357
11 Tuning Hyperparameters and Pipelines 359
11.1 Models, Parameters, Hyperparameters 360
11.2 Tuning Hyperparameters 362
11.2.1 A Note on Computer Science and Learning Terminology 362
11.2.2 An Example of Complete Search 362
11.2.3 Using Randomness to Search for a Needle in a Haystack 368
11.3 Down the Recursive Rabbit Hole: Nested Cross-Validation 370
11.3.1 Cross-Validation, Redux 370
11.3.2 GridSearch as a Model 371
11.3.3 Cross-Validation Nested within Cross-Validation 372
11.3.4 Comments on Nested CV 375
11.4 Pipelines 377
11.4.1 A Simple Pipeline 378
11.4.2 A More Complex Pipeline 379
11.5 Pipelines and Tuning Together 380
11.6 EOC 382
11.6.1 Summary 382
11.6.2 Notes 382
11.6.3 Exercises 383
IV Adding Complexity 385
12 Combining Learners 387
12.1 Ensembles 387
12.2 Voting Ensembles 389
12.3 Bagging and Random Forests 390
12.3.1 Bootstrapping 390
12.3.2 From Bootstrapping to Bagging 394
12.3.3 Through the Random Forest 396
12.4 Boosting 398
12.4.1 Boosting Details 399
12.5 Comparing the Tree-Ensemble Methods 401
12.6 EOC 405
12.6.1 Summary 405
12.6.2 Notes 405
12.6.3 Exercises 406
13 Models That Engineer Features for Us 409
13.1 Feature Selection 411
13.1.1 Single-Step Filtering with Metric-Based Feature Selection 412
13.1.2 Model-Based Feature Selection 423
13.1.3 Integrating Feature Selection with a Learning Pipeline 426
13.2 Feature Construction with Kernels 428
13.2.1 A Kernel Motivator 428
13.2.2 Manual Kernel Methods 433
13.2.3 Kernel Methods and Kernel Options 438
13.2.4 Kernelized SVCs: SVMs 442
13.2.5 Take-Home Notes on SVM and an Example 443
13.3 Principal Components Analysis: An Unsupervised Technique 445
13.3.1 A Warm Up: Centering 445
13.3.2 Finding a Different Best Line 448
13.3.3 A First PCA 449
13.3.4 Under the Hood of PCA 452
13.3.5 A Finale: Comments on General PCA 457
13.3.6 Kernel PCA and Manifold Methods 458
13.4 EOC 462
13.4.1 Summary 462
13.4.2 Notes 462
13.4.3 Exercises 467
14 Feature Engineering for Domains: Domain-Specific Learning 469
14.1 Working with Text 470
14.1.1 Encoding Text 471
14.1.2 Example of Text Learning 476
14.2 Clustering 479
14.2.1 k-Means Clustering 479
14.3 Working with Images 481
14.3.1 Bag of Visual Words 481
14.3.2 Our Image Data 482
14.3.3 An End-to-End System 483
14.3.4 Complete Code of BoVW Transformer 491
14.4 EOC 493
14.4.1 Summary 493
14.4.2 Notes 494
14.4.3 Exercises 495
15 Connections, Extensions, and Further Directions 497
15.1 Optimization 497
15.2 Linear Regression from Raw Materials 500
15.2.1 A Graphical View of Linear Regression 504
15.3 Building Logistic Regression from Raw Materials 504
15.3.1 Logistic Regression with Zero-One Coding 506
15.3.2 Logistic Regression with Plus-One Minus-One Coding 508
15.3.3 A Graphical View of Logistic Regression 509
15.4 SVM from Raw Materials 510
15.5 Neural Networks 512
15.5.1 A NN View of Linear Regression 512
15.5.2 A NN View of Logistic Regression 515
15.5.3 Beyond Basic Neural Networks 516
15.6 Probabilistic Graphical Models 516
15.6.1 Sampling 518
15.6.2 A PGM View of Linear Regression 519
15.6.3 A PGM View of Logistic Regression 523
15.7 EOC 525
15.7.1 Summary 525
15.7.2 Notes 526
15.7.3 Exercises 527
A mlwpy.py Listing 529
Index 537
机器学习Python版(英文版)