Data Analysis

Subtitle: none

Author: edited by Gérard Govaert

Classification number:

ISBN:9781848210981


Description

Publisher's summary: The first part of this book is devoted to methods that seek relevant dimensions in data. The variables thus obtained provide a synthetic description, which often results in a graphical representation of the data. After a general presentation of discriminant analysis, the second part is devoted to clustering methods, which constitute another approach, often complementary to the methods of the first part, for synthesizing and analyzing data. The book concludes by examining the links between data mining and data analysis.

Table of Contents

Contents 7
Preface 15
Chapter 1. Principal Component Analysis: Application to Statistical Process Control 17
1.1. Introduction 17
1.2. Data table and related subspaces 18
1.2.1. Data and their characteristics 18
1.2.2. The space of statistical units 21
1.2.3. Variables space 23
1.3. Principal component analysis 24
1.3.1. The method 24
1.3.2. Principal factors and principal components 24
1.3.3. Principal factors and principal components properties 26
1.4. Interpretation of PCA results 27
1.4.1. Quality of representations onto principal planes 27
1.4.2. Axis selection 28
1.4.3. Internal interpretation 29
1.4.4. External interpretation: supplementary variables and individuals 31
1.5. Application to statistical process control 34
1.5.1. Introduction 34
1.5.2. Control charts and PCA 36
1.6. Conclusion 38
1.7. Bibliography 39
Chapter 2. Correspondence Analysis: Extensions and Applications to the Statistical Analysis of Sensory Data 41
2.1. Correspondence analysis 41
2.1.1. Data, example, notations 41
2.1.2. Questions: independence model 42
2.1.3. Intensity, significance and nature of a relationship between two qualitative variables 43
2.1.4. Transformation of the data 44
2.1.5. Two clouds 45
2.1.6. Factorial analysis of X 46
2.1.7. Aid to interpretation 48
2.1.8. Some properties 49
2.1.9. Relationships to the traditional presentation 51
2.1.10. Example: recognition of three fundamental tastes 52
2.2. Multiple correspondence analysis 55
2.2.1. Data, notations and example 55
2.2.2. Aims 57
2.2.3. MCA and CA 57
2.2.4. Spaces, clouds and metrics 58
2.2.5. Properties of the clouds in CA of the CDT 59
2.2.6. Transition formulae 61
2.2.7. Aid for interpretation 61
2.2.8. Example: relationship between two taste thresholds 62
2.3. An example of application at the crossroads of CA and MCA 66
2.3.1. Data 66
2.3.2. Questions: construction of the analyzed table 67
2.3.3. Properties of the CA of the analyzed table 69
2.3.4. Results 70
2.4. Conclusion: two other extensions 79
2.4.1. Internal correspondence analysis 79
2.4.2. Multiple factor analysis (MFA) 79
2.5. Bibliography 80
Chapter 3. Exploratory Projection Pursuit 83
3.1. Introduction 83
3.2. General principles 84
3.2.1. Background 84
3.2.2. What is an interesting projection? 85
3.2.3. Looking for an interesting projection 86
3.2.4. Inference 86
3.2.5. Outliers 87
3.3. Some indexes of interest: presentation and use 87
3.3.1. Projection indexes based on entropy measures 87
3.3.2. Projection indexes based on L2 distances 89
3.3.3. Chi-squared type indexes 91
3.3.4. Indexes based on the cumulative empirical function 91
3.4. Generalized principal component analysis 92
3.4.1. Theoretical background 92
3.4.2. Practice 94
3.4.3. Some precisions 95
3.5. Example 97
3.6. Further topics 102
3.6.1. Other indexes, other structures 102
3.6.2. Unsupervised classification 102
3.6.3. Discrete data 103
3.6.4. Related topics 104
3.6.5. Computation 105
3.7. Bibliography 105
Chapter 4. The Analysis of Proximity Data 109
4.1. Introduction 109
4.2. Representation of proximity data in a metric space 113
4.2.1. Four illustrative examples 113
4.2.2. Definitions 116
4.3. Isometric embedding and projection 119
4.3.1. An example of computations 121
4.3.2. The additive constant problem 122
4.3.3. The case of observed dissimilarity measures blurred by noise 124
4.4. Multidimensional scaling and approximation 124
4.4.1. The parametric MDS model 125
4.4.2. The Shepard founding heuristics 127
4.4.3. The majorization approach 130
4.4.4. Extending MDS to a semi-parametric setting 135
4.5. A fielded application 138
4.5.1. Principal coordinates analysis 138
4.5.2. Dimensionality for the representation space 139
4.5.3. The scree test 141
4.5.4. Recourse to simulations 143
4.5.5. Validation of results 143
4.5.6. The use of exogenous information for interpreting the output configuration 147
4.5.7. Introduction to stochastic modeling in MDS 153
4.6. Bibliography 155
Chapter 5. Statistical Modeling of Functional Data 165
5.1. Introduction 165
5.2. Functional framework 168
5.2.1. Functional random variable 168
5.2.2. Smoothness assumption 169
5.2.3. Smoothing splines 170
5.3. Principal components analysis 172
5.3.1. Model and estimation 172
5.3.2. Dimension and smoothing parameter selection 174
5.3.3. Some comments on discretization effects 175
5.3.4. PCA of climatic time series 176
5.4. Linear regression models and extensions 177
5.4.1. Functional linear models 178
5.4.2. Principal components regression 179
5.4.3. Roughness penalty approach 179
5.4.4. Smoothing parameters selection 180
5.4.5. Some notes on asymptotics 181
5.4.6. Generalized linear models and extensions 181
5.4.7. Land use estimation with the temporal evolution of remote sensing data 182
5.5. Forecasting 185
5.5.1. Functional autoregressive process 185
5.5.2. Smooth ARH(1) 187
5.5.3. Locally ARH(1) processes 188
5.5.4. Selecting smoothing parameters 189
5.5.5. Some asymptotic results 189
5.5.6. Prediction of climatic time series 189
5.6. Concluding remarks 192
5.7. Bibliography 193
Chapter 6. Discriminant Analysis 197
6.1. Introduction 197
6.2. Main steps in supervised classification 198
6.2.1. The probabilistic framework 198
6.2.2. Sampling schemes 199
6.2.3. Decision function estimation strategies 200
6.2.4. Variables selection 201
6.2.5. Assessing the misclassification error rate 203
6.2.6. Model selection and resampling techniques 205
6.3. Standard methods in supervised classification 206
6.3.1. Linear discriminant analysis 207
6.3.2. Logistic regression 208
6.3.3. The K nearest neighbors method 211
6.3.4. Classification trees 213
6.3.5. Single hidden layer back-propagation network 215
6.4. Recent advances 220
6.4.1. Parametric methods 220
6.4.2. Radial basis functions 223
6.4.3. Boosting 224
6.4.4. Support vector machines 225
6.5. Conclusion 227
6.6. Bibliography 228
Chapter 7. Cluster Analysis 231
7.1. Introduction 231
7.2. General principles 233
7.2.1. The data 233
7.2.2. Visualizing clusters 234
7.2.3. Types of classification 234
7.2.4. Objectives of clustering 238
7.3. Hierarchical clustering 240
7.3.1. Agglomerative hierarchical clustering (AHC) 241
7.3.2. Agglomerative criteria 242
7.3.3. Example 243
7.3.4. Ward's method or minimum variance approach 243
7.3.5. Optimality properties 244
7.3.6. Using hierarchical clustering 247
7.4. Partitional clustering: the k-means algorithm 249
7.4.1. The algorithm 249
7.4.2. k-means: a family of methods 250
7.4.3. Using the k-means algorithm 252
7.5. Miscellaneous clustering methods 255
7.5.1. Dynamic cluster method 255
7.5.2. Fuzzy clustering 256
7.5.3. Constrained clustering 257
7.5.4. Self-organizing map 258
7.5.5. Clustering variables 259
7.5.6. Clustering high-dimensional datasets 260
7.6. Block clustering 261
7.6.1. Binary data 263
7.6.2. Contingency table 264
7.6.3. Continuous data 265
7.6.4. Some remarks 266
7.7. Conclusion 267
7.8. Bibliography 267
Chapter 8. Clustering and the Mixture Model 273
8.1. Probabilistic approaches in cluster analysis 273
8.1.1. Introduction 273
8.1.2. Parametric approaches 274
8.1.3. Non-parametric methods 275
8.1.4. Validation 276
8.1.5. Notation 276
8.2. The mixture model 277
8.2.1. Introduction 277
8.2.2. The model 277
8.2.3. Estimation of parameters 278
8.2.4. Number of components 279
8.2.5. Identifiability 279
8.3. EM algorithm 279
8.3.1. Introduction 279
8.3.2. Complete data and complete-data likelihood 280
8.3.3. Principle 280
8.3.4. Application to mixture models 281
8.3.5. Properties 282
8.3.6. EM: an alternating optimization algorithm 282
8.4. Clustering and the mixture model 283
8.4.1. The two approaches 283
8.4.2. Classification likelihood 283
8.4.3. The CEM algorithm 284
8.4.4. Comparison of the two approaches 285
8.4.5. Fuzzy clustering 286
8.5. Gaussian mixture model 287
8.5.1. The model 287
8.5.2. CEM algorithm 288
8.5.3. Spherical form, identical proportions and volumes 289
8.5.4. Spherical form, identical proportions but differing volumes 290
8.5.5. Identical covariance matrices and proportions 291
8.6. Binary variables 291
8.6.1. Data 291
8.6.2. Binary mixture model 292
8.6.3. Parsimonious model 293
8.6.4. Example of application 295
8.7. Qualitative variables 295
8.7.1. Data 295
8.7.2. The model 295
8.7.3. Parsimonious model 297
8.8. Implementation 298
8.8.1. Choice of model and of the number of classes 298
8.8.2. Strategies for use 299
8.8.3. Extension to particular situations 299
8.9. Conclusion 300
8.10. Bibliography 300
Chapter 9. Spatial Data Clustering 305
9.1. Introduction 305
9.1.1. The spatial data clustering problem 305
9.1.2. Examples of applications 306
9.2. Non-probabilistic approaches 309
9.2.1. Using spatial variables 309
9.2.2. Transformation of variables 309
9.2.3. Using a matrix of spatial distances 309
9.2.4. Clustering with contiguity constraints 310
9.3. Markov random fields as models 311
9.3.1. Global methods and Bayesian approaches 311
9.3.2. Markov random fields 313
9.3.3. Markov fields for observations and classes 316
9.3.4. Supervised segmentation 317
9.4. Estimating the parameters for a Markov field 321
9.4.1. Supervised estimation 321
9.4.2. Unsupervised estimation with EM 323
9.4.3. Classification likelihood and inertia with spatial smoothing 326
9.4.4. Other methods of unsupervised estimation 328
9.5. Application to numerical ecology 329
9.5.1. The problem 329
9.5.2. The model: Potts field and Bernoulli distributions 330
9.5.3. Estimating the parameters 331
9.5.4. Resulting clustering 331
9.6. Bibliography 332
List of Authors 335
Index 339
