Subtitle: None

Author:

Classification number:

ISBN:9783642231995

Summary

This book constitutes the refereed proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2011, held in New York, NY, USA. The 44 revised full papers presented were carefully reviewed and selected from 170 submissions. The papers are organized in topical sections on classification and decision theory, theory of learning, clustering, applications in medicine, web mining and information mining, and machine learning and image mining.

Table of Contents

Title 1
Preface 4
Organization 5
Table of Contents 7
Classification and Decision Theory 7
Quadratically Constrained Maximum a Posteriori Estimation for Binary Classifier 11
Introduction 11
Maximum a Posteriori-Based Classifier 12
Proposed Method 13
Linear Model and Training Method 14
A Unified Characterization of LSR and SVM 15
A New Classifier 19
Construction of GQCM Classifier 20
Experiments 21
Experiment Using Artificial Samples 22
Performance with UCI Data Sets 22
Discussion 23
Conclusions and Future Work 24
References 24
Hubness-Based Fuzzy Measures for High-Dimensional k-Nearest Neighbor Classification 26
Introduction 26
Related Work 27
Hubness-Weighted kNN 27
Fuzzy Nearest Neighbor Algorithm 28
Proposed Hubness-Based Fuzzy Measures 29
Experimental Evaluation 31
UCI Data Sets 32
ImageNet Data 37
Conclusions and Future Work 38
References 39
Decisions: Algebra and Implementation 41
Introduction 41
Decision Algebra 42
Decision Functions 42
Learning and Deciding 43
Auxiliary Decision Function Operations 44
Decision Lattices 46
The "More Accurate" Relations 46
Approximating Decision Functions 47
Experiments 48
Implementation Details 49
Decision Graph Sizes 50
k-Approximated Decision Graphs 51
Related Work 52
Conclusions and Future Work 53
References 54
Smoothing Multinomial Naïve Bayes in the Presence of Imbalance 56
Introduction 56
Related Work 57
Random OverSampling Expected Smoothing 59
ROSE Smoothing Background 59
ROSE Smoothing Approach 60
Experiments 61
Experiment 1: Standard Datasets 61
Experiment 2: Class Prior Controlled Data Sets 64
Conclusion 68
References 69
ACE-Cost: Acquisition Cost Efficient Classifier by Hybrid Decision Tree with Local SVM Leaves 70
Introduction 70
Preliminaries and Related Work 72
Computing Average Test Cost in Decision Tree and SVM 72
Cost Efficient SVM 73
Cost Efficient Decision Trees 74
Preprune and Postprune 74
ACE-Cost Approach: The Hybrid Decision Tree with Local SVM Leaves 75
Decision Tree Sketch 75
Postpruning with Local SVM 76
Feature Selection at Local SVM Leaves 78
Experimental Results 78
Performance Comparison on Standard Dataset 79
Synthetic Dataset 80
A Practical Application with Dependent Cost 82
Discussion and Future Work 82
Conclusion 83
References 83
Informative Variables Selection for Multi-relational Supervised Learning 85
Introduction 85
Approach Illustration 87
Evaluation Criterion 88
Grid Optimisation 89
Experiments 89
Protocol 89
Artificial Datasets 91
Stulong Dataset 93
Conclusion 96
References 97
Separability of Split Value Criterion with Weighted Separation Gains 98
Introduction 98
Split Criteria 99
Weighting Separability Gains 100
The Analysis 101
Comparison of the Split Criteria 101
Analysis of the $\alpha$ Parameter 104
Conclusions 107
References 108
Granular Instances Selection for Fuzzy Modeling 109
Introduction 109
Background Studies 110
Instance Selection 110
Fuzzy Modeling 111
The Proposed Methodology 112
The Framework of the Proposed GIS Fuzzy Modeling Methodology 112
Granular Instances Selection 113
Evaluation 115
Experimental Studies 115
Conclusion 119
References 120
Parameter-Free Anomaly Detection for Categorical Data 122
Introduction 122
Related Work 123
Our Proposed Method 124
Problem Definition 124
Outlier Detection as an Optimization Problem 128
Outlier Factor 129
Update of Entropy and Weight 130
The ITB Methods and Approximate Optimization 131
Experimental Results 132
Effectiveness Test 132
Efficiency Test 134
Conclusion 135
References 136
Fuzzy Semi-supervised Support Vector Machines 137
Introduction 137
Fuzzy Semi-supervised Support Vector Machines Approach 141
Experimental Results 144
Datasets 144
Experimental Design 144
The Classification Tasks 145
Evaluation Procedure 145
Parameter Optimization 145
Experiments and Results 145
Conclusion 146
References 147
GENCCS: A Correlated Group Difference Approach to Contrast Set Mining 150
Introduction 150
Related Work 151
Problem Definition 152
Mutual Information and All Confidence 153
Correlated Group Difference 154
Background 154
Data Format 154
Search for Quantitative Contrast Sets 155
Distribution Difference 156
Our Proposed Approach 156
Tests for Significance 157
Comparison of Contrasting Groups 157
Discretization 157
Mining Correlated Group Differences 158
Experimental Results 160
Performance of GENCCS 160
Effect of Mutual Information and All Confidence 161
Conclusion 163
References 164
Collective Classification Using Heterogeneous Classifiers 165
Introduction 165
Background 167
Notation 167
Collective Classification 168
Collective Classification Using Heterogeneous Classifiers 169
Related Work 170
Experimental Setup 172
Datasets 172
Sampling 173
Classification Methods 174
Experimental Results 174
Analysis of Average Local Accuracy Values 174
Performance of Different Classifiers 174
Performance of Classifier Combination 176
Discussion 177
References 178
Spherical Nearest Neighbor Classification: Application to Hyperspectral Data 180
Introduction 180
Mapping of Images to a Hypersphere 183
Tangent Space and Manifolds 183
Exponential and Log Maps 184
Spherical Metrics 184
Spherical Geodesic and Mahalanobis Metrics 184
Spherical Discriminant Adaptive Nearest-Neighbor Classifier 186
Experiments 187
Data 188
Results 188
Conclusions 192
References 193
Adaptive Kernel Diverse Density Estimate for Multiple Instance Learning 195
Introduction 195
Maximum Diverse Density 197
Kernel Density Estimate 198
Kernel Density Estimate for MIL 199
Kernelized Diverse Density Estimate of Positive Bags 200
Kernel Density Estimate of Negative Bags 202
Objective Function 202
Optimization and Implementation 204
Experiments 205
Conclusions 207
References 207
Boosting Inspired Process for Improving AUC 209
Introduction 209
Boosting Inspired Process AUCBoost 210
Weighted AUC (WAUC) 212
Experiments 214
Base Learning Algorithm: C4.4 215
Base Learning Algorithm: Naive Bayes 217
Conclusions and Future Work 218
References 219
Theory of Learning 8
Investigation in Transfer Learning: Better Way to Apply Transfer Learning between Agents 220
Introduction 220
Reinforcement Learning 221
The Q–Learning Algorithm 222
Accelerating Reinforcement Learning through Heuristics 223
Case Based Reasoning and Transfer Learning 224
Combination of the Techniques: Transfer Learning with Case Based Reasoning and Reinforcement Learning 226
The Transfer Learning Experience 226
Conclusion 231
References 232
Exploration Strategies for Learned Probabilities in Smart Terrain 234
Probabilistic Smart Terrain 234
Learned Probabilities 235
Object Categories and Prior Knowledge 235
Bayesian Parameter Learning 236
Exploration Strategies and Benchmarks 236
Defining Information Gain 237
Defining a Simple Two-Object Case 238
Estimating Distance Traveled and Error 238
Normalizing Error over Categories Based on Prevalence 239
Estimating Information Gain 240
A Simple Example of Information Gain 240
Creating the Influence Map 241
Inverse Falloff 241
Cumulative Effect of Objects 241
Updating the Influence Map When Information Is Learned 242
Benchmark Testing 242
Different Levels of Prior Knowledge 242
Different Category Prevalence 243
Significantly Closer Objects 243
Aggregate Influence 243
Empirical Demonstration of Learning 244
Evaluating Ability to Move Towards the Best Objects 244
Evaluating the Value of Learning 246
Conclusions and Ongoing Research 247
Ongoing Research 247
References 247
Sensitivity Analysis for Weak Constraint Generation 249
Introduction 249
Abstract Problem Statement 250
Formulation of the Multi-constraint Planning Problem 250
Examples 252
Solution Approach 252
Plan Changes 252
Sensitivity Analysis, Simulation, and Clustering 253
Strategic Release Planning: A Multi-constraint Planning Problem 254
Simulation Overview 255
Data Generation and Pre-processing 257
Sensitivity Analysis and Clustering 257
Simulation and Clustering Experiment 258
Use Case Scenario 259
Discussion 260
Limitations 260
Conclusions and Future Work 261
References 262
Dictionary Learning Based on Laplacian Score in Sparse Coding 263
Introduction 263
Related Work 264
Laplacian Score for Dictionary Learning 266
Experiments 267
Experiment Datasets 267
Experiment Setup 267
Experiment Result 268
Conclusion 273
References 273
Clustering 8
A Practical Approach for Clustering Transaction Data 275
Introduction 275
Clustering Algorithm 277
Objective Function 278
Clustering Procedure 280
Empirical Evaluation 281
Comparing Algorithms 281
Quality Measure 282
Experiments on Synthetic Data 282
Experiments on Real-World Data 286
Conclusion 288
References 288
Hierarchical Clustering with High Order Dissimilarities 290
Introduction 290
Dissimilarity Increments 291
Dissimilarity Increments Distribution 292
Hierarchical Clustering Algorithm 293
Algorithm 293
Minimum Description Length Criterion 295
Algorithm Analysis 296
Experimental Results 299
Datasets 299
Parameter Selection 299
Experimental Results and Discussion 300
Conclusions 302
References 302
Clust-XPaths: Clustering of XML Paths 304
Introduction 304
Related Work 305
XML Structure Clustering 305
XML Content Clustering 306
Clust-XPaths: Our Approach 307
Thesaurus 307
Paths Matrix 308
Clustering 311
Experiments 312
Evaluation Prototype 312
Test Collection and Results 312
Conclusion 314
References 314
Comparing Clustering and Metaclustering Algorithms 316
Introduction 316
Related Work 317
Meta-clustering 318
Bagged Clustering 318
Majority Voting 320
Graph Partitioning 320
Cluster Validation Techniques 321
Experimental Results 322
Conclusions 323
References 324
Applications in Medicine 8
Detection of Phenotypes in Microarray Data Using Force-Directed Placement Transforms 330
Introduction 330
Force-Directed Placement Strategies 331
Modified Force-Directed Placement Transform 332
Application to Model Data 333
Feature Selection Algorithm 334
Results on Model Data 335
Application to Real Data 338
Discussion 342
References 343
On the Temporal Behavior of EEG Recorded during Real Finger Movement 345
Introduction 345
Exploiting the Temporal Information 346
Hidden Markov Models for BCI 346
Conditional Random Fields for BCI 347
Methods for Temporal Classification of Self-Paced EEG 348
Method I 348
Method II 349
Method III 350
Experiments 352
Results 353
Discussion and Conclusion 355
References 356
A Machine Learning and Data Mining Framework to Enable Evolutionary Improvement in Trauma Triage 358
Problem Description 358
A Model for Intelligent Triage Support 359
Data 360
Realtime Decision Support through Machine Learning 361
Experiments: Methodology and Results 361
Discussion 363
Mining for Deeper Understanding 363
Knowledge Frontiers 364
Related Work 365
Algorithm Description 366
Experimental Methodology 367
Results and Discussion 368
Conclusion and Future Work 368
References 369
A Decision Support System Based on the Semantic Analysis of Melanoma Images Using Multi-elitist PSO and SVM 372
Introduction 372
Formulation of the Problem of the Semantic Analysis of the Melanoma Malignum 373
The MEPSO Algorithm 375
The Basic Concept of the SVM Classifier 376
Experimental Results 378
Conclusion 382
References 383
WebMining/Information Mining 9
Authorship Similarity Detection from Email Messages 385
Introduction 385
Stylistic Features 387
Authorship Similarity Detection 388
Frequent Pattern Matching 388
Style Differentiation 389
Detection Algorithm 390
Baseline Comparison Methods 390
The Enron Email Corpus 391
Experiment Results 391
Conclusion 394
References 394
An Investigation Concerning the Generation of Text Summarisation Classifiers Using Secondary Data 397
Introduction 397
Related Work 398
Problem Definition 400
Classifier Generation Using Secondary Data 400
The SAVSNET Application 402
Evaluation 402
Conclusion 407
References 407
Comparing the One-vs-One and One-vs-All Methods in Benthic Macroinvertebrate Image Classification 409
Introduction 409
Method 410
Linearly Separable Case 410
Linearly Non-separable Case 412
Nonlinear Support Vector Machines 412
One-vs-All 413
One-vs-One 414
Experimental Tests 414
Data Description and Test Arrangements 414
Results 417
Discussion 421
References 422
Incremental Web-Site Boundary Detection Using Random Walks 424
Introduction 424
Preliminaries 425
k-Means Clustering 426
Clustering Based on Random Crawling 426
Experiments 429
Data Set 430
Evaluation Criteria 432
Results 433
Discussion 435
References 436
Discovering Text Patterns by a New Graphic Model 438
Introduction 438
The Method 440
Defining the Task 440
The Model 440
Properties of the Model 441
Finding 442
Complexities 443
Three Tasks 444
Descriptions 444
Definition of Task 1 444
Definition of Task 2 445
Definition of Task 3 445
Empirical Results 446
Experiments Set Up 446
Results on the First Task 446
Results on the Second Task 447
Results on the Third Task 448
Related Researches and More Comparisons 449
Conclusions 451
References 451
Topic Sentiment Change Analysis 453
Introduction 453
Related Works 455
Solution Overview 456
Topic-Level Sentiment Analysis 456
Topic Content Division 456
Topic Sentiment Evaluation 458
Sentiment Change Analysis 458
Time Period Partition 459
Cause Identification 461
Experiments 462
Experiment Setup 462
Topic Identification Results 463
Sentiment Change Analysis Results 463
Conclusions 466
References 467
Adaptive Context Modeling for Deception Detection in Emails 468
Introduction 468
Related Work 469
Proposed Deception Detector 470
Prediction by Partial Matching 470
Generalized Suffix Tree Data Structure 471
Adaptive Context Modeling 472
Experimental Results 473
Conclusions 476
References 477
Contrasting Correlations by an Efficient Double-Clique Condition 479
Introduction 479
Preliminaries 482
Correlation Based on k-Way Mutual Information 482
Problem of Mining Correlation Contrast Sets 484
Detecting Correlation Contrast Sets with Double-Clique Methods 484
Excluding Useless Itemsets with Anti-correlation Graphs 484
Enumerating Cliques in Undirected Graph 485
Additional Support Constraint 486
Algorithm for Extracting Correlation Contrast Sets 486
Experimental Results 487
Contrasted Databases 487
Extracted Correlation Contrast Sets 488
Computational Performance 489
Conclusion and Further Research 491
References 492
Machine Learning and Image Mining 9
Estimating Image Segmentation Difficulty 494
Introduction 494
Mathematical Background 496
Details of the Approach 496
Transformation of Difficulty Measures 497
Feature Extraction 497
Modeling Process 498
Experiments and Results 499
Building the Model with Labeled Data 500
Applying the Model to Additional Data 502
Conclusion and Future Work 504
References 504
Mining Spatial Trajectories Using Non-parametric Density Functions 506
Introduction 506
Related Work 507
Trajectory Mining with Density Functions 508
Trajectory Density Estimation 508
The DENTRAC Trajectory Clustering Algorithm 508
Hill Climbing Procedure 510
Complexity of DENTRAC 512
Post Analysis for Trajectory Clusters 512
Experimental Evaluations 513
Datasets 514
Results for the Oldenburg Traffic Data 514
Post Analysis by Cluster Average Density and the Density of Density Attractors 515
Results of Atlantic Hurricane Tracks Data 516
Conclusion and Future Works 518
References 519
Exploring Synergetic Effects of Dimensionality Reduction and Resampling Tools on Hyperspectral Imagery Data Classification 521
Introduction 521
Methodology 522
Data Preprocessing 522
Classification 524
Experimental Set-Up 525
Classification Performance Measures 526
Results and Discussion 527
Conclusions and Further Extensions 530
References 531
A Comparison between Haralick's Texture Descriptor and the Texture Descriptor Based on Random Sets for Biological Images 534
Introduction 534
Texture Descriptors 535
Haralick's Texture Descriptor 535
Texture Descriptor Based on Random Sets 539
Material and Application 541
Results 542
Discussion 547
Conclusion 547
References 548
Time Series and Frequent Item Set Mining 10
Unsupervised Discovery of Motifs under Amplitude Scaling and Shifting in Time Series Databases 549
Introduction 549
Background 550
Related Work 551
Algorithm 552
Thresholding 554
Experimental Results 555
Synthetic Data 555
Real-World Data with Known Motifs 557
Discord Discovery in Synthetic Data 557
Empirical Results and Discussion 559
Spurious Motifs 560
Conclusion and Future Work 560
References 561
Static Load Balancing of Parallel Mining of Frequent Itemsets Using Reservoir Sampling 563
Introduction 563
Notation 564
The Lattice of All Itemsets 565
Sampling Methods 566
Database Sample 566
The Reservoir Sampling Algorithm 567
Error of the Estimation of the Size of a Union of PBECs 567
Summary of the Previous Two Methods 568
Proposal of a New DM Parallel Method 569
Detailed Description of Phase 1 570
Detailed Description of Phase 2 570
Detailed Description of Phase 3 572
Detailed Description of Phase 4 572
The Parallel-FIMI-Reservoir Method 572
Experimental Evaluation 573
Evaluation of the Speedup 575
Conclusion and Future Work 576
References 576
GA-TVRC: A Novel Relational Time Varying Classifier to Extract Temporal Information Using Genetic Algorithms 578
Introduction 578
Background 580
Classification Method 580
Genetic Algorithms 581
Related Work 581
Genetic Algorithm Enhanced Time Varying Relational Classifier (GA-TVRC) 582
Training Phase 583
Validation Phase Using Evolutionary Strategies 583
Test Phase 584
Experimental Results 585
Datasets 585
Methodology 587
Results and Analysis 588
Conclusions and Future Work 591
References 591
Aspects of Machine Learning and Data Mining 10
Detection of Communities and Bridges in Weighted Networks 594
Introduction 594
Background and Motivation 595
Methodology 597
Fuzzy Clustering of Weighted Graphs 597
Fuzzy Clustering of Unweighted Graphs 598
Bridgeness Measure 598
Experimental Setup 599
Datasets 599
Edge Weights 599
Label Correspondence 600
Evaluation Metrics 600
Software 602
Results 602
Accuracy 602
Sensitivity Analysis 602
Bridgeness Analysis 604
Conclusion and Future Work 606
References 607
Techniques for Improving Filters in Power Grid Contingency Analysis 609
Introduction 609
Context 610
A Metric for Evaluating Filters 612
Resource-Aware Filter Combination 613
Multi-criteria Optimization 616
Related Work 618
Future Work 619
Conclusion 619
References 620
Author Index 622
