Fundamentals of Robust Machine Learning: Handling Outliers and Anomalies in Data Science


About the Book

An essential guide to tackling outliers and anomalies in machine learning and data science.

In recent years, machine learning (ML) has transformed virtually every area of research and technology, becoming one of the key tools for data scientists. Robust machine learning addresses an often-overlooked aspect of data science: the handling of outliers in datasets. Ignoring outliers can lead to bad business decisions, wrong medical diagnoses, incorrect conclusions, or misjudged feature importance, to name just a few consequences. Fundamentals of Robust Machine Learning offers a thorough but accessible overview of the subject, focusing on how to properly handle outliers and anomalies in datasets. The book describes two main approaches: using outlier-tolerant ML tools, or removing outliers before applying conventional tools. Balancing theoretical foundations with practical Python code, it provides all the skills needed to enhance the accuracy, stability, and reliability of ML models.

Readers of Fundamentals of Robust Machine Learning will also find:
  • A blend of robust statistics and machine learning principles
  • Detailed discussion of a wide range of robust machine learning methodologies, from robust clustering, regression, and classification to neural networks and anomaly detection
  • Python code with immediate application to data science problems

Fundamentals of Robust Machine Learning is ideal for undergraduate and graduate students in data science, machine learning, and related fields, as well as for professionals looking to enhance their understanding of building models in the presence of outliers.
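To make the two approaches concrete, here is a minimal Python sketch (our own illustration, not code from the book) fitting a line to data that contains one gross outlier. Approach 1 uses an outlier-tolerant estimator, with scikit-learn's HuberRegressor standing in for the robust losses the book develops; approach 2 removes the outlier with a MAD-based edit rule, in the spirit of the 4.5-MAD rule listed in Chapter 4, and then fits ordinary least squares.

    import numpy as np
    from sklearn.linear_model import HuberRegressor, LinearRegression

    rng = np.random.default_rng(0)
    X = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
    y = 2.0 * X.ravel() + 1.0 + rng.normal(0.0, 0.5, size=50)
    y[10] = 100.0  # inject one gross outlier

    # Approach 1: an outlier-tolerant loss (Huber) absorbs the outlier.
    huber = HuberRegressor().fit(X, y)

    # Approach 2: detect and remove the outlier, then use a conventional tool.
    # A 4.5-MAD edit rule on y; 1.4826 rescales MAD to sigma for Gaussian data.
    med = np.median(y)
    mad = 1.4826 * np.median(np.abs(y - med))
    keep = np.abs(y - med) <= 4.5 * mad
    ols = LinearRegression().fit(X[keep], y[keep])

    print(f"true slope 2.0 | Huber {huber.coef_[0]:.2f} | "
          f"OLS after removal {ols.coef_[0]:.2f}")

Applying a location-based edit rule to trending data is the crudest possible choice and is used here only to keep the sketch short; the regression-based detection the book covers in Chapter 4 (Section 4.6) is the more appropriate tool for this setting.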

Table of Contents:
Preface
About the Companion Website

1 Introduction
   1.1 Defining Outliers
   1.2 Overview of the Book
   1.3 What Is Robust Machine Learning?
      1.3.1 Machine Learning Basics
      1.3.2 Effect of Outliers
      1.3.3 What Is Robust Data Science?
      1.3.4 Noise in Datasets
      1.3.5 Training and Testing Flows
   1.4 Robustness of the Median
      1.4.1 Mean vs. Median
      1.4.2 Effect on Standard Deviation
   1.5 ℓ1 and ℓ2 Norms
   1.6 Review of Gaussian Distribution
   1.7 Unsupervised Learning Case Study
      1.7.1 Clustering Example
      1.7.2 Clustering Problem Specification
   1.8 Creating Synthetic Data for Clustering
      1.8.1 One-Dimensional Datasets
      1.8.2 Multidimensional Datasets
   1.9 Clustering Algorithms
      1.9.1 k-Means Clustering
      1.9.2 k-Medians Clustering
   1.10 Importance of Robust Clustering
      1.10.1 Clustering with No Outliers
      1.10.2 Clustering with Outliers
      1.10.3 Detection and Removal of Outliers
   1.11 Summary
   Problems
   References

2 Robust Linear Regression
   2.1 Introduction
   2.2 Supervised Learning
   2.3 Linear Regression
   2.4 Importance of Residuals
      2.4.1 Defining Errors and Residuals
      2.4.2 Residuals in Loss Functions
      2.4.3 Distribution of Residuals
   2.5 Estimation Background
      2.5.1 Linear Models
      2.5.2 Desirable Properties of Estimators
      2.5.3 Maximum-Likelihood Estimation
      2.5.4 Gradient Descent
   2.6 M-Estimation
   2.7 Least Squares Estimation (LSE)
   2.8 Least Absolute Deviation (LAD)
   2.9 Comparison of LSE and LAD
      2.9.1 Simple Linear Model
      2.9.2 Location Problem
   2.10 Huber’s Method
      2.10.1 Huber Loss Function
      2.10.2 Comparison with LSE and LAD
   2.11 Summary
   Problems
   References

3 The Log-Cosh Loss Function
   3.1 Introduction
   3.2 An Intuitive View of Log-Cosh
   3.3 Hyperbolic Functions
   3.4 M-Estimation
      3.4.1 Asymptotic Behavior
      3.4.2 Linear Regression Using Log-Cosh
   3.5 Deriving the Distribution for Log-Cosh
   3.6 Standard Errors for Robust Estimators
      3.6.1 Example: Swiss Fertility Dataset
      3.6.2 Example: Boston Housing Dataset
   3.7 Statistical Properties of Log-Cosh Loss
      3.7.1 Maximum-Likelihood Estimation
   3.8 A General Log-Cosh Loss Function
   3.9 Summary
   Problems
   References

4 Outlier Detection, Metrics, and Standardization
   4.1 Introduction
   4.2 Effect of Outliers
   4.3 Outlier Diagnosis
      4.3.1 Boxplots
      4.3.2 Histogram Plots
      4.3.3 Exploratory Data Analysis
   4.4 Outlier Detection
      4.4.1 3-Sigma Edit Rule
      4.4.2 4.5-MAD Edit Rule
      4.4.3 1.5-IQR Edit Rule
   4.5 Outlier Removal
      4.5.1 Trimming Methods
      4.5.2 Winsorization
      4.5.3 Anomaly Detection Method
   4.6 Regression-Based Outlier Detection
      4.6.1 LS vs. LC Residuals
      4.6.2 Comparison of Detection Methods
      4.6.3 Ordered Absolute Residuals (OARs)
      4.6.4 Quantile–Quantile Plot
      4.6.5 Quad-Plots for Outlier Diagnosis
   4.7 Regression-Based Outlier Removal
      4.7.1 Iterative Boxplot Method
   4.8 Regression Metrics with Outliers
      4.8.1 Mean Square Error (MSE)
      4.8.2 Median Absolute Error (MAE)
      4.8.3 MSE vs. MAE on Realistic Data
      4.8.4 Selecting Hyperparameters for Robust Regression
   4.9 Dataset Standardization
      4.9.1 Robust Standardization
   4.10 Summary
   Problems
   References

5 Robustness of Penalty Estimators
   5.1 Introduction
   5.2 Penalty Functions
      5.2.1 Multicollinearity
      5.2.2 Penalized Loss Functions
   5.3 Ridge Penalty
   5.4 LASSO Penalty
   5.5 Effect of Penalty Functions
   5.6 Penalty Functions with Outliers
   5.7 Ridge Traces
   5.8 Elastic Net (Enet) Penalty
   5.9 Adaptive LASSO (aLASSO) Penalty
   5.10 Penalty Effects on Variance and Bias
      5.10.1 Effect on Variance
      5.10.2 Geometric Interpretation of Bias
   5.11 Variable Importance
      5.11.1 The t-Statistic
      5.11.2 LASSO and aLASSO Traces
   5.12 Summary
   Problems
   References

6 Robust Regularized Models
   6.1 Introduction
   6.2 Overfitting and Underfitting
   6.3 The Bias–Variance Trade-Off
   6.4 Regularization with Ridge
      6.4.1 Selection of Hyperparameter λ
      6.4.2 Example: Diabetes Dataset
   6.5 Generalization Using Robust Estimators
      6.5.1 Training and Test Sets
      6.5.2 k-Fold Cross-Validation
   6.6 Robust Generalization and Regularization
      6.6.1 Regularization with LC-Ridge
   6.7 Model Complexity
      6.7.1 Variable Selection Using LS-LASSO
      6.7.2 Variable Ordering Using LC-aLASSO
      6.7.3 Building a Compact Model
   6.8 Summary
   Problems
   References

7 Quantile Regression Using Log-Cosh
   7.1 Introduction
   7.2 Understanding Quantile Regression
   7.3 The Crossing Problem
   7.4 Standard Quantile Loss Function
   7.5 Smooth Regression Quantiles (SMRQ)
   7.6 Evaluation of Quantile Methods
      7.6.1 Qualitative Assessment
      7.6.2 Quantitative Assessment
   7.7 Selection of Robustness Coefficient
   7.8 Maximum-Likelihood Procedure for SMRQ
   7.9 Standard Error Computation
   7.10 Summary
   Problems
   References

8 Robust Binary Classification
   8.1 Introduction
   8.2 Binary Classification Problem
      8.2.1 Why Linear Regression Fails
      8.2.2 Outliers in Binary Classification
   8.3 The Cross-Entropy (CE) Loss
      8.3.1 Deriving the Cross-Entropy Loss
      8.3.2 Understanding Logistic Regression
      8.3.3 Gradient Descent
   8.4 The Log-Cosh (LC) Loss Function
      8.4.1 General Formulation
   8.5 Algorithms for Logistic Regression
   8.6 Example: Motor Trend Cars
   8.7 Regularization of Logistic Regression
      8.7.1 Overfitting and Underfitting
      8.7.2 k-Fold Cross-Validation
      8.7.3 Penalty Functions
      8.7.4 Effect of Outliers
   8.8 Example: Circular Dataset
   8.9 Outlier Detection
   8.10 Robustness of Binary Classifiers
      8.10.1 Support Vector Classifier (SVC)
      8.10.2 Support Vector Machines (SVMs)
      8.10.3 k-Nearest Neighbors (k-NN)
      8.10.4 Decision Trees and Random Forest
   8.11 Summary
   Problems
   Reference

9 Neural Networks Using Log-Cosh
   9.1 Introduction
   9.2 A Brief History of Neural Networks
   9.3 Defining Neural Networks
      9.3.1 Basic Computational Unit
      9.3.2 Four-Layer Neural Network
      9.3.3 Activation Functions
   9.4 Training of Neural Networks
   9.5 Forward and Backward Propagation
      9.5.1 Forward Propagation
      9.5.2 Backward Propagation
      9.5.3 Log-Cosh Gradients
   9.6 Cross-Entropy and Log-Cosh Algorithms
   9.7 Example: Circular Dataset
   9.8 Classification Metrics and Outliers
      9.8.1 Precision, Recall, F1 Score
      9.8.2 Receiver Operating Characteristics (ROCs)
   9.9 Summary
   Problems
   References

10 Multi-class Classification and Adam Optimization
   10.1 Introduction
   10.2 Multi-class Classification
      10.2.1 Multi-class Loss Functions
      10.2.2 Softmax Activation Function
   10.3 Example: MNIST Dataset
      10.3.1 Neural Network Architecture
      10.3.2 Comparing Cross-Entropy with Log-Cosh Losses
      10.3.3 Outliers in MNIST
   10.4 Optimization of Neural Networks
      10.4.1 Momentum
      10.4.2 RMSprop Approach
      10.4.3 Optimizer Warm-Up Phase
      10.4.4 Adam Optimizer
   10.5 Summary
   Problems
   References

11 Anomaly Detection and Evaluation Metrics
   11.1 Introduction
   11.2 Anomaly Detection Methods
      11.2.1 k-Nearest Neighbors
      11.2.2 DBSCAN
      11.2.3 Isolation Forest
   11.3 Anomaly Detection Using MADmax
      11.3.1 Robust Standardization
      11.3.2 k-Medians Clustering
      11.3.3 Selecting MADmax
      11.3.4 k-Nearest Neighbors (k-NN)
      11.3.5 k-Nearest Medians (k-NM)
   11.4 Qualitative Evaluation Methods
   11.5 Quantitative Evaluation Methods
   11.6 Summary
   Problems
   Reference

12 Case Studies in Data Science
   12.1 Introduction
   12.2 Example: Boston Housing Dataset
      12.2.1 Exploratory Data Analysis
      12.2.2 Neural Network Architecture
      12.2.3 Comparison of LSNN and LCNN
      12.2.4 Predicting Housing Prices
      12.2.5 RMSE vs. MAE
      12.2.6 Correlation Coefficients
   12.3 Example: Titanic Dataset
      12.3.1 Exploratory Data Analysis
      12.3.2 LCLR vs. CELR
      12.3.3 Outlier Detection and Removal
      12.3.4 Robustness Coefficient for Log-Cosh
      12.3.5 The Implications of Robustness
      12.3.6 Ridge and aLASSO
   12.4 Application to Explainable Artificial Intelligence (XAI)
      12.4.1 Case Study: Logistic Regression
      12.4.2 Case Study: Neural Networks
   12.5 Time Series Example: Climate Change
      12.5.1 Autoregressive Model
      12.5.2 Forecasting Using AR(p)
      12.5.3 Stationary Time Series
      12.5.4 Moving Average
      12.5.5 Finding Outliers in Time Series
   12.6 Summary and Conclusions
   Problems
   References

Index
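The log-cosh loss is the thread running through Chapters 3, 7, 8, and 9. As a rough intuition for why it keeps recurring, here is a short Python sketch (our own illustration, assuming the standard log-cosh form rather than the book's generalized version from Section 3.8):

    import numpy as np

    def log_cosh_loss(r):
        # log(cosh(r)), written as logaddexp(r, -r) - log(2) for numerical
        # stability: behaves like r**2 / 2 near zero (smooth, like least
        # squares) and like |r| - log(2) for large |r| (robust, like LAD).
        return np.mean(np.logaddexp(r, -r) - np.log(2.0))

    def log_cosh_grad(r):
        # d/dr log(cosh(r)) = tanh(r) is bounded in (-1, 1), so a single
        # gross outlier cannot dominate a gradient-descent update.
        return np.tanh(r)

    residuals = np.array([-0.5, 0.1, 0.3, 80.0])  # last value mimics an outlier
    print(log_cosh_loss(residuals))   # the outlier contributes ~|r|, not r**2
    print(log_cosh_grad(residuals))   # its gradient saturates near 1.0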


Product Details
  • ISBN-13: 9781394294374
  • Publisher: John Wiley & Sons Inc
  • Publisher Imprint: John Wiley & Sons Inc
  • Height: 234 mm
  • No of Pages: 416
  • Spine Width: 28 mm
  • Weight: 839 g
  • ISBN-10: 1394294379
  • Publisher Date: 09 May 2025
  • Binding: Hardback
  • Language: English
  • Returnable: Y
  • Sub Title: Handling Outliers and Anomalies in Data Science
  • Width: 188 mm

