Learning Theory from First Principles (Adaptive Computation and Machine Learning series)


Available


Premium quality
Bookswagon upholds its quality standards by delivering books in pristine condition. Quality, service, and satisfaction are everything to us!
Easy return
Not satisfied with this product? Keep it in its original condition and packaging to take advantage of our easy return policy.
Certified product
First impression is the last impression! Check the book's certification page, ISBN, publisher's name, copyright page, and print quality.
Secure checkout
Security at its finest! Log in, browse, purchase, and pay: every step is safe and secure.
Money-back guarantee
It's all about our customers! If you have a bad experience with the product, return it and get your money back in full.
On-time delivery
At your doorstep on time! Get this book delivered without any delay.

About the Book

A comprehensive and cutting-edge introduction to the foundations and modern applications of learning theory. Research in machine learning has exploded, resulting in complex mathematical arguments that are hard for newcomers to grasp. In this accessible textbook, Francis Bach presents the foundations and latest advances of learning theory for graduate students as well as researchers who want to acquire a basic mathematical understanding of the most widely used machine learning architectures. Taking the position that learning theory does not exist outside of algorithms that can be run in practice, this book focuses on the theoretical analysis of learning algorithms as it relates to their practical performance. Bach provides the simplest formulations that can be derived from first principles, constructing mathematically rigorous results and proofs without overwhelming students.
  • Provides a balanced and unified treatment of the most prevalent machine learning methods
  • Emphasizes practical application and features only commonly used algorithmic frameworks
  • Covers modern topics not found in existing texts, such as overparameterized models and structured prediction
  • Integrates coverage of statistical theory, optimization theory, and approximation theory
  • Focuses on adaptivity, allowing distinctions between various learning techniques
  • Hands-on experiments, illustrative examples, and accompanying code link theoretical guarantees to practical behaviors
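As a rough illustration of the kind of hands-on experiment the description refers to (this sketch is not taken from the book or its accompanying code, and all names and parameter choices are illustrative assumptions), the following Python snippet simulates ordinary least-squares regression in a fixed-design setting and compares the empirical excess risk with the classical sigma^2 * d / n rate analyzed in chapter 3:

```python
# Minimal sketch (not from the book): fixed-design OLS excess risk vs. the
# classical sigma^2 * d / n rate. All constants are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 10, 0.5            # sample size, dimension, noise level
theta_star = rng.normal(size=d)       # ground-truth parameter vector
X = rng.normal(size=(n, d))           # design matrix, held fixed across repetitions

risks = []
for _ in range(500):                  # average over noise draws to estimate the expectation
    y = X @ theta_star + sigma * rng.normal(size=n)
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS estimator
    # fixed-design excess risk: (1/n) * ||X (theta_hat - theta_star)||^2
    risks.append(np.mean((X @ (theta_hat - theta_star)) ** 2))

print(f"empirical excess risk  : {np.mean(risks):.4f}")
print(f"theory: sigma^2 * d / n: {sigma**2 * d / n:.4f}")
```

With these settings, both printed numbers should be close to 0.0125, illustrating the d/n scaling of the fixed-design least-squares risk.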

Table of Contents:
Preface
I Preliminaries
1 Mathematical preliminaries
1.1 Linear algebra and differentiable calculus
1.1.1 Minimization of quadratic forms
1.1.2 Inverting a 2 × 2 matrix
1.1.3 Inverting matrices defined by blocks, matrix inversion lemma
1.1.4 Eigenvalue and singular value decomposition
1.1.5 Differential calculus
1.2 Concentration inequalities
1.2.1 Hoeffding's inequality
1.2.2 McDiarmid's inequality
1.2.3 Bernstein's inequality (♦)
1.2.4 Expectation of the maximum
1.2.5 Estimation of expectations through quadrature (♦)
1.2.6 Concentration inequalities for matrices (♦♦)
2 Introduction to supervised learning
2.1 From training data to predictions
2.2 Decision theory
2.2.1 Loss functions
2.2.2 Risks
2.2.3 Bayes risk and Bayes predictor
2.3 Learning from data
2.3.1 Local averaging
2.3.2 Empirical risk minimization
2.4 Statistical learning theory
2.4.1 Measures of performance
2.4.2 Notions of consistency over classes of problems
2.5 No free lunch theorems (♦)
2.6 Quest for adaptivity
2.7 Beyond supervised learning
2.8 Summary - book outline
3 Linear least-squares regression
3.1 Introduction
3.2 Least-squares framework
3.3 Ordinary least-squares (OLS) estimator
3.3.1 Closed-form solution
3.3.2 Geometric interpretation
3.3.3 Numerical resolution
3.4 Statistical analysis of OLS
3.5 Fixed design setting
3.5.1 Statistical properties of the OLS estimator
3.5.2 Experiments
3.6 Ridge least-squares regression
3.7 Lower-bound (♦)
3.8 Random design analysis
3.8.1 Gaussian designs
3.8.2 General designs (♦♦)
3.9 Principal component analysis (♦)
3.10 Conclusion
II Generalization bounds for learning algorithms
4 Empirical risk minimization
4.1 Convexification of the risk
4.1.1 Convex surrogates
4.1.2 Geometric interpretation of the support vector machine (♦)
4.1.3 Conditional Φ-risk and classification calibration (♦)
4.1.4 Relationship between risk and Φ-risk (♦♦)
4.2 Risk minimization decomposition
4.3 Approximation error
4.4 Estimation error
4.4.1 Application of McDiarmid's inequality
4.4.2 Easy case I: quadratic functions
4.4.3 Easy case II: Finite number of models
4.4.4 Beyond finitely many models through covering numbers (♦)
4.5 Rademacher complexity
4.5.1 Symmetrization
4.5.2 Lipschitz-continuous losses
4.5.3 Ball-constrained linear predictions
4.5.4 Putting things together (linear predictions)
4.5.5 From constrained to regularized estimation (♦)
4.5.6 Extensions and improvements
4.6 Model selection (♦)
4.6.1 Structural risk minimization
4.6.2 Selection based on validation set
4.7 Relationship with asymptotic statistics (♦)
4.8 Summary
5 Optimization for machine learning
5.1 Optimization in machine learning
5.2 Gradient descent
5.2.1 Simplest analysis: ordinary least-squares
5.2.2 Convex functions and their properties
5.2.3 Analysis of GD for strongly convex and smooth functions
5.2.4 Analysis of GD for convex and smooth functions (♦)
5.2.5 Beyond gradient descent (♦)
5.2.6 Non-convex objective functions (♦)
5.3 Gradient methods on non-smooth problems
5.4 Convergence rate of stochastic gradient descent (SGD)
5.4.1 Strongly convex problems (♦)
5.4.2 Adaptive methods (♦)
5.4.3 Bias-variance trade-offs for least-squares (♦)
5.4.4 Variance reduction (♦)
5.5 Conclusion
6 Local averaging methods
6.1 Introduction
6.2 Local averaging methods
6.2.1 Linear estimators
6.2.2 Partition estimators
6.2.3 Nearest-neighbors
6.2.4 Nadaraya-Watson estimator a.k.a. kernel regression (♦)
6.3 Generic "simplest" consistency analysis
6.3.1 Fixed partition
6.3.2 k-nearest neighbor
6.3.3 Kernel regression (Nadaraya-Watson) (♦)
6.4 Universal consistency (♦)
6.5 Adaptivity (♦♦)
6.6 Conclusion
7 Kernel methods
7.1 Introduction
7.2 Representer theorem
7.3 Kernels
7.3.1 Linear and polynomial kernels
7.3.2 Translation-invariant kernels on [0, 1]
7.3.3 Translation-invariant kernels on R^d
7.3.4 Beyond vectorial input spaces (♦)
7.4 Algorithms
7.4.1 Representer theorem
7.4.2 Column sampling
7.4.3 Random features
7.4.4 Dual algorithms (♦)
7.4.5 Stochastic gradient descent (♦)
7.4.6 "Kernelization" of linear algorithms
7.5 Generalization guarantees - Lipschitz-continuous losses
7.5.1 Risk decomposition
7.5.2 Approximation error for translation-invariant kernels on R^d
7.6 Theoretical analysis of ridge regression (♦)
7.6.1 Kernel ridge regression as a "linear" estimator
7.6.2 Bias and variance decomposition (♦)
7.6.3 Relating empirical and population covariance operators
7.6.4 Analysis for well-specified problems (♦)
7.6.5 Analysis beyond well-specified problems (♦)
7.6.6 Balancing bias and variance (♦)
7.7 Experiments
7.8 Conclusion
8 Sparse methods
8.1 Introduction
8.1.1 Dedicated proof technique for constrained least-squares
8.1.2 Probabilistic and combinatorial lemmas
8.2 Variable selection by the ℓ0-penalty
8.2.1 Assuming k is known
8.2.2 Estimating k (♦)
8.3 Variable selection by ℓ1-regularization
8.3.1 Intuition and algorithms
8.3.2 Slow rates - random design
8.3.3 Slow rates - fixed design
8.3.4 Fast rates (♦)
8.3.5 Zoo of conditions (♦♦)
8.3.6 Random design (♦)
8.4 Experiments
8.5 Extensions
8.6 Conclusion
9 Neural networks
9.1 Introduction
9.2 Single hidden layer neural network
9.2.1 Optimization
9.2.2 Rectified linear units and homogeneity
9.2.3 Estimation error
9.3 Approximation properties
9.3.1 Universal approximation property in one dimension
9.3.2 Infinitely many neurons and variation norm
9.3.3 Variation norm in one dimension
9.3.4 Variation norm in arbitrary dimension
9.3.5 Precise approximation properties
9.3.6 From the variation norm to a finite number of neurons (♦)
9.4 Generalization performance for neural networks
9.5 Relationship with kernel methods (♦)
9.5.1 From a Banach space F1 to a Hilbert space F2 (♦)
9.5.2 Kernel function (♦♦)
9.5.3 Upper-bound on RKHS norm (♦♦)
9.6 Experiments
9.7 Extensions
9.8 Conclusion
III Special topics
10 Ensemble learning
10.1 Averaging / bagging
10.1.1 Independent datasets
10.1.2 Bagging
10.2 Random projections and averaging
10.2.1 Gaussian sketching
10.2.2 Random projections
10.3 Boosting
10.3.1 Problem set-up
10.3.2 Incremental learning
10.3.3 Matching pursuit
10.3.4 Adaboost
10.3.5 Greedy algorithm based on gradient boosting
10.3.6 Convergence of expected risk
10.3.7 Experiments
10.4 Conclusion
11 From online learning to bandits
11.1 First-order online convex optimization
11.1.1 Convex case
11.1.2 Strongly-convex case (♦)
11.1.3 Online mirror descent (♦)
11.1.4 Lower bounds (♦♦)
11.2 Zero-th order convex optimization
11.2.1 Smooth stochastic gradient descent
11.2.2 Stochastic smoothing (♦)
11.2.3 Extensions
11.3 Multi-armed bandits
11.3.1 Need for an exploration-exploitation trade-off
11.3.2 "Explore-then-commit"
11.3.3 Optimism in the face of uncertainty (♦)
11.3.4 Adversarial bandits (♦)
11.4 Conclusion
12 Over-parameterized models
12.1 Implicit bias of gradient descent
12.1.1 Least-squares
12.1.2 Separable classification
12.1.3 Beyond convex problems (♦)
12.2 Double descent
12.2.1 The double descent phenomenon
12.2.2 Empirical evidence
12.2.3 Linear regression with Gaussian projections (♦)
12.3 Global convergence of gradient descent
12.3.1 Mean field limits
12.3.2 From linear networks to positive definite matrices
12.3.3 Global convergence for positive definite matrices
12.3.4 Special case of Oja flow
12.4 Lazy regime and neural tangent kernels (♦)
12.5 Conclusion
13 Structured prediction
13.1 Multi-category classification
13.1.1 Extension of classical convex surrogates
13.1.2 Generalization bound I: stochastic gradient descent
13.1.3 Generalization bound II: Rademacher complexities (♦)
13.2 General set-up and examples
13.2.1 Examples
13.2.2 Structure encoding loss functions
13.3 Surrogate methods
13.3.1 Score functions and decoding step
13.3.2 Fisher consistency and calibration functions
13.3.3 Main surrogate frameworks
13.4 Smooth/quadratic surrogates
13.4.1 Quadratic surrogate
13.4.2 Theoretical guarantees
13.4.3 Linear estimators and decoding steps
13.4.4 Smooth surrogates (♦)
13.5 Max-margin formulations
13.5.1 Structured SVM
13.5.2 Max-min formulations (♦♦)
13.6 Generalization bounds (♦)
13.7 Experiments
13.7.1 Robust regression
13.7.2 Ranking
13.8 Conclusion
14 Probabilistic methods
14.1 From empirical risks to log-likelihoods
14.1.1 Conditional likelihoods
14.1.2 Classical priors
14.1.3 Sparse priors
14.1.4 On the relationship between MAP and MMSE (♦)
14.2 Discriminative vs. generative models
14.2.1 Linear discriminant analysis and softmax regression
14.2.2 Naive Bayes
14.2.3 Maximum likelihood estimations
14.3 Bayesian inference
14.3.1 Computational handling of posterior distributions
14.3.2 Model selection through marginal likelihood
14.4 PAC-Bayesian analysis
14.4.1 Set-up
14.4.2 Uniformly bounded loss functions
14.5 Conclusion
15 Lower bounds on performance
15.1 Statistical lower bounds
15.1.1 Minimax lower bounds
15.1.2 Reduction to a hypothesis test
15.1.3 Review of information theory
15.1.4 Lower-bound on hypothesis testing based on information theory
15.1.5 Examples
15.1.6 Minimax lower bounds through Bayesian analysis
15.2 Optimization lower bounds
15.2.1 Convex optimization
15.2.2 Non-convex optimization (♦)
15.3 Lower bounds for stochastic gradient descent (♦)
15.4 Conclusion


Product Details
  • ISBN-13: 9780262049443
  • Publisher: MIT Press Ltd
  • Publisher Imprint: MIT Press
  • Height: 178 mm
  • No. of Pages: 478
  • Spine Width: 38 mm
  • Width: 114 mm
  • ISBN-10: 0262049449
  • Publisher Date: 24 Dec 2024
  • Binding: Hardback
  • Language: English
  • Returnable: Y
  • Weight: 1082 g




