The Data Science Handbook


Available



About the Book

Practical, accessible guide to becoming a data scientist, updated to include the latest advances in data science and related fields.

Becoming a data scientist is hard. The job focuses on mathematical tools, but also demands fluency with software engineering, understanding of a business situation, and deep understanding of the data itself. This book provides a crash course in data science, combining all the necessary skills into a unified discipline.

The focus of The Data Science Handbook is on practical applications and the ability to solve real problems, rather than theoretical formalisms that are rarely needed in practice. Among its key points are:
  • An emphasis on software engineering and coding skills, which play a significant role in most real data science problems.
  • Extensive sample code, detailed discussions of important libraries, and a solid grounding in core concepts from computer science (computer architecture, runtime complexity, and programming paradigms).
  • A broad overview of important mathematical tools, including classical techniques in statistics, stochastic modeling, regression, numerical optimization, and more.
  • Extensive tips about the practical realities of working as a data scientist, including understanding related job functions, project life cycles, and the varying roles of data science in an organization.
  • Exactly the right amount of theory. A solid conceptual foundation is required for fitting the right model to a business problem, understanding a tool’s limitations, and reasoning about discoveries.

Data science is a quickly evolving field, and this 2nd edition has been updated to reflect the latest developments, including the revolution in AI that has come from Large Language Models and the growth of ML Engineering as its own discipline.
Much of data science has become a skillset that anybody can have, making this book not only for aspiring data scientists, but also for professionals in other fields who want to use analytics as a force multiplier in their organization.

Table of Contents:
Preface to the First Edition
Preface to the Second Edition

1 Introduction
  1.1 What Data Science Is and Isn’t
  1.2 This Book’s Slogan: Simple Models Are Easier to Work With
  1.3 How Is This Book Organized?
  1.4 How to Use This Book?
  1.5 Why Is It All in Python, Anyway?
  1.6 Example Code and Datasets
  1.7 Parting Words

Part I The Stuff You’ll Always Use

2 The Data Science Road Map
  2.1 Frame the Problem
  2.2 Understand the Data: Basic Questions
  2.3 Understand the Data: Data Wrangling
  2.4 Understand the Data: Exploratory Analysis
  2.5 Extract Features
  2.6 Model
  2.7 Present Results
  2.8 Deploy Code
  2.9 Iterating
  2.10 Glossary

3 Programming Languages
  3.1 Why Use a Programming Language? What Are the Other Options?
  3.2 A Survey of Programming Languages for Data Science
  3.3 Where to Write Code
  3.4 Python Overview and Example Scripts
  3.5 Python Data Types
  3.6 GOTCHA: Hashable and Unhashable Types
  3.7 Functions and Control Structures
  3.8 Other Parts of Python
  3.9 Python’s Technical Libraries
  3.10 Other Python Resources
  3.11 Further Reading
  3.12 Glossary

3a Interlude: My Personal Toolkit

4 Data Munging: String Manipulation, Regular Expressions, and Data Cleaning
  4.1 The Worst Dataset in the World
  4.2 How to Identify Pathologies
  4.3 Problems with Data Content
  4.4 Formatting Issues
  4.5 Example Formatting Script
  4.6 Regular Expressions
  4.7 Life in the Trenches
  4.8 Glossary

5 Visualizations and Simple Metrics
  5.1 A Note on Python’s Visualization Tools
  5.2 Example Code
  5.3 Pie Charts
  5.4 Bar Charts
  5.5 Histograms
  5.6 Means, Standard Deviations, Medians, and Quantiles
  5.7 Boxplots
  5.8 Scatterplots
  5.9 Scatterplots with Logarithmic Axes
  5.10 Scatter Matrices
  5.11 Heatmaps
  5.12 Correlations
  5.13 Anscombe’s Quartet and the Limits of Numbers
  5.14 Time Series
  5.15 Further Reading
  5.16 Glossary

6 Overview: Machine Learning and Artificial Intelligence
  6.1 Historical Context
  6.2 The Central Paradigm: Learning a Function from Example
  6.3 Machine Learning Data: Vectors and Feature Extraction
  6.4 Supervised, Unsupervised, and In-Between
  6.5 Training Data, Testing Data, and the Great Boogeyman of Overfitting
  6.6 Reinforcement Learning
  6.7 ML Models as Building Blocks for AI Systems
  6.8 ML Engineering as a New Job Role
  6.9 Further Reading
  6.10 Glossary

7 Interlude: Feature Extraction Ideas
  7.1 Standard Features
  7.2 Features that Involve Grouping
  7.3 Preview of More Sophisticated Features
  7.4 You Get What You Measure: Defining the Target Variable

8 Machine-Learning Classification
  8.1 What Is a Classifier, and What Can You Do with It?
  8.2 A Few Practical Concerns
  8.3 Binary Versus Multiclass
  8.4 Example Script
  8.5 Specific Classifiers
  8.6 Evaluating Classifiers
  8.7 Selecting Classification Cutoffs
  8.8 Further Reading
  8.9 Glossary

9 Technical Communication and Documentation
  9.1 Several Guiding Principles
  9.2 Slide Decks
  9.3 Written Reports
  9.4 Speaking: What Has Worked for Me
  9.5 Code Documentation
  9.6 Further Reading
  9.7 Glossary

Part II Stuff You Still Need to Know

10 Unsupervised Learning: Clustering and Dimensionality Reduction
  10.1 The Curse of Dimensionality
  10.2 Example: Eigenfaces for Dimensionality Reduction
  10.3 Principal Component Analysis and Factor Analysis
  10.4 Skree Plots and Understanding Dimensionality
  10.5 Factor Analysis
  10.6 Limitations of PCA
  10.7 Clustering
  10.8 Further Reading
  10.9 Glossary

11 Regression
  11.1 Example: Predicting Diabetes Progression
  11.2 Fitting a Line with Least Squares
  11.3 Alternatives to Least Squares
  11.4 Fitting Nonlinear Curves
  11.5 Goodness of Fit: R² and Correlation
  11.6 Correlation of Residuals
  11.7 Linear Regression
  11.8 LASSO Regression and Feature Selection
  11.9 Further Reading
  11.10 Glossary

12 Data Encodings and File Formats
  12.1 Typical File Format Categories
  12.2 CSV Files
  12.3 JSON Files
  12.4 XML Files
  12.5 HTML Files
  12.6 Tar Files
  12.7 GZip Files
  12.8 Zip Files
  12.9 Image Files: Rasterized, Vectorized, and/or Compressed
  12.10 It’s All Bytes at the End of the Day
  12.11 Integers
  12.12 Floats
  12.13 Text Data
  12.14 Further Reading
  12.15 Glossary

13 Big Data
  13.1 What Is Big Data?
  13.2 When to Use – And not Use – Big Data
  13.3 Hadoop: The File System and the Processor
  13.4 Example PySpark Script
  13.5 Spark Overview
  13.6 Spark Operations
  13.7 PySpark Data Frames
  13.8 Two Ways to Run PySpark
  13.9 Configuring Spark
  13.10 Under the Hood
  13.11 Spark Tips and Gotchas
  13.12 The MapReduce Paradigm
  13.13 Performance Considerations
  13.14 Further Reading
  13.15 Glossary

14 Databases
  14.1 Relational Databases and MySQL®
  14.2 Key–Value Stores
  14.3 Wide-Column Stores
  14.4 Document Stores
  14.5 Further Reading
  14.6 Glossary

15 Software Engineering Best Practices
  15.1 Coding Style
  15.2 Version Control and Git for Data Scientists
  15.3 Testing Code
  15.4 Test-Driven Development
  15.5 AGILE Methodology
  15.6 Further Reading
  15.7 Glossary

16 Traditional Natural Language Processing
  16.1 Do I Even Need NLP?
  16.2 The Great Divide: Language Versus Statistics
  16.3 Example: Sentiment Analysis on Stock Market Articles
  16.4 Software and Datasets
  16.5 Tokenization
  16.6 Central Concept: Bag-of-Words
  16.7 Word Weighting: TF-IDF
  16.8 n-Grams
  16.9 Stop Words
  16.10 Lemmatization and Stemming
  16.11 Synonyms
  16.12 Part of Speech Tagging
  16.13 Common Problems
  16.14 Advanced Linguistic NLP: Syntax Trees, Knowledge, and Understanding
  16.15 Further Reading
  16.16 Glossary

17 Time Series Analysis
  17.1 Example: Predicting Wikipedia Page Views
  17.2 A Typical Workflow
  17.3 Time Series Versus Time-Stamped Events
  17.4 Resampling and Interpolation
  17.5 Smoothing Signals
  17.6 Logarithms and Other Transformations
  17.7 Trends and Periodicity
  17.8 Windowing
  17.9 Brainstorming Simple Features
  17.10 Better Features: Time Series as Vectors
  17.11 Fourier Analysis: Sometimes a Magic Bullet
  17.12 Time Series in Context: The Whole Suite of Features
  17.13 Further Reading
  17.14 Glossary

18 Probability
  18.1 Flipping Coins: Bernoulli Random Variables
  18.2 Throwing Darts: Uniform Random Variables
  18.3 The Uniform Distribution and Pseudorandom Numbers
  18.4 Nondiscrete, Noncontinuous Random Variables
  18.5 Notation, Expectations, and Standard Deviation
  18.6 Dependence, Marginal, and Conditional Probability
  18.7 Understanding the Tails
  18.8 Binomial Distribution
  18.9 Poisson Distribution
  18.10 Normal Distribution
  18.11 Multivariate Gaussian
  18.12 Exponential Distribution
  18.13 Log-Normal Distribution
  18.14 Entropy
  18.15 Further Reading
  18.16 Glossary

19 Statistics
  19.1 Statistics in Perspective
  19.2 Bayesian Versus Frequentist: Practical Tradeoffs and Differing Philosophies
  19.3 Hypothesis Testing: Key Idea and Example
  19.4 Multiple Hypothesis Testing
  19.5 Parameter Estimation
  19.6 Hypothesis Testing: t-Test
  19.7 Confidence Intervals
  19.8 Bayesian Statistics
  19.9 Naive Bayesian Statistics
  19.10 Bayesian Networks
  19.11 Choosing Priors: Maximum Entropy or Domain Knowledge
  19.12 Further Reading
  19.13 Glossary

20 Programming Language Concepts
  20.1 Programming Paradigms
  20.2 Compilation and Interpretation
  20.3 Type Systems
  20.4 Further Reading
  20.5 Glossary

21 Performance and Computer Memory
  21.1 A Word of Caution
  21.2 Example Script
  21.3 Algorithm Performance and Big-O Notation
  21.4 Some Classic Problems: Sorting a List and Binary Search
  21.5 Amortized Performance and Average Performance
  21.6 Two Principles: Reducing Overhead and Managing Memory
  21.7 Performance Tip: Use Numerical Libraries When Applicable
  21.8 Performance Tip: Delete Large Structures You Don’t Need
  21.9 Performance Tip: Use Built-In Functions When Possible
  21.10 Performance Tip: Avoid Superfluous Function Calls
  21.11 Performance Tip: Avoid Creating Large New Objects
  21.12 Further Reading
  21.13 Glossary

Part III Specialized or Advanced Topics

22 Computer Memory and Data Structures
  22.1 Virtual Memory, the Stack, and the Heap
  22.2 Example C Program
  22.3 Data Types and Arrays in Memory
  22.4 Structs
  22.5 Pointers, the Stack, and the Heap
  22.6 Key Data Structures
  22.7 Further Reading
  22.8 Glossary

23 Maximum-Likelihood Estimation and Optimization
  23.1 Maximum-Likelihood Estimation
  23.2 A Simple Example: Fitting a Line
  23.3 Another Example: Logistic Regression
  23.4 Optimization
  23.5 Gradient Descent
  23.6 Convex Optimization
  23.7 Stochastic Gradient Descent
  23.8 Further Reading
  23.9 Glossary

24 Deep Learning and AI
  24.1 A Note on Libraries and Hardware
  24.2 A Note on Training Data
  24.3 Simple Deep Learning: Perceptrons
  24.4 What Is a Tensor?
  24.5 Convolutional Neural Networks
  24.6 Example: The MNIST Handwriting Dataset
  24.7 Autoencoders and Latent Vectors
  24.8 Generative AI and GANs
  24.9 Diffusion Models
  24.10 RNNs, Hidden State, and the Encoder–Decoder
  24.11 Attention and Transformers
  24.12 Stable Diffusion: Bringing the Parts Together
  24.13 Large Language Models and Prompt Engineering
  24.14 Further Reading
  24.15 Glossary

25 Stochastic Modeling
  25.1 Markov Chains
  25.2 Two Kinds of Markov Chain, Two Kinds of Questions
  25.3 Hidden Markov Models and the Viterbi Algorithm
  25.4 The Viterbi Algorithm
  25.5 Random Walks
  25.6 Brownian Motion
  25.7 ARIMA Models
  25.8 Continuous-Time Markov Processes
  25.9 Poisson Processes
  25.10 Further Reading
  25.11 Glossary

26 Parting Words: Your Future as a Data Scientist

Index




Product Details
  • ISBN-13: 9781394234493
  • Publisher: John Wiley & Sons Inc
  • Publisher Imprint: John Wiley & Sons Inc
  • Height: 257 mm
  • No of Pages: 368
  • Spine Width: 31 mm
  • Width: 185 mm
  • ISBN-10: 139423449X
  • Publisher Date: 28 Oct 2024
  • Binding: Hardback
  • Language: English
  • Returnable: Y
  • Weight: 771 gr



