Professional CUDA C Programming (English)

About the Book

Break into the powerful world of parallel GPU programming with this down-to-earth, practical guide

Designed for professionals across multiple industrial sectors, Professional CUDA C Programming presents the fundamentals of CUDA, a parallel computing platform and programming model designed to ease GPU programming, in an easy-to-follow format, and teaches readers how to think in parallel and implement parallel algorithms on GPUs. Each chapter covers a specific topic and includes workable examples that demonstrate the development process, allowing readers to explore both the "hard" and "soft" aspects of GPU programming.

Computing architectures are experiencing a fundamental shift toward scalable parallel computing, motivated by application requirements in industry and science. This book demonstrates the challenges of efficiently utilizing compute resources at peak performance and presents modern techniques for tackling these challenges, while increasing accessibility for professionals who are not necessarily parallel programming experts. The CUDA programming model and tools empower developers to write high-performance applications on a scalable, parallel computing platform: the GPU. However, CUDA itself can be difficult to learn without extensive programming experience. Recognized CUDA authorities John Cheng, Max Grossman, and Ty McKercher guide readers through essential GPU programming skills and best practices in Professional CUDA C Programming, including:

  • CUDA Programming Model
  • GPU Execution Model
  • GPU Memory Model
  • Streams, Events, and Concurrency
  • Multi-GPU Programming
  • CUDA Domain-Specific Libraries
  • Profiling and Performance Tuning

The book makes complex CUDA concepts easy to understand for anyone with knowledge of basic software development, with exercises designed to be both readable and high-performance. For the professional seeking entrance to parallel computing and the high-performance computing community, Professional CUDA C Programming is an invaluable resource, with the most current information available on the market.



Table of Contents:

Foreword xvii

Preface xix

Introduction xxi

Chapter 1: Heterogeneous Parallel Computing with CUDA 1

Parallel Computing 2

Sequential and Parallel Programming 3

Parallelism 4

Computer Architecture 6

Heterogeneous Computing 8

Heterogeneous Architecture 9

Paradigm of Heterogeneous Computing 12

CUDA: A Platform for Heterogeneous Computing 14

Hello World from GPU 17

Is CUDA C Programming Difficult? 20

Summary 21

Chapter 2: CUDA Programming Model 23

Introducing the CUDA Programming Model 23

CUDA Programming Structure 25

Managing Memory 26

Organizing Threads 30

Launching a CUDA Kernel 36

Writing Your Kernel 37

Verifying Your Kernel 39

Handling Errors 40

Compiling and Executing 40

Timing Your Kernel 43

Timing with CPU Timer 44

Timing with nvprof 47

Organizing Parallel Threads 49

Indexing Matrices with Blocks and Threads 49

Summing Matrices with a 2D Grid and 2D Blocks 53

Summing Matrices with a 1D Grid and 1D Blocks 57

Summing Matrices with a 2D Grid and 1D Blocks 58

Managing Devices 60

Using the Runtime API to Query GPU Information 61

Determining the Best GPU 63

Using nvidia-smi to Query GPU Information 63

Setting Devices at Runtime 64

Summary 65

Chapter 3: CUDA Execution Model 67

Introducing the CUDA Execution Model 67

GPU Architecture Overview 68

The Fermi Architecture 71

The Kepler Architecture 73

Profile-Driven Optimization 78

Understanding the Nature of Warp Execution 80

Warps and Thread Blocks 80

Warp Divergence 82

Resource Partitioning 87

Latency Hiding 90

Occupancy 93

Synchronization 97

Scalability 98

Exposing Parallelism 98

Checking Active Warps with nvprof 100

Checking Memory Operations with nvprof 100

Exposing More Parallelism 101

Avoiding Branch Divergence 104

The Parallel Reduction Problem 104

Divergence in Parallel Reduction 106

Improving Divergence in Parallel Reduction 110

Reducing with Interleaved Pairs 112

Unrolling Loops 114

Reducing with Unrolling 115

Reducing with Unrolled Warps 117

Reducing with Complete Unrolling 119

Reducing with Template Functions 120

Dynamic Parallelism 122

Nested Execution 123

Nested Hello World on the GPU 124

Nested Reduction 128

Summary 132

Chapter 4: Global Memory 135

Introducing the CUDA Memory Model 136

Benefits of a Memory Hierarchy 136

CUDA Memory Model 137

Memory Management 145

Memory Allocation and Deallocation 146

Memory Transfer 146

Pinned Memory 148

Zero-Copy Memory 150

Unified Virtual Addressing 156

Unified Memory 157

Memory Access Patterns 158

Aligned and Coalesced Access 158

Global Memory Reads 160

Global Memory Writes 169

Array of Structures versus Structure of Arrays 171

Performance Tuning 176

What Bandwidth Can a Kernel Achieve? 179

Memory Bandwidth 179

Matrix Transpose Problem 180

Matrix Addition with Unified Memory 195

Summary 199

Chapter 5: Shared Memory and Constant Memory 203

Introducing CUDA Shared Memory 204

Shared Memory 204

Shared Memory Allocation 206

Shared Memory Banks and Access Mode 206

Configuring the Amount of Shared Memory 212

Synchronization 214

Checking the Data Layout of Shared Memory 216

Square Shared Memory 217

Rectangular Shared Memory 225

Reducing Global Memory Access 232

Parallel Reduction with Shared Memory 232

Parallel Reduction with Unrolling 236

Parallel Reduction with Dynamic Shared Memory 238

Effective Bandwidth 239

Coalescing Global Memory Accesses 239

Baseline Transpose Kernel 240

Matrix Transpose with Shared Memory 241

Matrix Transpose with Padded Shared Memory 245

Matrix Transpose with Unrolling 246

Exposing More Parallelism 249

Constant Memory 250

Implementing a 1D Stencil with Constant Memory 250

Comparing with the Read-Only Cache 253

The Warp Shuffle Instruction 255

Variants of the Warp Shuffle Instruction 256

Sharing Data within a Warp 258

Parallel Reduction Using the Warp Shuffle Instruction 262

Summary 264

Chapter 6: Streams and Concurrency 267

Introducing Streams and Events 268

CUDA Streams 269

Stream Scheduling 271

Stream Priorities 273

CUDA Events 273

Stream Synchronization 275

Concurrent Kernel Execution 279

Concurrent Kernels in Non-NULL Streams 279

False Dependencies on Fermi GPUs 281

Dispatching Operations with OpenMP 283

Adjusting Stream Behavior Using Environment Variables 284

Concurrency-Limiting GPU Resources 286

Blocking Behavior of the Default Stream 287

Creating Inter-Stream Dependencies 288

Overlapping Kernel Execution and Data Transfer 289

Overlap Using Depth-First Scheduling 289

Overlap Using Breadth-First Scheduling 293

Overlapping GPU and CPU Execution 294

Stream Callbacks 295

Summary 297

Chapter 7: Tuning Instruction-Level Primitives 299

Introducing CUDA Instructions 300

Floating-Point Instructions 301

Intrinsic and Standard Functions 303

Atomic Instructions 304

Optimizing Instructions for Your Application 306

Single-Precision vs. Double-Precision 306

Standard vs. Intrinsic Functions 309

Understanding Atomic Instructions 315

Bringing It All Together 322

Summary 324

Chapter 8: GPU-Accelerated CUDA Libraries and OpenACC 327

Introducing the CUDA Libraries 328

Supported Domains for CUDA Libraries 329

A Common Library Workflow 330

The cuSPARSE Library 332

cuSPARSE Data Storage Formats 333

Formatting Conversion with cuSPARSE 337

Demonstrating cuSPARSE 338

Important Topics in cuSPARSE Development 340

cuSPARSE Summary 341

The cuBLAS Library 341

Managing cuBLAS Data 342

Demonstrating cuBLAS 343

Important Topics in cuBLAS Development 345

cuBLAS Summary 346

The cuFFT Library 346

Using the cuFFT API 347

Demonstrating cuFFT 348

cuFFT Summary 349

The cuRAND Library 349

Choosing Pseudo- or Quasi- Random Numbers 349

Overview of the cuRAND Library 350

Demonstrating cuRAND 354

Important Topics in cuRAND Development 357

CUDA Library Features Introduced in CUDA 6 358

Drop-In CUDA Libraries 358

Multi-GPU Libraries 359

A Survey of CUDA Library Performance 361

cuSPARSE versus MKL 361

cuBLAS versus MKL BLAS 362

cuFFT versus FFTW versus MKL 363

CUDA Library Performance Summary 364

Using OpenACC 365

Using OpenACC Compute Directives 367

Using OpenACC Data Directives 375

The OpenACC Runtime API 380

Combining OpenACC and the CUDA Libraries 382

Summary of OpenACC 384

Summary 384

Chapter 9: Multi-GPU Programming 387

Moving to Multiple GPUs 388

Executing on Multiple GPUs 389

Peer-to-Peer Communication 391

Synchronizing across Multi-GPUs 392

Subdividing Computation across Multiple GPUs 393

Allocating Memory on Multiple Devices 393

Distributing Work from a Single Host Thread 394

Compiling and Executing 395

Peer-to-Peer Communication on Multiple GPUs 396

Enabling Peer-to-Peer Access 396

Peer-to-Peer Memory Copy 396

Peer-to-Peer Memory Access with Unified Virtual Addressing 398

Finite Difference on Multi-GPU 400

Stencil Calculation for 2D Wave Equation 400

Typical Patterns for Multi-GPU Programs 401

2D Stencil Computation with Multiple GPUs 403

Overlapping Computation and Communication 405

Compiling and Executing 406

Scaling Applications across GPU Clusters 409

CPU-to-CPU Data Transfer 410

GPU-to-GPU Data Transfer Using Traditional MPI 413

GPU-to-GPU Data Transfer with CUDA-aware MPI 416

Intra-Node GPU-to-GPU Data Transfer with CUDA-Aware MPI 417

Adjusting Message Chunk Size 418

GPU-to-GPU Data Transfer with GPUDirect RDMA 419

Summary 422

Chapter 10: Implementation Considerations 425

The CUDA C Development Process 426

APOD Development Cycle 426

Optimization Opportunities 429

CUDA Code Compilation 432

CUDA Error Handling 437

Profile-Driven Optimization 438

Finding Optimization Opportunities Using nvprof 439

Guiding Optimization Using nvvp 443

NVIDIA Tools Extension 446

CUDA Debugging 448

Kernel Debugging 448

Memory Debugging 456

Debugging Summary 462

A Case Study in Porting C Programs to CUDA C 462

Assessing crypt 463

Parallelizing crypt 464

Optimizing crypt 465

Deploying crypt 472

Summary of Porting crypt 475

Summary 476

Appendix: Suggested Readings 477

Index 481


Product Details
  • ISBN-13: 9781118739327
  • Publisher: John Wiley & Sons Inc
  • Publisher Imprint: Wrox Press
  • Depth: 38
  • Height: 236 mm
  • No of Pages: 528
  • Series Title: English
  • Weight: 998 g
  • ISBN-10: 1118739329
  • Publisher Date: 07 Oct 2014
  • Binding: Paperback
  • Edition: PAP/PSC
  • Language: English
  • Returnable: N
  • Spine Width: 36 mm
  • Width: 191 mm


Customer Reviews

4.5 out of 5 | 6 Reviews