Spark: Big Data Cluster Computing in Production


International Edition



About the Book

Production-targeted Spark guidance with real-world use cases

Spark: Big Data Cluster Computing in Production goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big-data clustering in production. Written by an expert team well-known in the big data community, this book walks you through the challenges in moving from proof-of-concept or demo Spark applications to live Spark in production. Real use cases provide deep insight into common problems, limitations, challenges, and opportunities, while expert tips and tricks help you get the most out of Spark performance. Coverage includes Spark SQL, Tachyon, Kerberos, MLlib, YARN, and Mesos, with clear, actionable guidance on resource scheduling, database connectors, streaming, security, and much more.

Spark has become the tool of choice for many Big Data problems, with more active contributors than any other Apache Software Foundation project. General introductory books abound, but this book is the first to provide deep insight and real-world advice on using Spark in production. Specific guidance, expert tips, and invaluable foresight make this guide an incredibly useful resource for real production settings.

  • Review Spark hardware requirements and estimate cluster size
  • Gain insight from real-world production use cases
  • Tighten security, schedule resources, and fine-tune performance
  • Overcome common problems encountered using Spark in production

Spark works with other big data tools including MapReduce and Hadoop, and uses languages you already know like Java, Scala, Python, and R. Lightning speed makes Spark too good to pass up, but understanding limitations and challenges in advance goes a long way toward easing actual production implementation. Spark: Big Data Cluster Computing in Production tells you everything you need to know, with real-world production insight and expert guidance, tips, and tricks.

Table of Contents:
Introduction

Chapter 1: Finishing Your Spark Job
  • Installation of the Necessary Components
  • Native Installation Using a Spark Standalone Cluster
  • The History of Distributed Computing That Led to Spark
  • Enter the Cloud
  • Understanding Resource Management
  • Using Various Formats for Storage
  • Text Files
  • Sequence Files
  • Avro Files
  • Parquet Files
  • Making Sense of Monitoring and Instrumentation
  • Spark UI
  • Spark Standalone UI
  • Metrics REST API
  • Metrics System
  • External Monitoring Tools
  • Summary

Chapter 2: Cluster Management
  • Background
  • Spark Components
  • Driver
  • Workers and Executors
  • Configuration
  • Spark Standalone
  • Architecture
  • Single-Node Setup Scenario
  • Multi-Node Setup
  • YARN
  • Architecture
  • Dynamic Resource Allocation
  • Scenario
  • Mesos
  • Setup
  • Architecture
  • Dynamic Resource Allocation
  • Basic Setup Scenario
  • Comparison
  • Summary

Chapter 3: Performance Tuning
  • Spark Execution Model
  • Partitioning
  • Controlling Parallelism
  • Partitioners
  • Shuffling Data
  • Shuffling and Data Partitioning
  • Operators and Shuffling
  • Shuffling Is Not That Bad After All
  • Serialization
  • Kryo Registrators
  • Spark Cache
  • Spark SQL Cache
  • Memory Management
  • Garbage Collection
  • Shared Variables
  • Broadcast Variables
  • Accumulators
  • Data Locality
  • Summary

Chapter 4: Security
  • Architecture
  • Security Manager
  • Setup Configurations
  • ACL
  • Configuration
  • Job Submission
  • Web UI
  • Network Security
  • Encryption
  • Event logging
  • Kerberos
  • Apache Sentry
  • Summary

Chapter 5: Fault Tolerance or Job Execution
  • Lifecycle of a Spark Job
  • Spark Master
  • Spark Driver
  • Spark Worker
  • Job Lifecycle
  • Job Scheduling
  • Scheduling within an Application
  • Scheduling with External Utilities
  • Fault Tolerance
  • Internal and External Fault Tolerance
  • Service Level Agreements (SLAs)
  • Resilient Distributed Datasets (RDDs)
  • Batch versus Streaming
  • Testing Strategies
  • Recommended Configurations
  • Summary

Chapter 6: Beyond Spark
  • Data Warehousing
  • Spark SQL CLI
  • Thrift JDBC/ODBC Server
  • Hive on Spark
  • Machine Learning
  • DataFrame
  • MLlib and ML
  • Mahout on Spark
  • Hivemall on Spark
  • External Frameworks
  • Spark Package
  • XGBoost
  • spark-jobserver
  • Future Works
  • Integration with the Parameter Server
  • Deep Learning
  • Enterprise Usage
  • Collecting User Activity Log with Spark and Kafka
  • Real-Time Recommendation with Spark
  • Real-Time Categorization of Twitter Bots
  • Summary

Index




Product Details
  • ISBN-13: 9781119254010
  • Publisher: John Wiley & Sons Inc
  • Publisher Imprint: John Wiley & Sons Inc
  • Depth: 10
  • Language: English
  • Returnable: N
  • Spine Width: 13 mm
  • Weight: 372 g
  • ISBN-10: 1119254019
  • Publisher Date: 29 Apr 2016
  • Binding: Paperback
  • Height: 236 mm
  • No of Pages: 216
  • Series Title: English
  • Sub Title: Big Data Cluster Computing in Production
  • Width: 188 mm




