Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining

International Edition



About the Book

A hands-on guide to web scraping and text mining for both beginners and experienced users of R. Introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, and SQL. Provides basic techniques to query web documents and data sets (XPath and regular expressions). An extensive set of exercises guides the reader through each technique. Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management. Case studies are featured throughout, along with examples for each technique presented. R code and solutions to exercises featured in the book are provided on a supporting website.

Table of Contents:
Preface

1 Introduction
  1.1 Case study: World Heritage Sites in Danger
  1.2 Some remarks on web data quality
  1.3 Technologies for disseminating, extracting, and storing web data
  1.4 Structure of the book

Part One: A Primer on Web and Data Technologies

2 HTML
  2.1 Browser presentation and source code
  2.2 Syntax rules
  2.3 Tags and attributes
  2.4 Parsing
3 XML and JSON
  3.1 A short example XML document
  3.2 XML syntax rules
  3.3 When is an XML document well formed or valid?
  3.4 XML extensions and technologies
  3.5 XML and R in practice
  3.6 A short example JSON document
  3.7 JSON syntax rules
  3.8 JSON and R in practice
4 XPath
  4.1 XPath--a query language for web documents
  4.2 Identifying node sets with XPath
  4.3 Extracting node elements
5 HTTP
  5.1 HTTP fundamentals
  5.2 Advanced features of HTTP
  5.3 Protocols beyond HTTP
  5.4 HTTP in action
6 AJAX
  6.1 JavaScript
  6.2 XHR
  6.3 Exploring AJAX with Web Developer Tools
7 SQL and relational databases
  7.1 Overview and terminology
  7.2 Relational Databases
  7.3 SQL: a language to communicate with Databases
  7.4 Databases in action
8 Regular expressions and essential string functions
  8.1 Regular expressions
  8.2 String processing
  8.3 A word on character encodings

Part Two: A Practical Toolbox for Web Scraping and Text Mining

9 Scraping the Web
  9.1 Retrieval scenarios
  9.2 Extraction strategies
  9.3 Web scraping: Good practice
  9.4 Valuable sources of inspiration
10 Statistical text processing
  10.1 The running example: Classifying press releases of the British government
  10.2 Processing textual data
  10.3 Supervised learning techniques
  10.4 Unsupervised learning techniques
11 Managing data projects
  11.1 Interacting with the file system
  11.2 Processing multiple documents/links
  11.3 Organizing scraping procedures
  11.4 Executing R scripts on a regular basis

Part Three: A Bag of Case Studies

12 Collaboration networks in the US Senate
  12.1 Information on the bills
  12.2 Information on the senators
  12.3 Analyzing the network structure
  12.4 Conclusion
13 Parsing information from semistructured documents
  13.1 Downloading data from the FTP server
  13.2 Parsing semistructured text data
  13.3 Visualizing station and temperature data
14 Predicting the 2014 Academy Awards using Twitter
15 Mapping the geographic distribution of names
  15.1 Developing a data collection strategy
  15.2 Website inspection
  15.3 Data retrieval and information extraction
  15.4 Mapping names
  15.5 Automating the process
16 Gathering data on mobile phones
  16.1 Page exploration
  16.2 Scraping procedure
  16.3 Graphical analysis
  16.4 Data storage
17 Analyzing sentiments of product reviews
  17.1 Introduction
  17.2 Collecting the data
  17.3 Analyzing the data
  17.4 Conclusion

References
General index
Package index
Function index




Product Details
  • ISBN-13: 9781118834817
  • Publisher: John Wiley & Sons Inc
  • Publisher Imprint: John Wiley & Sons Inc
  • Depth: 32
  • Language: English
  • Returnable: N
  • Sub Title: A Practical Guide to Web Scraping and Text Mining
  • Width: 175 mm
  • ISBN-10: 111883481X
  • Publisher Date: 26 Dec 2014
  • Binding: Hardback
  • Height: 249 mm
  • No of Pages: 480
  • Spine Width: 33 mm
  • Weight: 870 g

