Learn the art of efficient web scraping and crawling with Python
If you are a software developer, data scientist, NLP or machine-learning enthusiast or just need to migrate your company's wiki from a legacy platform, then this book is for you. It is perfect for someone , who needs instant access to large amounts of semi-structured data effortlessly.
Scrapy is a python based, open source framework used mainly for web crawling. It provides a code re-use functionality to build and scale robust crawling projects. Scrapy can also be used to extract data using various APIs and to perform real time analytics on the data. It has a very large community and has become one of the most popular web crawler in the past few years.
This book covers the long awaited Scrapy v 1.0 that empowers you to extract useful data from virtually any source with very little effort. It starts off by explaining the fundamentals of Scrapy framework, followed by a through description of how to extract data from any source, clean it up, shape it as per your requirement using Python and 3rd party APIs. Next you will be familiarised with the process of storing the scrapped data in databases as well as search engines and performing real time analytics on them with Spark Streaming. By the end of this book, you would have learned enough to scrap any data for your application with ease.
"synopsis" may belong to another edition of this title.
Dimitrios Kouzis-Loukas has over fifteen years experience as a topnotch software developer. He uses his acquired knowledge and expertise to teach a wide range of audiences how to write great software, as well. He studied and mastered several disciplines, including mathematics, physics, and microelectronics. His thorough understanding of these subjects helped him raise his standards beyond the scope of "pragmatic solutions." He knows that true solutions should be as certain as the laws of physics, as robust as ECC memories, and as universal as mathematics. Dimitrios now develops distributed, low-latency, highly-availability systems using the latest datacenter technologies. He is language agnostic, yet has a slight preference for Python, C++, and Java. A firm believer in open source software and hardware, he hopes that his contributions will benefit individual communities as well as all of humanity.
"About this title" may belong to another edition of this title.
£ 8.43 shipping from U.S.A. to United Kingdom
Destination, rates & speeds£ 22.22 shipping from U.S.A. to United Kingdom
Destination, rates & speedsSeller: ThriftBooks-Dallas, Dallas, TX, U.S.A.
Paperback. Condition: Very Good. No Jacket. May have limited writing in cover pages. Pages are unmarked. ~ ThriftBooks: Read More, Spend Less 1.25. Seller Inventory # G1784399787I4N00
Quantity: 1 available
Seller: WorldofBooks, Goring-By-Sea, WS, United Kingdom
Paperback. Condition: Fine. Seller Inventory # GOR014399419
Quantity: 1 available
Seller: Zoom Books Company, Lynden, WA, U.S.A.
Condition: very_good. Book is in very good condition and may include minimal underlining highlighting. The book can also include "From the library of" labels. May not contain miscellaneous items toys, dvds, etc. . We offer 100% money back guarantee and 24 7 customer service. Seller Inventory # ZBV.1784399787.VG
Quantity: 1 available
Seller: Half Price Books Inc., Dallas, TX, U.S.A.
paperback. Condition: Very Good. Connecting readers with great books since 1972! Used books may not include companion materials, and may have some shelf wear or limited writing. We ship orders daily and Customer Service is our top priority! Seller Inventory # S_438448849
Quantity: 1 available
Seller: HPB-Red, Dallas, TX, U.S.A.
paperback. Condition: Good. Connecting readers with great books since 1972! Used textbooks may not include companion materials such as access codes, etc. May have some wear or writing/highlighting. We ship orders daily and Customer Service is our top priority! Seller Inventory # S_368858042
Quantity: 1 available
Seller: Toscana Books, AUSTIN, TX, U.S.A.
Paperback. Condition: new. Excellent Condition.Excels in customer satisfaction, prompt replies, and quality checks. Seller Inventory # Scanned1784399787
Quantity: 1 available