A**R
Five Stars
Great resource!
K**H
At least, you can practice with a real example website.
As other reviewers have said, some examples in the first chapter (e.g. 'sitemap_crawler') don't work, and the companion Bitbucket repository doesn't work either. It's a shame.

But almost all the other examples work well (I'm starting chapter 4 now). I think this book has very good resources for anyone interested in web scraping. At least you can practice with a real example website.

The O'Reilly web scraping book and this one complement each other. No book is a perfect guide.

Note: the addresses of the example web pages have changed slightly, so you will need to change the addresses in your code too.
K**A
Four Stars
Poor-quality writing. Not worth the money. You are better off using the BeautifulSoup package in Python.
J**E
I had to write compiled programs and every change was tedious. Now with Python
If you need to collect data (on a regular basis) from one or more websites, this book is for you. It explains how to load, extract, transform and store data in a convenient (and respectful) way. You will learn and practice this with Python. Before this, I had to write compiled programs, and every change was tedious. Now with Python, I can easily change my scripts without having to rebuild each program. The chapters about CAPTCHA and Scrapy were very useful for me... but don't tell anyone: I like to think I can do things that others can't :-)
P**S
Won't work for Python 3
This would be a good book if only it were updated to Python 3. As it is, it's dated: most examples won't work at all (unless you fall back to Python 2). Some of the code can be tweaked to work with modern Python, but at times, as in chapter 6, the breakage renders it useless. Overall a good book, but for a previous age.
A**R
Whirlwind Tour of Web Scraping Employing Python
The background required for the book is a basic understanding of websites and Python.

Specifically, about websites, one should know that content is laid out in HTML and that JavaScript enables dynamic behavior that may alter the HTML. It is also useful, but not required, to know how requests are sent back and forth between a client and a web server.

Specifically, about Python, one should know how to import and use functions from a module and how to install new modules.

The book gives a whirlwind tour of the process and of the existing Python tools for web crawling (how to grab HTML content from a server) and web scraping (how to extract the actual data from that content). The first two chapters introduce the basic web scraping scenario, and the later chapters introduce complications to the process along with tools and approaches to handle them.

The book provides a great high-level picture of web scraping and a self-contained Python starter kit to get up and running. It's a short read at 175 pages, with very accessible content and links to more detailed documentation. There is also a practice website for trying out scraping techniques.

I came to the book with previous exposure to various Python web scraping tools that I had pieced together from the web (tutorials, blogs, Stack Overflow). I really enjoyed how self-contained the book's treatment of the whole web scraping process is. This book would have saved me a LOT of time and pain had it been available when I got started with web scraping. That said, I still learned a few things, specifically the CAPTCHA tools and some basic utilities that I can use in my day-to-day Python.

I recommend this book for anyone who is just starting out with web scraping, or who is already familiar with scraping but wants to learn how to use Python for it. Once you have read the book, there is plenty of opportunity to explore the individual tools further.
T**S
Fantastic resource
Hands down the best resource I've found for practical examples of how to write web scrapers in Python. The author's style is very easy to read and very practically focused. He also clearly knows the subject inside and out, and does a great job not only of showing you actual working code for everything but also of covering multiple approaches for different situations, as well as key pitfalls to avoid.
J**K
An up-to-date book that goes beyond hello world
This is a good, up-to-date book on grabbing data from web sources, especially for Python users who are not professional developers. I have found that any time you collect data, you normally end up needing either to auto-refresh the data or to get metadata about the raw data. He has a good example of this: country statistics, something you may need in order to 'normalize' other statistical data you are already working with.

Python libraries dealing with URL fetching typically have very simple examples of the parser.parse(<html>"hello world"</html>) type, so you have no idea what to do with a real web page, like a 'needs login' page.

The author, to his credit, does not tell you to download a 'magic' Python library. In these cases he gives a thorough walk-through of how to research the structure or scripting of the page and then go about fetching it with Python. Most of the book is in fact devoted to analysis rather than action.

The only parts that didn't seem particularly useful to me were the chapters on creating crawlers and spiders. I don't see myself doing any of that, and I don't see why an amateur or a professional would use them. A professional would probably use Elasticsearch, for instance.

Other than that, the book will probably be useful for a long time.
C**S
Five Stars
Great product.