E**R
Good introduction to Hadoop ecosystem
After checking out reviews of what O'Reilly and Apress had to offer with regard to Hadoop, I ended up purchasing this book based on positive reviews, my past positive experiences with the Manning "In Action" series of texts in general, such as "Spring in Action" and "Java Persistence with Hibernate", formerly "Hibernate in Action" (see my reviews), and the fact that this book was the most recently published on the subject. In short, this text is well organized and covers Hadoop well, but potential readers should be aware that about one-third of what Lam has to offer here is ancillary to Hadoop rather than about Hadoop itself. Including the larger ecosystem within which Hadoop sits makes sense to me, and I do not think this aspect of the book detracts from what the author provides in any way.
The author provides a good introduction to Hadoop in the first three chapters, which includes a discussion of the differences between Hadoop and traditional technologies in this space, such as relational databases, as well as a tour of Hadoop building blocks, working with files in the Hadoop Distributed File System (HDFS), and the anatomy of a MapReduce program. The next three chapters contain the bulk of the text, which focuses on writing MapReduce programs, and includes segments on chaining MapReduce jobs, joining data from different sources, creating a Bloom filter, and monitoring, debugging, and tuning.
The next two chapters offer a short cookbook in which the author presents 5 different general MapReduce techniques (Lam admits that specialized MapReduce techniques can be found rather easily by Googling, and that he does not intend this cookbook to be comprehensive in any way), as well as a chapter on managing Hadoop. These are followed by four chapters: one on running Hadoop in the cloud, brief introductions to programming with Pig (a Hadoop extension that provides a language called Pig Latin) and using Hive (a package built on top of Hadoop that provides a SQL-like language called HiveQL), and a chapter that discusses four Hadoop case studies from the New York Times, China Mobile, StumbleUpon, and IBM (the case study from IBM takes up about 50% of the discussion, and the case study from the New York Times is less than a page).
Be aware that at the time of this review, this book was published over a year ago. One of the common complaints I read about what O'Reilly and Apress have to offer in this space is that their counterparts to this book cover older versions of Hadoop. In chapter 4, Lam mentions that "one of the main design goals driving toward Hadoop's major 1.0 release is a stable and extensible MapReduce API. As of this writing, version 0.20 is the latest release and is considered a bridge between the older API (that we use throughout this book) and this upcoming stable API. The 0.20 release supports the future API while maintaining backward-compatibility with the old one by marking it deprecated."
"Future releases after 0.20 will stop supporting the older API. As of this writing, we don't recommend jumping into the new API yet for a couple reasons: (1) Many of Hadoop's own library classes in 0.20 aren't written under the new API yet. You won't be able to use those classes if your MapReduce code uses the new API in 0.20. (2) Many still consider the most production-ready and stable version of Hadoop as of this writing to be 0.18.3. Some users are warming up to version 0.20, but we suggest you wait a little longer before going full production with it." The author follows up by writing that "by the time you read this the situation may be different. In this section we cover the changes the new API presents. Fortunately, almost all the changes affect only the basic MapReduce template. We rewrite the template under the new API to enable you to use it in the future."
Exactly two weeks ago today, Hadoop 1.0.1 was released after 6 years of development. Between the version that this book covers and this most recent version, several intermediary versions were released, which provide bug fixes, improvements, optimizations, and new features, as well as support for some of the offerings in the Hadoop ecosystem. More timely information on open source technologies that enjoy wide community support is always going to be more readily available on the internet, especially via blog posts, but in my opinion this fact does not detract from the value of this text, which still serves as a good introduction to the Hadoop ecosystem, especially for those more comfortable starting out with a published text. Just be aware that you will be quickly referring to other materials after you make your way through this text.
The portions that I especially appreciated about what Lam has to offer include his presentations in chapter 5 on reduce-side joining and creating a Bloom filter, the cookbook that he provides in chapter 7 that includes segments on passing job-specific parameters to tasks, probing for task-specific information, partitioning into multiple output files, inputting from and outputting to a database, and keeping all output in sorted order, as well as chapters 9, 10, and 11, which discuss the larger Hadoop ecosystem, especially the introduction to Pig Latin. Recommended to anyone looking for an introduction to the Hadoop ecosystem of technologies who understands that published texts such as this one cannot contain information about the latest releases.
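For readers who have not met the Bloom filter this review mentions, the following is a minimal Python sketch of the idea. It is not the book's Java code. The class name `BloomFilter`, the default sizes, and the salted SHA-256 hashing are all assumptions chosen for illustration. The `union` method reflects one common way such filters are combined in a MapReduce setting, where each task builds a filter over its own input split and a final step ORs the bit arrays together.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True only if every position for this item has been set.
        return all(self.bits[pos] for pos in self._positions(item))

    def union(self, other):
        # Bitwise OR of two filters built with identical parameters.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = bytearray(x | y for x, y in zip(self.bits, other.bits))
        return merged
```

A filter built this way can be shipped to every task and used to discard records whose join key cannot possibly match, at the cost of occasionally letting a non-matching record through.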
K**E
Solid high-level intro
I bought this book for a project at work, to prototype a log analysis system using Hadoop. I haven't bought very many technical books in the last few years, but the quality of most online documentation for Hadoop is poor and books seemed like a better option. It was useful, and I kept it open on my desk for quite a while as I worked to get the infrastructure set up. Consider it a high-level intro to lots of different Hadoop topics, and you'll be happy with it. Just don't expect it to answer all of your questions. You'll probably still end up doing a lot of digging through other online sources, because the Hadoop ecosystem is large and complicated, and no book can really cover all of it. Besides this book, I also bought Hadoop: The Definitive Guide (larger than this book, and a bit more useful) and Data Intensive Text Processing With MapReduce (which gave me a good intro to the Map Reduce algorithm, but wasn't that useful once I had a general idea what was going on).
P**H
Hadoop book for normal people
I really love this book; it is made for normal people just trying to get something done. The streaming coverage is pretty good, and it's the best book for Python people I've seen. There is a lot of configuration information, which is very practical. I can't really review the Java examples, but I did like the very practical examples on simple combiners. I think this book, in combination with the newer version of "Hadoop: The Definitive Guide" (make sure to get the recent one), really makes a solid statement on the Hadoop front. I think both books are mandatory for anyone doing anything serious in Hadoop.
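To give a flavor of the streaming-plus-Python style this review praises, here is a rough sketch of a word count with a combiner, simulated in a single process. It is not taken from the book. The function names (`mapper`, `reducer`, `run_locally`, `run_with_combiner`) are hypothetical, and in a real Hadoop Streaming job the mapper and reducer would be separate scripts reading standard input and printing tab-separated records.

```python
from itertools import groupby


def key_of(record):
    """Return the key part of a 'key<TAB>value' record."""
    return record.split("\t", 1)[0]


def mapper(lines):
    """Emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(records):
    """Sum counts per key; assumes records arrive sorted by key."""
    for word, group in groupby(records, key=key_of):
        total = sum(int(record.split("\t", 1)[1]) for record in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map, sort (standing in for the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines), key=key_of)))


def run_with_combiner(splits):
    """Pre-aggregate each split with the reducer before the final reduce."""
    combined = []
    for split in splits:
        combined.extend(reducer(sorted(mapper(split), key=key_of)))
    return list(reducer(sorted(combined, key=key_of)))
```

Because addition is associative and commutative, the same `reducer` can double as the combiner here: each split is summed locally first, so fewer records need to cross the network before the final reduce.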
K**K
Very good, but dated
Succinct and to the point, but a few years out of date. Still, it achieved what I needed: a medium-depth dive into the technology in a book small enough to take on a plane.
R**H
Must have for anyone running Hadoop in production
No cruft here! Every page is loaded with gold. Highly relevant, useful examples. I teach Cloudera Hadoop for Global Knowledge, and I frequently both reference this book and recommend that students buy it.
G**N
outdated!
This book covers Hadoop version 0.20, which is quite outdated relative to the current stable version of 1.x and the coming 2.x. This should have been made clear on the cover, in the preface, or in the product description, but it is not mentioned until page 28, where the book starts talking about configuring Hadoop. In my opinion, authors and publishers should make it clear right up front which version of the product is covered, to help readers make an informed choice. In addition, the examples are mostly based on the ubiquitous word-counting example that everyone uses, which is quite boring. If you just want to read about Hadoop and don't plan to actually run any samples, this book is fine. But if you also want to try some samples not based on the word-counting example, you might want to check out another book titled "Hadoop Essentials: A Quantitative Approach", which is based on the latest stable release of 1.0.3.
N**A
Great book, but outdated.
Great intro to Hadoop and the Hadoop ecosystem. The reason I give this 4 stars is that this book is fairly outdated. The Hadoop world is moving at the speed of light, and a book published 3-4 years ago will not give you the necessary skills to work with today's versions and APIs of MapReduce, HDFS, etc. If you want more than a conceptual understanding of Hadoop, I would wait for the second edition (which is expected to come out next year) or find another book.
A**R
The Python is particularly useful to me for the class I am in
Using it. The Python is particularly useful to me for the class I am in.
T**A
Good condition
Good condition and reasonable price
D**A
Great for getting started
I am leaving this comment for the benefit of anyone who, as happened to me at the time, needs a good text to become productive with Hadoop quickly. The exposition is clear, the reading is fairly smooth, and I really appreciated the small examples in Python. I would recommend it to beginners over "Hadoop: The Definitive Guide", which is better suited to those who want to exploit Hadoop's potential to the fullest and already know how it works. Finally, I really appreciated the publisher's decision to give unlimited free access to the ebook version to those who already own the print edition.
L**O
Good technical book
A good book, but a technical one. Not necessarily the best for understanding the concepts, but for getting your hands dirty, it is a must. Excellent hands-on lab sessions ahead!