Managing Gigabytes: Compressing and Indexing Documents and Images, Second Edition (The Morgan Kaufmann Series in Multimedia Information and Systems)
I**K
Two decades later, it's still the best introduction to Information Retrieval
While this book was published almost two decades ago, it is still the best introductory text on information retrieval. What distinguishes this book from others is that it doesn't assume any previous knowledge of the topic, technical or otherwise, and builds all the ideas and concepts it presents from the ground up. While some of the algorithms presented have since become obsolete, many are still relevant and in use today, and all provide the reader with a solid foundation. It also explains clearly and concisely the problems and challenges that all IR systems and search engines have to solve, even today.
P**G
The Wonderful Thing Is: It's the Only One
This is the only book there is that will actually teach you how to build an information retrieval system (aka search engine). It discusses all the algorithms and tradeoffs, and comes with free downloadable source code to experiment with. Some of the material is standard, but covered in more implementation detail here than anywhere else. Some of the material is novel: you won't find better coverage of compression unless you hand-assemble twenty research papers and reverse-engineer them to figure out how they're implemented. But with "Managing Gigabytes", it's all here. (Although, after a particularly invigorating discussion of how to string together a bunch of techniques to compress their corpus and save a couple hundred MB, I did a check and found you could buy 512MB of RAM for less than the cost of the book. Knowledge is Power, but sometimes a little cash is more powerful.) The only negative is that this book is not called "Managing Terabytes", as the first edition promised/threatened it might be. RAM and disk are cheap, but not that cheap, and for now terabytes (and sometimes petabytes) are managed only by NASA, Google, and a few others. I can't wait to see the third edition!
Z**Z
One of the best books on search engineering
It has been 8 years since it was published, and I can see it is still one of the best in the IR field. Without many long, magic equations, it is not hard for a common user to pick up. There are mainly 2 parts in the book. The first part is compression; most of it is just an introduction to the principles, since it does not make sense for the reader to invent or implement an algorithm. The second part is indexing (plus some querying), which I highly recommend because it is "practical". The authors are smart guys who can build things: google mg for their website and mg4j for the ported Java implementation.
G**S
A Comprehensive Introduction To Text Retrieval Systems
A wonderful feature of this book is its practicality across various topics, including compression algorithms and theory, document and imaging systems, and information retrieval. Of particular interest to me, the authors not only cover a vast range of theory but present it with simple, common-sense logic. There are several examples that break down complex processes into simple, easy-to-understand logic, and the pages provide a smooth flow through the structured topics. Well organised, well presented and fully informative. Truly an ideal book. This serves as a superior text for students studying document and imaging systems, processing, and information and multimedia retrieval subjects. Beautiful!!! Just on a personal note, it would be great to see some emphasis on web mining applications in future editions.
A**G
Very clear, but misses some key real-world issues
As others have said, MG is a good introductory text for Information Retrieval. However, I think it spends a little too much time on compression techniques and lacks a good discussion of incremental or on-line indexing. The book tends to assume that the set of texts to be searched is static - if new documents can be added or old ones deleted, the whole problem becomes much harder and many of MG's techniques are no longer relevant. That said, I strongly look forward to Managing Terabytes (if it ever appears).
N**U
Well written, with plenty of nuts and bolts
I found MG exceedingly readable, and particularly useful. The ideas are very well explained, and the problems are solved in a stepwise fashion, leading from a simple, inefficient solution to a problem to a more complex, efficient one. Where appropriate, pseudocode is included to communicate the algorithms unambiguously. I use the free MG software in my research on information retrieval, and this book is an indispensable supplement to the software. The ideas on compression and efficiency described in the book and implemented in the software are the best that I know of in the public domain, and I've looked!
A**T
Good introduction to searching/indexing in data.
MG gave a good introduction to the components of practical Information Retrieval (IR). You can clearly see that the authors have a genuine interest in the field! But I would like some more theoretical analysis of the algorithms used (i.e. O-notation), and more focus on parallel implementations of IR systems. Another book related to the same area worth mentioning is "Modern Information Retrieval".
R**S
old but the more interesting for it.
I am a student again & this was/is one of the required texts. Written after I was last at college, though only just. Well written.
電**ん
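Bought for research on text compression. The content is good, but the English is difficult.
It covers text compression and the compression of images and the like. I am only interested in text compression, so I have read only that part. There is a great deal of content (if anything, it tends to be too much). I wish it were organized a little better, and, perhaps because the authors are from Australia, the English reads like something a British English speaker would write. Frankly, it is harder to read than the easy-to-follow books written by Americans (who are clearly eager to sell worldwide). Also, difficult passages such as proofs are marked with a line in the right margin, which I thought was a nice touch. I would like more worked examples. Still, there are not many detailed books of this kind, so I suppose it is an essential book for research. As for image compression, there are plenty of other books available, so if that is what you want, look elsewhere; for that reason I took off one star and gave it 4.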