Solr Beginner's Guide

Author: Alfredo Serafini, ISBN: 9781782162520

I wrote the Apache Solr Beginner's Guide to show readers how to design and configure a search experience. Starting from real data and writing simple, step-by-step, real-world configurations, the idea was to see where and how this technology can be useful. The book covers Solr 4.

NOTE (Dec 2017): I'm working on a second edition of the book, updated to the latest version of Solr, but there is no definite publication date yet.
Meanwhile, for reasons I can't understand (people are strange sometimes, you know), someone published a fake 2nd edition on Amazon and other marketplaces: I have no involvement at all with that book (the one with the pale blue cover).

This is not a recipe book, however: I wrote it with people in mind who want to explore things, possibly in a team. It is not designed for simple copying and pasting of code; it is designed to suggest directions to readers rather than to provide specific solutions, even if there are of course simple examples. There are many excellent books on Solr at this time, as well as a must-read reference guide. I feel beginners often get lost in the manuals, so I liked the idea of introducing, a little at a time, even some slightly advanced concepts that are widely used in real-world contexts. I hope this will be a useful path for starting to study features and concepts that can be understood more deeply later. I also hope you will not be disappointed to find examples citing things that seem very far from the usual Lucene/Solr context. Even if Solr is the main topic, I always find it important to discover how a tool can be used in many different ways: I hope you will be curious enough (and not lazy!) to accept and have fun with this game ;-)

What is Solr?

Solr is an open source enterprise search platform from the Apache Lucene project, well-known for providing features like:

full-text search, faceted search
spell-checking and hit highlighting
dynamic clustering
indexing of external data sources: databases, rich documents (PDF, Word, CSV, HTML)

Why Solr?

Solr is written in Java and runs as a standalone full-text search server (using an embedded Jetty container), based on the Lucene Java search library at its core for full-text indexing and search. It also exposes REST-like APIs that make it usable with most popular programming languages, using JSON or XML.
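As a rough illustration of the REST-like, JSON-based access described above, here is a minimal Python sketch. It only builds a query URL for Solr's select handler and reads a JSON response; it does not contact a server. The core name "paintings", the field "title" and the sample response are assumptions made up for this example (they are not taken from the book), and the host and port are simply the usual local defaults.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Build a URL for the select handler, asking for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


def extract_docs(raw_json):
    """Return the list of matching documents from a JSON response body."""
    body = json.loads(raw_json)
    return body["response"]["docs"]


# Hand-written sample shaped like a Solr JSON response (not real server output).
SAMPLE_RESPONSE = """
{"responseHeader": {"status": 0, "QTime": 1},
 "response": {"numFound": 1, "start": 0,
              "docs": [{"id": "1", "title": "Mona Lisa"}]}}
"""

# "paintings" and "title" are hypothetical names chosen for illustration.
url = build_select_url("http://localhost:8983/solr", "paintings", "title:lisa")
print(url)
for doc in extract_docs(SAMPLE_RESPONSE):
    print(doc["id"], doc["title"])
```

With a running Solr instance, the printed URL could be opened in a browser or fetched with any HTTP client; the same request with wt=xml would return XML instead.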

Solr is highly scalable thanks to distributed search, index replication and other features introduced with SolrCloud.

Solr can be easily customized by writing new plugins (in Java, Scala or other JVM languages) to extend its functionality, but no Java code is required to configure it.

How is Solr used in the book?

The reader will learn the basics of Solr, focusing on real-world examples and practical configurations.

Using data from DBpedia (and some from Wikipedia itself, the Web Gallery of Art and OpenStreetMap), several different configurations will be used to explore some of the most interesting Solr features, such as faceted search and navigation, auto-suggestion, and rich document indexing.
The idea is to see how to configure different analyzers for handling different data types, without programming. Programming skills are needed only in the last chapter, which is designed to briefly introduce you to customizations, and in the Appendix, where some sections give you an idea of how to write a client in Java, Scala and other languages.
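To give a flavor of what a faceted query looks like from the client side, here is a small Python sketch. It builds the request parameters for a facet on one field and turns the flat list of alternating values and counts, which is the shape Solr uses by default for facet fields in JSON, into a dictionary. The field name "artist" and the sample counts are assumptions invented for this example, not data from the book.

```python
from urllib.parse import urlencode


def build_facet_params(query, facet_field, limit=5):
    """Encode parameters for a query that returns only facet counts."""
    return urlencode({
        "q": query,
        "rows": 0,            # no documents, just the facet counts
        "wt": "json",
        "facet": "true",
        "facet.field": facet_field,
        "facet.limit": limit,
    })


def pair_facet_counts(flat_counts):
    """Convert ["a", 3, "b", 2] into {"a": 3, "b": 2}."""
    return dict(zip(flat_counts[0::2], flat_counts[1::2]))


# "artist" is a hypothetical field name; the counts are made-up sample data.
params = build_facet_params("*:*", "artist")
sample_facet_field = ["Leonardo", 3, "Raphael", 2]
print(params)
print(pair_facet_counts(sample_facet_field))
```

In a real response the flat list would be found under facet_counts, then facet_fields, then the field name.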

While teaching several technical courses, I realized that beginners need more than code to copy and paste: some need to start exploring the topic by reproducing simple examples in order to understand how to use these technologies. Others need to find out how to adapt the technology to their needs, and no recipe is good for everyone. My aim was therefore to provide a narrative path that I hope can open new windows of interest on advanced topics, while exploring common use cases with simple examples.

Unfortunately, there may still be typos as well as odd phrasing caused by editing errors. If you find an error, please submit an erratum. Thanks in advance.