While many users of Blur will find the search system sufficient for their needs out of the box, the Blur platform exposes a simple set of lower-level primitives that allow the user to easily and quickly introduce new system behavior.

With this release, we expose the initial read-only constructs for the platform. Future releases will introduce richer read-write constructs.

NOTE: In 0.2.4, the platform capability described here exists, but existing functionality of Blur has not yet been ported to use it.

In modern open source search platforms, we find Lucene at the very core and a monolithic application stack implemented on top of it handling the distributed indexing, searching, failures, features, etc. Indeed, this was true of Blur as well.

We wanted more flexibility. We wanted to be able to rapidly introduce brand new features into the system. So we introduced an intermediate abstraction that provides the primitives for a distributed Lucene server, on which specific search applications can be built.

Some specific goals we had in mind:

  • Allow indexing/searching based on other/new data models (e.g. more than just the Row/Record constructs).
  • Allow implementations to build whole new APIs given direct access to the Lucene primitives.
  • Allow flexibility to build totally custom applications.
  • Remove the complexities of threading, networking and concurrency from new feature creation.

The Blur platform provides a set of Command classes that can be implemented to achieve new functionality. A basic understanding of how Blur works will greatly help in understanding how to implement commands. So let's take a moment to review.

In Blur, we refer to a logical Lucene index as a table. Tables are typically very large, so we divide them into 'shards'. Each shard is hosted by a Shard Server, which acts as a container of shards. The Shard Servers are organized into a cluster and work together to make all the shards of every table available. We then put another type of server, called a Controller, in front of the cluster to present all the shards as a single logical table.

For the controller to present all the shards as a single index, it needs to accept a request, then scatter the request to all the shard servers, combine the results in some meaningful way, and send them back to the client.
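The scatter/gather flow described above can be sketched in plain Java. This is only an illustration of the pattern, not Blur's actual internals; the class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGatherSketch {

  // Scatter: run the per-shard function concurrently across all shards;
  // gather: combine the partial results into one answer.
  public static long countAcrossShards(long[] perShardCounts) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, perShardCounts.length));
    try {
      List<Future<Long>> futures = new ArrayList<>();
      for (long count : perShardCounts) {
        final long c = count;
        // Each task stands in for an "execute" call against one shard's index.
        futures.add(pool.submit((Callable<Long>) () -> c));
      }
      long total = 0;
      for (Future<Long> f : futures) {
        total += f.get(); // Gather step: sum the partials.
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Three hypothetical shards reporting 3, 5, and 2 matching documents.
    System.out.println(countAcrossShards(new long[] {3, 5, 2})); // prints 10
  }
}
```

In Blur itself, the controller performs the scatter over the network to the shard servers, and the gather step is exactly the combine method we implement below.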

@TODO

As we've gathered from above, the heart of a distributed search system is the ability to execute some function across a set of indices and combine the results in a logical way to be returned to the user. Not surprisingly, this is also at the heart of the Blur Platform. As an introduction, we'll explore how to find the number of documents that contain a particular term across all shards in a table.

Our first step will be to find the answer for a single shard/index. Lucene's IndexReader, to which we'll have access in our command, conveniently gives us that. Getting the answer for a single index requires implementing an execute method.

@Override
public Long execute(IndexContext context) throws IOException {
  IndexReader indexReader = context.getIndexReader();
  return (long) indexReader.docFreq(new Term(fieldName, term));
}

We'll learn where the field name and term are defined later, in the Arguments section. Inside the execute method, we're focused on finding the answer for a single shard/index. We're given an IndexContext that provides access to the underlying Lucene index, so for our trivial command we can return the answer directly from the IndexReader.

Now we need to let Blur know how to combine the results from the individual shards into a single logical response. We do this by implementing the combine method.

@Override
public Long combine(CombiningContext context, Map<? extends Location<?>, Long> results) throws IOException {
  long total = 0;
  for (Long l : results.values()) {
    total += l;
  }
  return total;
}

Again, we're given an execution context (which we don't need for our sample command) and a Map<? extends Location<?>, Long> of result values.

Recall from above that in the execute method we were able to use some member variables that were treated like arguments to the command. Now, let's take a closer look at how they were provided.

We've made it simple both for you to declare arguments and for your users to provide them. We provide two annotations that you place directly on your member field declarations, indicating whether they are required or optional. You can (and are encouraged to) provide some helpful documentation on the intent of each argument. For example, by extending TableReadCommand you get the required table argument for free. Let's look at how it's declared:

@RequiredArgument("The name of the table.")
private String table;

Naturally, we can also declare optional arguments as well:

@OptionalArgument("The number of results to be returned. default=10")
private short size = 10;

By annotating your parameters, the Blur Platform can do the basic requirement checking for you, allowing you to keep your execute/combine methods free of argument validation.
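To illustrate the kind of checking the platform performs on your behalf, here is a minimal, self-contained sketch of annotation-driven required-argument validation using reflection. The annotation and validator here are hypothetical stand-ins, not Blur's actual classes:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class ArgumentValidationSketch {

  // Hypothetical stand-in for Blur's @RequiredArgument annotation.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  public @interface Required {
    String value() default "";
  }

  // A toy command with one required argument.
  public static class MyCommand {
    @Required("The name of the table.")
    String table;
  }

  // Scan the command's fields; any @Required field left null fails validation.
  public static boolean isValid(Object command) throws IllegalAccessException {
    for (Field field : command.getClass().getDeclaredFields()) {
      if (field.isAnnotationPresent(Required.class)) {
        field.setAccessible(true);
        if (field.get(command) == null) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    MyCommand cmd = new MyCommand();
    System.out.println(isValid(cmd)); // false: table not set
    cmd.table = "docs";
    System.out.println(isValid(cmd)); // true
  }
}
```

Because the check runs before your command executes, a missing required argument is reported to the caller up front rather than surfacing as a NullPointerException deep inside execute or combine.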

@TODO

Commands should be self-documenting, starting with a good name. But a good name alone is not sufficient, so Blur offers a @Description annotation as a way to better express what your command does. It's used like so:

@Description("Returns the number of documents containing the term in the given field.")
public class DocFreqCommand extends TableReadCommand {
  ...
}

See Using Blur -> Shell -> Platform Commands.