Chapter 5. Lucene indexing support

5.1. Configure indices

The support provides facilities for configuring the different kinds of directories used to access indices. Both RAM and filesystem directories are supported, through the RAMDirectoryFactoryBean and FSDirectoryFactoryBean classes.

These two classes manage the directory lifecycle correctly: the directories are closed when the Spring application context shuts down. However, they do not create a new index when none exists. To do that, use an implementation of the IndexFactory interface, such as the SimpleIndexFactory class, and set its create property. The index structure is then created on first access.

The following code shows how to configure a RAM directory:

<bean id="ramDirectory" class="org.springframework.lucene.index.support.RAMDirectoryFactoryBean"/>

<bean id="indexFactory" class="org.springframework.lucene.index.factory.SimpleIndexFactory">
    <property name="directory" ref="ramDirectory"/>
    <property name="create" value="true"/>
</bean>

The following code shows how to configure a filesystem directory:

<bean id="fsDirectory" class="org.springframework.lucene.index.support.FSDirectoryFactoryBean"/>

<bean id="indexFactory" class="org.springframework.lucene.index.factory.SimpleIndexFactory">
    <property name="directory" ref="fsDirectory"/>
    <property name="create" value="true"/>
</bean>

5.2. Root entities

These entities introduce a level of indirection: the resource management strategy can be changed in the configuration without any change in the code. TODO: IndexFactory and abstraction layer. TODO: IndexReaderWrapper/IndexWriterWrapper and abstraction layer.

Map of the entities of the Spring Lucene support dedicated to the indexing

The following code shows the IndexFactory interface, the entity that gives access to the resources used to interact with the index:

public interface IndexFactory {
    IndexReaderWrapper getIndexReader();
    IndexWriterWrapper getIndexWriter();
}

The following table shows the different implementations of the IndexFactory interface provided by the support:

Table 5.1.

Implementation	Description
SimpleIndexFactory	The simplest and default implementation of the interface, based on the `SimpleIndexReaderWrapper` and `SimpleIndexWriterWrapper` implementations. Each time a resource is requested, a new one is created from the injected directory. This implementation also supports creating the structure of a new index and resolving locks, through the `create` and `resolveLock` properties respectively.
LockIndexFactory	The concurrent implementation, based on a lock strategy. This class is a delegating implementation that encapsulates a target index factory. It acquires a lock when a resource is obtained and releases it after the resource has been used.
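The delegating lock strategy described for LockIndexFactory can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual class: Resource, ResourceFactory and LockingResourceFactory are hypothetical stand-ins, and the use of a single-permit Semaphore is an assumption made for the illustration.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in types: these are NOT the Spring Lucene classes.
final class LockingFactorySketch {

    /** Stand-in for a reader or writer wrapper handed out by a factory. */
    interface Resource {
        void close();
    }

    /** Stand-in for the factory contract. */
    interface ResourceFactory {
        Resource getResource();
    }

    /** Delegating factory: takes the lock on acquisition, frees it on close(). */
    static final class LockingResourceFactory implements ResourceFactory {
        private final ResourceFactory target;
        private final Semaphore lock = new Semaphore(1);

        LockingResourceFactory(ResourceFactory target) {
            this.target = target;
        }

        @Override
        public Resource getResource() {
            lock.acquireUninterruptibly(); // waits while another resource is in use
            final Resource delegate;
            try {
                delegate = target.getResource();
            } catch (RuntimeException e) {
                lock.release(); // acquisition failed: do not keep the lock
                throw e;
            }
            return new Resource() {
                private boolean closed;

                @Override
                public synchronized void close() {
                    if (closed) {
                        return; // a second close() must not release the lock twice
                    }
                    closed = true;
                    try {
                        delegate.close();
                    } finally {
                        lock.release();
                    }
                }
            };
        }

        boolean isLocked() {
            return lock.availablePermits() == 0;
        }
    }

    public static void main(String[] args) {
        // The target factory here hands out resources whose close() does nothing.
        LockingResourceFactory factory = new LockingResourceFactory(() -> () -> { });
        Resource resource = factory.getResource();
        System.out.println("locked while in use: " + factory.isLocked());
        resource.close();
        System.out.println("locked after close: " + factory.isLocked());
    }
}
```

With this design, callers that forget to close a resource block every later acquisition, which is one reason to let a template manage the resources rather than calling the factory directly.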

The IndexFactory interface creates wrappers for the Lucene reader and writer, of types IndexReaderWrapper and IndexWriterWrapper respectively. These wrappers provide the same methods as the target classes but are interface-based, which allows interception with proxies.

The following code describes the content of the IndexReaderWrapper interface:

public interface IndexReaderWrapper {
	void close() throws IOException;
	Directory directory();
	int docFreq(Term t) throws IOException;
	Document document(int n) throws IOException;
	Collection getFieldNames(IndexReader.FieldOption fldOption);
	TermFreqVector getTermFreqVector(int docNumber, String field) throws IOException;
	TermFreqVector[] getTermFreqVectors(int docNumber) throws IOException;
	long getVersion();
	boolean hasNorms(String field) throws IOException;
	boolean isCurrent() throws IOException;
	int maxDoc();
	byte[] norms(String field) throws IOException;
	void norms(String field, byte[] bytes, int offset) throws IOException;
	int numDocs();
	void setNorm(int doc, String field, byte value) throws IOException;
	void setNorm(int doc, String field, float value) throws IOException;
	TermDocs termDocs() throws IOException;
	TermDocs termDocs(Term term) throws IOException;
	TermPositions termPositions() throws IOException;
	TermPositions termPositions(Term term) throws IOException;
	TermEnum terms() throws IOException;
	TermEnum terms(Term t) throws IOException;
	SearcherWrapper createSearcher();
}

The following code describes the content of the IndexWriterWrapper interface:

public interface IndexWriterWrapper {
	void addDocument(Document doc) throws IOException;
	void addDocument(Document doc, Analyzer analyzer) throws IOException;
	void addIndexes(Directory[] dirs) throws IOException;
	void addIndexes(IndexReader[] readers) throws IOException;
	void close() throws IOException;
	void commit() throws IOException;
	int docCount();
	void deleteDocuments(Term term) throws IOException;
	Analyzer getAnalyzer();
	Directory getDirectory();
	PrintStream getInfoStream();
	int getMaxBufferedDocs();
	int getMaxFieldLength();
	int getMaxMergeDocs();
	int getMergeFactor();
	Similarity getSimilarity();
	int getTermIndexInterval();
	boolean getUseCompoundFile();
	long getWriteLockTimeout();
	void optimize() throws IOException;
	void rollback() throws IOException;
	void setInfoStream(PrintStream infoStream);
	void setMaxBufferedDocs(int maxBufferedDocs);
	void setMaxFieldLength(int maxFieldLength);
	void setMaxMergeDocs(int maxMergeDocs);
	void setMergeFactor(int mergeFactor);
	void setSimilarity(Similarity similarity);
	void setTermIndexInterval(int interval);
	void setUseCompoundFile(boolean value);
	void undeleteAll() throws IOException;
	void updateDocument(Term term, Document doc) throws IOException;
	void updateDocument(Term term, Document doc, Analyzer analyzer) throws IOException; 
}

The IndexFactory implementation in use automatically selects the implementations of these interfaces.

5.3. Index root entities

The Lucene indexing support adds other abstractions in order to TODO:

Table 5.2.

Entity	Interface	Description
Document creator	`DocumentCreator`	Creates a document for use in an indexing operation (add or update).
Document creator using an InputStream	`InputStreamDocumentCreator`	Creates a document from an InputStream (backed by a file or another source) for use in an indexing operation (add or update).
Documents creator	`DocumentsCreator`	Creates several documents for use in an indexing operation (add or update).
Reader callback	`ReaderCallback`	Callback interface that gives access to the underlying resource, the Lucene reader wrapper, while it remains managed by the Lucene support.
Writer callback	`WriterCallback`	Callback interface that gives access to the underlying resource, the Lucene writer wrapper, while it remains managed by the Lucene support.

The central entity of the support for executing indexing operations is the Lucene indexing template. It offers several ways to implement indexing, depending on your needs and your knowledge of the underlying Lucene API. Three levels can be distinguished:

Using the abstraction level provided by the template;
Using the template with the Lucene entities like Document and Term;
Advanced indexing using the underlying writer instance.

The template allows you to handle several kinds of operations, as described in the following list:

Adding documents to the index;
Updating documents in the index;
Deleting documents from the index;
Extra operations such as optimizing the index.

The following sections describe these features using each of these approaches.

5.3.1. Simple indexing based on Lucene entities

The support lets you use Lucene entities such as the Document and Term classes directly to manipulate the index. In this case, you are responsible for creating the documents to add or update, as well as the term. These are then passed to the addDocument(s) and updateDocument(s) methods, and the template is responsible for handling the underlying Lucene resources needed to carry out the operation.

For additions, the addDocument methods add a single document, and the addDocuments methods add several documents at once.

The following example shows how to create a Lucene document and add it to the index using the addDocument method:

Document document = new Document();
document.add(new Field("field", "a sample 1", Field.Store.YES, Field.Index.ANALYZED));
document.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
getLuceneIndexTemplate().addDocument(document);

The following example shows how to create a list of Lucene documents and add it to the index using the addDocuments method:

List<Document> documents = new ArrayList<Document>();

Document document1 = new Document();
document1.add(new Field("field", "a sample 1", Field.Store.YES, Field.Index.ANALYZED));
document1.add(new Field("sort", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
documents.add(document1);

Document document2 = new Document();
document2.add(new Field("field", "a sample 2", Field.Store.YES, Field.Index.ANALYZED));
document2.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
documents.add(document2);

getLuceneIndexTemplate().addDocuments(documents);

When updating documents, the template performs a checked update: it verifies that the term matches exactly one document for the updateDocument method, and at least one document for the updateDocuments method.

Internally, the update methods of the template use the update methods of the index writer. The latter delete the documents matching the specified term and then add the new document, so the template methods follow the same mechanism.
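The check-then-delete-then-add behaviour described above can be pictured with a toy in-memory model. This is only a sketch under stated assumptions: ToyIndex is hypothetical, documents are plain maps, a "term" is reduced to an exact field/value match, and the exception type is chosen for the illustration. None of this is the template's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not the Spring Lucene template and not a Lucene index.
final class UpdateSemanticsSketch {

    /** Each document is a map of field name to value. */
    static final class ToyIndex {
        private final List<Map<String, String>> documents = new ArrayList<>();

        void addDocument(Map<String, String> document) {
            documents.add(document);
        }

        /** Counts the documents whose field holds exactly the given value. */
        int count(String field, String value) {
            int matches = 0;
            for (Map<String, String> document : documents) {
                if (value.equals(document.get(field))) {
                    matches++;
                }
            }
            return matches;
        }

        /** Checked update: exactly one match is required, then delete and add. */
        void updateDocument(String field, String value, Map<String, String> replacement) {
            int matches = count(field, value);
            if (matches != 1) {
                throw new IllegalStateException(
                        "Expected exactly one matching document but found " + matches);
            }
            documents.removeIf(document -> value.equals(document.get(field))); // delete
            documents.add(replacement);                                         // then add
        }

        int size() {
            return documents.size();
        }
    }

    static Map<String, String> document(String id, String text) {
        Map<String, String> document = new HashMap<>();
        document.put("id", id);
        document.put("text", text);
        return document;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(document("1", "a sample"));
        index.updateDocument("id", "1", document("1", "a Lucene sample"));
        System.out.println("documents in index: " + index.size());
        try {
            index.updateDocument("id", "42", document("42", "missing"));
        } catch (IllegalStateException e) {
            System.out.println("update rejected: " + e.getMessage());
        }
    }
}
```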

The following code shows the use of the updateDocument method:

Document document = new Document();
document.add(new Field("field", "a Lucene sample", Field.Store.YES, Field.Index.ANALYZED));
document.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
getLuceneIndexTemplate().updateDocument("field:lucene", document);

The following code shows the use of the updateDocuments method in order to update several documents with one operation of the template:

List<Document> documents = new ArrayList<Document>();

Document document1 = new Document();
document1.add(new Field("field", "a sample", Field.Store.YES, Field.Index.ANALYZED));
document1.add(new Field("sort", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
documents.add(document1);

Document document2 = new Document();
document2.add(new Field("field", "a Lucene sample", Field.Store.YES, Field.Index.ANALYZED));
document2.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
documents.add(document2);

getLuceneIndexTemplate().updateDocuments("field:sample", documents);

The delete methods apply the same document checks as the update methods. The template provides both deleteDocument and deleteDocuments, to delete one or several documents.

The following code shows the use of deleteDocument and deleteDocuments methods:

getLuceneIndexTemplate().deleteDocument("field:lucene");
getLuceneIndexTemplate().deleteDocuments("field:sample");

5.3.2. Indexing using the abstraction methods of the template

The template also provides an abstraction layer for creating Lucene documents. The main advantage of this approach is that resources are handled correctly even when an exception is thrown during document creation. The template supports both simple document creators and InputStream-based document creators.

This mechanism can be used with the addDocument(s) and updateDocument(s) methods, as described below.

The following code describes the content of the simplest document creator, which provides a simple way to create a Lucene document:

public interface DocumentCreator {
    Document createDocument() throws Exception;
}

The following code shows the way to use the DocumentCreator interface with the addDocument method of the template:

getLuceneIndexTemplate().addDocument(new DocumentCreator() {
    public Document createDocument() throws Exception {
        Document document = new Document();
        document.add(new Field("field", "a Lucene sample", Field.Store.YES, Field.Index.ANALYZED));
        document.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
        return document;
    }
});

The InputStreamDocumentCreator interface is dedicated to creating a Lucene document from an InputStream. When using this interface, you specify how to obtain the InputStream and how to create a document from it.

The template is responsible for handling the InputStream correctly and closing it in every case. The following code describes the content of the InputStreamDocumentCreator interface:

public interface InputStreamDocumentCreator {
    InputStream createInputStream() throws IOException;
    Document createDocumentFromInputStream(InputStream inputStream) throws Exception;
}

The following code shows the way to use the InputStreamDocumentCreator interface with the addDocument method of the template:

template.addDocument(new InputStreamDocumentCreator() {
    public InputStream createInputStream() throws IOException {
        ClassPathResource resource = new ClassPathResource(
                     "/org/springframework/lucene/index/core/test.txt");
        return resource.getInputStream();
    }

    public Document createDocumentFromInputStream(InputStream inputStream) throws Exception {
        String contents = IOUtils.getContents(inputStream);

        Document document = new Document();
        document.add(new Field("field", contents, Field.Store.YES, Field.Index.ANALYZED));
        document.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
        return document;
    }
});
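The guarantee that the stream is closed in every case can be sketched as follows. This is an illustration, not the template's source: StreamDocumentCreator mirrors the shape of InputStreamDocumentCreator but returns a String instead of a Lucene Document so that the example stays self-contained, and TrackingStream and the helper methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins: a String plays the role of the Lucene Document.
final class StreamHandlingSketch {

    interface StreamDocumentCreator {
        InputStream createInputStream() throws IOException;
        String createDocumentFromInputStream(InputStream inputStream) throws Exception;
    }

    /** Opens the stream, builds the document, and always closes the stream. */
    static String createDocument(StreamDocumentCreator creator) throws Exception {
        try (InputStream inputStream = creator.createInputStream()) {
            return creator.createDocumentFromInputStream(inputStream);
        }
    }

    /** In-memory stream that records whether close() was called. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(String content) {
            super(content.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Runs a creator that fails and reports whether the stream was still closed. */
    static boolean closedAfterFailure() {
        final TrackingStream stream = new TrackingStream("a sample");
        try {
            createDocument(new StreamDocumentCreator() {
                @Override
                public InputStream createInputStream() {
                    return stream;
                }

                @Override
                public String createDocumentFromInputStream(InputStream inputStream) {
                    throw new IllegalStateException("document creation failed");
                }
            });
        } catch (Exception expected) {
            // The failure is intentional; only the stream state matters here.
        }
        return stream.closed;
    }

    /** Runs a creator that succeeds and returns the first line of the stream. */
    static String firstLine(final String content) throws Exception {
        return createDocument(new StreamDocumentCreator() {
            @Override
            public InputStream createInputStream() {
                return new TrackingStream(content);
            }

            @Override
            public String createDocumentFromInputStream(InputStream inputStream) throws IOException {
                return new BufferedReader(
                        new InputStreamReader(inputStream, StandardCharsets.UTF_8)).readLine();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        System.out.println("document content: " + firstLine("a sample"));
        System.out.println("closed after failure: " + closedAfterFailure());
    }
}
```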

Finally, the support provides the DocumentsCreator interface for creating a list of Lucene documents to add. It is similar to the DocumentCreator interface. The following code describes this interface:

public interface DocumentsCreator {
    List<Document> createDocuments() throws Exception;
}

The following code shows the way to use the DocumentsCreator interface with the addDocuments method of the template:

template.addDocuments(new DocumentsCreator() {
    public List<Document> createDocuments() throws Exception {
        List<Document> documents = new ArrayList<Document>();

        Document document1 = new Document();
        document1.add(new Field("field", "a Lucene sample", Field.Store.YES, Field.Index.ANALYZED));
        document1.add(new Field("sort", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        documents.add(document1);

        Document document2 = new Document();
        document2.add(new Field("field", "a sample", Field.Store.YES, Field.Index.ANALYZED));
        document2.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
        documents.add(document2);

        return documents;
    }
});

5.3.3. Analyzer specification

There are two ways to specify an analyzer during indexing. First, you can set a global analyzer on the template, which serves as the default: methods without an analyzer parameter use it. The following code shows this approach:

Document document = new Document();
document.add(new Field("field", "a sample 1", Field.Store.YES, Field.Index.ANALYZED));
document.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
getLuceneIndexTemplate().addDocument(document);

In the code above, the default analyzer injected into the template is used to add the document. If no analyzer is defined, an exception is thrown.

Second, the support provides methods with an analyzer parameter. In this case, the specified analyzer is used instead of the default one. The following code shows this approach with the same operation:

SimpleAnalyzer analyzer = new SimpleAnalyzer();

Document document = new Document();
document.add(new Field("field", "a sample 1", Field.Store.YES, Field.Index.ANALYZED));
document.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
getLuceneIndexTemplate().addDocument(document, analyzer);
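The selection rule described in this section (explicit analyzer first, then the template default, otherwise an exception) can be summarised in a few lines. This is a hedged sketch: resolveAnalyzer is a hypothetical helper, the generic type A stands in for Lucene's Analyzer, and IllegalStateException is an assumption, since the text does not say which exception the template throws.

```java
// Hypothetical helper: illustrates the selection rule, not the template's code.
final class AnalyzerResolutionSketch {

    /**
     * Returns the explicit analyzer when one is given, otherwise the default;
     * fails when neither is available.
     */
    static <A> A resolveAnalyzer(A explicitAnalyzer, A defaultAnalyzer) {
        if (explicitAnalyzer != null) {
            return explicitAnalyzer;
        }
        if (defaultAnalyzer != null) {
            return defaultAnalyzer;
        }
        // Assumption: the actual exception type is not stated in the text.
        throw new IllegalStateException(
                "No analyzer specified and no default analyzer configured");
    }

    public static void main(String[] args) {
        // Strings stand in for analyzer instances.
        System.out.println(resolveAnalyzer("simple", "standard"));
        System.out.println(resolveAnalyzer(null, "standard"));
    }
}
```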

5.3.4. Other indexing operations

TODO: describe the use of optimize, getXXX methods

5.3.5. Use of the underlying resource

The aim of the indexing template is to integrate and hide the Lucene API in order to make indexing easier. It provides the developer with all the common operations in this context. However, if you need to go beyond these methods, the template can give you access to the underlying reader and writer entities so that you can use them explicitly.

This feature is based on the ReaderCallback and WriterCallback interfaces described above. You implement the doWithReader or doWithWriter method respectively, which receives the underlying reader or writer instance. You can then use that instance in your indexing code.

The Lucene support nevertheless continues to handle and manage these resources. The code below describes the content of the ReaderCallback interface:

public interface ReaderCallback {
    Object doWithReader(IndexReaderWrapper reader) throws Exception;
}

The code below describes the content of the interface WriterCallback:

public interface WriterCallback {
    Object doWithWriter(IndexWriterWrapper writer) throws Exception;
}

These callbacks can be passed to the read and write methods of the template, as shown in the following code:

getLuceneIndexTemplate().write(new WriterCallback() {
    public Object doWithWriter(IndexWriterWrapper writer) throws Exception {
        Document document = new Document();
        document.add(new Field("field", "a sample", Field.Store.YES, Field.Index.ANALYZED));
		
        writer.addDocument(document);

        return null;
    }
});

When using the underlying writer, the analyzer specified for the index template is not used. You need to specify it explicitly in your calls or configure it on the index factory in use. In the latter case, the analyzer is set when the writer is created.
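The way a template can keep managing the writer while a callback uses it can be sketched as follows. This is a minimal illustration with hypothetical stand-in types (Writer, WriterFactory, Template, RecordingWriter); it mirrors the shape of WriterCallback but is not the source of the actual template, and wrapping checked exceptions in IllegalStateException is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: Strings play the role of Lucene documents.
final class CallbackTemplateSketch {

    interface Writer {
        void addDocument(String document);
        void close();
    }

    interface WriterFactory {
        Writer getWriter();
    }

    interface WriterCallback {
        Object doWithWriter(Writer writer) throws Exception;
    }

    static final class Template {
        private final WriterFactory factory;

        Template(WriterFactory factory) {
            this.factory = factory;
        }

        /** Acquires a writer, runs the callback, and always closes the writer. */
        Object write(WriterCallback callback) {
            Writer writer = factory.getWriter();
            try {
                return callback.doWithWriter(writer);
            } catch (RuntimeException e) {
                throw e;
            } catch (Exception e) {
                // Assumption: checked exceptions are wrapped in an unchecked one.
                throw new IllegalStateException("Writer callback failed", e);
            } finally {
                writer.close();
            }
        }
    }

    /** Writer that records what it receives, for demonstration purposes. */
    static final class RecordingWriter implements Writer {
        final List<String> documents = new ArrayList<>();
        boolean closed;

        @Override
        public void addDocument(String document) {
            documents.add(document);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();
        Template template = new Template(() -> writer);
        template.write(w -> {
            w.addDocument("a sample");
            return null;
        });
        System.out.println("documents written: " + writer.documents.size());
        System.out.println("writer closed: " + writer.closed);
    }
}
```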

5.4. DAO support class

As with the other DAO supports of Spring, the Lucene support provides a dedicated entity that simplifies the injection of resources into the classes implementing Lucene indexing. This entity, the LuceneIndexDaoSupport class, accepts an IndexFactory and an analyzer by injection, so you no longer need to write the corresponding injection methods.

The class also provides the getLuceneIndexTemplate method, which gives access to the index template of the support.

The following code shows the use of LuceneIndexDaoSupport in a class implementing indexing:

public class SampleSearchService extends LuceneIndexDaoSupport {

    public void indexDocuments() {
        Document document = new Document();
        document.add(new Field("field", "a sample 1", Field.Store.YES, Field.Index.ANALYZED));
        document.add(new Field("sort", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
        getLuceneIndexTemplate().addDocument(document);
    }
}

Note that the class also makes it possible to inject a configured LuceneIndexTemplate directly in Spring.

5.5. Resource management and transactions

Lucene allows only one index writer to be open on an index at any given time, so you need to be careful when indexing documents.

In addition, Lucene now supports transactions when updating the index. The support provides a dedicated implementation of the Spring PlatformTransactionManager interface so that the transactional support of the framework can be used.

This implementation, the LuceneIndexTransactionManager class, must be configured using an instance of IndexFactory, as shown in the following code:

<bean id="indexFactory" class="org.springframework.lucene.index.factory.SimpleIndexFactory">
    (...)
</bean>

<bean id="transactionManager" class="org.springframework.lucene.index.factory.LuceneIndexTransactionManager">
    <property name="indexFactory" ref="indexFactory"/>
</bean>

This PlatformTransactionManager implementation supports read-only transactions, which extend the scope of a Lucene index reader but do not allow the use of an index writer.
