Scalable Full-Text Search Indexing using Lucene and CouchDB

The purpose of this post is to outline the steps I took to add full-text search indexing to a CouchDB installation. In addition, I describe how to create a portable couchdb-lucene installation that can be deployed to any server, runs in its own Java virtual machine, and runs as a Windows service using the Java Service Wrapper.

The Facts

  • I am running a CouchDB server, version 1.1.0 to be exact. The configuration described here requires CouchDB 1.1.0 or later.
  • My CouchDB server is running at http://localhost:5984.
  • I wish to add full-text indexing to one or more of my databases on CouchDB.
  • I am running Windows 7.
  • I have Java installed.

The Steps

First of all, download couchdb-lucene, version 0.7.0 or later, from the couchdb-lucene project on GitHub.

You also need to download the Java Service Wrapper from Tanuki Software. I am testing with the 3.5.11 community edition.

If you do not have Java installed, install it now, preferably version 1.5 or later of the Java runtime environment.
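If you are unsure which Java version is installed, a small script can check it. The following is only a sketch: the function names are my own, and it assumes the `java` executable is on the PATH and prints a banner of the form `java version "1.6.0_26"` (or the newer `openjdk version "17.0.2"` style).

```python
import re
import subprocess


def parse_java_version(banner):
    """Extract (major, minor) from the banner printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found in: %r" % banner)
    return int(match.group(1)), int(match.group(2) or 0)


def installed_java_version(java="java"):
    """Run `java -version` and return the parsed (major, minor) tuple."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr)


def is_new_enough(version, minimum=(1, 5)):
    """True if the version meets the 1.5 minimum suggested in this post."""
    return version >= minimum
```

Calling `is_new_enough(installed_java_version())` on the target machine tells you whether the installed runtime meets the minimum mentioned above.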

Create a folder on the C: drive and name it dist. Extract couchdb-lucene 0.7.0 into this folder. Then copy the jre folder of your Java installation into the dist folder as well. You now have a structure like the following:

dist
|–> couchdb-lucene.v.0.7.0
|–> jre6

Extract the Java Service Wrapper into any folder of your choice other than the dist folder above. 🙂

Now it’s time to integrate the Java Service Wrapper into couchdb-lucene. Four directories need to be present inside the couchdb-lucene root folder:

  • bin
  • lib
  • conf
  • logs

Copy the following files into couchdb-lucene’s bin folder.

{WRAPPER_HOME}\bin\wrapper.exe
{WRAPPER_HOME}\src\bin\App.bat.in
{WRAPPER_HOME}\src\bin\InstallApp-NT.bat.in
{WRAPPER_HOME}\src\bin\UninstallApp-NT.bat.in

Rename the three batch files, substituting App with your application’s name (couchdblucene in this example). Be sure to remove the .in extension so that each file ends in .bat.
(Depending on how your file explorer is configured, file extensions may be hidden.)
You should now have:

{couchdblucene_HOME}\bin\wrapper.exe
{couchdblucene_HOME}\bin\couchdblucene.bat
{couchdblucene_HOME}\bin\Installcouchdblucene-NT.bat
{couchdblucene_HOME}\bin\Uninstallcouchdblucene-NT.bat

The wrapper.exe file is the actual wrapper executable. couchdblucene.bat runs couchdb-lucene in the console, and its Install and Uninstall counterparts install and uninstall the couchdb-lucene Windows service, respectively.

Then copy the following files into couchdb-lucene’s lib folder.

{WRAPPER_HOME}\lib\wrapper.jar
{WRAPPER_HOME}\lib\wrapper.dll

The wrapper.dll file is a native library required by the portion of the Wrapper that runs within the JVM. The wrapper.jar file contains all of the Wrapper classes.
You should now have:

{couchdblucene_HOME}\lib\wrapper.jar
{couchdblucene_HOME}\lib\wrapper.dll

Now copy the following wrapper.conf template into couchdb-lucene’s conf folder:

{WRAPPER_HOME}\src\conf\wrapper.conf.in

Rename it to wrapper.conf by removing the .in extension, so that you now have:

{couchdblucene_HOME}\conf\wrapper.conf

Create a logs directory inside couchdb-lucene’s root directory, so that you now have:

{couchdblucene_HOME}\logs
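The copy-and-rename steps above are easy to get wrong by hand, so here is a minimal sketch that scripts them. The function name `stage_wrapper` and the `app` parameter are my own inventions for illustration. The file list mirrors the paths given in this post, and the script overwrites any existing file of the same name in the target folders.

```python
import shutil
from pathlib import Path

# (source relative to WRAPPER_HOME, destination relative to the couchdb-lucene root)
WRAPPER_FILES = [
    ("bin/wrapper.exe", "bin/wrapper.exe"),
    ("src/bin/App.bat.in", "bin/{app}.bat"),
    ("src/bin/InstallApp-NT.bat.in", "bin/Install{app}-NT.bat"),
    ("src/bin/UninstallApp-NT.bat.in", "bin/Uninstall{app}-NT.bat"),
    ("lib/wrapper.jar", "lib/wrapper.jar"),
    ("lib/wrapper.dll", "lib/wrapper.dll"),
    ("src/conf/wrapper.conf.in", "conf/wrapper.conf"),
]


def stage_wrapper(wrapper_home, lucene_home, app="couchdblucene"):
    """Copy the wrapper files into lucene_home, renaming the .in templates."""
    wrapper_home, lucene_home = Path(wrapper_home), Path(lucene_home)
    # Make sure the four directories the wrapper expects are present.
    for folder in ("bin", "lib", "conf", "logs"):
        (lucene_home / folder).mkdir(parents=True, exist_ok=True)
    copied = []
    for src, dst in WRAPPER_FILES:
        target = lucene_home / dst.format(app=app)
        shutil.copyfile(wrapper_home / src, target)
        copied.append(target)
    return copied
```

You would call it with the folder where you extracted the wrapper and the couchdb-lucene root, for example `stage_wrapper(r"C:\wrapper", r"C:\dist\couchdb-lucene")`, where both paths are placeholders for your own. The copied wrapper.conf is still the unmodified template at this point.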

Now it is time to modify your wrapper.conf to tell the Java Service Wrapper how to run couchdb-lucene in service mode. Mine is shown below. Notice how the JAVA_HOME variable points to the jre6 folder we placed in the dist folder, next to the couchdb-lucene folder. The relative paths are resolved from the bin folder, where wrapper.exe lives.

set.COUCH_HOME=..\
set.JAVA_HOME=..\..\jre6
wrapper.java.command=%JAVA_HOME%/bin/java

# Tell the Wrapper to log the full generated Java command line.
#wrapper.java.command.loglevel=INFO

# Java Main class.  This class must implement the WrapperListener interface
#  or guarantee that the WrapperManager class is initialized.  Helper
#  classes are provided to do this for you.  See the Integration section
#  of the documentation for details.
wrapper.java.mainclass=org.tanukisoftware.wrapper.WrapperSimpleApp

# Java Classpath (include wrapper.jar)  Add class path elements as
#  needed starting from 1
wrapper.java.classpath.1=../lib/wrapper.jar
wrapper.java.classpath.2=../lib/*.jar

# Java Library Path (location of Wrapper.DLL or libwrapper.so)
wrapper.java.library.path.1=../lib

# Java Bits.  On applicable platforms, tells the JVM to run in 32 or 64-bit mode.
wrapper.java.additional.auto_bits=TRUE

# Java Additional Parameters
wrapper.java.additional.1=-server
wrapper.java.additional.2=-XX:MaxPermSize=256m

# Initial Java Heap Size (in MB)
wrapper.java.initmemory=256

# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=1024

# Application parameters.  Add parameters as needed starting from 1
wrapper.app.parameter.1=com.github.rnewson.couchdb.lucene.Main

#********************************************************************
# Wrapper Logging Properties
#********************************************************************
# Enables Debug output from the Wrapper.
# wrapper.debug=TRUE

# Format of output for the console.  (See docs for formats)
wrapper.console.format=PM

# Log Level for console output.  (See docs for log levels)
wrapper.console.loglevel=INFO

# Log file to use for wrapper output logging.
wrapper.logfile=../logs/wrapper.log

# Format of output for the log file.  (See docs for formats)
wrapper.logfile.format=LPTM

# Log Level for log file output.  (See docs for log levels)
wrapper.logfile.loglevel=INFO

# Maximum size that the log file will be allowed to grow to before
#  the log is rolled. Size is specified in bytes.  The default value
#  of 0, disables log rolling.  May abbreviate with the 'k' (kb) or
#  'm' (mb) suffix.  For example: 10m = 10 megabytes.
wrapper.logfile.maxsize=0

# Maximum number of rolled log files which will be allowed before old
#  files are deleted.  The default value of 0 implies no limit.
wrapper.logfile.maxfiles=0

# Log Level for sys/event log output.  (See docs for log levels)
wrapper.syslog.loglevel=NONE

#********************************************************************
# Wrapper General Properties
#********************************************************************
# Allow for the use of non-contiguous numbered properties
wrapper.ignore_sequence_gaps=TRUE

# Title to use when running as a console
wrapper.console.title=CouchDB Lucene 0.7.0

#********************************************************************
# Wrapper JVM Checks
#********************************************************************
# Detect DeadLocked Threads in the JVM. (Requires Standard Edition)
wrapper.check.deadlock=TRUE
wrapper.check.deadlock.interval=60
wrapper.check.deadlock.action=RESTART
wrapper.check.deadlock.output=FULL

# Out Of Memory detection.
# (Simple match)
wrapper.filter.trigger.1000=java.lang.OutOfMemoryError
# (Only match text in stack traces if -XX:+PrintClassHistogram is being used.)
#wrapper.filter.trigger.1000=Exception in thread "*" java.lang.OutOfMemoryError
#wrapper.filter.allow_wildcards.1000=TRUE
wrapper.filter.action.1000=RESTART
wrapper.filter.message.1000=The JVM has run out of memory.

#********************************************************************
# Wrapper Windows NT/2000/XP Service Properties
#********************************************************************
# WARNING - Do not modify any of these properties when an application
#  using this configuration file has been installed as a service.
#  Please uninstall the service before modifying this section.  The
#  service can then be reinstalled.

# Name of the service
wrapper.name=CouchDB Lucene 0.7.0

# Display name of the service
wrapper.displayname=CouchDB Lucene 0.7.0

# Description of the service
wrapper.description=CouchDB Lucene 0.7.0

# Service dependencies.  Add dependencies as needed starting from 1
wrapper.ntservice.dependency.1=

# Mode in which the service is installed.  AUTO_START, DELAY_START or DEMAND_START
wrapper.ntservice.starttype=AUTO_START

# Allow the service to interact with the desktop.
wrapper.ntservice.interactive=false
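The relative paths in this configuration are the part most likely to break when the folder layout changes. The sketch below shows where a few of them end up, assuming the Wrapper's default working directory (the folder containing wrapper.exe) and a hypothetical install root of C:\dist\couchdb-lucene. It uses Python's `ntpath` module, so it applies Windows path rules on any operating system.

```python
import ntpath


def resolve_from_bin(lucene_home, relative):
    """Resolve a wrapper.conf path relative to the bin folder, using Windows rules."""
    return ntpath.normpath(ntpath.join(lucene_home, "bin", relative))


# Hypothetical install root; substitute the folder you actually extracted to.
home = r"C:\dist\couchdb-lucene"

print(resolve_from_bin(home, r"..\..\jre6"))           # C:\dist\jre6
print(resolve_from_bin(home, "../lib/wrapper.jar"))    # C:\dist\couchdb-lucene\lib\wrapper.jar
print(resolve_from_bin(home, "../logs/wrapper.log"))   # C:\dist\couchdb-lucene\logs\wrapper.log
```

If the first line does not print the folder that actually contains your JRE, adjust set.JAVA_HOME before installing the service.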

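If you followed the steps, running Installcouchdblucene-NT.bat from the bin folder installs couchdb-lucene as a Windows service on your machine. On Windows 7, run it from a command prompt started as administrator, since installing a service requires elevated rights.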

The next step is to tell our CouchDB installation about the new buddy in town, couchdb-lucene. Open the local.ini file in the CouchDB server’s installation directory, at %couchhome%\etc\couchdb\local.ini, and add the following:

[couchdb]
timeout = 60000
[httpd_global_handlers]
_fti = {couch_httpd_proxy, handle_proxy_req, <<"http://localhost:5985">>}

Take note of the http://localhost:5985 URL. That is the address couchdb-lucene listens on.
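To make the role of the _fti handler concrete, the sketch below builds the two URLs that should reach the same search index: one through the CouchDB proxy and one directly against couchdb-lucene. The path layout (`local/<database>/_design/<design doc>/<index>`) is my reading of the couchdb-lucene README, and `local` is assumed to be the key under which CouchDB is registered in couchdb-lucene.ini. The database, design document, and index names in the usage line are purely hypothetical.

```python
from urllib.parse import quote, urlencode

COUCHDB = "http://localhost:5984"  # CouchDB address used in this post
LUCENE = "http://localhost:5985"   # couchdb-lucene address from local.ini


def search_urls(db, design_doc, index, query, key="local"):
    """Return (proxied_url, direct_url) for one full-text query."""
    parts = (key, db, "_design", design_doc, index)
    path = "/".join(quote(part, safe="") for part in parts)
    qs = urlencode({"q": query})
    # The proxy handler forwards everything after /_fti/ to couchdb-lucene.
    return f"{COUCHDB}/_fti/{path}?{qs}", f"{LUCENE}/{path}?{qs}"


proxied, direct = search_urls("db1", "cat", "by_subject", "hello")
print(proxied)
print(direct)
```

Clients only need the first URL. The second is handy for checking couchdb-lucene on its own when the proxied request fails.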

Restart the CouchDB and couchdb-lucene services and the two should be integrated from startup onward. The actual indexing technique is a topic for another blog post.
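After the restart, a quick way to confirm that both services are answering is to request their root URLs. This is a minimal sketch with a helper name of my own choosing. It assumes both services reply with a small JSON document at their root, which CouchDB does, and which I believe couchdb-lucene does as well.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def probe(url, opener=urlopen, timeout=5):
    """Return the decoded JSON body served at url, or None if unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None


# Usage, with the addresses from this post:
#   probe("http://localhost:5984/")       # CouchDB
#   probe("http://localhost:5985/")       # couchdb-lucene, direct
#   probe("http://localhost:5984/_fti/")  # through the proxy (assumed to forward to the root)
```

A result of None for the direct couchdb-lucene URL usually means the Windows service did not start, in which case logs\wrapper.log is the first place to look.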

References:
https://github.com/rnewson/couchdb-lucene
http://wrapper.tanukisoftware.com/doc/english/integrate-simple-win.html
http://blog.foaa.de/2011/05/squeeze-couchdb-lucene/
