by admin
Just wanted to take this error message off the “Hey, we’ve seen this before… now how did we resolve this?” pile. This is the full text of the error:
WARNING Could not send batch to ESP content distributor, will retry automatically.
Reason given: process() failed: exception (no_resources) no doc procs registered to process a batch with priority 0
At first glance, it looks pretty clear that you just need to [re]start your document processor(s). However, this won’t necessarily solve the problem. It turns out that a likely cause of this error is a bad Document Processing Pipeline (DPP) Stage: the docprocs start up, hit the bad stage (e.g., a Python error), and don’t recover.
To debug your DPP Stage, check the logs for the document processor(s). They’re usually located in $FASTSEARCH/var/log/procserver and, in our experience, there’s probably an uncaught Python exception lurking somewhere in there.
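If the procserver logs are large, a small script can point you at the suspicious lines. This is a minimal sketch, not an ESP tool. It walks the log directory and reports every line containing a Python traceback header or the word "Exception". Those markers are assumptions about what an uncaught exception looks like in these logs, so adjust them to match yours. The script is written for Python 3.

```python
import os
import sys

# Assumed markers for an uncaught Python exception; adjust them for your logs.
MARKERS = ("Traceback (most recent call last)", "Exception")


def find_suspect_lines(log_dir, markers=MARKERS):
    """Yield (path, line_number, line) for each log line containing a marker."""
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # errors="replace" keeps the scan going past any undecodable bytes
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if any(marker in line for marker in markers):
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Default to $FASTSEARCH/var/log/procserver; a directory can also be passed in.
    default_dir = os.path.join(os.environ.get("FASTSEARCH", ""), "var", "log", "procserver")
    log_dir = sys.argv[1] if len(sys.argv) > 1 else default_dir
    for path, number, line in find_suspect_lines(log_dir):
        print(f"{path}:{number}: {line}")
```

Each hit is printed as path:line, so you can jump straight to the traceback and read the stage name and error that follow it.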
Tags: custom esp stage, doc procs, docproc, document processing pipeline, document processor, dpp, esp, FAST ESP
Posted in FAST ESP
by admin

Add your own services to ESP's Node Controller
Background
Integrating our own custom services with Fast ESP’s Node Controller provides us with several benefits:
- Administrators without in-depth ESP knowledge can easily control services (e.g. start, stop, configure parameters)
- Services can be started at boot time with the rest of ESP
- espdeploy can be used to install our services in a multi-node cluster
The components required for this system are:
- The ESP Node Controller (with config file NodeConf.xml)
- The 3rd-party service (like a CherryPy server, log parser, etc.)
- A wrapper script (see below)
Steps for Integration
- Define the service you would like to integrate. It can be any script or binary that can be executed on the system. For example, the service might be a python script that takes command-line arguments and continues running itself (as is the case with a webserver).
- Create the wrapper script that sets up the proper environment and starts/stops the service properly. The wrapper script should be put in the $FASTSEARCH/bin directory (with executable permissions). Additionally, the wrapper script should pass "$@" (quoted, so arguments containing spaces survive intact) to your actual script so any/all arguments defined in $FASTSEARCH/etc/NodeConf.xml will be passed along properly from the Node Controller to your service. The following is an example of a wrapper script:
#!/bin/sh
# export the proper python path (appending to any existing PYTHONPATH)
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}/path/to/python"
# run the script (backgrounded); "$@" forwards every argument from NodeConf.xml
python "$FASTSEARCH/lib/python2.6/yourmodule/yourservice.py" "$@" &
# determine the process id of the python script
SCRIPT_PID=$!
# upon receiving a SIGTERM, forward it to the process
# (TERM rather than SIGTERM: some /bin/sh implementations, such as dash, reject the SIG prefix)
trap 'kill -TERM "$SCRIPT_PID"' TERM
# wait for SIGTERM from nctrl (or for the service to exit on its own)
wait "$SCRIPT_PID"
- Define the service in $FASTSEARCH/etc/NodeConf.xml
Add the following to the end of the <startorder> tag:
<proc>servicename</proc>
Add the following to the end of the <node> tag, customizing as appropriate (binaryname is the name of your wrapper script in $FASTSEARCH/bin):
<!-- My Custom Service -->
<process name="servicename" description="My Custom Service">
<start>
<executable>binaryname</executable>
<parameters>-p 16940 -v</parameters>
<port base="4004"/>
</start>
<outfile>servicename.scrap</outfile>
</process>
- Reload the Node Controller configuration with the following:
nctrl reloadcfg
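If the service doesn’t show up after the reload, it’s worth double-checking the NodeConf.xml edits. Here’s a minimal Python 3 sketch of a hypothetical helper (not part of ESP) that checks both additions from the steps above. It looks for a <proc> entry inside <startorder> and a single <process name="..."> entry with a non-empty <start><executable>. It searches the whole tree rather than assuming exactly where <startorder> and <node> sit, since it only relies on the layout described in this post.

```python
import os
import sys
import xml.etree.ElementTree as ET


def check_service_registered(root, service):
    """Return a list of problems with how `service` is declared (empty if none)."""
    problems = []

    # 1. <proc>service</proc> must appear inside <startorder>
    started = {
        (proc.text or "").strip()
        for order in root.iter("startorder")
        for proc in order.iter("proc")
    }
    if service not in started:
        problems.append(f"<proc>{service}</proc> is missing from <startorder>")

    # 2. exactly one <process name="service"> with a non-empty <start><executable>
    matches = [p for p in root.iter("process") if p.get("name") == service]
    if not matches:
        problems.append(f'no <process name="{service}"> element found')
    elif len(matches) > 1:
        problems.append(f'<process name="{service}"> is defined {len(matches)} times')
    for process in matches:
        if not process.findtext("start/executable", default="").strip():
            problems.append(f'<process name="{service}"> has no <start><executable>')
    return problems


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: check_nodeconf.py SERVICE [PATH/TO/NodeConf.xml]")
    else:
        default = os.path.join(os.environ.get("FASTSEARCH", ""), "etc", "NodeConf.xml")
        path = sys.argv[2] if len(sys.argv) > 2 else default
        problems = check_service_registered(ET.parse(path).getroot(), sys.argv[1])
        for problem in problems:
            print(problem)
        if not problems:
            print("looks OK")
```

For example, python3 check_nodeconf.py servicename reads $FASTSEARCH/etc/NodeConf.xml and prints any missing pieces (the script name is just an illustration).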
And that’s it! Now you should be able to start, stop, configure, and deploy your services using Fast ESP tools. Enjoy!
Tags: bash, c++, custom, FAST ESP, java, nctrl, node controller, nodeconf, NodeConf.xml, python, sh
Posted in FAST ESP
by admin
Recently, we needed to iterate over a fairly large data set (millions of rows) and do the ever-common “if it’s not in the database, put it in; if it’s already there, just update some fields.” It’s a very common pattern for things like log files (where, for example, only a timestamp needs to be updated in some cases).
The obvious approach, a SELECT followed by either an UPDATE or an INSERT, is too slow for even moderately large datasets. A better way is MySQL’s INSERT ... ON DUPLICATE KEY UPDATE syntax. Once you create a unique key on the field(s) that should be unique per row, this syntax provides two specific benefits:
- Allows batched queries (many rows per statement, optionally wrapped in a transaction) for large data
- Improves overall performance versus making two separate queries per row
These benefits are especially helpful when your dataset is too large to fit into memory. The obvious drawback to this method, however, is that it may put additional load on your database server. Like anything else, it’s worth testing out your individual situation but, for us, ON DUPLICATE KEY UPDATE was the way to go.
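Here’s a minimal Python sketch of the batched approach. It builds one multi-row INSERT ... ON DUPLICATE KEY UPDATE statement per batch, using the %s placeholders expected by MySQL drivers such as PyMySQL and mysql-connector-python. The table and column names (log_entries, host, path, last_seen) are hypothetical. The sketch assumes a unique key already exists on (host, path), so when a row is already present, only last_seen is updated.

```python
from itertools import islice

# Assumes a unique key already exists on the fields that identify a row, e.g.
# (hypothetical): ALTER TABLE log_entries ADD UNIQUE KEY uniq_host_path (host, path);
# Table/column names are interpolated directly, so they must be trusted identifiers.


def build_upsert(table, columns, update_columns, row_count):
    """Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE statement."""
    column_list = ", ".join(f"`{c}`" for c in columns)
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([row] * row_count)
    # On a duplicate key, overwrite only these columns with the incoming row's values
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in update_columns)
    return (f"INSERT INTO `{table}` ({column_list}) VALUES {values} "
            f"ON DUPLICATE KEY UPDATE {updates}")


def batches(rows, size):
    """Yield lists of up to `size` rows without loading the whole input into memory."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def upsert_all(cursor, table, columns, update_columns, rows, batch_size=1000):
    """Insert-or-update every row, one multi-row statement per batch."""
    for batch in batches(rows, batch_size):
        sql = build_upsert(table, columns, update_columns, len(batch))
        params = [value for row in batch for value in row]
        cursor.execute(sql, params)
```

Since rows can be any iterable (for example, a generator over log lines), the input never has to fit in memory. Committing after each batch keeps individual transactions small. Note that VALUES(col) inside ON DUPLICATE KEY UPDATE is deprecated as of MySQL 8.0.20 in favor of a row alias (INSERT ... AS new ... UPDATE last_seen = new.last_seen). It still works, but newer servers may warn about it.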
Tags: database, index, key, mysql, on duplicate key update, performance
Posted in Uncategorized
by admin
Recently, I was populating a database with lines from a number of log files. One of the key pieces of information in each of these log lines was a URL. Because URLs can be pretty much as long as they want to be (or can they?), I decided to make the URL field a TEXT column in my schema. Then, because I wanted fast lookups, I tried to add an index (key) on this field and ran into this guy:
ERROR 1170 (42000): BLOB/TEXT column 'field_name' used in key specification without a key length
It turns out that MySQL can only index the first N characters of a BLOB or TEXT field, and for a URL, that’s not good enough. After talking it over with my team members, we decided to add a field instead: url_md5. By storing the MD5 hash of each URL, we could index the hash field, get fast lookups, and stop worrying about whether extremely long URLs would fit into a VARCHAR.
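A minimal Python sketch of the hash-column approach, assuming a hypothetical urls table. The digest is the 32-character lowercase hex string from hashlib, which fits a CHAR(32) column and should match what MySQL’s own MD5() function returns for the same UTF-8 bytes. Two different URLs could in principle share an MD5, so the lookup also compares the full url column. The index on url_md5 still does the fast part.

```python
import hashlib

# Hypothetical schema: the full URL stays in a TEXT column, and the index goes on the hash.
SCHEMA = """
CREATE TABLE urls (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url TEXT NOT NULL,
    url_md5 CHAR(32) NOT NULL,
    KEY idx_url_md5 (url_md5)
)
"""

# %s placeholders as used by MySQL drivers such as PyMySQL
LOOKUP = "SELECT id FROM urls WHERE url_md5 = %s AND url = %s"


def url_md5(url):
    """Return the 32-character hex MD5 digest of a URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def lookup_params(url):
    # Match on the indexed hash first, then on the full URL to rule out collisions
    return (url_md5(url), url)
```

With a driver cursor, that would be cursor.execute(LOOKUP, lookup_params(url)), and inserts would store url_md5(url) alongside the URL itself.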
Tags: 1170, 42000, blob, error, index, indices, key, mysql, text
Posted in Uncategorized
by Michael McIntosh
We use FAST ESP to power a large industrial search engine that lists over 1 million companies, indexes over 3 million documents, and receives millions of visitors every month. I have been working with ESP since 2003 (when it was known as FDS 3.2).
FAST ESP is extremely flexible and can index many document types (HTML, PDF, Word, etc.). It has a very robust crawler for web documents, and you can use its intermediary FastXML format or its Content APIs to load custom document formats into the system.
Posted in FAST ESP