<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Enterprise Search Blog</title>
	<atom:link href="http:///enterprise-search-blog/enterprise-search-blog/2010/01/integrate-custom-services-with-the-fast-esp-node-controller/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.tnrglobal.com/sys-admin-blog/</link>
	<description></description>
	<lastBuildDate>Mon, 26 Apr 2010 20:36:49 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Fast ESP Error: no doc procs registered to process a batch with priority 0</title>
		<link>http://www.tnrglobal.com/sys-admin-blog/2010/03/fast-esp-error-no-doc-procs-registered-to-process-a-batch-with-priority-0/</link>
		<comments>http://www.tnrglobal.com/sys-admin-blog/2010/03/fast-esp-error-no-doc-procs-registered-to-process-a-batch-with-priority-0/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 16:02:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[FAST ESP]]></category>
		<category><![CDATA[custom esp stage]]></category>
		<category><![CDATA[doc procs]]></category>
		<category><![CDATA[docproc]]></category>
		<category><![CDATA[document processing pipeline]]></category>
		<category><![CDATA[document processor]]></category>
		<category><![CDATA[dpp]]></category>
		<category><![CDATA[esp]]></category>

		<guid isPermaLink="false">http://www.tnrglobal.com/sys-admin-blog/2010/03/fast-esp-error-no-doc-procs-registered-to-process-a-batch-with-priority-0/</guid>
		<description><![CDATA[Just wanted to take this error message off of the, &#8220;Hey, we&#8217;ve seen this before&#8230; now how did we resolve this..?&#8221; pile.  This is the full text of the error:
WARNING    Could not send batch to ESP content distributor, will retry automatically.
Reason given: process() failed: exception (no_resources) no doc procs registered to
process a batch with priority [...]]]></description>
			<content:encoded><![CDATA[<p>Just wanted to take this error message off of the, &#8220;Hey, we&#8217;ve seen this before&#8230; now how did we resolve this..?&#8221; pile.  This is the full text of the error:</p>
<pre><strong><span style="color: #c19b00">WARNING</span></strong>    Could not send batch to ESP content distributor, will retry automatically.
Reason given: process() failed: exception (no_resources) no doc procs registered to
process a batch with priority 0</pre>
<p>At first glance, it looks pretty clear that you just need to [re]start your document processor(s).  However, this won&#8217;t necessarily solve the problem.  It turns out that a likely cause is a bad Document Processing Pipeline (DPP) stage: the docprocs fire up, hit the bad stage (e.g. a Python error), and don&#8217;t recover.</p>
<p>To debug your DPP stage, take a look at the logs for the document processor(s).  They&#8217;re usually located in <code>$FASTSEARCH/var/log/procserver</code> and, in our experience, there&#8217;s probably an uncaught Python exception lurking somewhere in there.</p>
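<p>As a rough illustration of that log check, the following Python sketch prints every place in the procserver logs where a traceback starts.  The function name <code>find_tracebacks</code> is ours, and the sketch assumes that every file in the log directory is plain text and that exceptions appear with the standard Python traceback header; adjust the glob pattern to match your installation.</p>
<pre>import glob
import os

# Header line that Python prints at the start of every traceback.
TRACEBACK_MARKER = "Traceback (most recent call last)"


def find_tracebacks(lines):
    """Return the 1-based numbers of the lines that contain a traceback header."""
    return [number for number, line in enumerate(lines, 1) if TRACEBACK_MARKER in line]


if __name__ == "__main__":
    # Assumes FASTSEARCH is set in the environment, as in a normal ESP shell.
    log_dir = os.path.join(os.environ.get("FASTSEARCH", "."), "var", "log", "procserver")
    for path in sorted(glob.glob(os.path.join(log_dir, "*"))):
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as handle:
            for number in find_tracebacks(handle):
                print("%s:%d" % (path, number))
</pre>
<p>Each reported line number is a starting point: the lines that follow it in the log should name the stage and the exception that took the docproc down.</p>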
]]></content:encoded>
			<wfw:commentRss>http://www.tnrglobal.com/sys-admin-blog/2010/03/fast-esp-error-no-doc-procs-registered-to-process-a-batch-with-priority-0/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Integrate custom services with the Fast ESP Node Controller</title>
		<link>http://www.tnrglobal.com/sys-admin-blog/2010/01/integrate-custom-services-with-the-fast-esp-node-controller/</link>
		<comments>http://www.tnrglobal.com/sys-admin-blog/2010/01/integrate-custom-services-with-the-fast-esp-node-controller/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 15:11:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[FAST ESP]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[custom]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[nctrl]]></category>
		<category><![CDATA[node controller]]></category>
		<category><![CDATA[nodeconf]]></category>
		<category><![CDATA[NodeConf.xml]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[sh]]></category>

		<guid isPermaLink="false">http://www.tnrglobal.com/sys-admin-blog/2010/01/integrate-custom-services-with-the-fast-esp-node-controller/</guid>
		<description><![CDATA[Background
Integrating our own custom services with Fast ESP&#8217;s Node Controller provides us with several benefits:

Administrators without in-depth ESP knowledge can easily control services (e.g. start, stop, configure parameters)
Services can be started at boot time with the rest of ESP
espdeploy can be used to install our services in a multi-node cluster

The components required for this system [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_55" class="wp-caption aligncenter" style="width: 617px"><img class="size-full wp-image-55" src="http://www.tnrglobal.com/media/wpmu/uploads/blogs.dir/3/files/2010/01/modlist.jpg" alt="Add your own services to ESP's Node Controller" width="607" height="137" /><p class="wp-caption-text">Add your own services to ESP&#39;s Node Controller</p></div>
<h3>Background</h3>
<p>Integrating our own custom services with Fast ESP&#8217;s Node Controller provides us with several benefits:</p>
<ul>
<li>Administrators without in-depth ESP knowledge can easily control services (e.g. start, stop, configure parameters)</li>
<li>Services can be started at boot time with the rest of ESP</li>
<li><code>espdeploy</code> can be used to install our services in a multi-node cluster</li>
</ul>
<p>The components required for this system are:</p>
<ol>
<li>The ESP Node Controller (with config file <code>NodeConf.xml</code>)</li>
<li>The 3rd-party service (like a <a href="http://www.cherrypy.org/" target="_blank">CherryPy</a> server, log parser, etc.)</li>
<li>A wrapper script (see below)</li>
</ol>
<h3>Steps for Integration</h3>
<ol>
<li>Define the service you would like to integrate. It can be any script or binary that can be executed on the system. For example, the service might be a python script that takes command-line arguments and continues running itself (as is the case with a webserver).</li>
<li>Create the wrapper script that sets up the proper environment and runs/stops the service properly. The wrapper script should be put in the <tt>$FASTSEARCH/bin</tt> directory (with executable permissions).  Additionally, the wrapper script should pass <tt>$@</tt> to your actual script so any/all arguments defined in <tt>$FASTSEARCH/etc/NodeConf.xml</tt> will be passed along properly from the Node Controller to your service.  The following is an example of a wrapper script:
<pre><span class="c">#!/bin/sh
</span>
<span class="c"># export the proper python path
</span><span class="nb">export </span><span class="nv">PYTHONPATH</span><span class="o">=</span><span class="s2">":/path/to/python"</span>

<span class="c"># run the script (backgrounded)
</span>python <span class="nv">$FASTSEARCH</span>/lib/python2.6/yourmodule/yourservice.py <span class="s2">"$@"</span> &amp;

<span class="c"># determine the process id of the python script
</span><span class="nv">SCRIPT_PID</span><span class="o">=</span><span class="s2">"$!"</span>

<span class="c"># upon receiving a SIGTERM, forward it to the process
</span><span class="nb">trap</span> <span class="s2">"kill -TERM $SCRIPT_PID"</span> TERM

<span class="c"># wait for SIGTERM from <span class="searchword0">nctrl</span>
</span><span class="nb">wait</span></pre>
</li>
<li>Define the service in <tt>$FASTSEARCH/etc/NodeConf.xml</tt><br />
Add the following as the last entry inside the <tt>&lt;startorder&gt;</tt> element:
<pre><span class="nt">&lt;proc&gt;</span>servicename<span class="nt">&lt;/proc&gt;</span></pre>
<p>Add the following as the last entry inside the <tt>&lt;node&gt;</tt> element, customizing as appropriate:</p>
<pre><span class="c">&lt;!-- My Custom Service --&gt;</span>
<span class="nt">&lt;process</span> <span class="na">name=</span><span class="s">"servicename"</span> <span class="na">description=</span><span class="s">"My Custom Service"</span><span class="nt">&gt;</span>
        <span class="nt">&lt;start&gt;</span>
                <span class="nt">&lt;executable&gt;</span>binaryname<span class="nt">&lt;/executable&gt;</span>
                <span class="nt">&lt;parameters&gt;</span>-p 16940 -v<span class="nt">&lt;/parameters&gt;</span>
                <span class="nt">&lt;port</span> <span class="na">base=</span><span class="s">"4004"</span><span class="nt">/&gt;</span>
        <span class="nt">&lt;/start&gt;</span>
        <span class="nt">&lt;outfile&gt;</span>servicename.scrap<span class="nt">&lt;/outfile&gt;</span>
<span class="nt">&lt;/process&gt;</span></pre>
</li>
<li>Reload the Node Controller configuration with the following:
<pre><span class="searchword0">nctrl</span> reloadcfg</pre>
</li>
</ol>
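<p>For the service side, a minimal Python sketch of what <tt>yourservice.py</tt> might look like is shown here.  It is only an illustration under our own assumptions: the option names mirror the <tt>-p 16940 -v</tt> parameters from the example configuration, and the function names and the sleep loop are placeholders for real work.</p>
<pre>import argparse
import signal
import time

shutdown_requested = False


def handle_sigterm(signum, frame):
    """Record that the wrapper forwarded a SIGTERM so the main loop can exit."""
    global shutdown_requested
    shutdown_requested = True


def parse_args(argv=None):
    """Parse the options listed in the parameters element of NodeConf.xml."""
    parser = argparse.ArgumentParser(description="Hypothetical custom service")
    parser.add_argument("-p", dest="port", type=int, default=16940)
    parser.add_argument("-v", dest="verbose", action="store_true")
    return parser.parse_args(argv)


def main(argv=None):
    """Run until the Node Controller stops the service via the wrapper script."""
    args = parse_args(argv)
    signal.signal(signal.SIGTERM, handle_sigterm)
    if args.verbose:
        print("service starting on port %d" % args.port)
    while not shutdown_requested:
        time.sleep(1)  # placeholder for the real work of the service
    if args.verbose:
        print("service stopping")
</pre>
<p>Calling <code>main()</code> at the bottom of the file turns this into a long-running process that exits cleanly once the wrapper forwards the signal.</p>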
<p>And that&#8217;s it!  Now you should be able to start, stop, configure, and deploy your services using Fast ESP tools.  Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tnrglobal.com/sys-admin-blog/2010/01/integrate-custom-services-with-the-fast-esp-node-controller/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A better way to add or update MySQL rows</title>
		<link>http://www.tnrglobal.com/sys-admin-blog/2009/10/a-better-way-to-add-or-update-mysql-rows/</link>
		<comments>http://www.tnrglobal.com/sys-admin-blog/2009/10/a-better-way-to-add-or-update-mysql-rows/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 16:05:46 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[key]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[on duplicate key update]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://www.tnrglobal.com/sys-admin-blog/2009/10/a-better-way-to-add-or-update-mysql-rows/</guid>
		<description><![CDATA[Recently, we needed to iterate over a fairly large data set (on the order of millions) and do the ever-common If it&#8217;s not in the database, put it in.  If it&#8217;s already there, just update some fields. It&#8217;s a pattern that is very common for things like log files (where, for example, only a timestamp [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, we needed to iterate over a fairly large data set (on the order of millions) and do the ever-common <em>If it&#8217;s not in the database, put it in.  If it&#8217;s already there, just update some fields.</em> It&#8217;s a pattern that is very common for things like log files (where, for example, only a timestamp needs to be updated in some cases).</p>
<p>The obvious approach, a SELECT followed by either an UPDATE or an INSERT, is too slow for even moderately large datasets.  The better way to accomplish this is to use MySQL&#8217;s <a href="http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html" target="_blank">ON DUPLICATE KEY UPDATE</a> clause.  Once you create a unique key on the fields that identify a row, this syntax provides two specific benefits:</p>
<ul>
<li>Allows batch (read: transaction) queries for large data</li>
<li>Increases performance overall versus making two separate queries</li>
</ul>
<p>These benefits are especially helpful when your dataset is too large to fit into memory.  The obvious drawback to this method, however, is that it may put additional load on your database server.  Like anything else, it&#8217;s worth testing out your individual situation but, for us, ON DUPLICATE KEY UPDATE was the way to go.</p>
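<p>To make the syntax concrete, here is a small Python sketch that builds such a statement.  The helper <code>build_upsert</code> and the table and column names (<code>log_hits</code>, <code>url_md5</code>, <code>last_seen</code>) are hypothetical; the sketch assumes the table already has a unique key on the key columns.</p>
<pre>def build_upsert(table, key_columns, update_columns):
    """Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE statement with %s placeholders.

    Table and column names are interpolated directly, so they must come from
    trusted code, never from user input.
    """
    columns = list(key_columns) + list(update_columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join("{0} = VALUES({0})".format(name) for name in update_columns)
    return "INSERT INTO {0} ({1}) VALUES ({2}) ON DUPLICATE KEY UPDATE {3}".format(
        table, ", ".join(columns), placeholders, updates
    )


# Hypothetical table: unique key on url_md5, last_seen refreshed on every hit.
statement = build_upsert("log_hits", ["url_md5"], ["last_seen"])
print(statement)
</pre>
<p>With a DB-API driver that uses <code>%s</code> placeholders (MySQLdb, for example), the resulting string can be passed to <code>cursor.executemany(statement, rows)</code> so that many rows are sent in one batch.</p>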
]]></content:encoded>
			<wfw:commentRss>http://www.tnrglobal.com/sys-admin-blog/2009/10/a-better-way-to-add-or-update-mysql-rows/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL Error: BLOB/TEXT used in key specification without a key length</title>
		<link>http://www.tnrglobal.com/sys-admin-blog/2009/10/mysql-error-blobtext-used-in-key-specification-without-a-key-length/</link>
		<comments>http://www.tnrglobal.com/sys-admin-blog/2009/10/mysql-error-blobtext-used-in-key-specification-without-a-key-length/#comments</comments>
		<pubDate>Thu, 01 Oct 2009 18:23:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[1170]]></category>
		<category><![CDATA[42000]]></category>
		<category><![CDATA[blob]]></category>
		<category><![CDATA[error]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[indices]]></category>
		<category><![CDATA[key]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[text]]></category>

		<guid isPermaLink="false">http://www.tnrglobal.com/sys-admin-blog/2009/10/mysql-error-blobtext-used-in-key-specification-without-a-key-length/</guid>
		<description><![CDATA[Recently, I was populating a database with lines from a number of log files.  One of the key pieces of information in each of these log lines was a URL.  Because URLs can be pretty much as long as they want to be (or can they?) I decided to make the URL field a Text [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I was populating a database with lines from a number of log files.  One of the key pieces of information in each of these log lines was a URL.  Because URLs can be pretty much as long as they want to be (<a title="Microsoft thinks not!" href="http://support.microsoft.com/kb/q208427/" target="_blank">or can they?</a>) I decided to make the URL field a <code>Text</code> type in my schema.  Then, because I wanted fast lookups, I tried to add an index (key) on this field and ran into this guy:</p>
<pre>ERROR 1170 (42000): BLOB/TEXT column 'field_name' used in key specification
without a key length</pre>
<p>It turns out that MySQL can only index the first N characters of a <code>Blob</code> or <code>Text</code> field &#8211; but for a URL, that&#8217;s not good enough.  After talking it over with my team members, we decided to instead add a field &#8211; <code>url_md5</code>.  By storing the md5sum of each URL, we could index on the hash field and get <strong>fast lookups</strong> without worrying about domains like <a href="http://thelongestlistofthelongeststuffatthelongestdomainnameatlonglast.com">this</a> fitting into a <code>VARCHAR</code>.</p>
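<p>Computing the hash is a one-liner in most languages.  As a sketch in Python (the function name simply mirrors the column name, and UTF-8 encoding of the URL is our assumption):</p>
<pre>import hashlib


def url_md5(url):
    """Return the 32-character hexadecimal MD5 digest of a URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


digest = url_md5("http://www.tnrglobal.com/sys-admin-blog/")
print(len(digest))  # always 32, however long the URL is
</pre>
<p>Because the digest always has the same length, the indexed column can be a fixed-width <code>CHAR(32)</code> while the full URL stays in the <code>Text</code> field.</p>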
]]></content:encoded>
			<wfw:commentRss>http://www.tnrglobal.com/sys-admin-blog/2009/10/mysql-error-blobtext-used-in-key-specification-without-a-key-length/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>FAST ESP Overview</title>
		<link>http://www.tnrglobal.com/sys-admin-blog/2009/02/fast-overview/</link>
		<comments>http://www.tnrglobal.com/sys-admin-blog/2009/02/fast-overview/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 20:35:50 +0000</pubDate>
		<dc:creator>Michael McIntosh</dc:creator>
				<category><![CDATA[FAST ESP]]></category>

		<guid isPermaLink="false">http://www.tnrglobal.com/sys-admin-blog/2009/02/fast-overview/</guid>
		<description><![CDATA[We use FAST ESP to power a large industrial search engine listing over 1 million companies and over 3 million indexed documents and receiving millions of visitors every month. I have been working with ESP since 2003 (then known as FDS 3.2).
FAST ESP is extremely flexible and can deal with indexing many document types (html, [...]]]></description>
			<content:encoded><![CDATA[<p>We use FAST ESP to power a large industrial search engine listing over 1 million companies and over 3 million indexed documents and receiving millions of visitors every month. I have been working with ESP since 2003 (then known as FDS 3.2).</p>
<p>FAST ESP is extremely flexible and can index many document types (HTML, PDF, Word, etc.). It has a very robust crawler for web documents, and you can load custom document formats into the system either through the intermediary FastXML format or through the Content APIs. <span id="more-1"></span></p>
<p>One of my favorite parts of the engine is its Document Processing Pipeline which lets you make use of dozens of out-of-the-box processing plugins as well as using a Python API to write your own custom document processing stages. An example of a custom stage we wrote was one that looks at a web site URL and tries to identify which company it belongs to so additional metadata can be attached to a web document.</p>
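<p>The lookup at the heart of a stage like that can be sketched in a few lines of plain Python. This is not the ESP document processing API, only the matching logic; the table <code>COMPANY_BY_HOST</code>, its single entry, and the function name are hypothetical.</p>
<pre>from urllib.parse import urlparse

# Hypothetical lookup table; a real stage would load this from a database or file.
COMPANY_BY_HOST = {
    "example.com": "Example Corp",
}


def company_for_url(url, companies=COMPANY_BY_HOST):
    """Return the company registered for the host of the URL, or None if unknown."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com and example.com as the same site
    return companies.get(host)
</pre>
<p>A stage built around this would attach the returned company, when there is one, to the document as an extra metadata field.</p>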
<p>It has a very robust programming/integration SDK in several popular languages (C++/C#/Java) for adding content and performing queries as well as fetching system status and managing cluster services.</p>
<p>ESP has a query language called FAST Query Language (FQL) that is very robust and allows you to do basic Boolean searches (AND, OR, NOT) as well as phrase and term proximity searches. In addition to that, it has something called &#8220;scope search&#8221; which can be used to search document metadata (XML) that has a format that can vary from document to document.</p>
<p>In terms of performance, it scales fairly linearly: if you benchmark it on one machine, adding a second machine will generally double performance. You can run the system on one machine (only recommended for development) or on many (for production). It is fault-tolerant (it can still serve some results if one of your load-balanced indices goes offline) and it has full fail-over support (one or more critical machines could die or be taken offline for maintenance and the system will continue to function properly).</p>
<p>So, it&#8217;s very powerful. The documentation nowadays is pretty good. So, you ask, what are the downsides?</p>
<p>Well, if the data you need to make searchable has a format that changes frequently, that might be a pain. ESP has something called an &#8220;Index Profile&#8221;, which is basically a config file it uses to determine which document fields are important and should be used for indexing. Everything fed into ESP is a &#8220;document&#8221;, even if you&#8217;re loading database table rows into it. Each document has several fields, typical fields being: title, body, keywords, headers, documentvectors, processingtime, etc. You can specify as many of your own custom fields as you wish.</p>
<p>If your content maintains mostly the same format (like web documents), it&#8217;s not a big issue. But if you have to make big changes to which fields should be indexed and how they should be treated, you probably need to edit the Index Profile. Some changes to the index profile are &#8220;Hot Updates&#8221;, meaning you can make the change without interrupting service. But some of the bigger changes are &#8220;Cold Updates&#8221;, which require a full data refeed and reindexing before the change takes effect. Depending on the size of your dataset and how many machines are in your cluster, this operation could take hours or days. Cold Updates are a pain to schedule unless you have plenty of cash for extra hardware that you can bring online while your production systems are performing a cold update and reloading the data. Having to do that on production clusters more than once or twice a year requires a fair amount of planning to get right with minimal or zero downtime.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tnrglobal.com/sys-admin-blog/2009/02/fast-overview/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

