Migration from Microsoft FAST to Apache Lucene Solr

Is your company using Microsoft FAST ESP on a Linux platform? Unfortunately, Microsoft announced in 2010 that it will cease technical support for FAST ESP 5.3 after its five-year life cycle for anyone using Linux as their operating system. Migration to another search platform will be a priority, and business leaders and technology professionals are looking closely at Lucene Solr as a solution.

We can assist your organization in any stage of a migration. We can perform an evaluation of your current architecture, draft a plan for migration, work with your internal team on the migration or just consult as needed. Whatever your specific needs are, we can help you achieve your goals. Read our White Paper released February 2012 that presents a Case Study on migration. The paper discusses:

  • Loading millions of documents into Solr indexes
  • Evaluation and recommendations for tools to bridge the features gap
  • Migrating custom pipeline code to Pypes with minimal changes
  • Proven ROI after a complete migration

Additionally, we have presented on FAST ESP to Lucene Solr migrations at the Lucene Revolution Conference in Boston, MA (2010 slides: Migration from FAST ESP to Lucene Solr (PDF)) and at the Apache Lucene Eurocon in Barcelona (October 2011). Watch our VP of Search Technology Michael McIntosh’s presentation on FAST to Lucene Solr Migration below. If you like what you see, contact us to explore a Solr migration solution.

Slide presentation 

From Microsoft FAST to Lucene/Solr – Barcelona

TNR Global presented at the Apache Lucene Eurocon in Barcelona, Spain. Michael McIntosh, VP of Enterprise Search Technologies, spoke on the migration from Microsoft FAST ESP to Lucene/Solr open source search.

View the presentation: Migration from Microsoft FAST to Apache Lucene Solr.

Our White Paper on Microsoft FAST ESP to Lucene/Solr will be released in January 2012. To receive your free White Paper, email your contact information to fast2solr@tnrglobal.com or subscribe here to receive the White Paper and our newsletter on FAST to Lucene Solr Migration.

Microsoft FAST ESP

TNR Global employs Microsoft FAST ESP (Enterprise Search Platform).  Our implementations support custom and standard formats such as text, HTML, XML, and PDF. We have configured, mirrored, scaled and maintained the ESP system in a rigorous production environment both for Linux and SharePoint.

At TNR Global, we implement and customize the Microsoft FAST Search solution to empower our customers to reach their business goals.

Our expertise with FAST ESP

  • Configuring, mirroring and scaling Microsoft FAST ESP systems using various architectural layouts
  • Custom document tagging pipeline stage development for associating database content with web content based upon document URL
  • Custom dependency-based content build and feeding systems
  • Access to low-level undocumented ESP XML-RPC APIs for better integration
  • ESP benchmarking and performance tuning
  • FAST Index XML repartitioning tools for content volume scaling
  • Proven ESP content backup & recovery techniques
  • Handling of extensive or unplanned system changes without impacting service availability
  • Ajax / Web 2.0 ESP-Suggest functionality integration that uses actual ESP query logs
  • Seamless handling of hardware failure through service mirroring and failover modes
  • Expertise with low-level search engine architecture and search/relevancy algorithms

See an example of the Microsoft FAST ESP Search implementation by TNR Global and CMG at ThomasNet.

TNR has worked with the FAST ESP product since 2004, from version FAST Data Search (FDS) 3.2 up through version FAST ESP 5.3. In 2008, FAST Search and Transfer was acquired by Microsoft. It is Microsoft’s plan to use powerful FAST-style technology for its public search engine, Bing. FAST’s flexible and scalable enterprise search platform elevates the search capabilities of enterprise customers and connects people to the relevant information they seek regardless of medium. This drives revenues and reduces total cost of ownership by effectively leveraging IT infrastructure. FAST ESP is known for its scalability, relevancy, and reliability. More than 2,600 customers worldwide use FAST solutions. Contact us for a free consultation.

Microsoft FAST ESP Overview

February 10, 2009
By Michael McIntosh, Senior Search Software Engineer
TNR Global
We use Microsoft FAST ESP to power a large industrial search engine listing over 1 million companies and over 3 million indexed documents and receiving millions of visitors every month. I have been working with ESP since 2003 (then known as FDS 3.2).
Microsoft FAST ESP is extremely flexible and can index many document types (HTML, PDF, Word, etc.). It has a very robust crawler for web documents, and you can use its intermediary FastXML format or its Content APIs to load custom document formats into the system.
One of my favorite parts of the engine is its Document Processing Pipeline, which lets you use dozens of out-of-the-box processing plugins as well as a Python API for writing your own custom document processing stages. One custom stage we wrote looks at a web site URL and tries to identify which company it belongs to, so that additional metadata can be attached to the web document.
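The URL-to-company tagging idea can be sketched in plain Python. This is only an illustration of the lookup logic, not the ESP pipeline API: the lookup table, the field names (company_id, company_name) and the functions candidate_domains and tag_document are all hypothetical, and a document is modeled as a simple dict.

```python
from urllib.parse import urlparse

# Hypothetical lookup table (assumption): domain -> company metadata.
# In practice this would probably come from a database.
COMPANY_BY_DOMAIN = {
    "example.com": {"company_id": "1001", "company_name": "Example Corp"},
}


def candidate_domains(url):
    """Yield the URL's host and each parent domain, most specific first."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Stop before the bare top-level label (for example "com").
    for i in range(len(labels) - 1):
        yield ".".join(labels[i:])


def tag_document(doc):
    """Return a copy of doc with company metadata attached when the URL matches."""
    for domain in candidate_domains(doc.get("url", "")):
        company = COMPANY_BY_DOMAIN.get(domain)
        if company:
            tagged = dict(doc)
            tagged.update(company)
            return tagged
    # No known company for this URL: leave the document unchanged.
    return doc
```

Calling tag_document on a document whose URL is http://www.example.com/products tries www.example.com first, then example.com, and attaches the matching company fields.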
It has a very robust programming/integration SDK in several popular languages (C++/C#/Java) for adding content and performing queries as well as fetching system status and managing cluster services.
ESP has a query language called FAST Query Language (FQL) that is very robust and allows you to do basic Boolean searches (AND, OR, NOT) as well as phrase and term proximity searches. In addition to that, it has something called “scope search” which can be used to search document metadata (XML) that has a format that can vary from document to document.
In terms of performance, it scales fairly linearly. If you benchmark it on one machine, adding a second machine generally doubles performance. You can run the system on one machine (recommended only for development) or on many (for production). It is fault-tolerant (it can still serve some results if one of your load-balanced indices goes offline), and it has full fail-over support (one or more critical machines could die or be taken offline for maintenance and the system will continue to function properly).
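As a rough planning aid, the linear-scaling rule of thumb can be written as a small calculation. This is a sketch under assumptions: the efficiency factor and the sample query rates are hypothetical, and real clusters should be benchmarked rather than extrapolated.

```python
import math


def estimated_qps(single_node_qps, nodes, efficiency=1.0):
    """Estimate cluster throughput, assuming near-linear scaling.

    efficiency is a hypothetical fudge factor (1.0 means perfectly linear).
    """
    return single_node_qps * nodes * efficiency


def nodes_needed(single_node_qps, target_qps, efficiency=1.0):
    """Smallest node count whose estimated throughput reaches target_qps."""
    if single_node_qps <= 0 or efficiency <= 0:
        raise ValueError("single_node_qps and efficiency must be positive")
    return math.ceil(target_qps / (single_node_qps * efficiency))
```

For example, if one machine benchmarks at 50 queries per second, nodes_needed(50, 180) suggests four machines for a target of 180 queries per second.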
So, it's very powerful. The documentation nowadays is pretty good. So, you ask, what are the downsides?
Well, if the data you need to make searchable has a format that changes frequently, that might be a pain. FAST ESP has something called an “Index Profile”, which is basically a config file it uses to determine which document fields are important and should be used for indexing. Everything fed into ESP is a “document”, even if you're loading database table rows into it. Each document has several fields, typical ones being title, body, keywords, headers, documentvectors, processingtime, etc. You can specify as many of your own custom fields as you wish.
If your content maintains mostly the same format (like web documents), it's not a big issue. But if you have to make big changes to which fields should be indexed and how they should be treated, you probably need to edit the Index Profile. Some changes to the Index Profile are “Hot Updates”, meaning you can make the change without interrupting service. But some of the bigger changes are “Cold Updates”, which require a full data refeed and reindexing before the change takes effect. Depending on the size of your dataset and how many machines are in your cluster, this operation could take hours or days. Cold Updates are a pain to schedule unless you have plenty of cash for extra hardware that you can bring online while your production systems are performing a cold update and reloading the data. Having to do that on production clusters more than once or twice a year requires a fair amount of planning to get right with minimal or zero downtime. Learn more about some of the ways we help our customers get the most from their FAST installations.
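To get a feel for why a Cold Update can take hours or days, here is a back-of-the-envelope estimate of refeed time. It is a sketch only: the function name refeed_hours and the feed rate are hypothetical, and it assumes feeding throughput grows in proportion to the number of machines, which has to be measured on a real cluster.

```python
def refeed_hours(doc_count, docs_per_second_per_node, nodes):
    """Estimate hours needed to refeed doc_count documents.

    Assumes (hypothetically) that every node sustains the same feed rate
    and that total throughput is the sum over all nodes.
    """
    if docs_per_second_per_node <= 0 or nodes <= 0:
        raise ValueError("rate and node count must be positive")
    seconds = doc_count / (docs_per_second_per_node * nodes)
    return seconds / 3600


# Example: 3 million documents, an assumed 25 docs/second per node, 4 nodes.
hours = refeed_hours(3_000_000, 25, 4)
print(f"Estimated refeed time: {hours:.1f} hours")
```

With these assumed numbers the refeed takes a little over eight hours; a slower feed rate or a larger dataset quickly pushes the estimate into days.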
 

Open Source Search Solutions

TNR Global provides enterprise search implementation services throughout the entire implementation cycle.
We help evaluate different vendor options, audit existing solutions, implement new solutions, upgrade existing solutions, and provide ongoing support for implemented solutions.

We specialize in Lucene Solr development and implementations. We also have experience with other open source search systems: ElasticSearch for Big Data, SearchBlox, Sphinx, Hadoop, HBase, Lemur/Indri, Nutch, SWISH-E, and OpenFTS. Contact us for a free consultation.
