Building for Enterprise Search: A Systems View, Part 2

“It’s important to incorporate expected behaviors into modeling and monitoring on both applications and systems sides and how they interact with one another.”

When we left off, Michael Klatsky, VP of Systems Administration, was telling me how important communication between the systems side and the search side is to developing an enterprise search solution. The process of building, testing, monitoring, adjusting, more testing, and more monitoring ensures systems function the way they are intended to function. Let’s resume our conversation where Michael discusses the tools he uses to ensure the system he’s building works the way the client wants it to. This is the second portion of a two-part blog post.
*********************************************************************************************************************
Tools for BDD: Part 2

Karen: It’s sounding like the Search Team and Sys Admin Team need to have a good relationship and communicate often to ensure the system will accommodate the work the search team does.

Michael: Yes, search sometimes has to construct their scripts to conform to systems. Testing is run on both sides, but small changes on one side can affect others down the line, so it’s important to incorporate expected behaviors into modeling and monitoring on both the applications and systems sides, and how they interact with one another.

Karen: How do you make sure that happens?

Michael: We’re exploring some tools to help us make sure the machine will act just as we expect it to, like Cucumber and cucumber-nagios. We’re exploring Cucumber for basic modeling and for testing. Cucumber is cool for testing because it returns results to you in colors: red means it failed, yellow means there’s a problem, and green means it’s good. According to their docs, you should “keep running it until it’s a cucumber.”

Karen: Ah, I get it.

Michael: Right. And what cucumber-nagios does is take Cucumber and let you create a Nagios monitoring check script. So if you pass, great; if you get red, Nagios will throw an alert to the systems administrator, so we have an opportunity to fix it before more is built.
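Nagios check scripts follow a simple exit-code contract: exit 0 means OK, 1 means WARNING, and 2 means CRITICAL. cucumber-nagios wraps a Cucumber feature run in that same contract. As a rough sketch of the idea in Python (this is not TNR’s actual check; the endpoint URL and threshold are hypothetical):

```python
#!/usr/bin/env python
"""A minimal Nagios-style check: exit 0 = OK, 1 = WARNING, 2 = CRITICAL."""
import sys
import time
import urllib.request

PING_URL = "http://search.example.com:8983/solr/admin/ping"  # hypothetical endpoint
WARN_SECONDS = 2.0                                           # hypothetical threshold

def main():
    start = time.time()
    try:
        with urllib.request.urlopen(PING_URL, timeout=10) as resp:
            status = resp.status
    except Exception as exc:
        print("CRITICAL - search endpoint unreachable: %s" % exc)
        sys.exit(2)
    elapsed = time.time() - start
    if status != 200:
        print("CRITICAL - unexpected HTTP status %d" % status)
        sys.exit(2)
    if elapsed > WARN_SECONDS:
        print("WARNING - ping took %.2fs" % elapsed)
        sys.exit(1)
    print("OK - ping took %.2fs" % elapsed)
    sys.exit(0)

if __name__ == "__main__":
    main()
```

Nagios runs a script like this on a schedule and alerts the on-call administrator whenever it exits non-zero.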

Karen: Sounds like it’s an attentive way to build a system.

Michael: The only way to scale is to have machines do things for themselves. That’s the way to do it.

Karen: To automate.

Michael: Yes. Automation. Not just to set things up to do configuration management automatically beforehand, but to test afterwards to determine that your machine is behaving just as you (and your client) envisioned.

For more information on how you can plan your enterprise search in cooperation with your systems administration team, contact us for a free consultation.

Open Source Search: Isn’t It Expensive?

You’ve heard the debate on open source search vs. proprietary search. One question that constantly comes up for prospective clients is “What’s all this going to cost me?”

In these times, it’s a good question. Because proprietary search comes in neatly packaged, practically shrink-wrapped plans, it’s much easier to discern how much you will spend on a solution. But how much will it actually cost? That’s an entirely different question.

I see you cocking your head sideways.

Proprietary search has hidden costs. What if the software doesn’t perform the way you need it to? Does the software understand the nuances of your business? How adaptable is it? How much will it cost to adapt that software to get it to perform the way your business needs it to? Questions like these need to be asked, and answered. Eventually you will ask yourself: why am I paying for all of this? And your developer will ask, “Why can’t I access the source code?”

What I’m getting at is this: it is reassuring for a customer to see what a package costs, to understand what services come with a solution, and to anticipate what the licensing fee will cost on an annual basis. If it’s your job to research a solution and present findings to your executive team for a decision, then proprietary search, on the surface, seems the more secure choice. But rarely, if ever, are these solutions a perfect fit for the customer. It’s like buying a Ferrari, with all the brand recognition and polish a Ferrari offers, and never driving it past second gear, or cutting the wheel more than 15 degrees, or getting a chance to have your trusted mechanic look under the hood. This is why open source is such a good solution for businesses that want their IT to move quickly.

We’re hearing more buzz about companies waking up to the agility of an open source solution. Most recently, after the acquisition of Autonomy by HP, the industry is telling stories of ex-Autonomy customers migrating to Solr (open source search) with only the annual licensing budget to finance the migration. With no annual outlay of cash for licensing, and the freedom of not being bound by a licensing agreement, companies quickly recoup the initial expenditure of a migration.

What kind of car does your company drive?

If you are examining the different choices for implementing search technology in your organization, contact us.  We’re happy to talk to you about the best solution for your business.


Migration Still Looms Large on the Horizon for FAST ESP Customers

“Designing a non-trivial search solution to fully meet your needs from scratch is hard enough on its own. If you are migrating an existing solution, it is very unlikely that you will find a one to one mapping of all of the features in a new search engine that you have come to depend upon with your existing implementation.” –Michael McIntosh, VP of Search Technologies, TNR Global, LLC

Microsoft acquired FAST all the way back in 2008, and in early 2010 disclosed its plans to stop updating the FAST product on the Linux operating system after 2010, making FAST ESP 5.3 the latest, greatest, and very last update Linux users will see to the proprietary search platform. It was clear to anyone on Linux that a migration would need to occur, and as content grows, depending upon the size of your organization, that migration should probably happen sooner rather than later.

Buzz about migration ensued, an inevitability for many companies, especially ones with huge amounts of data. But how many companies have jumped in with both feet? I had the opportunity to speak with an open source search engine expert who, along with the industry, believed that the move by Microsoft was a windfall for anyone in the business of enterprise search design and implementation. However, she admitted, “we haven’t seen as large a response as we expected.”

This isn’t exactly surprising to everyone. “It’s coming,” says our VP of Search Technologies, Michael McIntosh. “Corporations have an enormous investment in FAST ESP, and it makes sense that they would be reluctant to move to something new until they absolutely have to.” That means when their licenses expire.

“They will likely weigh the performance and support, or lack thereof, for the FAST ESP technical team with the timing of renewing a license and wait until they absolutely have to change to something else,” says McIntosh.

The purchase of Autonomy, and HP’s shift from hardware to software, could signal a recognition from Goliath HP of the kind of growth opportunity enterprise search software offers, and that the “great shift” from FAST ESP to another search platform is very much on the horizon.

But as the clock continues to tick, companies using FAST ESP should be strategizing for migration now. “It’s an enormous undertaking to migrate an entire search solution from FAST to another platform. Designing a non-trivial search solution to fully meet your needs from scratch is hard enough on its own. If you are migrating an existing solution, it is very unlikely that you will find a one to one mapping of all of the features in a new search engine that you have come to depend upon with your existing implementation. Solving challenging issues like that requires both creativity and expertise to address your needs,” says McIntosh. If the need for migration is imminent, there will be a real need for expertise in the field of enterprise search on both proprietary and open source platforms, depending upon several factors like size, in-house talent, and growth expectations.

How is your company preparing for the discontinuation of support of FAST ESP?  Need guidance?  Contact us for pointers, analysis, or architecture for a full migration.

TNR Global to present at Apache Lucene Eurocon 2011 in Barcelona

We are happy to announce that TNR Global’s own Michael McIntosh will be presenting at Apache Lucene Eurocon 2011 in Barcelona this October. Michael’s talk is titled “Enterprise Search: FAST ESP to Lucene Solr.” His presentation will discuss migration from the FAST ESP platform to a Lucene Solr search platform. There are many reasons an IT department with a large scale search installation would want to move from a proprietary platform to Lucene Solr. In the case of FAST Search, the company’s purchase by Microsoft and the discontinuation of the Linux platform have created an urgency for FAST users. Illustrated through actual case studies, the presentation will cover challenges and concerns, and present solutions and workarounds to overcome migration issues.

Michael has more than 16 years of experience in large scale systems design and operation, online consumer product development, high volume transaction processing and engineering management. He has extensive experience developing, integrating and maintaining search technology solutions for companies such as FAST Search and Lycos.

We’re excited that Michael will be presenting in Barcelona this fall.  Please introduce yourself if you’re able to go!

Chris Miles Joins TNR Global as Senior Software Engineer

HADLEY, MA (July 18, 2011): TNR Global is pleased to welcome Chris Miles in the role of Senior Software Engineer on TNR’s Search Team. Miles will be responsible for designing and implementing custom enterprise search software for TNR’s clients. Among other projects, Chris will develop solutions for one of TNR’s largest clients, a publisher of manufacturing parts and vendor information. He is proficient in Java, C++, C, Ruby, PHP, and CSS.

Prior to joining TNR Global, Miles was a Senior Systems Analyst at Cooley Dickinson Hospital. He has held consulting roles for CarePaths Inc., and was a Senior Developer for Miller Samuel Inc.

“We’re excited about the addition of Chris to TNR. He brings a wealth of software development expertise to our team,” said Michael McIntosh, VP of Search Technologies of TNR Global, LLC.

Join us as we extend Chris a warm welcome to the TNR team!

Continuous Integration for Large Search Solutions

Managing large projects takes a smart approach and some intuitive thinking. One project we are currently engaged in is with a large publisher of manufacturing parts. It has been an extraordinary project due to its scale and ever-changing scope. I spoke with our VP of Enterprise Search Technologies, Michael McIntosh, about how TNR Global handles complex projects.

Karen: This project is a big one. Tell me more about the site’s function. What is the focus?

Michael: Product search is the focus. The site contains tens of millions of documents, both structured and unstructured content. They also have a huge amount of data provided by the advertisers and the companies themselves on the products they sell. One of the advantages we have over a search engine like Google is access to a vast amount of proprietary data provided by the vendors themselves.

Karen: Tell me about how you are managing the project.  What are some of the variables you work with?

Michael: With this particular project, we are dealing with many different data feeds. There are many different intermediary metadata stages we have to generate to support the final searchable content. The client also changes their business logic frequently enough that if a month or more passes between data builds, it’s likely something has changed. For instance, they might have changed an XML format or added an attribute to an element in the data feed that will break something else down the line. The problem is there are so many moving parts that it’s almost impossible to do it manually and always do it correctly.
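One lightweight guard against that kind of silent format drift is to validate each feed against the attributes the pipeline expects before a build starts. Here is a minimal sketch of the idea; the <product> element and its required attributes are hypothetical, not the client’s actual schema:

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical contract: every <product> element must carry these attributes.
REQUIRED_ATTRS = {"id", "category", "vendor"}

def validate_feed(path):
    """Return a list of problems found in one XML data feed."""
    problems = []
    tree = ET.parse(path)
    for i, product in enumerate(tree.getroot().iter("product")):
        missing = REQUIRED_ATTRS - set(product.attrib)
        if missing:
            problems.append("product #%d missing attributes: %s"
                            % (i, ", ".join(sorted(missing))))
    return problems

if __name__ == "__main__":
    issues = validate_feed(sys.argv[1])
    for issue in issues:
        print(issue)
    # A non-zero exit lets a build tool fail the build before bad data spreads.
    sys.exit(1 if issues else 0)
```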

Karen: What other kinds of business logic changes are you dealing with on top of the massive amounts of raw data?

Michael: Most of the business logic changes come when they need to modify how something behaves based on new data that’s available, or when they need to start treating the data in a different way. Sometimes there is a change in the way they want the overall system to behave. They also have some classification rules for content that they like to tweak occasionally.

Another thing we consider is the client’s relevancy scoring and query pre-processing rules. You need to consider: if you issue a query and it fails, what happens then? What kind of fallback query do you use? All these things are part of the business logic that is independent of the raw data. In summary, we have the raw data, and we can do a number of things with it. They often want us to change exactly what we’re doing with it, how we’re conditioning it, and how we’re transforming it. We either tweak what exists or take advantage of new data that they’ve started including in their data feeds. The challenge is that all these elements can change frequently.
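To make the fallback idea concrete, here is a minimal sketch against Solr’s standard /select handler; the host, core name, fields, and relaxation strategy are all illustrative assumptions, not the client’s actual rules:

```python
import json
import urllib.parse
import urllib.request

SOLR_SELECT = "http://localhost:8983/solr/products/select"  # hypothetical host and core

def solr_query(q, **params):
    """Run one query against Solr's /select handler, returning parsed JSON."""
    params.update({"q": q, "wt": "json"})
    url = SOLR_SELECT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def search_with_fallback(user_query):
    """Try a strict phrase match first; relax the query if nothing matches."""
    strict = solr_query('name:"%s"' % user_query)
    if strict["response"]["numFound"] > 0:
        return strict
    # Fallback: a looser eDisMax query across several fields.
    return solr_query(user_query, defType="edismax", qf="name description")
```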

Karen: This site is more of a portal than strictly an enterprise search project, isn’t it?

Michael: Yes. Enterprise search usually refers to searching for documents within an organization. This client runs a public-facing search engine that allows the public to perform product search across a very large number of vendors and service providers.

Changes come from their advertisers and the data they provide. Advertisers come and go. People pay for placement within certain industrial categories. It’s not like we get a static list of sites to crawl and that’s that; the list of sites we crawl changes weekly, sometimes daily. Things also need to be purged from the index. Say an advertiser’s contract ends and suddenly we need to stop crawling a site with thousands of documents; that data needs to be purged from the index promptly. So not only do we have to crawl new sites, but we have to purge old ones as well. This project is so massive that it’s not cut and dried. A lot of software development projects focus on a clear-cut problem: come up with a plan, tackle it, release it, and then maintain it. We’re constantly getting new information and learning new things about the people hitting the site.
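In Solr, that kind of purge is typically a delete-by-query posted to the update handler. A minimal sketch, assuming a hypothetical core name and a site field that records where each document was crawled from:

```python
import urllib.request

# Hypothetical core; commit=true makes the deletion visible immediately.
SOLR_UPDATE = "http://localhost:8983/solr/products/update?commit=true"

def purge_site(site_domain):
    """Delete every document crawled from one site, then commit."""
    body = "<delete><query>site:%s</query></delete>" % site_domain
    req = urllib.request.Request(
        SOLR_UPDATE,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status  # 200 on success

# e.g. when an advertiser's contract ends:
# purge_site("example-vendor.com")
```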

Karen: It sounds like this project is always in a state of ongoing development.

Michael: We are building something that’s never been built before. One of the goals is to make this site remarkable. And we’re very excited to be a part of that. The scale of the project is quite big though, which is why we started using Continuous Integration.

The way our cycles work is that we perform big data updates, but by using CI, we can continuously update and integrate new data. We’re moving to a place where, by using the practice of CI, we can perform daily builds, which gives us the time we need to fix problems before the data absolutely needs to be live.

Karen: How do you implement CI into your day to day management of the project?

Michael: There are some pretty great open source tools that we’re using to implement CI. We use Jenkins to help us do Continuous Integration for frequent data builds, which is an intensive process for this particular client.

We field questions from the client about the status of different data builds. We hope to use Jenkins in conjunction with other tools to build data automatically and to have event-triggered data builds, with Jenkins generating reports as the data is being built. Each time we run a build script, if the output differs from the previous build, Jenkins makes it easy to see that something is different. There is a way to format your output so that Jenkins can understand it. One of the cool things about Jenkins is its graphs, which illustrate differences and help us identify issues that could pose a potential problem, so we can fix them before we need to go live with the data.
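There are several ways to make a data build legible to Jenkins; one common pattern is to emit a small CSV of build metrics (which plotting plugins such as the Jenkins Plot plugin can graph across builds) and to exit non-zero when something looks wrong. The file names, environment variable, and threshold below are hypothetical:

```python
import csv
import os
import sys

COUNT_FILE = "build_stats/record_count.txt"  # written earlier in the build (hypothetical)
PLOT_CSV = "build_stats/record_count.csv"    # a plot-friendly artifact for Jenkins
MAX_SWING = 0.10                             # hypothetical threshold: fail on a >10% jump

def main():
    current = int(open(COUNT_FILE).read().strip())

    # Emit a tiny CSV of build metrics; a plotting plugin can graph
    # this value across successive builds.
    with open(PLOT_CSV, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["records"])
        writer.writerow([current])

    # The previous build's count, injected by the job configuration (hypothetical).
    previous = int(os.environ.get("PREVIOUS_COUNT", "0"))
    if previous and abs(current - previous) / previous > MAX_SWING:
        print("Record count moved %d -> %d; failing the build for review"
              % (previous, current))
        sys.exit(1)  # a non-zero exit marks the Jenkins build as failed
    print("Record count %d looks sane" % current)

if __name__ == "__main__":
    main()
```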

Karen: Any other tools?

Michael: For multi-node search clusters, we’re using a tool called fabric3 that uses SSH to copy data and execute scripts across multiple nodes of a cluster based upon roles. We have a clever setup where we’re able to tell fabric3 which services are running on each node in our cluster and link actions or commands to certain tasks, like building metadata. By linking them, the tasks automatically know which nodes to deploy data to.
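In Fabric-style tooling (fabric3 preserves the Fabric 1.x API), that role mapping looks roughly like this; the host names, paths, and service names are hypothetical:

```python
from fabric.api import env, roles, run, put

# Hypothetical cluster layout: which hosts play which roles.
env.roledefs = {
    "index":  ["node1.example.com", "node2.example.com"],
    "search": ["node3.example.com", "node4.example.com"],
}

@roles("index")
def deploy_metadata():
    """Copy freshly built metadata to every index node and unpack it."""
    put("build/metadata.tar.gz", "/opt/search/metadata.tar.gz")
    run("tar -xzf /opt/search/metadata.tar.gz -C /opt/search/metadata")

@roles("search")
def restart_search():
    """Restart the search service on every search node."""
    run("sudo service search restart")
```

Running `fab deploy_metadata` would then apply the task to every host assigned to the index role.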

Using open source tools like Jenkins and fabric3 makes the large number of moving parts a lot more manageable. It’s allowed us to be successful in building this incredible site and keeping the search function relevant, accurate, and up to date.

Migration from FAST ESP to Lucene Solr – Presentation

Download the presentation and see the video.

Michael McIntosh, Vice President of Enterprise Search Technologies at TNR, spoke at the Lucene Revolution conference in Boston, MA, October 7-8, 2010. Michael reviewed the migration from FAST ESP to Lucene/Solr open source search. He discussed approaches to identifying core content areas of HTML documents, such as Text-To-Tag Ratio Heuristics and Page Stereotype/Site Template Analysis, reviewed specific use cases that we have encountered as search integration experts, and discussed available tools.
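To give a flavor of the first approach: a text-to-tag ratio heuristic scores each line of an HTML document by how much text it carries relative to its markup, so content-rich lines score high while navigation and template boilerplate score low. A minimal sketch, not the implementation discussed in the talk:

```python
import re

TAG = re.compile(r"<[^>]*>")

def text_to_tag_ratios(html):
    """Score each line of an HTML document by its text-to-tag ratio.

    High-ratio lines carry mostly prose and are likely core content;
    low-ratio lines are dominated by markup (navigation, templates).
    """
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG.findall(line))
        text_len = len(TAG.sub("", line).strip())
        # A line with no tags is pure text; avoid dividing by zero.
        ratios.append(text_len / max(tag_count, 1))
    return ratios
```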


Apache Solr is a winner of this year’s Infoworld BOSSIE Award

Apache Solr is a winner of this year’s Infoworld BOSSIE Award for the best open source applications. Other winners included Alfresco, WordPress, and Drupal, so Solr is in good company. This award once again indicates that Solr has proved itself to be a robust search application.

http://www.infoworld.com/d/open-source/bossie-awards-2010-the-best-open-source-applications-150&current=10&last=1#slideshowTop

Michael McIntosh, TNR Global, speaking at Lucene Revolution – The First Conference Dedicated to Open Source Enterprise Search

Michael McIntosh, Vice President of Enterprise Search Technologies at TNR, will be presenting at the Lucene Revolution conference in Boston, MA, October 7-8, 2010.

Michael will review the migration from commercial search platforms (focusing on FAST ESP) to Lucene/Solr open source search. He will discuss approaches to identifying core content areas of HTML documents, such as Text-To-Tag Ratio Heuristics and Page Stereotype/Site Template Analysis, and will review specific use cases that we have encountered as search integration experts and discuss available tools.

For more information see:
http://lucenerevolution.com/speakers-bios#McIntosh

TNR Global to present at Lucene Revolution Conference

TNR Global has been selected to present at the Lucene Revolution conference in Boston, MA, October 7-8, 2010. Michael McIntosh, Vice President of Enterprise Search Technologies at TNR, will speak on Friday, October 8th about migration from commercial search platforms (focusing on FAST ESP) to Lucene/Solr open source search. He will discuss approaches to identifying core content areas of HTML documents, such as Text-To-Tag Ratio Heuristics and Page Stereotype/Site Template Analysis, review specific use cases that we have encountered as search integration experts, and discuss available tools.

Lucene Revolution is the first conference dedicated to open source search in North America. The two-day conference is packed with technical sessions, developer content, user case studies, panels, and networking opportunities. Attendees will learn new ways to develop, deploy, and enhance search applications using Lucene/Solr. For more information, and to register, visit http://www.lucenerevolution.com.