Lucene – TNR Global

Enterprise Search Summit Fall 2012

Schedule some time to talk with us one on one about a search problem you might be having, at #ESS12.

We are pleased to announce that we will be attending the Enterprise Search Summit in Washington, DC on October 17-19, 2012. The conference will be held at the Renaissance Washington, DC Hotel. The theme for the conference is “Strategies to Hit Your Moving Targets,” and it discusses ‘issues of findability, open source, cloud search, best practices, and other topics of concern to search practitioners.’ From TNR, Michael McIntosh, VP of Search Technology, will be attending, along with Director of Business Development Karen Lynn.

Additionally, we will be attending the inaugural DC Search Meetup Group on Wednesday, Oct 17 from 6:30-8. The topic for the Meetup is “What’s Your Search Story.” We’re looking forward to meeting new friends and colleagues during both events.

If you’d like to schedule some time to talk with us one on one about a search problem you might be having, simply email us at Karen@tnrglobal.com or DM us via Twitter @TNRGlobal. We’ll also be tweeting live from the conference using the hashtag #ESS12.

Introduce yourself if you make it to the conference. We’d be happy to meet you!

White Papers

Research is essential to choosing the right search technology for your organization. With new and complex technologies emerging every day, it can be difficult to navigate the path that will lead you to the right solution. Staying abreast of new tools and technology is a top priority for our team of software developers. We have reported on many tools and search technologies, their components, and their applications in a series of White Papers. Our papers are geared toward the technology professional, but they often include business cases that illustrate how improving search can serve the needs of the business, such as analyzing large sets of data, better site performance, higher customer satisfaction, and an overall healthy bottom line.

To indicate your interest in receiving our White Papers on the following subjects, click the headings below to enter your email. From time to time we may update the paper or send relevant information on the subject, but by no means will we inundate you with email. Our White Papers are free when you sign up.

Thank you for your interest in our work. Please feel free to contact us to discuss your specific needs. We’re happy to discuss your options.

White Papers Published 2012:

Elasticsearch Evaluation

Reporting on Large Data Sets with Elasticsearch

Bridging the Gap: A Migration Path from Fast ESP to Lucene Solr

Museum Collection Search

And if you are just getting started or learning about Enterprise Search on a smaller scale, you can view these White Papers as well, no email necessary.

White Papers Published 2010-2011:

TNR Global to be Gold Sponsors of Lucene Revolution 2012 Boston


Hadley, MA, March 26, 2012. TNR Global announced today that it will sponsor Lucene Revolution Boston this year, held at the Royal Sonesta Hotel in Cambridge, MA.

“We are thrilled to be Gold Sponsors at Lucene Revolution,” said Karen E. Lynn, Director of Business Development. “TNR Global has been a supporter of Apache Lucene/Solr for three years now and we are excited to be a part of the Solr community.”

The conference, which is the largest dedicated to the Apache Lucene/Solr open-source search technology community, is in its 3rd year. TNR has sponsored the conference since 2010. The conference will attract 400+ attendees and will offer training, presentations, a trade show, and opportunities to socialize with the vibrant Lucene Solr community.

The conference will be held May 7-10, 2012.

TNR invites all attendees to stop by the conference table to sign up for the free White Paper on one of our primary focuses: Migrating from Fast ESP to Apache Lucene Solr. Meet one of the authors of the paper and discuss the ways Lucene Solr can power search in the organization. Representatives from TNR will be on hand to meet, chat, and discuss the many advantages of Solr for search.

“We’re looking forward to meeting others in the Solr community. We expect the conference to be one of the high points of our year,” said Lynn.

Fast to Lucene Solr: Choosing a Document Processing Pipeline for Solr

If we want to leverage the power that Solr offers, but we need support for a more robust document processing framework, what are our options?

One of the most powerful features of FAST ESP is its flexible document processing engine. The engine that ships with FAST ESP supports multiple document processing pipelines, each consisting of multiple document processing stages. A document processing stage performs a document processing task and can add, modify, or remove elements from a document before it is passed to the next stage in the pipeline. A simple example of a processing stage would be one that processes a document’s URL element. ESP ships with many processing stages and several processing pipelines out of the box for handling both structured and unstructured documents. The FAST ESP document processing engine also provides a Python plugin API that allows customers to create custom processing stages of their own, a feature we use heavily for our customers’ ESP installations.
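To make the idea of stages and pipelines concrete, here is a minimal sketch in plain Python. It is not the FAST ESP plugin API: the dict-based document, the stage names, and the element names (url, domain, title, raw_html) are all assumptions chosen for illustration. Each stage takes a document, adds, modifies, or removes an element, and passes the document on.

```python
from typing import Callable, Dict, List
from urllib.parse import urlparse

# Assumption: a document is modeled as a plain dict of named elements.
Document = Dict[str, object]
Stage = Callable[[Document], Document]


def url_stage(doc: Document) -> Document:
    """Hypothetical stage that ADDS a 'domain' element derived from 'url'."""
    url = doc.get("url")
    if isinstance(url, str):
        doc["domain"] = urlparse(url).netloc
    return doc


def title_cleanup_stage(doc: Document) -> Document:
    """Hypothetical stage that MODIFIES the 'title' element in place."""
    title = doc.get("title")
    if isinstance(title, str):
        doc["title"] = " ".join(title.split())
    return doc


def drop_raw_stage(doc: Document) -> Document:
    """Hypothetical stage that REMOVES an element no later stage needs."""
    doc.pop("raw_html", None)
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


# A pipeline is just an ordered list of stages.
PIPELINE: List[Stage] = [url_stage, title_cleanup_stage, drop_raw_stage]
```

Calling run_pipeline with a document and PIPELINE returns the same document after every stage has had its turn, which is the behavior the paragraph above describes.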

Unfortunately, Solr does not offer the same robust support for document processing pipelines that ESP does. The ESP processing pipeline is document-centric, while the Lucene Solr platform is field-centric. When a document is fed to ESP for processing, it is routed to processing stages in a processing pipeline that can access document elements generated by previous processing stages. This allows for complex and optimal operations that can leverage previous processing, such as reuse of a previously generated HTML DOM tree structure. When a document is passed to a Solr update handler, the document is broken up into a set of individual fields. Each field can have a set of processors, known as Solr Analysis Filters, that can be chained together for field processing before indexing occurs. While this is fine for content that has been heavily processed before being sent to Solr, individual filters lack the same level of access to other document elements, which makes more complex processing behaviors harder to support.
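The difference between the two models can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not ESP or Solr code: the stage functions, the FIELD_CHAINS table, and the small HTML collector are hypothetical. In the document-centric half, the HTML is parsed once and later stages reuse the result. In the field-centric half, each filter receives only a single field value and cannot see anything else on the document.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List


class _Collector(HTMLParser):
    """Collects the <title> text and all link targets in one parsing pass."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Document-centric: every stage sees the whole document, so the parse
# result produced by one stage can be reused by the stages after it.
def parse_stage(doc: dict) -> dict:
    parser = _Collector()
    parser.feed(doc["html"])
    parser.close()
    doc["parsed"] = parser  # parsed once, kept on the document
    return doc


def title_stage(doc: dict) -> dict:
    doc["title"] = doc["parsed"].title.strip()
    return doc


def links_stage(doc: dict) -> dict:
    doc["links"] = list(doc["parsed"].links)
    return doc


# Field-centric: each filter is a function of ONE field value only.
def trim(value: str) -> str:
    return value.strip()


def lowercase(value: str) -> str:
    return value.lower()


# Hypothetical per-field filter chains, loosely analogous to analysis chains.
FIELD_CHAINS: Dict[str, List[Callable[[str], str]]] = {"title": [trim, lowercase]}


def analyze_fields(fields: Dict[str, str]) -> Dict[str, str]:
    """Run each field through its own chain, in isolation from other fields."""
    result = {}
    for name, value in fields.items():
        for field_filter in FIELD_CHAINS.get(name, []):
            value = field_filter(value)
        result[name] = value
    return result
```

Note that nothing in analyze_fields lets the title chain consult the body or the parsed HTML, which is the limitation described above.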

Another difference between ESP and Solr platforms is that ESP’s document processing architecture allows it to be scaled independently from its indexing architecture. ESP’s document processing architecture is fully decoupled from its indexing architecture and is designed out-of-the-box to take advantage of multiple processor cores per machine and multiple document processor machines per cluster. Solr’s out-of-the-box document processing architecture is tightly coupled with its indexing architecture, making it difficult to independently scale Solr’s content processing capacity without adding the complexity and overhead of additional Solr services and Lucene indexes. When we work with multiple terabyte document sets, we find content processing tends to be the biggest bottleneck, so being able to scale content processing ability separately from indexing is mission critical.
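As a rough sketch of what decoupling means in practice, the Python below keeps content processing on its own worker pool, sized independently of whatever consumes the processed documents. Everything here is an assumption made for illustration: process stands in for a full pipeline, InMemoryIndexer stands in for a real index feed client, and a thread pool stands in for what would more likely be separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


def process(doc: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical processing step standing in for a full pipeline."""
    out = dict(doc)
    out["body"] = " ".join(doc["body"].split())  # normalize whitespace
    return out


class InMemoryIndexer:
    """Stand-in for an index feed client; it only records what it receives."""

    def __init__(self) -> None:
        self.indexed: List[Dict[str, str]] = []

    def add(self, doc: Dict[str, str]) -> None:
        self.indexed.append(doc)


def feed(docs: Iterable[Dict[str, str]], indexer: InMemoryIndexer, workers: int = 4) -> int:
    """Process documents on a worker pool, then hand results to the indexer.

    The pool size is the only knob for processing capacity; the indexer
    is untouched when that number changes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for processed in pool.map(process, docs):  # map preserves input order
            indexer.add(processed)
    return len(indexer.indexed)
```

The design point is that the workers argument can grow without adding indexers. For CPU-bound stages in Python, a process pool or separate processing machines would be the more realistic choice than threads.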

If we want to leverage the power that Solr offers, but we need support for a more robust document processing framework, what are our options? During the course of our research we discovered quite a number of content processing frameworks we can choose from. Some of the options currently available include, but are not limited to, OpenPipeline, OpenPipe, Pypes, UIMA, SMILA, Apache Commons Pipeline, Piped, Behemoth, and Cascading.

Most of these frameworks are written in Java, which gives them access to an incredibly broad and diverse spectrum of Java libraries. Since Solr and Lucene are also written in Java, it might make a lot of sense to favor a Java processing framework if you are starting from scratch, especially if you are more comfortable with Java as a programming language.

Since our clients tend to have highly customized document processing pipelines with many custom FAST ESP Python processing stages, we are heavily biased towards choosing a framework that minimizes the amount of code that would need to be migrated. Many of the available processing frameworks are written in Java, which would be fine if you prefer using Java and don’t have a large amount of working Python code to migrate. For our use cases, that made the decision of which framework to choose incredibly simple: we chose Pypes for our migration solution.

For a full report on how we use Pypes for a Document Processing Engine including sample code, sign up for our free FAST to Lucene Solr White Paper here.

We’re at Apache Lucene EuroCon in Barcelona 2011

“We’re certain that the urgency to migrate off FAST ESP will be ramping up significantly.”

We’re very excited to be in attendance at the Apache Lucene EuroCon in Barcelona, October 17-20, 2011. Our own Michael McIntosh, VP of Search Technologies, will be presenting a talk on October 19th, Enterprise Search: FAST ESP to Lucene Solr. The good folks at Lucid Imagination are presenting the conference and will be video recording his talk for future broadcast.

After the conference, Michael will author a White Paper on migrating from FAST ESP to Lucene Solr, expected in November 2011. For a free copy of the White Paper, email us expressing your interest at fast2solr@tnrglobal.com. We believe that those businesses operating on a Linux system will be seeking out the power of Lucene Solr as their licenses expire and support for FAST ESP dries up. We’ve worked with FAST ESP for 7 years and understand its strengths and weaknesses. We know businesses that are used to the power of FAST ESP will need something just as powerful, and Lucene Solr is a very nice fit. “It’s a robust platform, capable of a lot that FAST ESP covers,” said Michael. “We’re certain that the urgency to migrate off FAST ESP will be ramping up significantly.”

FAST ESP to Lucene Solr Presentation: Open Call for Questions

To pre-load the discussion on Michael’s Enterprise Search: FAST ESP to Lucene Solr talk, send your questions to fast2solr@tnrglobal.com. We want to hear from you!

TNR Global is excited to be participating in the Apache Lucene EuroCon conference in Barcelona. Our own Michael McIntosh is scheduled to present “Enterprise Search: FAST ESP to Lucene Solr.” Here is your chance to pre-load the discussion. Before Michael puts the final touches on his talk, he wants to know what issues or questions you may have. In the following video, he touches on some of the highlights of his upcoming talk and asks for your input.

Video: Enterprise Search: FAST ESP to Lucene Solr (pre-conference video)

To participate in advance, send your questions or comments to fast2solr@tnrglobal.com. While Michael cannot promise he will include your question or commentary in his actual talk, he will work to address them in an upcoming White Paper, to be released after the conference in November 2011. We look forward to hearing from you!

TNR Global to present at Apache Lucene Eurocon 2011 in Barcelona

We are happy to announce that TNR Global’s own Michael McIntosh will be presenting at the Apache Lucene Eurocon 2011 in Barcelona this October. Michael’s talk is titled “Enterprise Search: FAST ESP to Lucene Solr.” His presentation will discuss migration from the FAST ESP platform to a Lucene Solr search platform. There are many reasons an IT department with a large scale search installation would want to move from a proprietary platform to Lucene Solr. In the case of FAST Search, the company’s purchase by Microsoft and the discontinuation of the Linux platform have created an urgency for FAST users. Illustrated through actual case studies, the presentation will cover challenges and concerns and present solutions and work-arounds to overcome migration issues.

Michael has more than 16 years of experience in large scale systems design and operation, online consumer product development, high volume transaction processing and engineering management. He has extensive experience developing, integrating and maintaining search technology solutions for companies such as FAST Search and Lycos.

We’re excited that Michael will be presenting in Barcelona this fall. Please introduce yourself if you’re able to go!

Migration from FAST ESP to Lucene Solr

Download the presentation and see the video.

Michael McIntosh, Vice President of Enterprise Search Technologies at TNR, spoke at the Lucene Revolution conference in Boston, MA October 7-8, 2010. Michael reviewed the migration from FAST ESP to Lucene/Solr open source search. He discussed approaches to identifying core content areas of HTML documents, such as Text-To-Tag Ratio Heuristics and Page Stereotype/Site Template Analysis, reviewed specific use cases that we have encountered as search integration experts, and discussed available tools.

TNR Global was a sponsor of Lucene Revolution. The conference gathered over 400 professionals from the enterprise search industry. We were happy to see so much interest in Lucene/Solr open source search, and to get to know and learn from the folks who have done large scale implementations, including Twitter, LinkedIn, and eHarmony. Not surprisingly, there was a lot of interest in migration from proprietary search systems to Solr, especially from FAST ESP due to Microsoft’s discontinuing FAST ESP support for Linux. If you would like to learn more about how a migration from FAST ESP to Lucene Solr can benefit your company, contact us for a free consultation.

Lucene Solr Services

Solr is a powerful open source, scalable, cross-platform search engine. Solr has high-performance features comparable to those of proprietary search engines, such as faceted search, full text search, rich document handling, and dynamic clustering. TNR Global is a regular presenter at Lucene Solr conferences worldwide and an active member of the Solr open source community. Since Solr is open source technology, the source code is free. Contact us to implement and integrate this robust search engine into your organization.

TNR Global offers Lucene Solr consulting and integration services for:

Software or SaaS System developers
IT/MIS system administrators
Corporate data administrators
Current Linux based FAST ESP users
Marketing departments of content intensive web sites

Our Services with Lucene Solr:

We integrate commercial-grade Lucene Solr products through our partners at Lucid Imagination. We also develop tailor-made solutions using Lucene Solr for the following:

Crawling web resources: pages and documents, forums, blogs
Content processing and conversion
Content enhancement and extraction
PDF search by page
Alternative search for SharePoint, email
Search and database integration
Audits and Upgrades to your current Solr installation

We leverage the power of Lucene Solr combined with the latest content enhancement approaches to provide more diversified search service offerings for clients. Years of hands-on experience give us an advantage; we understand the issues and the subtle nuances of data. Contact us for a free consultation.