Posts Tagged ‘open source’

TNR Global Launches Search Application for Museum Collections

by Karen Lynn

TNR Global Launches Search Application for Museum Collections

TNR Global is launching the alpha version of a search application designed specifically for museum collections. Museum Collections Search is any application for digitally searching a museum’s collection. This can be made available to the public or used by only the internal staff for curation, and can be made available to a selected professional or research audience. Our White Paper explains the application in more detail.

Collections Search adds tremendous value to the research community and is often in line with the educational mission of many museums. A search feature is a resource for students and researchers, and can expand the overall audience by reaching people separated by distance or with limited physical mobility. When the public finds items in your collection like historic letters, photographs of items, and other catalog items through you search function, it can increase interest and traffic to the museum’s site and physical collection.

While the ability to search a museum collection for is a way to bring immense value to the museum and the community that supports it, intellectual property is an ongoing concern for curators within the museum community. TNR Global recognizes this and has technologies to address access to material. When setting up a search, ease of use or understanding and responsiveness are addressed, also issues of ownership or privacy all combine to determine the search technology chosen and how it is applied. The search realm and results can be tailored based on the user. By defining the audience (or audiences) for the collection and the search, we can structure the presentation of the results. The public view can be a more restricted display, while protected view and can be more expansive and detailed.

We use open source search technology that works with most museum software systems and databases including the popular museum software product PastPerfect. We customize our search solution specifically for your collection and optimize search results for the most relevant results to queries.

TNR Global has a long history with the museum community. Our CEO is Principal of the organization We Love Museums and is a member of dozens of museums worldwide. He is involved with a number of archival and curatorial indexing projects. He has merged his lifelong career in database and web technology with his passion for art, education, and history with creating a search solution to benefit museums and their patrons. To get started, contact us for an evaluation of your Museum Collection Search Project today!

Elasticsearch Evaluation White Paper Released: Promising for Big Data

by Karen Lynn

There are many new technologies emerging around search, and we’ve been investigating several of them for our clients. Search has never been “easy” but Elasticsearch attempts to make it at least easier. Elasticsearch is billed to be “built for the cloud,” and with so many companies moving into the cloud, it seems like a natural that search would move there too.  This paper is designed to show you just how Elasticsearch works by setting up a cluster and feeding it data.  We also let you know what tools we use so you can test out the technology and we include a rough sketch of code as well. Finally, we make conclusions about how Elasticsearch can help with problems like Big Data and other search related uses.

Elasticsearch is an open source technology developed by one developer, Shay Bannon. This paper is simply a first look at elasticsearch and is not associated with an additonal product or variation of elaticsearch. The appeal for big data is due to elasticsearch’s wonderful ability to scale with growing content, which has largely been associated with the “big data problem” we all keep hearing about. It’s very easy to add new nodes and it handles the balancing of your data across the available nodes. It handles the failure of nodes in a graceful way that is important in a cloud environment. And lastly, we simply evaluate and test the technology. We really don’t believe there is a one size fits all technology in the realm of enterprise search, it is really highly dependent upon your systems, how many documents you have, how much unstructured data you have, and how you want your site to function. But that said– in terms of storing big data, it is as capable as any Lucene based product; it can handle a much larger load that the current Solr release as the notion of breaking the index up into smaller chunks is “baked in” to the product.

Here is an except from the paper:

“Products like Elasticsearch that lack a document processing component entirely become more attractive. In fact, most projects that involve a data set large enough to qualify as “big data”³² are building their own document processing stages anyway as part of their ETL cycle.”

If you are interested in downloading this free White Paper, sign up with us here.

If you would like help using Elasticsearch with your search project, contact us.

Open Source Search: Isn’t It Expensive?

by Karen Lynn

You’ve heard the debate on open source search vs. proprietary search. One question that constantly comes up for prospective clients is “What’s all this going to cost me?”

In these times, it’s a good question. Because proprietary has neatly packaged, practically shrink wrapped plans, it’s much easier to discern how much you will spend on a solution. But how much will it cost? That’s an entirely different question.

I see you cocking your head sideways.

Proprietary search has hidden costs. What if the software doesn’t perform the way you need it to? Does the software understand the nuances of your business? How adaptable is it? How much will it cost to adapt that software to get it to perform the way my business needs it to? Questions like this need to be asked, and answered. Eventually you will ask yourself….why am I paying for all of this? And your developer will ask, “why can’t I access the source code?”

What I’m getting at is this: It is a reassuring feeling for a customer to see what a package costs, to understand what services you will get with a solution, and to anticipate what the licensing fee will cost on an annual basis. If it’s your job to research a solution and present findings to your executive team to make a decision, then proprietary search, on the surface, seems a more secure choice. But rarely, if ever, are these solutions a perfect fit for the customer. It’s like buying a Ferrari, with all the brand recognition and polish a Ferrari offers, and not ever driving it past second gear, or cutting the wheel more than 15 degrees, or getting a chance to have your trusted mechanic look under the hood. This is why open source is such a good solution for businesses who want their IT to move quickly.

We’re hearing more buzz about companies waking up to the agility of an open source solution. Most recently, with the acquisition of Autonomy by HP, the industry is telling stories of ex Autonomy customers migrating to Solr (open source search) with only the annual licencing budget to finance the migration. Without an annual expenditure of cash for licensing, and the freedom of not being under a licensing agreement, companies quickly recoup the initial expenditure of a migration.

What kind of car does your company drive?

If you are examining the different choices for implementing search technology in your organization, contact us.  We’re happy to talk to you about the best solution for your business.


Migration Still Looms Large on the Horizon for FAST ESP Customers

by Karen Lynn

Microsoft acquired FAST all the way back in 2008 and then in early 2010 disclosed it’s plans to stop updating the FAST product on a Linux operating system after 2010, making FAST ESP 5.3 the latest and greatest, and very last update Linux users will see involving any improvements to the proprietary search platform. It was clear to anyone on Linux that a migration would need to occur, and as content grows, depending upon the size of your organization, that migration should probably happen sooner than later.

Buzz about migration ensued–an inevitable certainty for many companies, especially ones with huge amounts of data. But how many companies have jumped in with both feet? I had the opportunity to speak with an open source search engine expert who, along with the industry, believed that the move from Microsoft was a windfall for anyone in the business of enterprise search design and implementation. However, she admitted “we haven’t seen as large a response as we expected.”

This isn’t exactly surprising to everyone. “It’s coming” says our VP of Search Technologies, Michael McIntosh. “Corporations have an enormous investment in FAST ESP and it makes sense that they would be reluctant to move to something new until they absolutely have to.” That means, when their licenses expire.

“They will likely weigh the performance and support, or lack thereof, for the FAST ESP technical team with the timing of renewing a license and wait until they absolutely have to change to something else,” says McIntosh.

The purchase of Autonomy and the shift of HP from hardware to software could signal a recognition from Goliath HP the kind of growth opportunity enterprise search software offers, and that the “great shift” from FAST ESP to another search platform is very much on the horizon.

But as the clock continues to tick, companies using FAST ESP should be strategizing for migration now. “It’s an enormous undertaking to migrate an entire search solution from FAST to another platform. Designing a non-trivial search solution to fully meet your needs from scratch is hard enough on its own. If you are migrating an existing solution, it is very unlikely that you will find a one to one mapping of all of the features in a new search engine that you have come to depend upon with your existing implementation. Solving challenging issues like that requires both creativity and expertise to address your needs.” says McIntosh. If a need for migration is eminent, there will be a real need for expertise in the field of enterprise search on both proprietary and open source platforms, depending upon several factors like size, in house talent, and growth expectations.

How is your company preparing for the discontinuation of support of FAST ESP?  Need guidance?  Contact us for pointers, analysis, or architecture for a full migration.

Open Source Search Engines vs. Proprietary Search Engines

by Karen Lynn

There are plenty of articles about the pros and cons of open source software vs. proprietary software. I sat down with our VP of Search Technologies Michael McIntosh to discuss the benefits of each in terms of search engines.

Karen: You’ve been working with proprietary search engines for some time now. Tell me your thoughts on that. What’s the upside of proprietary?

Michael: I’ve worked with proprietary search engines for several years, specifically with FAST ESP since 2003 back when it was known as FAST Data Search (FDS). Proprietary software products often have better documentation, better support; more thought out design and are more aggressively tested. Because the product supports an entire company—it must succeed. They have nicer tools, nicer interfaces.

Karen: And the downsides?

Michael: “Over the years, we’ve run into a number of difficulties with proprietary search engines. One thing that comes up is that if a problem isn’t outlined in user troubleshooting documentation, it can become incredibly difficult to diagnosis and correct, and doing that is a frustrating, time intensive problem. The black box nature of the product is very limiting. If it’s not in documentation, it might as well not exist at all.”

Karen: There are gaps in the user manual?

Michael: “Yes. But in defense of FAST ESP, the documentation has improved by leaps and bounds over the years. However, one anomaly we find is that the clean, easy to read PDF form of user documentation (the original) for ESP is often not as up-to-date or helpful as the searchable online documentation—which is harder to read, but usually more current and correct. Sometimes even the online documentation is wrong—which is also frustrating. But it has become something we cope with regard to FAST”

Karen: Give me an example of what kind of problems you run into when integrating the proprietary search engine into a client’s website.

Michael: “The enterprise search platform uses Service Oriented Architecture (SOA). That is a bunch of different components that are able communicate with each other as services. With this type of architecture, it doesn’t matter which language you write something in as long as it’s a service another language can communicate with via RESTful interfaces, SOAP or something like XML-RPC. These services can all work together, despite the fact you don’t have a unified api—and that’s actually awesome.

Karen: Why?

Michael: “Indexers for one—you generally don’t want them written in an interpreted language for performance reasons. Indexing can be a CPU intensive operation, which can be a weak spot for interpreted languages such as Python or Ruby compared to languages like C/C++ or Java. It is both CPU and disk intensive process so a scripting language can be great if you’re writing an application that’s not CPU intensive because code speed doesn’t matter so much. The true slow-down is something outside the script. You can optimize the speed of the script to make it run as fast as humanly possible but you’re limited because the disk can only rotate so fast. Your indexing service can be written in a low level language like C vs. some of the other services in languages like Python or Ruby or Java and get good performance. BUT if you don’t have documentation for compiled programs that make up the search engine product you’re going to have a terrible time trying to figure out how to fix issues when they arise.

Karen: “So basically, because you have so many different languages being used that you lack the source code for, and documentation is spotty, it becomes a needle in a haystack trying to figure out where the problem occurs.”

Michael: “Yes. And this is not such a problem for anyone using a search engine for really basic applications. The place where you run into problems is if you push the search engine technology to its limits or you are using it in ways outside its typical usage, which is almost everything we do. We are always trying to get the best possible performance out of the search engine. We’re trying to get the search engine to deal with features we need but it doesn’t natively support.

Karen: “What is it you’re pushing specifically for the search engine to do?”

Michael: “One thing we want is for FAST ESP to have is a feature to deal with creating a faceted search for arbitrary fields. The way ESP works is that it has an index profile which is a statically defined set of fields that it indexes. Inside its index profile you can mark certain fields to have navigators. One of our customers deals with product verticals. They have a whole bunch of products that aren’t unified—all with completely different attributes. We’ve managed to work around these roadblocks in ESP to create faceted navigation on arbitrary fields.

Karen: “So you get creative to make it better.”

Michael: “Yes we constantly get creative to make it better to use its strengths and find ways to work around its limitations.”

“Another issue we have with ESP is we have a number of websites we need crawled and each website has metadata associated with it. Unfortunately the way the ESP crawler works, there are not many straightforward ways to preserve metadata associated with the seed URL which we use to crawl a website and pass the meta information along to any associated links. We can’t do this easily inside the ESP crawler. Since its proprietary and black box, we can’t look at the source code to the crawler, and can’t modify the source code to the crawler. When it does something mysterious, we can have no idea why it might be behaving in an unwanted way. We had one instance when we had a number of websites the crawler was temporarily blacklisting for some reason. When the ESP crawler automatically blacklists a site, it stops crawling the site for 30 minutes and then begins again after 30 minutes. We learned one thing that triggers blacklisting is if a website has HTTP 503 errors. If a site has more than 20 or so of those errors, the crawler temporarily blacklists the site. The problem was that the documentation is too sparse on details for that topic. When we ran into that problem—it was really difficult to know what was going on so we could properly explain the issue to the client and address the problem. Conversely, if we had an open source search engine, I could have just searched the source code and speed up the diagnosis of the problem.”

Karen: So from a business prospective—using open source allows you to invest in a technology that gives you the power to modify the code to better meet your business needs.

Michael: “It certainly can. It can accelerate development time and speed of diagnosing problems when issues pop up. And issues always do pop up. If something is not working very well, we can look at the problem which a much higher degree of granularity.

If it’s a simple problem, we never contact support. We only contact support when we’re stumped. And we’re not easily stumped. Usually, they can’t answer they question immediately because if we’re asking for help, the problem is complex. Our ticket is escalated, and eventually we talk to someone who can help us. But it does take time. Even if a support staff is top notch, there is the time is costs to deal with that, and that costs us and our clients’ time. We have a highly customized ESP installation for one of our clients it always take an enormous amount of time to explain over and over how we have our systems set up, the different parts work, and it’s a big pain to go through that every time I run across a problem. If it were open source, I can simply look at the source code and solve the problem.”

Karen: Let’s talk in more detail about open source search engines. Upsides?

Michael: “If you choose a popular open source search engine solution like Lucene Solr, you have an active, passionate community behind that solution. There are several developers looking at that engine, working on it, and actively posting in publicly available forums. You can often get your questions answered there by top notch experts in search technology. You can potentially talk to the original coders and creators of the product—and they are often happy to help you. I’ve seen people post a Solr question on their twitter feed and within 7 seconds, the creator has responded with a link to a forum explaining the solution.

Karen: Wow, that’s amazing. Other advantages?

Michael: “It’s free. That’s attractive to most companies. The downside is the formal documentation isn’t usually as good as the proprietary, and there isn’t a dedicated support team for the product. But if you have some savvy software developers on your team, the open source community is robust and willing to share information about the product. And having access to the source code is extremely valuable.”

Karen: So in your opinion, what’s the bottom line on Open Source Search Engines vs. Proprietary Search Engines?

Michael: “If you are a cutting edge company, you will be severely limited by a proprietary search engine as a solution. The more open the technology, the more able we are to refine it to meet our client’s needs.”

If you’d like more information on the pros and cons of Open Source Search Engines vs. Proprietary Search Engines that are specific to your business or organization’s needs, feel free to contact us for a free consultation.

Migration from FAST ESP to Lucene Solr

by Tamar Schanfeld

Download the presentation and see the video.

Michael McIntosh, Vice President of Enterprise Search Technologies at TNR, spoke at the Lucene Revolution conference in Boston, MA October 7-8, 2010. Michael reviewed the migration from Fast ESP to Lucene/Solr open source search. He discussed approaches to identifying core content areas of HTML documents such as Text-To-Tag Ratio Heuristics and Page Stereotype/Site Template Analysis, and reviewed specific use cases that we have encountered as search integration experts and discuss available tools.

TNR Global was a sponsor of Lucene Revolution. The conference gathered over 400 professionals from the enterprise search industry. We were happy to see so much interest in Lucene/Solr open source search, and get to know and learn from the folks who have done large scale implementations, including Twitter, LinkedIn, and eHarmony.  Not surprisingly, there was a lot of interest about migration from proprietory search systems to Solr, especially from FAST ESP due to Microsoft’s discontinuing FAST ESP support for Linux.  If you would like to learn more about how a migration from FAST ESP to Lucene Solr can benefit your company, contact us for a free consultation.