Search and Steel Girders

“Search by itself may look like a simple box, but behind the box is a foundry of girders, cross beams, and structural support that allows you to find what you need.”

“Search ties people together…”

This was one of the many themes at the Enterprise Search Summit in Washington, DC last week. It seems like a fairly obvious statement, but search quickly becomes part of the landscape, taken for granted even though the landscape couldn’t function without it. I have compared the search function to the steel girders of a skyscraper. When you walk into the building, you aren’t thinking about the beams holding the building up or connecting floors, but without them, you wouldn’t have a building at all (you couldn’t even find the lobby). Other metaphors overheard include oxygen (invisible yet essential), sunlight (lest we remain in the dark) and electricity (everything stops without it).

Attendees of the conference know how important search is to companies, but increasingly, companies are taking search for granted. There is a fundamental gap in communicating the importance and difficulty of implementing a good search platform.

Companies that need search to run on their website or intranet expect it to work as it does on the Internet, but this is an apples-and-oranges comparison.

Here are the main disconnects:

  1. Search is easy
  2. Search is cheap
  3. It never has to be touched again

People expect search inside the firewall to function much like Google does outside the firewall. Google exists for end users and is really, really incredible. It geolocates; it auto-completes. It uses your browsing history to provide more relevant results. And you made no financial investment to use this really lovely, elegant, useful tool that doesn’t just assist your Internet experience, but facilitates it. But behind the firewall, things are different. Let me explain.

  • Your business content isn’t publicly available or known. I mean, that would be bad, right? It’s behind the firewall for a reason. So keeping it there yet allowing your staff to access certain levels of information takes some architecture and planning.
  • Google has thousands of developers working on this beautiful, incredible technology every day. They finance this through advertising. How many people do you have on your search team? And how much of their day do they really spend on search? What department is being billed for it? Business leaders need to embrace this as a necessary cost of doing business and budget accordingly, or face the crippling result of staff and customers not being able to find the information they need.

  • 80% of your content is unstructured, meaning search engines can’t really read it until some love and care is put into cleaning the data. This is a vital yet time-intensive process. Our VP of Search Technologies, Michael McIntosh, says, “We spend about 90% of our time on the document processing pipeline, conditioning data to be fed into the engine.” Moreover, unstructured data isn’t a fixed quantity. It’s being created faster than you can blink by your entire enterprise. Processing it is never a done deal.


So if search connects us, hopefully this finds you thinking about search in more realistic terms. Search by itself may look like a simple box, but behind the box is a foundry of girders, cross beams, and structural support that allows you to find what you need to “make money outside the firewall or save money inside the firewall.”

TNR Global Attends KMWorld’s Enterprise Search Summit Fall 2011

A proof of concept and rapid integration are essential for search customers–they cannot visualize what a search solution will look like without some help from the search professional.


Last week TNR Global attended the Enterprise Search Summit organized by KMWorld in Washington, DC.  VP of Search Technologies Michael McIntosh and Director of Business Development Karen E. Lynn attended the three-day conference and Enterprise Solutions Showcase at the Marriott Wardman Park.  Several companies were in attendance, and some common themes emerged.  Among these were designing for users, dealing with unstructured content, the need for better search and content analytics to improve search results, and tagging content as a best practice in workflow.  Also discussed was the need for search vendors to demonstrate to search customers what “right looks like” in a search solution.  A proof of concept and rapid integration are essential for search customers–they cannot visualize what a search solution will look like without some help from the search professional.

A surprise came when the speaker on open source search was unable to attend at the last moment and our own Michael McIntosh was asked to step in and present on the subject.  Fortunately, he was fresh from his presentation at Apache Lucene EuroCon and already had his presentation loaded on his machine.  Michael discussed Solr and made general points on migrating from a commercial search engine like FAST ESP to an open source platform like Lucene/Solr.

Overall, it was a great conference with lots of informative talks and friendly search professionals.  We’re looking forward to the next Enterprise Search Summit in spring 2012.

TNR Global

TNR Global is a systems design and integration company focused on providing customers effective search and cloud computing solutions. We develop scalable web-based search solutions focusing on news sites, publishing, web directories and catalogs, information portals, education, manufacturing and distribution, customer service, and life sciences.

How Enterprise Search Works

An enterprise search engine has two components: a front end and a back end. Both work together with the search index. The index is built statically for search speed, and is updated periodically. This is unlike a database where the indexes are updated in real time when data is changed or added.

Enterprise Search System


Back End:

Creating and updating the index.

  • Crawler – The crawler module reads and collects web pages and follows the links between them, starting with a list of initial URLs.
  • Document Processor – The document processor module processes web pages received from the crawler, as well as information received from databases through a ‘database connector’ and information from directories of files. The document processor takes the meaningful text from the documents, no matter the type or format, and adds whatever meaningful ‘meta-data’ it can determine, such as title or authors.
  • Indexer – The indexer module does the brute force work of creating and maintaining the index from the information it receives from the document processor.
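
The three back-end modules above can be sketched in a few lines. This is a minimal illustration, not a production design: the in-memory PAGES dictionary stands in for real web pages, and the names crawl, process, and build_index are hypothetical.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (title, body text, outgoing links).
PAGES = {
    "/home": ("Home", "Welcome to the steel catalog", ["/girders", "/beams"]),
    "/girders": ("Girders", "Steel girders hold up the building", ["/home"]),
    "/beams": ("Beams", "Cross beams connect the floors", ["/girders"]),
    "/orphan": ("Orphan", "No page links here", []),
}

def crawl(seed_urls):
    """Crawler: start from the seed URLs and follow links breadth-first."""
    seen, queue = set(seed_urls), deque(seed_urls)
    while queue:
        url = queue.popleft()
        title, body, links = PAGES[url]
        yield url, title, body
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)

def process(url, title, body):
    """Document processor: extract lowercase tokens and keep the title as metadata."""
    tokens = re.findall(r"[a-z0-9]+", f"{title} {body}".lower())
    return {"url": url, "title": title, "tokens": tokens}

def build_index(docs):
    """Indexer: map each term to the set of URLs that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for token in doc["tokens"]:
            index[token].add(doc["url"])
    return index

index = build_index(process(*page) for page in crawl(["/home"]))
print(sorted(index["steel"]))  # ['/girders', '/home']
```

Note that "/orphan" never reaches the index because no crawled page links to it, which is one reason connectors to databases and file directories matter alongside the crawler.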

Front End:

Responding to user queries.

  • Web server – The user’s web browser fetches a web page from the web server that contains a search form. The user then enters a query and the web browser sends the request to the web server.
  • Query processor – The web server sends a request with the user’s query to the query processor. The query processor properly formats the request and sends it to one (or more) search modules, collects the results and sends them to the web server for final formatting.
  • Search Engine – The search engine module receives the request from the query processor and does the actual searching inside the index that was created by the indexer.
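
The front-end flow described above can be illustrated the same way. The sketch below assumes a tiny prebuilt index and treats a query as "all terms must match"; the INDEX and TITLES data and the three function names are hypothetical, chosen only to mirror the module names in the list.

```python
import re

# Hypothetical prebuilt index (term -> document IDs) and document titles.
INDEX = {
    "steel": {1, 2},
    "girders": {2},
    "beams": {3},
    "cross": {3},
}
TITLES = {1: "Steel catalog", 2: "Steel girders", 3: "Cross beams"}

def search_engine(terms):
    """Search engine: return the IDs of documents that contain every term."""
    postings = [INDEX.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

def query_processor(raw_query):
    """Query processor: normalize the query, run the search, collect results."""
    terms = re.findall(r"[a-z0-9]+", raw_query.lower())
    doc_ids = search_engine(terms)
    return [{"id": d, "title": TITLES[d]} for d in sorted(doc_ids)]

def web_server(raw_query):
    """Web server: do the final formatting (plain text in this sketch)."""
    results = query_processor(raw_query)
    if not results:
        return "No results."
    return "\n".join(f"{r['id']}: {r['title']}" for r in results)

print(web_server("Steel GIRDERS"))  # 2: Steel girders
```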

To deliver an enterprise search solution that meets your organization’s needs, a number of components need to be incorporated:

Connectors

Allow search engines to gather information from various sources (structured databases, unstructured documents on internal and remote servers, desktop computers) in your enterprise as well as the external web. Specialized connectors are available for almost every type of file format and application, and custom connectors can be designed as needed.
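
As a rough illustration of the idea, each connector can be thought of as a small adapter that turns one kind of source into the same document shape. The sketch below assumes a hypothetical `articles` table in an in-memory SQLite database and a filename-to-text mapping standing in for a file share; neither reflects any particular product's connector API.

```python
import sqlite3

def database_connector(conn):
    """Yield one document per row of a hypothetical `articles` table."""
    rows = conn.execute("SELECT id, title, body FROM articles ORDER BY id")
    for row_id, title, body in rows:
        yield {"source": "db", "id": f"db:{row_id}", "title": title, "text": body}

def file_connector(files):
    """Yield one document per entry of a filename -> text mapping."""
    for name, text in sorted(files.items()):
        yield {"source": "file", "id": f"file:{name}", "title": name, "text": text}

# Set up the two example sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    ("Girders", "Steel girder specs"),
)
files = {"handbook.txt": "Employee handbook text"}

# Both connectors emit the same shape, so the document processor
# downstream does not need to know where a document came from.
documents = list(database_connector(conn)) + list(file_connector(files))
print([doc["id"] for doc in documents])  # ['db:1', 'file:handbook.txt']
```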

Relevancy Tools

Build a customized ranking model that delivers content based on concepts, context, date, authority, completeness, geography, statistics, and quality. Tune each element to match your business needs.
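
One simple way to picture a tunable ranking model is a weighted sum of signals. The sketch below is an assumption for illustration only: the signal names, the weights, and the candidate documents are invented, and real engines use richer scoring functions.

```python
# Hypothetical signal weights; tuning these numbers is the "ranking model".
WEIGHTS = {"text_match": 0.5, "freshness": 0.3, "authority": 0.2}

# Each candidate carries signals already normalized to the 0..1 range.
CANDIDATES = [
    {"title": "2009 girder spec", "text_match": 0.9, "freshness": 0.1, "authority": 0.6},
    {"title": "2011 girder spec", "text_match": 0.8, "freshness": 0.9, "authority": 0.6},
    {"title": "Forum post", "text_match": 0.7, "freshness": 1.0, "authority": 0.1},
]

def score(doc, weights):
    """Combine the signals into one relevance score."""
    return sum(weight * doc[signal] for signal, weight in weights.items())

def rank(docs, weights):
    """Order documents from most to least relevant under the given weights."""
    return sorted(docs, key=lambda doc: score(doc, weights), reverse=True)

for doc in rank(CANDIDATES, WEIGHTS):
    print(doc["title"], round(score(doc, WEIGHTS), 2))
```

With these weights the 2011 spec ranks first because freshness counts; setting the freshness weight to zero would favor the older document with the stronger text match.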

Linguistics Tools

Identify synonyms (a search for “Great Britain” will include results for “England” as well), abbreviations, phrases, idioms, parts of speech, and misspellings. Lemmatization matches regular and irregular grammatical forms. Type in ‘goes,’ and you can also find ‘went.’ Prefixes and suffixes can be disregarded, if needed. Certain words can be skipped. The system knows the difference between the ‘wind that blows’ and ‘wind your watch.’ Phonetic search allows you to find results based on phonetic similarities, especially useful for names. The system can recognize queries containing who, what, where, when, or why, and provide appropriate results.
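
A few of these features can be shown with simple lookup tables. This is a toy sketch under stated assumptions: the SYNONYMS, LEMMAS, and STOP_WORDS tables are hand-written for the example, whereas real linguistics tools rely on large dictionaries and language models.

```python
# Hypothetical lookup tables, kept tiny for illustration.
SYNONYMS = {"great britain": ["england"]}
LEMMAS = {"goes": "go", "went": "go", "going": "go", "gone": "go"}
STOP_WORDS = {"the", "a", "an", "of", "to"}

def lemmatize(word):
    """Map a regular or irregular form to its base form, if known."""
    return LEMMAS.get(word, word)

def analyze(text):
    """Lowercase, skip stop words, and lemmatize what remains."""
    return [lemmatize(w) for w in text.lower().split() if w not in STOP_WORDS]

def expand_query(query):
    """Return the analyzed query plus one variant per matching synonym."""
    normalized = query.lower()
    variants = [normalized]
    for phrase, alternatives in SYNONYMS.items():
        if phrase in normalized:
            variants += [normalized.replace(phrase, alt) for alt in alternatives]
    return [analyze(variant) for variant in variants]

print(analyze("goes") == analyze("went"))  # True
print(expand_query("Great Britain"))       # [['great', 'britain'], ['england']]
```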

Sentiment analysis

Determine if a document has a negative or positive tone. Use this tool to monitor user groups, to analyze reviews of your products and services, or to follow press coverage of your business.
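
The simplest possible version of this idea counts positive and negative words. The sketch below is only that: the POSITIVE and NEGATIVE word lists are invented for the example, and it ignores negation, sarcasm, and context, which production sentiment tools have to handle.

```python
import re

# Hypothetical word lists; real tools use far larger lexicons or trained models.
POSITIVE = {"great", "fast", "reliable", "love", "helpful"}
NEGATIVE = {"slow", "broken", "bad", "hate", "useless"}

def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Fast, reliable search. Love it."))   # positive
print(sentiment("The intranet search is slow and useless."))  # negative
```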

Business rules

Each organization has specific guidelines for how it does business, and these guidelines can be implemented in an enterprise search solution. If you have, for example, two levels of advertisers who pay for placement, the higher level advertiser can be assigned to appear in a separate format on top of the results list. When customers have worked with you before, use business rules to determine which search results are most relevant to their specific needs.
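
The advertiser example can be sketched as a post-processing step on the result list. The function name, the field names, and the advertiser names below are hypothetical; the point is only that the rule runs after relevance ranking and separates the higher-level advertisers into their own block.

```python
def apply_business_rules(results, premium_advertisers):
    """Split ranked results so premium advertisers appear in a separate top block.

    The relative order inside each block is left as the engine ranked it.
    """
    sponsored = [r for r in results if r["advertiser"] in premium_advertisers]
    organic = [r for r in results if r["advertiser"] not in premium_advertisers]
    return {"sponsored": sponsored, "organic": organic}

# Results as ranked by relevance, before any business rule is applied.
ranked = [
    {"title": "Acme beams", "advertiser": "acme"},
    {"title": "Bolt Bros girders", "advertiser": "boltbros"},
    {"title": "Acme girders", "advertiser": "acme"},
    {"title": "City Steel girders", "advertiser": "citysteel"},
]

page = apply_business_rules(ranked, premium_advertisers={"boltbros"})
print([r["title"] for r in page["sponsored"]])  # ['Bolt Bros girders']
print([r["title"] for r in page["organic"]])
```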

Performance, Scalability and High Availability

Enterprise search needs to be fast, and it needs to be reliable. TNR Global designs your hardware specifications to meet speed requirements and to offer a failover option in case of power outages or server issues. Search engine architecture is scalable, and you can increase index capacity, indexing rate, and/or query processing speed as needed.

Benchmarking

You can’t manage what you don’t measure. One of the most valuable tools in enterprise search is the ability to learn from the system and measure results. With benchmarking, you track how your users respond to the search engine, and how the search engine responds to them. Metrics include click counting, hardware performance, and quantity of data searched. You can track which queries return no results and then tweak synonyms or taxonomy to provide answers to these queries.
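
As an illustration, two of these measurements can be computed from a simple query log. The log format below (query, hits, clicked) is an assumption made for the example, not the format of any particular search engine.

```python
from collections import Counter

# Hypothetical query log: one entry per search performed.
QUERY_LOG = [
    {"query": "vacation policy", "hits": 12, "clicked": True},
    {"query": "pto policy", "hits": 0, "clicked": False},
    {"query": "pto policy", "hits": 0, "clicked": False},
    {"query": "expense form", "hits": 3, "clicked": False},
    {"query": "holiday calender", "hits": 0, "clicked": False},
]

def zero_result_queries(log):
    """Return (query, count) pairs for queries with no results, most frequent first."""
    counts = Counter(entry["query"] for entry in log if entry["hits"] == 0)
    return counts.most_common()

def click_through_rate(log):
    """Fraction of searches with results where the user clicked something."""
    with_hits = [entry for entry in log if entry["hits"] > 0]
    if not with_hits:
        return 0.0
    return sum(entry["clicked"] for entry in with_hits) / len(with_hits)

print(zero_result_queries(QUERY_LOG))  # [('pto policy', 2), ('holiday calender', 1)]
print(click_through_rate(QUERY_LOG))   # 0.5
```

In this made-up log, "pto policy" is a candidate for a synonym entry pointing at "vacation policy", and "holiday calender" suggests a spelling correction.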

Security

Security is a vital component for any enterprise application. With enterprise search, security guidelines apply both to the documents and to the user. Each set of documents can have its own security settings. You can restrict documents from the search engine, or allow the search engine to index the documents, but restrict access to certain groups of users. Security checks are performed when a set of results is queried and as the indexer crawls the data source.
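
The two checkpoints can be sketched as two filters: one applied when documents are gathered for indexing, and one applied to each result set. The document fields (`indexable`, `allowed_groups`) and group names below are hypothetical and stand in for whatever access control lists the source systems provide.

```python
# Hypothetical documents with per-document security settings.
DOCUMENTS = [
    {"id": 1, "title": "Cafeteria menu", "allowed_groups": {"staff", "hr"}, "indexable": True},
    {"id": 2, "title": "Salary bands", "allowed_groups": {"hr"}, "indexable": True},
    {"id": 3, "title": "Board minutes draft", "allowed_groups": {"board"}, "indexable": False},
]

def index_time_filter(docs):
    """Keep restricted documents out of the index entirely."""
    return [doc for doc in docs if doc["indexable"]]

def query_time_filter(hits, user_groups):
    """Return only the hits that at least one of the user's groups may see."""
    return [doc for doc in hits if doc["allowed_groups"] & user_groups]

indexed = index_time_filter(DOCUMENTS)

print([d["title"] for d in query_time_filter(indexed, {"staff"})])  # ['Cafeteria menu']
print([d["title"] for d in query_time_filter(indexed, {"hr"})])     # ['Cafeteria menu', 'Salary bands']
```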