How Enterprise Search Works

An enterprise search engine has two components: a front end and a back end. Both work with the search index. The index is built statically for search speed and is updated periodically. This is unlike a database, where indexes are updated in real time as data is changed or added.

Enterprise Search System

[Figure: enterprise search system diagram]

Back End:

Creating and updating the index.

  • Crawler – The crawler module reads and collects web pages and follows the links between them, starting with a list of initial URLs.
  • Document Processor – The document processor module processes web pages received from the crawler, as well as information received from databases through a ‘database connector’ and information from directories of files. The document processor takes the meaningful text from the documents, no matter the type or format, and adds whatever meaningful ‘meta-data’ it can determine, such as title or authors.
  • Indexer – The indexer module does the brute force work of creating and maintaining the index from the information it receives from the document processor.
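The back-end flow above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the `PAGES` dictionary stands in for real HTTP fetching, the URLs are made up, and the index is a plain in-memory inverted index (term to set of document IDs), which is an assumption about the index format.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "http://intranet.example/a": ("Quarterly sales report", ["http://intranet.example/b"]),
    "http://intranet.example/b": ("Sales team directory", ["http://intranet.example/a"]),
}

def crawl(seed_urls):
    """Crawler: start from the seed URLs and follow links, visiting each page once."""
    seen, queue = set(), list(seed_urls)
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        page = PAGES.get(url)  # stand-in for an HTTP fetch
        if page is None:
            continue
        text, links = page
        yield url, text
        queue.extend(links)

def process(url, text):
    """Document processor: extract searchable terms plus simple metadata."""
    return {"id": url, "title": text, "terms": re.findall(r"[a-z0-9]+", text.lower())}

def build_index(docs):
    """Indexer: map every term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["terms"]:
            index[term].add(doc["id"])
    return index

index = build_index(process(url, text) for url, text in crawl(["http://intranet.example/a"]))
print(sorted(index["sales"]))
```

Both example pages mention "sales", so both appear under that term even though only one seed URL was supplied; the second page is reached by following a link.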

Front End:

Responding to user queries.

  • Web server – The user’s web browser fetches a web page from the web server that contains a search form. The user then enters a query and the web browser sends the request to the web server.
  • Query processor – The web server sends a request with the user’s query to the query processor. The query processor properly formats the request and sends it to one (or more) search modules, collects the results and sends them to the web server for final formatting.
  • Search Engine – The search engine module receives the request from the query processor and does the actual searching inside the index that was created by the indexer.
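As a rough sketch of the front-end flow, the query processor below normalizes the user's query, sends it to one or more search modules, and merges the results. The `INDEX` contents, the function names, and the choice of AND semantics (every term must match) are assumptions made for illustration.

```python
import re

# Hypothetical inverted index, as an indexer might produce: term -> set of document IDs.
INDEX = {
    "sales": {"doc1", "doc2"},
    "report": {"doc1"},
    "directory": {"doc2"},
}

def search_module(terms, index):
    """Search engine: return IDs of documents that contain every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

def query_processor(raw_query, indexes):
    """Query processor: format the query, fan out to each search module, merge results."""
    terms = re.findall(r"[a-z0-9]+", raw_query.lower())
    hits = set()
    for index in indexes:
        hits |= search_module(terms, index)
    return sorted(hits)  # the web server would do final formatting

print(query_processor("Sales Report", [INDEX]))  # ['doc1']
```

Passing a list of indexes mirrors the "one (or more) search modules" described above; a production system would typically also rank and paginate the merged results.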

To deliver an enterprise search solution that meets your organization’s needs, a number of components need to be incorporated:

Connectors

Allow search engines to gather information from various sources (structured databases, unstructured documents on internal and remote servers, desktop computers) in your enterprise as well as the external web. Specialized connectors are available for almost every type of file format and application, and custom connectors can be designed as needed.
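One way to picture connectors is as small adapters that all expose the same method, so the rest of the pipeline does not care where a document came from. The sketch below is illustrative only: the `Connector` protocol, the class names, and the `tickets` table are hypothetical, and the file connector reads from a dictionary rather than a real file system.

```python
import sqlite3
from typing import Iterator, Protocol

class Connector(Protocol):
    """Hypothetical common interface: every source yields documents as dicts."""
    def documents(self) -> Iterator[dict]: ...

class SqlConnector:
    """Reads (id, text) rows from a structured database."""
    def __init__(self, conn, query):
        self.conn, self.query = conn, query

    def documents(self):
        for row_id, body in self.conn.execute(self.query):
            yield {"id": f"db:{row_id}", "text": body}

class InMemoryFileConnector:
    """Stand-in for a file-share connector: path -> file contents."""
    def __init__(self, files):
        self.files = files

    def documents(self):
        for path, text in self.files.items():
            yield {"id": f"file:{path}", "text": text}

def gather(connectors):
    """Pull documents from every source into one stream for the document processor."""
    for connector in connectors:
        yield from connector.documents()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO tickets (body) VALUES ('Printer is offline')")

docs = list(gather([
    SqlConnector(conn, "SELECT id, body FROM tickets"),
    InMemoryFileConnector({"/shared/policy.txt": "Travel policy"}),
]))
print([d["id"] for d in docs])
```

Adding a new source then means writing one more class with a `documents()` method, which is the idea behind a custom connector.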

Relevancy Tools

Build a customized ranking model that delivers content based on concepts, context, date, authority, completeness, geography, statistics, and quality. Tune each element to match your business needs.
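A tunable ranking model can be as simple as a weighted sum of per-document signals. The sketch below assumes three signals normalized to the range 0 to 1; the signal names, weights, and example documents are invented for illustration and are not taken from any specific search product.

```python
# Hypothetical weights: tune each one to match business priorities.
WEIGHTS = {"text_match": 0.6, "freshness": 0.3, "authority": 0.1}

def score(signals, weights=WEIGHTS):
    """Combine per-document signals (each in 0..1) into one relevance score."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())

def rank(candidates, weights=WEIGHTS):
    """Return document IDs ordered from most to least relevant."""
    return sorted(candidates, key=lambda doc: score(candidates[doc], weights), reverse=True)

candidates = {
    "old_official_policy": {"text_match": 0.9, "freshness": 0.1, "authority": 1.0},
    "recent_blog_post": {"text_match": 0.8, "freshness": 0.9, "authority": 0.2},
}

print(rank(candidates))  # freshness-heavy weights favor the recent post

# Re-tuning toward authority flips the order without touching the documents.
authority_first = {"text_match": 0.4, "freshness": 0.1, "authority": 0.5}
print(rank(candidates, authority_first))
```

The point of the design is that tuning happens in the weights, not in the indexed content, so the same index can serve different ranking policies.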

Linguistics Tools

Identify synonyms (a search for “Great Britain” will also return results for “England”), abbreviations, phrases, idioms, parts of speech, and misspellings. Lemmatization matches regular and irregular grammatical forms: type in ‘goes,’ and you can also find ‘went.’ Prefixes and suffixes can be disregarded if needed, and certain words can be skipped. The system knows the difference between the ‘wind that blows’ and ‘wind your watch.’ Phonetic search finds results based on phonetic similarities, which is especially useful for names. The system can also recognize queries containing who, what, where, when, or why, and provide appropriate results.
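Three of these steps (skipping certain words, lemmatization, and synonym expansion) can be sketched with plain lookup tables. The tiny dictionaries below are hypothetical; real linguistics tools rely on much larger lexicons and on part-of-speech analysis, and this sketch does not attempt phrase handling, disambiguation, or phonetic matching.

```python
# Hypothetical, deliberately tiny language resources.
STOP_WORDS = {"the", "a", "of", "to"}
LEMMAS = {"goes": "go", "went": "go", "going": "go"}
SYNONYMS = {"britain": {"england"}}

def analyze(text):
    """Lowercase, drop stop words, and map each word to its base form (lemma)."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]

def expand(terms):
    """For each query term, build the set of terms that should count as a match."""
    return [{t} | SYNONYMS.get(t, set()) for t in terms]

print(analyze("She went to the office"))   # ['she', 'go', 'office']
print(analyze("goes"))                     # ['go']
print(expand(analyze("Great Britain")))
```

Because ‘goes’ and ‘went’ both reduce to the same lemma, a query containing one can match a document containing the other, provided the same analysis is applied at indexing time.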

Sentiment Analysis

Determine if a document has a negative or positive tone. Use this tool to monitor user groups, to analyze reviews of your products and services, or to follow press coverage of your business.
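The simplest form of tone detection counts positive and negative words. The sketch below uses that lexicon approach with two invented word lists; it is meant only to show the idea, and it ignores negation, sarcasm, and context, which production sentiment tools have to handle.

```python
import re

# Hypothetical word lists; a real lexicon would contain thousands of entries.
POSITIVE = {"great", "excellent", "reliable", "love"}
NEGATIVE = {"slow", "broken", "poor", "hate"}

def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"

for review in ["Great product, excellent support",
               "Slow and broken",
               "It arrived on Tuesday"]:
    print(review, "->", sentiment(review))
```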

Business Rules

Each organization has specific guidelines for how it does business, and these guidelines can be implemented in an enterprise search solution. If you have, for example, two levels of advertisers who pay for placement, the higher level advertiser can be assigned to appear in a separate format on top of the results list. When customers have worked with you before, use business rules to determine which search results are most relevant to their specific needs.
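The paid-placement example can be expressed as a small rule applied after the search engine returns its results. The `advertiser_level` field and the "premium" value are hypothetical names chosen for this sketch.

```python
def apply_placement_rules(results):
    """Split results so premium advertisers render in a separate block on top."""
    featured = [r for r in results if r.get("advertiser_level") == "premium"]
    organic = [r for r in results if r.get("advertiser_level") != "premium"]
    return {"featured": featured, "organic": organic}

results = [
    {"title": "Acme Plumbing", "advertiser_level": "standard"},
    {"title": "Best Pipes Inc.", "advertiser_level": "premium"},
    {"title": "City Water FAQ"},
]

page = apply_placement_rules(results)
print([r["title"] for r in page["featured"]])  # ['Best Pipes Inc.']
print([r["title"] for r in page["organic"]])   # ['Acme Plumbing', 'City Water FAQ']
```

Keeping the rule separate from ranking means the relative order within each block is still decided by relevance.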

Performance, Scalability and High Availability

Enterprise search needs to be fast, and it needs to be reliable. TNR Global designs your hardware specifications to meet speed requirements and to offer a failover option in case of power outages or server issues. Search engine architecture is scalable, and you can increase index capacity, indexing rate, and/or query processing speed as needed.

Benchmarking

You can’t manage what you don’t measure. One of the most valuable tools in enterprise search is the ability to learn from the system and measure results. With benchmarking, you track how your users respond to the search engine, and how the search engine responds to them. Metrics include click counting, hardware performance, and quantity of data searched. You can track which queries return no results and then tweak synonyms or taxonomy to provide answers to these queries.
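Two of these measurements, zero-result queries and click behavior, can be computed directly from a query log. The log format below is an assumption made for the sketch; real systems record many more fields.

```python
from collections import Counter

# Hypothetical query log entries.
QUERY_LOG = [
    {"query": "vacation policy", "results": 12, "clicked": True},
    {"query": "pto", "results": 0, "clicked": False},
    {"query": "pto", "results": 0, "clicked": False},
    {"query": "expense form", "results": 4, "clicked": False},
]

def zero_result_queries(log):
    """Queries that returned nothing, most frequent first."""
    return Counter(e["query"] for e in log if e["results"] == 0).most_common()

def click_through_rate(log):
    """Share of searches with results where the user clicked something."""
    with_results = [e for e in log if e["results"] > 0]
    if not with_results:
        return 0.0
    return sum(e["clicked"] for e in with_results) / len(with_results)

print(zero_result_queries(QUERY_LOG))  # [('pto', 2)]
print(click_through_rate(QUERY_LOG))   # 0.5
```

In this example, "pto" returning nothing twice suggests adding it as a synonym for an existing term such as "vacation".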

Security

Security is a vital component of any enterprise application. With enterprise search, security guidelines apply both to the documents and to the user. Each set of documents can have its own security settings. You can exclude documents from the search engine entirely, or allow the search engine to index the documents but restrict access to certain groups of users. Security checks are performed both when a set of results is queried and as the indexer crawls the data source.
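Restricting indexed documents to certain groups is often done by filtering results against an access list at query time. The sketch below assumes the allowed groups for each document were recorded during indexing; the `DOCUMENT_ACL` table and group names are hypothetical.

```python
# Hypothetical access list captured at indexing time: document -> allowed groups.
DOCUMENT_ACL = {
    "handbook": {"everyone"},
    "salaries": {"hr"},
    "roadmap": {"engineering", "executives"},
}

def trim_results(doc_ids, user_groups):
    """Keep only documents that share at least one group with the user.

    Documents with no recorded access list are hidden by default.
    """
    return [d for d in doc_ids if DOCUMENT_ACL.get(d, set()) & user_groups]

hits = ["handbook", "salaries", "roadmap", "unlisted"]
print(trim_results(hits, {"everyone", "engineering"}))  # ['handbook', 'roadmap']
```

Denying access when no access list is found is a deliberate choice in this sketch, so that a document missed during crawling is never exposed by accident.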
