We are pleased to be in attendance at the Enterprise Search Summit in Washington, DC October 17-19th, 2012 talking to colleagues and discussing search technology in a changing environment. To follow the discussion, follow us on Twitter @TNRGlobal.
The idea is to provide a relatively small library that will make your life easier and hopefully more pleasant by making it straightforward to provide a consistent web service API that obeys HTTP semantics.
Christopher Miles, one of our Senior Software Developers here at TNR, wrote this post on Bishop. It all started when he was asked the question:
“What happens if I send it something that’s not JSON?”…….“I don’t know, but I bet it logs a really big stack trace!”
The question begged an answer, and Chris gave an extremely thorough one on his own blog. Here’s a small excerpt that gives a taste of his analysis:
After taking a closer look at HTTP and its specification, it was clear that it could do a lot more than I had thought. Looking back on past projects, it’s painfully obvious that I’ve been taking what is really an application protocol and ignoring all of the interesting bits, instead using it as little more than a pipe to push documents through. I’ve been using either the requested URL or parameters or maybe even neither and simply examining the body content, thus eliminating any of the real advantages of using HTTP in the first place.
And there are advantages. The protocol is already thinking about caching your data where it makes the most sense. There’s already an algorithm for taking the list of content types that the client wants and the content types the server provides and picking the best match. It can manage safe updating of resources as well as notifying the client of conflicts. And so on. By ignoring what the HTTP protocol has to offer, I was making more work for myself.
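To make the content-type matching mentioned above concrete, here is a minimal Python sketch of the idea. It is not taken from Bishop; the function names (`parse_accept`, `specificity`, `best_match`) are hypothetical, and the logic is a simplified reading of how an `Accept` header with `q` values is usually interpreted.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1.0 when the client does not give one
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_range, q))
    return entries


def specificity(media_range, offered):
    """Return how specifically media_range matches offered, or -1 for no match."""
    if media_range == offered:
        return 2  # exact match, e.g. text/html
    range_type, _, range_sub = media_range.partition("/")
    offered_type, _, _ = offered.partition("/")
    if range_sub == "*" and range_type == offered_type:
        return 1  # type wildcard, e.g. text/*
    if media_range == "*/*":
        return 0  # full wildcard
    return -1


def best_match(offered_types, accept_header):
    """Pick the offered type the client prefers most, or None if none is acceptable."""
    accepted = parse_accept(accept_header)
    best, best_q = None, 0.0
    for offered in offered_types:
        matches = [(specificity(r, offered), q) for r, q in accepted]
        matches = [m for m in matches if m[0] >= 0]
        if not matches:
            continue
        # The most specific matching range decides the q value for this type.
        _, q = max(matches)
        if q > best_q:
            best, best_q = offered, q
    return best


print(best_match(["application/json", "text/html"],
                 "text/html;q=0.8, application/json"))
```

When `best_match` returns `None`, a service that follows HTTP semantics would typically answer with 406 Not Acceptable rather than guessing at a format.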
So he decided to take matters into his own hands by creating a library. More from Chris’ blog:
The idea is to provide a relatively small library that will make your life easier and hopefully more pleasant by making it straightforward to provide a consistent web service API that obeys HTTP semantics. It will make the lives of those around you easier as well: clients can expect your service to respond to the common HTTP request methods with reasonable responses. Placing caches around your service will also be much simpler and you’ll have some level of control over how your service’s data is cached.
Since creating this library, other developers have responded positively and are watching the project. If you would like to see our approach to solving this problem, take a look for yourself here.
If you’d like to talk to us about how we can solve some of your enterprise search, cloud, or scalability issues, contact us.
Research is essential to choosing the right search technology for your organization. With new and complex technologies emerging every day, it can be difficult to navigate the path that will lead you to the right solution. Staying abreast of new tools and technology is a top priority for our team of software developers. We have reported on many tools and search technologies, their components, and their applications in a series of White Papers. Our papers are geared toward the technology professional, but they often include business cases that illustrate how improving search can serve the needs of the business, such as analyzing large sets of data, better site performance, higher customer satisfaction, and an overall healthy bottom line.
To indicate your interest in receiving our White Papers on the following subjects, click the headings below to enter your email. From time to time we may update the paper or send relevant information on the subject, but by no means will we inundate you with email. Our White Papers are free when you sign up.
Thank you for your interest in our work. Please feel free to contact us to discuss your specific needs. We’re happy to discuss your options.
White Papers Published 2012:
And if you are just getting started or learning about Enterprise Search on a smaller scale, you can view these White Papers as well, no email necessary.
White Papers Published 2010-2011:
If we want to leverage the power that Solr offers, but we need support for a more robust document processing framework, what are our options?
One of the most powerful features of FAST ESP is its flexible document processing engine. The engine that ships with FAST ESP supports multiple document processing pipelines, each consisting of multiple document processing stages. A document processing stage performs a document processing task and can add, modify, or remove elements from a document before it is passed to the next stage in the pipeline. A simple example of a processing stage would be one that processes a document’s URL element. ESP ships with many processing stages and several processing pipelines out of the box for handling both structured and unstructured documents. The FAST ESP document processing engine also provides a Python plugin API that allows customers to create custom processing stages of their own, a feature we use heavily for our customers’ ESP installations.
Unfortunately, Solr does not offer the same robust support for document processing pipelines that ESP does. The ESP processing pipeline is document-centric, while the Lucene Solr platform is field-centric. When a document is fed to ESP for processing, it is routed to processing stages in a processing pipeline that can access document elements generated by previous processing stages. This allows for complex and optimized operations that can leverage previous processing, such as reuse of a previously generated HTML DOM tree structure. When a document is passed to a Solr update handler, the document is broken up into a set of individual fields. Each field can have a set of processors, known as Solr Analysis Filters, that can be chained together for field processing before indexing occurs. While this is fine for content that has been heavily processed before being sent to Solr, individual filters lack the same level of access to other document elements, which makes it hard to support more complex processing behaviors.
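As a rough illustration of the pipeline-and-stage idea described above, here is a minimal Python sketch. It does not use the FAST ESP plugin API or Pypes; the document is modeled as a plain dictionary, and the stage names (`extract_host`, `normalize_title`, `drop_raw_html`) and the `run_pipeline` helper are hypothetical.

```python
from urllib.parse import urlparse


def extract_host(document):
    """Stage: add a 'host' element derived from the document's 'url' element."""
    url = document.get("url")
    if url:
        document["host"] = urlparse(url).hostname
    return document


def normalize_title(document):
    """Stage: modify the 'title' element by collapsing runs of whitespace."""
    if "title" in document:
        document["title"] = " ".join(document["title"].split())
    return document


def drop_raw_html(document):
    """Stage: remove the bulky 'raw_html' element once earlier stages are done."""
    document.pop("raw_html", None)
    return document


def run_pipeline(stages, document):
    """Pass the document through each stage in order; later stages see earlier results."""
    for stage in stages:
        document = stage(document)
    return document


pipeline = [extract_host, normalize_title, drop_raw_html]
doc = run_pipeline(pipeline, {
    "url": "http://www.example.com/docs/page.html",
    "title": "  Bridging   the Gap ",
    "raw_html": "<html>...</html>",
})
print(doc)
```

Each stage receives the whole document, so a stage can add, modify, or remove any element. That is the document-centric behavior the post attributes to ESP.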
Another difference between ESP and Solr platforms is that ESP’s document processing architecture allows it to be scaled independently from its indexing architecture. ESP’s document processing architecture is fully decoupled from its indexing architecture and is designed out-of-the-box to take advantage of multiple processor cores per machine and multiple document processor machines per cluster. Solr’s out-of-the-box document processing architecture is tightly coupled with its indexing architecture, making it difficult to independently scale Solr’s content processing capacity without adding the complexity and overhead of additional Solr services and Lucene indexes. When we work with multiple terabyte document sets, we find content processing tends to be the biggest bottleneck, so being able to scale content processing ability separately from indexing is mission critical.
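The decoupling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `process`, `Indexer`, and `feed` are hypothetical stand-ins, and a thread pool stands in for what would be separate processes or machines in a real deployment.

```python
from concurrent.futures import ThreadPoolExecutor


def process(document):
    """Stand-in for an expensive document processing step (here: tokenizing)."""
    return {**document, "tokens": document["body"].lower().split()}


class Indexer:
    """Stand-in for the indexing tier; it only ever sees processed documents."""

    def __init__(self):
        self.index = {}

    def add(self, document):
        for token in document["tokens"]:
            self.index.setdefault(token, set()).add(document["id"])


def feed(documents, indexer, workers):
    """Process documents on a pool of workers, then hand results to one indexer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for processed in pool.map(process, documents):
            indexer.add(processed)


docs = [
    {"id": 1, "body": "Solr and Lucene"},
    {"id": 2, "body": "Migrating to Solr"},
]
indexer = Indexer()
feed(docs, indexer, workers=4)
print(sorted(indexer.index["solr"]))
```

The `workers` argument changes processing capacity without touching the indexer, which is the property the paragraph above calls out as mission critical for large document sets.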
If we want to leverage the power that Solr offers, but we need support for a more robust document processing framework, what are our options? During the course of our research, we discovered quite a number of content processing frameworks to choose from. Some of the options currently available include, but are not limited to, OpenPipeline, OpenPipe, Pypes, UIMA, SMILA, Apache Commons Pipeline, Piped, Behemoth, and Cascading.
Most of these frameworks are written in Java, which gives them access to an incredibly broad and diverse spectrum of Java libraries. Since Solr and Lucene are also written in Java, it might make a lot of sense to favor a Java processing framework if you are starting from scratch, especially if you are more comfortable with Java as a programming language.
Since our clients tend to have highly customized document processing pipelines with many custom FAST ESP Python processing stages, we are heavily biased towards choosing a framework that minimizes the amount of code that would need to be migrated. Many of the available processing frameworks are written in Java, which would be fine if you prefer using Java and don’t have a large amount of currently working Python code to migrate. For our use cases, the decision of which framework to choose was incredibly simple given the options, so we chose Pypes for our migration solution.
For a full report on how we use Pypes for a Document Processing Engine including sample code, sign up for our free FAST to Lucene Solr White Paper here.
We believe that Elasticsearch is a product that everyone working in the field of big data will want to take a look at.
There are many new technologies emerging around search, and we’ve been investigating several of them for our clients. Search has never been “easy,” but Elasticsearch attempts to make it at least easier. Elasticsearch is billed as “built for the cloud,” and with so many companies moving into the cloud, it seems natural that search would move there too. This paper is designed to show you just how Elasticsearch works by setting up a cluster and feeding it data. We also let you know what tools we use so you can test out the technology, and we include a rough sketch of code as well. Finally, we draw conclusions about how Elasticsearch can help with problems like Big Data and other search-related uses.
Elasticsearch is an open source technology developed by one developer, Shay Banon. This paper is simply a first look at Elasticsearch and is not associated with an additional product or variation of Elasticsearch. The appeal for big data is due to Elasticsearch’s ability to scale with growing content, which has largely been associated with the “big data problem” we all keep hearing about. It’s very easy to add new nodes, and it handles the balancing of your data across the available nodes. It handles node failures gracefully, which is important in a cloud environment. And lastly, we simply evaluate and test the technology. We really don’t believe there is a one-size-fits-all technology in the realm of enterprise search. It is highly dependent upon your systems, how many documents you have, how much unstructured data you have, and how you want your site to function. That said, in terms of storing big data, it is as capable as any Lucene-based product; it can handle a much larger load than the current Solr release because the notion of breaking the index up into smaller chunks is “baked in” to the product.
Here is an excerpt from the paper:
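The idea of breaking an index into smaller chunks and balancing them across nodes can be illustrated with a short Python sketch. This is a simplified model, not Elasticsearch’s actual routing or allocation code; `shard_for` and `allocate` are hypothetical names, and the MD5 hash and round-robin placement are assumptions chosen for clarity.

```python
import hashlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Map a document id to a shard number using a stable hash."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def allocate(num_shards, nodes):
    """Spread the shards across the available nodes round-robin."""
    placement = defaultdict(list)
    for shard in range(num_shards):
        placement[nodes[shard % len(nodes)]].append(shard)
    return dict(placement)


# Documents always land on the same shard; shards are what move between nodes.
print(shard_for("doc-42", 6))
print(allocate(6, ["node-a", "node-b"]))
print(allocate(6, ["node-a", "node-b", "node-c"]))  # after adding a node
```

Because a document’s shard depends only on its id and the shard count, adding a node changes where shards live without changing which shard a document belongs to.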
“Products like Elasticsearch that lack a document processing component entirely become more attractive. In fact, most projects that involve a data set large enough to qualify as “big data”³² are building their own document processing stages anyway as part of their ETL cycle.”
If you are interested in downloading this free White Paper, sign up with us here.
If you would like help using Elasticsearch with your search project, contact us.
“It’s basically a road map for companies looking at options for migration, and we outline Solr as a very good option”
HADLEY, MA– March 12, 2012
In the world of Enterprise Search, everything is changing. Companies who have been using Microsoft’s internal search engine, FAST Enterprise Search Platform, will be forced to make a change as Microsoft discontinues support for the search platform for companies using Linux as their operating system. Anticipating the need for a solution, local technology consultants TNR Global is pleased to announce the release of a White Paper for migrating off FAST ESP to a new search engine, Solr. The paper is titled Bridging the Gap: A Migration Path from Fast ESP to Apache Solr.
This effort began last October when TNR Global presented on the subject of migration from FAST to Solr at the open source conference Apache Lucene Eurocon in Barcelona, Spain. The paper contains a case study with an architecture overview, loading millions of documents into Solr indexes, evaluation and recommendation of tools to bridge the feature gap, migrating custom pipeline code, and the vastly improved ROI after implementation. “It’s basically a road map for companies looking at options for migration, and we outline Solr as a very good option,” said Karen E. Lynn, Director of Business Development.
“We have spent over 9 years working with the FAST ESP product and we understand the nuances of what customers have come to expect from the technology. We’ve identified Solr as a top choice for migrating off FAST as support for the product drops off,” said Michael McIntosh, VP of Search Technologies and lead author of the paper. “Solr is an open source technology that has matured and is certainly stable enough for commercial use,” said Chris Miles, Senior Software Engineer and contributor to the paper. “We’re excited about this migration option for our customers, and we believe over the long run, it will save them a lot of money and give them greater control over their search engine.”
This heavily anticipated paper will assist companies and organizations in planning their own FAST ESP to Apache Solr migrations and alert them to tools and techniques that can help them achieve a relatively painless process. Several large blue chip companies have expressed interest in the paper. “We’ve had a healthy response to the paper,” said Lynn.
Internal search engines differ from public search engines like Google or Bing in that an internal search engine only searches for content inside the company’s firewall. Google cannot access internal content, so companies use search technology to make their content ‘findable.’ “Companies want to keep internal information safe and private. But they still need to find it,” explained Lynn. “That’s why they need search technology integrated into their organization’s system.”
TNR Global (www.tnrglobal.com) is a systems design and integration company focused on enterprise search and cloud computing solutions for publishing companies, news sites, web directories, academia, enterprise, and SaaS companies. TNR’s past clients include the University of Massachusetts Amherst, Mass Art & Culture, InterNano, Innovara, and the Allegis Group. TNR Global is located at 245 Russell Street, Suite 10 in Hadley, Massachusetts. TNR Global serves clients throughout New England, nationally, and world-wide. Its offices are in Hadley and Greenfield, Massachusetts.
Mobile search must be designed for a minimum number of touches before users arrive at the end result. If it takes more than 2-3 touches, the user will look elsewhere for answers.
Mobile phones are rapidly taking over the scene of web development, significantly impacting commerce, advertising, gaming, entertainment, banking, and news. 77% of the world’s population, or 5.3 billion people, are mobile subscribers. China and India lead the way in overall mobile growth. Virtually every measurable metric concerning mobile phone growth indicates entire economies being influenced by mobile technology. It’s not surprising that search technology is powering mobile growth just as it has its larger cousin, the desktop.
Mobile search used to be clunky and a pain to use. Until recently, the answer was to miniaturize the website. For a time, people thought mobile search would never be as good as desktop search. But as people use their phones for more and more, designers have been forced to consider how to make search, as well as all mobile apps, simple, powerful, and built for end users.
The Mobile Only World
Outside the US, countries like India, South Africa, and Egypt are leaders in mobile-only use, meaning users rarely or never use a desktop or laptop to access the Internet. That makes mobile search their primary mechanism for accessing queried information. Since these are also the countries sporting the most mobile growth, they are driving the need for quality, relevant search for the mobile market.
Young and free
Another driving force in the mobile game is young people. The under-25 crowd uses a cell phone as its primary mode of accessing the Internet. Mobile phones, smart phones in particular, are used to do nearly everything. Younger people are more open to conducting transactions online via phone than any other demographic. Shopping, banking, GPS, social media, gaming: mobile access allows mobile subscribers to do everything they need to without restricting the user to an office.
Key differences for UX Impact
The key differences between mobile search and desktop search seem obvious. On a cell phone, the screen is much, much smaller. Users are on the go and may access the Internet between tasks or meetings, instead of being in one area. Access needs to be quick and simple. Mobile search must be designed for a minimum number of touches before users arrive at the end result. If it takes more than 2-3 touches, the user will look elsewhere for answers. Fewer touches mean a simpler design, engineered for the user without a lot of fanfare or complication.
Google reports that 1 in 7 searches are now done via mobile vs. desktop. Mobile searches have increased fourfold in just the last year. Businesses need a mobile application to ensure they are reaching the inbound web traffic looking for their services and products. Mobile applications need a strong search technology to ensure the consumer can connect with the products or information they are looking for. The companies that build their web solution for the mobile market are the companies who will gain more market share and capture that 14% of customers searching for their products on the mobile web.
For the enterprise, accessing important information inside and outside the firewall is vitally important as more content is built within businesses and accessed digitally. With the mounting demand placed on mobile phones and devices, the performance we’ve come to expect from our desktop needs to be scaled to a smaller screen by simplifying wireframes with sophistication and well-thought-out design.
TNR Global’s expertise lies in deep back end knowledge using powerful search technologies to give users fast, relevant search results for enterprise sites and large web portals. Recognizing the need for search to work as powerfully for a mobile application as for a web application, we have teamed up with talented UX designers who specialize in search application design for web and mobile. Whether you are looking for a customized UX front end for your search solution or an out-of-the-box answer for mobile search, TNR can connect you with a total solution to answer your web-based and mobile search needs. For a free consultation, contact us.
“The truth is that if your end user of the solution doesn’t like the solution, they won’t use it.”
You’ve convinced the powers that be that a search solution is a necessary strategy for success and competitive advantage. Congratulations! Nice work. Think your job is done? Not by a long shot.
Ask your staff: what would a good solution look like to them? After you’ve decided to move forward with a search solution, it’s important (no, it’s crucial) that you strongly consider the end user. If you have a web portal that you manage, it’s worth polling your typical customer to gather vital data on how they want their experience to be. If you are looking at an enterprise search solution, you need to spend time exploring what your staff wants and needs out of a solution, and ensure your search solution is designed for them, not a boilerplate solution that only meets some of your needs. Search is an expensive endeavor; if you’re spending the money, you might as well get exactly what you want.
The truth is that if your end user of the solution doesn’t like the solution, they won’t use it. So getting the end user involved in the planning stage of the search project is vital to its overall success. If they have input into its overall features and design, they will be more invested in using it. Involving users manufactures all kinds of good-will collateral that can help develop better morale and a positive workplace. Doing this early in the process also introduces change more slowly to users, and people rarely react well to lots of radical change. Making them a part of the process and doing it early with lots of prepping for change can improve overall satisfaction rates with the search implementation after it’s complete.
Once the implementation actually goes live, you’ll need to ensure a training plan is in place and executed to ensure ongoing success. A successful search solution isn’t just done once it’s implemented. You need to work to include your whole team in the training process, and allow them to see for themselves how the solution is going to help them in their day to day tasks. If you included your staff in the planning of the design from the beginning, you’ll be much more successful once the solution is deployed, because they were part of the solution all along.
“Search by itself may look like a simple box, but behind the box is a foundry of girders, cross beams, and structural support that allows you to find what you need.”
“Search ties people together…”
This was one of the many themes at the Enterprise Search Summit in Washington, DC last week. It seems like a fairly obvious statement, but search quickly becomes part of the landscape, taken for granted even though the landscape couldn’t function without it. I have compared search function to the steel girders of a skyscraper. When you walk into the building, you aren’t thinking about the beams holding the building up or connecting floors, but without them, you wouldn’t have a building at all (you couldn’t even find the lobby). Other metaphors overheard include oxygen (invisible yet essential), sunlight (lest we remain in the dark), and electricity (everything stops without it).
Attendees of the conference know how important search is to companies, but increasingly, companies are taking search for granted. There is a fundamental gap in communicating the importance and difficulty of implementing a good search platform.
Companies that need search to run on their website or intranet expect search to work as it does on the Internet, but this is an apples and oranges scenario.
Here are the main disconnects:
- Search is easy
- Search is cheap
- It never has to be touched again
People expect search inside the firewall to function much like Google does outside the firewall. Google exists for end users and is really, really incredible. It geo-locates, it auto-completes. It uses your browsing history to provide more relevant results. And you had no financial investment in using this really lovely, elegant, useful tool that doesn’t just assist your Internet experience, but facilitates it. But behind the firewall, things are different. Let me explain.
- Your business content isn’t publicly available or known. I mean, that would be bad, right? It’s behind the firewall for a reason. So keeping it there yet allowing your staff to access certain levels of information takes some architecture and planning.
- Google has thousands of developers working on this beautiful, incredible technology every day. They finance this by ad content. How many people do you have on your search team? And how much of their day do they really spend on search? What department is being billed for it? Business leaders need to embrace this as a necessary cost of doing business and budget accordingly, or face the crippling result of staff and customers not being able to find the information they need.
- 80% of your content is unstructured. Meaning, search engines can’t really read it until some love and care is put into cleaning the data. This is a vital, yet time-intensive process. Our VP of Search Technologies Michael McIntosh says, “We spend about 90% of our time on the document processing pipeline, conditioning data to be fed into the engine.” Moreover, unstructured data isn’t a set number. It’s being created faster than you can blink by your entire enterprise. Processing it is never a done deal.
So if search connects us, hopefully this finds you thinking about search in more realistic terms. Search by itself may look like a simple box, but behind the box is a foundry of girders, cross beams, and structural support that allows you to find what you need to “make money outside the firewall or save money inside the firewall.”