Our VP of Systems Administration Michael Klatsky discusses the latest OS upgrade from Apple–Mountain Lion. The following is a repost from his blog which can be viewed here along with other articles.
Feeling adventurous yesterday, I decided to upgrade my MacBook Air to Mountain Lion. Other than a fairly long download time, which is to be expected on release day, the upgrade went fairly smoothly.
I opened iTerm. I use virtualenv and virtualenvwrapper to manage multiple Amazon accounts and Python environments. The following error was thrown:
ImportError: No module named virtualenvwrapper.hook_loader
Luckily, this was easy. To fix, just do the following:
sudo easy_install pip
sudo pip install virtualenv virtualenvwrapper
Once done, open another iTerm window. You may see several directories created, such as “virtualenvwrapper.user_scripts creating /Users/mklatsky/.virtualenvs/premkproject”.
So far so good.
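If you want to confirm the fix without opening yet another terminal, a quick check from Python along these lines should do it. This is only a sketch: the helper name module_available is made up for illustration and is not part of virtualenvwrapper, and it needs to be run with the same Python interpreter that virtualenvwrapper uses.

```python
import importlib.util


def module_available(name):
    """Return True if the named module can be located by this interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is itself missing.
        return False


if __name__ == "__main__":
    # The module named in the ImportError above.
    print(module_available("virtualenvwrapper.hook_loader"))
```

If this prints False, the interpreter still cannot see virtualenvwrapper, and the install commands above were probably run against a different Python.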
But where is Java?
“java -version” throws the error:
No Java runtime present, requesting install.
Again, easy. Just issuing the “java -version” command launches a dialog box asking if I’d like to install or upgrade Java. Five minutes later, I am back in business.
I use VirtualBox (actually Vagrant and Veewee). Attempting to launch one of my boxes resulted in a kernel panic, which rebooted my computer. Really? A kernel panic? At any rate, after a little Googling and an install of the latest VirtualBox (4.1.18), I was all set.
One issue I ran into when installing VirtualBox involved Apple’s software installation security in Mountain Lion. There are now three levels of security for installable software: “Mac App Store”, “Mac App Store and identified developers”, and “Anywhere”.
Unfortunately, Oracle is not an “identified developer”, so installation is impossible under the default settings. You’ll need to go to System Preferences -> Security & Privacy -> General to change this. Then you can install VirtualBox. I’d recommend changing this setting back to the default after you are done.
I’ll conclude by stating that so far, Mountain Lion seems to be performing nicely, with few issues other than those noted above.
For further reading, check out the articles below:
TNR Global Launches Search Application for Museum Collections
TNR Global is launching the alpha version of a search application designed specifically for museum collections. Museum Collections Search is any application for digitally searching a museum’s collection. This can be made available to the public or used by only the internal staff for curation, and can be made available to a selected professional or research audience. Our White Paper explains the application in more detail.
Collections Search adds tremendous value to the research community and is often in line with the educational mission of many museums. A search feature is a resource for students and researchers, and can expand the overall audience by reaching people separated by distance or with limited physical mobility. When the public finds items in your collection, like historic letters, photographs of items, and other catalog items, through your search function, it can increase interest and traffic to the museum’s site and physical collection.
While the ability to search a museum collection brings immense value to the museum and the community that supports it, intellectual property is an ongoing concern for curators within the museum community. TNR Global recognizes this and has technologies to address access to material. When setting up a search, ease of use, clarity, and responsiveness are addressed along with issues of ownership and privacy; together these determine the search technology chosen and how it is applied. The search realm and results can be tailored based on the user. By defining the audience (or audiences) for the collection and the search, we can structure the presentation of the results. The public view can be a more restricted display, while a protected view can be more expansive and detailed.
We use open source search technology that works with most museum software systems and databases including the popular museum software product PastPerfect. We customize our search solution specifically for your collection and optimize search results for the most relevant results to queries.
TNR Global has a long history with the museum community. Our CEO is Principal of the organization We Love Museums and is a member of dozens of museums worldwide. He is involved with a number of archival and curatorial indexing projects. He has merged his lifelong career in database and web technology with his passion for art, education, and history to create a search solution that benefits museums and their patrons. To get started, contact us for an evaluation of your Museum Collection Search Project today!
The future of search doesn’t come in a box.
Last week, while many were on vacation, Google abandoned the smallest member of its Search Appliance family, the Google Mini. The small blue piece of external hardware was used for smaller data sets with stable, some might say stagnant, data and slow and steady query rates. If you were a smaller business with search demands that weren’t, well, too demanding, then this piece of hardware could help you for a reasonable price tag.
Search evolves like all technologies do. Developers incorporate emerging technologies into their skill sets, and open source technologies like Lucene Solr have matured into a competitive option for companies of all sizes. IT managers are finally ready to move away from the confines of a Search Appliance in a box and move to a more agile solution that can offer room for growth, a lightweight application, and a healthy and growing community. Without the hefty annual licensing fees of a commercial product, Solr can save small to mid sized companies and startups valuable cash resources to invest in other areas of their respective businesses.
Open source technologies aside, many are speculating whether Google will retire some of its other pieces of hardware, like the well-known GSA (Google Search Appliance). That said, Google has a newly released version 6.14 with an updated website that explains its features. Google continues evolving its enterprise search offerings to include a hosted search solution for e-tailers called Google Commerce Search, along with its standard Google Site Search. Neither of these products comes in a physical blue or yellow box, and I wouldn’t expect Google’s next innovation to either.
There’s plenty of lively discussion about this in the Enterprise Search Professionals discussion board on LinkedIn.
Christopher Miles, one of our Senior Software Developers here at TNR, wrote this post on Bishop. It all started when he was asked the question:
“What happens if I send it something that’s not JSON?”…….“I don’t know, but I bet it logs a really big stack trace!”
The question begged an answer, and Chris gave an extremely thorough one on his own blog. Here’s a small excerpt that gives a taste of his analysis:
After taking a closer look at HTTP and its specification, it was clear that it could do a lot more than I had thought. Looking back on past projects, it’s painfully obvious that I’ve been taking what is really an application protocol and ignoring all of the interesting bits, instead using it as little more than a pipe to push documents through. I’ve been using either the requested URL or parameters or maybe even neither and simply examined the body content, thus eliminating any of the real advantages of using HTTP in the first place.
And there are advantages. The protocol is already thinking about caching your data where it makes the most sense. There’s already an algorithm for taking the list of content types that the client wants and the content types the server provides and picking the best match. It can manage safe updating of resources as well as notifying the client of conflicts. And so on. By ignoring what the HTTP protocol has to offer I was making more work for myself.
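To make the content type matching point concrete, here is a minimal Python sketch of choosing between the types a server offers based on a client’s Accept header. It is a simplified illustration only: it is not code from Chris’ library, it does not cover everything the HTTP specification describes (media type parameters other than q are ignored, for example), and the function names are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the client does not give one
        for param in pieces[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range, offered):
    """3 for an exact match, 2 for type/*, 1 for */*, 0 for no match."""
    if media_range == offered:
        return 3
    range_type, _, range_subtype = media_range.partition("/")
    offered_type = offered.partition("/")[0]
    if range_subtype == "*" and range_type == offered_type:
        return 2
    if media_range == "*/*":
        return 1
    return 0


def best_match(offered_types, accept_header):
    """Return the offered type the client prefers most, or None."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for offered in offered_types:
        matches = [(specificity(r, offered), q) for r, q in ranges]
        matches = [m for m in matches if m[0] > 0]
        if not matches:
            continue
        # The most specific matching range decides the q for this type.
        _, q = max(matches)
        # Strictly greater, so ties go to the type the server listed first.
        if q > best_q:
            best, best_q = offered, q
    return best
```

With this sketch, a server offering application/json and text/html to a client sending “text/html;q=0.8, application/json” would pick application/json, and a client that accepts none of the offered types gets None, which a service would typically turn into a 406 Not Acceptable response.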
So he decided to take matters into his own hands by creating a library. More from Chris’ blog:
The idea is to provide a relatively small library that will make your life easier and hopefully more pleasant by making it straightforward to provide a consistent web service API that obeys HTTP semantics. It will make the lives of those around you easier as well: clients can expect your service to respond to the common HTTP request methods with reasonable responses. Placing caches around your service will also be much simpler and you’ll have some level of control over how your service’s data is cached.
Since creating this library, other developers have responded positively and are watching the project. If you would like to see our approach to solving this problem, take a look for yourself here.
If you’d like to talk to us on how we can solve some of your enterprise search, cloud, or scalability issues, contact us.
Our VP of Systems Administration, Michael Klatsky, has started a blog specifically discussing Systems. Fresh from the AWS Summit 2012 in NYC, Michael has lots of new approaches to discuss in terms of systems, cloud computing, DevOps, System Architecture, and how developers and systems staff need to communicate well and work together for the best results in web development. The blog is his own but we feel it’s a great technical resource for our colleagues in systems and web development. You can take a look at his blog here. Michael welcomes commentary and discussion, and hopes to provide some shortcuts for fellow System Administrators.
One of the most powerful features of FAST ESP is its flexible document processing engine. The engine that ships with FAST ESP supports multiple document processing pipelines, each consisting of multiple document processing stages. A document processing stage performs a document processing task and can add, modify, or remove elements from a document before it is passed to the next stage in the pipeline. A simple example of a processing stage would be one that processes a document’s URL element. ESP ships with many processing stages and several processing pipelines out of the box for handling both structured and unstructured documents. The FAST ESP document processing engine also provides a Python plugin API that allows customers to create custom processing stages of their own, a feature we use heavily for our customer ESP installations.
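The general shape of a staged pipeline can be sketched in a few lines of Python. This is an illustration of the idea only and does not use the FAST ESP plugin API: documents are plain dictionaries here, and every function and element name (extract_host, run_pipeline, “raw_html” and so on) is hypothetical.

```python
from urllib.parse import urlparse


def extract_host(doc):
    """Add a 'host' element derived from the document's 'url' element."""
    url = doc.get("url")
    if url:
        doc["host"] = urlparse(url).netloc
    return doc


def normalize_title(doc):
    """Modify the 'title' element by collapsing runs of whitespace."""
    if "title" in doc:
        doc["title"] = " ".join(doc["title"].split())
    return doc


def drop_raw_html(doc):
    """Remove an element that later stages no longer need."""
    doc.pop("raw_html", None)
    return doc


def run_pipeline(doc, stages):
    """Pass a copy of the document through each stage in order."""
    doc = dict(doc)
    for stage in stages:
        doc = stage(doc)
    return doc


# A pipeline is just an ordered list of stages.
PIPELINE = [extract_host, normalize_title, drop_raw_html]
```

Each stage here adds, modifies, or removes one element and hands the document to the next stage, which is the behavior described above; a real engine would add configuration, error handling, and routing between pipelines.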
Unfortunately, Solr does not offer the same robust support for document processing pipelines that ESP does. The ESP processing pipeline is document-centric while the Lucene Solr platform is field-centric. When a document is fed to ESP for processing, it is routed to processing stages in a processing pipeline that can access document elements generated by previous processing stages. This allows for complex and optimal operations that can leverage previous processing, such as reuse of a previously generated HTML DOM tree structure. When a document is passed to a Solr update handler, the document is broken up into a set of individual fields. Each field can have a set of processors, known as Solr analysis filters, that can be chained together for field processing before indexing occurs. While this is fine for content that has been heavily processed before being sent to Solr, individual filters lack the same level of access to other document elements, which makes more complex processing behaviors harder to support.
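The difference between the two models is easier to see side by side. The Python sketch below is purely illustrative and uses neither the Solr nor the ESP APIs; the names (FIELD_CHAINS, analyze_fields, title_from_url) are hypothetical. A field-level filter receives only the value of its own field, while a document-level stage receives the whole document and can use one element to compute another.

```python
def strip_whitespace(value):
    return value.strip()


def lowercase(value):
    return value.lower()


# Field-centric: each field has its own chain of filters, and a filter
# only ever sees the value of the field it is attached to.
FIELD_CHAINS = {"title": [strip_whitespace, lowercase]}


def analyze_fields(doc, chains):
    analyzed = {}
    for field, value in doc.items():
        for field_filter in chains.get(field, []):
            value = field_filter(value)
        analyzed[field] = value
    return analyzed


# Document-centric: a stage receives the whole document, so it can fill
# in a missing title from the last segment of the URL.
def title_from_url(doc):
    doc = dict(doc)
    if not doc.get("title", "").strip():
        doc["title"] = doc.get("url", "").rstrip("/").rsplit("/", 1)[-1]
    return doc
```

A filter in FIELD_CHAINS has no way to perform what title_from_url does, because the URL is never passed to it. That kind of cross-element logic has to happen before the document is split into fields.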
Another difference between ESP and Solr platforms is that ESP’s document processing architecture allows it to be scaled independently from its indexing architecture. ESP’s document processing architecture is fully decoupled from its indexing architecture and is designed out-of-the-box to take advantage of multiple processor cores per machine and multiple document processor machines per cluster. Solr’s out-of-the-box document processing architecture is tightly coupled with its indexing architecture, making it difficult to independently scale Solr’s content processing capacity without adding the complexity and overhead of additional Solr services and Lucene indexes. When we work with multiple terabyte document sets, we find content processing tends to be the biggest bottleneck, so being able to scale content processing ability separately from indexing is mission critical.
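A rough Python sketch of what decoupling means in practice: content processing runs in a pool of workers whose size can be changed without touching the indexer. This is an assumption-laden illustration, not how ESP or Solr are implemented. The names process_document, InMemoryIndexer, and feed are hypothetical, and the thread pool merely stands in for what would really be separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def process_document(doc):
    """Stand-in for an expensive content processing pipeline."""
    processed = dict(doc)
    processed["tokens"] = processed.get("body", "").lower().split()
    return processed


class InMemoryIndexer:
    """Stand-in for the indexing side; it only stores processed documents."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        self.docs[doc["id"]] = doc


def feed(docs, indexer, workers=4):
    """Process documents in a worker pool, then hand results to the indexer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order as workers finish them.
        for processed in pool.map(process_document, docs):
            indexer.add(processed)
    return indexer
```

The workers argument is the point of the sketch: processing capacity is raised or lowered independently of the indexer, which never needs to know how many workers produced its input.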
If we want to leverage the power that Solr offers, but we need support for a more robust document processing framework, what are our options? During the course of our research we discovered quite a number of content processing frameworks to choose from. Some of the options currently available include, but are not limited to, OpenPipeline, OpenPipe, Pypes, UIMA, SMILA, Apache Commons Pipeline, Piped, Behemoth, and Cascading.
Most of these frameworks are written in Java, which gives them access to an incredibly broad and diverse spectrum of Java libraries. Since Solr and Lucene are also written in Java, it might make a lot of sense to favor a Java processing framework from the start, especially if you are more comfortable with Java as a programming language.
Since our clients tend to have highly customized document processing pipelines with many custom FAST ESP Python processing stages, we are heavily biased towards choosing a framework that minimizes the amount of code that would need to be migrated. Many of the available processing frameworks are written in Java, which would be fine if you prefer using Java and don’t have a large amount of currently working Python code to migrate. For our use cases, that made the decision of which framework to choose incredibly simple: we chose Pypes, a Python framework, for our migration solution.
For a full report on how we use Pypes for a Document Processing Engine including sample code, sign up for our free FAST to Lucene Solr White Paper here.
There are many new technologies emerging around search, and we’ve been investigating several of them for our clients. Search has never been “easy” but Elasticsearch attempts to make it at least easier. Elasticsearch is billed as “built for the cloud,” and with so many companies moving into the cloud, it seems natural that search would move there too. This paper is designed to show you just how Elasticsearch works by setting up a cluster and feeding it data. We also let you know what tools we use so you can test out the technology and we include a rough sketch of code as well. Finally, we draw conclusions about how Elasticsearch can help with problems like Big Data and other search related uses.
Elasticsearch is an open source technology developed by one developer, Shay Banon. This paper is simply a first look at Elasticsearch and is not associated with an additional product or variation of Elasticsearch. The appeal for big data is due to Elasticsearch’s wonderful ability to scale with growing content, which has largely been associated with the “big data problem” we all keep hearing about. It’s very easy to add new nodes and it handles the balancing of your data across the available nodes. It handles the failure of nodes in a graceful way that is important in a cloud environment. Lastly, in this paper we simply evaluate and test the technology. We really don’t believe there is a one size fits all technology in the realm of enterprise search; the right choice is highly dependent upon your systems, how many documents you have, how much unstructured data you have, and how you want your site to function. That said, in terms of storing big data, it is as capable as any Lucene based product; it can handle a much larger load than the current Solr release, as the notion of breaking the index up into smaller chunks is “baked in” to the product.
Here is an excerpt from the paper:
“Products like Elasticsearch that lack a document processing component entirely become more attractive. In fact, most projects that involve a data set large enough to qualify as “big data” are building their own document processing stages anyway as part of their ETL cycle.”
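The idea of breaking an index into chunks and balancing them across nodes can be sketched in a few lines of Python. This is a toy illustration of the general technique, not Elasticsearch’s actual routing or allocation logic: the CRC32 hash, the round-robin placement, and the function names are all assumptions made for the example.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Route a document to a shard by hashing its id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def assign_shards(num_shards, nodes):
    """Spread shards across the available nodes round-robin."""
    allocation = {node: [] for node in nodes}
    for shard in range(num_shards):
        allocation[nodes[shard % len(nodes)]].append(shard)
    return allocation


if __name__ == "__main__":
    # Five shards on two nodes, then rebalanced after a third node joins.
    print(assign_shards(5, ["node-a", "node-b"]))
    print(assign_shards(5, ["node-a", "node-b", "node-c"]))
```

Because a document’s shard depends only on its id and the shard count, adding a node changes where shards live but not which shard a document belongs to, which is why a cluster can rebalance without re-routing documents.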
If you are interested in downloading this free White Paper, sign up with us here.
If you would like help using Elasticsearch with your search project, contact us.
HADLEY, MA– March 12, 2012
In the world of Enterprise Search, everything is changing. Companies who have been using Microsoft’s internal search engine, FAST Enterprise Search Platform, will be forced to make a change as Microsoft discontinues support for the search platform for companies using Linux as their operating system. Anticipating the need for a solution, local technology consultancy TNR Global is pleased to announce the release of a White Paper for migrating off FAST ESP to a new search engine, Solr. The paper is titled Bridging the Gap: A Migration Path from Fast ESP to Apache Solr.
This effort began last October when TNR Global presented on the subject of migration from FAST to Solr at the open source conference, Apache Lucene Eurocon in Barcelona, Spain. The paper contains a case study with architecture overview, loading millions of documents into Solr indexes, evaluation and recommendation of tools to bridge the feature gap, migrating custom pipeline code, and the vastly improved ROI after implementation. “It’s basically a road map for companies looking at options for migration, and we outline Solr as a very good option” said Karen E. Lynn, Director of Business Development.
“We have spent over 9 years working with the FAST ESP product and we understand the nuances of what customers have come to expect from the technology. We’ve identified Solr as a top choice for migrating off FAST as support for the product drops off” said Michael McIntosh, VP of Search Technologies and lead author of the paper. “Solr is an open source technology that has matured and is certainly stable enough for commercial use” said Chris Miles, Senior Software Engineer and contributor to the paper. “We’re excited about this migration option for our customers, and we believe over the long run, it will save them a lot of money and give them greater control over their search engine.”
This heavily anticipated paper will assist companies and organizations in planning their own FAST ESP to Apache Solr migrations and alert them to tools and techniques that can help them achieve a relatively painless process. Several large blue chip companies have expressed interest in the paper. “We’ve had a healthy response to the paper” said Lynn.
Internal search engines differ from public search engines like Google or Bing, in that an internal search engine only searches for content inside the company’s firewall. Google cannot access internal content, therefore companies use search technology to make their content ‘findable.’ “Companies want to keep internal information safe and private. But they still need to find it” explained Lynn. “That’s why they need search technology integrated into their organization’s system.”
TNR Global (www.tnrglobal.com) is a systems design and integration company focused on enterprise search and cloud computing solutions for publishing companies, news sites, web directories, academia, enterprise, and SaaS companies. TNR’s past clients include the University of Massachusetts Amherst, Mass Art & Culture, InterNano, Innovara, and the Allegis Group. TNR Global is located at 245 Russell Street, Suite 10 in Hadley, Massachusetts. TNR Global serves clients throughout New England, nationally, and world-wide. Its offices are in Hadley and Greenfield, Massachusetts.
Mobile phones are rapidly taking over the scene of web development, significantly impacting commerce, advertising, gaming, entertainment, banking and news. 77% of the world’s population, or 5.3 billion people, are mobile subscribers. China and India lead the way in overall mobile growth. Virtually every measurable metric concerning mobile phone growth indicates entire economies being influenced by mobile technology. It’s not surprising that search technology is powering mobile growth just as it has its larger cousin, the desktop.
Mobile search used to be clunky and a pain to use. Until recently, the answer was to miniaturize the website. For a time, people thought mobile search would never be as good as the desktop search. But, as people use their phones for more and more, it has forced designers to consider how to make search, as well as all mobile apps, simple and powerful and built for end users.
The Mobile Only World
Outside the US, countries like India, South Africa, and Egypt are leaders in mobile only, meaning users rarely or never use a desktop or laptop to access the Internet, which makes mobile search their primary mechanism for accessing queried information. Since these are also the countries sporting the most mobile growth, they are driving the need for quality, relevant search for the mobile market.
Young and free
Another driving force in the mobile game is young people. The under-25 crowd uses a cell phone as their primary mode of accessing the Internet. Mobile phones, smart phones in particular, are used to do nearly everything. Younger people are more open to conducting transactions online via phone than any other demographic. Shopping, banking, GPS, social media, gaming: mobile access allows mobile subscribers to do everything they need to without being tied to an office.
Key differences for UX Impact
The key differences between mobile search and desktop search seem obvious. On a cell phone, the screen is much, much smaller. Users are on the go and may access the Internet between tasks or meetings, instead of being in one area. Access needs to be quick and simple. Mobile search must be designed for a minimum number of touches before users arrive at the end result. If it takes more than 2-3 touches, the user will look elsewhere for answers. Fewer touches mean a simpler design, engineered for the user without a lot of fanfare or complication.
Google reports that 1 in 7 searches are now done via mobile vs. desktop. Mobile searches have increased fourfold in just the last year. Businesses need a mobile application to ensure they are reaching the inbound web traffic looking for their services and products. Mobile applications need a strong search technology to ensure the consumer can connect with the products or information they are looking for. The companies that build their web solution for the mobile market are the companies who will gain more market share and capture that 14% of customers searching for their products on the mobile web.
For the enterprise, accessing important information inside and outside the firewall is vitally important as more content is built within businesses and accessed digitally. With the mounting demand placed on mobile phones and devices, the performance we’ve come to expect from our desktop needs to be scaled to a smaller screen by simplifying wireframes through sophisticated, well-thought-out design.
TNR Global’s expertise lies in deep back end knowledge using powerful search technologies to give users fast, relevant search results for enterprise sites and large web portals. Recognizing the need for search to work as powerfully for a mobile application as for a web application, we have teamed up with talented UX designers specifically in the field of search application design for web and mobile. Whether you are looking for a customized UX front end for your search solution or an out of the box answer for mobile search, TNR can connect you with a total solution to answer your web based and mobile search needs. For a free consultation, contact us.