System Administration

Mountain Lion Roars….and Leaves Some Blood

One issue I ran into when installing Virtualbox revolved around Apple’s software installation security in Mountain Lion

Feeling adventurous yesterday, I decided to upgrade my Mac AIR to Mountain Lion. Other than a fairly long download time, which is to be expected on the release day, the upgrade went fairly smoothly.

Until…

I opened iTerm. I use virtualenv & virtualenvwrapper to manage multiple Amazon accounts and python environments. The following error was thrown:

ImportError: No module named virtualenvwrapper.hook_loader

Luckily, this was easy. To fix, just do the following:

sudo easy_install pip

sudo pip install virtualenv virtualenvwrapper

Once done, open another iTerm window. You may see several directories created, such as “virtualenvwrapper.user_scripts creating /Users/mklatsky/.virtualenvs/premkproject”.

So far so good.

But- where is java?

“java -version” throws the error:

No Java runtime present, requesting install.

Again- easy. Just the act of issuing the “java -version” command launches a dialog box asking if I’d like to install or upgrade java. Five minutes later and I am back in business.

And then…

I use Virtualbox (actually Vagrant and Veewee). Attempting to launch one of my boxes resulted in a kernel panic, which rebooted my computer. Really? A kernel panic? At any rate- a little Googling, and an install of the latest Virtualbox (4.1.18) and I was all set.

One issue I ran into when installing Virtualbox revolved around Apple’s software installation security in Mountain Lion. There are now 3 levels of security for installable software: Mac App Store, Mac App Store and identified Developers and Anywhere.

Unfortunately, Oracle is not an “identified Developer”, so installation is impossible under the default settings. You’ll need to go to System Preferences->Security & Privacy ->General to change this. Then you can install Virtualbox. I’d recommend changing this setting back to the default after you are done.

I’ll conclude by stating that so far, Mountain Lion seems to be performing nicely, with few issues other than the above noted.

For further reading, check out the articles below:

http://arstechnica.com/apple/2012/07/os-x-10-8/

http://www.jongales.com/blog/2012/07/25/fixing-virtualenv-after-installing-mountain-lion/

https://www.virtualbox.org/ticket/10267

Our VP of Systems Administration Michael Klatsky discusses the latest OS upgrade from Apple–Mountain Lion. The following is a repost from his blog which can be viewed here along with other articles.

_____________________________________________________________________________________________________

Feeling adventurous yesterday, I decided to upgrade my Mac AIR to Mountain Lion. Other than a fairly long download time, which is to be expected on the release day, the upgrade went fairly smoothly.

Until…

I opened iTerm. I use virtualenv & virtualenvwrapper to manage multiple Amazon accounts and python environments. The following error was thrown:

ImportError: No module named virtualenvwrapper.hook_loader

Luckily, this was easy. To fix, just do the following:

sudo easy_install pip

sudo pip install virtualenv virtualenvwrapper

Once done, open another iTerm window. You may see several directories created, such as “virtualenvwrapper.user_scripts creating /Users/mklatsky/.virtualenvs/premkproject”.

So far so good.

But- where is java?

“java -version” throws the error:

No Java runtime present, requesting install.

Again- easy. Just the act of issuing the “java -version” command launches a dialog box asking if I’d like to install or upgrade java. Five minutes later and I am back in business.

And then…

I use Virtualbox (actually Vagrant and Veewee). Attempting to launch one of my boxes resulted in a kernel panic, which rebooted my computer. Really? A kernel panic? At any rate- a little Googling, and an install of the latest Virtualbox (4.1.18) and I was all set.

One issue I ran into when installing Virtualbox revolved around Apple’s software installation security in Mountain Lion. There are now 3 levels of security for installable software: Mac App Store, Mac App Store and identified Developers and Anywhere.

Unfortunately, Oracle is not an “identified Developer”, so installation is impossible under the default settings. You’ll need to go to System Preferences->Security & Privacy ->General to change this. Then you can install Virtualbox. I’d recommend changing this setting back to the default after you are done.

I’ll conclude by stating that so far, Mountain Lion seems to be performing nicely, with few issues other than the above noted.

For further reading, check out the articles below:

http://arstechnica.com/apple/2012/07/os-x-10-8/

http://www.jongales.com/blog/2012/07/25/fixing-virtualenv-after-installing-mountain-lion/

https://www.virtualbox.org/ticket/10267

Building for Enterprise Search: A Systems View, Part 2

“It’s important to incorporate expected behaviors into modeling and monitoring on both applications and systems sides and how they interact with one another.”

When we left off, Michael Klatsky, VP of Systems Administration was telling me how important communication between the systems side and search side of is to developing an enterprise search solution. The process of building, testing, monitoring, adjusting, more testing, and more monitoring ensures systems function that way they are intended to function. Let’s resume our conversation where Michael discusses the tools he uses to ensure the system he’s building works the way the client wants it to. This is the second portion of a two part blog post.
*********************************************************************************************************************
Tools for BDD: Part 2

Karen: It’s sounding like the Search Team and Sys Admin Team need to have a good relationship and communicate often to ensure the system will accommodate the work the search team does.

Michael: Yes, search sometimes has to construct their scripts to conforms to systems. Testing is run on both sides, but small changes can affect others down the line, so it’s important to incorporate expected behaviors into modeling and monitoring on both applications and systems sides and how they interact with one another.

Karen: How do you make sure that happens?

Michael: We’re exploring some tools to help us make sure the machine will act just as we expect it to, like cucumber and cucumber nagios We’re using certain tools to facilitate the systems behaves in the way that we expect it to. We’re exploring cucumber for basic modeling and for testing. Cucumber is cool for testing because it returns values to you in colors. Red, meaning it failed, yellow meaning there’s problem, and green meaning its good. According to their docs, they instruct you to “keep running it until it’s a cucumber.”

Karen: Ah, I get it.

Michael: Right. And what cucumber nagios does is it takes cucumber and allows you to create a nagios monitoring check script. So if you pass, great, if you god red, nagios will throw an alert to the systems administrator so we have an opportunity to fix it before more is built.

Karen: Sounds like it’s an attentive way to build a system.

Michael: The only way to scale is to have machines do things for themselves. That’s the way to do it.

Karen: To automate.

Michael: Yes. Automation. Not to just set things up to automatically do configuration management beforehand, but to test afterwards to determine that your machine is behaving just as you (and your client) envisioned it.

For more information on how you can plan your enterprise search in cooperation with your systems administration team, contact us for a free consultation.

Building for Enterprise Search: A Systems View, Part 1

We need to determine what right looks like, and have the system behave that way.

I sat down with our VP of Systems Administration, Michael Klatsky to discuss some of his thoughts on how Systems Administration needs to work in concert with the Search Team to implement search technologies for clients. This is the first portion of a two part blog post.

**********************************************************************************************************************
Karen: You wanted to discuss how your approaching the systems side of search, and using a Behavior Driven Development (BDD) approach. Tell me about that.

Michael: Well, one of the problems we run into when systems brings up machines for enterprise search clusters is the search software (FAST ESP for example) is very particular about it’s environment- more so than many of the more common applications such as the Apache webserver. Properly configured DNS, specific environment variables, specific library versions have to be present. There are ownership and permissions that need to be in place, and performance metrics that must adhere to a given baseline. There can be slow disks can affect performance. There has to be the right amount of memory, and different classifications of systems roles. Currently, we have homegrown scripts that bring up systems, then we have other scripts we run to detect issues. These scripts will tell us if the system is ready for what we need it to do. We also monitor the systems for standard items such as diskspace, memory usage, as well as basic search functionality. For example we’ll run a quick search on say paper clips, and if comes back with results we know it’s running.

That’s what we’ve done historically. But now, we need to bring up larger numbers of machines,and have confidence that they will perform exactly as we expect. Additionally, we have a set of functional tasks that must be available without fail As we bring up clusters of larger numbers of machines, and as we need to be more nimble, how can we ensure that it will respond the way we expect it to?

Karen: This is where Behavior Driven Development comes in, right?

Michael: Right. There is a lot of discussion out there on Behavior Driven Development which would include behavior driven modelling, behavior driven monitoring, behavior driven architecture and infrastructure. So not only does a machine come up and is listening on these ports, but I can bring a machine up, I can go to that machine and I’m able to log in, install certain software, and peform tasks. I can go to another machine and perform a task. So, the question is, how do you model that? How do we ensure the system will behaves as it should?

Karen: So you’re looking at replicating the behavior of these systems so that every time we deploy something it will be the same way.

Michael: Right. And if a change is made, even a small change, we’ll see it right away because a system or service will fail and be able to fix it. Sometimes a service will fail silently. But we test and monitor constantly to ensure the system will do exactly what we expect it to do. It’s all a part of the build process.

Karen: Sounds like a smart approach.

Michael: Yes. And if we make a change, we’ll find out how that change will affect the rest of the system. For instance, we run tests and if something is wrong it should give you an error. For example if you change the location of your SSH keys. You may still be able to get into the machine by SSH, but one little change could make it impossible to SSH from one machine to another in the cluster. So rather than find that out after you begin your manual work on that, we make it part of the build process by constantly monitoring and testing the system as we build it.

Karen: It sounds like building a house and then realizing you have bricks out of place after it’s built.

Michael: Worse, it’s like building a house and realizing you forgot to build a door! At the very least while you are building, you can test, and let me know, “Hey! I don’t have a door to my house!” So that I can fix it before you move in.

There are certain things the search team needs to do to ensure their work will function in the system, like SSHing around the machines in the cluster–they need to be able to do that. There are certain ports that system need to be listening on, there are certain services that need to return a normal range of results. We need to define what a proper operation looks like. We can’t necessarily say that if we search for gold plated paperclips for example, that the search result should show 1000 results every time, that may or may not be the case–we don’t necessarily know if this is a proper result every time, but we should determine if the result returned is within a proper range of normal.

We’re defining what a proper operation looks like and ensure it functions that way. Part of the behavior driven model which is what I’m really interested in, we can set up a natural language looking config file. This config file should describe the actions or behaviors I expect. For example, when I go to ABC.com website and search for gold plated paperclips, I expect to see results. One result should be X. There should be more than Y results. When I return that result, I should be able to click on one result and go to that products feature list. Basically I’m describing how the customer will interact with the search, what I expect the customer to do, and design the system to respond with the customer’s actions in mind.

Karen: So your engineering it with the customer’s behaviors in mind.

Michael: That’s exactly what we’re doing. Then that if I look for a certain item, I get that result, describe the behavior of what the customer should do and make the system behave in cooperation with the customer behavior. We need to determine what right looks like, and have the system behave that way.

Karen: And what right looks like is really different for each client.

Micheal: Yes. You can write in somewhat natural English what that looks like. It’s not magic, but you still have to come up with specification of what right looks like. But you can do a lot of sophisticated things in this manner because you will know you’ll have a website that’s going to perform the way it’s suppose to perform. The bottom line is: Define what your systems should “look” like, deploy those systems using those definitions, and after deployment, test to ensure that those systems “look” like your definition.

For more information on how you can plan your enterprise search in cooperation with your systems administration team, contact us for a free consultation.

Leveraging your assets: Repurposing a physical server as an OpenVZ virtual server

As an active system administrator, part of my job is determining which systems require decommissioning due to the age of the OS or for other reasons. When readying a server for retirement, we’ll take the opportunity to move and upgrade the services running on that server. Often, we are then left with a perfectly good piece of hardware that has already been paid for and is still a valuable asset. A great way to leverage this equipment is virtualization- specifically, OpenVZ.

Oftentimes, servers are underutilized- especially when it comes to development work, or when running lower impact applications. Rather than deal with multiple users on a system, apache virtual hosts for multiple websites, worrying about secure file access, or one user or customer hogging a huge amount of resources, we have found that creating multiple virtual servers using OpenVZ is an ideal solution. I won’t delve into OpenVZ deployment other than to briefly note that on our CentOS and RedHat servers, installation is as simple as adding the correct repository and installing via yum (see here for more info). Once installed, a quick reboot into the new kernel and you are ready to roll. We are running 45-50 virtual servers (VEs, or containers) on one of our 2 QUAD core CPU, 8GB RAM servers, with plenty of room to spare. I recommend running ‘vzsplit’ to generate a good configuration basis for you VEs.

Once we have installed and configured OpenVZ on our new server, we are then able to deploy a large number of VEs for individual users or customers. Each VE provides the user with the ability to have root access, update and install their own software, deploy their own applications, etc. To the user- they are on their own complete system. Should their application misbehave, it won’t affect the others on the system.

Additionally, many resources can be adjusted on the fly. Running out of disk space? Increase it on the fly. Need more memory? Increase on the fly. Live resource management such as this is a very powerful way to leverage your hardware.

We currently are using OpenVZ for CMS development, custom programming development, building custom rpms, running websites, and various other testing where we need easily deployed servers which may or may not be needed of extended periods of time.

Virtualizing our own equipment in this way makes great economic sense for several reasons-. We are using a server which we already own, thus helping us increase our “green” sensibilities by keeping this system out of the landfill. We eliminate the need for more servers for development work. We can even host paying customers, thus deriving income from the hardware. Using our own equipment also helps us keep costs lower by lessening the need to move data and applications offsite to providers such as Slicehost. Slicehost has it’s place- and, in fact, we use them for certain applications- but they do not provide the versatility necessary for much of our development work.
In summary, by leveraging existing, underutilized or potentially retired hardware, you can save money in reduced additional hardware costs, derive income and help the environment. Additionally, the agility in development and deployment that we gain simply adds another layer to the economic advantages that we gain.That sounds like a good plan to me!

Sidekick Danger – It’s not the cloud, it’s the approach

I recently saw the headline,”T-Mobile and Microsoft/Danger data loss is bad for the cloud“, and, as an admin who works with cloud technology on a daily basis, viewed the headline with some concern. However, after reading the article itself, my only thought is “What does this have to do with the cloud?”. Reading through, we find that Microsoft/Danger stores your phone data (contacts, photos, etc) on it’s servers, and that the phone needs to constantly be in contact with the servers in order to maintain service and data. Unfortunately, the servers crashed, and all of the data was lost. Turn off your phone, lose all your data. Yet- this is exactly what the Sidekick service promises to protect you from- and it failed.

The problem with blaming this on the “cloud” is that, while technically, your cell phone and the Microsoft/Danger servers form a “cloud”, the failure lies with the servers, and those who administer those servers. It doesn’t matter whether those servers are virtual, or physical- if there is not a disaster recovery plan in place, and if that disaster recovery plan has not been tested- data will be lost. Your data. This is not a shortcoming of cloud computing- it is a result of depending on others to maintain your data. It also gives us caution when depending on external providers over the network to always be available. Services stop. Power fails. Disks die. Routing interruptions happen.

This is just network computing. But if the people (or companies) behind it all don’t do their own due diligence- disasters like the this, and worse will continue to happen.