Posts Tagged ‘Cloud’

Amazon Web Services: My EBS is stuck!

by Karen Lynn
Many of us were affected by the Amazon EBS issues at the end of October 2012. If you had EC2 instances in us-east-1, you were likely affected by the issues. EBS volumes appeared “stuck”, snapshots would not complete, etc.
While the issues have been resolved (although we required some Amazon Support intervention for a few volumes), we have recently noticed what appear to be some vestigial issues related to the EBS outage.
The symptoms are, simply, that EC2 instances appear to be extremely slow. I/O is almost non-existent. Luckily, the fix is simple: perform a stop/start on the instance (not a restart). Your instance will be provisioned to new hardware, and you’ll have to ensure you account for a different IP address, but other than that, you’ll be back in business.
Of course- for next time- make sure that your instances are in multiple Availability Zones and Regions.
Until next time….
Many of us were affected by the Amazon EBS issues at the end of October 2012. If you had EC2 instances in us-east-1, you were likely affected by the issues. EBS volumes appeared “stuck,” snapshots would not complete, etc.
While the issues have been resolved (although we required some Amazon Support intervention for a few volumes), we have recently noticed what appear to be some vestigial issues related to the EBS outage.
The symptoms are, simply, that EC2 instances appear to be extremely slow. I/O is almost non-existent. Luckily, the fix is simple: perform a stop/start on the instance (not a restart). Your instance will be provisioned to new hardware, and you’ll have to ensure you account for a different IP address, but other than that, you’ll be back in business.
Of course- for next time- make sure that your instances are in multiple Availability Zones and Regions.
-Michael Klatsky, VP of Systems Administration

Elasticsearch Evaluation White Paper Released: Promising for Big Data

by Karen Lynn

There are many new technologies emerging around search, and we’ve been investigating several of them for our clients. Search has never been “easy” but Elasticsearch attempts to make it at least easier. Elasticsearch is billed to be “built for the cloud,” and with so many companies moving into the cloud, it seems like a natural that search would move there too.  This paper is designed to show you just how Elasticsearch works by setting up a cluster and feeding it data.  We also let you know what tools we use so you can test out the technology and we include a rough sketch of code as well. Finally, we make conclusions about how Elasticsearch can help with problems like Big Data and other search related uses.

Elasticsearch is an open source technology developed by one developer, Shay Bannon. This paper is simply a first look at elasticsearch and is not associated with an additonal product or variation of elaticsearch. The appeal for big data is due to elasticsearch’s wonderful ability to scale with growing content, which has largely been associated with the “big data problem” we all keep hearing about. It’s very easy to add new nodes and it handles the balancing of your data across the available nodes. It handles the failure of nodes in a graceful way that is important in a cloud environment. And lastly, we simply evaluate and test the technology. We really don’t believe there is a one size fits all technology in the realm of enterprise search, it is really highly dependent upon your systems, how many documents you have, how much unstructured data you have, and how you want your site to function. But that said– in terms of storing big data, it is as capable as any Lucene based product; it can handle a much larger load that the current Solr release as the notion of breaking the index up into smaller chunks is “baked in” to the product.

Here is an except from the paper:

“Products like Elasticsearch that lack a document processing component entirely become more attractive. In fact, most projects that involve a data set large enough to qualify as “big data”³² are building their own document processing stages anyway as part of their ETL cycle.”

If you are interested in downloading this free White Paper, sign up with us here.

If you would like help using Elasticsearch with your search project, contact us.

Cloud Platforms: The Promise vs. The Reality

by Karen Lynn

Recently our VP of Search, Michael McIntosh sat down and talked to me about his thoughts on cloud computing and what businesses should be aware of when investing in the cloud.


Karen: So, how does enterprise search and cloud computing fit together?  What’s good about it for companies?

Michael: The advent of cloud computing makes it a lot easier for companies to get into search without investing a huge sum of money up front. Some of the pay-as-you-go computing approaches make it possible to do things that in the past wouldn’t have been financially viable such as natural language processing on content.  Something that could have taken days, weeks, or even months can now take much less time by throwing more hardware at a problem for a shorter time span.

For example, you could throw 20 machines at a problem for 12 hours and do a bunch of computations in a massively parallel way, and then stop it as soon as it’s done….versus the old model where you have to buy all the hardware, or rent it, and make sure it’s not underutilized so you make your investment back.

But if you need a lot of processing power for a short amount of time, it’s really quite amazing what we can do now with an approach like this.

Karen: Is this a new technology for TNR?

Michael: TNR has been using cloud computing platforms for several years now—3 or 4 years.  Cloud computing in itself is sort of a buzz word, because distributed processing and hosting has been around for a while, but the pay-as-you-go computing model is relatively new. So we have a great deal of experience with the reality of cloud computing platforms vs. the promise of cloud computing platforms.

Karen: So, what is the difference between the “promise” and the “reality” of cloud computing platforms?

Michael: Well, A lot of people think of cloud computing as this magical thing; all their problems will be solved and it will be super dependable because there are very large businesses like Amazon running the underlying infrastructure and you don’t have to worry about it.

But, as the physical infrastructure becomes easier to deploy, other critical factors come into play. You won’t have to worry about the physical logistics of getting hardware in place. But, you will have to manage multiple instances, you have to make sure that when you provision temporary processing resources, you have to remember to retire it when it’s no longer needed. Otherwise you’ll be paying more than you need to. Since virtualization uses physical hardware you do not control or maintain—there are fewer warning signs to a potential systemic failure. Now Amazon, which is the one we use the most, does a good job of backing up instances and making things available to you even when there are failures. But we’ve had problems where we’ve lost entire zones. Even if we’ve had multiple machines configured with fault tolerance, Amazon has experienced outages that have taken entire regions offline despite every conservative effort to ensure continuous up time. So we’ve had our entire service clusters go down because of problems Amazon was having. It becomes critically important for companies to develop and maintain a disaster and recovery plan. Companies need to make sure things that are critical are backed up in multiple locations. Now historically, this has been hard to do because companies typically buy enough equipment for production needs, but not enough equipment for development and staging environments.

Karen: That sounds like a costly mistake.

Michael: It can be very costly because people often develop disaster recovery plans without ongoing testing to confirm the approach continues to work. If the approach is flawed, when you do suffer an outage, you can be offline for hours, days or weeks. Even worse, you may not be able to recover your data at all.

Karen: That sounds extremely costly.

Michael: Yes, it’s no fun at all.

There are upsides though. Some pluses are that cloud computing forces you to be more formal about how you manage your technical infrastructure. For example, for training purposes; with a new developer, we can just give them a copy of a production system, and have them go to town on it, make modifications, whatever without risking the actual production servers. And if they make a mistake, which is human (you have to factor in human error), you can reprovision a brand new one, and retire the one that is fouled up. Instead of having to spend hours and hours trying to fix the problem on the machine they were working on.

Karen: This sounds like it’s a lot more flexible and time efficient, with a layer of safety built in.

Michael: Yes. Cloud computing also comes in handy if you ever have a security breach. If a hacker gets into the system and the system is compromised–if this happens, system administrators can go in and try to correct the problem. But hackers can often install backdoors to get in and out. So a cloud platform with a good disaster contingency and backup can allow system administrators to bring a whole instance down and do the patch on a whole new machine without the security breaches and patches in place. This is pretty easy to do with a cloud platform.

Karen: So TNR can help their clients do all these things?

Michael: Yes, we’ve worked with large customers over many years and we’ve seen a wide variety of things that can possibly go wrong, and we’ve been through several physical service outages both with Amazon Web Services and with Rackspace.

Cloud computing in itself is no panacea, but if you have the technical and organization proficiency to effectively leverage the platform, it can be a powerful tool used to accelerate your company’s rate of innovation.

If you are assessing the cloud as a solution in your business, contact us.  There are a variety of options for hosting that can save your company money and minimize outages. Let us show you the option that is the best fit for your organization.

Postfix SMTP AUTH w/TLS

by Michael Klatsky

One of the widely discussed issues with Amazon EC2 instances is the inability to reliably send email from the instances. In all too many cases, email from EC2  instances is automatically categorized as spam by the various relay databases, and by many ISP’s and carriers. There are several solutions, with the most common being a smarthost setup using either an external smarthost smtp service, such as http://authsmtp.com, or using an existing smtp server within our infrastructure. Read the rest of this entry »

Cloud Enabled Personalized Genomics

by admin

Personalized medicine is a goal of the Department of Health and Human Services. It is a driver of genomic research. It is one version of the future of medicine, using our unique genetic code toward the prevention of disease and the use of more effective or safer tailored drug therapies. Cloud computing enables access to the computational resources needed, on demand, for the data analysis needed to lay the groundwork for revolution in health care. Read the rest of this entry »

Will Amazon AWS save me money?

by Michael Klatsky

One of this first questions we asked when deciding to sue Amazon’s AWS services was: Will we save money?

At first, your EC2 servers may be simply architected, perhaps small instances. Once a more serious commitment is made for a more robust architecture at Amazon, there will be additional costs introduced. For example, a small instance, running a 2 disk 100GB EBS volume w/RAID0 plus monitoring and a static IP, and the cost goes from the current ~$68 monthly per system to ~$88 monthly per system. Bump the instances to large instances (likely needed for any server running mysql, or any other equally intensive application) and that cost goes to ~$265 per instance. Add in the costs of the additional services like bandwidth, Static IP, Cloudwatch etc and the costs can quickly escalate. Of course, upfront payments for Reserved Instances can drastically reduce the costs further.

However, the savings in development and deployment costs I think far outweighs a narrower gap in the savings between physical and AWS servers, and the real MRC on the AWS servers will likely be lower for a given amount of computing resources.

So- can you save money? Yes. In some cases, it will be a direct apples to oranges savings of hard dollars. In other cases, the agility gained will provide the greatest savings. In most cases- a combination of both will drive your cost savings.

To learn more about how operating in the cloud can save your company money, contact us for a free consultation.

Sidekick Danger – It’s not the cloud, it’s the approach

by Michael Klatsky

I recently saw the headline,”T-Mobile and Microsoft/Danger data loss is bad for the cloud“, and, as an admin who works with cloud technology on a daily basis, viewed the headline with some concern. However, after reading the article itself, my only thought is “What does this have to do with the cloud?”. Reading through, we find that Microsoft/Danger stores your phone data (contacts, photos, etc) on it’s servers, and that the phone needs to constantly be in contact with the servers in order to maintain service and data. Unfortunately, the servers crashed, and all of the data was lost. Turn off your phone, lose all your data. Yet- this is exactly what the Sidekick service promises to protect you from- and it failed.

The problem with blaming this on the “cloud” is that, while technically, your cell phone and the Microsoft/Danger servers form a “cloud”, the failure lies with the servers, and those who administer those servers. It doesn’t matter whether those servers are virtual, or physical- if there is not a disaster recovery plan in place, and if that disaster recovery plan has not been tested- data will be lost. Your data. This is not a shortcoming of cloud computing- it is a result of depending on others to maintain your data. It also gives us caution when depending on external providers over the network to always be available. Services stop. Power fails. Disks die. Routing interruptions happen.

This is just network computing. But if the people (or companies) behind it all don’t do their own due diligence- disasters like the this, and worse will continue to happen.

From the Cloud Camp

by Michael Klatsky

Yesterday’s trip to Cloud Camp Boston was most interesting. The keynote gave a good overview of what cloud computing is, where it came from and where it is headed. Read the rest of this entry »

Amazon EC2 system restore

by Michael Klatsky

Recently, one of our small EC2 instances failed.  While we had Nagios monitoring it, Nagios only provides alerts when services fail, or when the host goes down. In this case, the failure was on Amazon’s side- the hardware where our instance resided was failing.

Read the rest of this entry »

Amazon EC2 ’steals’ from you

by Michael Klatsky

As we implement more systems in the EC2 architecture, we are noticing a not so insignificant amount cpu cycles ’stolen’. What is a ’steal’ time? It is CPU time that is taken by the Xen hypervisor for something else other than your processes- from what I have read, other people’s processes. What we need to understand is how this affects performance. Does it truly matter? We have one virtual system that consistently has steal time of between 6-12%. That would mean that 6-12% of the CPU time we pay for is being used for instances other than our own. We will have to research this more to see what the true impact is on our systems, and if there is a way to mitigate it.