Amazon Web Services: My EBS is stuck!

Many of us were affected by the Amazon EBS issues at the end of October 2012. If you had EC2 instances in us-east-1, you were likely affected by the issues. EBS volumes appeared “stuck”, snapshots would not complete, etc.
While the issues have been resolved (although we required some Amazon Support intervention for a few volumes), we have recently noticed what appear to be some vestigial issues related to the EBS outage.
The symptoms are simple: EC2 instances appear to be extremely slow, and I/O is almost non-existent. Luckily, the fix is also simple: perform a stop/start on the instance (not a reboot, which leaves the instance on the same host). Your instance will be provisioned on new hardware, and you’ll have to account for a different IP address, but other than that, you’ll be back in business.
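For anyone who prefers to script the stop/start rather than click through the console, here is a minimal sketch. It assumes the present-day boto3 SDK (which is newer than this post) and an EBS-backed instance; the function name and the instance ID in the usage comment are placeholders, not something from our own tooling.

```python
def stop_start_instance(ec2, instance_id):
    """Stop and then start an EBS-backed instance; return its public IP (or None).

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    # A stop followed by a start (unlike a reboot) lets EC2 place the
    # instance on different underlying hardware.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # The public IP usually changes after a stop/start unless an
    # Elastic IP is attached, so look it up again.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    return instance.get("PublicIpAddress")


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   print(stop_start_instance(ec2, "i-0123456789abcdef0"))  # hypothetical ID
```

Whatever address comes back is the one to push into DNS or your configuration if you are not using an Elastic IP.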
Of course, for next time, make sure that your instances are spread across multiple Availability Zones and Regions.
Until next time….
-Michael Klatsky, VP of Systems Administration

New Systems and DevOps Blog

Our VP of Systems Administration, Michael Klatsky, has started a blog specifically discussing systems. Fresh from the AWS Summit 2012 in NYC, Michael has lots of new approaches to discuss in terms of systems, cloud computing, DevOps, system architecture, and how developers and systems staff need to communicate well and work together for the best results in web development. The blog is his own, but we feel it’s a great technical resource for our colleagues in systems and web development. You can take a look at his blog here. Michael welcomes commentary and discussion, and hopes to provide some shortcuts for fellow system administrators.

TNR Global Helps Northampton’s Myers Information Systems Streamline Systems in the Cloud

“TNR Global did an outstanding job and we were impressed with their professionalism, industry knowledge and fee structure. We would certainly recommend them to anyone who was seeking to improve their enterprise search and/or cloud computing solutions.”

Hadley, MA–March 27, 2012–TNR Global announced today the successful completion of a cloud-based solution for Northampton-headquartered broadcasting communications company Myers Information Systems.

Myers engaged TNR to move their systems into the cloud to improve their technical agility, security, and redundancy, and to promote these improvements to potential customers.

“As we set out to upgrade our existing application hosting service (ProHost), we prioritized the need to adopt the highest levels of security protocols. In addition, we sought to streamline the technology stack so that transaction speeds could be optimized while at the same time set-up and annual maintenance costs reduced. Our clients count on us to be proactive when it comes to adopting new standards and technology…not only to modernize our offerings over time but to increase productivity and lower operating expenses on their end as well,” said Crist Myers, President and CEO of Myers Information Systems.

Cloud-based solutions for businesses have been growing rapidly over the last three years. Cloud technology offers increased flexibility, elasticity, and scalability, which allow businesses to maximize efficiency. Using the cloud in combination with virtualization techniques, businesses like Myers Information Systems can leverage rapid deployments and hardware efficiencies. Companies can get more value from every server by increasing the utilization rate of their servers, drastically reducing the number of servers they need to purchase and manage.

TNR was tasked with assessing Myers’ systems, giving recommendations based on their needs, and providing a reference implementation and documentation.

“We created a reference system and the documentation to allow them to deploy their own systems based on that reference by using OpenStack and Rackspace Cloud,” said Michael Klatsky, the VP of Systems Administration and technical lead on the project from TNR Global.

As a result, they can rapidly launch a new system for a client with all the tools they need in place, and they have enhanced disaster recovery capability. This allows Myers to be agile in a more secure environment and leaves them better equipped to respond to their rapidly expanding market in broadcasting.

“Myers had been relying on physical servers housed locally or on site. With this cloud-based virtualization, they will be able to save money and quickly deploy additional servers based in the cloud to service new clients immediately,” said Klatsky.

“TNR Global did an outstanding job and we were impressed with their professionalism, industry knowledge and fee structure. We would certainly recommend them to anyone who was seeking to improve their enterprise search and/or cloud computing solutions,” said Myers.

TNR Global (TNR) is a systems design and integration company focused on enterprise search and cloud computing solutions. TNR develops scalable web-based search solutions for content intensive websites for companies and organizations in the following industries: News Sites, Publishing, Web Directories, Information Portals, Web Catalogs, Education, Manufacturing and Distribution, Customer Service, and Life Sciences. For more information, please visit: www.tnrglobal.com

Myers Information Systems, Inc. has been developing broadcast management software since 1989. The Company provides technology and services for television, radio and other digital media providers designed to improve every aspect of their operations, from media management to scheduling, and from trafficking to reconciliation. For more information, please visit: www.myersinfosys.com

###

For more information on this topic or to schedule an interview, please contact Karen E. Lynn at 413-425-1499 or email at Karen@tnrglobal.com

Elasticsearch Evaluation White Paper Released: Promising for Big Data

We believe that Elasticsearch is a product that everyone working in the field of big data will want to take a look at.

There are many new technologies emerging around search, and we’ve been investigating several of them for our clients. Search has never been “easy,” but Elasticsearch attempts to make it at least easier. Elasticsearch is billed as “built for the cloud,” and with so many companies moving into the cloud, it seems natural that search would move there too. This paper is designed to show you just how Elasticsearch works by setting up a cluster and feeding it data. We also let you know what tools we use so you can test out the technology, and we include a rough sketch of code as well. Finally, we draw conclusions about how Elasticsearch can help with problems like big data and other search-related uses.

Elasticsearch is an open source technology originally developed by one developer, Shay Banon. This paper is simply a first look at Elasticsearch and is not associated with an additional product or variation of Elasticsearch. Its appeal for big data comes from its ability to scale with growing content, which is at the heart of the “big data problem” we all keep hearing about. It’s very easy to add new nodes, and Elasticsearch handles the balancing of your data across the available nodes. It also handles the failure of nodes gracefully, which is important in a cloud environment. Beyond that, we simply evaluate and test the technology. We really don’t believe there is a one-size-fits-all technology in the realm of enterprise search; the right choice depends heavily on your systems, how many documents you have, how much unstructured data you have, and how you want your site to function. That said, in terms of storing big data, it is as capable as any Lucene-based product, and it can handle a much larger load than the current Solr release because the notion of breaking the index up into smaller chunks is “baked in” to the product.
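To make the “smaller chunks” idea concrete: in Elasticsearch those chunks are called shards, and their number is set when an index is created. The sketch below is not taken from the white paper. It only builds the HTTP request that would create an index with a given shard and replica count; the base URL and index name in the usage comment are placeholders.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, shards, replicas):
    """Build (but do not send) a PUT request that creates a sharded index."""
    body = {
        "settings": {
            "number_of_shards": shards,      # how many chunks the index is split into
            "number_of_replicas": replicas,  # extra copies of each shard for failover
        }
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Usage against a local test node (placeholder URL and index name):
#   request = build_create_index_request("http://localhost:9200", "articles", 5, 1)
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

With more than one node in the cluster, Elasticsearch spreads those shards and their replicas across the nodes, which is the balancing behavior described above.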

Here is an excerpt from the paper:

“Products like Elasticsearch that lack a document processing component entirely become more attractive. In fact, most projects that involve a data set large enough to qualify as ‘big data’ are building their own document processing stages anyway as part of their ETL cycle.”

If you are interested in downloading this free White Paper, sign up with us here.

If you would like help using Elasticsearch with your search project, contact us.

Building for Enterprise Search: A Systems View, Part 1

We need to determine what right looks like, and have the system behave that way.

I sat down with our VP of Systems Administration, Michael Klatsky, to discuss some of his thoughts on how Systems Administration needs to work in concert with the Search Team to implement search technologies for clients. This is the first portion of a two-part blog post.

**********************************************************************************************************************
Karen: You wanted to discuss how you’re approaching the systems side of search using a Behavior Driven Development (BDD) approach. Tell me about that.

Michael: Well, one of the problems we run into when systems brings up machines for enterprise search clusters is that the search software (FAST ESP, for example) is very particular about its environment, more so than many of the more common applications such as the Apache web server. Properly configured DNS, specific environment variables, and specific library versions have to be present. There are ownership and permissions that need to be in place, and performance metrics that must adhere to a given baseline. Slow disks can affect performance. There has to be the right amount of memory, and there are different classifications of system roles. Currently, we have homegrown scripts that bring up systems, then other scripts that we run to detect issues. These scripts tell us whether the system is ready for what we need it to do. We also monitor the systems for standard items such as disk space and memory usage, as well as basic search functionality. For example, we’ll run a quick search on, say, paper clips, and if it comes back with results, we know it’s running.

That’s what we’ve done historically. But now we need to bring up larger numbers of machines and have confidence that they will perform exactly as we expect. Additionally, we have a set of functional tasks that must be available without fail. As we bring up clusters of larger numbers of machines, and as we need to be more nimble, how can we ensure that the system will respond the way we expect it to?
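As an illustration of the kind of readiness script Michael describes, here is a minimal sketch. It is not TNR’s actual tooling: the function names are invented for this example, and it covers only DNS, environment variables, and free disk space, using the Python standard library.

```python
import os
import shutil
import socket


def check_env_vars(required, environ=None):
    """Return the names of required environment variables that are missing or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


def check_disk_free(path, min_free_bytes):
    """Return True if the filesystem holding `path` has at least `min_free_bytes` free."""
    return shutil.disk_usage(path).free >= min_free_bytes


def check_dns(hostname):
    """Return True if `hostname` resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except socket.gaierror:
        return False


def readiness_report(hostname, required_env, data_path, min_free_bytes, environ=None):
    """Collect human-readable problems; an empty list means the host looks ready."""
    problems = []
    if not check_dns(hostname):
        problems.append(f"DNS: cannot resolve {hostname}")
    for name in check_env_vars(required_env, environ):
        problems.append(f"ENV: {name} is not set")
    if not check_disk_free(data_path, min_free_bytes):
        problems.append(f"DISK: less than {min_free_bytes} bytes free on {data_path}")
    return problems
```

A build script could call `readiness_report` with the host’s own name, a list of variables the search software needs (for instance a hypothetical `SEARCH_HOME`), and the data directory, and refuse to continue if the list it returns is not empty.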

Karen: This is where Behavior Driven Development comes in, right?

Michael: Right. There is a lot of discussion out there on Behavior Driven Development, which includes behavior driven modeling, behavior driven monitoring, and behavior driven architecture and infrastructure. So it’s not only that a machine comes up and is listening on these ports. It’s that I can bring a machine up, go to that machine, log in, install certain software, and perform tasks. I can go to another machine and perform a task. So the question is, how do you model that? How do we ensure the system will behave as it should?

Karen: So you’re looking at replicating the behavior of these systems so that every time we deploy something it will be the same way.

Michael: Right. And if a change is made, even a small change, we’ll see it right away because a system or service will fail, and we’ll be able to fix it. Sometimes a service will fail silently, but we test and monitor constantly to ensure the system will do exactly what we expect it to do. It’s all a part of the build process.

Karen: Sounds like a smart approach.

Michael: Yes. And if we make a change, we’ll find out how that change will affect the rest of the system. We run tests, and if something is wrong, they should give us an error. For example, say you change the location of your SSH keys. You may still be able to get into the machine by SSH, but that one little change could make it impossible to SSH from one machine to another in the cluster. So rather than find that out after you begin your manual work, we make it part of the build process by constantly monitoring and testing the system as we build it.

Karen: It sounds like building a house and then realizing you have bricks out of place after it’s built.

Michael: Worse, it’s like building a house and realizing you forgot to build a door! At the very least, while you are building, you can test and let me know, “Hey! I don’t have a door to my house!” so that I can fix it before you move in.

There are certain things the search team needs to do to ensure their work will function in the system, like SSHing around the machines in the cluster; they need to be able to do that. There are certain ports that systems need to be listening on, and there are certain services that need to return a normal range of results. We need to define what proper operation looks like. We can’t necessarily say that a search for gold plated paperclips, for example, should show 1000 results every time. That may or may not be the case. But we should be able to determine whether the result returned is within a normal range.

We’re defining what proper operation looks like and ensuring the system functions that way. As part of the behavior driven model, which is what I’m really interested in, we can set up a config file that reads like natural language. This config file should describe the actions or behaviors I expect. For example, when I go to the ABC.com website and search for gold plated paperclips, I expect to see results. One result should be X. There should be more than Y results. When I get that result, I should be able to click on it and go to that product’s feature list. Basically, I’m describing how the customer will interact with the search and what I expect the customer to do, and designing the system to respond with the customer’s actions in mind.
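The expectations Michael lists (one result should be X, there should be more than Y results, the count should fall within a normal range) can be written down as data and checked automatically. The sketch below is one possible shape for that, not the config format he has in mind; `SearchExpectation`, `check_expectation`, and the `search` callable are all hypothetical names, and `search` stands in for whatever actually queries the site.

```python
from dataclasses import dataclass


@dataclass
class SearchExpectation:
    query: str
    must_include: str  # "One result should be X"
    min_results: int   # "There should be more than Y results"
    max_results: int   # upper bound of the "normal range"


def check_expectation(expectation, search):
    """Run `search(query)` and return a list of failed expectations (empty means pass).

    `search` is any callable that takes a query string and returns a list
    of result titles.
    """
    results = search(expectation.query)
    failures = []
    if expectation.must_include not in results:
        failures.append(f"expected {expectation.must_include!r} in results")
    if len(results) <= expectation.min_results:
        failures.append(
            f"expected more than {expectation.min_results} results, got {len(results)}"
        )
    if len(results) > expectation.max_results:
        failures.append(
            f"expected at most {expectation.max_results} results, got {len(results)}"
        )
    return failures
```

Run against a freshly deployed cluster, an empty list says the search behaves the way the specification describes, and anything else names exactly which behavior is off.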

Karen: So you’re engineering it with the customer’s behaviors in mind.

Michael: That’s exactly what we’re doing. If I look for a certain item, I get that result. We describe what the customer should do and make the system behave in cooperation with that customer behavior. We need to determine what right looks like, and have the system behave that way.

Karen: And what right looks like is really different for each client.

Michael: Yes. You can write in somewhat natural English what that looks like. It’s not magic; you still have to come up with a specification of what right looks like. But you can do a lot of sophisticated things in this manner because you will know you’ll have a website that’s going to perform the way it’s supposed to perform. The bottom line is: Define what your systems should “look” like, deploy those systems using those definitions, and after deployment, test to ensure that those systems “look” like your definition.

For more information on how you can plan your enterprise search in cooperation with your systems administration team, contact us for a free consultation.

Cloud Platforms: The Promise vs. The Reality

Recently our VP of Search, Michael McIntosh, sat down and talked to me about his thoughts on cloud computing and what businesses should be aware of when investing in the cloud.


Karen: So, how do enterprise search and cloud computing fit together? What’s good about it for companies?

Michael: The advent of cloud computing makes it a lot easier for companies to get into search without investing a huge sum of money up front. Some of the pay-as-you-go computing approaches make it possible to do things that in the past wouldn’t have been financially viable, such as natural language processing on content. Something that could have taken days, weeks, or even months can now take much less time by throwing more hardware at a problem for a shorter time span.

For example, you could throw 20 machines at a problem for 12 hours, do a bunch of computations in a massively parallel way, and then stop them as soon as the work is done, versus the old model where you have to buy all the hardware, or rent it, and make sure it’s not underutilized so you make your investment back.

But if you need a lot of processing power for a short amount of time, it’s really quite amazing what we can do now with an approach like this.

Karen: Is this a new technology for TNR?

Michael: TNR has been using cloud computing platforms for several years now, three or four. Cloud computing in itself is sort of a buzzword, because distributed processing and hosting have been around for a while, but the pay-as-you-go computing model is relatively new. So we have a great deal of experience with the reality of cloud computing platforms vs. the promise of cloud computing platforms.

Karen: So, what is the difference between the “promise” and the “reality” of cloud computing platforms?

Michael: Well, a lot of people think of cloud computing as this magical thing; all their problems will be solved and it will be super dependable because there are very large businesses like Amazon running the underlying infrastructure and you don’t have to worry about it.

But as the physical infrastructure becomes easier to deploy, other critical factors come into play. You won’t have to worry about the physical logistics of getting hardware in place, but you will have to manage multiple instances, and when you provision temporary processing resources, you have to remember to retire them when they’re no longer needed. Otherwise you’ll be paying more than you need to. And since virtualization uses physical hardware you do not control or maintain, there are fewer warning signs of a potential systemic failure.

Now Amazon, which is the one we use the most, does a good job of backing up instances and making things available to you even when there are failures. But we’ve had problems where we’ve lost entire zones. Even when we’ve had multiple machines configured with fault tolerance, Amazon has experienced outages that have taken entire regions offline despite every conservative effort to ensure continuous uptime. So we’ve had entire service clusters go down because of problems Amazon was having.

It becomes critically important for companies to develop and maintain a disaster recovery plan. Companies need to make sure things that are critical are backed up in multiple locations. Historically, this has been hard to do because companies typically buy enough equipment for production needs, but not enough for development and staging environments.
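The point above about remembering to retire temporary resources lends itself to a small automated check. The sketch below assumes you keep a simple inventory of instances with a launch time and an allowed lifetime; the record fields (`id`, `temporary`, `launched_at`, `max_hours`) are made up for this example and do not come from any cloud provider’s API.

```python
from datetime import datetime, timedelta, timezone


def find_expired(instances, now=None):
    """Return the IDs of temporary instances whose allotted lifetime has passed."""
    now = now or datetime.now(timezone.utc)
    expired = []
    for inst in instances:
        if not inst.get("temporary"):
            continue  # long-lived machines are never candidates for retirement
        deadline = inst["launched_at"] + timedelta(hours=inst["max_hours"])
        if now >= deadline:
            expired.append(inst["id"])
    return expired
```

Run on a schedule, a check like this turns “remember to retire it” into a list someone (or something) can act on before the bill arrives.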

Karen: That sounds like a costly mistake.

Michael: It can be very costly because people often develop disaster recovery plans without ongoing testing to confirm the approach continues to work. If the approach is flawed, when you do suffer an outage, you can be offline for hours, days or weeks. Even worse, you may not be able to recover your data at all.

Karen: That sounds extremely costly.

Michael: Yes, it’s no fun at all.

There are upsides, though. One plus is that cloud computing forces you to be more formal about how you manage your technical infrastructure. For example, for training purposes, we can give a new developer a copy of a production system and have them go to town on it, making whatever modifications they like, without risking the actual production servers. And if they make a mistake, which is human (you have to factor in human error), you can reprovision a brand new one and retire the one that is fouled up, instead of spending hours and hours trying to fix the problem on the machine they were working on.

Karen: This sounds like it’s a lot more flexible and time efficient, with a layer of safety built in.

Michael: Yes. Cloud computing also comes in handy if you ever have a security breach. If a hacker gets in and the system is compromised, system administrators can go in and try to correct the problem, but hackers can often install backdoors to get in and out. A cloud platform with a good disaster contingency and backup lets system administrators bring the whole compromised instance down and apply the patch on a brand new machine that doesn’t carry the breach. This is pretty easy to do with a cloud platform.

Karen: So TNR can help their clients do all these things?

Michael: Yes, we’ve worked with large customers over many years and we’ve seen a wide variety of things that can possibly go wrong, and we’ve been through several physical service outages both with Amazon Web Services and with Rackspace.

Cloud computing in itself is no panacea, but if you have the technical and organizational proficiency to effectively leverage the platform, it can be a powerful tool used to accelerate your company’s rate of innovation.

If you are assessing the cloud as a solution in your business, contact us.  There are a variety of options for hosting that can save your company money and minimize outages. Let us show you the option that is the best fit for your organization.

TNR Global and STCC organized CloudCamp Western Massachusetts

TNR Global was the co-organizer and a sponsor of CloudCamp Western Massachusetts that took place on April 20, 2010, 2:30pm-7pm, at the National Science Foundation funded National Center for Information and Communications Technologies (ICT Center) at Springfield Technical Community College. This event was co-organized by CloudCamp co-founder Dave Nielsen and the ICT Center.

Developers, decision makers, end users, and vendors from MA, CT, VT, and surrounding states participated and presented at the event. CloudCamp Western Massachusetts provided a central point for bringing together local academia and businesses. The ICT Center streamed live video of the event to other technology community colleges around the nation. http://www.cloudcamp.org/westernmass

Presentations can be seen here.

Pictures can be seen here.

The speakers included:
David Irwin, UMass Amherst CS Department
Alex Barnett, Intuit Partner Program
Rich Roth, CEO, TNR Global
Chris Bowen, Microsoft Azure
Jim Kurose, Mass Green High Performance Computing Center

TNR Global is co-organizing CloudCamp Western Massachusetts

TNR Global is co-organizing and sponsoring CloudCamp Western Massachusetts that will take place on April 20, 2010, 2:30pm-7pm, at the National Science Foundation funded National Center for Information and Communications Technologies (ICT Center) at Springfield Technical Community College (1 Federal St, ICT Center, STCC, Springfield, MA 01105). This event is co-organized by CloudCamp co-founder Dave Nielsen and the ICT Center.

Developers, decision makers, end users, and vendors from MA, CT, VT, and surrounding states are invited to participate: attend, present, and/or sponsor. CloudCamp Western Massachusetts will provide a central point for bringing together local academia and businesses. The ICT Center will stream live video of the event to other technology community colleges around the nation. http://www.cloudcamp.org/westernmass

For more info, please contact Natasha Goncharova at corp@tnrglobal.com

Cloud Enabled Personalized Genomics

Focus on expertise, available, on-demand resources, and the agility to experiment with big ideas will continue to draw some personal genomics researchers to public cloud computing.

Personalized medicine is a goal of the Department of Health and Human Services. It is a driver of genomic research. It is one version of the future of medicine, using our unique genetic code toward the prevention of disease and the use of more effective or safer tailored drug therapies. Cloud computing enables access to the computational resources needed, on demand, for the data analysis needed to lay the groundwork for revolution in health care.