Updating CentOS (RHEL, Fedora)

This is just a very concise summary to guide you through the typical update process of a CentOS based Linux server that has no control panel installed on top of it. This post will also appear in our virtual server hosting blog:

  1. run yum check-update from the shell.
    This will give you a list of newly available packages for your distribution based on the repositories you have defined. This list will typically not be too long for a well maintained server, unless the distribution itself has just undergone a major update (such as from CentOS 5.7 to 5.8 recently).
  2. check the packages listed and ensure that your currently running applications will still be compatible with the new versions of any packages updated.
  3. make backups of any individual settings you have made for any packages that are going to be updated (httpd.conf, php.ini, etc.). Usually, these will not be touched, but it doesn’t hurt to make sure you have a copy (in addition to the regular backups you should be doing!).
  4. once you have confirmed that everything should still be fine after the update, from the shell, run yum update.
    This will start the update process, and you will have to confirm the update before it is actually applied (last chance to say “no”!).
  5. once complete, restart any affected services (httpd, for example), or reboot your server if vital system packages have been updated (kernel, libc, …)
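
As a minimal sketch, the whole procedure from the shell might look like this (the config files, paths, and service names are just examples for a typical LAMP box – adapt them to your own setup):

    # 1. See what would be updated
    yum check-update

    # 2./3. Back up any configs you have customised before touching anything
    mkdir -p /root/config-backup-$(date +%F)
    cp -a /etc/httpd/conf/httpd.conf /etc/php.ini /root/config-backup-$(date +%F)/

    # 4. Run the update (yum asks for confirmation before applying anything)
    yum update

    # 5. Restart affected services, or reboot if kernel/libc were updated
    service httpd restart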

 

My server crashed…what now?

So now it has finally happened…your server has crashed beyond repair. It won’t boot, or what it boots bears little resemblance to what you expect it to come up with, the remote console shows that a manual file system check is needed, grub cannot find a kernel, your root partition is gone, Windows says it cannot find any disks anymore, and all the other nightmares you thought could only happen to everyone else, but not to you.

Now what?

First of all: DON’T PANIC!

For those of you who are familiar with Adams’ Hitchhiker’s Guide to the Galaxy, this advice will sound more than familiar, and it is in fact the very first action to take. Panic will cloud your mind, and everything will take much longer than if you work calmly and take the time to think twice before you do anything at all.

  1. Assess the situation: Are you able to try fixing it yourself, or are you not familiar enough with the error displayed, or the symptoms coming up?
  2. Do not mess with the system too much: Even if you do not have a managed server and therefore have to take a first look yourself, remember that you are not on site and do not have the means to carry out a hardware repair. There is also the risk that the damage grows the longer the remote actions take, and the more scattered the approach becomes in an attempt to salvage what is left.
  3. Ask your provider to step in: If you have a managed server, they will have to handle it anyway, and depending on the SLA in place, will provide you with a replacement machine in the meantime, a failover solution, etc. If you do not have a managed server, your provider is still your best bet for actual on-site operations, as they are the ones with physical access to the server, and even if they cannot fix it themselves, they can usually bring in someone who can look at the machine faster than you could. If you have hired your own sysadmin (who is not on site either, however), your ISP and your sysadmin can communicate to work out the best course of action.
  4. In the meantime, have your provider – with or without a respective SLA – set up a new server, a replacement VPS, a shared hosting account, in other words, anything that allows you to bring back your site saying “We are performing maintenance / crash recovery / you name it”, i.e. something that puts you back in touch with your customers so they are aware that you are on top of the situation. Use Twitter, Facebook, your customer portal (if you have one on another machine), etc. to let your clients know who is affected and why.
  5. Depending on the interim solution, get ready to bring your backups back online (you do have them, don’t you?). In a managed environment, your provider will most likely have them, otherwise (and in fact no matter what) you should have an external backup somewhere as well.
  6. Once you have a new production server (or the old one repaired), and have set it up with its operating system, updated it to the latest patchset and security fixes, and brought it to a state that matches the environment before the crash, use your backups and perform data recovery. Do not go live again yet, however:
  7. Test, test, and test again if everything is working according to specs and expectations. Naturally, you will want to be back online as fast as possible. On the other hand, you want to avoid nasty surprises such as inconsistent databases, mismatched orders, invoices, etc. It is up to your judgement to find the right balance here.
  8. Once everything is back to normal, write up an Incident Report and send it out to all customers who were affected by the outage, and handle compensation as per your own SLA and TOS.

 

Managed or not?

We have had a similar post back in July 2011 (cf. here), so why are we bringing this up again? Recently, we have had a large surge in two categories of orders: unmanaged low-end VPS (256MB memory and the likes, for use as DNS servers, etc.), and fully managed servers.

Customers are increasingly aware of the need to back their sites with a well-managed server. Typically, however, the managed option only extends to managing the operating system (and possibly the hardware) of the server in question, i.e. keeping the operating system up to date with the latest security patches (something that an “intelligent” control panel, such as cPanel, can mostly handle itself), the latest package upgrades, and generally making sure the server works as intended.

In most cases, however, managed does not cover application issues. This is a crucial point: You as the customer need to be sure that the server administration side of your enterprise speaks the same language as the application development side. Nothing is worse than an eager sysadmin updating a software package without consulting the developers who, incidentally, depend on the older version for the entire site to run smoothly. With today’s globalisation this can cause you additional grief – often your developers are from a different company than your ISP, and, naturally, each side will be reluctant to take the blame. In the meantime it is you and your enterprise that are left crippled or hindered.

What do we advise?

  1. Don’t save money on a sysadmin.
  2. Make sure your sysadmin talks to your developers and understands what they need.
  3. Make sure your sysadmin has a basic understanding of your application in case of emergencies.
  4. Make sure your staff – sysadmins and developers – coordinate updates and upgrades.
  5. Make sure you have a working test environment where you can run the updates and upgrades in a sandbox to see whether things still work as expected afterwards.
  6. Have a team leader coordinate your sysadmin(s) and developer(s), or take this role upon yourself.

How much is it going to cost you?

Fully managed packages vary in cost. The normal sysadmin packages that deal with the operating system only will up your budget by anything between £20 and £200 per month. If you want the sysadmin to be an integral part of your team and to support your application as well (in terms of coordinated server management), the price will be towards the higher end of that range, but might already include some support for the application itself.

Who to hire?

Get someone with experience. There are sysadmins out there who have decades of experience and know the dos and don’ts, and there are sysadmins who consider themselves divine just because they have been “into Linux for 2 years”. A sysadmin is not someone who jumps at the first sight of an available package upgrade and yum-installs 200 dependencies to claim the system is up to date. A sysadmin is someone who understands the implications of a) upgrading and b) not upgrading. A sysadmin will weigh these pros and cons and explain them to you before making suggestions as to what to do. A sysadmin is someone you trust to take this decision off your shoulders entirely, so you can run your business instead of having to worry whether the next admin cowboy is going to blow up your server. A sysadmin is someone who knows not only how to keep a system alive, but also how to bring a failed system back to life.

These are just some general guidelines, contact us for further advice, we are happy to help!

 

 

E3 Sandybridge: 32GB RAM now affordable

Prices for the 8GB 1333 DDR3 ECC unbuffered RAM modules for the E3 based servers have come down considerably lately. While the pricing is still not linear compared to the 4GB modules, the difference is now a lot more affordable than it used to be. This could be a major incentive to switch from X34xx based machines to the newer E3 CPUs, since the latter’s performance is much better. Operators of memory-intensive applications, such as companies offering Minecraft servers, will be interested in this new option, as the E3 typically has plenty of CPU power left when all 16GB of memory are in use. Going for 32GB of memory instead of having to rent a second full 16GB dedicated server is a lot cheaper and allows for a better cost/performance ratio.

We now also offer our E3 Sandybridge machines equipped with 4x8GB modules. The price for a single E3 with 32GB of memory is considerably cheaper than two E3s with 16GB each.

 

 

Backups

When we mention backups, everybody will think, “hey, my data is safe anyway, isn’t it? I mean this is a reputable ISP, sure they have enterprise disks and RAID, and whatnot? Or don’t they?!”.

There are two important caveats when it comes to backing up your data, be it on a Virtual Private Server (VPS) or on dedicated servers:

  1. Don’t assume anyone but you is going to back up your data.
  2. Don’t assume that just because your ISP backs up your data, you shouldn’t as well.

By default, it is safe to assume that your provider does not back up your data. Typically, explicit backups will cost you some additional money, and even then you are well advised to ask your ISP what they are backing up, how, how often they do it, and where the data is being kept.

A couple of bad backup solutions:

  • different disk (or, worse even, partition) on the same machine;
  • some external drive, like a USB disk;

A couple of decent/workable backup solutions:

  • standby server in the same DC;
  • ftp space on some other machine in the same DC;
  • making sure disks are in a RAID array (this is not, however, a real backup strategy; it just adds some redundancy and should be treated as a complementary measure; no ISP should, unless explicitly asked to, offer you a setup without RAID; a disk failure in a RAID setup at least allows online recovery in hot swap environments);

A good backup solution:

  • a generation-driven backup strategy on a server or backup system in a different DC (such as IBM’s TSM, which can back up to SAN and tape, or Bacula, which is free of charge and can perform full/differential/incremental backups, for example);

If your ISP employs one of the bad solutions, you should explicitly look for a service that allows you to at least back up your data somewhere else as well, in a different data centre. You should also consider this when your ISP can only offer a backup solution that is workable at best. If your ISP, however, can prove that they are using an enterprise solution to back up your data, then you can assume that your data is safe – nevertheless you should back up your data yourself as well. At least make dumps and tgz’s of your most important data, download them, and store them safely away, burn them to CD/DVDs, etc. Be prepared for the worst case: backups can go corrupt, you might accidentally delete all instances of one file you desperately need, etc.
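
At the very least, a nightly cron job along these lines covers the dumps-and-tarballs approach just mentioned (a rough sketch – the database name, paths, and the remote host are placeholders, and the MySQL credentials are assumed to live in /root/.my.cnf):

    #!/bin/bash
    # Minimal nightly backup sketch: dump the database, tar up the web root,
    # and ship both to a machine in a different data centre.
    STAMP=$(date +%F)
    BACKUP_DIR=/var/backups/$STAMP
    mkdir -p "$BACKUP_DIR"

    mysqldump shopdb > "$BACKUP_DIR/shopdb.sql"      # 'shopdb' is a placeholder
    tar czf "$BACKUP_DIR/www.tgz" /var/www

    # Copy the result off the machine (key-based ssh login assumed)
    scp -r "$BACKUP_DIR" backup@offsite.example.com:/backups/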

Backing up is only half the story – having the data backed up is all well and good, but you also need to be able to restore it. Make sure to test your backup/restore strategy: back up data, restore it, see if it works. Repeat this at regular intervals, and repeat it whenever you make major changes to your application or when you need to document milestones, etc. Ask yourself: how much is your data worth to you? What if you lose everything? Your data is your online presence, your enterprise, your company. Don’t assume. Make sure.
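
A restore test can be just as simple, for example playing a dump back into a scratch database and checking that the expected tables are there (again a sketch – the database and file names are placeholders):

    # Restore last night's dump into a throwaway database and inspect it
    mysql -e "CREATE DATABASE restore_test"
    mysql restore_test < /var/backups/2012-05-01/shopdb.sql
    mysql restore_test -e "SHOW TABLES"
    # Clean up once you are satisfied
    mysql -e "DROP DATABASE restore_test"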

 

Checking connectivity

There are various tools to measure and check the connectivity of your dedicated server or virtual private server. Below we give an overview of the most common ones, along with their most widespread uses.

  1. ping
    ping is probably the most well known tool to check whether a server is up or not. A ping is a small packet of traffic (an ICMP echo request) sent from the originating machine to the destination machine, which is expected to answer with a so-called echo reply, showing that the destination host is up, running, and responding. The typical Linux syntax is:
    ping [-c INTEGER] [-n] [-q] HOSTNAME | IP address
    with -c followed by the number of packets to send, -n for numeric output (IP address only – no DNS resolution), and -q for quiet output so that only the summary lines are displayed. The output shows how long it takes for each packet (or the packets on average) to travel back and forth between the source and destination host (round trip time). Large deviations in the min/avg/max values may indicate network congestion, whereas significant packet loss may indicate a general network outage, or congestion to a point where the network is simply too overloaded to let anything else through and drops packets instead. 100% packet loss may, however, not necessarily mean that the destination host is dead – it may simply be that the destination server is blocking ICMP ping packets via its firewall.
  2. traceroute
    traceroute is another useful tool that displays the route packets take from the originating host to the destination machine. It also displays round trip times, and can be used to identify potential issues on the way to the machine as well. It is important to understand that firewalls and routers are able to filter and deny these probing packets as well, so a non responding host may not necessarily be down, just as with ping. The typical Linux syntax is
    traceroute [-n] HOST | IP address
  3. mtr
    mtr can be seen as the combination of ping and traceroute – it displays not only the way packets travel down the line from the source to the destination, but also displays min/avg/max round trip statistics and packet loss. mtr is very helpful in determining network congestions or anomalies. The typical Linux syntax is
    mtr [-n] HOST | IP address

When would you typically use these tools:

  • when a host that is normally up can suddenly no longer be reached;
  • when you notice anomalies like slow network, packet loss, etc.;
  • when you want to prove that things are working OK on your end;
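
For instance, if you want to document that things look fine from your end, a quick session with all three tools might look like this (the hostname is just a placeholder):

    ping -c 4 -n server.example.com       # is the host answering at all?
    traceroute -n server.example.com      # where along the route do packets stop?
    mtr -n -r -c 100 server.example.com   # per-hop loss and latency statistics (report mode)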

 

RAID

RAID is intended to keep your dedicated server or your virtual private server (VPS) alive and your data redundant in case one (or, depending on the level, more than one) disk fails – allowing you to replace the faulty hardware (in hot swap environments even without downtime).

Our own opinion is that RAID is always worth the extra cost – it usually saves you a lot of trouble when things go wrong. There are two main options to decide between when you want a RAID setup: software and hardware RAID. In the former, your main CPU and memory take over the job of providing the desired RAID level; in the latter, you have extra (costly) hardware to handle that part of your machine.

Software RAID has advantages such as being cheaper and not subjecting you to vendor lock-in, and – in some cases – even outperforms hardware RAID with today’s fast CPUs. Nevertheless, hardware RAID offers features a software RAID setup cannot, for example hot-swapping disks, or write-back caching if you have a BBU (battery backup unit).
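
Under Linux, a software RAID 1 mirror can be created with mdadm roughly like this (a sketch – the device names are examples, and the command destroys any data on the disks involved):

    # Build a RAID 1 mirror from two identical, empty disks
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    # Watch the initial sync and check the array status
    cat /proc/mdstat
    mdadm --detail /dev/md0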

This post is not about the pros and cons of software vs. hardware RAID, however. Essentially, we want to present the four most common setups for data redundancy and integrity – RAID 1, RAID 5, RAID 6, and RAID 10 – in a concise summary.

RAID 1 is all about disk mirroring. You team up two identical disks to form a mirror, and all your data is kept twice. You can lose one disk and still keep your server running. Of course, the storage efficiency is rather low – out of 2x2TB you only get 2TB in total.

RAID 5 is another very common setup. It needs at least 3 disks, and in a nutshell, you can lose one disk before things start getting sinister for your server. That gives you moderate storage efficiency – in a 3x2TB setup you get around 4TB in total, in a 4x2TB you get something close to 6TB in total.

RAID 6 can be seen as a further development of RAID 5, in layman’s terms. Here you need at least 4 disks, and you can afford 2 disks going down before your disk array suffers data loss. The storage efficiency is worse than with RAID 5, but typically better than with RAID 1, since both RAID 5 and RAID 6 allow for more than just 3 or 4 disks to be used.

And finally, RAID 10 is a mix of RAID 0 (striping over several disks) and RAID 1 (mirroring). This gives the same storage efficiency as RAID 1 (half of the raw capacity) and the same redundancy level, but it requires at least 4 disks to work and is generally more expensive than RAID 5 or 6 relative to the usable capacity.
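
To sum the capacity side up, a rough rule of thumb for n disks of size s (here 4 x 2TB; real arrays lose a little extra to metadata):

    # Usable capacity per RAID level, as a quick sketch in shell arithmetic
    n=4; s=2                                   # number of disks, TB per disk
    echo "RAID 1 (2 disks): $s TB"             # one disk's worth
    echo "RAID 5 : $(( (n - 1) * s )) TB"      # n-1 disks' worth
    echo "RAID 6 : $(( (n - 2) * s )) TB"      # n-2 disks' worth
    echo "RAID 10: $(( n / 2 * s )) TB"        # half the disks' worth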

In terms of performance, RAID 10 generally outperforms the other RAIDs in terms of write speed. This difference becomes smaller with more disks in the array, but still, in your typical 4 disk setup, RAID 10 is fastest for writes, and RAID 1 is typically faster for writes than RAID 5 or 6 as well. In terms of read performance, RAID 1 lags behind the other options, whereas RAID 5, 6, and 10 can be considered pretty similar and vary depending on the applications and I/O going on.

Overall, if you don’t need much storage and want a cheap redundant solution, choose RAID 1; it offers enough performance for everyday applications as well. If you need more storage, but do not run write-intensive applications, then RAID 5 or RAID 6 are fine. DB-intensive (really intensive in terms of writes) applications should consider RAID 10, however. The increase in write performance is well worth the extra cost, but pay attention to the number of disks in the array – the more disks in a RAID 5 or RAID 6 array, the better their write performance becomes.

Monitoring your server

It is very important that you monitor your server, and by that we mean not only whether it is up or not, but a much more detailed view of what is going on. Popular open source monitoring tools are Nagios, Cacti, Munin, and Zabbix, and it is not uncommon to use them in combination.

What, then, are the stats you should generally be monitoring?

  • uptime – pinging the server (provided ICMP replies are not being filtered) to check whether it is alive or not;
  • disk space – monitoring the free space on all partitions. A full root partition is particularly nasty, as it can bring your entire server to a halt, but it is not difficult to see that any full partition is generally a bad thing that can cause disastrous side effects;
  • memory consumption – how much physical RAM is left, how much is being used by the system, by applications, etc. Is swap space in use, how often is it being used, etc.;
  • CPU utilisation – how loaded is the CPU, do you have enough reserves, or are you already using the CPU near its capacity limits, how many processes are being run at the same time, etc.;
  • service monitoring – are all the services on your server running as planned? Such as apache, mysqld, sshd, etc.;
  • database monitoring – what is your database doing, how many queries per second are being executed, how many simultaneous connections do you have, and so on;
  • network traffic – is your server generating a lot of unwanted traffic, do you have any unusual spikes, or how much traffic are you using, anyway?

These are just examples, but they give you an idea of what can be done – the actual number of checks and monitoring scripts is legion, and it will be up to you and your ISP to decide which ones to implement. It is always advisable to use monitoring, it not only means you will have your own alert system when things go wrong, but it will also give you excellent insights into the general development in terms of use and capacity of your server, allowing you to plan ahead much more accurately than without monitoring and statistics collection.
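
To give an idea of how simple such a check can be, here is a sketch of a disk space monitor that warns when any mounted file system exceeds a threshold (the threshold and the notification method are placeholders – a real setup would feed this into Nagios, email, etc.):

    #!/bin/bash
    # Warn when any mounted filesystem is above the usage threshold
    THRESHOLD=90
    df -P -x tmpfs -x devtmpfs | tail -n +2 | while read fs blocks used avail pct mount; do
        usage=${pct%\%}
        if [ "$usage" -ge "$THRESHOLD" ]; then
            echo "WARNING: $mount is at $pct (filesystem $fs)"
        fi
    done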

Our advice: talk to your ISP about monitoring options. Some do it for free, some will charge a bit, but being ahead of the competition and having the ability to act proactively is a big advantage for any business, especially in IT, where information is key.

 

Cloud computing

(this post will also appear in our virtual private server blog)

Ok, this had to come. Sooner or later – cloud computing. In a nutshell, do we offer it: yes – if the very specific advantages of the cloud warrant its use.

So, what is it – why are so many people crazy about it, and why is it so expensive compared to a virtual private server or a dedicated server? Essentially, it is simply a different concept of providing resources on demand that can seemingly scale ad infinitum, with, however, similar contention disadvantages to a virtual private server. Why? Because eventually, cloud resources must also run on a physical machine. And typically, you won’t be the only person using this machine for your cloud purposes; you share the resources with others, and therefore there will always be a level of contention subject to the ISP’s discretion – even with very sophisticated virtualisation and isolation methods. Most ISPs sell you cloud computing as hype, when in fact it is very little else than a different version of a virtual private server.

Of course, cloud crusaders will tell you why you must have a cloud, and start explaining about the additional layer of flexibility, redundancy, security, scalability, etc. In return, one can ask questions such as: do you really want to host your confidential and secure data in a cloud without being able to really pinpoint its exact location? Your data is “somewhere in the cloud”. How secure does that make you feel? How redundant is it really? How can I determine redundancy if it is not even clear to me where exactly my data is stored? What different level of redundancy is there compared to normal RAID systems and backup/restore solutions? My application’s resource usage varies by only 25%, so why can I not go for a dedicated setup instead, or even a virtual private server?

We still consider one of the cloud’s main advantages to be its flexibility for sites and applications whose resource use varies a lot over time, with very irregular patterns. While clouds can scale a lot (depending on what you agree upon with your ISP), there will still be resource limits to be observed, so even in a cloud you should take care to estimate your expected peak resource usage.

This is a very basic and by no means comprehensive statement – there are many more complex issues to be observed with clouds. Put the other way round: normally, even for very complex and high resource applications, you will only need a cloud if you can state its technical advantages over a virtual private server / VPS or dedicated server. Otherwise, in 99 out of 100 cases you will be better off with the latter, outside the cloud.

A very good read is http://www.eucalyptus.com/resources/info/cloud-myths-dispelled – 4 important technical aspects when it comes to cloud computing.

Traffic and bandwidth, revisited

Today, I read a thread in a feedback forum:

http://www.webhostingtalk.com/archive/index.php/t-1052232.html

There is a lot of talk and fuss about what is legitimate use for these 150TB plans, whether download sites or CDNs are allowed, what constitutes a CDN, and what does not, etc.

The entire thread is one single credo for our own traffic policy – as long as it is legal, use it for whatever you want, we reserve the traffic for you, end of story. Yes, this comes at a price, but there is no smallprint. You won’t be capped to a 10mbps port if you exceed your bandwidth allowance, and you are not expected to spread your traffic perfectly evenly across the entire month – go have some spikes! This is what we call fair – we do not use freerider tactics to give a small group of heavy traffic users an advantage that the majority of negligible traffic users end up paying for.

To put it the other way round: Assume an ISP has 100gbps of bandwidth available, and sells off single servers with 100TB usage each. That gives, roughly, the following equation:

100gbps = 100 x 1,000mbps; at roughly 300GB per mbps per month, that is about 30,000 TB per month => 30,000 TB / 100 TB per server = 300 servers

A typical server on such a deal will cost you around GBP 100 per month; times 300 servers, that means the company is reaping 30,000 GBP per month in turnover before the bandwidth is oversold.

With 30,000 per month, they will have to cover their infrastructure costs, all staff costs, and all opportunity costs. Even if the company only had a single employee (the owner), this would never pay. So, how do they do it? Quite simple: by overselling, fully aware that 99% of users will never come anywhere close to these magic numbers of 100TB per month or more. And for the final per cent, they will (and do, as we see) apply their T&C smallprint, and make a couple of exceptions for those who shout loudly enough. In the end, there are two winners: the ISP using such practices, and the shouters. The rest, the majority, pays for them.

Often you will also find terms such as <insert large TB number here> OR 100mbps unmetered. 100mbps unmetered will give you roughly 30TB of traffic per month. Why, then, can you choose between options that are SO much unlike each other? 100 or 150 TB per month on a gbps port costing the same as 100mbps unmetered? This simply doesn’t add up.
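
A quick back-of-the-envelope check from the shell illustrates the mismatch (a sketch assuming a fully saturated port and a 30-day month):

    # Rough monthly transfer capacity of a fully saturated port
    seconds_per_month=$(( 30 * 24 * 3600 ))                 # 2,592,000 seconds
    for mbps in 100 1000; do
        gb=$(( mbps * seconds_per_month / 8 / 1000 ))        # Mbit -> MB -> GB
        echo "${mbps} Mbps sustained ~ $(( gb / 1000 )) TB per month"
    done
    # Prints roughly 32 TB for 100 Mbps and 324 TB for 1 Gbps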

Also, such contracts typically come with a nicely reduced fee for the first 6 months, after which you will be charged the full monthly price – for something you might not even need. If you know you are never going to use 150TB, why pay for it to cover the losses the ISP makes on the small number of customers who actually do use it? Usually, after the initial contract period, these machines could be had considerably cheaper if you only paid for the traffic you actually need, instead of having to drag that cost around like a ball and chain around your ankle.

Bottom line: again, be careful, ask questions. These T&C are all legit – not nice maybe, but legit – and you need to wade through the smallprint in order to understand what you can expect from these ISPs, and what you cannot.