Wednesday 24 August 2011

Zenoss - introduction

This is my first post about Zenoss, the Open Source IT Management System.
I need to introduce it first, because I'm sure there will be far more than one article about it. I have worked very closely with Zenoss for almost 2 years and I have some interesting things to share that I didn't post on the Zenoss community forums.

First of all - why Zenoss?
Some time ago I was not satisfied with our existing monitoring system. It was Nagios, and generally it was very good at the main thing we wanted from it: monitoring. But I wanted something more. I wanted an easy way to collect and view performance data (we used Cacti before), I wanted auto-discovery, an agentless solution, something easy for others to configure (actually, I failed with that :)), alert escalation and a few other things. And I wanted all of this out of the box and, of course, for free :).

I tested several solutions before I settled on Zenoss. I will not write a list of competitors with all their 'pluses' and 'minuses'. Instead, I will describe where I succeeded with Zenoss and where I failed.

Advantages:
Core. It takes quite some effort, but it's possible to collect any kind of data, analyze it in any possible way and trigger any kind of reaction. All the other "pluses" follow from that.

Performance data. I can take a value from any kind of datapoint and put it into an RRD. The datasource types I use most are SNMP and COMMAND. There are also datasource types that come from ZenPacks (e.g. MySQL or VMware), or you can write your own. A COMMAND datasource runs any executable with any parameters and parses its output for the variables. For me it is very flexible, so I can monitor any kind of device that makes any kind of data available over the network. How many times have I written the word "any"? :)
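
To make the COMMAND datasource idea more concrete, here is a minimal sketch of the output such an executable could print. It assumes the datasource is configured to parse Nagios-plugin-style output (status text, a pipe, then space-separated name=value pairs) and that datapoints with the same names exist on the datasource. The function name and the metric names are hypothetical, made up for illustration.

```python
def format_nagios_output(status, message, perfdata):
    """Build one line of Nagios-plugin-style output: 'STATUS: message|k=v k=v'.

    perfdata is a dict of metric name -> numeric value. The names are
    hypothetical and would have to match the datapoint names configured
    on the datasource.
    """
    # Sort the keys so the output is stable from run to run.
    pairs = " ".join("%s=%s" % (key, perfdata[key]) for key in sorted(perfdata))
    return "%s: %s|%s" % (status, message, pairs)
```

A script would print the result of something like format_nagios_output("OK", "queue depth read", {"queue_depth": 7, "workers": 3}) and exit with code 0; the part after the pipe is what ends up as values in the RRD files.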

Reports. ... any ... any ... all ... any. I feel no need to describe them in detail; you can find that in the documentation. What is really useful for me is the dynamic Graph reports.

Support. Of course, there is no guaranteed support for the Core (free) version. But there is quite useful documentation, great videos and a quite responsive community on the forums.

... At this point I realized that it would take too long to describe all the "pluses", so I will just list some of them:
  • Very advanced and flexible Event Management
  • Flexible Monitoring Templates management (and the templates themselves are a big plus too)
  • Customizable Dashboard
  • Big set of ZenPacks
  • Distributed Collectors
  • SNMP traps, syslog, Windows eventlog receivers
  • Process, IP service and Windows service monitoring out of the box
  • Nice Interface (from version 3.*)
  • Agent-less
  • Easy to deploy
  • etc. (many other standard things like backups management, event DB maintenance etc.)

And now some disadvantages:
  • Interface (version 2.*). Although it changed significantly in version 3.0, it needs to be described, because it was one of my biggest problems with this project. While I was evaluating Zenoss, I got used to the interface and it seemed OK to me. I even described it as one of the advantages, because I was able to see what I needed to see in Zenoss, forgetting that the information Zenoss provides is different from what people were used to seeing in Nagios. Nagios shows the state of the monitored devices and components, whereas in Zenoss the source of the state is an event. I understood that very quickly, but I didn't explain it properly to the team, and it confused them: instead of a state of "OK" they had to look for events in a completely different interface.
  • Performance. Zenoss is quite greedy for resources. The bottleneck for me is MySQL performance on SELECT operations: it produces extremely high I/O when reading data from the Events History.
  • Complexity. Zenoss became not just a Monitoring System, as I had planned at the very beginning, but a real IT Management System, with all those features and procedures developed and configured. It requires more than a superficial knowledge of the interface. I had never written code in an OOP language before, but to make Zenoss do what I wanted I had no other choice but to learn some Python.
There were a few other disadvantages, but they went onto a "must/good to have" list, and most of them have since been solved.
In fact, the next posts on Zenoss will describe how I solved some of those tasks.

Stay tuned! ;)

Tuesday 23 August 2011

IBM Bootable Media Creator (BoMC) - "must have" in IBM-based environments

- Hi, John! I have a new task for you. We have received a "brand new server". But it sat in the dealer's warehouse for 3 years, so could you please go to our very-cold-server-room and upgrade all the firmware?
- Of course, Boss! It will take just 4 hours: first I'll try to connect something to boot from (since there is no CD-ROM or FDD inside), and then I'll reboot the server seven times, once after each piece of the FW upgrade. And, by the way, I need to check the 5-dimensional compatibility matrix to download the proper upgrade... And that's it! I hope we won't need that very-expensive-emergency-on-site support to repair the server this time.

Sounds familiar? I remember the times when you needed to check the consistency of the floppy diskettes before an upgrade, because a single read error could lead to an unrecoverable HW failure. Of course, most equipment manufacturers now provide FW upgrades in every possible form: bootable CD images, images for USB drives, packages for the most popular OSes, etc. And most IBM server maintainers are probably aware of a very useful tool: Bootable Media Creator (BoMC).

Well, there is no need for a detailed description here, since it can be found on the official page. In my short, free interpretation: you can easily deploy FW upgrades for IBM System x and BladeCenter with images created by the BoMC tool.


Last year, whenever I had some unpredictable IBM server behavior, in about 90% of cases it was enough to create a fresh CD with the FW update set and boot from it once, and the problem just disappeared.
I hope this helps you.

Saturday 13 August 2011

How to preserve eth0 interface name for cloned or migrated CentOS or Ubuntu?

I'm sure many people have faced this nasty thing: after you clone or reconfigure a virtual machine, the OS reconfigures the network interface because it now has a different MAC address. I will not say that this is bad behavior, but what can you do if you need to preserve the eth0 name and configuration?

In CentOS it's very simple: all you need to do is remove the line like "HWADDR=00:11:22:aa:bb:cc" from the file /etc/sysconfig/network-scripts/ifcfg-eth0. That's it! Now, even after you clone this machine or change the Ethernet card in any other way, the config will not change. Of course, in some cases there is a risk of an IP conflict, but you're a good admin, right :)?
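
If you prepare many templates, the CentOS step can be scripted. This is only a rough sketch under my own assumptions: the function name strip_hwaddr is made up, and it works on the text of the ifcfg file, so you read the file, pass its content through the function and write the result back yourself (as root, and after making a backup).

```python
def strip_hwaddr(ifcfg_text):
    """Return the content of an ifcfg file without its HWADDR= line(s)."""
    kept = []
    # splitlines(True) keeps the line endings, so the rest of the file
    # is reproduced byte for byte.
    for line in ifcfg_text.splitlines(True):
        if line.strip().startswith("HWADDR="):
            continue  # drop the MAC binding
        kept.append(line)
    return "".join(kept)
```

All other settings (DEVICE, BOOTPROTO, IPADDR and so on) are left untouched, which is exactly why the IP conflict risk mentioned above still applies to the clone.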

Well, I don't often use Ubuntu for servers, so I didn't look for a persistent solution. But what I do from time to time is simply remove all lines from the file /etc/udev/rules.d/70-persistent-net.rules right before cloning. This makes Ubuntu forget that an ethX name is already assigned to a particular MAC address.
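
The Ubuntu step can be sketched the same way. The helper name below is hypothetical; it just empties the rules file and tells you how many lines it threw away. It has to run as root, right before you shut the machine down for cloning.

```python
def clear_persistent_net_rules(path="/etc/udev/rules.d/70-persistent-net.rules"):
    """Empty the udev rules file and return the number of lines removed."""
    with open(path) as handle:
        removed = len(handle.readlines())
    # Opening a file in "w" mode truncates it to zero length.
    with open(path, "w"):
        pass
    return removed
```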

Preconfigured ESXi on USB flash drive

It has been quite a long time since my last post. Too much work and almost no free time. However, I've learned a few interesting things, and I'm going to share them. The first one is about ESXi deployment.

We have up to 20 ESX nodes in our offices worldwide, and sometimes it was hard to deploy them remotely with the help of local support staff. Moreover, even in the datacenter at the site where we sit, doing maintenance is quite uncomfortable if you forgot to bring warm clothes.

I knew it was possible to install ESXi on a USB flash drive, but I was interested in pre-deployment. It turned out to be very easy.
To deploy a clean installation, all you need to do is:
a) download the ESXi ISO, for example here.
b) extract the file named imagedd.bz2
c) unbzip2 it :)
d) write it to your flash drive. I used dd under Linux as follows:

# dd if=imagedd of=/dev/sdb bs=1M
(Please check which /dev/sd? device your flash drive really is. If you just use the line above as is, you can destroy existing data! Use it only if you understand what it means, and at your own risk!)
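
One way to reduce the risk of pointing dd at the wrong disk is to ask the kernel whether the target is marked as removable. The sketch below assumes a Linux system with sysfs mounted at /sys; the function name is made up, and a "removable" flag is only a hint (some USB hard disks report 0), so treat it as an extra sanity check, not a guarantee.

```python
import os


def is_removable(device_name, sys_block="/sys/block"):
    """Return True if sysfs marks the block device (e.g. 'sdb') as removable."""
    flag_path = os.path.join(sys_block, device_name, "removable")
    try:
        with open(flag_path) as handle:
            # The file contains "1" for removable devices and "0" otherwise.
            return handle.read().strip() == "1"
    except IOError:
        # No such device (or no sysfs): do not treat it as a safe target.
        return False
```

For example, calling is_removable("sdb") before running dd and refusing to continue when it returns False would have saved more than one internal hard disk.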

Done! Now you have a fresh bootable flash drive with ESXi, not configured yet.
The next thing I did was connect it to a laptop (I had a Lenovo G550) and boot it up. You can use any PC or laptop with H/W virtualization support. The only issue I had running ESXi on this budget laptop was that the keyboard didn't work, so I just plugged in an external one. Now you can configure the network and then, remotely with the vSphere client, make the configuration that is not possible from the console.

The main problem we had is that after you select several interfaces for NIC teaming (a port channel on the Cisco switch side), ESXi by default uses "Route based on the originating virtual port ID". That doesn't work well with the default Cisco configuration, which requires "Route based on IP hash" (src-dst-ip hash) to be set.
An important note for those who find that NIC teaming is not working correctly! Unlike ESX 4.0, which we had used before, ESXi 4.1 does not leave the NIC teaming configuration of the management interface at the default (inherited from the virtual switch config), but configures it independently! This strange behavior cost us 2 hours, because we checked the NIC teaming configuration of the vSwitch many times and had no idea that the Management Network had a different one.

Next I configured all the DNS settings, all the VM and VMkernel networks, NTP, etc. Then I closed the vSphere client and applied the final network configuration to be used in the datacenter. Shutdown... Ready!

All I need to do after that is go to the server room (or send the flash drive to the remote office), plug the flash drive into the server (the IBM x3650 M2 and M3 have a special USB port on the RAID controller for that purpose), boot from USB (on the mentioned IBM servers, boot from HardDisk1), and, after ESXi boots, select the network interfaces to be used for the Management Network.

That is how I do ESXi deployments now. If you have any suggestions, they are welcome.