
Monday, 16 April 2012

My Projects: Virtualization - part 2: P2V

Here is the next post in the "My Projects" series, and the second one about virtualization.

After the migration from XEN was finished, we started to migrate all the existing physical machines to virtual ones. This was one of the lightest projects of the last couple of years. First of all, VMware Converter 4.0 had been released, so all the CentOS machines were migrated smoothly, without a single issue. Second, while migrating VMs from XEN I had gathered so much experience that some tasks which had seemed very complicated before were now almost obvious to me (e.g. playing with LVM volumes).

What we had:

A bunch of physical IBM servers with a range of OSes installed (mainly RH-based).

What was done:

  • Most of the servers were migrated using VMware Converter.
  • Some of the older ones were migrated manually, as in the previous project.
  • Some servers were reinstalled as VMs and only their services were migrated.

Result:

The result is pretty obvious for this kind of migration. Still, here are a couple of the benefits:
  • Increased reliability. Very short system outage after a H/W failure (that actually happened twice, and HA automatically migrated all VMs to another ESXi host).
  • Uninterrupted maintenance. I just migrate all VMs to another host in the cluster during an upgrade.
  • Energy savings. I cannot provide exact figures, but I was really surprised when reviewing the last report. We save a lot.
  • Convenience. Adding or removing disk space, RAM or vCPUs takes just a few mouse clicks now.

Monday, 20 February 2012

How to migrate data to smaller disks using LVM

Well, this post could have been titled in many other ways, because I'm going to write about Physical Extents in LVM, and the technique can be used for various purposes.

My particular issue was related to the fact that if you want to take snapshots or clone a powered-on VM, you must be aware of some overhead for the VMDK files. You can find more details in VMware KB 1012384: "Creating a snapshot for a virtual machine fails with the error: File is larger than maximum file size supported".

The problem was that when I tried to clone a VM that had a virtual disk of 256GB (the maximum size for that datastore), I got the following errors:

Create virtual machine snapshot VIRTUALMACHINE File <unspecified filename> is larger than the maximum size supported by datastore '<unspecified datastore>'
File is larger than the maximum size supported by datastore
That was a production server, so I couldn't just turn it off for 30 minutes. At that moment I realized that I had a number of servers configured like that and could face the same problem sooner or later.

Generally you cannot just decrease the disk size in the VM configuration. (Even if that possibility existed, it would be better not to use it: almost any file system will go crazy after a sudden change like that, and in most cases it will lead to data loss.)

Fortunately, I use LVM on all my servers, mainly to be able to extend volumes when needed. This time the task was to decrease the size of the Physical Volumes, and to do it without a second of downtime.

So, the steps were as follows:
  • Check your backups
  • Add the virtual hard disks, keeping in mind the .vmdk size limits specified in VMware KB 1012384.
    Depending on your space-allocation policy, you can follow one of these approaches:
    a) add 2 disks of 250GB (in my case) with thin provisioning;
    b) add 1 disk of 250GB and 1 disk of 10GB. (I use round numbers to simplify the setup.)
  • Re-scan the SCSI bus inside the VM:
# echo "- - -" > /sys/class/scsi_host/host0/scan
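If the VM has more than one SCSI host adapter, the new disks may show up on a different hostN. A small sketch (assuming the usual sysfs layout) that rescans all of them:

# for h in /sys/class/scsi_host/host*/scan; do echo "- - -" > "$h"; done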
  • Create LVM partitions on all the added devices:
# fdisk /dev/sdX
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-32635, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-32635, default 32635):
Using default value 32635

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
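By the way, if you prefer to skip the interactive dialog, the classic sfdisk can do the same in one line (the field syntax differs between sfdisk versions, so treat this as a sketch):

# echo ",,8e" | sfdisk /dev/sdX

This creates a single full-disk partition of type 8e (Linux LVM).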

  • Extend the Volume Group by adding the new disks

# lvm vgextend VolGroupXX /dev/sdX1 /dev/sdY1
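Depending on your LVM version, vgextend may initialize the new partitions implicitly; if it refuses them, label them as Physical Volumes first:

# lvm pvcreate /dev/sdX1 /dev/sdY1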
  • Check the number of Physical Extents to migrate

# lvm pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdf1
  VG Name               VolGroupXX
  PV Size               250.00 GiB / not usable 4.69 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              63998
  Free PE               63998
  Allocated PE          0
  PV UUID               Zp7uJR-YsIQ-AjRP-hdGL-OXSl-XbJG-N1GFn2
  --- Physical volume ---
  PV Name               /dev/sdg1
  VG Name               VolGroupXX
  PV Size               250.00 GiB / not usable 4.69 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              63998
  Free PE               63998
  Allocated PE          0
  PV UUID               eKdiqW-eMjI-ck4a-grM3-ogX3-6BOP-Q1ldlC
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               VolGroupXX
  PV Size               255.99 GiB / not usable 2.72 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              65534
  Free PE               0
  Allocated PE          65534
  PV UUID               PajbnY-65II-Aeyt-WamI-Xzr8-M3b2-kWfy6U

In this example /dev/sdf and /dev/sdg are the newly added hard drives with a size of 250GB, and /dev/sdb is the old one with a size of 256GB. The important part for us is "Total PE". If we call the pvmove command without specifying the number of Physical Extents to move, we will get an error, because each new PV is smaller than the old one.
  • Migrate the Extents to the new hard drives
# lvm pvmove /dev/sdb1:1-63998 /dev/sdf1
# lvm pvmove /dev/sdb1 /dev/sdg1
As you may have noticed, I didn't specify the number of extents to move to the second drive: in that form the command moves only the allocated extents.
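Before moving on, it's worth double-checking that nothing is left on the old PV:

# lvm pvdisplay /dev/sdb1

"Allocated PE" should now be 0.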
Now you can extend the Logical Volume to use all the added space.
  • Extend the Logical Volume
# lvm lvextend -l +100%FREE VolGroupXX/LogVolXX

  • Extend the file system with the proper tool
# <resize2fs|resize_reiserfs|...> /dev/mapper/VolGroupXX-LogVolXX
  • And, finally, remove the old drives
# lvm vgreduce VolGroupXX /dev/sdb1
# lvm pvremove /dev/sdb1

Now you can remove the old drive from the VM configuration... zero downtime.

Well, many experienced Linux admins might not find anything new in this article. However, I didn't know how it worked until I faced the need for a change like that. For instance, I have extended volumes so many times that the operation now takes me about a minute (not counting the time spent by the FS resizing tool).

Saturday, 3 December 2011

Subversion "fun"

Well, although git seems to be better than subversion even for our needs, we still use and will keep using subversion in order to provide long-term support for our customers, those who might require changes in some old projects... Anyway, the story is not about that...


"It was a rainy Thursday". As we could rely on our monitoring system, nobody was expecting some sudden troubles and I was doing some usual stuff... Suddenly... Suddenly we receive several calls at one time. All users reported some problem with accessing the SVN server. In a few second monitoring system reported: 100% use of the / (root) partition. WTH?! There should be two thresholds warnings before 90% and 95%! Well, login in, check.. Well... subversion repos are on another partition, so it's some system stuff... Logs are normal size - logrotate is configured for that. Aha!.. Several gigs in /tmp.... in a single file... Well, # lsof|grep filenname and we have a guilty. It was the svnsync process (we put it on postcommit hook to keep remote mirror synchronized), fortunately with a path to the repository to sync.

Now, let's see the number of the last synchronized revision on the remote server (let's call it N), and then take a look at revision N+1. A single file was committed, named other_repo_name.tar.gz. With a size of a few gigs...

First of all, block the synchronization to the remote mirrors and block all commits (exit 1 in the pre-commit hook; don't forget to leave a message for the users, something like `echo "Commits are blocked, please contact support for more details" 1>&2`). It looked like we would need to cut this commit out of the repository. Now, with the further risks eliminated, let's call the user.
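For reference, a minimal version of such a hook (hooks/pre-commit in the repository directory, made executable):

#!/bin/sh
# reject every commit with an explanatory message until the cleanup is over
echo "Commits are blocked, please contact support for more details" 1>&2
exit 1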

- Hi Mike (let's call him Mike), this is John Smith (let's call me John :)) from the IT dept. We noticed that you have committed a file of a few gigs to the XXXX repo, haven't you?
- Hi, yes I did.
- But do you know that we have other services for storing archives, like _service1_ and _service2_? Our SVN servers, on the other hand, are meant to be used as a version control system.
- Yes, I know, that's why I put there a copy of the repository YYYY (Note: YYYY is another repo on the same server), but since the sources were too big I compressed them so they wouldn't take too much space on the server...
- ... *confused* (explaining to users the basics of how services like SVN work is far from my responsibility). Well, Mike, I understand. Thank you for the explanation. Just to let you know, the XXXX repo will not be available for commits for a few hours. Bye...


Next, I called Mike's department boss. Not to complain, but just to describe the situation and to ask him to inform his team that the repo would be unavailable for commits for a few hours (dump/restore), and that the one commit made after the huge one would have to be re-committed again.
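The cleanup itself was the usual dump/restore cycle; a rough sketch, assuming the repository lives at /srv/svn/XXXX (a hypothetical path) and N is the last good revision:

# svnadmin dump /srv/svn/XXXX -r 0:N > XXXX-upto-N.dump
# svnadmin create /srv/svn/XXXX.new
# svnadmin load /srv/svn/XXXX.new < XXXX-upto-N.dump

After swapping the repository directories, the commit made after the huge one has to be re-committed by its author.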

The moral of the story: on important servers, keep /tmp on a separate partition! I do now :)

Friday, 2 December 2011

CentOS 6 rpm sign issue (V4 signature is used by default)

Well, the problem I'm going to write about is known and the appropriate bug reports exist. But I couldn't find the relevant threads on the first pages of search results when googling the error messages. So I hope this post will help somebody find out what is happening when "rpm --checksig" returns "Header V4 RSA/SHA1 signature: BAD, key ID xxxxxxxx" where "OK" is expected.

I faced the problem while deploying a Spacewalk server. I chose CentOS 6.0 as the OS for that server and was simply following the install instructions (many thanks to the Spacewalk community, it was really easy to install the server) until I tried to create a custom Software Channel for some individual packages.

In fact, the instructions on how to sign RPMs are the same on most of the howto pages... let me recap them (the simple version):

gpg --gen-key
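To find the key ID for the export, you can list the keys first:

gpg --list-keys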

gpg --export -a XXXXXXX > RPM-GPG-KEY-Mycompany

put this into ~/.rpmmacros:
%_signature gpg
%_gpg_name XXXXXXXX

and then just use:

rpm --resign some-package.rpm

On the client system it's enough to get the public key and import it with the rpm command:

rpm --import /path/to/RPM-GPG-KEY-Mycompany
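To verify that the key has been imported, you can query the installed gpg-pubkey packages:

rpm -q gpg-pubkey --qf '%{name}-%{version}-%{release} --> %{summary}\n'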

This was a very simplified recipe without the expected outputs, but if everything goes well there shouldn't be any unexpected questions.

After all is done, rpm --checksig some-package.rpm should return something like this:
some-package.rpm: rsa sha1 (md5) pgp md5 OK

And it was like that on the CentOS 6 server... But when I tried to install the package via yum on CentOS 5.7, I received the following error:

error: rpmts_HdrFromFdno: Header V4 RSA/SHA1 signature: BAD, key ID XXXXXXXX

And rpm -v --checksig some-package.rpm returned:

    Header V4 RSA/SHA1 signature: BAD, key ID xxxxxxxx
    Header SHA1 digest: OK (835b77fb70d2a6075c428b9eb57bbfcdc2a0d1ce)
    V4 RSA/SHA1 signature: BAD, key ID xxxxxxxx
    MD5 digest: OK (ede2464b724b0bafef0db4a53c02c1d0)
 
Even weirder: when I signed the package with the same key on CentOS 5.7, the resulting rpm was OK.
It was my first time signing RPMs, so I spent some time before I spotted the difference from a package with a proper signature:

$ rpm -v --checksig rpmforge-release-0.5.1-1.el5.rf.i386.rpm
    Header V3 DSA signature: OK, key ID 6b8d79e6
    Header SHA1 digest: OK (56871fe945ed2b2c868430b0002bb47dc129e981)
    MD5 digest: OK (69c4cbf8229ba4b319d58f99ddebddf3)
    V3 DSA signature: OK, key ID 6b8d79e6

So, with that insight, I found an old bug report describing how to force GPG signature version 3. To do that, your ~/.rpmmacros should look like this:

%_signature gpg
%_gpg_name  XXXXXXXX
%__gpg_sign_cmd %{__gpg} \
    gpg --force-v3-sigs --digest-algo=sha1 --batch --no-verbose --no-armor \
    --passphrase-fd 3 --no-secmem-warning -u "%{_gpg_name}" \
    -sbo %{__signature_filename} %{__plaintext_filename}
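After updating ~/.rpmmacros, re-sign the package and verify it once more (the same commands as above):

rpm --resign some-package.rpm
rpm -v --checksig some-package.rpm

This time the check should report a V3 signature.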

It seems that the "rpm --resign" command uses GPG signature V4 by default, despite the following text in the rpm manual page:

For compatibility with older versions of GPG, PGP, and rpm, only V3 OpenPGP signature packets should be configured.  Either DSA or RSA  verification algorithms can be used, but DSA is preferred.

Hope this helps...