There is basically no documentation on this out there, so I've decided to put it together for you as well as I can. I recently had the pleasure of converting a myriad of Red Hat boxes (RH 7.3, RH 9, FC4 and ES4) from physical to virtual. This was by no means an easy task.
The largest issue was that I was going from IDE disks to SCSI disks, with either LILO or GRUB as the boot loader. LVM did not help WHATSOEVER on this either. Basically, Platespin and Converter were of ZERO use to me. Ghost was my friend...
My wish is that somewhere along the way Converter and Platespin do something to seriously address this issue. I know it's not where their real revenue comes from, but come on, help out those that truly help you out.
So the biggest issue with these Linux machines is that if you run any type of custom kernel, you probably don't include drivers for hardware that is not in use on your machine. Makes sense, no argument there. But prior to your P2V, make sure you include support for the SCSI driver VMware needs: either the BusLogic (yikes!) or the LSI Logic SCSI controller. At the very least, compile the driver as a module. You can go anywhere to figure out how to do this if you are unsure.
After this, use whatever disk imaging solution you know of to copy that machine up to your VMware machine. You'll need to create the VM and boot it off whatever disk imaging solution you use. I used Ghost and used the GhostCast Server piece to a successful end.
Download the rescue disk for your distro, or use a copy you already have, and boot into rescue mode on that distro. You'll need to create a new initial RAM disk image (mkinitrd) with the proper driver listed in your /etc/modules.conf (/etc/modprobe.conf on the 2.6-kernel distros such as FC4 and ES4).
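As a rough sketch of the modules.conf part, assuming a 2.4-era box and the BusLogic controller (swap in mptscsih if you picked LSI Logic; both module names are assumptions for a stock Red Hat kernel, so check what your kernel actually ships), something like this adds the scsi_hostadapter alias that mkinitrd looks for. The helper name is made up for illustration and it only edits text.

```python
def ensure_scsi_alias(conf_text, module="BusLogic"):
    """Return modules.conf text with a scsi_hostadapter alias for `module`.

    Hypothetical helper: it only edits text, it does not run mkinitrd.
    """
    lines = conf_text.splitlines()
    # Collect existing alias lines, e.g. "alias scsi_hostadapter aic7xxx"
    existing = [l.split() for l in lines if l.split()[:1] == ["alias"]]
    for parts in existing:
        if (len(parts) >= 3 and parts[1].startswith("scsi_hostadapter")
                and parts[2] == module):
            return conf_text  # driver already listed, nothing to do
    # Pick the next free alias name: scsi_hostadapter, scsi_hostadapter1, ...
    used = {p[1] for p in existing
            if len(p) >= 3 and p[1].startswith("scsi_hostadapter")}
    name = "scsi_hostadapter"
    n = 0
    while name in used:
        n += 1
        name = "scsi_hostadapter%d" % n
    lines.append("alias %s %s" % (name, module))
    return "\n".join(lines) + "\n"


# Example: an old box that only knew about its network card
print(ensure_scsi_alias("alias eth0 e1000\n"))
```

After the alias is in place, rebuild the image from the rescue environment (chrooted into the installed system) with mkinitrd, and give it the installed kernel's version rather than the rescue kernel's.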
That's basically it, but no one went into even that much detail that I could find on the internet. It worked for me quite easily.
Friday, May 30, 2008
Friday, April 11, 2008
Windows Domain Rename - WOW
A customer wanted to do a domain migration. Easy enough with the ADMT from Microsoft. Create a new domain, move some users around, make sure they have seamless access to resources on member servers...blah blah blah...
That takes a while and can be very cumbersome. But, this is my PREFERRED method of doing it. The domain rename functionality freaks me out thoroughly.
And then this customer came along. They have no workstations and all users connect via Citrix. The email system is Lotus Domino, and here is the best part of that: they only use POP3 for it. It's not like they are small...they have over 500 users. So after poking around, they looked good for a domain rename, and it would have the least impact on the users. They only had around 30 servers, so it's something that could be accomplished quickly.
So last night, after prepping thoroughly, I went after it. Following the docs from Microsoft was the best way to do it, and there is plenty of documentation on their site for it.
Oh, and one hold-your-breath moment, with the DCs taking over 5 minutes to come up all the way. DNS took FOREVER to start, and therefore when trying to log on, the new domain name did not exist. Didn't see that in the handy dandy documentation, but take heed: it does take a few minutes after the rendom /execute command and the reboot of the DCs for it to come to life all the way.
They did have quite a few SQL Servers which were my biggest problem. It has to do with Windows Auth being used on the SQL Server.
I had to make sure that I deleted the user and recreated that same user with the new domain name. Some jobs were orphaned and were therefore moved to be owned by the sa account. I don't foresee that being a large problem, but you DBAs would know more about the actual problems and caveats of that.
Citrix was the most time-consuming part of the ordeal. There is a nice article that applies to the version they were running, MetaFrame XP (http://support.citrix.com/article/CTX102371). Nice and very helpful.
So after it was all said and done, the UPNs were not automagically updated for the users. Quick VBS script to fix that? I think not. Thanks to Joe at www.joeware.net and his handy AdFind and AdMod utilities, a quick single line took care of it all:
adfind -b dc=new,dc=domainname -f "objectcategory=user" userPrincipalName sAMAccountName -adcsv | admod userPrincipalName::{{sAMAccountName}}@new.domainname -unsafe
How to do this is in his documentation, but it's not explicit enough. Joe's a good man for protecting the innocent on this one, given the damage that admod itself can do (the -unsafe switch turns off admod's safety limit on how many objects it will modify).
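If you want to see what that one-liner is about to do before letting admod loose, here is a minimal Python sketch. It takes CSV in the shape I'd expect from adfind -adcsv (a header row with dn, userPrincipalName and sAMAccountName columns; that layout is an assumption, so check your own output) and prints the UPN each user would end up with. The function name and the sample user are made up.

```python
import csv
import io


def preview_upns(adcsv_text, new_suffix):
    """Yield (dn, old_upn, new_upn) for every row of adfind-style CSV text."""
    reader = csv.DictReader(io.StringIO(adcsv_text))
    for row in reader:
        sam = row["sAMAccountName"]
        # Same substitution as userPrincipalName::{{sAMAccountName}}@new.domainname
        yield row["dn"], row.get("userPrincipalName", ""), "%s@%s" % (sam, new_suffix)


# Hypothetical sample row; real input would be the saved adfind output
sample = (
    '"dn","userPrincipalName","sAMAccountName"\n'
    '"CN=Jane Doe,OU=Users,DC=new,DC=domainname","jdoe@old.domainname","jdoe"\n'
)

for dn, old, new in preview_upns(sample, "new.domainname"):
    print("%s: %s -> %s" % (dn, old, new))
```

This changes nothing in the directory; it is only a dry-run view of the same sAMAccountName substitution.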
So there you have it. A domain rename with over 500 users, 30 servers and a TON of Citrix apps in exactly 4 hours.
Tuesday, April 8, 2008
Cisco Fixup/Inspect Rules for SMTP/ESMTP
What in the world is up with this? Why does Cisco continue to push this with their PIX/ASA?
I agree that adding some additional security to the SMTP protocol is necessary in order to lower the amount of spam and attacks (directory harvesting in particular), but I think we need to leave that to the MTAs themselves.
The havoc this inspection rule causes for mail flow is INSANE. Slow response times across the board, banner masking, confusion on the part of admins as to what the hell is going on...etc. This recently happened to me and a professional counterpart, and we were banging our heads against the wall for several hours late into the night...ugh.
Ok, just venting.
Friday, March 28, 2008
Wicked Problem with EMC NS, iSCSI and VMware
Wanted to let some engineers know about a “bug” that exists in the EMC DART code for the Celerra (NS) series of NAS.
The bug is that if an iSCSI read request larger than 1 MB is made to a Celerra, the data mover may crash on the request. What happens is that the iSCSI service cannot process the request, which results in a kernel panic on the data mover. The data mover then fails over, but because the greater than 1 MB read request still has to be processed, the second data mover can crash as well, or in some cases you get complete loss of data, as the ESX server still has writes waiting to be processed.
There is also a HUGE possibility with ESX 3.5 that an HA event will be triggered and the VMs on that 3.5 node will be powered down. If you are running a mixture of ESX 3.0.x and 3.5 nodes in an HA cluster, a HUGE amount of confusion can follow, with the HA event not being seen by the 3.0 hosts. In the case of one customer, this in fact led to 17 VMs being corrupted and over 6 having to have DR performed on them to recover.
According to VMware, they have made 3.5 much more sensitive to storage problems (???), so it will force an HA event. Now, the customer in this case was using QLogic iSCSI HBAs, so ESX is not aware of the underlying network calls being processed. In VMware’s defense, ESX treats these the same as a Fibre Channel HBA, so the timing is as if it’s an actual SAN, and it therefore doesn’t wait out the 45 second failover period of the data movers. According to an escalation engineer within VMware, this variable cannot be changed.
The cure is to patch the DART code on the NS so that it doesn’t panic on greater than 1 MB read requests.
I’d like to point out how easily a read of more than 1 MB can come about. VMFS formats a LUN in 1 MB blocks or larger. Windows NTFS formats its file system with 4k blocks. We all know that a file larger than 4k must take up at least 2 blocks when it is written to the file system. So, a 6k file actually takes up 8k of space on the file system. When the OS writes a file and the next contiguous block is occupied, Windows (Linux, whatever) writes to the next available free block. So when FileA is 6k, it gets written to block 23, and the next block available is block 238,654. When the OS needs to read that file, it has to read both blocks. This is fragmentation. It takes a while to spin the disk, and hence we get slower performance.
Well, since the 2 blocks are located in different places within the VMDK file on VMFS, ESX has to read two 1 MB blocks to service the request for its guest VM. Boom, kernel panic!
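To put numbers on that example, here is a small Python sketch of the arithmetic. It assumes 4k NTFS clusters, 1 MB VMFS blocks, and that block 0 of the guest file system lines up with the start of a VMFS block (real VMDKs have partition offsets, so treat this as illustrative only). The function names are made up.

```python
CLUSTER = 4 * 1024        # NTFS block size from the example (4k)
VMFS_BLOCK = 1024 * 1024  # VMFS block size from the example (1 MB)


def clusters_needed(file_bytes):
    """Whole 4k blocks a file occupies (ceiling division)."""
    return -(-file_bytes // CLUSTER)


def vmfs_block_of(cluster_number):
    """Index of the 1 MB VMFS block that holds a given guest block."""
    return (cluster_number * CLUSTER) // VMFS_BLOCK


file_size = 6 * 1024
# Space actually used on disk by the 6k file
print(clusters_needed(file_size) * CLUSTER)

# Distinct VMFS blocks behind FileA's two fragments
touched = {vmfs_block_of(c) for c in (23, 238654)}
print(sorted(touched))
```

The 6k file occupies 8192 bytes, and blocks 23 and 238,654 land in VMFS blocks 0 and 932, so by the reasoning above one small guest read ends up touching two separate 1 MB blocks.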
This problem did not show up at first for this particular customer, as the VMs had no fragmentation at the beginning of the implementation. However, now that the systems have run for a period of time, fragmentation has built up within both their file servers and database servers and has led to 2 major outages in exactly 10 days. They have thus decided, even though EMC claims the problem will not continue, to break the CX CLARiiON backend out and trash the Celerra NAS head. I really can’t say I blame them. EMC did not replace the first data mover that failed, nor was it even able to determine the problem in the 10 days between the failures. So their failure yesterday did not have another data mover to fail over to. They were down for 10 hours. This customer is a bank and is very dependent on the services their infrastructure provides.
To make things worse, they did have a hot site that was configured to mirror their existing VMs via host-based replication. However, about 2 months ago their NS in that facility suffered from this very same problem and has not been completely repaired. They are going to break the CX out there as well.
So at any rate, thought you guys should know. I have to admit I am personally a bit hesitant about pushing ESX with iSCSI out on these devices. It’s not solid, and EMC is HIGHLY unresponsive in resolving the issues when they do occur. When the sales rep is the one calling you to say “hey, I noticed your NAS was down” 2 hours after it occurred (which I do have to say was awesome of him) and tech support still has not called the customer, there is a huge lack of communication, or of ability to fulfill the service requests that EMC receives. I know that this part is a rant, and everyone has some problems, but it’s not like we’re buying a 10k Kia Rio… Ok, I’ll shut up now.
Monday, March 10, 2008
Exchange 2007 Restore of Hub Transport
Here is another undocumented bug when doing a restore of an Exchange 2007 server:
I was doing a /M:recoverserver on an Exchange 2007 server that had failed. The server had Service Pack 1 installed for Exchange.
When doing the recovery (after I re-installed the OS, of course), the recovery program complained about some registry entries not being there when it recovered the Hub Transport role. Truly interesting, as I was recovering this service!!
It complained that these keys did not exist:
HKLM\Software\Microsoft\Exchange\Transport
HKLM\Software\Microsoft\Exchange\Pickup
Simple enough though to move forward...
Just create the keys and then redo the /M:recoverserver and all will be well!
Exchange 2007 and Exclaimer
So I ran across this horrible problem this past week in doing an Exchange 2003 to Exchange 2007 migration. My customer used Exclaimer heavily for doing auto replies and for creating automatic signatures for the users within the organization.
After installing this app and then trying to configure the Captaris Rightfax Connector for Exchange 2007, I ran into an instance where when sending to an address formatted as:
[Fax:user@2125555555]
Exchange generated an instant undeliverable message saying that it could not be delivered.
After working with the great folks at Captaris (read as: some jerk on the phone who constantly spoke over you and did not really want to help anyway) and Microsoft, we discovered that Exclaimer rewrites the IMCEA address that's generated when sending to the FAX: or RFAX: address space into the opposite case from what it needs to be. In other words, it takes IMCEA and makes it imcea.
Ready for the kicker? Exchange 2007 is case sensitive. The only solution was to uninstall Exclaimer and move on for now. The customer accepted the problem.
Just wanted to save you peeps some time!!