Sunday, December 11, 2011

Exchange 2010 CAS Array DNS Round Robin

There is very little information out there on whether DNS Round Robin is supported for a CAS Array when Windows NLB isn't an option (such as when you only have two servers total and they're also DAG members, which rules out NLB on the same boxes) and a hardware load balancer isn't available.

The short of it is that this works spectacularly. What few people understand is how DNS Round Robin truly works. The DNS server always hands the client both IP addresses associated with the record; the round robin part is simply that the server rotates which IP address is listed first in each response. The client ends up with both addresses in its local DNS cache.

When this happens, the client tries to connect to the first address, and if it does not respond, it fails over to the second IP address in its cache. Super. This generally occurs within 30 seconds. Additionally, when the client is connected to a CAS server and that connection drops for any reason, the client tries the alternate IP address at that point.

The outage time experienced for Outlook in this scenario is somewhere around 15 to 30 seconds in all my testing.

Now, does this help with OWA? No, you will have to re-authenticate as that session is lost. Is it acceptable on a small budget with little impact in the event of a failure? Yes, by all means.
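For completeness, the setup itself is nothing exotic. Something along these lines is all it takes; the zone, FQDN, IP addresses, server and database names here are just placeholders, so adjust for your environment. First the two A records for the array name:

dnscmd dc01 /RecordAdd contoso.com outlook A 192.168.1.21
dnscmd dc01 /RecordAdd contoso.com outlook A 192.168.1.22

Then, from the Exchange Management Shell, create the array and point the databases at the array FQDN rather than an individual CAS server:

New-ClientAccessArray -Name "CASArray" -Fqdn "outlook.contoso.com" -Site "Default-First-Site-Name"
Set-MailboxDatabase "MailboxDatabase01" -RpcClientAccessServer "outlook.contoso.com"

A quick nslookup against outlook.contoso.com should then return both addresses, with the order flipping between queries.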

Tuesday, January 13, 2009

Using NetIQ Exchange Migrator to Migrate to a New Domain

So if you ever have the pleasure of using this utility, feel blessed. I'm not sure of the cost on it, but it is WELL worth it.

I recently had the opportunity to assist with a small user migration (500 users) for a company that was being split out of a rather large organization. Since we arrived late (meaning we weren't brought in until the last second), this utility saved our asses. The beauty of the product is that it wraps all of the tasks you would normally do by hand into a simple TaskPad-style view and guides you through them step by step. It is pure genius!

With just a few simple configuration steps, we were able to transfer all user email, public folder data, and even the accounts themselves from the old domain/Exchange org to the new one. It then created contacts in the old domain to allow email to flow from the old to the new, and contacts in the new domain so that you could send to the old domain. You can limit this to certain users quite easily so that you won't confuse the end users too terribly much.

There are some interesting requirements, though. The account that you use to do the migration (I was creative and used an account labeled "Migrate") needs admin access in both domains (membership in the Administrators group located in the default Builtin container) as well as membership in the local Administrators group on the Exchange servers in both domains. It also needs Full Exchange Administrator access delegated to it for both Exchange orgs.

However, granting that level of access actually frees you from an ENORMOUS number of manual tasks.
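If it helps, the group membership part of that can be knocked out from a command prompt rather than clicking through the GUI; run something like this on a DC in each domain (which lands the account in that domain's Builtin\Administrators group) and again on each Exchange server (which lands it in the server's local Administrators group). The domain and account name are just what I happened to use, and the cross-domain add assumes the trust is already in place:

net localgroup Administrators OLDDOMAIN\Migrate /add

The Full Exchange Administrator piece still has to be delegated separately for each org (the delegation wizard, or whatever the equivalent is for your Exchange version).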

I recommend it!!!

Friday, May 30, 2008

Converting Linux Boxes to VMWare ESX Virtual Machines

There is basically no documentation on this out there in the world, so I've decided to put it together for you as well as I can. I recently had the pleasure of converting a myriad of RedHat boxes from physical to virtual. (RH 7.3, 9, FC4 and ES4). This was by no means an easy task.



The largest issue was that I was going from IDE disks to SCSI disks with either LILO or GRUB as the boot loader. LVM did not help WHATSOEVER here either. Basically, PlateSpin and Converter were of ZERO use to me. Ghost was my friend...

My wish is that somewhere along the way Converter and PlateSpin do something to seriously address this issue. I know it doesn't generate much of their real business revenue, but come on, help out those that truly help you out.

So the biggest issue with these Linux machines is that if you build any type of custom kernel, you probably don't include drivers for hardware that is not in use on the machine. Makes sense, no argument there. But prior to your P2V, make sure you include support for the SCSI driver VMware needs: either the BusLogic (yikes!) or the LSI Logic SCSI controller. At the very least compile the driver as a module. You can find instructions anywhere if you are unsure how.
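A quick sanity check before you image anything: see whether the current kernel already has the modules (on the 2.4-era Red Hat kernels I was dealing with, the BusLogic driver is simply BusLogic and the LSI Logic one is the mptbase/mptscsih pair):

find /lib/modules/`uname -r` -name 'BusLogic*' -o -name 'mpt*'

If you're rolling a custom kernel instead, the config options you're after are CONFIG_SCSI_BUSLOGIC and the CONFIG_FUSION ones; building them as modules (=m) is enough.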

After this, use whatever disk imaging solution you know of to copy that machine up to your VMware environment. You'll need to create the VM and boot it off whatever disk imaging solution you use. I used Ghost with the GhostCast Server piece to a successful end.

Download the rescue disk for your distro, or use a copy you already have, and boot into rescue mode. You'll need to add the proper driver alias to /etc/modules.conf and build a new initial ramdisk image (mkinitrd) that includes it.
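In case it saves someone a search, here is roughly what that looks like from the Red Hat rescue environment. The kernel version and driver here are only examples (that's a stock RH9 kernel; swap mptscsih for BusLogic if that's the controller you went with):

chroot /mnt/sysimage
echo "alias scsi_hostadapter mptscsih" >> /etc/modules.conf
mkinitrd -f /boot/initrd-2.4.20-8.img 2.4.20-8
exit

The -f just overwrites the existing initrd for that kernel, and mkinitrd picks up the scsi_hostadapter alias from modules.conf when it builds the image.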

That's basically it, but I couldn't find anyone on the internet who went into even that much detail. It worked quite easily for me.

Friday, April 11, 2008

Windows Domain Rename - WOW

A customer wanted to do a domain migration. Easy enough with the ADMT from Microsoft. Create a new domain, move some users around, make sure they have seamless access to resources on member servers...blah blah blah...

That takes a while and can be very cumbersome. But, this is my PREFERRED method of doing it. The domain rename functionality freaks me out thoroughly.

And then this customer came along. They have no workstations and all users connect via Citrix. The email system is Lotus Domino, and here is the best part: they only use POP3 for it. It's not like they are small...they are over 500 users. So after poking around, they looked like a good fit for a domain rename, and it would have the least impact on the users. They only had around 30 servers, so it's something that could be accomplished quickly.

So last night, after prepping thoroughly, we went after it. Following the docs from Microsoft was the best way to do it, and there is plenty of documentation on their site.
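For anyone curious, the procedure in the Microsoft guide boils down to roughly this sequence, run from the control station (the domain names are examples, and double-check the guide for the exact ordering of the cleanup steps at the end):

rendom /list
(edit the resulting domainlist.xml with the new domain name)
rendom /upload
rendom /prepare
rendom /execute
gpfixup /olddns:old.domainname /newdns:new.domainname
rendom /clean
rendom /end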

Oh, and one hold-your-breath moment with the DCs taking over 5 minutes to come up all the way. DNS took FOREVER to start, and therefore when trying to log on, the new domain name did not exist. Didn't see that in the handy dandy documentation, but take heed: it does take a few minutes after the rendom /execute command and the reboot of the DCs for everything to come to life all the way.

They did have quite a few SQL Servers, which were my biggest problem. It comes down to Windows Authentication being used on the SQL Servers.


I had to make sure that I deleted each Windows login and recreated that same login under the new domain name. Some jobs were orphaned as a result and were therefore moved to be owned by the sa account. I don't foresee that being a large problem, but you DBAs would know more about the actual problems and caveats of that.
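For what it's worth, the per-login dance looked roughly like this (SQL 2000-era syntax via osql; the domain, account, and job names are just examples):

osql -E -Q "EXEC sp_revokelogin 'OLDDOMAIN\jsmith'"
osql -E -Q "EXEC sp_grantlogin 'NEWDOMAIN\jsmith'"
osql -E -d msdb -Q "EXEC sp_update_job @job_name = 'NightlyBackup', @owner_login_name = 'sa'"

Keep in mind that dropping the old login doesn't carry its database user mappings or permissions over to the new one; that cleanup is still on you (or your DBAs).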

Citrix was the most time consuming part of the ordeal. There is a nice link that applies to the version they were running (Metaframe XP). (http://support.citrix.com/article/CTX102371). Nice and very helpful.

So after it was all said and done, the UPNs were not automagically updated for the users. Quick VBS script to fix that? I think not. Thanks to Joe at www.joeware.net and his handy ADfind and ADmod utilities, a quick one-liner took care of it all:


adfind -b dc=new,dc=domainname -f "objectcategory=user" userPrincipalName sAMAccountName -adcsv | admod userPrincipalName::{{sAMAccountName}}@new.domainname -unsafe

His documentation covers how to do this, but it isn't terribly explicit. Joe's a good man for protecting the innocent on this one, given the damage that admod can bring forth on its own.

So there you have it. A domain rename with over 500 users, 30 servers and a TON of Citrix apps in exactly 4 hours.

Tuesday, April 8, 2008

Cisco Fixup/Inspect Rules for SMTP/ESMTP

What in the world is up with this? Why does Cisco continue to push this with their PIX/ASA?

I agree that adding some additional security to the SMTP protocol is necessary in order to lower the amount of spam and attacks (directory harvesting in particular), but I think we need to leave that to the MTAs themselves.

The havoc this inspection rule causes for mail flow is INSANE. Slow response times across the board, banner masking, confusion on the part of admins as to what the hell is going on...etc. This recently happened to me and a professional counterpart; we were banging our heads against the wall for several hours late into the night...ugh.
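For the record, turning it off is only a line or three. On an older PIX it's the fixup command, and on PIX/ASA 7.x and later it lives in the default policy map (this assumes the stock global_policy and inspection_default names, so adjust if yours differ):

no fixup protocol smtp 25

policy-map global_policy
 class inspection_default
  no inspect esmtp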

Ok, just venting.

Friday, March 28, 2008

Wicked Problem with EMC NS, iSCSI and VMWare

Wanted to let some engineers know about a "bug" that exists in the EMC DART code for the Celerra (NS) series of NAS.

The bug is that if an iSCSI read request larger than 1 MB is made to a Celerra, the Data Mover may crash on the request. What happens is that the iSCSI service cannot process the request, which results in a kernel panic on the Data Mover. The Data Mover then fails over, but because the greater-than-1-MB read request still has to be processed, the second Data Mover can crash too, or in some cases you get complete loss of data, since the ESX server still has writes that need to be processed.

There is also a HUGE possibility with ESX 3.5 that an HA event gets triggered and the VMs on that 3.5 node are powered down. If you have a mix of ESX 3.0.x and 3.5 nodes in an HA cluster, a HUGE amount of confusion can be brought on within the cluster, since the HA event is not seen by the 3.0 hosts. In the case of one customer, this in fact led to 17 VMs being corrupted and more than 6 having to have DR performed on them to recover.

According to VMware, they made 3.5 much more sensitive to storage problems (???) and it will force an HA event. Now, the customer in this case was using QLogic iSCSI HBAs, so ESX is not aware of the underlying network calls being processed and, in its defense, treats the code path the same as a Fibre Channel HBA. The timeouts are therefore as if it's an actual SAN, so it doesn't wait out the 45-second failover period of the Data Movers. This variable cannot be changed, according to an escalation engineer within VMware.

The cure is to patch the DART code on the NS so that it doesn't panic on greater-than-1-MB read requests.

I'd like to point out how easily a read of more than 1 MB can come about. VMFS formats a LUN in 1 MB blocks or larger. Windows NTFS formats its file system with 4 KB clusters. We all know that when a file larger than 4 KB is written to the file system, it must take up at least two clusters; so a 6 KB file actually takes up 8 KB of space on the file system. When the OS writes and the next contiguous cluster is occupied, Windows (Linux, whatever) writes to the next available free cluster. So when FileA is 6 KB, part of it gets written to cluster 23 and the next cluster available is cluster 238,654. When the OS needs to read that file, it has to read both clusters. This is fragmentation. It takes a while to spin the disk around to both spots, and hence we get slower performance.

Well, since those two clusters sit far apart inside the VMDK file on VMFS, ESX has to read two 1 MB blocks to service the request for its guest VM. Boom, kernel panic!

This problem was not found for this particular customer because the VMs did not have fragmentation at the beginning of the implementation. However, now that the systems have run for a period of time, fragmentation has built up within both their file servers and database servers and has led to 2 major outages in exactly 10 days. They have thus decided, even though EMC claims the problem will not continue, that they are breaking out the CLARiiON CX backend and trashing the Celerra NAS head. I really can't say I blame them. EMC did not replace the first Data Mover that failed, nor was it even able to determine the problem in the 10 days between the failures. So when the failure hit yesterday, there was no second Data Mover to fail over to. They were down for 10 hours. This customer is a bank and is very dependent on the services their infrastructure provides.

To make things worse, they did have a hot site configured for host-based replication, where their existing VMs were to be mirrored. However, about 2 months ago their NS in that facility suffered from this very same problem and has not been completely repaired. They plan to break out the CX there as well.

So at any rate, thought you guys should know. I have to admit I am personally a bit hesitant about pushing ESX with iSCSI out on these devices. It's not solid, and EMC is HIGHLY unresponsive about resolving the issues when they do occur. When the sales rep is the one calling you to say "hey, I noticed your NAS was down" 2 hours after it occurred (which, I do have to say, was awesome of him to do) and tech support has not called the customer, there is a huge lack of communication or ability to fulfill the service requests that EMC receives. I know that this part is a rant, and everyone has some problems, but it's not like we're buying a $10k Kia Rio... Ok, I'll shut up now.

Monday, March 10, 2008

Exchange 2007 Restore of Hub Transport

Here is another undocumented bug encountered when doing a restore of an Exchange 2007 server:

I was doing a /M:recoverserver on an Exchange 2007 server that had failed. The server had Service Pack 1 installed for Exchange.

When doing the recovery (after I re-installed the OS, of course), setup complained about some registry entries not being there when it recovered the Hub Transport role. Truly interesting, as I was recovering that very role!!

It complained that these keys did not exist:

HKLM\Software\Microsoft\Exchange\Transport
HKLM\Software\Microsoft\Exchange\Pickup

Simple enough though to move forward...

Just create the keys and then redo the /M:recoverserver and all will be well!
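If you hit the same thing, a couple of reg adds from a command prompt gets you past it (empty keys were all it wanted in my case):

reg add HKLM\Software\Microsoft\Exchange\Transport
reg add HKLM\Software\Microsoft\Exchange\Pickup

...and then re-run setup.com /M:RecoverServer.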