I rebooted our corporate Windows Server 2003 today. I was moving it to a UPS. No problem – except when I restarted I had no network connectivity.
First I saw a “service didn’t start, check the event viewer” message. The event viewer just told me I couldn’t register with the domain. I couldn’t do that because I didn’t have network access. I got the usual “may have limited connection” error.
I did all the usual things (ipconfig, repair connection, swap cables, switch accounts, log in as local user, test everything, etc.) and every check passed. The big breakthrough came when I investigated the advanced boot options on restart. Windows 2003 includes a “Safe Mode with Networking” option. When I booted that way I had a network connection.
There was a lot more work to do before I found that disabling the IPSEC service and then rebooting fixed everything.
I easily blew 6-8 hours of work today.
Lesson 1: Boot into Safe Mode with Networking first.
Then you work your way through this Microsoft kb article. I’ll excerpt some key points, then pass on a trick, then I’ve got to go home and finish up the work I couldn’t do today …
How to troubleshoot startup problems in Windows Server 2003
How to Start the Computer in Safe Mode
When you start the computer in Safe mode, Windows loads only the drivers and computer services that you need. You can use Safe mode when you have to identify and resolve problems that are caused by faulty drivers, programs, or services that start automatically.
If the computer starts successfully in Safe mode but it does not start in normal mode, the computer may have a conflict with the hardware settings or the resources. There may be incompatibilities with programs, services, or drivers, or there may be registry damage. In Safe mode, you can disable or remove a program, service, or device driver that may prevent the computer from starting…
How to Use System Configuration Utility
System Configuration Utility (Msconfig.exe) automates the routine troubleshooting steps that Microsoft Product Support Services technicians use when they diagnose Windows configuration issues…
… Click the General tab, and then click Selective Startup.
…Note You might be able to determine more quickly which service is causing the problem by testing the services in groups. Divide the services into two groups--select the check boxes of the first group, and clear the check boxes of the second group. Restart your computer, and then test for the problem. If the problem occurs, the faulty service is in the group with the selected check boxes. If the problem does not occur, the faulty service is in the group with the cleared check boxes. Repeat this process on the faulty group until you have isolated the faulty service.
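The group-halving procedure in the note above is a binary search. Here is a minimal Python sketch of it, assuming exactly one faulty service and a problem that reproduces whenever that service is enabled. The service names and the `simulated_test` function are made up for illustration; on a real server the "test" is a manual Msconfig change plus a restart.

```python
from typing import Callable, List, Sequence


def find_faulty_service(
    services: Sequence[str],
    problem_occurs: Callable[[List[str]], bool],
) -> str:
    """Return the one service whose presence makes problem_occurs() true.

    Assumes exactly one faulty service, and that the problem shows up
    whenever that service is enabled, whatever else is running.
    """
    candidates = list(services)
    if not candidates:
        raise ValueError("need at least one service to test")
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Select the check boxes of the first group, clear the second,
        # restart, and test for the problem.
        candidates = first if problem_occurs(first) else second
    return candidates[0]


# Simulated run: the "restart and test" step is replaced by a membership check.
services = ["Alerter", "Browser", "Dhcp", "Dnscache", "PolicyAgent", "Spooler", "W32Time"]
reboots = []


def simulated_test(enabled):
    reboots.append(list(enabled))
    return "PolicyAgent" in enabled


culprit = find_faulty_service(services, simulated_test)
print(culprit, "found after", len(reboots), "restarts")
```

The number of restarts grows with the logarithm of the service count rather than the count itself, which is why halving beats enabling services one at a time.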
It took hours.
Here’s the trick. Boot in Safe Mode first. Then run msconfig.exe and look at the services. Assuming things work in safe mode, the ones that are running (sort by that column) are good. Now uncheck all services, check the ones that are currently running, apply, restart.
When you restart you’re in the equivalent of Safe Mode, but you can use msconfig.exe to add services in blocks.
The UI of this app is dismal. I sorted alphabetically, then did screen captures to a Word document to get a complete alphabetically sorted list. I printed that to guide my tedious enabling of sets. (In theory you can do the binary search approach faster. Long story, can’t explain.)
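The Safe Mode trick can be planned on paper before touching Msconfig. This is a rough Python sketch, not a tool that reads anything from Windows: the `snapshot` dictionary is a hypothetical, hand-typed copy of the service list and its status column as seen in Safe Mode, and `plan_reenable` is an illustrative name.

```python
from typing import Dict, List, Tuple


def plan_reenable(
    service_status: Dict[str, str], block_size: int
) -> Tuple[List[str], List[List[str]]]:
    """Split services into a known-good baseline and blocks to add back.

    service_status maps a service name to the status shown while booted
    in Safe Mode ("Running" or "Stopped").
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    # Alphabetical order matches a printed, alphabetically sorted checklist.
    names = sorted(service_status, key=str.lower)
    baseline = [n for n in names if service_status[n] == "Running"]
    remaining = [n for n in names if service_status[n] != "Running"]
    blocks = [
        remaining[i:i + block_size]
        for i in range(0, len(remaining), block_size)
    ]
    return baseline, blocks


# Hypothetical snapshot taken in Safe Mode (names and statuses are examples).
snapshot = {
    "DHCP Client": "Running",
    "Event Log": "Running",
    "Plug and Play": "Running",
    "Error Reporting Service": "Stopped",
    "IPSEC Services": "Stopped",
    "Print Spooler": "Stopped",
    "Windows Time": "Stopped",
}

baseline, blocks = plan_reenable(snapshot, block_size=2)
print("Check these first:", baseline)
for number, block in enumerate(blocks, start=1):
    print("Block", number, "to add after a clean restart:", block)
```

The baseline is what you check before the first restart. Each block is then enabled in turn, with a restart and a network test in between.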
One thing to watch for.
When you enable “Error Reporting Service” you start getting … error reports! Wow. So if it gets enabled with a bunch of other items, you might think you’ve found a problem. Wrong. It’s just that now you’re getting the error reports.
IPSEC.
So now I have to figure out what the #$!#% happened. I don’t think we’ve done any software installs on that box or tweaked any services. Did some antiviral update trigger a problem?
Update: This experts exchange article may be related, but the responses are not accessible. A clue:
Description: The IPSec driver has entered Block mode. IPSec will discard all inbound and outbound TCP/IP network traffic that is not permitted by boot-time IPSec Policy exemptions. User Action: To restore full unsecured TCP/IP connectivity, disable the IPSec services, and then restart the computer. For detailed troubleshooting information, review the events in the Security event log
Update: This article connects group policy file corruption to IPSEC problems and loss of network access, and points out there are definite bugs with group policy editing. I didn't touch local or group policy on our server, but perhaps another admin might have. I now see there have been nasty unfixed bugs.
Update: I'll take a look at these when I get back to work on Monday, then update this post. I think we're narrowing things down to a corruption or misconfiguration of a group policy file that activated IPSEC and disabled, without any meaningful entry in the event monitor, all network TCP/IP traffic.
- http://support.microsoft.com/kb/870910: looks like a pretty pertinent kb article
- http://support.microsoft.com/kb/914962: IPSEC bugs fixed in SP2. So did some later upgrade break them again? Clearly I need to check windows update for the server.
- http://support.microsoft.com/kb/898060: After SP1 a security update broke IPSEC. Should be ok in SP2, but did it get broken again?
- http://marc.info/?l=patchmanagement&m=121632162501913&w=2: A fairly recent DNS spoof prevention security update from Microsoft has broken IPSEC on some machines.
- http://support.microsoft.com/default.aspx?scid=kb;en-us;816579: In place upgrades when WS 2003 is truly hosed. I don't think this applies, but nice to know.
So Monday I'll look at windows update and try opening, reviewing and saving the IPSEC and Group Policy files. If they're corrupted they may cause other problems.
- http://support.microsoft.com/kb/956188 (details the problem)
- http://support.microsoft.com/kb/956189 (the fix)
- http://support.microsoft.com/kb/812873 (More on these transient ports)
The latter references the problem I had:
Event Type: Error
Event Source: IPSec
Event Category: None
Event ID: 4292
Date: Date
Time: Time
User: N/A
Computer: Server_name
Description: The IPSec driver has entered Block mode. IPSec will discard all incoming and outgoing TCP/IP network traffic that is not permitted by boot-time IPSec Policy exemptions.
User Action: To restore full unsecured TCP/IP connectivity, disable the IPSec services, and then restart the computer. For detailed troubleshooting information, review the events in the Security event log.
Update 12/31/08: Nope, it didn't work.
I finally got around to applying Microsoft's fix and it didn't work!
So even after I reserved these ports:
3343-3343
1645-1646
1812-1813
2883-2883
4500-4500
I still got the service failure notice on restart and lost my network connections. Guess I'll have to wait for a service pack. I removed the registry changes I'd made (why ask for trouble?) and again disabled IPSEC services.
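For anyone retrying the port reservation, here is a small Python sketch that builds and checks the range strings before they are typed into the registry. It assumes the location described in KB 812873 as I understand it (a multi-string `ReservedPorts` value under the Tcpip `Parameters` key, one `low-high` entry per line), so verify that against the article. The helper names are made up, and nothing here writes to the registry.

```python
from typing import Iterable, List, Tuple

# Assumed location, per KB 812873; the value type is REG_MULTI_SZ.
RESERVED_PORTS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
RESERVED_PORTS_VALUE = "ReservedPorts"


def format_reserved_ports(ranges: Iterable[Tuple[int, int]]) -> List[str]:
    """Turn (low, high) pairs into 'low-high' strings, one per entry."""
    entries = []
    for low, high in ranges:
        if not (1 <= low <= high <= 65535):
            raise ValueError(f"invalid port range: {low}-{high}")
        entries.append(f"{low}-{high}")
    return entries


def is_reserved(port: int, entries: Iterable[str]) -> bool:
    """Check whether a port falls inside any 'low-high' entry."""
    for entry in entries:
        low, high = (int(part) for part in entry.split("-"))
        if low <= port <= high:
            return True
    return False


# The ranges listed in this post, in the same order.
ranges = [(3343, 3343), (1645, 1646), (1812, 1813), (2883, 2883), (4500, 4500)]
entries = format_reserved_ports(ranges)

print(RESERVED_PORTS_KEY, RESERVED_PORTS_VALUE)
for entry in entries:
    print(entry)
```

A single port is written as a range with the same start and end, which is why `3343-3343` and `4500-4500` look doubled.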
I’ve been struggling with IPSEC problems for a while. I’ve read many articles but nothing that covers the problems I’ve experienced. I manage over 50 servers using terminal services and find at times some of them will not be accessible after a remote reboot. After having someone onsite reboot the server it always comes up fine. After the remote reboots the event log shows the IPSec driver has entered Block mode, which of course prevents access. So my problem is strictly with a reboot done with terminal services. Since IPSec works fine otherwise, the policy must be fine as well.
Just a follow-up to my post above. I believe I found the cause and answer to this problem and why it is intermittent. The problem and resolutions can be found here: http://support.microsoft.com/kb/956188. This article also discusses this issue: http://support.microsoft.com/kb/956189
Thanks very much for following-up with the solution. I'd just disabled IPSec and put it on the back burner. Now I'll fix it.
I added your references as an update to the original post.
I had the same problem with a windows 2003 32 bit server. The system didn't come up after WSUS patching. I rolled the system back to a pre-patching snapshot but still had networking only in safe mode. Disabling IPSec service did the trick.
Did you ever find a root cause for the IPSec issue?
Thanks, you just saved me hours of diagnostics!!!!
Yep. We had this happen to us. Windows Update, reboot, system was offline. After many dead ends, found this article, disabled the IPSEC service (previously set to automatic) and restarted the server. Obviously, something happened here after having no issues with this heretofore, and still a bit of a mystery as to why this had to happen.
I have been fighting this issue for a couple days! I just found and read through this and tried turning off IPSEC with no results. Does anyone have any further updates on this issue?