I rebooted our corporate Windows Server 2003 today. I was moving it to a UPS. No problem – except when I restarted I had no network connectivity.
First I saw a “service didn’t start, check the event viewer” message. The event viewer just told me I couldn’t register with the domain. I couldn’t do that because I didn’t have network access. I got the usual “may have limited connection” error.
I did all the usual things (ipconfig, repair connection, swap cables, switch accounts, login as local user, test everything, etc etc) but they all passed. The big breakthrough was when I investigated the advanced boot options on restart. Windows 2003 includes a “safe start with network” option. When I did that I had a network connection.
There was a lot more work to do before I found that disabling IPSEC service, then rebooting after disabling it, fixed everything.
I easily blew 6-8 hours of work today.
Lesson 1: Run Safe Boot/Safe Start with networking first.
Then you work your way through this Microsoft kb article. I’ll excerpt some key points, then pass on a trick, then I’ve got to go home and finish up the work I couldn’t do today …
How to Start the Computer in Safe Mode
When you start the computer in Safe mode, Windows loads only the drivers and computer services that you need. You can use Safe mode when you have to identify and resolve problems that are caused by faulty drivers, programs, or services that start automatically.
If the computer starts successfully in Safe mode but it does not start in normal mode, the computer may have a conflict with the hardware settings or the resources. There may be incompatibilities with programs, services, or drivers, or there may be registry damage. In Safe mode, you can disable or remove a program, service, or device driver that may prevent the computer from starting….
System Configuration Utility (Msconfig.exe) automates the routine troubleshooting steps that Microsoft Product Support Services technicians use when they diagnose Windows configuration issues…
… Click the General tab, and then click Selective Startup.
…Note You might be able to determine more quickly which service is causing the problem by testing the services in groups. Divide the services into two groups--select the check boxes of the first group, and clear the check boxes of the second group. Restart your computer, and then test for the problem. If the problem occurs, the faulty service is in the group with the selected check boxes. If the problem does not occur, the faulty service is in the group with the cleared check boxes. Repeat this process on the faulty group until you have isolated the faulty service.
It took hours.
Here’s the trick. Boot in Safe Mode first. Then run msconfig.exe and look at the services. Assuming things work in safe mode, the ones that are running (sort by that column) are good. Now uncheck all services, check the ones that are currently running, apply, restart.
When you restart you’re in the equivalent of Safe Mode, but you can use msconfig.exe to add services in blocks.
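The bookkeeping for this trick can be sketched in Python. This is only a sketch under assumptions: it assumes you saved the text output of `sc query state= all` while in Safe Mode with networking, and the service names in the sample are illustrative, not a capture from my server. `parse_sc_query` and `plan_blocks` are hypothetical helper names.

```python
import re

def parse_sc_query(text):
    """Return {service_name: state} from `sc query state= all` style text."""
    states = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("SERVICE_NAME:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("STATE") and current is not None:
            # A state line looks like "STATE   : 4  RUNNING"
            match = re.search(r":\s*\d+\s+(\w+)", line)
            if match:
                states[current] = match.group(1)
    return states

def plan_blocks(states, block_size):
    """Split services into the known-good baseline and blocks to re-enable.

    Services running in Safe Mode are treated as the baseline; everything
    else is sorted and chunked into blocks of `block_size`.
    """
    running = sorted((n for n, s in states.items() if s == "RUNNING"), key=str.lower)
    others = sorted((n for n, s in states.items() if s != "RUNNING"), key=str.lower)
    blocks = [others[i:i + block_size] for i in range(0, len(others), block_size)]
    return running, blocks

# Illustrative sample only (made-up states).
SAMPLE = """
SERVICE_NAME: Dhcp
        STATE              : 4  RUNNING
SERVICE_NAME: ERSvc
        STATE              : 1  STOPPED
SERVICE_NAME: Eventlog
        STATE              : 4  RUNNING
SERVICE_NAME: PolicyAgent
        STATE              : 1  STOPPED
SERVICE_NAME: Spooler
        STATE              : 1  STOPPED
"""

baseline, blocks = plan_blocks(parse_sc_query(SAMPLE), block_size=2)
print("Keep checked:", baseline)
for number, block in enumerate(blocks, start=1):
    print("Block", number, ":", block)
```

The baseline list is what stays checked in msconfig.exe; each block is one round of "check these, apply, restart, test the network".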
The UI of this app is dismal. I sorted alphabetically, then did screen captures to a Word document to get a complete alpha sorted list. I printed that to guide my tedious enabling of sets. (In theory you can do the binary search approach faster. Long story, can’t explain.)
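For what it's worth, the halving approach from the kb note looks roughly like this as code. It is a sketch, not a tool: it assumes exactly one faulty service, and `problem_occurs` is a hypothetical stand-in for the manual step of enabling a group (on top of the known-good baseline), restarting, and checking whether the network is dead.

```python
def find_faulty_service(candidates, problem_occurs):
    """Bisect a list of suspect services down to the single faulty one.

    `problem_occurs(group)` must return True when enabling `group`
    reproduces the real symptom (here: no network after restart).
    Returns (faulty_service, number_of_restarts).
    """
    suspects = list(candidates)
    if not suspects:
        raise ValueError("no candidate services to test")
    restarts = 0
    while len(suspects) > 1:
        half = suspects[:len(suspects) // 2]
        restarts += 1
        if problem_occurs(half):
            suspects = half                  # fault is in the enabled half
        else:
            suspects = suspects[len(half):]  # fault is in the disabled half
    return suspects[0], restarts

# Illustrative run with a made-up list; the lambda fakes the restart-and-test step.
services = ["Alerter", "Browser", "Dhcp", "Dnscache",
            "ERSvc", "PolicyAgent", "Spooler", "W32Time"]
culprit, restarts = find_faulty_service(services, lambda group: "PolicyAgent" in group)
print(culprit, "found after", restarts, "restarts")
```

With around a hundred services that is about seven restarts, provided the test each round is the actual symptom and not just "something complained".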
One thing to watch for.
When you enable “Error Reporting Service” you start getting … error reports! Wow. So if it gets enabled with a bunch of other items, you might think you’ve found a problem. Wrong. It’s just that now you’re getting the error reports.
So now I have to figure out what the #$!#% happened. I don’t think we’ve done any software installs on that box or tweaked any services. Did some antiviral update trigger a problem?
Update: This experts exchange article may be related, but the responses are not accessible. A clue:
Description: The IPSec driver has entered Block mode. IPSec will discard all inbound and outbound TCP/
Update: This article connects group policy file corruption to IPSEC problems and loss of network access, and points out there are definite bugs with group policy editing. I didn't touch local or group policy on our server, but perhaps another admin might have. I now see there have been nasty unfixed bugs.
Update: I'll take a look at these when I get back to work on Monday, then update this post. I think we're narrowing things down to a corruption or misconfiguration of a group policy file that activated IPSEC and disabled, without any meaningful entry in the event monitor, all network TCP/IP traffic.
- http://support.microsoft.com/kb/870910: looks like a pretty pertinent kb article
- http://support.microsoft.com/kb/914962: IPSEC bugs fixed in SP2. So did some later upgrade break them again? Clearly I need to check windows update for the server.
- http://support.microsoft.com/kb/898060: After SP1 a security update broke IPSEC. Should be ok in SP2, but did it get broken again?
- http://marc.info/?l=patchmanagement&m=121632162501913&w=2: A fairly recent DNS spoof prevention security update from Microsoft has broken IPSEC on some machines.
- http://support.microsoft.com/default.aspx?scid=kb;en-us;816579: In place upgrades when WS 2003 is truly hosed. I don't think this applies, but nice to know.
So Monday I'll look at windows update and try opening, reviewing and saving the IPSEC and Group Policy files. If they're corrupted they may cause other problems.
- http://support.microsoft.com/kb/956188 (details the problem)
- http://support.microsoft.com/kb/956189 (the fix)
- http://support.microsoft.com/kb/812873 (More on these transient ports)
The latter references the problem I had:
Event Type: Error
Event Source: IPSec
Event Category: None
Event ID: 4292
Description: The IPSec driver has entered Block mode. IPSec will discard all incoming and outgoing TCP/IP network traffic that is not permitted by boot-time IPSec Policy exemptions.
User Action: To restore full unsecured TCP/IP connectivity, disable the IPSec services, and then restart the computer. For detailed troubleshooting information, review the events in the Security event log.
Update 12/31/08: Nope, it didn't work.
I finally got around to applying Microsoft's fix and it didn't work!
So even after I reserved these ports:
3343-3343
I still got the service failure notice on restart and lost my network connections. Guess I'll have to wait for a service pack. I removed the registry changes I'd made (why ask for trouble?) and again disabled IPSEC services.
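The workaround itself (disable the IPSEC service, then restart) can be scripted instead of clicked through. A minimal sketch, with assumptions flagged: I'm assuming the service's short name is `PolicyAgent` (check the service's properties on your box), and `start_type_command` and `apply` are hypothetical helper names. It only builds an `sc config` command line; nothing runs unless you call `apply` yourself from an administrator prompt.

```python
import subprocess

# Assumption: short name of the "IPSEC Services" service; verify on your server.
IPSEC_SERVICE = "PolicyAgent"

VALID_START_TYPES = ("auto", "demand", "disabled")

def start_type_command(start_type, service=IPSEC_SERVICE):
    """Build the `sc config <service> start= <type>` command as an argv list."""
    if start_type not in VALID_START_TYPES:
        raise ValueError("unsupported start type: %r" % (start_type,))
    # sc expects the value as a separate token after "start=".
    return ["sc", "config", service, "start=", start_type]

def apply(command):
    """Run the command; raises CalledProcessError if sc reports failure."""
    subprocess.check_call(command)

# Print what would be run, rather than running it.
print(" ".join(start_type_command("disabled")))  # the workaround
print(" ".join(start_type_command("auto")))      # undo, once a real fix exists
```

The change only matters after the restart, same as in the event log's User Action text.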