In this case there were three or four different bugs and failures that affected our home network, and especially our backups. The lessons learned were not in my AI responses, so they seem worth sharing. The interacting bugs included:
- Apple hardware bugs in the M1 Air that can make the Air WiFi unreliable. (Could be drivers/software but persistence suggests either a very hard problem or hardware with possible software mitigation.)
- The odd behavior of Eero WiFi.
- Bugs and limitations in macOS SMB networking and in HFS+ that mean network share filesystems can be corrupted beyond repair.
- A completely unrelated red herring that turned out to be due to CenturyLink's parent org messing up their DNS configuration.
Such is the nature of our times, where complexity and unsustainable share prices combine to decrease reliability of core systems.
At the core was WiFi instability. Our M1 Air's WiFi signal fluctuated constantly, leading to frequent disconnects. I knew something was wrong because Carbon Copy Cloner would quite often warn that a backup was being transiently disrupted by the network share disappearing. The very frequent write/read failures presumably led to the corruption of the hard drive's HFS+ file system. I had to diagnose and fix the drive corruption before focusing on the underlying WiFi issue.
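One way to put a number on "the share keeps disappearing" is to poll the mount point and count how often it goes from mounted to unmounted. The following is a minimal sketch, not something I ran during this episode. The mount path `/Volumes/Backup`, the function names, and the 30 second interval are all assumptions for illustration; only `os.path.ismount` and `time.sleep` are real standard library calls.

```python
import os
import time
from typing import Callable, List


def count_dropouts(samples: List[bool]) -> int:
    """Count transitions from mounted (True) to unmounted (False)."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev and not cur)


def poll_mount(
    path: str,
    checks: int,
    interval_s: float = 30.0,
    is_mounted: Callable[[str], bool] = os.path.ismount,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Sample whether `path` is a mount point `checks` times, pausing between samples."""
    samples: List[bool] = []
    for i in range(checks):
        samples.append(bool(is_mounted(path)))
        if i < checks - 1:  # no need to wait after the final sample
            sleep(interval_s)
    return samples


# Hypothetical usage (path is an assumption; adjust to your own share):
#   observed = poll_mount("/Volumes/Backup", checks=120, interval_s=30.0)
#   print(f"{count_dropouts(observed)} dropouts in {len(observed)} checks")
```

Polling like this can miss a disconnect that starts and ends between two samples, so the count is a lower bound. It is still enough to compare "before" and "after" when moving a base station.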
Things I am doing differently now:
- I reconfigured the physical layout of our Eero base stations so there is a short, direct "line of sight" path between the M1 Air dock location and the nearest Eero base station. The M1 Air needs a much stronger than usual WiFi connection to be stable. The new configuration also offloads some traffic from an overloaded Eero device.
- We use a Synology Time Machine server as a secondary (not robust) backup. That backup was also corrupted (happens normally anyway -- because bugs, but WiFi issues sped it up). At least in Sequoia if you remove a Synology TM backup destination and then add it back there's an option to replace the original. This is faster than wiping it from the Synology side.
- I used advanced preferences so Carbon Copy Cloner will dismount the network share after a clone/backup is complete. The less that share is open the better, because it's hosted from a MacBook Pro that can be disconnected from the network, and macOS/SMB does not handle that disconnect gracefully.
- I had configured a user quota for one of our machines, and that quota had become too small. Time Machine should have provided guidance about capacity but did not do so in a useful way.