top | item 39925186

A disk so full, it couldn't be restored

382 points | goranmoomin | 1 year ago | sixcolors.com | reply

234 comments

[+] miles|1 year ago|reply
The author might have had better luck by using an external storage device to boot the Mac and delete unneeded files on the internal disk from there:

Use an external storage device as a Mac startup disk https://support.apple.com/en-us/111336

Was surprised to learn that with Apple silicon-based Macs, not all ports are equal when it comes to external booting:

If you're using a Mac computer with Apple silicon, your Mac has one or more USB or Thunderbolt ports that have a type USB-C connector. While you're installing macOS on your storage device, it matters which of these ports you use. After installation is complete, you can connect your storage device to any of them.

* Mac laptop computer: Use any USB-C port except the leftmost USB-C port when facing the ports on the left side of the Mac.

* iMac: Use any USB-C port except the rightmost USB-C port when facing the back of the Mac.

* Mac mini: Use any USB-C port except the leftmost USB-C port when facing the back of the Mac.

* Mac Studio: Use any USB-C port except the rightmost USB-C port when facing the back of the Mac.

* Mac Pro with desktop enclosure: Use any USB-C port except the one on the top of the Mac that is farthest from the power button.

* Mac Pro with rack enclosure: Use any USB-C port except the one on the front of the Mac that's closest to the power button.

[+] mrb|1 year ago|reply
The author tried essentially the same thing you suggest. He booted into recoveryOS (a separate partition), then from there tried to delete files from the main system partition. But rm failed with the same error, "No space left on device". So, as others have suggested, truncating a file might have worked: "echo -n > file"
[+] userbinator|1 year ago|reply
If the filesystem itself got into a deadlocked state, booting from anything and going through the FS driver to delete files from it won't work.
[+] klausa|1 year ago|reply
Why do you think that would work, if using recoveryOS or starting the Mac in Share Disk/Target Disk mode didn't?
[+] appplication|1 year ago|reply
This is the kind of comment someone is going to be very happy to read in 8 years when they’re looking for answers for their (then) ancient Mac.
[+] deeth_starr_v|1 year ago|reply
Anyone know why you can’t use the first USB-C port on a Mac laptop to make the bootable OS?
[+] Macha|1 year ago|reply
> Was surprised to learn that with Apple silicon-based Macs, not all ports are equal when it comes to external booting

iirc, not all ports were equal when it came to charging on the M1 Macs, so this is actually not so surprising.

[+] timcederman|1 year ago|reply
Or boot it into Target Disk Mode using another machine.
[+] userbinator|1 year ago|reply
My best guess at what happened (based on a little knowledge of HFS+ disk structures, but not APFS) is that the journal file also filled up, and since deletion requires writing to it and possibly expanding it, you get into the unusual situation where deletion requires, at least temporarily, more space.

> macOS continued to write files until there was just 41K free on the drive.

I've (accidentally) run both NTFS and FAT32 down to 0 bytes free, and it was always possible to delete something even in that situation.

> Digging around in forums, I found that Sonoma has broken the SMB/Samba-based networking mount procedure for Time Machine restores, and no one had found a solution. This appears to still be the case in 14.4.

In my experience SMB became unreliable and just unacceptably buggy many years ago, starting around the 10.12-10.13 timeframe; and now it looks like Apple doesn't care about whether it works at all anymore.

> I hate to think what people without decades of Mac experience do when confronted with systemic, cascading failures like this when I felt helpless despite what I thought I knew and all the answers I searched for and found on forums.

I don't have "decades of Mac experience", but the first thing I'd try is a fsck --- odd not to see that mentioned here.

If I were asked to recover from this situation, and couldn't just copy the necessary contents of the disk to another one before formatting it and then copying back, I'd get the APFS documentation (https://developer.apple.com/support/downloads/Apple-File-Sys...) and figure out what to edit (with dd and a hex editor) to get some free space.

[+] arghwhat|1 year ago|reply
That's for a journalling filesystem. For CoW filesystems, the issue is that any change to the filesystem is made by writing a new file tree containing your change, then updating the root to point to the new tree. Later, garbage collection finds files that are no longer part of an active tree and returns their storage to the pool.

Changes are usually batched to reduce the amount of tree changes to a manageable amount. A bonus of this design is that a filesystem snapshot is just another reference to a particular tree.

This requires space, but CoW filesystems also usually reserve an amount of emergency storage for this reason.
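
A rough shell analogy (file names are hypothetical, and this is only a sketch of the idea, not how a CoW filesystem is implemented): the update writes the new version next to the old one and then swaps it in, so both versions briefly coexist on disk, which is why even a "change" transiently needs free space.

```shell
# CoW-style update sketch: the new copy exists alongside the old
# until the swap, so the update temporarily needs extra space.
printf 'old contents' > data
printf 'new contents' > data.tmp   # write the new version beside the old
mv data.tmp data                   # atomically repoint "data" at the new copy
cat data                           # prints "new contents"
rm data                            # clean up
```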

[+] wazoox|1 year ago|reply
Ah, Apple. SMB performance has ranged from horribly slow a few years back to barely decent recently, but it is still way slower than NFS or (oh the irony) Appleshare on the exact same hardware.

A few years ago I tested throughput to a big NAS connected over 10GigE from a Hackintosh, using BlackMagic Disk Speed Test:

* running Windows, SMB achieves 900MB/s

* running MacOS, SMB achieves 200MB/s

* running MacOS, NFS and AFP both achieve 1000MB/s

Anything related to professional work is a sad joke in MacOS, alas.

(People keep repeating that AFP is dead; however, it still works fine as a client on my Mac Pro -- and performs so much better than SMB that it's almost comical.)

[+] itsTyrion|1 year ago|reply
Fun (until you run into it) fact: the same thing is possible with BTRFS and ZFS. If you manage to fill it to the brim, you might have a problem. BTRFS tries to go read-only while there is still room for metadata, so you can remount it in a safe mode and delete something, but no safeguard is perfect.

> ran both NTFS and FAT32 to 0b and was able to delete something.

AFAIK those aren’t journaled, no?

[+] begueradj|1 year ago|reply
So far, you're the only one who provided a technical explanation for this.
[+] greenicon|1 year ago|reply
For a networked Time Machine restore you can reinstall MacOS without restoring first and then use the migration utility to restore from a remote Time Machine. That seems to use a different smb binary which works. Still, I find it infuriating that restoring, one of the most important things you do on a machine, is broken and was not caught by QA.
[+] chrisjj|1 year ago|reply
> get into the unusual situation where deletion requires, at least temporarily, more space.

s/unusual/usual/ surely.

[+] staticfloat|1 year ago|reply
I ran into an issue like this in my first ever job! I accidentally filled up a cluster with junk files and the sysadmin started sending me emails saying I needed to fix it ASAP but rm wouldn’t work. He taught me that file truncation usually works when deletion doesn’t, so you can usually do “cat /dev/null > foo” when “rm foo” doesn’t work.
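
The trick can be sketched as follows (a minimal illustration with a small stand-in file; on a genuinely full disk, the truncation is the step that tends to succeed where rm fails):

```shell
# Create a ~1 MB stand-in for a huge file that rm refuses to remove.
head -c 1048576 /dev/zero > huge.log

# Truncation releases the file's data blocks in place, which needs
# less metadata work than a full deletion.
cat /dev/null > huge.log    # equivalently: : > huge.log
wc -c < huge.log            # now reports 0

# With space freed, the deletion itself usually goes through.
rm huge.log
```
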
[+] mjevans|1 year ago|reply
In shell :>filepath often works...

However sometimes filesystems can't do that. For those cases, hopefully the filesystem supports: resize-grow, resize-shrink, and either additional temporary storage or is on top of an underlying system which can add/remove backing storage. You may also need to use custom commands to restore the filesystem's structure to one intended for a single block device (btrfs comes to mind here).

[+] JohnMakin|1 year ago|reply
I was once in a situation years ago where a critical piece of infrastructure could brick itself irreparably with a deadlock unless it was always able to write to the file system, so I had a backup process just periodically send garbage directly to /dev/null, and as far as I know that dirty hack is still running years later.

/dev/null is magical and worth reading into

[+] pram|1 year ago|reply
You can actually just do >file
[+] JdeBP|1 year ago|reply
Although note that several comments here report situations where truncation doesn't work either. 21st century filesystem formats are a lot more complex than UFS, and with things like snapshotting and journalling there are new ways for a filesystem to deadlock itself.
[+] deltarholamda|1 year ago|reply
I accidentally filled a ZFS root SSD with a massive samba log file (samba log level set way high to debug a problem, and then forgot to reset it), and had to use truncate to get it back.

I knew that ZFS was better about this, but even so I still got that "oh... hell" sinking feeling when you really bork something.

[+] dredmorbius|1 year ago|reply
Having recently experienced an over-capacity MacOS disk, "emptying" files in this manner simply did not work.
[+] pdimitar|1 year ago|reply
To me what works is `cat /dev/null >! filename`.
[+] voidwtf|1 year ago|reply
It seems like Time Machine has been steadily declining. I'm not sure why there is no impetus to make it reliable and functioning well. Between sparse bundles becoming corrupt (forcing me to start a new backup) and broken functionality, I haven't felt like Time Machine is worth setting up anymore. This is in stark contrast to iOS/iPadOS backups, which have worked every time.
[+] thenickdude|1 year ago|reply
By contrast, ZFS has "slop space" to avoid this very problem (wedging the filesystem by running out of space during a large operation). By default it reserves 3.2% of your volume's space for this, up to 128GB.

So by adjusting the Linux kernel tunable "spa_slop_shift" to shrink the slop space, you can regain up to 128GB of bonus space to successfully complete your file deletion operations:

https://openzfs.github.io/openzfs-docs/Performance%20and%20T...
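
The clamping can be sketched in shell arithmetic (my reading of the OpenZFS defaults; treat the exact constants as assumptions): the pool size is shifted right by spa_slop_shift and clamped between a 128 MiB floor and a 128 GiB ceiling, so raising the shift shrinks the reservation.

```shell
# Sketch of the ZFS slop-space calculation for a 32 TiB pool.
pool_bytes=$((32 * 1024 * 1024 * 1024 * 1024))
spa_slop_shift=5                                # default: reserve 1/32, about 3.1%

min_slop=$((128 * 1024 * 1024))                 # 128 MiB floor
max_slop=$((128 * 1024 * 1024 * 1024))          # 128 GiB ceiling

slop=$((pool_bytes >> spa_slop_shift))          # 1 TiB before clamping
if [ "$slop" -lt "$min_slop" ]; then slop=$min_slop; fi
if [ "$slop" -gt "$max_slop" ]; then slop=$max_slop; fi

echo "$slop"                                    # 137438953472 (128 GiB)
```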

[+] JdeBP|1 year ago|reply
People find it a confusing idea to grasp that deleting things actually requires more space, either temporarily or permanently. Other comments here have gone into the details of why some modern filesystems with snapshotting and journalling and so forth actually end up needing to allocate from free space in order to delete stuff.

In a different field: In Wikipedia's early years it often had to be explained to people that (at least from roughly 2004 onwards) deleting pages with the intention of saving space on the Wikipedia servers actually did the opposite, since deletion added records to the underlying database.

Related situations:

* In Rahul Dhesi's ZOO archive file format, deleting an archive entry just sets a flag on the entry's header record. ZOO also did VMS-like file versioning, where adding a new version of a file to an archive did not overwrite the old one.

* Back in the days of MS/DR/PC-DOS and FAT, with (sometimes) add-on undeletion utilities installed, deleting a file would need more space to store a new entry into the database that held the restore information for the undeletion utility.

* Back in the days of MS/DR/PC-DOS and FAT, some of the old disc compression utilities compressed metadata as well, leading to (rare but possible) situations where metadata changes could affect compressibility and actually increase the (from the outside point of view) volume size.

"I delete XYZ in order to free space." is a pervasive concept, but it isn't strictly a correct one.

[+] jen729w|1 year ago|reply
I had this issue in October 2018 as documented in this Stack Overflow question, whose text I’ll paste below.

I was lucky: I had an additional APFS partition that I could remove, thus freeing up disk space. Took me a while to figure out, during which time I was in a proper panic.

---

https://apple.stackexchange.com/questions/338721/disk-full-t...

I’m in a pickle here. macOS Mojave, just updated the other day. I managed to fill my disk up while creating a .dmg, and the system froze. I rebooted. Kernel panic.

Boot to Recovery mode. Mount the disk. Open Terminal.

-bash-3.2# rm /path/to/large/file

rm: /path/to/large/file: No space left on device

Essentially the same issue as this Unix thread from ‘08! https://www.unix.com/linux/69889-unable-remove-file-using-rm...

I’ve tried echo x > /path/to/large/file, no good.

It’s borked. Does anyone have any suggestions that aren’t “wipe the drive and restore from your backup”?

[+] desro|1 year ago|reply
Impressive. I've never dealt with a situation where even `rm` failed, but I have had the displeasure of using and managing modern Macs with 256 GB (or less) of internal storage. I like to keep a "spaceholder" file of around 16GB, so when things inevitably fill up and prevent an update or something else, I can nuke the placeholder without having to surgically prune things with `ncdu`.
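
Creating such a placeholder is a one-liner; a sketch (shown at 16 MB so it's cheap to try, bump count to 16384 for the real ~16 GB version; macOS also ships mkfile(8) for the same purpose):

```shell
# Reserve real blocks now so they can be reclaimed in an emergency later.
# bs/count are spelled out numerically to work with both GNU and BSD dd.
dd if=/dev/zero of=placeholder.img bs=1048576 count=16 2>/dev/null

wc -c < placeholder.img    # 16777216 bytes reserved

# When the disk fills up and nothing else will delete:
#   rm placeholder.img
```
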
[+] LeifCarrotson|1 year ago|reply
I find that one of the main benefits of a space holding file is that when it's needed, freeing up that space provides a window of time where you can implement a long-term solution (like buying a new drive with quadruple the storage space of the original for the cost of an hour of that employee/machine's time).
[+] TazeTSchnitzel|1 year ago|reply
I've had a similar experience on my iPhone. The disk became so full that deleting things was seemingly no longer actually doing anything. Rebooting, the phone couldn't be logged into. Rebooting again, it boot-looped. Rebooting once more, it booted into an inconsistent state where app icons still existed on the home screen, but the actual app was missing, so the icon was blank and the app could not be launched. I became concerned about data integrity and ultimately restored from a backup.

I am certain this was a result of APFS being copy-on-write and supporting snapshotting. If no change is immediately permanent, but instead old versions of files stay around in a snapshot, then if you don't have enough space for more snapshot metadata you're in trouble. Maybe they skip the snapshot in low disk space situations, but they still have the copy-on-write metadata problem.

[+] entropicgravity|1 year ago|reply
I ran into a similar situation not long ago on the system partition of a linux installation. The partition was too small to begin with and as new updates piled up there was almost no space left to start deleting stuff. It took me about half an hour to find a subdirectory with a tiny bit of stuff that could be deleted. It was like being in a room so plugged up with junk that you couldn't open the (inward swinging) door to let yourself out.

From the tiny beginning I started being able to delete bigger and bigger spaces until finally it was clear and then of course I resized the partition so that wouldn't happen again. The End.

[+] albertzeyer|1 year ago|reply
When `rm file` gives you "No space left on device", a trick you can do:

    echo > file  # delete content of file first
    rm file  # now it should work
[+] magicalhippo|1 year ago|reply
Reminds me of when a customer's database kept crashing with an error code indicating the disk was full. Except Windows Explorer showed the disk having hundreds of gigs free...

Took us a little while to figure out that the problem was the database file was so fragmented NTFS couldn't store more fragments for the file[1].

What had happened was they had been running the database in a VM with very low disk space for a long time, several times actually running out of space, before increasing the virtual disk and resizing the partition to match. Hence all the now-available disk space.

Just copying the main database file and deleting the old solved it.

[1]: https://superuser.com/questions/1315108/ntfs-limitations-max...

[+] farkanoid|1 year ago|reply
My wife's iPhone 12 Pro Max had the same problem.

She somehow managed to fill up the entire 512GB. Updates were unsuccessful, she couldn't make calls and wasn't able to delete anything to make room.

She couldn't even back up her phone through iTunes; the only option was to purchase an iCloud subscription and back up to the cloud in order to access her photos.

[+] _wire_|1 year ago|reply
Ran into precisely this problem with a friend's Ventura Mini last year.

The solution was to boot into recovery and mount the Data partition using Disk Utility.

I don't recall where the Data partition gets mounted but I think it is:

"/System/Volumes/Macintosh HD - Data"

Or just Data, since Sonoma. It will be clear from Disk Utility.

Then close Disk Utility, go into Terminal, and run rm on a big unneeded file.

You can find one using:

find <data-mnt> -size +100m

Using rm will fail.

Unmount the Data partition and run fsck on it.

This completes the deletion.

From there enough more space can be freed in recovery to have a healthy buffer, then reboot normally and finish cleaning.

It seems that when the volume gets so full that rm doesn't work anymore the filesystem also gets corrupted.

HTH and that I didn't forget anything.

[+] wyldfire|1 year ago|reply
If you can truncate() an existing file (via 'echo > big_file.img' or similar), I would hope the filesystem could deallocate the relevant extents without requiring more space. Seems a bit like a filesystem defect not to reserve enough space to recover from this condition with unlink().
[+] anticensor|1 year ago|reply
you need delete_and_dealloc(), not unlink()
[+] whartung|1 year ago|reply
I had this happen to me, though I can’t recall how I fixed it.

In general I’ve had good success with Time Machine. I, too, have lost TM volumes. I just erased them and started again. Annoying to be sure, but 99.99% of the time I don’t need a year's worth of backups.

The author mentioned copying the Time Machine drive. I have never been able to successfully do that. Last time I tried I quit after 3 days. As I understand it, only Finder can copy a Time Machine drive. Terrible experience.

That said, I’d rather cope with TM. It’s saved me more than it’s hurt me, and even an idiot like me can get it to work.

I did have my machine just complain about one of my partitions being irreparable, but it mounted read only so I was able to copy it, and am currently copying it back.

I don’t know if this is random bit rot, or if something is going wrong with the drive. That would be Bad, it’s a 3TB spinning drive. Backed up with BackBlaze (knock on wood), but I’d rather not have to go through the recovery process if I could avoid it.

Problem is I don’t know how to prevent it. It’s been suggested that SSDs are potentially less susceptible to bit rot, so maybe switching to one of those is a wise plan. But I don’t know.

[+] skhr0680|1 year ago|reply
> The author mentioned copying the Time Machine drive. I have never been able to successfully do that. Last time I tried I quit after 3 days. As I understand it, only Finder can copy a Time Machine drive. Terrible experience.

rsync -av $SOURCE $DEST has never let me down. Copy or delete on Time Machine files using Finder never worked for me.

> Problem is I don’t know how to prevent it. It’s been suggested that SSDs are potentially less susceptible to bit rot, so maybe switching to one of those is a wise plan. But I don’t know.

OpenZFS with two drives should protect you from bit rot. ZFS almost became the Mac file system in Snow Leopard.

[+] pronoiac|1 year ago|reply
I have notes somewhere on roundtripping Time Machine backups between USB drives and network shares. (It's non-trivial, and it's not supported, but it worked.) It was with HFS+ backups, and there were various bits that were "Here Be Dragons", so I never posted them.
[+] armchairhacker|1 year ago|reply
btrfs used to have this issue, the problem being that the filesystem has to append metadata to do any operation including (ironically) deletion: https://serverfault.com/a/478742.

AFAIK it's fixed now, because btrfs reserves some space and reports "disk full" before it's reached. macOS probably does the same (I'd hope), but it seems in this case the boundary wasn't enforced properly and the background snapshot caused writes beyond it.

[+] willyt|1 year ago|reply
They mentioned that the problem occurred while Steam was downloading. I wonder if, because Steam is ultra cross-platform with a bare minimum of OS-specific UI, it uses something quite low level to write data to disk? Maybe NSFile does some checks that POSIX calls can't do while remaining compliant with the spec, or something weird like that. That would explain why people using various low-level 'pro level' cross-platform tools like databases would have issues, while the typical GarageBand user is usually OK. If you're doing database writes you probably don't want the overhead of these checks making your filesystem performance look bad, so it's left to the software to check that it isn't going to fill up the filesystem. Stab-in-the-dark hypothesis. I would hope that however we write data to the filesystem, it shouldn't be possible to lock it up like this. I'd be curious for someone with technical knowledge of this to chime in.
[+] themoonisachees|1 year ago|reply
Steam simply stores games in its install folder, and while downloading the (compressed) game files it keeps them fragmented in a separate directory. As far as I can tell it doesn't employ special low-level APIs, because on lower-power hardware (and even sometimes on gaming gear) the bottleneck is often the decompression step. This is what Steam is doing when you are downloading a game and it stops using the network but disk usage keeps going and the processor gets pinned at 100%.

I've also heard of this happening to regular users downloading stuff with Safari. It is simply terrible design on Apple's part that you can kill a macOS install by filling it up so much that you can no longer delete files.

[+] sspiff|1 year ago|reply
Still, you should not be able to brick your device into a state like this with legitimate, normal, non-elevated operations.

If the POSIX API does have some limitation which would prevent this error from occurring with higher level APIs (which I sincerely doubt), macOS should simply start failing with errno = ENOSPC earlier for POSIX operations.

No other system behaves like this, and we wouldn't be making these excuses if Microsoft messed up something this basic.

[+] protoman3000|1 year ago|reply
> I found that Sonoma has broken...

All too familiar. I have two Macs. I upgraded one of them to Sonoma and ever since then it has been nothing but headache and disappointment, starting with the upgrade itself failing (meaning I had to completely wipe the disk and install Sonoma from scratch; luckily I still had my data), then problems with Handoff, the firewall seeming not to work, Excel being very slow, etc.

I don't recommend using Sonoma.

[+] JackYoustra|1 year ago|reply
This happened to me! My solution was to go to an Apple Store, buy one of their portable SSDs right there, cp everything onto the SSD (which didn't appear to use any additional space!), wipe the Mac, rm some unneeded stuff on the SSD before cp-ing back, and use their no-fee return policy to return the SSD. There were a few esoteric issues, but for the most part it worked.