In the bad old Linux days, if you read a huge amount of data off the disk (like if you were doing a backup) Linux would try to take all that data and jam it in the page cache. This would push out all the useful stuff in your cache, and sometimes even cause swapping as Linux helpfully swaps out stuff to make room for more useless page cache.
One of the great things about `dd` is that you have a lot of control over how the input and output files are opened. You can bypass the page cache when reading data by using iflag=direct, which stops this from happening.
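For instance, a cache-bypassing backup read might look like this (the device and file names are placeholders, not from the thread):

```shell
# /dev/sdX and backup.img are illustrative. iflag=direct opens the
# source with O_DIRECT, so the big sequential read bypasses the page
# cache instead of evicting everything else that was cached.
dd if=/dev/sdX of=backup.img bs=1M iflag=direct status=progress
```

Note that O_DIRECT requires aligned request sizes; a bs like 1M comfortably satisfies the 512-byte/4kB alignment disks expect.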
Moreover, flash drives (and all flash media) have a preferred write size, generally 4kB (the page) or 512kB (the erase block). By choosing a common multiple like 1024kB (with bs=1024kB), you can keep your flash drive fed with enough backlog that it can perform at its peak write speed without churning, which gives you faster writes and lower write amplification - a win-win.
It's great to have control over this but I suspect most users never knew this was happening, had no idea dd could bypass that behavior, nor knew which argument to pass to dd to accomplish this. It's like saying 'what makes 3d printers so great is you can make anything!' but you'd be way better off with an industrially forged object than the 3d printed object.
Pretty much, and understanding what is going on "under the hood" as it were can be informative. Had the author done a 'cp myfile.foo /dev/sdb' on a UNIX system they would have found they now had a regular file named '/dev/sdb' with the contents of myfile.foo, and their sd card would have remained untouched. But you would only know that if you realized that cp would check to see if the file existed in the destination, unlink it[1], and then create a new file to copy into.
The subtlety of opening the destination file first, and then writing into it, was what made dd 'special' (and it would open things in RAW mode so there wasn't any translation going on for, say, terminals) but that is lost on people. Bypassing the page cache and thus not killing directory and file operations for other users of the system is a level even below that. Only the few remaining who have done things "poorly" and incurred the wrath of the other users of the system sitting in the same room really get a good feel for that :-). Fortunately for nearly everybody these days they will never have to experience that social embarrassment. :-)
[1] Well unless you had noclobber set in which case it would error out.
Still the bad old days for copying files from iOS to Linux. It seems to make an internal copy on the device of everything you transfer before sending it, which leads to running out of free space just trying to copy things off :(
Careful, your #2 and #3 are incorrect -- skip and seek operate with blocks, not bytes. So your #2 would copy 32 bytes after the first 32kB of data, and #3 would write 512 bytes at position 5120k.
Especially handy when you've fed in a huge amount of JSON (sometimes all on one line because, y'know, why not) into jq and you get the inscrutable output:
parse error: Invalid numeric literal at line 1, column 236162512
My most common use for `dd` is using it with `sudo` to direct the output of an unprivileged pipeline to a root-owned file. Instead of running `echo hello >/root/test.txt`, which will fail, I use `echo hello | sudo dd of=/root/test.txt`.
My SSD - apparently one of the fastest on the market, a KINGSTON SKC3000D2048G, rated 7 GB/s read and 7 GB/s write - stopped being very fast after I'd written data to it. No idea if it's something related to how NVMe SSDs work, or maybe I have a broken unit.
I have another NVMe drive - a Force MP510 - and it doesn't care whether data was previously written or not. When reading from it, I get ~full x4/Gen3 speeds of 3.5 GB/s.
PS: nvme smart-log shows 100% available spare, and 0% percentage_used, so it doesn't seem to be wear-related.
Oh thank you for submitting this to HN. I’ve been telling people not to use dd for years and everyone looks at me like I just gave birth to a full grown dinosaur or something.
“Well why does the entire internet say to use dd then?”
Because they copy from each other just like you copied from them. Just use cat.
> Usage of dd in this context is so pervasive that it’s being hailed as the magic gatekeeper of raw devices.
That's the thing; it isn't. Author forgot to explain (if he knows that at all) that /dev/sda2 on Linux is not a raw device. It's a block device.
So if dd is hailed as something to use on /dev/sda, that's not an example of being hailed for a raw device.
dd's capability to control the read/write size is needed for classic raw devices on Unix, which require transfers to follow certain sizes.
E.g if a classic Unix tape needs 512 byte blocks, but you do 1024 byte writes, you lose half the data; each write creates a block.
The raw/block terminology comes from Unix. You have a raw device and a block device representing the same device. The block device allows arbitrarily sized reads and writes, doing the re-blocking underneath. That overhead costs something, which you can avoid by using the raw device (and doing so correctly).
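The blocking behavior dd handles can be sketched on an ordinary file (tape device names below are illustrative): conv=sync pads each input block with NULs up to ibs.

```shell
# On a classic raw tape you would pick the block size explicitly, e.g.:
#   dd if=archive.tar of=/dev/nrst0 bs=512    (device name illustrative)
# The padding is observable on a regular file: a 3-byte input read
# becomes one full 512-byte output block.
printf 'abc' | dd of=padded.bin ibs=512 conv=sync status=none
stat -c%s padded.bin   # 512
```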
Heh, I remember finding this out in the good old days when I was figuring out how to rip CDs, and shitting bricks. I mean, I read it on a BBS and thought "that can't be right"; I was expecting to find something like the Nero suite back on Windows.
Much more recently, I enjoyed the same kind of amazement on the bash tcp pseudo devices.
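Those bash pseudo-devices look like this (bash-only - the shell, not the kernel, interprets the path; example.com is just an illustration and the request needs network access):

```shell
#!/usr/bin/env bash
# Open a TCP connection on file descriptor 3 via bash's /dev/tcp
# pseudo-device, send a minimal HTTP request, and print the status line.
exec 3<>/dev/tcp/example.com/80
printf 'HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n' >&3
IFS= read -r status <&3
echo "$status"
exec 3>&- 3<&-
```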
I think this has a bit of bad advice in suggesting cp or shell redirection to read / write raw block devices, but dd isn't necessarily the best either. Personally, any time I'm trying to image some hard drive, damaged or otherwise, I'll just about always jump straight to ddrescue (not dd_rescue). It's similar to dd, surprise surprise, but it keeps a log of which parts of the input and output have been copied / had errors / been skipped. Nothing is more annoying than waiting an hour for some large copy to make progress and then running into an error or getting interrupted for whatever reason. Because ddrescue keeps a log of the operation's status, you can resume it with the same command and it will pick back up right where it left off instead of having to start all over. It's also intelligent enough not to fail hard the first time it hits a bad region of the disk: it skips over it on error and comes back to reattempt it using various strategies once it's already copied the low-hanging fruit.
There's very little reason not to use it, even if it's just to get a nice progress view instead of just the current amount of data copied.
Just a note -- despite the title, the article eventually presents a nuanced view and points out that "dd" has some uses.
> If an alias specifies -a, cp might try to create a new block device rather than a copy of the file data. If using gzip without redirection, it may try to be helpful and skip the file for not being regular. Neither of them will write out a reassuring status during or after a copy.
> dd, meanwhile, has one job*: copy data from one place to another. It doesn’t care about files, safeguards or user convenience. It will not try to second guess your intent, based on trailing slashes or types of files.
> However, when this is no longer a convenience, like when combining it with other tools that already read and write files, one should not feel guilty for leaving dd out entirely.
One thing dd does for me, that cp and cat do not, is that it forces me to read and check the command at least three times before pressing enter, which is a very good thing when messing with raw devices.
When I first learned that dd was not magical, I started using cp, but I made some mistakes with partition numbers and whatnot (nothing serious).
Maybe it's just the weird syntax or the fact that I treat dd differently; I'm just more cautious and don't press enter automatically. Of course it's silly, but to me at least that's a good reason to keep using dd.
Back in the Dark Ages (1968) we had pre-written JCL scripts. I don't think many people knew JCL. We just appended a script to the front of our Fortran card decks. A DD card was the final card. After the frustrating work of finally getting the JCL right, the DD card was always just the "Do it, Damn it" card in my mind.
This actually serves at least two approximately legitimate purposes: firstly, it ensures that reads from sda are aligned to a (at least nominal, i.e. 512-byte) disk block, which doesn't matter for normal, kernel-supported drives like IDE/SATA/most USB (which is almost certainly what sda is), but avoids bespoke devices (or their drivers) trying to do something clever when gzip asks for 1 or 7 or 17 bytes. (And writes to poorly-designed devices/drivers can be even worse.)
More importantly, like useless use of cat, it prevents gzip from trying to delete sda when it's done, which is something it will in fact do:
$ echo test > /tmp/sda
$ gzip /tmp/sda
$ cat /tmp/sda
cat: /tmp/sda: No such file or directory
(For gzip specifically, you can also prevent this by writing `gzip </tmp/sda`, but I've occasionally run into tools that try to 'intelligently' handle stdin file descriptors that point at 'real' files, so I feel better having a separate process blocking the way.)
It’s funny (given that the name likely is inspired by Useless Use of ‘cat’) that examples here have actual useless uses of ‘cat’. ‘cat file | pv > disk’ can just be ‘pv < file > disk’.
But whatever. I’m sure 90% of code stems from something someone read and copied or came most easily to them. It works.
‘dd’ was really useful for finicky media like tapes and doing EBCDIC translation. It’s still great when you combine bs and count. Blow away an MBR, make an xGB file, etc.
It’s a Swiss Army knife. It can do a lot, but isn’t the best tool for most things. I still love it. Probably just muscle memory.
My best useless use of dd was in the Ubuntu 16.04 days.
I was at a satellite office with all windows PCs for the day, so used a live disk to get a decent environment to get things done.
Only problem was that the DVD drive kept spinning down and every time I did something that was not cached it made me wait forever.
nohup + while loop + sleep 4s + raw dd read from CD for the win :)
EDIT: Reading this article it sounds like dd has no "special" ability to access the disk in a raw way. But surely that's what the nocache option is for...
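Assuming GNU dd (file names illustrative), the two cache-related flags do different things:

```shell
# iflag=nocache: read through the page cache as normal, but advise the
# kernel (posix_fadvise POSIX_FADV_DONTNEED) to drop those pages after.
dd if=bigfile of=/dev/null bs=1M iflag=nocache status=none
# iflag=direct: bypass the page cache entirely via O_DIRECT.
dd if=bigfile of=/dev/null bs=1M iflag=direct status=none
```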
Occasionally I run a dd on a loop with increasing block sizes to see what is actually fastest. I regularly see instructions on the web saying you should use "1M" or even "4M", but in my tests, smaller block sizes are often faster.
A few years ago, "128K" was usually the fastest choice. Today, on faster systems, "512K" has a slight edge.
I could not tell you why, though. Try it for yourself.
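A sketch of such a loop (the device path is a placeholder; iflag=direct keeps the page cache from skewing repeat runs, and each blocksize:count pair reads the same 256 MiB total):

```shell
# Each entry is blocksize:count, chosen so every run reads 256 MiB.
# dd's final status line reports the measured throughput.
for pair in 64K:4096 128K:2048 512K:512 1M:256 4M:64; do
  echo "== bs=${pair%%:*} =="
  dd if=/dev/sdX of=/dev/null bs="${pair%%:*}" count="${pair##*:}" \
     iflag=direct 2>&1 | tail -n1
done
```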
I understand his point, but using 'dd' allows you to set a buffer size which can make cloning a bunch faster. It also has great progress reporting (status=progress) which is really useful for the things dd is usually used for.
And even if you use it without it being needed, it's not a big deal. It doesn't add much overhead, if any.
On the left side of a typical Linux hobbyist experience graph you probably have "this is a disk image file, and this is a disk, I'll just copy the disk image file onto the disk!", then you have a period of "I'll use the cool dd tool (without oflag=direct and/or sync because by this point you know people use it but you don't know why people use it) like I saw on the internet!", then when you understand that everything is a file you have "this is a disk image file, and this is a disk, I'll just copy the disk image file onto the disk!" again.
I personally suggest recommending people the Disks application included with Ubuntu Desktop and Fedora/Centos Workstation. It shows icons representing internal disks, SD cards or flash drives so they know what device they want to work with. If they want to take their time they can see all the information about the drives and partitions, they can start discovering and asking questions and reading up on how computers use disks right from there if they want to. And if they don't want to, it's just extra confirmations that it's the correct disk or DVD that they want to put their image onto. Then when they're sure about the device they can create a disk image and restore a disk image in that same application!
dd lets you specify the write block size. This is essential when writing to 9 track tape. Everything is a byte stream… except for the things which are not.
I don’t think the author is intending to tell everyone to always use something else, I think the author is trying to tell most people who use dd that there are easier ways to do what they are trying to do, and that things are files, which is something that a lot of people here seem to be forgetting.
There was some heuristic in there that tried to prevent it, but it wasn't very good.
Early Windows NT was awful with this, pegging the system with a cascade of disk IO at unpredictable times, often for ten seconds or more.
Can anyone suggest ways to avoid blowing the file cache on Windows with large copies? Is this even a problem anymore?
Please keep in mind that `ditto` is a file copy and archive utility, not a block copy utility like `dd` (which is also available on macOS).
An online man page for ditto: https://ss64.com/osx/ditto.html
There remain some useful applications of dd. These may of course be achieved by other mechanisms, but typically less conveniently.
1. Read a specific number of blocks or bytes from a source. This will make a copy of, say, your master boot record (first 512 bytes of your first disk drive) and stash it in your /root directory.
2. Read from specific bytes of a file. Reads 32 bytes after the first 1024 (1k) bytes of "mydata".
3. Write to specific bytes of a file. That should write 512 bytes from "source" beginning 10k into "target". (I've not tested this, you should verify.)
4. Create a sparse file. Sparse files appear to have a nonzero size, but take up no space on disk, until data is actually written to them. These are often used as "inflating" dynamic filesystem images for virtual machines.
5. Case conversions. Sure, you could use tr(1), but where's the sport?
6. ASCII / EBCDIC conversions. When reading to or from IBM data tapes, you might find blocking / unblocking conversions useful. I've done this, but it's so long ago that I don't trust my memory on that any more. Odds are good you'll not have to worry about this.
There are other useful applications as well, though these are not typically encountered very often. Do feel free to explore and attempt these on safe media.
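The actual dd invocations didn't survive formatting; these are hedged reconstructions (file names and sizes are illustrative, and note that skip/seek count in units of bs, so bs=1 makes them byte offsets):

```shell
# 1. Copy an MBR (needs a real device; paths illustrative):
#      dd if=/dev/sda of=/root/mbr bs=512 count=1
cd "$(mktemp -d)"
printf 'The quick brown fox jumps over the lazy dog' > mydata
# 2. Read 5 bytes starting 4 bytes into "mydata" (skip is in bs units):
dd if=mydata of=excerpt bs=1 skip=4 count=5 status=none
cat excerpt   # quick
# 3. Overwrite 5 bytes starting 4 bytes in; conv=notrunc stops dd from
#    truncating everything after the written range:
printf 'XXXXX' | dd of=mydata bs=1 seek=4 count=5 conv=notrunc status=none
# 4. Sparse file: seek to 100 MiB and write nothing; the apparent size
#    is 100M but (on most filesystems) no blocks are allocated:
dd if=/dev/zero of=sparse.img bs=1 count=0 seek=100M status=none
```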
What does the (1) mean beside dd? I see this with man pages. Is it a version identifier?
Edit: thank you both for taking the time to share. I appreciate the quick response.
Note that this can also be done using truncate(1).
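For example (GNU coreutils; file name illustrative):

```shell
# Same sparse result without dd: extend the file to 100 MiB without
# allocating data blocks.
truncate -s 100M sparse.img
```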
Today I found the magical dd command that causes an NVMe drive to run at almost full speed. The trick to getting this throughput is telling Linux to do an insanely large IO (4 MiB). The drive can't do 4 MiB reads - the largest IO it can handle is 2 MiB. More in the thread starting here: https://twitter.com/OMGerdts/status/1514376206082269191?s=20...
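The command itself was lost in formatting; based on the description, a reconstruction would look something like this (device name and count are guesses, not from the tweet):

```shell
# Issue 4 MiB read requests with O_DIRECT; the kernel splits each one
# into two 2 MiB commands for a drive whose max transfer size is 2 MiB.
dd if=/dev/nvme0n1 of=/dev/null bs=4M iflag=direct count=2048 status=progress
```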
$ echo 'like this' | dd bs=1 speed=10
We’ve gone full circle.
https://en.m.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_ca...
It's been invaluable as an intuitive tool to recover data from failing disks/drives.
Never thought to use it as a day to day dd, but will give that a try. Thanks for the idea!
In FreeBSD (and presumably other UNIX implementations) they aren't: https://docs.freebsd.org/en/books/arch-handbook/driverbasics...
So, in FreeBSD "dd if=/dev/ada0p1 of=/dev/null bs=1 count=1" will fail: disk driver will return EINVAL from read(2) because I/O size (1) is not divisible by physical sector size (usually 512). "cat" with buffer size X (which depends on the implementation) will either work or not depending on divisibility of X by physical sector size, and other random factors, like short file I/O caused by delivery of a signal.
Summary: dd(1) still has its place and author of original article is getting it wrong.