top | item 31474710

Useless Use of "dd" (2015)

395 points| RaoulP | 3 years ago |vidarholen.net | reply

252 comments

[+] spudlyo|3 years ago|reply
In the bad old Linux days, if you read a huge amount of data off the disk (like if you were doing a backup) Linux would try to take all that data and jam it in the page cache. This would push out all the useful stuff in your cache, and sometimes even cause swapping as Linux helpfully swaps out stuff to make room for more useless page cache.

One of the great things about `dd` is that you have a lot of control over how the input and output files are opened. You can bypass the page cache when reading data by using iflag=direct, which stops this from happening.
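A minimal sketch of that pattern. The device name is a placeholder, and the runnable part copies between scratch files without the flag, since O_DIRECT needs device/filesystem support:

```shell
# Backup read that bypasses the page cache -- /dev/sdX is a placeholder:
#   dd if=/dev/sdX of=backup.img bs=1M iflag=direct status=progress
#
# Safe stand-in you can run anywhere: same invocation shape against a
# scratch file, minus the O_DIRECT flag.
printf 'some disk contents' > scratch.in
dd if=scratch.in of=scratch.out bs=1M count=1 status=none
cmp scratch.in scratch.out && echo "copied"
```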

[+] bayindirh|3 years ago|reply
Moreover, flash drives (and all flash media) have a favorite page size, which is generally 4 kB or 512 kB. By choosing a common multiple of those, like 1024 kB (with bs=1024kB), you can keep your flash drive happy with enough backlog to write, so it can perform at its peak write speed without churning. That gives you faster writes and lower write amplification, a win-win.
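For what it's worth, the arithmetic behind that block size choice is easy to check in the shell; 1024 kB divides evenly by both page sizes mentioned above (the dd line is an illustrative sketch with placeholder names):

```shell
# 1024 kB is a multiple of both common flash page sizes:
echo $((1024 % 4))      # remainder against a 4 kB page
echo $((1024 % 512))    # remainder against a 512 kB page
# Illustrative write (disk.img and /dev/sdX are placeholders):
#   dd if=disk.img of=/dev/sdX bs=1024kB oflag=direct status=progress
```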
[+] cosmotic|3 years ago|reply
It's great to have control over this but I suspect most users never knew this was happening, had no idea dd could bypass that behavior, nor knew which argument to pass to dd to accomplish this. It's like saying 'what makes 3d printers so great is you can make anything!' but you'd be way better off with an industrially forged object than the 3d printed object.
[+] ChuckMcM|3 years ago|reply
Pretty much, and understanding what is going on "under the hood" as it were can be informative. Had the author done a 'cp myfile.foo /dev/sdb' on a UNIX system they would have found they now had a regular file named '/dev/sdb' with the contents of myfile.foo, and their SD card would have remained untouched. But you would only know that if you realized that cp would check to see if the file existed in the destination, unlink it[1], and then create a new file to copy into.

The subtlety of opening the destination file first, and then writing into it, was what made dd 'special' (and it would open things in RAW mode so there wasn't any translation going on for, say, terminals), but that is lost on people. Bypassing the page cache, and thus not killing directory and file operations for other users of the system, is a level even below that. Only the few remaining who have done things "poorly" and incurred the wrath of the other users of the system sitting in the same room really get a good feel for that :-). Fortunately, nearly everybody these days will never have to experience that social embarrassment. :-)

[1] Well unless you had noclobber set in which case it would error out.

[+] totetsu|3 years ago|reply
Still the bad old days for copying files from iOS to Linux. It seems to make an internal copy on the device of everything you copy before sending it, which leads to running out of free space just trying to copy things off :(
[+] a-dub|3 years ago|reply
Amusingly, this would occur with writes as well.

There was some heuristic in there that tried to prevent it, but it wasn't very good.

[+] watersb|3 years ago|reply
In my experience, Windows NT (now just Windows) is very fond of its file cache and large copies can blow up into memory paging as well.

Early Windows NT was awful with this, pegging the system with a cascade of disk IO at unpredictable times, often for ten seconds or more.

Can anyone suggest ways to avoid blowing the file cache on Windows with large copies? Is this even a problem anymore?

[+] watersb|3 years ago|reply
On macOS, you can also use the `--nocache` flag for the `ditto` command.

Please keep in mind that `ditto` is a file copy and archive utility, not a block copy utility like `dd` (which is also available on macOS).

An online man page for ditto: https://ss64.com/osx/ditto.html

[+] guerrilla|3 years ago|reply
I think using a small bs also determines the size of the cache you use, since it's the buffer.
[+] bushbaba|3 years ago|reply
I'd have thought dd always avoided the page cache. In what dd use case is that the desired behavior?
[+] throwaway2048|3 years ago|reply
Linux absolutely still does this FWIW; it's one of the reasons that swap is a net negative.
[+] dredmorbius|3 years ago|reply
Useful uses of dd(1).

There remain some useful applications of dd. These may of course be achieved by other mechanisms, but typically less conveniently.

1. Read a specific number of blocks or bytes from a source:

  dd if=/dev/hda of=/root/mbr bs=512 count=1
This will make a copy of, say, your master boot record (first 512 bytes of your first disk drive) and stash it in your /root directory.

2. Read from specific bytes of a file

  dd if=mydata skip=1k bs=32 count=1
Reads 32 bytes after the first 1024 (1k) bytes of "mydata".

3. Write to specific bytes of a file

  dd if=source of=target seek=10k bs=512 count=1 conv=notrunc
That should write 512 bytes from "source" beginning 10k into "target". (I've not tested this, you should verify.)

4. Create a sparse file. Sparse files appear to have a nonzero size, but take up no space on disk, until data is actually written to them. These are often used as "inflating" dynamic filesystem images for virtual machines.

  dd if=/dev/zero of=sparsefile bs=1 count=0 seek=20000M # Create 20 GB sparse file
5. Case conversions. Sure, you could use tr(1), but where's the sport?

  dd if=MixEdCaSE of=lcase conv=lcase   # Convert to lower case
  dd if=MixEdCaSE of=ucase conv=ucase   # Convert to upper case
6. ASCII / EBCDIC conversions

  dd if=ebcdic of=ascii conv=ascii   # ebcdic -> ascii
  dd if=ascii of=ebcdic conv=ebcdic  # ascii -> ebcdic

When reading from or writing to IBM data tapes, you might find the blocking/unblocking conversions useful. I've done this, but it was so long ago that I don't trust my memory on it any more. Odds are good you'll never have to worry about this.

There are other useful applications as well, though these are not typically encountered very often. Do feel free to explore and attempt these on safe media.

[+] photon-torpedo|3 years ago|reply
Careful, your #2 and #3 are incorrect -- skip and seek operate with blocks, not bytes. So your #2 would copy 32 bytes after the first 32kB of data, and #3 would write 512 bytes at position 5120k.
[+] zimpenfish|3 years ago|reply
> Read from specific bytes of a file

Especially handy when you've fed in a huge amount of JSON (sometimes all on one line because, y'know, why not) into jq and you get the inscrutable output:

    parse error: Invalid numeric literal at line 1, column 236162512
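One way to peek at the offending bytes, sketched here with a tiny stand-in file and a made-up column number (with bs=1, skip and count operate in bytes, which matters per the sibling correction about block units):

```shell
printf '{"a": 1, "b": bad}' > huge.json   # stand-in for the real input
col=15                                     # pretend jq reported column 15
# Dump a few bytes around the reported column:
dd if=huge.json bs=1 skip=$((col - 3)) count=8 2>/dev/null
```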
[+] Waterluvian|3 years ago|reply
Apologies. Tangent:

What does the (1) mean beside dd? I see this with man pages. Is it a version identifier?

Edit: thank you both for taking the time to share. I appreciate the quick response.

[+] rocqua|3 years ago|reply
For the first option a simple

    head -c 512
will also copy the first 512 bytes, in case you want to avoid dd for clarity. I have actually used that for moving MBRs around.
[+] cperciva|3 years ago|reply
create a sparse file

Note that this can also be done using truncate(1).
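Side by side, with the size scaled down to 20 MB here (truncate -s and stat -c as in GNU coreutils):

```shell
truncate -s 20M sparse_a                                        # via truncate(1)
dd if=/dev/zero of=sparse_b bs=1 count=0 seek=20M status=none   # via dd
stat -c '%s %n' sparse_a sparse_b    # identical apparent sizes
du -k sparse_a sparse_b              # ~0 blocks actually allocated
```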

[+] yepguy|3 years ago|reply
My most common use for `dd` is using it with `sudo` to direct the output of an unprivileged pipeline to a root-owned file. Instead of running `echo hello >/root/test.txt`, which will fail, I use `echo hello | sudo dd of=/root/test.txt`.
[+] matja|3 years ago|reply
7. Write a new MBR to a disk, keeping the partition table:

    dd bs=440 count=1 if=/usr/lib/syslinux/bios/mbr.bin of=/dev/sda
Even in the age of EFI/GPT, that still gets used often (usually with VM providers that only offer MBR boot).
[+] gnubison|3 years ago|reply
Fun fact: 1-4 don’t work in the context of short reads — and GNU’s fullblock extension isn’t specified in POSIX.
[+] mgerdts|3 years ago|reply
While there’s a lot of truth here, there are times when you can do much better than cat. A while back I tweeted:

Today I found the magical dd command that causes an NVMe drive to run at almost full speed:

  # dd if=/dev/nvme3n1 bs=4096k iflag=direct of=/dev/null status=progress
  959090524160 bytes (959 GB, 893 GiB) copied, 178 s, 5.4 GB/s
The trick to getting this throughput is telling Linux to do an insanely large IO (4 MiB). The drive can't do 4 MiB reads - the largest IO it can handle is 2 MiB.

  # nvme id-ctrl /dev/nvme3 | grep mdts
  mdts      : 9
  # echo '4 * 2^9' | bc -l
  2048
More in the thread starting here:

https://twitter.com/OMGerdts/status/1514376206082269191?s=20...

[+] jagrsw|3 years ago|reply
My SSD - apparently one of the fastest on the market - KINGSTON SKC3000D2048G - rated 7GB/7GB read/write, stopped being very fast after writing data to it. No idea if it's something related to how NVMe SSDs work, or maybe I have a broken unit.

  $ sudo dd if=/dev/nvme0n1 bs=4096k iflag=direct of=/dev/null status=progress
  ...
  9667870720 bytes (9,7 GB, 9,0 GiB) copied, 20,2714 s, 477 MB/s
But when reading from empty space (as in, blocks not yet written), it goes at full PCIe x4 Gen4 speed.

  $ sudo dd if=/dev/nvme0n1 bs=4096k iflag=direct of=/dev/null status=progress skip=450000
  ...
  16710107136 bytes (17 GB, 16 GiB) copied, 2,42802 s, 6,9 GB/s
I have another nvme drive - Force MP510 - and it doesn't care if data was previously written or not. When reading from it, I get ~full x4/Gen3 speeds of 3.5GB/s

PS: nvme smart-log shows 100% available spare, and 0% percentage_used, so it doesn't seem to be wear-related.

[+] trasz|3 years ago|reply
And on FreeBSD (and perhaps others) you can also specify the speed limit.

$ echo 'like this' | dd bs=1 speed=10

[+] naikrovek|3 years ago|reply
Oh thank you for submitting this to HN. I’ve been telling people not to use dd for years and everyone looks at me like I just gave birth to a full grown dinosaur or something.

“Well why does the entire internet say to use dd then?”

Because they copy from each other just like you copied from them. Just use cat.

[+] kazinator|3 years ago|reply
> Usage of dd in this context is so pervasive that it’s being hailed as the magic gatekeeper of raw devices.

That's the thing; it isn't. The author forgot to explain (if he knows it at all) that /dev/sda2 on Linux is not a raw device. It's a block device.

So if dd is hailed as something to use on /dev/sda, that's not an example of being hailed for a raw device.

dd's capability to control the read/write size is needed for classic raw devices on Unix, which require transfers to follow certain sizes.

E.g. if a classic Unix tape needs 512-byte blocks, but you do 1024-byte writes, you lose half the data; each write creates a block.

The raw/block terminology comes from Unix. You have a raw device and a block device representing the same device. The block device allows arbitrarily sized reads and writes, doing the re-blocking underneath. That overhead costs something, which you can avoid by using the raw device (and doing so correctly).
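The tape line below is a sketch with a placeholder device, but dd's re-blocking (ibs/obs decoupling read size from write size) is observable even on ordinary files:

```shell
# Classic raw-tape sketch: each write(2) becomes one tape block, so obs
# must match what the drive expects. /dev/rmt/0 is a placeholder.
#   dd if=backup.tar of=/dev/rmt/0 obs=512
#
# The same ibs/obs decoupling, demonstrated on regular files:
printf '%1024s' ' ' > in.dat                         # 1024 bytes of input
dd if=in.dat of=out.dat ibs=1024 obs=512 status=none # one read, two writes
cmp in.dat out.dat && echo "same data, re-blocked into 512-byte writes"
```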

[+] trasz|3 years ago|reply
At least this used to be the case. Nowadays FreeBSD doesn’t implement block devices at all - there are only raw disk devices.
[+] licebmi__at__|3 years ago|reply
>cat /dev/cdrom > myfile.iso

Heh, I remember finding this out in the good old days when I was learning how to rip CDs, and shitting bricks. I mean, I read it on a BBS and thought "that can't be right"; I was expecting to find something like the Nero suite back on Windows.

Much more recently, I enjoyed the same kind of amazement on the bash tcp pseudo devices.

[+] bcook|3 years ago|reply
I was much more confused by your post than I should've been because I overlooked that ">" is how HackerNews prefixes quoted text.
[+] MertsA|3 years ago|reply
I think this has a bit of bad advice about using cp or shell redirection to read/write raw block devices, but dd isn't necessarily the best either. Personally, any time I'm trying to image a hard drive, damaged or otherwise, I'll just about always jump straight to ddrescue (not dd_rescue). It's similar to dd, surprise surprise, but it keeps a log of which parts of the input and output have been copied, had errors, or were skipped. Nothing is more annoying than waiting an hour for some large copy to make progress and then running into an error or getting interrupted for whatever reason. Because ddrescue keeps a log of the operation's status, you can resume it with the same command and it will pick back up right where it left off instead of starting all over. It's also intelligent enough not to fail on the first error: it skips over bad regions of the disk and comes back to reattempt them with various strategies once it's already copied the low-hanging fruit.

There's very little reason not to use it, even if it's just to get a nice progress view instead of just the current amount of data copied.

[+] krnlpnc|3 years ago|reply
Came here to mention 'ddrescue' as well.

It's been invaluable as an intuitive tool to recover data from failing disks/drives.

Never thought to use it as a day to day dd, but will give that a try. Thanks for the idea!

[+] pronoiac|3 years ago|reply
Also, ddrescue is in Ubuntu, as gddrescue.
[+] bee_rider|3 years ago|reply
Just a note -- despite the title, the article eventually presents a nuanced view and points out that "dd" has some uses.

> If an alias specifies -a, cp might try to create a new block device rather than a copy of the file data. If using gzip without redirection, it may try to be helpful and skip the file for not being regular. Neither of them will write out a reassuring status during or after a copy.

> dd, meanwhile, has one job: copy data from one place to another. It doesn’t care about files, safeguards or user convenience. It will not try to second guess your intent, based on trailing slashes or types of files.

> However, when this is no longer a convenience, like when combining it with other tools that already read and write files, one should not feel guilty for leaving dd out entirely.

[+] lgeorget|3 years ago|reply
One thing dd does for me, that cp and cat do not, is that it forces me to read and check the command at least three times before pressing enter, which is a very good thing when messing with raw devices.

When I first learned that dd was not magical, I started using cp, but I made some mistakes with partition numbers and whatnot (nothing serious).

Maybe it's just the weird syntax, or the fact that I treat dd differently; I'm just more cautious and don't press enter automatically. Of course it's silly, but to me at least, that's a good reason to keep using dd.

[+] cc101|3 years ago|reply
Back in the Dark Ages (1968) we had pre-written JCL scripts. I don't think many people knew JCL. We just appended a script to the front of our Fortran card decks. A DD card was the final card. After the frustrating work of finally getting the JCL right, the DD card was always just the "Do it Damn it" card in my mind.
[+] a1369209993|3 years ago|reply

  > dd if=/dev/sda | gzip > image.gz
This actually serves at least two approximately legitimate purposes: firstly, it ensures that reads from sda are aligned to a (at least nominal, i.e. 512-byte) disk block. That doesn't matter for normal, kernel-supported drives like IDE/SATA/most USB (which is almost certainly what sda is), but it avoids bespoke devices (or their drivers) trying to do something clever when gzip asks for 1 or 7 or 17 bytes. (And writes to poorly designed devices/drivers can be even worse.)

More importantly, like useless use of cat, it prevents gzip from trying to delete sda when it's done, which is something it will in fact do:

  $ echo test > /tmp/sda
  $ gzip /tmp/sda
  $ cat /tmp/sda
  cat: /tmp/sda: No such file or directory
(For gzip specifically, you can also prevent this by writing `gzip </tmp/sda`, but I've occasionally run into tools that try to 'intelligently' handle stdin file descriptors that point at 'real' files, so I feel better having a separate process blocking the way.)
[+] salmo|3 years ago|reply
It’s funny (given that the name is likely inspired by Useless Use of ‘cat’) that examples here have actual useless uses of ‘cat’. ‘cat file | pv > disk’ can just be ‘pv < file > disk’.

But whatever. I’m sure 90% of code stems from something someone read and copied or came most easily to them. It works.

‘dd’ was really useful for finicky media like tapes and doing EBCDIC translation. It’s still great when you combine bs and count. Blow away an MBR, make an xGB file, etc.

It’s a Swiss Army knife. It can do a lot, but isn’t the best tool for most things. I still love it. Probably just muscle memory.

[+] mkup|3 years ago|reply
In Linux, raw disk devices like /dev/sda1 are cached by kernel (unless opened with O_DIRECT flag).

In FreeBSD (and presumably other UNIX implementations) they aren't: https://docs.freebsd.org/en/books/arch-handbook/driverbasics...

So, in FreeBSD "dd if=/dev/ada0p1 of=/dev/null bs=1 count=1" will fail: disk driver will return EINVAL from read(2) because I/O size (1) is not divisible by physical sector size (usually 512). "cat" with buffer size X (which depends on the implementation) will either work or not depending on divisibility of X by physical sector size, and other random factors, like short file I/O caused by delivery of a signal.
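A quick way to sanity-check a candidate buffer size against the sector size before touching a raw device. The dd lines are the FreeBSD commands from above, kept as comments since they need a real disk; the divisibility check itself runs anywhere:

```shell
#   dd if=/dev/ada0p1 of=/dev/null bs=1 count=1     # fails: EINVAL
#   dd if=/dev/ada0p1 of=/dev/null bs=512 count=1   # works: sector-aligned
X=65536; SECTOR=512
if [ $((X % SECTOR)) -eq 0 ]; then echo "aligned"; else echo "would EINVAL"; fi
```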

Summary: dd(1) still has its place and author of original article is getting it wrong.

[+] eggsome|3 years ago|reply
My best useless use of dd was in the Ubuntu 16.04 days.

I was at a satellite office with all Windows PCs for the day, so I used a live disk to get a decent environment to get things done. The only problem was that the DVD drive kept spinning down, and every time I did something that was not cached it made me wait forever.

nohup + while loop + sleep 4s + raw dd read from CD for the win :)
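Roughly what that combination might have looked like, I'd guess (device path and timings are placeholders; the runnable stand-in below is bounded instead of looping forever):

```shell
# Keep-alive sketch: re-read a little raw data every few seconds so the
# drive never spins down. /dev/cdrom is a placeholder.
#   nohup sh -c 'while :; do
#       dd if=/dev/cdrom of=/dev/null bs=2048 count=1 2>/dev/null
#       sleep 4
#   done' &
#
# Bounded stand-in against a scratch file:
printf 'disc sector' > fake_cdrom
for i in 1 2 3; do
    dd if=fake_cdrom of=/dev/null bs=2048 count=1 2>/dev/null
    sleep 1
done
echo "kept the drive spinning"
```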

EDIT: Reading this article it sounds like dd has no "special" ability to access the disk in a raw way. But surely that's what the nocache option is for...

[+] enasterosophes|3 years ago|reply
So I should stop using dd as a text editor, is that what you're trying to tell me?
[+] LeoPanthera|3 years ago|reply
Occasionally I run a dd on a loop with increasing block sizes to see what is actually fastest. I regularly see instructions on the web saying you should use "1M" or even "4M", but in my tests, smaller block sizes are often faster.

A few years ago, "128K" was usually the fastest choice. Today, on faster systems, "512K" has a slight edge.

I could not tell you why, though. Try it for yourself.
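A throwaway version of that benchmark loop, using a scratch file so it runs anywhere; point src at a real device (read-only!) to reproduce the experiment:

```shell
src=scratch.bin
dd if=/dev/zero of="$src" bs=1M count=8 status=none    # 8 MiB test input
for bs in 64K 128K 512K 1M 4M; do
    printf '%-5s ' "$bs"
    # dd prints its throughput summary on stderr; keep only that line
    dd if="$src" of=/dev/null bs="$bs" 2>&1 | tail -n 1
done
```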

[+] GekkePrutser|3 years ago|reply
I understand his point, but using 'dd' allows you to set a buffer size which can make cloning a bunch faster. It also has great progress reporting (status=progress) which is really useful for the things dd is usually used for.

And even if you use it without it being needed, it's not a big deal. It doesn't add much overhead, if any.

[+] chronogram|3 years ago|reply
On the left side of a typical Linux hobbyist experience graph you probably have "this is a disk image file, and this is a disk, I'll just copy the disk image file onto the disk!", then you have a period of "I'll use the cool dd tool (without oflag=direct and/or sync because by this point you know people use it but you don't know why people use it) like I saw on the internet!", then when you understand that everything is a file you have "this is a disk image file, and this is a disk, I'll just copy the disk image file onto the disk!" again.

I personally suggest recommending people the Disks application included with Ubuntu Desktop and Fedora/Centos Workstation. It shows icons representing internal disks, SD cards or flash drives so they know what device they want to work with. If they want to take their time they can see all the information about the drives and partitions, they can start discovering and asking questions and reading up on how computers use disks right from there if they want to. And if they don't want to, it's just extra confirmations that it's the correct disk or DVD that they want to put their image onto. Then when they're sure about the device they can create a disk image and restore a disk image in that same application!

[+] klibertp|3 years ago|reply
gparted is also nice, although Disks seems to display more information by default.
[+] jws|3 years ago|reply
dd lets you specify the write block size. This is essential when writing to 9 track tape. Everything is a byte stream… except for the things which are not.
[+] naikrovek|3 years ago|reply
I don’t think the author is intending to tell everyone to always use something else, I think the author is trying to tell most people who use dd that there are easier ways to do what they are trying to do, and that things are files, which is something that a lot of people here seem to be forgetting.
[+] mark-r|3 years ago|reply
I had almost managed to completely forget about 9 track tapes. Now you've made my head hurt.