Show HN: An interactive guide to compression basics

vladdanilov | 8 years ago

I find it funny that GIFs are still referred to as GIFs [1] while most places serve them as H.264 yuv420p MP4s [2].

[1] https://i.giphy.com/l0Iy1ZcHArR9aAQta.gif (2.7MB)

[2] https://i.giphy.com/media/l0Iy1ZcHArR9aAQta/giphy.mp4 (307KB)

[3] https://media.giphy.com/media/l1J3OGcUiw8NeXuM0/source.mp4 (146KB) <-- optimized with ffmpeg
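For anyone who wants to reproduce something like [3], here is a sketch of a GIF-to-MP4 conversion driven from Python. The exact ffmpeg flags behind [3] aren't given. The choice of `libx264`, `-crf 28`, `-movflags +faststart` and the even-dimension scale filter is therefore an assumption, and `gif_to_mp4_cmd` is a hypothetical helper. The flags themselves are standard ffmpeg options.

```python
import shlex


def gif_to_mp4_cmd(src: str, dst: str, crf: int = 28) -> list[str]:
    """Build an ffmpeg command that re-encodes a GIF as H.264 MP4 (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                       # overwrite dst if it already exists
        "-i", src,
        "-an",                      # GIFs carry no audio track
        "-c:v", "libx264",          # H.264 encoder
        "-pix_fmt", "yuv420p",      # the pixel format most players expect
        # yuv420p needs even width/height, so round each down to a multiple of 2
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
        "-crf", str(crf),           # quality target: higher means smaller, lossier output
        "-movflags", "+faststart",  # move the index to the front for streaming playback
        dst,
    ]


if __name__ == "__main__":
    cmd = gif_to_mp4_cmd("input.gif", "output.mp4")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

The scale filter matters because yuv420p stores chroma at half resolution in both directions. As a result, libx264 refuses odd widths or heights in that format.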

pmoriarty | 8 years ago

It reminds me of how viral images are referred to as memes, often by people who don't know the original, fuller meaning of the term. That meaning is a viral idea, an analogy to (and deliberately reminiscent of) the gene as a self-replicating biological entity in "The Selfish Gene" by Richard Dawkins, who coined the term "meme" itself.

Of course, the term "viral" is itself used metaphorically here, and is also used as an analogy to a biological entity. But in this case, I think more people are aware of the existence of viruses than they are of the original meaning of the term "meme", though they might not make the conscious connection between a "viral" idea, video, or image and that of a biological virus or how it spreads "virally".

userbinator | 8 years ago

...because MP4phy doesn't have quite the same ring to it?

unwttng | 8 years ago

I caved and added a "toggle pointless gifs" button; history will judge me well.

dfc | 8 years ago

I actually came here to commend the author for letting me opt out of the nonsense. It may not seem like a big deal to you, but the images distract me. They also mean more scrolling whenever I want to refer back to something mentioned earlier, and they add nothing to your content.

BTW I think your content is great, and I imagine you think so too. You commented that most of the comments were about the GIFs rather than the content. That should be a not-so-subtle clue about the value of some lady waving in front of a green screen.

jnbiche | 8 years ago

Too late. I finally added media.giphy.com to my uBlock filter. This one put me over the edge, since I couldn't focus on the content.

However, otherwise an excellent introductory article. Got me reading about Shannon, Kolmogorov, and information theory now.

komali2 | 8 years ago

I highly recommend Stephen King's "On Writing." The basic lesson: if it isn't needed, take it out. Kill your darlings.

The GIFs are this article's darlings.

striking | 8 years ago

Thank you. Normally I'd enjoy the "pointless gifs" but they make it hard for me to read your article at work.

akud | 8 years ago

This is great content! Do you mind if I ask how you made it, specifically the interactive demo? I've been working on a project [1] of my own to explore tools for interactive demos, but so far I've only built one, with toy content.

1: https://akud.github.io/visualization-blogs/posts/01_content_...

kirkules | 8 years ago

I was able to actually read the article because of this. Thanks.

bennyelv | 8 years ago

I found this quite hard to read despite the interesting content, mainly due to the animated gifs inserted throughout the article. It's very hard to focus on a line of text when there's an image darting around on the page. I wonder why the author decided to include them?

speps | 8 years ago

Same here, they're useless and look very unprofessional. Keep only the ones that are actually useful.

epicide | 8 years ago

Quick solution: add the element to your adblocker. Most let you quickly select elements on the page.

Make sure to get the actual container (might need to do the image first) so you aren't left with giant gaps.

Ballantara | 8 years ago

The fun images may have seemed like a way to lighten things up, but here they're a distraction from the content. Even as still images, not every section break needs a happy monkey or Poison Ivy.

michaelmcmillan | 8 years ago

For anyone who found this hard to follow, Gary Bernhardt has a great live screencast in which he writes a compressor: https://www.youtube.com/watch?v=3Eu9ZVZEZ3I

joshschreuder | 8 years ago

Awesome, I love Gary's videos, and this is one I haven't seen. Thanks!

AdmiralAsshat | 8 years ago

Lose the gifs, and you have a very good article. I think they distract, rather than enhance.

ythl | 8 years ago

Or at least a way to toggle them all off at the beginning

unwttng | 8 years ago

OP here - I'm seeing 50/50 support for the gifs. I'm keepin my gifs. I like the idea of adding, and will probably implement, a toggle for all extraneous gifs. I love that most of the commentary about this article is about the gifs. Gifs.

taco_emoji | 8 years ago

I wouldn't mind the GIFs if they were (a) smaller (most of them take up a good two-thirds of the height of my viewport at 1600x900) and (b) pausable. They're funny at first, but then they distract me while I'm trying to read the text around them.

bennyelv | 8 years ago

Personally I didn't mind the content of the gifs, it was the fact that they made it difficult to keep track of where you were in the text around them as I tried to read it.

Perhaps some people are less susceptible to this than others.

mdevere | 8 years ago

The GIFs are funny. Hacker News is just a bunch of anoraks.

nxrabl | 8 years ago

The pixel art doesn't show up for me in Firefox or Edge. (Looks like they're there, just with a height of 0px?) Also, my motion-sensitive lizard brain thanks you for the gif toggle button.

unwttng | 8 years ago

On it, just me being a shitty web dev :P

unwttng | 8 years ago

Fixed at least on Firefox, please enjoy your pixelly goodness

wdfx | 8 years ago

Unless I misread or misunderstood, you seem to invert the meaning of compression ratio halfway through:

> (100 / 200) = 0.5. Protip: compression ratios less than 1 are frowned upon.

> Unfortunately, it's not that simple. Say we had an algorithm (let's call it A) that, given any input whatsoever, was capable of achieving a compression ratio of strictly less than 1.
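The first quote implies a convention: compression ratio = uncompressed size / compressed size, reading 100 as the input size, so values below 1 mean the output grew. Under that convention the second quote should say "strictly greater than 1". The sketch below computes that ratio and checks the counting (pigeonhole) argument for why no lossless algorithm can achieve it on every input. The function names are illustrative.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Uncompressed size divided by compressed size; > 1 means the data shrank."""
    return original_size / compressed_size


def count_shorter_strings(n: int) -> int:
    """Number of distinct bit strings of length 0 through n - 1."""
    return sum(2 ** k for k in range(n))


# The quoted example: 100 units in, 200 units out, so the "compressed" data doubled.
print(compression_ratio(100, 200))  # 0.5

# Pigeonhole: there are 2**n inputs of exactly n bits but only 2**n - 1 strictly
# shorter bit strings. A lossless compressor must map distinct inputs to distinct
# outputs, so it cannot shrink every n-bit input.
for n in range(1, 9):
    assert count_shorter_strings(n) == 2 ** n - 1 < 2 ** n
```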

unwttng | 8 years ago

I did indeed. Thanks for the catch; fix incoming.

taco_emoji | 8 years ago

Yeah, I came here to point this out; that's definitely an error.

userbinator | 8 years ago

Once you understand RLE, LZ is only one step away: instead of encoding repetitions of individual characters/bytes/etc., you encode repetitions of longer strings.
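As a baseline for that step, here is a minimal run-length encoder and decoder. Representing runs as (count, char) pairs is an illustrative choice; real formats pack counts and symbols into bytes in various ways.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[int, str]]:
    """Collapse each run of identical characters into a (count, char) pair."""
    return [(len(list(group)), char) for char, group in groupby(data)]


def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Expand (count, char) pairs back into the original string."""
    return "".join(char * count for count, char in pairs)


encoded = rle_encode("WWWWBBBWWWWWWB")
print(encoded)  # [(4, 'W'), (3, 'B'), (6, 'W'), (1, 'B')]
assert rle_decode(encoded) == "WWWWBBBWWWWWWB"
```

LZ generalizes this. Each pair no longer means "repeat this character", but "copy this many bytes from this far back".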

But starting with RLE is IMHO definitely a good choice, far better than starting with Huffman as a lot of introductory material does. A minimal LZ12/4 (4KB window, 18B max length; an old favourite of the demoscene intro packers) compressor/decompressor pair is literally a few hours' worth of work. It yields surprisingly good compression for its simplicity, much better than simple order-0 Huffman.
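Below is a minimal sketch of an LZ-style compressor with a 12-bit offset and a 4-bit length field. That gives a 4 KB window and match lengths of 3 to 18 bytes, matching the LZ12/4 numbers above. The container layout comes from classic LZSS and is an assumption, not necessarily what the demoscene packers used: one flag byte per 8 items, and 2-byte big-endian back-references. The brute-force match search is deliberately naive.

```python
WINDOW = 4096   # 12-bit offset field -> 4 KB window
MIN_MATCH = 3   # matches below 3 bytes save little or nothing over literals
MAX_MATCH = 18  # 4-bit length field stores MIN_MATCH .. MIN_MATCH + 15


def lzss_compress(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        flag_pos = len(out)
        out.append(0)  # flag byte for the next (up to) 8 items
        for bit in range(8):
            if i >= len(data):
                break
            best_len, best_off = 0, 0
            # Brute-force search of the window for the longest match.
            for j in range(max(0, i - WINDOW), i):
                length = 0
                while (length < MAX_MATCH and i + length < len(data)
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_off = length, i - j
            if best_len >= MIN_MATCH:
                out[flag_pos] |= 1 << bit  # flag bit 1: back-reference
                code = ((best_off - 1) << 4) | (best_len - MIN_MATCH)
                out += code.to_bytes(2, "big")
                i += best_len
            else:
                out.append(data[i])        # flag bit 0: literal byte
                i += 1
    return bytes(out)


def lzss_decompress(blob: bytes) -> bytes:
    out = bytearray()
    p = 0
    while p < len(blob):
        flags = blob[p]
        p += 1
        for bit in range(8):
            if p >= len(blob):
                break
            if flags & (1 << bit):
                code = int.from_bytes(blob[p:p + 2], "big")
                p += 2
                offset = (code >> 4) + 1
                length = (code & 0xF) + MIN_MATCH
                for _ in range(length):  # byte by byte so overlapping copies work
                    out.append(out[-offset])
            else:
                out.append(blob[p])
                p += 1
    return bytes(out)
```

Overlapping copies (offset smaller than length) are what let a single token encode a long run. In that case this scheme behaves like RLE for free.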

deadlylazer | 8 years ago

This is nice! One minor error: 本 doesn't mean "tree" but "book" or "root/origin". 木 is the character for "tree".

tw1010 | 8 years ago

Contrary to the rest of the comments here, I enjoyed the gifs.

hisem | 8 years ago

Same here.

karolg | 8 years ago

I'm probably getting old, because I can't stand this new trend of putting useless GIFs (or MP4s, actually) every two paragraphs in every article on the web. Oh, and of course all of them autoplay, because the sole purpose of my laptop's fan is to spin happily at full speed. </random rant>

namank | 8 years ago

Awesome! Thanks!

Any chance you'd do one on compressing an integer time series? How about variable-length integers? Such an article would be much appreciated in IoT circles, since transferring and storing data (timed voltage values) can get quite expensive in both dollars and latency.
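One common approach to the time-series question has three steps. First, delta-encode consecutive samples. Second, zigzag-map the signed deltas so small negative numbers stay small. Third, write each value as a base-128 varint, the same varint and ZigZag schemes Protocol Buffers uses. This is a sketch under those assumptions, not a tuned codec, and the millivolt readings are made up for illustration.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def encode_varint(u: int, out: bytearray) -> None:
    """LEB128-style: 7 data bits per byte, high bit set on all but the last byte."""
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)


def encode_series(values: list[int]) -> bytes:
    """Delta-encode, zigzag, then varint-encode an integer time series."""
    out = bytearray()
    prev = 0
    for v in values:
        encode_varint(zigzag(v - prev), out)
        prev = v
    return bytes(out)


def decode_series(blob: bytes) -> list[int]:
    values, prev, u, shift = [], 0, 0, 0
    for byte in blob:
        u |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes follow for this value
            continue
        prev += unzigzag(u)  # undo zigzag, then undo the delta
        values.append(prev)
        u, shift = 0, 0
    return values


readings_mv = [12000, 12003, 12001, 12001, 11998]  # hypothetical samples
blob = encode_series(readings_mv)
print(len(blob), "bytes instead of", 4 * len(readings_mv))  # 7 bytes instead of 20
assert decode_series(blob) == readings_mv
```

The first sample costs 3 bytes because it is a delta from zero. Every later sample fits in one byte as long as consecutive readings differ by less than 64 in either direction.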

Cheers!

jalayir | 8 years ago

The first example (tree represented in Japanese) seemed a bit misleading, because the "alphabet" was not held constant. The Japanese character set is much larger, so one could argue that "本" and "tree" occupy about the same number of bits in storage. Could someone clarify whether this reasoning is correct?
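A quick way to check this reasoning is to measure both words in two common encodings. UTF-8 and UTF-16 are chosen here as examples; the article doesn't commit to an encoding. In UTF-8, 本 takes 3 bytes against 4 for "tree", while in UTF-16 it takes 2 against 8. So once bytes are involved, the saving depends heavily on the encoding.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


for word in ("本", "tree"):
    for encoding in ("utf-8", "utf-16-le"):  # -le variant: no byte-order mark added
        size = encoded_size(word, encoding)
        print(f"{word!r} in {encoding}: {size} bytes ({size * 8} bits)")
```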

wdfx | 8 years ago

One could argue that in information theory terms, there is more information encoded in a single "本" than a single "T".
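"More information" can be made concrete with a simple order-0 model. A symbol drawn uniformly from an alphabet of N symbols carries log2(N) bits. More generally, the Shannon entropy of a text's symbol frequencies is a lower bound on the average bits per symbol for any symbol-by-symbol code built from those frequencies. The sketch below computes that entropy. The function name and the 2,000-character alphabet size are illustrative assumptions.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text: str) -> float:
    """Order-0 Shannon entropy: sum of p * log2(1/p) over symbol frequencies."""
    if not text:
        return 0.0
    n = len(text)
    return sum((c / n) * math.log2(n / c) for c in Counter(text).values())


print(entropy_bits_per_symbol("abcd"))  # 2.0: four equally likely symbols
print(entropy_bits_per_symbol("aaaa"))  # 0.0: a single repeated symbol
# Bits per symbol for a uniform 26-letter alphabet vs. a hypothetical 2,000-character set
print(math.log2(26), math.log2(2000))
```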

However, this article is dealing with the concept of compression in terms of a simple symbolic representation of data.

unwttng | 8 years ago

Certainly. I deliberately didn't get into bytes and encoding until after this. I was trying to get across the softer idea that, in terms of space on a page written with a pen, you've saved something.

jedimastert | 8 years ago

I really like the article. One point I'd like to see addressed: you use palettes without mentioning it, even though that takes you from 3 (or 4) bytes per pixel down to 1/4 of a byte. That's a compression ratio of 12 (or 16) that you're completely ignoring!
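Here is the palette arithmetic, assuming "1/4" means a quarter of a byte per pixel, i.e. a 4-color palette. Each pixel then needs 2 bits instead of 24 (RGB) or 32 (RGBA), a ratio of 12 or 16. The palette, the pixels, and the packing order (first pixel in the high bits) are assumptions for illustration.

```python
def palette_bits(num_colors: int) -> int:
    """Bits per pixel needed to index a palette of num_colors entries."""
    return max(1, (num_colors - 1).bit_length())


def pack_2bit(indices: list[int]) -> bytes:
    """Pack palette indices (0-3) four per byte, first pixel in the high bits."""
    out = bytearray()
    for i in range(0, len(indices), 4):
        byte = 0
        for k, idx in enumerate(indices[i:i + 4]):
            byte |= (idx & 0b11) << (6 - 2 * k)
        out.append(byte)
    return bytes(out)


# Hypothetical 4-color image of 8 pixels, first stored as 24-bit RGB.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]
pixels = [palette[i] for i in (0, 1, 1, 2, 3, 3, 0, 1)]
rgb_bytes = len(pixels) * 3
indices = [palette.index(p) for p in pixels]
packed = pack_2bit(indices)
print(rgb_bytes, len(packed), rgb_bytes / len(packed))  # 24 2 12.0
```

A real image format would also have to store the palette itself (here 4 × 3 bytes). That overhead matters for tiny images but vanishes for large ones.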

taco_emoji | 8 years ago

> This might eek out a little more compression

It's "eke": http://www.dictionary.com/browse/eke

unwttng | 8 years ago

Woah, TIL

alexkadis | 8 years ago

This is a fantastic explanation. Thanks! (Also, I love the gifs)

ythn | 8 years ago

Add a way to toggle the gifs off at the beginning for people that don't like memes distracting them from the content and you've got yourself a winner.

unwttng | 8 years ago

Done and done

djhworld | 8 years ago

Great article. I'd be interested in more in-depth articles on, say, JPEG compression in the future.

I really liked the better/worse thing too; it added a nice comparison.

rcanepa | 8 years ago

I came here to ask the same. Can anyone recommend more in-depth articles about compression?

komali2 | 8 years ago

Fantastic article. I really enjoyed the inclusion of a live, clickable demo with example output. Absolutely well done (after I disabled the GIFs).

66 comments