
Chrome has transcended version numbers

318 points | AndrewDucker | 15 years ago | codinghorror.com | reply

86 comments

[+] wladimir|15 years ago|reply
That upgrader using binary differences (courgette) is impressive. From 10 megabytes to 78 kilobytes. I wonder why Linux distributions such as Ubuntu still download the entire new packages on an upgrade. A lot of upgrade time and bandwidth could be saved by only sending the differences. And it would reduce load on the mirror sites.

Edit: did a bit of looking around and it seems to be planned for Oneiric Ocelot

https://blueprints.launchpad.net/ubuntu/+spec/foundations-o-...
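The rough idea behind any delta updater fits in a few lines. This toy prefix/suffix diff is nothing like Courgette (which first disassembles the executable so that compiler-shifted addresses diff cleanly), but it shows why shipping only the changed bytes can shrink an update dramatically:

```python
# Toy delta updater: ship only the bytes that differ between the old
# and new blobs, plus enough bookkeeping to reassemble the new one.
# Illustrative only -- real tools (bsdiff, Courgette) are far smarter.

def make_patch(old: bytes, new: bytes) -> tuple[int, int, bytes]:
    """Return (prefix_len, suffix_len, changed_middle_of_new)."""
    p = 0  # length of the longest common prefix
    while p < min(len(old), len(new)) and old[p] == new[p]:
        p += 1
    s = 0  # longest common suffix that doesn't overlap the prefix
    while s < min(len(old), len(new)) - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return p, s, new[p:len(new) - s]

def apply_patch(old: bytes, patch: tuple[int, int, bytes]) -> bytes:
    p, s, middle = patch
    # Reassemble: unchanged prefix + shipped middle + unchanged suffix.
    return old[:p] + middle + old[len(old) - s:]

# A 1-byte change inside ~1 KB of data yields a 1-byte patch payload.
old = b"HEADER" + b"\x00" * 1000 + b"FOOTER"
new = b"HEADER" + b"\x00" * 500 + b"X" + b"\x00" * 499 + b"FOOTER"
patch = make_patch(old, new)
assert len(patch[2]) == 1
assert apply_patch(old, patch) == new
```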

[+] nuclear_eclipse|15 years ago|reply
> I wonder why Linux distributions such as Ubuntu still download the entire new packages on an upgrade. A lot of upgrade time and bandwidth could be saved by only sending the differences. And it would reduce load on the mirror sites.

Speaking as someone who has worked on this problem for my own projects [1], I think I can answer this.

Fedora/Yum already supports downloading a binary diff between RPM packages to reduce download size, but this also requires keeping a cache of previous RPMs to run the patch against. There are multiple reasons why you can't rely on binary diffs against the files actually stored on the system, not least for files like /etc/* that have more than likely been modified since installation.

But the real problem with binary diffs is that unless you're doing what Google does to ensure that people stay up to date, the number of binaries you need to diff against grows very quickly, and there are a lot of edge cases to take care of.

For example, let's assume some package A has been released as version 1, 2, and 3. When A has a new release 4, you obviously want to build a diff against release 3, but then you also most likely need or want to build a diff against 2 and maybe even 1 to take care of people who haven't already upgraded to 3. And even if you build a diff against every single version ever released, you will still always need to provide a full version of the package as well for two cases:

1. New installations, or reinstallations, of the package.

2. When the user has cleared their package cache to save room.

And even beyond that, creating diffs involves a lot more effort and knowledge on the part of the packaging team because they not only need to know how to build those diffs, but they also need to keep track of old package versions to build those diffs against.

The end result is that you trade download bandwidth and time on part of the server and end users for a lot of effort, time, and storage space on part of the packagers and distro mirrors. For mirrors that are already encroaching on 50GB for a single release of Ubuntu and/or Fedora, adding a whole bunch of binary diff packages will most likely grow the repository size by at least 30-50%, if not more, depending on how many old versions you diff against.

The question then becomes: does this trade-off actually make sense, or does it present further roadblocks to contribution from packagers and donated mirrors?

[1]: If you would like to see how I handled this sort of task, I have a Python library I wrote to handle the client side updating. I know it's not the entire piece of the puzzle because it doesn't cover generating the updates, but it might be useful for someone else. http://github.com/overwatchmod/combine
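The client-side fallback logic described above (use a published delta when a cached base exists, otherwise download the full package) can be sketched like this; all names and sizes here are illustrative, not any real package manager's API:

```python
# Hedged sketch: given the deltas a server chose to publish and the old
# packages the client still has cached, decide what to download.

def choose_download(target: str,
                    cached: set[str],
                    deltas: dict[tuple[str, str], int],
                    full_size: int) -> tuple[str, int]:
    """Return ("delta" | "full", bytes to download)."""
    # Usable deltas: those targeting our version with a cached base.
    candidates = [size for (old, new), size in deltas.items()
                  if new == target and old in cached]
    if candidates:
        return "delta", min(candidates)
    # No usable delta (fresh install, cleared cache, too-old version):
    # fall back to the full package, which must always be published.
    return "full", full_size

deltas = {("3", "4"): 80_000, ("2", "4"): 450_000}  # published diffs
assert choose_download("4", {"3"}, deltas, 10_000_000) == ("delta", 80_000)
assert choose_download("4", {"1"}, deltas, 10_000_000) == ("full", 10_000_000)
```

This also makes the packager's cost concrete: every pair in `deltas` is an extra artifact the mirrors must build and host.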

[+] cubicle67|15 years ago|reply
er yeah. someone want to show this to the guys at Apple who push out the XCode updates?
[+] cookiecaper|15 years ago|reply
For the record, pacman already supports diffs like this, but Arch has not set up official mirrors that host binary diffs instead of full packages. There has been at least one third-party repository that hosted diffs for ArchLinux packages.

I too think this should be a much higher priority than it currently is. Fedora has had (non-default) support for this for years, but that's about it. You shouldn't worry so much about diffing against every previous version -- if you diff against just the last two versions, it won't use much extra disk space, and the worst-case scenario is that someone has to download the full package as a fallback, which is what everyone has to do now.

[+] scoopr|15 years ago|reply
Lately, I've been less bothered about the download sizes of updates, and more bothered by the update-install times. It takes under a minute to download the usual 100-1000MB update, but then it's 5-15min to install it, be it Ubuntu, PS3 firmware or Xcode 4. Providing bigger stuff on readily usable squashfs images instead of tarballs, even if vastly bigger, might actually make my update times shorter.
[+] vijaydev|15 years ago|reply
The launchpad link returns a 404.
[+] p4bl0|15 years ago|reply
Maybe people are looking at Chrome's version numbering the wrong way. Take GNU Emacs for instance. At some point the developers realized that their software would never be the subject of a change in nature big enough to change the major version number, so they ditched it. Now we have Emacs 23 but it's actually Emacs 1.23, and nobody complains.

I think it's really a non-issue and it's not really worth talking about: Chrome just doesn't display the '1.' (or '0.' depending on your view point ^^) in front of its version number :-).

[+] dkersten|15 years ago|reply
Yeah, it's more a release number. It makes sense to use a single positive integer and simply increment it every release.

Having said that, though, I quite like Semantic Versioning[1]. The advantage it has over a single incrementing counter is that you know when API compatibility changes.

[1] http://semver.org/
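For illustration, the core of the semver idea (hedged: the full spec also covers pre-release and build metadata, which this skips) fits in a few lines:

```python
# Minimal sketch of Semantic Versioning: MAJOR.MINOR.PATCH, where only
# a MAJOR bump signals an incompatible API change.

def parse(v: str) -> tuple[int, int, int]:
    major, minor, patch = (int(x) for x in v.split("."))
    return major, minor, patch

def breaks_api(old: str, new: str) -> bool:
    # MAJOR bump => callers may need changes; MINOR/PATCH should be safe.
    return parse(new)[0] > parse(old)[0]

# Tuples of ints compare numerically, so "1.10.0" sorts after "1.4.2"
# (a naive string comparison would get this wrong).
assert parse("1.4.2") < parse("1.10.0")
assert not breaks_api("1.4.2", "1.10.0")  # minor bump: compatible
assert breaks_api("1.10.0", "2.0.0")      # major bump: may break callers
```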

[+] stcredzero|15 years ago|reply
This is exactly the sort of visionary engineering needed to break the field into the next stage. This isn't just a quantitative difference, it's a revolutionary qualitative difference!

Our online infrastructure is broken in ways we're dimly aware of, because it has always been that way. In the same way that people trying to do business demand network, electric, and roadway infrastructure that once didn't exist, we will someday demand software infrastructure with features that do not exist today.

Chief among these will be security features. If Google plays their cards correctly, they can create an ecosystem that stays ahead of the black-hat hackers. By correctly incentivizing white-hat hackers, they could expose and patch security holes fast enough to ruin the economics of the black-hats. This infrastructure will enable Google to make more money, resulting in a virtuous cycle.

If the infrastructure can be extended to the server-side, with web app frameworks that receive security updates with equal rapidity, then Google can establish a secure, smoothly running "toll road" -- an infrastructure subset relatively free from problems faced by the rest of the net. That could be worth billions.

(We'll know this strategy is winning if/when Microsoft starts doing it too. Once that happens, we'll be in a new era of computing.)

[+] thebooktocome|15 years ago|reply
There's a bicycle shop in the area called "virtuous cycles" and this is the first time I've realized that they mean the antonym of "vicious cycle". I always thought they were just vaguely religious.
[+] Splines|15 years ago|reply
> if/when Microsoft starts doing it too

They already do it with Automatic Updates. Turn the update dial to 11 and let your machine apply them at night. I don't believe they provide binary diffs for updates, but I believe it's for logistical reasons rather than technological (e.g., title updates over XBL are surprisingly small).

Of course, MS also hasn't figured out how to update components in-place while they're being used, so expect your machine to be restarted in the morning. :-/

[+] masklinn|15 years ago|reply
> Somehow, we have to be able to automatically update software while it is running without interrupting the user at all. Not if -- but when -- the infinite version arrives, our users probably won't even know.

For what it's worth, this is already available in Erlang (although it was built in for different reasons, closer to getting the fluidity of web applications updates on just about any server software): two versions of the same code can live in parallel in the VM, and there are procedures for processes to update to "their" new version without having to restart anything (basically, you switch functions mid-flight and the next time an updated function is called the right way, the process just switches to the new code path).

You need to follow a few procedures and may have to migrate some state, but by and large it's pretty impressive. And it could certainly be used for client-side software. The sole issue I'd see would be the updating of a main GUI window in-flight (how do you do that without closing and re-opening it?). But I doubt that one changes much in e.g. Chrome these days.
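For what it's worth, that "switch on the next call" behavior can be crudely imitated in most dynamic languages with an indirect call through a mutable binding. A toy Python sketch (nothing like Erlang's actual mechanism, which also handles two code versions coexisting and purging the old one):

```python
# Toy "hot swap": the process never calls its handler directly, only
# through a mutable binding, so rebinding it takes effect on the very
# next call without restarting the loop that uses it.

def handler_v1(x: int) -> int:
    return x + 1        # old behavior

def handler_v2(x: int) -> int:
    return x * 2        # new behavior, swapped in at runtime

class Process:
    def __init__(self, handler):
        self.handler = handler

    def step(self, x: int) -> int:
        # Indirect call: whatever is bound *now* is what runs.
        return self.handler(x)

    def upgrade(self, new_handler) -> None:
        # "Load new code": future calls take the new code path.
        self.handler = new_handler

p = Process(handler_v1)
assert p.step(3) == 4   # old code path
p.upgrade(handler_v2)
assert p.step(3) == 6   # next call already runs the new code
```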

[+] jerf|15 years ago|reply
I've caught some flak on HN recently for saying it's past time to move beyond C, but this is a great example from the real world. Yes, you can update C code live with the proper magic invocations, but you have to be a wizard, and even then probably still a bit lucky, for it all to work. Or you can build an infrastructure that contains this ability at the most primitive layer, and then create systems where the programmer is encouraged to maintain the invariants that permit this update. It may still not be trivial and still requires thought, but it no longer requires a wizard, so maybe it will actually happen.

And there are just so many features like this that we need to get all the way down to the OS level before we can fully harness them, and we aren't going to get them in C. We need an upgrade of our fundamental programming primitives.

[+] jamii|15 years ago|reply
Following ideas from both Erlang and Android, each GUI screen within the application would be attached to a single process. When a new screen is opened, the process is killed and a new process starts. If new code has been loaded, the new process will pick it up.
[+] omh|15 years ago|reply
There are disadvantages to constant, automatic updates.

I had a call from someone who'd been using Chrome to regularly print a web page, and one day it just stopped working. The site hadn't changed, but for whatever reason the latest version of Chrome just didn't render it. And of course trying to install an older version of Chrome was quite difficult.

(In Google's case they do now have a way to disable the updates, but not all software is so good about it)

[+] jalada|15 years ago|reply
As frustrating as it is, this is the sort of thing that results in massive corporations stuck on IE6.
[+] jinushaun|15 years ago|reply
I experienced this yesterday. My website renders differently in Chrome now than it did last month. Firefox and IE still look "correct". I used to think the auto-update feature on Chrome was great, but now I'm not sure. I can see why some companies still stick to IE 6 internally. It's stable.
[+] wccrawford|15 years ago|reply
I stopped looking at Chrome's version numbers (unless I have a specific issue or question about Chrome) back around 9. That's because 9 was the last development version I used... The features I need are all in the stable release now. When 10 came out, my 9-dev turned into 10-stable and I didn't pay attention from there.

At this point, I don't even bother 'updating' (read: close the browser and open it again) for up to a week or 2 after an update comes out, unless I need to close my browser for some other reason.

[+] qjz|15 years ago|reply
Oh, how I wish I had this issue with Android! I'm currently locked at version 1.6...
[+] code_duck|15 years ago|reply
No doubt Google would be glad to develop and update Android in the same way... if we could only get the carriers to step aside.
[+] lmarinho|15 years ago|reply
Apple's App Store, for both Mac and iOS, could learn a thing or two from this; their software update experience is awful, sometimes requiring you to re-download whole multi-gigabyte apps for minimal updates.
[+] sthulbourn|15 years ago|reply
Not only should Apple do this for iOS apps, they should do it for Xcode. It's 5GB EVERY TIME, it's like a conspiracy...
[+] sthulbourn|15 years ago|reply
It would also be awesome if Apple could use this on iOS sync cycles.
[+] kolektiv|15 years ago|reply
There are software systems which do get updated while running though, but perhaps it requires a change in software architecture more than just (very clever) diff tools. Erlang systems, for instance, can have the concept of hot code swapping baked in to them in a more predictable way because that requirement is part of the base system - application life cycle is built in to the platform, not on top of it. Of course, for systems such as telecoms switching, the complexity and cost of this was worthwhile. For browsers... perhaps not. Cost/Benefit analysis is probably the usual trusted friend. What would we hope to gain (and how would we measure it) by letting browsers never restart?
[+] ck2|15 years ago|reply
So how do you roll back with Chrome when it breaks a plugin for example?

I guess this means for ignorant users this is good but for power-users we are having more and more control taken away from us.

Personally, I disable all of Chrome's phoning home because it's impolite and does it too many times per day, and I have no easy way to verify exactly what it's sending all those times.

[+] MikeKusold|15 years ago|reply
It seems as though Google is trying to eliminate this by supporting their own plugins. Flash has been shipping with Chrome for a while, and a PDF reader has been shipping for months.

Those two plugins have 90% of people's plugin needs covered.

[+] stcredzero|15 years ago|reply
> I guess this means for ignorant users this is good but for power-users we are having more and more control taken away from us.

You can always use Firefox or some other browser instead.

Plugins are an architectural mistake. They should go away and the Chrome team is doing the right things to make it happen.

[+] Typhon|15 years ago|reply
Would they really stop at Chrome infinity ? I'm pretty sure they would make a version aleph one next. And so on.
[+] arkitaip|15 years ago|reply
This is slightly offtopic, but WordPress's built-in update feature only works if you have FTP on your server. If you've disabled FTP for security reasons, updating becomes a manual process. I wish the WP devs would use patch or some other CLI-friendly solution.
[+] nbpoole|15 years ago|reply
"If you've disabled FTP for security reason updating becomes a manual process."

The automatic update process will also work if the webserver has write permissions on the files (which is a bad idea in the first place). But if it doesn't and you can't/won't give it FTP credentials for a user that does, you do need to go through a somewhat manual process.

Personally, I use vendor branching (http://svnbook.red-bean.com/en/1.1/ch07s05.html) in both SVN and Git in cases like this. I don't have to rely on the developers to generate a patch: I get all of the changes pulled directly from my local repository.

[+] dave1010uk|15 years ago|reply
Agreed. I normally find the list of changed files from a blog post, download the .tar.gz and add them to my local git repo manually. I guess I could get the current SVN tag but doubt that would be simple with my setup.
[+] griffbrad|15 years ago|reply
If you install the ssh2 PECL extension, the updater will also allow you to update over SSH/SFTP.
[+] evangineer|15 years ago|reply
Made a similar observation yesterday. Only times that the Chrome version has mattered in my recent experience have been with regards to the recent WebGL security hole and with Native Client.
[+] kfool|15 years ago|reply
Here is how I see things:

1. Updates should not have to be applied in sequence.

It is better to produce a binary diff between any two versions, and apply only that (one) binary diff. The reason for this isn't efficiency, but semantics. Updates not only fix things, but break things. Meaning, updates corrupt application state (data), both in-memory and on-disk. It can be disastrous to apply an intermediate update that removes state, only to realize that a future version reversed the semantics and needs to use that state (which was available, but is now gone).

Preserving backward compatibility is important, which means the ability to skip some version updates is necessary. To the extent possible, reversing updates is important too.

2. The ideal update system should apply updates live, not offline.

With a model that accounts for updating the entire state of an application, updating live is possible. The reason most updates are not applied live yet is that the model is not descriptive enough to change the entire state of the running application.

Notable state that should be updated, but often isn't, is continuations and the stack. This is why GUI applications need to be shut down to update.

Scheme's call/cc (call-with-current-continuation) solved making changes to continuation and stack state decades ago, and did so better than Erlang does: Erlang cannot force stacks to unroll or continue from arbitrary points.

3. Updates must be produced with source code and programmer input.

Updates should not be produced with binaries as input.

The reason is the need to account for application semantics, which binaries do not expose in the detail source code does. Even though automated, sophisticated semantic diffing based on control flow can be developed, it is sometimes inconclusive whether an update will break things.

4. It is necessary for programmers to provide live update guidance.

In the cases where producing provably safe dynamic updates is not possible, it is input from the programmer that can clear any conservatism of the safety certification process.

Tools are needed for programmers to reason about the semantic safety of their live updates, integrated in the development process. Including tools that help transform application state between versions.
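Those state-transformation tools might look, in miniature, like a chain of programmer-written migration functions, one per version step. A hypothetical sketch (all field names invented for illustration):

```python
# Hedged sketch of programmer-supplied state migration: each upgrade
# ships a function that rewrites the previous version's state into the
# new version's shape, so a live update can carry state across.

MIGRATIONS = {
    (1, 2): lambda s: {**s, "retries": 3},           # v2 added a field
    (2, 3): lambda s: {k: v for k, v in s.items()
                       if k != "legacy_flag"},       # v3 dropped one
}

def migrate(state: dict, from_v: int, to_v: int) -> dict:
    # Apply each per-step migration in order, from_v up to to_v.
    for v in range(from_v, to_v):
        state = MIGRATIONS[(v, v + 1)](state)
    return state

s1 = {"url": "https://example.com", "legacy_flag": True}
assert migrate(s1, 1, 3) == {"url": "https://example.com", "retries": 3}
```

(This chains per-step migrations, much like Erlang's code_change callbacks; point 1 above would argue for also shipping direct v1-to-v3 transforms so intermediate versions can be skipped.)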

[+] cstrouse|15 years ago|reply
I'm a big fan of their frequent updating even if the version bumps do get out of hand. Thanks Google for continuous improvements and updates!
[+] fendrak|15 years ago|reply
Being a software developer sometimes feels like an especially thankless position -- if you're doing your job well, users never think of you.