The user-agent has two implied pieces of functionality:
1) Describe the device that the agent is coming from (operating system)
2) Describe the capabilities of the agent (this browser, those plugins)
One of the things I loathe about the User-Agent header is the lack of a reasonable maximum length, and the inconsistent way in which developers have overloaded the value. Parsing it is difficult, especially since the unbounded length leaves a lot of scope for bad input.
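To make that concrete, here is a representative (not verbatim) desktop Chrome UA string of the era, and a naive product-token scan showing how many different "browsers" a single string claims to be:

```javascript
// Representative desktop Chrome UA string, circa 2012 (illustrative).
const ua = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 ' +
           '(KHTML, like Gecko) Chrome/21.0.1180.83 Safari/537.1';

// Naive scan for Name/Version product tokens.
const tokens = ua.match(/[A-Za-z]+\/[\d.]+/g);
console.log(tokens);
// One string claims to be Mozilla, AppleWebKit, Chrome and Safari at once.
```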
I would love to see user agent be a virtual header comprised of other headers.
The other headers would not be mandatory, but as most browsers would provide them you could reasonably use them in most cases.
These other headers may be things like:
os: Windows
os-version: 7
client: Gecko
client-version: 16
plugins: [{'flash':11}]
Basically... the same info, but with more structure and known acceptable types for certain values.
Since headers take up uncompressed space, it would also be helpful if shorthand names were accepted: c-v for client-version, etc.
This is me thinking aloud, and perhaps it's an idea that has been thought of before and rejected... but by offering User-Agent as a virtual header composed of all of the other headers, you maintain some backward compatibility whilst providing something easier to parse, use and trust for developers.
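As a sketch of what server-side handling could look like if these hypothetical headers existed (the names os, client, c-v and so on are the suggestion above, not any real standard):

```javascript
// Hypothetical: if the structured headers sketched above existed,
// server-side handling would be a simple lookup rather than UA parsing.
// All header names here are the commenter's proposal, not a real spec.
function clientInfo(headers) {
  return {
    os:      headers['os']             || null,
    osVer:   headers['os-version']     || null,
    client:  headers['client']         || headers['c']   || null,
    version: headers['client-version'] || headers['c-v'] || null,
  };
}

const info = clientInfo({ os: 'Windows', 'os-version': '7',
                          client: 'Gecko', 'c-v': '16' });
// → { os: 'Windows', osVer: '7', client: 'Gecko', version: '16' }
```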
The problem with "fixing" the user-agent string is that making it easier to parse/use only means web developers will find it easier to continue to abuse it.
In an ideal world web-developers should be testing if individual pieces of functionality exist rather than inferring what is supported based on the browser.
I think JS does a fairly good job of allowing developers to test for functionality; unfortunately, CSS does not. I am well aware that CSS is meant to "fail gracefully", but many developers want to supply alternative looks where functionality isn't available, and CSS doesn't lend itself to that.
So you wind up inferring CSS support from JS support which is just as broken as inferring JS support from the browser's version/name/platform.
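On the JS side, feature detection is basically object probing; a minimal sketch, with a stubbed environment standing in for window so it runs anywhere (the probed names are just examples):

```javascript
// Feature detection: test for the capability itself rather than
// inferring it from browser name/version. `env` stands in for
// `window` so the sketch is runnable outside a browser too.
function supports(env, path) {
  const value = path.split('.').reduce(
    (obj, key) => (obj == null ? undefined : obj[key]),
    env
  );
  return value !== undefined;
}

// Stubbed environment for demonstration; in a browser, pass `window`.
const env = { navigator: { geolocation: {} }, localStorage: {} };

supports(env, 'navigator.geolocation'); // true
supports(env, 'applicationCache');      // false
```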
The user agent string is too loaded with backwards compatibility to remove or change. So the next best thing to do is supersede it - add a new agent-id or some such which is mandated by standard to be in the form "BrowserName/Version", e.g. "Chrome/22.0" or "Firefox/15.0.1", while keeping the old user-agent. Problem is I guess it's not really worth it - it doesn't expose any new information not already in the user agent, and it doesn't stop site authors relying on specific agent-ids. So I guess the way forward is try to ignore the user agent completely and just use feature detection.
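A sketch of how trivially such a mandated "BrowserName/Version" token could be parsed, assuming a hypothetical agent-id header of exactly that form (no such header exists):

```javascript
// Parser for the hypothetical mandated "BrowserName/Version" form
// suggested above -- a few lines, versus the mess of today's UA string.
function parseAgentId(value) {
  const m = /^([A-Za-z][\w ]*)\/(\d+(?:\.\d+)*)$/.exec(value.trim());
  return m ? { name: m[1], version: m[2] } : null;
}

parseAgentId('Chrome/22.0');    // { name: 'Chrome', version: '22.0' }
parseAgentId('Firefox/15.0.1'); // { name: 'Firefox', version: '15.0.1' }
```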
I wonder why browsers with a modern automatic-update process don't set their user-agent to something that discards all this madness ("Chrome/23.4.5678 (Windows)", or similar) for the cutting-edge/nightly builds only (or even betas, if they wanted to discourage casual users from switching to them, but I don't think that's the case at this point). Surely their users have signed up for a little breakage in exchange for the latest features? And if they actually get website operators to stop or at least fix their sniffing, the whole prisoners-dilemma situation would disappear.
(I guess this assumes that the huge user-agent that my Chrome is currently sending is necessarily bad, and in the real world maybe no one really cares...)
When we actually do this, it does not necessarily convince publishers to fix things. For example: for several months Mozilla has been testing one tiny reduction to the User-Agent string in Firefox Nightly builds (replacing "Gecko/20100101" with "Gecko/16.0"). Zillow.com is the highest-profile site that is broken by this change, and after five months they still haven't even responded to any of our attempts to contact them: http://bugzil.la/754680
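A hypothetical illustration of the kind of sniffing such a change breaks (not Zillow's actual code): a site that assumes the Gecko token is an 8-digit build date stops recognizing Firefox entirely:

```javascript
// Hypothetical sniffing code that assumes the Gecko token is an
// 8-digit build date -- exactly the assumption the Nightly change breaks.
const isGecko = ua => /Gecko\/\d{8}/.test(ua);

const oldUA = 'Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0';
const newUA = 'Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/16.0 Firefox/16.0';

isGecko(oldUA); // true
isGecko(newUA); // false -- the site suddenly stops recognizing Firefox
```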
It's much better to resist adding things to the UA in the first place, since removing anything later on is a huge pain and inevitably breaks things for users. Mozilla has managed to keep the UA relatively minimal (and successfully reduced it a bit in Firefox 4): https://developer.mozilla.org/en/Gecko_user_agent_string_ref...
You're implying that if nightly builds of a browser with a simplified UA broke a website that the website owners would fix their code, but that is unlikely to happen. Most websites, particularly the sort with bad UA sniffing, have a high cost to change (engineering, QA, making releases) and no incentive ("it broke on the new Chrome, probably a Chrome bug").
Even a relatively flexible company like Google gets UA sniffing wrong for many of its domains. At one point (as an author of Chrome and an employee of Google) I tried to track down the right people to get things fixed and ran into more or less the above problems. (The non-Chrome non-Safari webkit browsers these days must spoof Chrome to not fall into some "other" browser bucket.)
1. I think the most interesting thing about that blog post is that it illustrates how the incentives in standards building get warped. I like to describe this sort of thing as "the effect of economics on programming" - not because there is money involved, but because of the nature of the incentives.
2. Graceful degradation. We've sniffed UAs from the minute they were invented. Any change whatsoever would create untold problems for untold millions of people. The UA is just an arbitrary string, so… who cares? Very few people (you and I are amongst these "very few") have to be concerned with this compared to the people such a change would affect.
It's because of 1 and 2 (my second point is really an instance of the first) that we're stuck with JavaScript. No one in their right mind thinks it's a good language, but getting all the different browser vendors to adopt a good bytecode would be nightmarish (and not necessarily in the interest of every browser vendor).
Imagine the problems that would cause! You have Chrome 58 Beta, and stuff works one way. Then they say it's good and release Chrome 58 final, and all of a sudden, stuff changes all over the web.
UA string is just one example of unfortunate hacks that evolved in the web protocols. Compared to probably everything else in HTML it's probably just not even worth it to consider fixing it. We'll always need the old string for compatibility, so it's really only to save a few lines of parsing. Compared to the nightmare of parsing rules for HTTP and HTML, it's not even relevant.
Not only the user agent, either. Try `navigator.appName` in any browser, and you'll get "Netscape". `navigator.appCodeName` in most browsers returns "Mozilla".
I wanted to try making an HTTP request from Telnet the other day. I tried Wikipedia, using the Host header. I got a 403 for not including a user agent, so I tried again with User-Agent: Telnet and it worked!
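For reference, the raw request described here looks something like this (the path is just illustrative; the blank line terminates the headers):

```javascript
// A hand-typed HTTP/1.1 request as you would enter it over telnet.
// Without the User-Agent line, Wikipedia answers with a 403.
const rawRequest = [
  'GET /wiki/HTTP HTTP/1.1',
  'Host: en.wikipedia.org',
  'User-Agent: Telnet',
  'Connection: close',
  '',
  '',
].join('\r\n');
```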
It's one of the most important headers for clients, since if you don't include it you might not get a 200.
In the particular case of Wikipedia, I think they check User-Agent to prevent people from unthinkingly wasting gigabytes of bandwidth scraping Wikipedia via tools like wget. In Wikipedia's case, better ways exist to download large quantities of their content in a more usable form.
I return a 403 if User-Agent or Host headers are missing. And my firewall will lock you out completely if you use "User-agent" instead of "User-Agent" (among many other obvious giveaways in the User-Agent header).
This ugly quagmire makes me wary of compatibility fixes where mimicking another browser is somehow involved. When i heard about non-WebKit browsers adopting -webkit CSS vendor prefixes, the user agent string mess was the first thing that came to mind.
The problem with the user agent is that you can't fix it without repeating the same cycle. All you'd do is make it easier.
y0ghur7_xxx | 13 years ago:
And then JavaScript-driven feature detection came to be, and everyone thought it was a good idea. And the people wrung their hands and wept.
evmar | 13 years ago:
The two instances of UA spoofing I know of in Chrome are for large sites -- Hotmail and Yahoo Mail. My vague memory of the Hotmail case is that Microsoft agreed to fix their code but said it'd take months to make the push. (http://neugierig.org/software/chromium/notes/2009/02/user-ag... , http://neugierig.org/software/chromium/notes/2009/02/user-ag...)
mnutt | 13 years ago:
Mike Taylor gave a talk about this and more at yesterday's GothamJS conference: http://miketaylr.com/pres/gothamjs/shower/
JBiserkov | 13 years ago:
All this could have been avoided if webmasters had used <noframes>, but I'm not sure when it was added to HTML.
FuzzyDunlop | 13 years ago:
It's a good depiction of the issues you have with trying to write code once, and have it work the same in many different environments, though. It's just with browsers, rather than operating systems or hardware.
Such is the evolution of the internet.