Search: .lenght - Github

[+] theli0nheart|14 years ago|reply

I remember seeing a Github bot a couple weeks ago that strips out whitespace and adds a .gitignore file to a repo (I also remember this really rubbing some people the wrong way). This search indicates that it would probably be useful to have a linter bot running on Github for all the popular languages. It would find syntax errors, common misspellings, and compilation issues, and then submit pull requests to fix the issues.

I have no time to work on something like this myself, but I'm sure a lot of people would find it useful, especially if it acted as a "first defense" before deployment. Curious what other HN'ers think about this.

[+] Mizza|14 years ago|reply

I wrote that bot!

https://github.com/Miserlou/WhitespaceBot

Feel free to fork it to do whatever you want, that's why I made it.

[+] tux1968|14 years ago|reply

Perhaps it would be more productive to advocate the use of local pre-commit hooks. Git makes it very easy to configure validation locally long before anything gets sent to Github.

Would be nice if Github provided better documentation and a selection of validation templates to include in new projects. This would better leverage the power of Git and its distributed nature than a bot running on Github.
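
As a rough illustration of the local-validation idea above, here is a minimal sketch of the check such a hook might run. The typo list and the `findTypos` helper are hypothetical, made up for this example, and not taken from any existing tool.

```javascript
// Hypothetical typo list (an assumption for this sketch); extend it for your own codebase.
const KNOWN_TYPOS = {
  lenght: "length",
  heigth: "height",
  hieght: "height",
  widht: "width",
};

// Return one finding per whole-word occurrence of a known typo in the source text.
function findTypos(source) {
  const findings = [];
  const pattern = new RegExp(`\\b(${Object.keys(KNOWN_TYPOS).join("|")})\\b`, "g");
  source.split("\n").forEach((line, index) => {
    for (const match of line.matchAll(pattern)) {
      findings.push({
        line: index + 1,
        typo: match[1],
        suggestion: KNOWN_TYPOS[match[1]],
      });
    }
  });
  return findings;
}

const sample = "if (items.lenght > 0) {\n  box.heigth = 10;\n}";
for (const f of findTypos(sample)) {
  console.log(`line ${f.line}: "${f.typo}" -> did you mean "${f.suggestion}"?`);
}
```

A real hook would live at .git/hooks/pre-commit, run a check like this over the staged files, and exit with a non-zero status to abort the commit when anything is found.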

[+] andrewcamel|14 years ago|reply

I actually just replied with a suggestion that Github should implement some built-in functionality for simple error-checking. I think it's a great idea. It would be really helpful if you got a simple little list of notifications for a commit indicating possible error points.

[+] zeratul|14 years ago|reply

I think this is actually interesting, but I would like to know whether >4k misspellings is a lot or not. Here is one way to measure it:

    LANGUAGE   #LENGHT    #LENGTH    = #LENGHT/#LENGTH
    JavaScript    4252    2907459    = 0.0015
    C            18981    2902857    = 0.0065
    Java          7706    2348900    = 0.0033
    Ruby         10789    1690604    = 0.0064
    C++           9458    1315552    = 0.0072
    PHP           3116    1167924    = 0.0027
    C#            1352     937647    = 0.0014
    Python        3662     737292    = 0.0050
    Ruby          1232     380484    = 0.0032
    Perl          1239     258892    = 0.0048
    Objective-C    679     238051    = 0.0029

P.S. There is something wrong with Github's language breakdown algorithm: sometimes it shows the same language twice with a different number of hits.
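
For anyone who wants to redo the arithmetic, the ratio in the last column is just the misspelled hit count divided by the correctly spelled one. A minimal sketch, using three rows copied from the table above (the `typoRatio` name is made up for the example):

```javascript
// Hit counts copied from the table above: [language, "lenght" hits, "length" hits].
const counts = [
  ["JavaScript", 4252, 2907459],
  ["C", 18981, 2902857],
  ["C#", 1352, 937647],
];

// Ratio of misspelled hits to correctly spelled hits.
function typoRatio(misspelled, correct) {
  return misspelled / correct;
}

for (const [language, lenght, length] of counts) {
  console.log(`${language}: ${typoRatio(lenght, length).toFixed(4)}`);
}
// JavaScript: 0.0015
// C: 0.0065
// C#: 0.0014
```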

[+] jond3k|14 years ago|reply

Another problem I noticed is that the highlighter will select the language name as well as the term, which means <?php and (c) Copyright are shown instead of the actual mistake.

I put together a GitHub Illiteracy Index script https://github.com/jond3k/sandbox/tree/master/github-illiter... which you can play around with if you like :D

[+] koenigdavidmj|14 years ago|reply

C# I can understand being so low, since it's almost always written in Visual Studio or MonoDevelop (both of which provide autocompletion). But how is JavaScript the next lowest?

[+] cleaver|14 years ago|reply

Odd that compiled languages (C, C++, Java) are higher than some interpreted languages (PHP, JavaScript). Of course, the search will match comments as well as code, so it may just mean they have better comments.

Also fun to search on "functino".

[+] rorrr|14 years ago|reply

    LANGUAGE   #LENGHT    #LENGTH    = #LENGHT/#LENGTH  
    C#            1352     937647    = 0.0014   <- best
    JavaScript    4252    2907459    = 0.0015
    PHP           3116    1167924    = 0.0027
    Objective-C    679     238051    = 0.0029
    Ruby          1232     380484    = 0.0032
    Java          7706    2348900    = 0.0033
    Perl          1239     258892    = 0.0048
    Python        3662     737292    = 0.0050
    Ruby         10789    1690604    = 0.0064
    C            18981    2902857    = 0.0065
    C++           9458    1315552    = 0.0072   <- worst

[+] josegonzalez|14 years ago|reply

For the record, Github's search index is wayyy out of date sometimes. The second user here is me and I deleted that user like two years ago: http://cl.ly/0y271f0T3G0X2J1L022E

[+] eik3_de|14 years ago|reply

Same here, I contacted support two times in two years and they said "we're working on it". Obviously, that isn't true and they just don't care about the outdated search index.

[+] jond3k|14 years ago|reply

    #  Language    Illiteracy
    1  C           0.02877583
    2  Perl        0.01635618
    3  Ruby        0.01560477
    4  JavaScript  0.01330989
    5  Shell       0.01235425
    6  Python      0.01046104
    7  PHP         0.00910218
    8  Java        0.00736395

(For height, length and hierarchy, averaged out)

And you thought this would end up being a PHP joke...

https://github.com/jond3k/sandbox/tree/master/github-illiter...

[+] xcud|14 years ago|reply

'wtf' is a good search term when coming into contact with a new codebase; https://github.com/search?type=Code&language=JavaScript&...

[+] timdorr|14 years ago|reply

"hieght" is also a good one: https://github.com/search?type=Code&language=JavaScript&...

[+] alpb|14 years ago|reply

not an attribute of a standard object type, though.

[+] angrycoder|14 years ago|reply

Even the search has a bug. The query is for ".lenght" but many of the highlighted results are just lenght without the dot.

[+] cpr|14 years ago|reply

Probably a regexp, so the dot matches any char...
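
Whether Github's search really treats the query as a regular expression is not confirmed anywhere in this thread, but the guess is easy to illustrate. In a regexp, an unescaped dot matches any character, so it would also hit "lenght" preceded by a space. The sample string below is made up for the example.

```javascript
const snippet = "var lenght = list.lenght;";

const unescaped = /.lenght/g;  // "." matches any single character
const escaped = /\.lenght/g;   // "\." matches a literal dot only

console.log(snippet.match(unescaped)); // [ ' lenght', '.lenght' ]
console.log(snippet.match(escaped));   // [ '.lenght' ]
```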

[+] southern|14 years ago|reply

A common typo, it seems. But I'm a bit confused as to why this was submitted.

[+] amirhhz|14 years ago|reply

In JavaScript, if you check for a non-existent property on a variable (e.g. aVar.lenght vs aVar.length) it will return "undefined". So people often rely on this behaviour to check if something is an array or not (no comment on whether this is good or bad), with:

    if(somethingThatMightBeAnArray.length){
        // do things with array
    }

So a misspelling of length can make a lot of code out there behave in unexpected ways.
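
To make the failure mode concrete, here is a small sketch. The helper names are invented for the example; `Array.isArray` is the standard built-in and is shown only as one more explicit way to do the type check.

```javascript
const items = ["a", "b", "c"];

console.log(items.length); // 3
console.log(items.lenght); // undefined: no error is raised for the typo

// Hypothetical helper with the typo: the guard is never truthy for a normal array.
function countWithTypo(value) {
  return value.lenght ? value.lenght : 0;
}

// Correctly spelled, and using Array.isArray for the type check itself.
function countItems(value) {
  return Array.isArray(value) ? value.length : 0;
}

console.log(countWithTypo(items)); // 0
console.log(countItems(items));    // 3
```

The misspelled version never throws; it silently treats every array as empty, which is why this kind of bug can survive for a long time.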

[+] kaffeinecoma|14 years ago|reply

In a static language this would be flagged as an error. I assume something less than ideal happens in languages such as Ruby.

I once worked at a company where a very early piece of code had a typo "properites" instead of "properties". This misspelling became institutionalized, and was used throughout the codebase because it was deemed too expensive to fix. And this was with a static language (with good IDE refactoring support)!

[+] strictfp|14 years ago|reply

I had this problem as a junior dev when my English was weaker. The problem stems from the fact that 'height' is spelled with 'ht', but 'width' with 'th'. Since one often writes those words in conjunction, it is easy to mix the endings up. If you're then a non-native speaker and don't run spellcheck on your code, you might end up writing 'lenght' and 'heigth' quite a few times. I know I did :)

[+] billpg|14 years ago|reply

My experience is more with languages that are typically compiled and would report this error as an error fairly early on, so the coder would correct it long before checking the code in.

What's the trade-off in having "undefined" returned instead of having an error reported as soon as the code is loaded?

[+] nostrademons|14 years ago|reply

It prevents you from later defining a 'lenght' method and using it at runtime without a recompile.

For core methods like 'length', it seems silly to think that you'd want to redefine it. And indeed, it's usually counterproductive - that's why any experienced JavaScript dev will have coding conventions like "Don't muck with the prototypes of built-in objects."

But at the application layer, this can be really useful. Imagine you're adding a new field to a message deep in the storage system, and then you want to pass that along to a template in the rendered HTML. It's really useful to be able to do this without recompiling & restarting each individual server between the backend and the frontend, and just edit a few template files and have them automatically pick up any changes to backend data formats.

Ditto adding a new database column, if you're using an RDBMS - it's pretty handy to have your model objects instantly reflect the new field, instead of needing to manually add accessors to each of your model classes. Rails and Django are built on this principle.

Also, you have a versioning problem with statically-compiled code in a distributed system. Imagine that you add this new 'lenght' field to a backend message, and add it to the frontend, and they both compile & deploy. Now imagine that a message from an old backend hits a new frontend (it's not possible to upgrade a whole distributed system at once without downtime). What does the new frontend do with it? It needs a piece of data, but the backend had no idea that it had to provide that piece of data. The only thing it can do is return the equivalent of 'undefined'.

In C++/Java code, you usually deal with these by inventing frameworks. Google code, for example, is littered with

  if (msg.has_new_field()) {
    run_long_complicated_ui_display_routine(msg.new_field());
  } else {
    fall_back_to_old_behavior(msg.old_field());
  }

checks. If you use a more dynamic language like Python, you can use language mechanisms to represent undefined values or fields that are defined at runtime. If you use a static language, you're stuck mimicking them with hashmaps and null.
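
For comparison, a minimal sketch of the same version-skew check in a dynamic language, here JavaScript. The message shapes and the `render` function are hypothetical, invented for this example.

```javascript
// Hypothetical message shapes: an old backend omits newField entirely.
const fromOldBackend = { oldField: "plain title" };
const fromNewBackend = { oldField: "plain title", newField: "rich title" };

function render(msg) {
  // A missing property simply reads as undefined, so the frontend
  // can handle both message versions without a schema change.
  if (msg.newField !== undefined) {
    return `new UI: ${msg.newField}`;
  }
  return `old UI: ${msg.oldField}`;
}

console.log(render(fromOldBackend)); // old UI: plain title
console.log(render(fromNewBackend)); // new UI: rich title
```

The flip side is the one this thread is about: a misspelled field name takes exactly the same path as a field that an old backend never sent.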

[+] aardvark179|14 years ago|reply

Whether your language is compiled is not the issue; it's how you model objects and calling methods on them. In Smalltalk and other languages that take a message-passing approach, doing a.b() sends a message "b" to object a, and the object can do anything it likes with that.

Now the normal (and optimized) route is to find the method in a’s method table and then call that, but if a doesn't have that method then a second method may be called to allow this to be handled. Once you have that sort of mechanism, you can make ORM libraries that dynamically examine a schema at run time and generate accessor methods only as they are needed; decorators, proxies and many other patterns become wonderfully simple, and there are often many more opportunities for meta-programming at run time.

The downside is of course that it becomes harder to find errors when writing or compiling, but tight integration of your development environment with your runtime can help with this.
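
JavaScript is not Smalltalk, but its standard `Proxy` object can approximate the fallback described above. This is only a sketch under that assumption: `makeRecord` and the `getXxx` accessor naming convention are made up for the example, not taken from any ORM.

```javascript
// Wrap a plain row so that unknown "getXxx" lookups are answered dynamically.
function makeRecord(row) {
  return new Proxy(row, {
    get(target, prop, receiver) {
      // Normal route: the property (or an inherited one) already exists.
      if (prop in target) return Reflect.get(target, prop, receiver);

      // Fallback route: synthesize an accessor for a matching column on demand.
      if (typeof prop === "string" && prop.startsWith("get")) {
        const column = prop.slice(3).toLowerCase();
        if (column in target) return () => target[column];
      }

      // Nothing matched: fail loudly instead of returning undefined.
      throw new TypeError(`record does not understand "${String(prop)}"`);
    },
  });
}

const user = makeRecord({ name: "Ada", email: "ada@example.com" });
console.log(user.getName());  // Ada
console.log(user.getEmail()); // ada@example.com

try {
  user.getLenght();
} catch (err) {
  console.log(err.message); // record does not understand "getLenght"
}
```

Note that the typo is still only caught when the line actually runs, which is the downside mentioned above.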

[+] joblessjunkie|14 years ago|reply

It should be possible to build a bot that automatically generates patches and pull requests for these kinds of typos.
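
The text-rewriting half of such a bot is small. A minimal sketch, assuming a hand-maintained list of property typos; the `FIXES` list and `proposePatch` are hypothetical names, and cloning repos or opening pull requests is left out entirely.

```javascript
// Hypothetical list of [pattern, replacement] pairs for misspelled property accesses.
const FIXES = [
  [/\.lenght\b/g, ".length"],
  [/\.heigth\b/g, ".height"],
];

// Return the changed lines as before/after pairs, ready to be turned into a patch.
function proposePatch(source) {
  const hunks = [];
  source.split("\n").forEach((line, index) => {
    let fixed = line;
    for (const [pattern, replacement] of FIXES) {
      fixed = fixed.replace(pattern, replacement);
    }
    if (fixed !== line) {
      hunks.push({ line: index + 1, before: line, after: fixed });
    }
  });
  return hunks;
}

const source = "for (var i = 0; i < list.lenght; i++) {\n  total += list[i];\n}";
for (const hunk of proposePatch(source)) {
  console.log(`@@ line ${hunk.line} @@`);
  console.log(`- ${hunk.before}`);
  console.log(`+ ${hunk.after}`);
}
```

A blind replacement like this would also rewrite a property that was deliberately named with the misspelling, so the resulting patches would still need human review.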

[+] j_baker|14 years ago|reply

Equally scary to me is "UFT8".

https://github.com/search?langOverride=&language=&q=...

[+] gren|14 years ago|reply

What about this one: https://github.com/search?type=Code&language=JavaScript&...

[+] veyron|14 years ago|reply

Someone wrote a spellchecker bot a while ago using a Perl spellchecker: http://blog.holdenkarau.com/2011/08/automatic-spelling-corre...

[+] eik3_de|14 years ago|reply

105395 results for heigth, now beat that ;)

[+] flexd|14 years ago|reply

And this is why we have testing frameworks.

[+] azth|14 years ago|reply

... and compiled languages. Testing won't ensure 100% code coverage.

[+] wahnfrieden|14 years ago|reply

And static analysis.

[+] davidmccann|14 years ago|reply

Recieve. Has to be my number one pet pieve.

https://github.com/search?type=Code&language=JavaScript&...

[+] mrchess|14 years ago|reply

This reminds me of a US company I worked with that outsourced some of their service layer work to a company with heavy European influence. As a result, API methods used British spellings of certain words, e.g. getColour() or getFavourites(). Good times.

[+] gus_massa|14 years ago|reply

In the LaTeX editor that I'm using (WinEdt), I have a custom color highlighting that marks \rigth and \heigth in red+bold+strikeout, so I don't have to wait to compile and see a strange error to spot the mistake.

[+] andrewcamel|14 years ago|reply

It'd be great if Github would scan your code for errors like these and just let you know they exist (in case you didn't want them to, which I would assume you wouldn't for the most part).
