I love this blog, but I'm not sure that I agree with this:
> In my mind, I’d just rather not have a reject! at all, and callers who need to mutate an array in-place can figure out how to do safely with respect to their own use cases.
We ought to have a standard version of the reject! code precisely because it's so difficult to get right. Saying application developers should solve the problem themselves just sounds like some sort of political move: 'sure they may get the edge cases wrong, but then it will be their fault not ours'.
One of the best things about Ruby is that having helper methods for every kind of possible operation frees application developers from re-solving these edge-case problems and allows them to just write their business logic.
I think the argument here is to embrace the immutability of arrays in Ruby. The reason why this is called `reject!` rather than `reject` is because the designers of Ruby wanted you to favor returning new objects rather than mutating objects in place. Sure, you might need to do it from time-to-time and I'm really happy we have methods that enable such behavior, but this brings to mind the old adage "just because you can, doesn't mean you should". The author might have been trying to say in that quote that app developers should find other ways of solving whatever problem they have without resorting to mutating objects in place. I don't think we should go as far as to not include such methods in Ruby if everyone finds them useful from time-to-time, but I also think we should be staying away from mutating objects in place, rather than returning new versions of said objects and using that.
Edit: I know this sounds like I am trying to dismiss the problem. That's not my point. I'm trying to say the problem is compounded by the choice to bundle the solutions to several different problems. If people have criticisms I would love to read a reply. </Edit>
> We ought to have a standard version of the reject! code precisely because it's so difficult to get right.
No, it's easy to get right in any specific case. It's iterating over a dang list, it's the first thing you learn how to do as a programmer. It's only hard to get right if you're trying to solve it for every use case with a single interface.
Essentially: you are taking two different algorithms and trying two switch between them behind a single interface. Meaty if-statements behind flags are an indication this is happening. It's cool of you want that but put it in a library and give it a GitHub per.
An example: I don't use a library to merge attributes on objects. I cut and paste one of a few snippets or write from scratch. Sometimes you want to mutate an object, sometimes you want a copy, sometimes you want to shallow copy or keep a reference to the parent. I could write a nasty complicated function that intelligently solved "the problem". But all that does is bundle together a bunch of different solutions to different problems and make it hard for me to understand which one it picked.
Edit 2: I could be persuaded that what I'm saying is just not Rubyish. I'm a JavaScript programmer (after many years of Ruby) so maybe there's some truth in that. I do think that distancing ourselves from the mechanics of list manipulation can be bad for our code quality.
Yeah, I like ruby (and especially rails) because it has stuff like this. I also like that most languages have sort() as part of the stdlib. If you need something specific, sometimes they come with more than one sort, or it would be easy enough to write yourself, but I appreciate and expect a high-level language to include the basics so I don't need to keep data structures and algorithms books within reach.
Rails, especially rails 5 with API-only mode is particularly great as well.
The problem with `reject!` is that it's too ad-hoc: the more fundamental operation is to filter the elements of a stream. The input stream could be generated from any data structure (not necessarily an array), and the output stream could similarly be redirected into any data structure (not necessarily an array, let alone physically the same array).
After years working with Ruby, I felt that "bang" methods should only be used for methods which mutate the receiver. Many methods which do do so do not have a "!" at the end.
I remember coming across it before when the left-pad incident happened, and marvelling at the fact there were already many entries there then. The problem is now, whenever a new entry on this blog gets linked, I find myself compelled to go back and read all the entries in it again. I can't help thinking that over time, that process is going to become prohibitively slow.
One of the more confusing parts of the C++ standard library algorithms is remove/remove_if. A new user would think calling remove(v.begin(), v.end(), value_to_remove) would result in the value being removed from the container[1]. Instead, they'll find that all it did was shift around the ordering of the vector and leave the elements you wanted removed in place at the end (if you read the linked article you can already see where this is going) which is a very unintuitive result. They might look up the function signature and notice it returns an iterator which is also surprising.
Even after discovering they have to call the container's erase method using the result of remove and an iterator to the end of the container[2] they'd probably just decide that remove just has a terrible design. It takes experience and some data structures knowledge to realize why it is designed the way it is.
The design of remove means every element that remains are only be shifted (in the case of an array/vector), at most, one time. The naive approach results in the problem described in the article where you are constantly shifting the elements forward as you progress through the container.
The usability of remove is still a problem (the concept shouldn't need a wikipedia page explaining it) but the way it was done was done for good reasons (within the design decisions of STL at least). It's probably fixed in the C++ ranges proposal but I haven't looked.
D's standard algorithm library remove[3] (which is very much based on Stepanov's STL too) goes one performance step further and lets you specify the swap strategy so that if stability isn't a requirement the number of shifts required is the absolute minimum possible (essentially moving elements from the end of the range into spots freed up by the items being removed).
1. Slightly more experienced users would quickly notice that it can't remove the value, it has no reference to the container to change its length.
My homegrown C dynamic array has remove() macro with the same purpose (I didn't know it is in c++). Except that it takes range instead of index and doesn't move to the end, because you checkout/delete item first, and then remove, not vice versa. It also has insert function that does guess what. The module takes around one screen.
Cool thing is that it is not C++, where remove is hardcore stl magic, and I can just iterate over elements' memory and remove them in sliding ranges, so multi-remove problem simply doesn't exist. Idk why c++ overengineers everything. We abstract a generic vector away from our convenience only to to store items in regular array, because it is the only fast solution for generic cases.
This is one of those things that makes perfect sense after you actually tried to
implement it, but not necessarily beforehand. I think this happens a lot with
C++ that design decision are quite well though through, but it is hard to
appreciate this without taking time to understand the rationale.
Same reason ruby has both "if" and "Unless", or capybara has .to_not as well as .not_to... there are times when the inverse expression is just more natural and less confusing.
Rubocop, on the other hand against what I've just said, will tell you in almost every case that you might use Unless, or in agreement with what I think it is that You are saying here, the preferred idiom is to not use the inverse, and you should only say "unless" when what you meant to say is unambiguously !if.
I've been using Ruby (in the context of Rails apps) for almost six months now, after a decade or so in mostly Python. Ruby is just "messy" in my opinion, and having the separate methods Array#select and Array#reject is just a single symptom of that.
Another example is that something in Rails adds Array#fourth. I can't imagine a circumstance where using my_array.fourth is clearer than my_array[3].
I think that only half avoids it, or shifts the wrong side of the big O to opposite of wherever it is now. select! also mutates in place.
But I'm not totally sure about that, after reading the Rubydoc for Array[1], only reject! has the warning about mutating the array every time the block is called, select! does not have any such warning. You might be right.
I can't agree with the conclusion. Sure complexity is complex, but the solution here would seem to be better performance analysis of the standard library rather than blindly avoiding risk. How do you know where to stop? All built in block-accepting methods for enumerator types would seem to be equally risky in this analysis.
Without much knowledge at all, I'd agree with the conclusion that "block-accepting methods for enumerator types would seem to be equally risky". It sounds to me like having non-pure functions that can terminate at any point. And that sounds like asking for trouble in your code.
That's what languages with immutable data structures do. Internally they implement simple data structures (lists, arrays) with more complex ones not to get a performance hit. For example, they don't want to copy every byte every time they add or remove from a list.
In the case of Ruby, it's a language build on mutability so let's use mutable data. There are already plenty of languages with immutable data to use if we want to.
I find it weird that the case is called a "bug" multiple times. Sure, if its o(n^2), that's a problem, but if the code works correctly at the end, it's not a bug.
O(n^2) algorithms are insidious in that they will work fine on test data, but will ruin performance when there's an unusually large input in production. They're landmines in your code.
Unfortunately, it's not that simple. It's really depends on context.
There are cases where accidental quadratic complexity would definitively be considered a bug. If a fleet of GitHub servers were burning cycles due to a quadratic reject!, I'm pretty sure the engineers would classify the misbehavior as such.
If the function was implemented as an infinite loop, would you consider the code working correctly as well? After all, the result after execution of the function is always correct (ex falso quodlibet).
Programs without specification have only surprising behaviour no bugs. As far
as I can see "reject!" does not specify complexity, so in this sense it would
be completely fine. Additionally it claims to change the array instantly every
time the block is called, which doesn't seem to give any other options than
making it at least quadratic (presuming flat representation for arrays). This
feels quite overspecified as accessing the array from within the block seems
quite unusual case to cater for.
Though, this behaviour was changed after all which leaves me wondering how much
Ruby people care about backward compatibility.
A peeve I have with Ruby is that methods in the standard library don't have defined big-O complexities. If they did, I don't think reject! would have been specified to be O(n²), and so it would be a bug.
[+] [-] drodgers|9 years ago|reply
> In my mind, I’d just rather not have a reject! at all, and callers who need to mutate an array in-place can figure out how to do safely with respect to their own use cases.
We ought to have a standard version of the reject! code precisely because it's so difficult to get right. Saying application developers should solve the problem themselves just sounds like some sort of political move: 'sure they may get the edge cases wrong, but then it will be their fault not ours'.
One of the best things about Ruby is that having helper methods for every kind of possible operation frees application developers from re-solving these edge-case problems and allows them to just write their business logic.
[+] [-] tomphoolery|9 years ago|reply
[+] [-] erikpukinskis|9 years ago|reply
> We ought to have a standard version of the reject! code precisely because it's so difficult to get right.
No, it's easy to get right in any specific case. It's iterating over a dang list, it's the first thing you learn how to do as a programmer. It's only hard to get right if you're trying to solve it for every use case with a single interface.
Essentially: you are taking two different algorithms and trying two switch between them behind a single interface. Meaty if-statements behind flags are an indication this is happening. It's cool of you want that but put it in a library and give it a GitHub per.
An example: I don't use a library to merge attributes on objects. I cut and paste one of a few snippets or write from scratch. Sometimes you want to mutate an object, sometimes you want a copy, sometimes you want to shallow copy or keep a reference to the parent. I could write a nasty complicated function that intelligently solved "the problem". But all that does is bundle together a bunch of different solutions to different problems and make it hard for me to understand which one it picked.
Edit 2: I could be persuaded that what I'm saying is just not Rubyish. I'm a JavaScript programmer (after many years of Ruby) so maybe there's some truth in that. I do think that distancing ourselves from the mechanics of list manipulation can be bad for our code quality.
[+] [-] leereeves|9 years ago|reply
Either implementation will be a poor choice for some use cases (hence the bug reports about both).
[+] [-] seanp2k2|9 years ago|reply
Rails, especially rails 5 with API-only mode is particularly great as well.
[+] [-] catnaroek|9 years ago|reply
[+] [-] pmarreck|9 years ago|reply
After years working with Ruby, I felt that "bang" methods should only be used for methods which mutate the receiver. Many methods which do do so do not have a "!" at the end.
To that end I created a little library called "Bang All The Things" :) https://github.com/pmarreck/ruby-snippets/blob/master/bang_a...
I'm convinced that this simplification helps prevent bugs, at the minor cost of changing some conventions possibly borrowed from other languages.
I have since (mostly) moved on to Elixir which does not have the "mutability problem" at all.
[+] [-] steveklabnik|9 years ago|reply
I mentioned this upthread too, but it's a bit broader than that. Consider http://api.rubyonrails.org/classes/ActiveRecord/FinderMethod... vs http://api.rubyonrails.org/classes/ActiveRecord/FinderMethod... for example. The ! version raises an exception rather than return nil.
[+] [-] vinhboy|9 years ago|reply
`delete` always gets me.
[+] [-] ouid|9 years ago|reply
[+] [-] jameshart|9 years ago|reply
[+] [-] emfree|9 years ago|reply
[+] [-] inopinatus|9 years ago|reply
[+] [-] thunderbong|9 years ago|reply
[+] [-] eco|9 years ago|reply
Even after discovering they have to call the container's erase method using the result of remove and an iterator to the end of the container[2] they'd probably just decide that remove just has a terrible design. It takes experience and some data structures knowledge to realize why it is designed the way it is.
The design of remove means every element that remains are only be shifted (in the case of an array/vector), at most, one time. The naive approach results in the problem described in the article where you are constantly shifting the elements forward as you progress through the container.
The usability of remove is still a problem (the concept shouldn't need a wikipedia page explaining it) but the way it was done was done for good reasons (within the design decisions of STL at least). It's probably fixed in the C++ ranges proposal but I haven't looked.
D's standard algorithm library remove[3] (which is very much based on Stepanov's STL too) goes one performance step further and lets you specify the swap strategy so that if stability isn't a requirement the number of shifts required is the absolute minimum possible (essentially moving elements from the end of the range into spots freed up by the items being removed).
1. Slightly more experienced users would quickly notice that it can't remove the value, it has no reference to the container to change its length.
2. So common and unintuitive it gets its own wikipedia page: https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom
3. http://dlang.org/phobos/std_algorithm_mutation.html#.remove
[+] [-] wruza|9 years ago|reply
Cool thing is that it is not C++, where remove is hardcore stl magic, and I can just iterate over elements' memory and remove them in sliding ranges, so multi-remove problem simply doesn't exist. Idk why c++ overengineers everything. We abstract a generic vector away from our convenience only to to store items in regular array, because it is the only fast solution for generic cases.
[+] [-] wuch|9 years ago|reply
[+] [-] sriharis|9 years ago|reply
Haskell's Data.List doesn't seem to even have a remove/reject. Clojure's remove is simply a complement of filter.
[+] [-] CGamesPlay|9 years ago|reply
[+] [-] yebyen|9 years ago|reply
Rubocop, on the other hand against what I've just said, will tell you in almost every case that you might use Unless, or in agreement with what I think it is that You are saying here, the preferred idiom is to not use the inverse, and you should only say "unless" when what you meant to say is unambiguously !if.
IOW sometimes you really mean reject
[+] [-] LyndsySimon|9 years ago|reply
I've been using Ruby (in the context of Rails apps) for almost six months now, after a decade or so in mostly Python. Ruby is just "messy" in my opinion, and having the separate methods Array#select and Array#reject is just a single symptom of that.
Another example is that something in Rails adds Array#fourth. I can't imagine a circumstance where using my_array.fourth is clearer than my_array[3].
[+] [-] Dirlewanger|9 years ago|reply
[+] [-] yebyen|9 years ago|reply
But I'm not totally sure about that, after reading the Rubydoc for Array[1], only reject! has the warning about mutating the array every time the block is called, select! does not have any such warning. You might be right.
[1]: https://ruby-doc.org/core-2.2.0/Array.html#method-i-reject-2...
[+] [-] unknown|9 years ago|reply
[deleted]
[+] [-] skywhopper|9 years ago|reply
[+] [-] rocqua|9 years ago|reply
I like my safeties in languages though.
[+] [-] tzwm|9 years ago|reply
[+] [-] pmontra|9 years ago|reply
In ascending order of complexity
http://theerlangelist.blogspot.com/2013/05/working-with-immu...
http://concurrencyfreaks.blogspot.com/2013/10/immutable-data...
http://debasishg.blogspot.com/2010/05/grokking-functional-da...
In the case of Ruby, it's a language build on mutability so let's use mutable data. There are already plenty of languages with immutable data to use if we want to.
[+] [-] danaliv|9 years ago|reply
[+] [-] jerianasmith|9 years ago|reply
[+] [-] pjfitzgibbons|9 years ago|reply
[+] [-] gkya|9 years ago|reply
[+] [-] pdw|9 years ago|reply
[+] [-] perfmode|9 years ago|reply
There are cases where accidental quadratic complexity would definitively be considered a bug. If a fleet of GitHub servers were burning cycles due to a quadratic reject!, I'm pretty sure the engineers would classify the misbehavior as such.
[+] [-] ouid|9 years ago|reply
[+] [-] baddox|9 years ago|reply
[+] [-] nikic|9 years ago|reply
[+] [-] wuch|9 years ago|reply
Though, this behaviour was changed after all which leaves me wondering how much Ruby people care about backward compatibility.
[+] [-] parenthephobia|9 years ago|reply
A peeve I have with Ruby is that methods in the standard library don't have defined big-O complexities. If they did, I don't think reject! would have been specified to be O(n²), and so it would be a bug.
[+] [-] mkonecny|9 years ago|reply
[+] [-] choward|9 years ago|reply
[+] [-] unknown|9 years ago|reply
[deleted]
[+] [-] unknown|9 years ago|reply
[deleted]
[+] [-] throwaway1579|9 years ago|reply
[+] [-] unknown|9 years ago|reply
[deleted]