top | item 13691303

Ruby `reject!`

306 points| dmit | 9 years ago |accidentallyquadratic.tumblr.com | reply

141 comments

order
[+] drodgers|9 years ago|reply
I love this blog, but I'm not sure that I agree with this:

> In my mind, I’d just rather not have a reject! at all, and callers who need to mutate an array in-place can figure out how to do safely with respect to their own use cases.

We ought to have a standard version of the reject! code precisely because it's so difficult to get right. Saying application developers should solve the problem themselves just sounds like some sort of political move: 'sure they may get the edge cases wrong, but then it will be their fault not ours'.

One of the best things about Ruby is that having helper methods for every kind of possible operation frees application developers from re-solving these edge-case problems and allows them to just write their business logic.

[+] tomphoolery|9 years ago|reply
I think the argument here is to embrace the immutability of arrays in Ruby. The reason why this is called `reject!` rather than `reject` is because the designers of Ruby wanted you to favor returning new objects rather than mutating objects in place. Sure, you might need to do it from time-to-time and I'm really happy we have methods that enable such behavior, but this brings to mind the old adage "just because you can, doesn't mean you should". The author might have been trying to say in that quote that app developers should find other ways of solving whatever problem they have without resorting to mutating objects in place. I don't think we should go as far as to not include such methods in Ruby if everyone finds them useful from time-to-time, but I also think we should be staying away from mutating objects in place, rather than returning new versions of said objects and using that.
[+] erikpukinskis|9 years ago|reply
Edit: I know this sounds like I am trying to dismiss the problem. That's not my point. I'm trying to say the problem is compounded by the choice to bundle the solutions to several different problems. If people have criticisms I would love to read a reply. </Edit>

> We ought to have a standard version of the reject! code precisely because it's so difficult to get right.

No, it's easy to get right in any specific case. It's iterating over a dang list, it's the first thing you learn how to do as a programmer. It's only hard to get right if you're trying to solve it for every use case with a single interface.

Essentially: you are taking two different algorithms and trying two switch between them behind a single interface. Meaty if-statements behind flags are an indication this is happening. It's cool of you want that but put it in a library and give it a GitHub per.

An example: I don't use a library to merge attributes on objects. I cut and paste one of a few snippets or write from scratch. Sometimes you want to mutate an object, sometimes you want a copy, sometimes you want to shallow copy or keep a reference to the parent. I could write a nasty complicated function that intelligently solved "the problem". But all that does is bundle together a bunch of different solutions to different problems and make it hard for me to understand which one it picked.

Edit 2: I could be persuaded that what I'm saying is just not Rubyish. I'm a JavaScript programmer (after many years of Ruby) so maybe there's some truth in that. I do think that distancing ourselves from the mechanics of list manipulation can be bad for our code quality.

[+] leereeves|9 years ago|reply
Rather than an edge case, I'd call this a case of trying to satisfy two incompatible goals: fast and internally consistent after each deletion.

Either implementation will be a poor choice for some use cases (hence the bug reports about both).

[+] seanp2k2|9 years ago|reply
Yeah, I like ruby (and especially rails) because it has stuff like this. I also like that most languages have sort() as part of the stdlib. If you need something specific, sometimes they come with more than one sort, or it would be easy enough to write yourself, but I appreciate and expect a high-level language to include the basics so I don't need to keep data structures and algorithms books within reach.

Rails, especially rails 5 with API-only mode is particularly great as well.

[+] catnaroek|9 years ago|reply
The problem with `reject!` is that it's too ad-hoc: the more fundamental operation is to filter the elements of a stream. The input stream could be generated from any data structure (not necessarily an array), and the output stream could similarly be redirected into any data structure (not necessarily an array, let alone physically the same array).
[+] pmarreck|9 years ago|reply
Shameless plug:

After years working with Ruby, I felt that "bang" methods should only be used for methods which mutate the receiver. Many methods which do do so do not have a "!" at the end.

To that end I created a little library called "Bang All The Things" :) https://github.com/pmarreck/ruby-snippets/blob/master/bang_a...

I'm convinced that this simplification helps prevent bugs, at the minor cost of changing some conventions possibly borrowed from other languages.

I have since (mostly) moved on to Elixir which does not have the "mutability problem" at all.

[+] vinhboy|9 years ago|reply
Agreed. For whatever reason, my mind really gravitate towards the idea of bang being a mutator. That should have been a standard.

`delete` always gets me.

[+] ouid|9 years ago|reply
I think that perhaps the most interesting part of this article is that it comes from a blog called 'accidentallyquadratic' with more than one entry.
[+] jameshart|9 years ago|reply
I remember coming across it before when the left-pad incident happened, and marvelling at the fact there were already many entries there then. The problem is now, whenever a new entry on this blog gets linked, I find myself compelled to go back and read all the entries in it again. I can't help thinking that over time, that process is going to become prohibitively slow.
[+] inopinatus|9 years ago|reply
Looking forward to sister blog "Inadvertently Exponential" and the satirical "Maliciously Factorial".
[+] thunderbong|9 years ago|reply
Yes! And posts dealing with quadratic behavior in languages!
[+] eco|9 years ago|reply
One of the more confusing parts of the C++ standard library algorithms is remove/remove_if. A new user would think calling remove(v.begin(), v.end(), value_to_remove) would result in the value being removed from the container[1]. Instead, they'll find that all it did was shift around the ordering of the vector and leave the elements you wanted removed in place at the end (if you read the linked article you can already see where this is going) which is a very unintuitive result. They might look up the function signature and notice it returns an iterator which is also surprising.

Even after discovering they have to call the container's erase method using the result of remove and an iterator to the end of the container[2] they'd probably just decide that remove just has a terrible design. It takes experience and some data structures knowledge to realize why it is designed the way it is.

The design of remove means every element that remains are only be shifted (in the case of an array/vector), at most, one time. The naive approach results in the problem described in the article where you are constantly shifting the elements forward as you progress through the container.

The usability of remove is still a problem (the concept shouldn't need a wikipedia page explaining it) but the way it was done was done for good reasons (within the design decisions of STL at least). It's probably fixed in the C++ ranges proposal but I haven't looked.

D's standard algorithm library remove[3] (which is very much based on Stepanov's STL too) goes one performance step further and lets you specify the swap strategy so that if stability isn't a requirement the number of shifts required is the absolute minimum possible (essentially moving elements from the end of the range into spots freed up by the items being removed).

1. Slightly more experienced users would quickly notice that it can't remove the value, it has no reference to the container to change its length.

2. So common and unintuitive it gets its own wikipedia page: https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom

3. http://dlang.org/phobos/std_algorithm_mutation.html#.remove

[+] wruza|9 years ago|reply
My homegrown C dynamic array has remove() macro with the same purpose (I didn't know it is in c++). Except that it takes range instead of index and doesn't move to the end, because you checkout/delete item first, and then remove, not vice versa. It also has insert function that does guess what. The module takes around one screen.

Cool thing is that it is not C++, where remove is hardcore stl magic, and I can just iterate over elements' memory and remove them in sliding ranges, so multi-remove problem simply doesn't exist. Idk why c++ overengineers everything. We abstract a generic vector away from our convenience only to to store items in regular array, because it is the only fast solution for generic cases.

[+] wuch|9 years ago|reply
This is one of those things that makes perfect sense after you actually tried to implement it, but not necessarily beforehand. I think this happens a lot with C++ that design decision are quite well though through, but it is hard to appreciate this without taking time to understand the rationale.
[+] sriharis|9 years ago|reply
Why is `reject!` even a separate, and different implementation than `select!`? Perf reasons?

Haskell's Data.List doesn't seem to even have a remove/reject. Clojure's remove is simply a complement of filter.

[+] CGamesPlay|9 years ago|reply
They are identical (inverted) implementations as of ruby 2.3, so I think it was just an oversight.
[+] yebyen|9 years ago|reply
Same reason ruby has both "if" and "Unless", or capybara has .to_not as well as .not_to... there are times when the inverse expression is just more natural and less confusing.

Rubocop, on the other hand against what I've just said, will tell you in almost every case that you might use Unless, or in agreement with what I think it is that You are saying here, the preferred idiom is to not use the inverse, and you should only say "unless" when what you meant to say is unambiguously !if.

IOW sometimes you really mean reject

[+] LyndsySimon|9 years ago|reply
Because it's Ruby.

I've been using Ruby (in the context of Rails apps) for almost six months now, after a decade or so in mostly Python. Ruby is just "messy" in my opinion, and having the separate methods Array#select and Array#reject is just a single symptom of that.

Another example is that something in Rails adds Array#fourth. I can't imagine a circumstance where using my_array.fourth is clearer than my_array[3].

[+] Dirlewanger|9 years ago|reply
Avoiding this should be as simple as using the inverse, `select!`, and then flipping whatever the check is in the block, right?
[+] yebyen|9 years ago|reply
I think that only half avoids it, or shifts the wrong side of the big O to opposite of wherever it is now. select! also mutates in place.

But I'm not totally sure about that, after reading the Rubydoc for Array[1], only reject! has the warning about mutating the array every time the block is called, select! does not have any such warning. You might be right.

[1]: https://ruby-doc.org/core-2.2.0/Array.html#method-i-reject-2...

[+] skywhopper|9 years ago|reply
I can't agree with the conclusion. Sure complexity is complex, but the solution here would seem to be better performance analysis of the standard library rather than blindly avoiding risk. How do you know where to stop? All built in block-accepting methods for enumerator types would seem to be equally risky in this analysis.
[+] rocqua|9 years ago|reply
Without much knowledge at all, I'd agree with the conclusion that "block-accepting methods for enumerator types would seem to be equally risky". It sounds to me like having non-pure functions that can terminate at any point. And that sounds like asking for trouble in your code.

I like my safeties in languages though.

[+] tzwm|9 years ago|reply
So what is the right way to do the same thing? Use `select` and get new array included elements we want to keep instead of using `reject!`?
[+] pmontra|9 years ago|reply
That's what languages with immutable data structures do. Internally they implement simple data structures (lists, arrays) with more complex ones not to get a performance hit. For example, they don't want to copy every byte every time they add or remove from a list.

In ascending order of complexity

http://theerlangelist.blogspot.com/2013/05/working-with-immu...

http://concurrencyfreaks.blogspot.com/2013/10/immutable-data...

http://debasishg.blogspot.com/2010/05/grokking-functional-da...

In the case of Ruby, it's a language build on mutability so let's use mutable data. There are already plenty of languages with immutable data to use if we want to.

[+] danaliv|9 years ago|reply
There's a non-mutating version of `reject!` called `reject`. (Likewise there's a mutating version of `select` called `select!`.)
[+] jerianasmith|9 years ago|reply
Quite a comprehensive Post ! What makes Ruby block an essential feature of language is it's power and elegance.
[+] pjfitzgibbons|9 years ago|reply
I think worth clarifying, the fix on svn r49255 is in git tag v2_3_1, so if you're on ~2.3.1 Then you're back to linear #reject.
[+] gkya|9 years ago|reply
I find it weird that the case is called a "bug" multiple times. Sure, if its o(n^2), that's a problem, but if the code works correctly at the end, it's not a bug.
[+] pdw|9 years ago|reply
O(n^2) algorithms are insidious in that they will work fine on test data, but will ruin performance when there's an unusually large input in production. They're landmines in your code.
[+] perfmode|9 years ago|reply
Unfortunately, it's not that simple. It's really depends on context.

There are cases where accidental quadratic complexity would definitively be considered a bug. If a fleet of GitHub servers were burning cycles due to a quadratic reject!, I'm pretty sure the engineers would classify the misbehavior as such.

[+] ouid|9 years ago|reply
Yes it is? Asymptotic complexity is a well defined property of an algorithm. It is an output, and it doesn't match the expected output.
[+] baddox|9 years ago|reply
Would a sleep(100000) at the top of the method implementation not count as a bug either?
[+] nikic|9 years ago|reply
If the function was implemented as an infinite loop, would you consider the code working correctly as well? After all, the result after execution of the function is always correct (ex falso quodlibet).
[+] wuch|9 years ago|reply
Programs without specification have only surprising behaviour no bugs. As far as I can see "reject!" does not specify complexity, so in this sense it would be completely fine. Additionally it claims to change the array instantly every time the block is called, which doesn't seem to give any other options than making it at least quadratic (presuming flat representation for arrays). This feels quite overspecified as accessing the array from within the block seems quite unusual case to cater for.

Though, this behaviour was changed after all which leaves me wondering how much Ruby people care about backward compatibility.

[+] parenthephobia|9 years ago|reply
It's not a bug in the method, but it should be.

A peeve I have with Ruby is that methods in the standard library don't have defined big-O complexities. If they did, I don't think reject! would have been specified to be O(n²), and so it would be a bug.

[+] mkonecny|9 years ago|reply
If it can take down your webserver due to unexpected CPU overload, then I would consider it bug
[+] choward|9 years ago|reply
If it's n^2 it may never return in your lifetime. Does the code work? Nobody knows. It this situation you may consider it a bug.