
Composer – Disable GC when computing deps and refs

300 points| Damin0u | 11 years ago |github.com

119 comments


Seldaek|11 years ago

For those looking for a technical explanation, the PHP garbage collector in this case is probably wasting a ton of CPU cycles trying to collect thousands of objects (a LOT of objects are created to represent all the inter-package rules when solving dependencies) during the solving process. It keeps trying and trying as objects are allocated and it can not collect anything but still has to check them all every time it triggers.

Disabling GC just kills the advanced GC but leaves the basic reference counting approach to freeing memory, so Composer can keep trucking without using much more memory, as the GC wasn't really collecting anything. The memory reduction many people report is rather due to some other improvements we made yesterday.

As to why the problem went unnoticed for so long, it seems that the GC is not able to be observed by profilers, so whenever we looked at profiles to improve things we obviously did not spot the issue. In most cases though this isn't an issue and I would NOT recommend that everyone disable GC on their project :) GC is very useful in many cases, especially long-running workers, but the Composer solver falls outside the use cases it's made for.

sillysaurus3|11 years ago

> As to why the problem went unnoticed for so long, it seems that the GC is not able to be observed by profilers, so whenever we looked at profiles to improve things we obviously did not spot the issue.

That sounds like a bug in the profiler, not with Composer. Observing internal time is pretty important for any profiler.

icehawk219|11 years ago

Out of curiosity what tools do you use for profiling and finding these sorts of things? Plain old xdebug, xhprof, or other things? I'm going to have to jump into debugging a fairly large Symfony application within the next couple months and am on the look out for good tools to help me along.

munificent|11 years ago

> It keeps trying and trying as objects are allocated

Worse. GC gets triggered just by assigning references, allocation isn't even needed.

tomp|11 years ago

I had the same issue in Python recently. The project runs as a server that loads a huge amount of objects from the database, and could use as much as 10GB memory! Python's reference counting works great, but every so often, the full-heap-scanning cycle collector would run, and it took quite a lot of time to scan a multi-GB heap.

We noticed the issue happened most often when deserializing objects (loading them from Redis to memory). As it turns out, Python would schedule a collection every time the object_created counter was sufficiently higher than the object_destroyed counter. In general, this makes sense, because that way you can be sure that objects are being created and not being freed, which most likely means a resource leak or a reference cycle. However, the same thing happens during deserialization: many new objects are created, and none are freed. Coupled with Python's low threshold (700), GC was triggered many many times in every deserialization loop (usually in vain, as no new objects became recyclable). Disabling GC and running full collections manually solved the problem.
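A minimal sketch of the "disable, bulk-load, collect manually" approach described above, assuming CPython's standard `gc` module. The `cycle_collector_paused` helper and the `load_records` stand-in for the Redis deserialization are hypothetical names for illustration, not the code from the project mentioned:

```python
import gc
from contextlib import contextmanager


@contextmanager
def cycle_collector_paused():
    """Pause CPython's automatic cycle collector, then run one full collection.

    Reference counting keeps freeing acyclic objects while the collector is paused.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        gc.collect()  # one manual full collection instead of many automatic ones
        if was_enabled:
            gc.enable()  # restore the previous state only if it was on before


def load_records(n):
    # Hypothetical stand-in for deserializing many objects from a store.
    return [{"id": i, "tags": [i, i + 1]} for i in range(n)]


# The first value is the generation-0 threshold the comment refers to.
print(gc.get_threshold())

with cycle_collector_paused():
    records = load_records(100_000)

print(len(records))
```

The `try`/`finally` is there so the collector's previous state is restored even if the load raises.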

Gigablah|11 years ago

Looks like someone disabled garbage collection on that comment thread as well :)

mahouse|11 years ago

The truth is, I don't understand the point of having to download MBs of stupid animated images I will not even look at when I expect to see a commit diff.

pavel_lishin|11 years ago

It would be nice if Chrome let you disable animated gifs, without disabling all images.

Mithaldu|11 years ago

As far as I understand, composer is roughly the same thing as the cpan client. And they just simply disabled the garbage collector for it.

What is this guy doing that he needs gigabytes of memory to install a bunch of php libraries?

    Before: Memory usage: 2194.78MB (peak: 3077.39MB), time: 1324.69s
    After:  Memory usage: 4542.54MB (peak: 4856.12MB), time:  232.66s

debacle|11 years ago

That user is using PEAR, which is the old shitty PHP "CPAN"

That's the reason for the huge memory usage. We're slowly moving away from PEAR, but since it works for now not everyone has/will transition.

Edit: I should also point out that there are a few packages that almost everyone uses (PHPMD, PHPCS, phpUnit) that are still mostly pulled from PEAR, though I think phpUnit has a composer option.

84koba|11 years ago

Recursively gathering all dependencies a project might have. A huge downside of the modern scripting languages landscape is that dependency graphs can get quite convoluted

Maken|11 years ago

So, did they exchange a 70% reduction in execution time for a 100% increment in memory usage?

smackfu|11 years ago

Yeah, seems like a bug "fix."

1ris|11 years ago

Yeah, I wondered when they would turn it on again.

DangerousPie|11 years ago

Interesting. I was looking at the comments hoping for some more technical background, but unfortunately they seem to have been run over by the animated gif crowd.

Any more details on this?

rossriley|11 years ago

It seems when you start to hit the memory limit PHP's automatic garbage collection will loop through the constructed objects to see if any can be cleaned up.

If none can (and in the case of Composer all the objects exist for a reason) then it's wasting time analysing the objects.

So in this case there's only a large waste of cpu doing nothing with gc enabled.

mtmail|11 years ago

Wonderful commit.

(I didn't know animated gifs in github comments are a thing. Maybe I work too much with boring projects.)

dec0dedab0de|11 years ago

Could someone more versed with PHP, and this project explain why turning off garbage collection helped so much? and why they didn't turn it back on at the end of the function?

jcampbell1|11 years ago

PHP is reference counted, so memory is typically freed as soon as an object is no longer needed. Cycles are the exception which can cause memory leaks, so in version 5.3 php added a cycle collector, which reads every object in memory and very occasionally deletes objects that are disconnected and have greater than zero reference counts (cycles).

In my opinion, the php cycle collector is a pointless waste of time. In Objective-C, Apple just lets the memory leak by default, and they give you tools to find the leaks, and then you modify the code to break the cycles.

There is no need to turn cycle collection back on at the end of the program, because the OS frees the memory at program termination.
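The split between reference counting and cycle collection described above can be observed in CPython, which uses the same two-layer design. This is an analogy in Python rather than PHP code, and the `Node` class is a hypothetical example type. Weak references are used only to observe whether an object has been freed:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()  # turn off the cycle collector; reference counting still runs

# Acyclic case: freed as soon as the last reference goes away.
a = Node()
a_ref = weakref.ref(a)
del a
acyclic_freed = a_ref() is None

# Cyclic case: two nodes point at each other, so neither count reaches zero.
b, c = Node(), Node()
b.other, c.other = c, b
b_ref = weakref.ref(b)
del b, c
cycle_alive_before = b_ref() is not None

gc.collect()  # an explicit cycle collection still works while disabled
cycle_freed_after = b_ref() is None

gc.enable()
print(acyclic_freed, cycle_alive_before, cycle_freed_after)
```

With the collector off, only the cyclic pair lingers; everything acyclic is still released immediately.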

masklinn|11 years ago

> Could someone more versed with PHP, and this project explain why turning off garbage collection helped so much?

The cycle collector is relatively recent, I expect it's not very performant (since most PHP applications don't need it) and composer's dependency resolution may be hitting a pathological case (create lots of objects without cycles, triggering lots of collections but no actually useful work)

> and why they didn't turn it back on at the end of the function?

Since it's a package manager, I'd guess the expectation is the process will die soon-ish afterwards (once it's installed whatever it's resolved). There's a discussion of re-enabling it after dependency resolution (so postinstall hooks run with GC enabled) though.

daveid|11 years ago

Garbage collection is slow, but reduces memory usage. So disabling it costs memory. Also, Composer does not keep running: once the job is done, the script terminates, so you don't have to re-enable GC (it's only disabled in the context of the current execution).

kornakiewicz|11 years ago

I remember a story about a friend of mine in an algorithmic contest for high school students in Poland (which are quite hard). He solved the problems correctly, but his implementation had to check in every iteration of a loop whether a collection still had any elements. He used col.size()==0 instead of col.isEmpty(). The first was O(n) and it fucked up all performance.
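A sketch of how that can happen, assuming a linked collection that does not cache its element count. The story does not say which collection type was used, so `LinkedBag` is a hypothetical stand-in:

```python
class LinkedBag:
    """Singly linked collection that stores no element count (hypothetical)."""

    def __init__(self, items=()):
        self._head = None
        for item in items:
            self._head = (item, self._head)

    def size(self):
        # O(n): walks every node on every call.
        count, node = 0, self._head
        while node is not None:
            count += 1
            node = node[1]
        return count

    def is_empty(self):
        # O(1): only looks at the head pointer.
        return self._head is None

    def pop(self):
        item, self._head = self._head
        return item


bag = LinkedBag(range(1000))
drained = 0
while not bag.is_empty():  # bag.size() == 0 here would make the loop O(n^2)
    bag.pop()
    drained += 1
print(drained)
```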

tantalor|11 years ago

That's a bug.

shaurz|11 years ago

Wait, when did Github become the new 4chan?

meowface|11 years ago

It's much closer to reddit than 4chan, otherwise the nature of the images posted would be a little different.

And it's been like this for 2 or 3 years now. I've seen comment spam of images for commits and issues for quite a while.

username223|11 years ago

4chan? I always thought it was a mashup of Myspace and Dropbox. Can I find unspeakably awful porn there, too?

markartur|11 years ago

What is wrong with the comments...

trebor|11 years ago

Nothing actually, they're working correctly. The people on the other hand... that's the questionable part. ;)

gus_massa|11 years ago

I found an interesting comment between the gif: https://github.com/composer/composer/commit/ac676f47f7bbc619... by h4cc

> Behold, found something in the docs about garbage collection:

>> Therefore, it is probably wise to call gc_collect_cycles() just before you call gc_disable() to free up the memory that could be lost through possible roots that are already recorded in the root buffer. [...]

headius|11 years ago

Am I the only one that considers this disgusting? If the GC is so bad that it causes 2-10x slower operation in this use case, then it's a bad GC. I mean really, really bad. Short-lived objects in any modern GC should be swept away trivially without a lot of overhead. Of course we're talking about PHP here, so perhaps it's redundant to say something about it sucks, but jesus...runtimes that require hacks like this should be taken out back and shot.

NDizzle|11 years ago

Wow, OSX 64bit chrome can't handle that many animated gifs. 32bit could just fine. What gives?!

r109|11 years ago

BeamSyncDropper v2

munificent|11 years ago

I was curious, so I did some investigation, starting here:

http://php.net/manual/en/features.gc.php

Here's what I found:

PHP uses ref-counting for most garbage collection. That means non-cyclic data structures are collected eagerly, as soon as the last reference to an object is removed.

Naïve ref-counting can't collect cyclic data structures, though. Normally, cycles are "collected" in PHP by just waiting until the request is done and ditching everything. That works great for web sites, but makes less sense for a command line app like Composer.

To better reclaim memory, PHP now has a cycle collector. Whenever a ref-count is decremented but not zero, that means a new island of detached cyclic objects could have been created. When this happens, it adds that object to an array of possible cyclic roots.

When that array gets full (10,000 elements), the cycle collector is triggered. This walks the array and tries to collect any cyclic objects. They reference this paper[1] for their algorithm for doing this, but what they describe just sounds like a regular simple synchronous cycle collector to me.

The basic process is pretty simple. Starting at an object that could be the beginning of some cyclic graph, speculatively decrement the ref-count of everything it refers to. If any of them go to zero, recursively do that to everything they refer to and so on. When that's done, if you end up with any objects that are at zero references, they can be collected. For everything left, undo the speculative decrements.
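A toy model of that speculative-decrement process, written in Python with simulated reference counts. It follows the mark-gray / scan / collect-white shape of the synchronous algorithm in the referenced paper and is only a sketch: the `Obj` class, `link` helper, and colour names are assumptions for illustration, not PHP's actual implementation.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0           # simulated reference count
        self.refs = []        # outgoing references
        self.color = "black"  # black = presumed live


def link(src, dst):
    src.refs.append(dst)
    dst.rc += 1


def mark_gray(obj):
    # Speculatively remove the references that originate inside the subgraph.
    if obj.color != "gray":
        obj.color = "gray"
        for child in obj.refs:
            child.rc -= 1
            mark_gray(child)


def scan_black(obj):
    # Undo the speculative decrements for everything reachable from a live object.
    obj.color = "black"
    for child in obj.refs:
        child.rc += 1
        if child.color != "black":
            scan_black(child)


def scan(obj):
    if obj.color != "gray":
        return
    if obj.rc > 0:
        scan_black(obj)       # still referenced from outside: it is live
    else:
        obj.color = "white"   # only internal references were keeping it alive
        for child in obj.refs:
            scan(child)


def collect_white(obj, freed):
    if obj.color == "white":
        obj.color = "freed"
        freed.append(obj.name)
        for child in obj.refs:
            collect_white(child, freed)


def collect_cycles(roots):
    freed = []
    for root in roots:
        mark_gray(root)
    for root in roots:
        scan(root)
    for root in roots:
        collect_white(root, freed)
    return freed


# Dead cycle: a <-> b, nothing else points at them.
a, b = Obj("a"), Obj("b")
link(a, b)
link(b, a)

# Live cycle: c <-> d, plus one external reference to c (e.g. a variable).
c, d = Obj("c"), Obj("d")
link(c, d)
link(d, c)
c.rc += 1

freed = collect_cycles([a, d])
print(freed, c.rc, d.rc)
```

Note that the live pair `c`/`d` is fully traversed and then restored without freeing anything, which is the wasted work on a large live graph.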

If you have a large live object graph, this process can be super slow: you have to traverse the entire object graph. If there are few dead objects, you burn a bunch of time doing this and don't get anything back.

Meanwhile, you're busy adding and removing references to live objects, so that potential root array is constantly filling up, re-triggering the same ineffective collection over and over again. Note that this happens even when you aren't allocating: just assigning references is enough to fill the array.

To me, this is the real problem compared to other languages. You shouldn't thrash your GC if you aren't allocating anything!
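A back-of-the-envelope model of that thrashing, under two loud assumptions that are simplifications rather than measured PHP behaviour: every reference reassignment records one new possible root, and every collector run visits the whole live graph. The `simulate` function and its parameters are hypothetical; the buffer size is the 10,000 figure quoted above.

```python
ROOT_BUFFER_SIZE = 10_000  # figure quoted in the comment above


def simulate(reassignments, live_objects):
    """Count collector runs and object visits caused purely by reference churn."""
    buffered = runs = visited = 0
    for _ in range(reassignments):
        buffered += 1                 # assumption: each reassignment adds one possible root
        if buffered == ROOT_BUFFER_SIZE:
            runs += 1
            visited += live_objects   # assumption: each run walks the whole live graph
            buffered = 0
    return runs, visited


runs, visited = simulate(reassignments=1_000_000, live_objects=50_000)
print(runs, visited)
```

No allocation appears anywhere in the loop, yet the model still triggers a collection every 10,000 reassignments.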

Disabling the GC (which only disables the cycle collector, not the regular delete-on-zero-refs) avoids that. However, it has a side effect. Once the potential root array is full, any new potential roots get discarded. That means even if you re-enable the cycle collector later, those cyclic objects may never be collected. Probably not a problem for Composer since it's a command-line app that exits when done, but not a good idea for a long-running app.

There are other things PHP could do here:

1. Don't use ref-counting. Use a normal tracing GC. Then you only kick off GC based on allocation pressure, not just by mutating memory. Obviously, this would be a big change!

2. Consider prioritizing and incrementally processing the root array. If it kept track of how often the same object reappeared in the root array each GC, it can get a sense of "hey, we're probably not going to collect this". Sort the array by priority so that potentially cyclic objects that have been live in the past are at one end. Then don't process the whole array: just process for a while and stop.

[1]: http://media.junglecode.net/media/Bacon01Concurrent.pdf

echeese|11 years ago

That page has over 200MB worth of animated gifs, just as a warning.

aidenn0|11 years ago

Naive mark and sweep: making refcounting look fast for 50 years.

caiob|11 years ago

Some insightful comments would've been nice.

eXpl0it3r|11 years ago

I see two lines changed! Click bait! :P

btbuildem|11 years ago

Do you work with 13yr-olds?

benihana|11 years ago

The commit is great. I love that the comments have spiraled completely out of control. At this point, 30 minutes after the link was posted, the comment thread is now a competition to see who can post the best gif.

I know we're serious here, but stuff like this reminds me why I love the internet so much. It's fun to cut loose once in a while.

bshimmin|11 years ago

Agreed. It's such a shame that HackerNews doesn't let you post animated GIFs - I think it'd really add a lot of value to the discussions here.

arenaninja|11 years ago

I think it's great too, but I shudder to think what it might look like 5 years from now. I can only figure that there will be dead gifs everywhere

ProAm|11 years ago

No kidding, when did GitHub turn into Reddit/4Chan/9Gag?

xfalcox|11 years ago

I love it, but github could suppress them behind a click to view (maybe all comments on mobile) because I just lost 10% of my data plan...

feld|11 years ago

pictures/gifs don't belong in github comments. this is the dumbest thing.

Treasdex|11 years ago

Warning NSFW Commit detected. "Pedophile 11-year-old girl images from 4Chan"

Stay classy programmers.

Retrazder|11 years ago

Warning: The commit is NSFW.

The commits are embarrassing, stupid and really expose why developers are considered idiots. Why troll?

Because they are jerks. Period. Grow up noobs.

cdnsteve|11 years ago

What was I looking at again? I forgot because all of these animated GIFs are amazing!

unknown|11 years ago

[deleted]

chronial|11 years ago

Please try and keep your attitude in check. Your tone is not helping.