> It occurs to me that for this particular algorithm there might be a chance to rescue it. Instead of simply spinning, a waiting workgroup might make a small amount of progress recomputing the aggregate for the partition it's waiting on. After a finite number of spins (the partition size divided by this grain of progress), it would have the aggregate for the partition so would be able to move on to the next partition. Thus the correctness concern becomes a performance concern, where a very high probability of the spin yielding early is likely "good enough."

Yes, I've seen this workaround perform well in practice; sorry for not mentioning it >.<. It does blow up any asymptotic efficiency guarantees in the worst case, though, since a waiting workgroup may end up redoing a whole partition's worth of work. That may or may not be acceptable for the application at hand.
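
For anyone following along, here is a minimal sketch of the fallback described in the quote, written as a sequential Python simulation rather than shader code. It models a single waiter and a single predecessor partition only, and assumes the aggregate is a plain integer sum. The names `wait_or_recompute`, `poll`, and `grain` are hypothetical and don't come from piet-gpu or any real API.

```python
from typing import Callable, Optional, Sequence, Tuple


def wait_or_recompute(
    poll: Callable[[], Optional[int]],
    partition: Sequence[int],
    grain: int,
) -> Tuple[int, int, bool]:
    """Wait for a predecessor's aggregate, recomputing it a bit per spin.

    poll      -- hypothetical stand-in for reading the predecessor's flag;
                 returns the published aggregate, or None if not ready yet.
    partition -- the predecessor's input elements (assumed readable here).
    grain     -- how many elements to fold into the local sum per spin.

    Returns (aggregate, spins, recomputed_locally).
    """
    if grain <= 0:
        raise ValueError("grain must be positive")

    partial = 0  # locally recomputed sum so far
    done = 0     # number of elements already folded into `partial`
    spins = 0

    while True:
        published = poll()
        if published is not None:
            # Common case: the predecessor finished, so the spin yields early.
            return published, spins, False

        # Not ready: make a small amount of progress instead of idling.
        end = min(done + grain, len(partition))
        partial += sum(partition[done:end])
        done = end
        spins += 1

        if done >= len(partition):
            # After ceil(len(partition) / grain) spins the waiter has the
            # aggregate on its own and no longer depends on the predecessor.
            return partial, spins, True


if __name__ == "__main__":
    data = list(range(1, 11))  # sums to 55
    # Predecessor never publishes: the waiter finishes by itself in 4 spins.
    print(wait_or_recompute(lambda: None, data, grain=3))  # (55, 4, True)
```

The loop is bounded by the partition size divided by the grain, so it terminates even if the predecessor never gets scheduled. The cost is that an unlucky waiter re-reads the entire partition, which is where the extra work comes from.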

> I'll not make any promises about another blog - I worry that subgroup size tuning is too much in the weeds other than for a very specialized audience. But I certainly do hope to blog more about piet-gpu and will see if I can touch on the topic then.

Looking forward to any future piet-gpu posts you may write :-).
