(no title)
V1ndaar | 1 year ago
His code is doing a few more things than necessary. The actual algorithm is inside the `uniqCEcvm` template. The `it` it receives is anything you can iterate over (a collection or an iterator). Multiple things in one line really only appear where they directly relate to the part at the beginning of the line.
The `when isMainModule` is Nim's way of Python's `if __name__ == "__main__"`. The entire part below that is really just a mini CL interface to bench different (random) examples. Final thing to note maybe, the last expression of a block (e.g. of the template here) will be returned.
And well, the style of comments is just personal preference of course. Whether you prefer to stay strictly below 80 cols or not, shrug.
I grant you that the usage of 2 sets + pointer access to them + swapping makes it harder to follow than necessary. But I assume the point of it was not on "how to write the simplest looking implementation of the algorithm as it appears in the paper". But rather to showcase a full implementation of a reasonably optimized version.
Here's a version (only the algorithm) following the paper directly:
proc estimate[T](A: seq[T], ε, δ: float): float =
let m = A.len
var p = 1.0
var thr = ceil(12.0 / ε^2 * log2(8*m.float / δ))
var χ = initHashSet[T](thr.round.int)
for i in 0 ..< m:
χ.excl A[i]
if rand(1.0) < p:
χ.incl A[i]
if χ.card.float >= thr:
for el in toSeq(χ): # clean out ~half probabilistically
if rand(1.0) < 0.5:
χ.excl el
p /= 2.0
if χ.card.float >= thr:
return -1.0
result = χ.card.float / p
menthe|1 year ago