top | item 25476503

Solving Algorithmic Problems in Python with Pytest (2019)

77 points | BerislavLopac | 5 years ago | adamj.eu

43 comments

[+] morelandjs|5 years ago|reply
Side remark, but I really wish interviewers gave me the option of a terminal with vim since that’s pretty much standard on any Unix machine I might ssh into these days.

All too often someone puts me in some weird IDE and I feel like a cat with boots on.

[+] Fiveplus|5 years ago|reply
Someone on our school's group chat shared that she had a technical interview where she was asked to write code in a shared Google Doc. I really hope that was an outlier.
[+] abdullahkhalids|5 years ago|reply
How would you write tests for the following problem with random output?

Write a function biasedcoin(n, p) that takes the number of coin flips, n, and the probability of heads, p. It flips a biased coin n times and returns the ratio of heads to tails. p is guaranteed to have two significant digits, e.g. p = 0.60 or p = 0.74. You should use the random.randrange function to generate random numbers.

Example implementation

    def biasedcoin(n, p):
        from random import randrange
        heads = tails = 0
        for _ in range(n):
            if randrange(100) < 100 * p:
                heads += 1
            else:
                tails += 1
        return heads / tails
[+] mlthoughts2018|5 years ago|reply
Simple: use mocker.patch to patch out random.randrange and have it return a fixed sequence of results that would imply a known result.

Your test should not depend on the internal workings of randrange or on anything relying on unfixed random state. Your test only checks that, given correct results from randrange (or any other external source of random draws), the rest of your function correctly produces the two-digit bias number.

Otherwise you are just testing randrange itself.

If you are asking how to test a pseudorandom number generator, you have a few choices. You can either fix the random seed and test the algorithm’s precise implementation on a large number of known results. Or you can define tests statistically with margins of error, for example testing the entropy in a series of uniformly generated bits or the resulting distribution in a series of random draws from a fixed list, and decide what level of precision is tolerable before considering a test failed.
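A minimal sketch of the patching approach using the stdlib unittest.mock (pytest-mock's mocker.patch wraps the same machinery), reusing the biasedcoin implementation from the earlier comment:

```python
from unittest.mock import patch

def biasedcoin(n, p):
    # The import inside the function body means each call looks up
    # random.randrange afresh, so patching "random.randrange" takes effect.
    from random import randrange
    heads = tails = 0
    for _ in range(n):
        if randrange(100) < 100 * p:
            heads += 1
        else:
            tails += 1
    return heads / tails

def test_biasedcoin_fixed_sequence():
    # Three draws below the 60 threshold (heads) and two above (tails),
    # so the known expected result is 3/2.
    with patch("random.randrange", side_effect=[10, 30, 50, 70, 99]):
        assert biasedcoin(5, 0.60) == 3 / 2
```

With the sequence fixed, the test is fully deterministic and exercises only the counting logic, not the generator.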

[+] matsemann|5 years ago|reply
If it were a non-trivial example I would split it in two: One function to generate the data, and one function to count and do the statistics. Then you can pass the second one known datasets and verify it gives you correct statistics, and at least be confident that that part is correct.

Of course that still leaves the generation and randomness part. You could make a separate function for numberToTailOrHead(...) that you can test. Or maybe you should have the ability to send in which source of randomness it uses. And then you can provide a fake random implementation that tests the edge cases (0, 0.5, 0.999).
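One way to sketch that split (the function names here are illustrative, not from the article):

```python
import random

def flips_to_ratio(flips):
    """Deterministic part: turn a list of booleans (True = heads) into the
    heads/tails ratio. Easy to test against known datasets."""
    heads = sum(flips)
    tails = len(flips) - heads
    return heads / tails

def biasedcoin(n, p, rng=random.randrange):
    """Stochastic part: the source of randomness is an injectable parameter,
    so tests can pass a fake that returns a fixed sequence of draws."""
    return flips_to_ratio([rng(100) < 100 * p for _ in range(n)])
```

A test can then assert flips_to_ratio([True, True, False]) == 2.0 directly, or pass a fake rng that replays chosen draws to exercise the edge cases.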

[+] henrikeh|5 years ago|reply
You could approach this a number of ways.

First, if you can fix the seed of the generator, that can be used on a case-by-case basis. But this is tricky, since it introduces a subtle dependency on the implementation of biasedcoin.

Another option is to split the function/process into a stochastic and deterministic part. Test the deterministic part thoroughly. biasedcoin is too trivial for this, but it works nicely in more complicated setups.

For biasedcoin, I would probably go with a statistical approach. Given a certain bias, the expected value and distribution is known. I would simply test that several trials of the function lie within some bound.
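A sketch of such a statistical test; the sample size and bound are illustrative choices. With n = 100,000 flips at p = 0.60, the standard error of the heads fraction is about 0.0015, so a ±0.02 tolerance is roughly 13 standard deviations and a false failure is effectively impossible:

```python
import random

def biasedcoin(n, p):
    heads = sum(random.randrange(100) < 100 * p for _ in range(n))
    return heads / (n - heads)

def test_biasedcoin_statistical():
    n = 100_000
    ratio = biasedcoin(n, 0.60)
    # Convert the heads/tails ratio back to a heads fraction: h/(h+t).
    heads_fraction = ratio / (1 + ratio)
    assert abs(heads_fraction - 0.60) < 0.02
```

The trade-off is explicit: the looser the bound, the fewer flaky failures, but the weaker the guarantee about the bias.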

[+] tmoertel|5 years ago|reply
When testing solutions to algorithmic problems, it's often useful to use randomized property checking to verify that the solution's expected properties hold for approximately all inputs. The QuickCheck family of testing tools is probably the best known application of this approach. It's also pretty easy to roll your own. Some hand-rolled examples in Python:

https://github.com/tmoertel/practice/blob/master/dailycoding...

https://github.com/tmoertel/practice/blob/master/dailycoding...

https://github.com/tmoertel/practice/blob/master/dailycoding...
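A hand-rolled property check might look like this, using a toy insertion sort as the solution under test:

```python
import random

def my_sort(xs):
    """Toy solution under test: insertion sort."""
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

def check_sort_properties(trials=200):
    """Randomized property check: generate many random inputs and assert
    invariants that must hold for every one of them."""
    for _ in range(trials):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 50))]
        out = my_sort(xs)
        assert all(a <= b for a, b in zip(out, out[1:]))  # output is ordered
        assert sorted(out) == sorted(xs)                  # output is a permutation
        assert out == sorted(xs)                          # agrees with an oracle

check_sort_properties()
```

Libraries like Hypothesis add input shrinking and smarter generation on top of this basic loop, but the core idea fits in a dozen lines.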

[+] makapuf|5 years ago|reply
I see the point of pytest and the greatness of it. I like pytest but I like minimalism more. I dont find it a good showcase here: couldn't you have kept only the asserts and just run the script?
[+] jaimebuelta|5 years ago|reply
In that case, only the first failure is reported before the script exits. Which is fair enough in some cases, but may be confusing for long sequences of tests.

(There’s an option to get that behaviour in pytest as well, via a command-line flag. Pytest can also run tests in parallel, which may speed up the results.)
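For reference, the relevant pytest invocations (note that parallel execution comes from the pytest-xdist plugin, not core pytest):

```shell
# Stop at the first failing test, like a plain script of asserts would:
pytest -x tests/
# Or tolerate a fixed number of failures before stopping:
pytest --maxfail=2 tests/
# Run tests in parallel across CPUs (requires the pytest-xdist plugin):
pytest -n auto tests/
```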

[+] rational_indian|5 years ago|reply
Why do you need the minimum == 0 test in the following chunk of code ...

  if i > 0 and (minimum == 0 or i < minimum):
      minimum = i
[+] mlthoughts2018|5 years ago|reply
The stated problem requires returning the smallest integer greater than 0 in the list, with minimum starting at 0. The running minimum is still 0 only if no positive number has been seen so far; since the condition already guarantees i > 0, minimum == 0 means i is the first positive integer we’ve seen, so i must become the new minimum at that point even though i > minimum.
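Wrapped in a full function for context (the name smallest_positive is illustrative; the snippet above presumably sits inside a loop like this):

```python
def smallest_positive(nums):
    """Return the smallest integer greater than 0 in nums, or 0 if there is
    none. minimum == 0 doubles as the "nothing found yet" sentinel, which is
    why the condition must special-case it."""
    minimum = 0
    for i in nums:
        if i > 0 and (minimum == 0 or i < minimum):
            minimum = i
    return minimum
```

Without the minimum == 0 check, the first positive value would never replace the sentinel, and the function would always return 0.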
[+] henrikeh|5 years ago|reply
> unittest is an old horse and cart, while pytest is the batmobile.

I really wish more was said than just this. Why is it that pytest is often preferred and unittest is considered a bad choice? What is the impact on the quality of the tests?

[+] andreareina|5 years ago|reply
Pytest:

- lets you use regular assert statements, rather than assertEqual, assertTrue, etc.

- does not impose the use of classes

- the way it does fixtures just feels nicer to use than unittest

Basically, it's more ergonomic and IME more flexible.
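A small side-by-side sketch of the same check in both styles, plus a pytest fixture (names are illustrative):

```python
import unittest
import pytest

# unittest style: a TestCase subclass and camelCase assert methods.
class SquareTest(unittest.TestCase):
    def test_square(self):
        self.assertEqual(3 ** 2, 9)

# pytest style: a bare function and a plain assert; on failure, pytest's
# assertion rewriting still reports both operands.
def test_square():
    assert 3 ** 2 == 9

# pytest fixture: an ordinary function, injected by parameter name.
@pytest.fixture
def numbers():
    return [3, 1, 2]

def test_sorted(numbers):
    assert sorted(numbers) == [1, 2, 3]
```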

[+] 0x008|5 years ago|reply
Test driven development is just such a good habit to get into. You trade just a little bit of ramp-up speed at the beginning of an implementation for a massive reduction in cognitive overhead.
[+] Yajirobe|5 years ago|reply
I'd love to see what test driven development looks like for ML systems.

You write the test for your non-existent model, then write the model and try to train the model to 'pass' the test? Do you also train it on the test case or on other data only?