Hypothesis: Property-based testing

In this notebook we use property-based testing to find problems in our code. Hypothesis is a library similar to Haskell’s QuickCheck. We’ll get to know it in more detail later, along with other test libraries. Hypothesis can also generate test data for NumPy data types.

1. Imports

[1]:
import re

from hypothesis import assume, given
from hypothesis.strategies import emails, integers, tuples

2. Find range

[2]:
def calculate_range(tuple_obj):
    return max(tuple_obj) - min(tuple_obj)

3. Test with strategies and given

With hypothesis.strategies you can create a wide variety of test data. Hypothesis provides strategies for most built-in types, and their arguments let you restrict the generated values to suit your needs. In the example below, we combine the tuples and integers strategies and pass them to the test function with the Python decorator @given. More specifically, @given takes our test function and converts it into a parameterised one that is run over a wide range of matching data:

[3]:
@given(tuples(integers(), integers(), integers()))
def test_calculate_range(tup):
    result = calculate_range(tup)
    assert isinstance(result, int)
    assert result > 0
[4]:
test_calculate_range()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 test_calculate_range()

Cell In[3], line 2, in test_calculate_range()
      1 @given(tuples(integers(), integers(), integers()))
----> 2 def test_calculate_range(tup):
      3     result = calculate_range(tup)
      4     assert isinstance(result, int)

    [... skipping hidden 1 frame]

Cell In[3], line 5, in test_calculate_range(tup)
      3 result = calculate_range(tup)
      4 assert isinstance(result, int)
----> 5 assert result > 0

AssertionError:
Falsifying example: test_calculate_range(
    tup=(0, 0, 0),
)

The falsifying example (0, 0, 0) shows that the range is 0 when all values are equal. We therefore correct the test to use >= and check it again:

[5]:
@given(tuples(integers(), integers(), integers()))
def test_calculate_range(tup):
    result = calculate_range(tup)
    assert isinstance(result, int)
    assert result >= 0
[6]:
test_calculate_range()
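
To get a feel for what @given does conceptually, the following sketch runs the same property over many randomly drawn tuples using only the standard library. It illustrates the idea and is not how Hypothesis is implemented: Hypothesis also searches for edge cases on its own and shrinks failing inputs to minimal examples. The helper names random_int_tuple and check_property, the value bounds and the seed are hypothetical choices made for this example.

```python
import random


def calculate_range(tuple_obj):
    return max(tuple_obj) - min(tuple_obj)


def random_int_tuple(rng, size=3, low=-1000, high=1000):
    """Draw a tuple of `size` random integers (hypothetical helper)."""
    return tuple(rng.randint(low, high) for _ in range(size))


def check_property(prop, runs=200, seed=0):
    """Call `prop` on many tuples; return the first failing input or None."""
    rng = random.Random(seed)
    # Try a hand-picked edge case first, then randomly generated data.
    candidates = [(0, 0, 0)] + [random_int_tuple(rng) for _ in range(runs)]
    for tup in candidates:
        if not prop(tup):
            return tup
    return None


# The strict property fails as soon as all values are equal ...
strict_failure = check_property(lambda tup: calculate_range(tup) > 0)
# ... while the corrected property holds for every generated tuple.
relaxed_failure = check_property(lambda tup: calculate_range(tup) >= 0)
print(strict_failure, relaxed_failure)
```

The hand-picked edge case has to be supplied manually here, whereas Hypothesis found (0, 0, 0) without any hint.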

4. Check against regular expressions

Regular expressions can be used to check strings for certain syntactical rules. In Python, you can use re.match to check whether the beginning of a string matches a regular expression.

Note

You can try out your regular expressions beforehand on the website regex101.
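
A minimal sketch of how re.match behaves may help here. The pattern and the sample strings are made up for illustration only. Note that re.match only matches at the beginning of the string and returns None if the pattern does not match there.

```python
import re

# Illustrative pattern: a word, a literal dash, then digits.
pattern = r"(?P<name>\w+)-(?P<number>\d+)"

match = re.match(pattern, "item-42 and more")
print(match.groups())       # all groups as a tuple
print(match.group("name"))  # access a single group by its name

# No match at the start of the string: re.match returns None.
no_match = re.match(pattern, "-42")
print(no_match)
```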

As an example, let’s try to extract the username and the domain from email addresses:

[7]:
def parse_email(email):
    result = re.match(
        r"(?P<username>\w+).(?P<domain>[\w\.]+)",
        email,
    ).groups()
    return result

Now we write a test test_parse_email to check our function. As input values we use the emails strategy of Hypothesis. As a result we expect, for example:

('0', 'A.com')
('F', 'j.EeHNqsx')
…

In the test, we check that two entries are always returned and that a dot (.) occurs in the second entry.

[8]:
@given(emails())
def test_parse_email(email):
    result = parse_email(email)
    # print(result)
    assert len(result) == 2
    assert "." in result[1]
[9]:
test_parse_email()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 test_parse_email()

Cell In[8], line 2, in test_parse_email()
      1 @given(emails())
----> 2 def test_parse_email(email):
      3     result = parse_email(email)
      4     # print(result)

    [... skipping hidden 1 frame]

Cell In[8], line 3, in test_parse_email(email)
      1 @given(emails())
      2 def test_parse_email(email):
----> 3     result = parse_email(email)
      4     # print(result)
      5     assert len(result) == 2

Cell In[7], line 5, in parse_email(email)
      1 def parse_email(email):
      2     result = re.match(
      3         "(?P<username>\w+).(?P<domain>[\w\.]+)",
      4         email,
----> 5     ).groups()
      6     return result

AttributeError: 'NoneType' object has no attribute 'groups'
Falsifying example: test_parse_email(
    email='=@A.ac',
)
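
The AttributeError arises because re.match returns None when the pattern does not match at the start of the string, and None has no groups method. For the falsifying example, \w+ cannot match the leading = character. A short check with the standard library illustrates this:

```python
import re

pattern = r"(?P<username>\w+).(?P<domain>[\w\.]+)"

# "=" is not a word character, so the match fails at position 0.
failing = re.match(pattern, "=@A.ac")
print(failing)

# An address whose local part consists of word characters matches.
passing = re.match(pattern, "0@A.com")
print(passing.groups())
```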

Hypothesis found a falsifying example, =@A.ac, which makes it clear that our regular expression in the parse_email function is not yet sufficient. Other runs may report similar addresses such as 0/0@A.ac and /@A.ac. After we have adapted our regular expression accordingly, we can call the test again:

[10]:
def parse_email(email):
    result = re.match(
        r"(?P<username>[\.\w\-\!~#$%&\|{}\+\/\^\`\=\*']+).(?P<domain>[\w\.\-]+)",
        email,
    ).groups()
    return result
[11]:
test_parse_email()
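
The pattern still separates username and domain with an unescaped ., which matches any character except a newline instead of a literal @. One possible stricter variant is sketched below, assuming that an address contains exactly one @ separator. It anchors on @ explicitly, uses re.fullmatch so that the whole string must match, and raises a ValueError instead of failing with an AttributeError. The name parse_email_strict and the chosen character classes are illustrative and not part of the notebook above.

```python
import re

# Assumption: anything except "@" and whitespace may appear in the username.
EMAIL_PATTERN = re.compile(r"(?P<username>[^@\s]+)@(?P<domain>[\w.\-]+)")


def parse_email_strict(email):
    """Return (username, domain) or raise ValueError (illustrative sketch)."""
    match = EMAIL_PATTERN.fullmatch(email)
    if match is None:
        raise ValueError(f"not a parseable email address: {email!r}")
    return match.group("username"), match.group("domain")


print(parse_email_strict("=@A.ac"))
print(parse_email_strict("0/0@A.ac"))
```

Whether such a strict variant is appropriate depends on which addresses the application has to accept, so it should be tested with the emails strategy in the same way as parse_email.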