Data cleansing and validation

Below, we give a practical overview of various Python libraries and methods for data cleansing and validation. Besides well-known libraries such as NumPy and Pandas, we also use several small, specialised libraries: dedupe, fuzzywuzzy, voluptuous, bulwark, tdda and hypothesis. We prefer these lightweight solutions to large, general-purpose systems such as Great Expectations or MobyDQ.