Data cleansing and validation

This section gives a practical overview of libraries and methods for data cleansing and validation with Python. Besides well-known libraries like NumPy and pandas, we also use several small, specialised libraries such as dedupe, fuzzywuzzy, voluptuous, tdda and hypothesis. We prefer these lightweight tools to large, general-purpose systems such as Great Expectations or MobyDQ.

Overview

Dormant projects

GitHub insights

Name

Stars

Contributors

Commit activity

License

Bulwark

https://raster.shields.io/github/stars/ZaxR/bulwark https://raster.shields.io/github/contributors/ZaxR/bulwark https://raster.shields.io/github/commit-activity/y/ZaxR/bulwark https://raster.shields.io/github/license/ZaxR/bulwark

PandasSchema

https://raster.shields.io/github/stars/multimeric/PandasSchema https://raster.shields.io/github/contributors/multimeric/PandasSchema https://raster.shields.io/github/commit-activity/y/multimeric/PandasSchema https://raster.shields.io/github/license/multimeric/PandasSchema

pandas-validation

https://raster.shields.io/github/stars/jmenglund/pandas-validation https://raster.shields.io/github/contributors/jmenglund/pandas-validation https://raster.shields.io/github/commit-activity/y/jmenglund/pandas-validation https://raster.shields.io/github/license/jmenglund/pandas-validation

Opulent-Pandas

https://raster.shields.io/github/stars/danielvdende/opulent-pandas https://raster.shields.io/github/contributors/danielvdende/opulent-pandas https://raster.shields.io/github/commit-activity/y/danielvdende/opulent-pandas https://raster.shields.io/github/license/danielvdende/opulent-pandas

signpost

https://raster.shields.io/github/stars/ilsedippenaar/signpost https://raster.shields.io/github/contributors/ilsedippenaar/signpost https://raster.shields.io/github/commit-activity/y/ilsedippenaar/signpost https://raster.shields.io/github/license/ilsedippenaar/signpost