Read, persist and provide data
For an overview of public repositories with research data, see, for example, Open data.
- pandas I/O API
The pandas I/O API is a set of top-level
reader functions that return a pandas object. In most cases, corresponding
writer methods are also available.
Framework for extracting data from websites as JSON, CSV or XML files.
Python module for data mining, natural language processing, ML and network analysis.
- Web Scraping Reference
Overview of web scraping with Python.
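The pairing of reader functions and writer methods in the pandas I/O API can be illustrated with a minimal sketch. The sample data and the column names `station` and `temperature` are assumptions made up for this illustration; in practice the reader would usually be given a file path or URL instead of an in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical sample data, used here instead of a real file.
csv_text = "station,temperature\nBerlin,21.5\nHamburg,19.0\n"

# Reader function: the top-level pandas.read_csv returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Writer method: DataFrame.to_csv is the counterpart of read_csv.
# Called without a path, it returns the CSV content as a string.
round_trip = df.to_csv(index=False)

print(df)
print(round_trip)
```

Other formats follow the same pattern, for example `pandas.read_json` and `DataFrame.to_json`.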
To store other types of data, we introduce you to various NoSQL databases and concepts.
Next, we will show you how to provide the data via an Application Programming Interface (API).
With DVC, we present a tool that enables data provenance, i.e. tracing the origin of the data and the way they were created.
Finally, in the next chapter, you will learn some good practices and get to know helpful Python packages for cleaning up and validating data.