Read, persist and provide data
You can get an overview of public repositories with research data, for example in Open data.
In addition to specific Python libraries for accessing Overview and Geodata, we introduce different serialisation formats and, in more detail, three tools that make data accessible:
See also
- pandas I/O API
  The pandas I/O API is a set of top-level reader functions that return a pandas object. In most cases, corresponding write methods are also available.
- Scrapy
  Framework for extracting data from websites as JSON, CSV or XML files.
- Pattern
  Python module for data mining, natural language processing, machine learning and network analysis.
- Web Scraping Reference
  Overview of web scraping with Python.
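To make the idea of different serialisation formats concrete, here is a minimal sketch using only Python's standard library: the same records are written to JSON and to CSV, showing that JSON preserves types and nesting while CSV flattens everything to strings. The record contents are invented for illustration.

```python
import csv
import io
import json

# The same records, serialised in two common formats (sample data).
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]

# JSON: a single string that preserves nesting and value types.
json_text = json.dumps(records, indent=2)
assert json.loads(json_text) == records

# CSV: tabular and flat; all values are read back as strings.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(records)

reader = csv.DictReader(io.StringIO(buffer.getvalue()))
rows = list(reader)
```

Note that after the CSV round trip, `id` comes back as the string `"1"`, not the integer `1` — a typical pitfall when choosing a serialisation format.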
We introduce PostgreSQL, SQLAlchemy and PostGIS for storing relational data, Python objects and geodata.
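As a first taste of relational storage, the following sketch uses Python's built-in sqlite3 module as a lightweight stand-in for PostgreSQL (which needs a running server); the table name and sample values are invented, but the basic SQL transfers to PostgreSQL largely unchanged.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a PostgreSQL server.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE measurements (id INTEGER PRIMARY KEY, site TEXT, value REAL)"
)
con.executemany(
    "INSERT INTO measurements (site, value) VALUES (?, ?)",
    [("station-a", 21.5), ("station-b", 19.8)],
)
con.commit()

# Query the data back, ordered by value.
rows = con.execute(
    "SELECT site, value FROM measurements ORDER BY value"
).fetchall()
```

With SQLAlchemy, the same schema would be described as Python classes instead of raw SQL, which is one reason the chapter treats both.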
For storing other types of data, we introduce various NoSQL databases and concepts.
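The key-value/document model behind many NoSQL databases can be sketched in a few lines of plain Python: schema-free JSON documents addressed by a key, with no fixed columns. The keys and documents below are invented for illustration; real document stores add persistence, indexing and querying on top of this idea.

```python
import json

# A toy in-memory document store: JSON documents addressed by key.
store = {}

def put(key, document):
    """Serialise a document and store it under the given key."""
    store[key] = json.dumps(document)

def get(key):
    """Load and deserialise the document stored under the given key."""
    return json.loads(store[key])

put("user:1", {"name": "Ada", "tags": ["math"]})
put("user:2", {"name": "Grace", "rank": "admiral"})  # different fields are fine
```

Unlike a relational table, the two documents need not share the same fields — the flexibility (and the risk) that NoSQL databases trade against a fixed schema.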
Next, we will show you how to provide the data via an Application Programming Interface (API).
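Providing data via an API can be sketched without any third-party framework using WSGI, Python's standard web-server interface. The payload and handler below are invented; the point is only that an API is, at bottom, a function that maps a request to a serialised response.

```python
import json
from wsgiref.util import setup_testing_defaults

# Sample payload to expose via the API (invented for illustration).
DATA = {"stations": ["station-a", "station-b"]}

def app(environ, start_response):
    """Minimal WSGI application returning the payload as JSON."""
    body = json.dumps(DATA).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Exercise the app with a synthetic request instead of running a server.
environ = {}
setup_testing_defaults(environ)
captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

response = b"".join(app(environ, start_response))
```

Frameworks such as FastAPI or Flask layer routing, validation and documentation over this same request–response cycle.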
With DVC, we present a tool that enables data provenance, i.e. tracing where data came from and how it was created.
Finally, in the next chapter, you will learn good practices and helpful Python packages for cleaning up and validating data.