Read, persist and provide data

You can get an overview of public repositories with research data, for example in Open data.

In addition to specific Python libraries for accessing Overview and Geodata, we will introduce you to different serialisation formats and, in more detail, three tools that make data accessible:

See also

pandas I/O API

The pandas I/O API is a set of top-level reader functions that return a pandas object. In most cases, corresponding write methods are also available.

Scrapy

Framework for extracting data from websites as JSON, CSV or XML files.

Pattern

Python module for data mining, natural language processing, machine learning and network analysis.

Web Scraping Reference

Overview of web scraping with Python.
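As a minimal illustration of what serialisation means, the same records can be written as JSON and as CSV with the standard library alone; the record contents here are arbitrary example data:

```python
import csv
import io
import json

# Two example records; any tabular data works the same way.
records = [
    {"city": "Berlin", "population": 3664088},
    {"city": "Hamburg", "population": 1852478},
]

# Serialise to JSON: one self-describing text document.
json_text = json.dumps(records, indent=2)

# Serialise to CSV: a header row plus one line per record.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "population"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# Both formats round-trip back to Python objects, though CSV
# loses type information: all values come back as strings.
assert json.loads(json_text) == records
assert next(csv.DictReader(io.StringIO(csv_text)))["city"] == "Berlin"
```

The pandas reader functions mentioned above wrap exactly such formats, returning a DataFrame instead of plain dictionaries.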

We introduce PostgreSQL, SQLAlchemy and PostGIS for storing relational data, Python objects and geodata.
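The relational pattern can be sketched without a server at all, using Python's built-in sqlite3 module; with PostgreSQL you would connect through a driver such as psycopg or through SQLAlchemy, but the SQL itself is largely the same. Table and column names here are invented for the example:

```python
import sqlite3

# In-memory database; for PostgreSQL, swap the connection for a
# psycopg or SQLAlchemy connection and keep the SQL unchanged.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE measurement (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
con.executemany(
    "INSERT INTO measurement (sensor, value) VALUES (?, ?)",
    [("t1", 21.5), ("t1", 22.0), ("t2", 19.8)],
)
con.commit()

# Relational queries aggregate across rows.
rows = con.execute(
    "SELECT sensor, AVG(value) FROM measurement GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)  # [('t1', 21.75), ('t2', 19.8)]
con.close()
```

SQLAlchemy adds an object-relational layer on top of this, mapping Python classes to such tables, and PostGIS extends PostgreSQL with geometry types and spatial queries.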

For the storage of other data types we introduce you to different NoSQL databases and concepts.
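The core idea behind one family of NoSQL systems, document stores, can be sketched as a toy in-memory key-value store holding schema-less JSON documents; the keys and documents below are invented for the example:

```python
import json

# A toy document store: JSON documents under string keys. Document
# databases such as MongoDB follow the same basic model, with
# persistence, indexing and querying on top.
store = {}

def put(key, document):
    """Serialise a document and store it under a key."""
    store[key] = json.dumps(document)

def get(key):
    """Load and deserialise the document stored under a key."""
    return json.loads(store[key])

# Documents in the same store need not share a schema.
put("user:1", {"name": "Ada", "languages": ["Python", "SQL"]})
put("user:2", {"name": "Grace", "role": "admin"})

assert get("user:1")["languages"] == ["Python", "SQL"]
assert "languages" not in get("user:2")
```

That flexibility, records without a fixed schema, is a main reason to reach for a NoSQL database instead of a relational one.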

Next, we will show you how to provide the data via an Application Programming Interface (API).
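A minimal sketch of such an API, using only the standard library's http.server: one endpoint that serves example records as JSON. The endpoint path and the data are assumptions for the example; real projects would typically use a framework such as FastAPI or Flask:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Example records served by the API.
DATA = [{"id": 1, "name": "sample"}]

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/items":
            body = json.dumps(DATA).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet

# Port 0 lets the OS pick a free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), ApiHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client fetches the data over HTTP like any other web resource.
url = f"http://127.0.0.1:{server.server_port}/items"
with urllib.request.urlopen(url) as response:
    assert json.loads(response.read()) == DATA
server.shutdown()
```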

With DVC we present a tool for data provenance, i.e. tracing where data came from and how they were created.

Finally, in the next chapter, you will learn some good practices and helpful Python packages for cleaning up and validating data.