Read, persist and provide data

You can get an overview of public repositories with research data, for example in Open data.

In addition to specific Python libraries for accessing Overview and Geodata, we will introduce you to different serialisation formats and, in more detail, three tools that make data accessible:

See also

pandas I/O API

The pandas I/O API is a set of top-level reader functions that return a pandas object. In most cases, corresponding write methods are also available.

Scrapy

Framework for extracting data from websites as JSON, CSV or XML files.

Pattern

Python module for data mining, natural language processing, machine learning and network analysis.

Web Scraping Reference

Overview of web scraping with Python.
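As a minimal illustration of what serialisation means, the same records can be written as JSON and as CSV with the standard library alone; the record contents here are arbitrary example data:

```python
import csv
import io
import json

# Two example records; any tabular data works the same way.
records = [
    {"city": "Berlin", "population": 3664088},
    {"city": "Hamburg", "population": 1852478},
]

# Serialise to JSON: one self-describing text document.
json_text = json.dumps(records, indent=2)

# Serialise to CSV: a header row plus one line per record.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "population"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# Both formats round-trip back to Python objects, though CSV
# loses type information: all values come back as strings.
assert json.loads(json_text) == records
assert next(csv.DictReader(io.StringIO(csv_text)))["city"] == "Berlin"
```

The pandas reader functions mentioned above wrap exactly such formats, returning a DataFrame instead of plain dictionaries.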

We introduce PostgreSQL, SQLAlchemy and PostGIS for storing relational data, Python objects and geodata.
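The relational pattern can be sketched without a server at all, using Python's built-in sqlite3 module; with PostgreSQL you would connect through a driver such as psycopg or through SQLAlchemy, but the SQL itself is largely the same. Table and column names here are invented for the example:

```python
import sqlite3

# In-memory database; for PostgreSQL, swap the connection for a
# psycopg or SQLAlchemy connection and keep the SQL unchanged.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE measurement (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
con.executemany(
    "INSERT INTO measurement (sensor, value) VALUES (?, ?)",
    [("t1", 21.5), ("t1", 22.0), ("t2", 19.8)],
)
con.commit()

# Relational queries aggregate across rows.
rows = con.execute(
    "SELECT sensor, AVG(value) FROM measurement GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)  # [('t1', 21.75), ('t2', 19.8)]
con.close()
```

SQLAlchemy adds an object-relational layer on top of this, mapping Python classes to such tables, and PostGIS extends PostgreSQL with geometry types and spatial queries.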

For the storage of other data types we introduce you to different NoSQL databases and concepts.
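The core idea behind one family of NoSQL systems, document stores, can be sketched as a toy in-memory key-value store holding schema-less JSON documents; the keys and documents below are invented for the example:

```python
import json

# A toy document store: JSON documents under string keys. Document
# databases such as MongoDB follow the same basic model, with
# persistence, indexing and querying on top.
store = {}

def put(key, document):
    """Serialise a document and store it under a key."""
    store[key] = json.dumps(document)

def get(key):
    """Load and deserialise the document stored under a key."""
    return json.loads(store[key])

# Documents in the same store need not share a schema.
put("user:1", {"name": "Ada", "languages": ["Python", "SQL"]})
put("user:2", {"name": "Grace", "role": "admin"})

assert get("user:1")["languages"] == ["Python", "SQL"]
assert "languages" not in get("user:2")
```

That flexibility, records without a fixed schema, is a main reason to reach for a NoSQL database instead of a relational one.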

Next, we will show you how to provide the data via an Application Programming Interface (API).
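A minimal sketch of such an API, using only the standard library's http.server: one endpoint that serves example records as JSON. The endpoint path and the data are assumptions for the example; real projects would typically use a framework such as FastAPI or Flask:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Example records served by the API.
DATA = [{"id": 1, "name": "sample"}]

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/items":
            body = json.dumps(DATA).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet

# Port 0 lets the OS pick a free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), ApiHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client fetches the data over HTTP like any other web resource.
url = f"http://127.0.0.1:{server.server_port}/items"
with urllib.request.urlopen(url) as response:
    assert json.loads(response.read()) == DATA
server.shutdown()
```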

With DVC we present a tool for data provenance, i.e. tracing where data came from and how they were created.

Finally, in the next chapter, you will learn some good practices and helpful Python packages for cleaning up and validating data.