Introduction

Target groups

The target groups are diverse, ranging from data scientists and data engineers to analysts and system engineers. Their skills and workflows differ considerably. However, one of the great strengths of Python for Data Science is that it allows these different experts to work closely together in cross-functional teams.

Data scientists: explore data with different parameters and summarise the results.
Data engineers: check the quality of the code and make it more robust, efficient and scalable.
Data analysts: use the code provided by data engineers to systematically analyse the data.
System engineers: provide the research platform based on JupyterHub, on which the other roles can perform their work.

In this tutorial, we first address system engineers who want to build and run a platform based on Jupyter notebooks. We then explain how this platform can be used effectively by data scientists, data engineers and analysts.

Structure of the Python for Data Science tutorial

From Chapter 2 onwards, the tutorial follows the course of a prototypical research project:

Workspace covers the installation and configuration of IPython and Jupyter notebooks with nbextensions and ipywidgets.
Read, persist and provide data, either through a REST API or directly from an HTML page.
Data cleansing and validation is a recurring task that involves removing or changing redundant, inconsistent or incorrectly formatted data.
Visualise data has been moved to a separate tutorial that covers the many different possibilities.
Performance introduces ways to make your code run faster.
Create a product shows what is necessary to achieve reproducible results: in addition to reproducible environments, the source code and data must be versioned. The source code should be packaged into libraries with documentation, licence(s), tests and logging. Finally, the chapter includes advice on improving code quality and on secure operation.
Create web applications covers applications that generate dashboards from Jupyter notebooks, that require more comprehensive application logic, as demonstrated in Integrating bokeh plots into Flask, or that provide data via a RESTful API.
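
To make the data cleansing step in the list above more concrete, here is a minimal sketch using only the standard library. The records, field names and accepted date formats are invented for illustration and are not taken from the tutorial; a real project would more likely use a library such as pandas.

```python
from datetime import datetime

# Hypothetical raw records; names and values are made up for illustration.
raw_records = [
    {"name": " Alice ", "joined": "2021-03-01"},
    {"name": "alice", "joined": "2021-03-01"},  # redundant after normalisation
    {"name": "Bob", "joined": "01.04.2021"},    # inconsistent date format
    {"name": "Carol", "joined": "not a date"},  # incorrectly formatted
]

# Assumption: only these two date formats occur in the data.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalise names and dates, drop unparseable rows and duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        name = record["name"].strip().title()
        joined = parse_date(record["joined"])
        if joined is None:
            continue  # incorrectly formatted: remove
        key = (name, joined)
        if key in seen:
            continue  # redundant: remove
        seen.add(key)
        cleaned.append({"name": name, "joined": joined})
    return cleaned


cleaned_records = clean(raw_records)
```

In this sketch, inconsistent values are changed (names and dates are normalised), while redundant and unrepairable rows are removed, which mirrors the two kinds of action named in the list item.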

Pull requests

If you have suggestions for improvements and additions, we recommend that you create a fork of our GitHub repository and make your changes there. You are also welcome to open a pull request. If the changes it contains are small and atomic, we will be happy to look at your suggestions.

The following guidelines help us to maintain the German translation of the tutorial:

Write commit messages in English.
Start commit messages with a Gitmoji.
Stick to English names for files and folders.
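
As an illustration of the second guideline, the following sketch checks whether a commit message starts with a Gitmoji, written either as a shortcode such as `:memo:` or as the emoji character itself. The function name and the matching rules are assumptions made for this example. It does not verify that the shortcode is an official Gitmoji, and it is not part of the project's tooling.

```python
import re
import unicodedata

# A shortcode such as ":memo:" followed by a space and the message text.
SHORTCODE = re.compile(r"^:[a-z0-9_+-]+: \S")


def starts_with_gitmoji(message):
    """Return True if the first line starts with a shortcode or a symbol."""
    first_line = message.splitlines()[0] if message else ""
    if SHORTCODE.match(first_line):
        return True
    # Assumption: treat a leading Unicode "Symbol, other" character as an emoji.
    return bool(first_line) and unicodedata.category(first_line[0]) == "So"
```

For example, `starts_with_gitmoji(":memo: Update introduction")` returns `True`, whereas a message without a leading Gitmoji, such as `"Update introduction"`, returns `False`.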