Introduction#

Target groups#

The target groups are diverse: data scientists, data engineers, data analysts and system engineers. Their skills and workflows differ considerably. However, one of the great strengths of Python for Data Science is that it allows these different experts to work closely together in cross-functional teams.

Data scientists

explore data with different parameters and summarise the results.

Data engineers

check the quality of the code and make it more robust, efficient and scalable.

Data analysts

use the code provided by data engineers to systematically analyse the data.

System engineers

provide the research platform, based on JupyterHub, on which the other roles can perform their work.

In this tutorial we address system engineers who want to build and run a platform based on Jupyter notebooks. We then explain how this platform can be used effectively by data scientists, data engineers and analysts.

Structure of the Python for Data Science tutorial#

From Chapter 2, the tutorial follows the prototype of a research project:

  1. Workspace with the installation and configuration of IPython, Jupyter notebooks with nbextensions and ipywidgets.

  2. Read, persist and provide data either through a REST API or directly from an HTML page.

  3. Data cleansing and validation is a recurring task that involves removing or changing redundant, inconsistent or incorrectly formatted data.

  4. Visualise data has been moved to a separate tutorial that covers the many different possibilities.

  5. Performance introduces ways to make your code run faster.

  6. Create a product shows what is necessary to achieve reproducible results: this requires not only reproducible environments but also versioning of the source code and data. The source code should be packaged into libraries with documentation, licence(s), tests and logging. Finally, the chapter includes advice on improving code quality and secure operation.

  7. Create web applications: these can either be dashboards generated from Jupyter notebooks, require more comprehensive application logic, as demonstrated in Embedding Bokeh plots in Flask, or provide data via a RESTful API.
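
Item 3 of the list above describes data cleansing as removing or changing redundant, inconsistent or incorrectly formatted data. As a minimal sketch of what that can mean in practice, the following example uses only the standard library. The record fields `name` and `date`, the sample values and the helper `clean_records` are hypothetical and chosen purely for illustration; they are not taken from the tutorial.

```python
from datetime import datetime


def clean_records(records):
    """Return records with normalised names, ISO dates and no duplicates.

    Hypothetical helper for illustration only.
    """
    cleaned = []
    seen = set()
    for record in records:
        # Inconsistent data: normalise whitespace and capitalisation of names.
        name = " ".join(record["name"].split()).title()

        # Incorrectly formatted data: accept two date formats, emit ISO 8601.
        date = None
        for fmt in ("%Y-%m-%d", "%d.%m.%Y"):
            try:
                date = datetime.strptime(record["date"].strip(), fmt).date()
                break
            except ValueError:
                continue
        if date is None:
            # Drop rows whose date cannot be parsed with either format.
            continue

        # Redundant data: skip rows that are duplicates after normalisation.
        key = (name, date)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "date": date.isoformat()})
    return cleaned


# Assumed sample data: one duplicate, one differently formatted date,
# and one row with an unparseable date.
raw = [
    {"name": "ada  lovelace", "date": "1815-12-10"},
    {"name": "Ada Lovelace", "date": "10.12.1815"},
    {"name": "alan turing", "date": "23.06.1912"},
    {"name": "Grace Hopper", "date": "not a date"},
]
result = clean_records(raw)
print(result)
```

Whether an unparseable row should be dropped, as here, or kept and flagged for review depends on the project. For larger datasets, a library such as pandas is commonly used for these steps instead of hand-written loops.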


Status#

Contributors | License | pre-commit.ci status | Docs | DOI: 10.5281/zenodo.8024719 | Mastodon


Follow us#

Pull-Requests#

If you have suggestions for improvements and additions, I recommend that you create a fork of my GitHub repository and make your changes there. You are also welcome to make a pull request. If the changes it contains are small and atomic, I’ll be happy to look at your suggestions.

The following guidelines help us to maintain the German translation of the tutorial:

  • Write commit messages in English.

  • Start commit messages with a Gitmoji.

  • Stick to English names for files and folders.