Intake-GUI: Exploring data in a graphical user interface

Intake GUI has been re-implemented so that it can be used not only in Jupyter notebooks but also in other web applications. It displays the contents of all installed catalogs and lets you add local and remote catalogs, search them, and select sources from them.

Intake supports the division of labor between data engineers who curate, manage, and deploy data, and data scientists who analyse and visualise data without having to know how it’s stored.

The Intake GUI is based on Panel, which offers a composable dashboard solution for displaying plots, images, tables, text and widgets. Panel works both in a Jupyter notebook and in a standalone Tornado application.

From a data engineer’s point of view, this means that you can deploy the Intake GUI at an endpoint and offer it to your data users as a data exploration tool. It also means that the GUI is easy to adapt and reorganise, for example to insert your own logo, reuse parts of it in your own applications, or add new functions.

In the future, Intake-GUI should also allow the input of user parameters as well as the editing and saving of catalogs.

[1]:
import intake


intake.gui

The GUI contains three main areas:

  1. a list of catalogs. The builtin catalog shown by default contains the data sets installed in the system, just like intake.cat.

  2. a list of the sources in the currently selected catalog.

  3. a description of the currently selected source.

Ad 1: Catalogs

No catalog is currently displayed in the list of catalogs. However, below the three main areas there are three buttons for adding, removing, and searching catalogs.

The button actions are also available through the API, e.g. Add Catalog with:

[2]:
intake.gui.add("./us_crime/us_crime.yaml")
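
A catalog file such as us_crime.yaml is plain YAML. As a rough sketch of what such a file might contain, the following script writes a minimal catalog and a tiny CSV file to a temporary directory using only the standard library. The helper name write_demo_catalog, the directory layout, and the shortened description are assumptions made for illustration; the field names (sources, driver, args, urlpath, metadata) follow the usual Intake catalog layout, and the two CSV rows are a small excerpt of the US crime data used in this document.

```python
from pathlib import Path
import tempfile

# Minimal catalog with a single CSV source (illustrative, not the original file).
# {{ CATALOG_DIR }} is expanded by Intake to the directory containing the catalog.
CATALOG_YAML = """\
sources:
  us_crime:
    description: US Crime data
    driver: csv
    args:
      urlpath: "{{ CATALOG_DIR }}/data/crime.csv"
    metadata:
      plots:
        line_example:
          kind: line
          x: Year
          y: [Robbery, Burglary]
"""

# Two rows and three columns taken from the US crime data set.
CRIME_CSV = """\
Year,Robbery,Burglary
1960,107840,912100
1961,106670,949600
"""


def write_demo_catalog(root: Path) -> Path:
    """Write the demo catalog and its CSV file below root; return the catalog path."""
    data_dir = root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    (data_dir / "crime.csv").write_text(CRIME_CSV)
    catalog_path = root / "us_crime.yaml"
    catalog_path.write_text(CATALOG_YAML)
    return catalog_path


if __name__ == "__main__":
    catalog_path = write_demo_catalog(Path(tempfile.mkdtemp()))
    print(catalog_path)
```

With Intake installed, the printed path could then be passed to intake.gui.add in the same way as the path in the cell above.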

Remote catalogs are e.g. available under

Ad 2: Sources

Selecting a source from the list updates the descriptive text on the left side of the user interface.

The list of sources is also available via the API:

[3]:
intake.gui.sources
[3]:
[name: us_crime
 container: dataframe
 plugin: ['csv']
 driver: ['csv']
 description: US Crime data [UCRDataTool](https://www.ucrdatatool.gov/Search/Crime/State/StatebyState.cfm)
 direct_access: forbid
 user_parameters: []
 metadata:
   plots:
     line_example:
       kind: line
       y: ['Robbery', 'Burglary']
       x: Year
     violin_example:
       kind: violin
       y: ['Burglary rate', 'Larceny-theft rate', 'Robbery rate', 'Violent Crime rate']
       group_label: Type of crime
       value_label: Rate per 100k
       invert: True
 args:
   urlpath: {{ CATALOG_DIR }}/data/crime.csv]

The result is a list of regular Intake data source entries. To look at the first rows of the first entry, we can enter the following:

[4]:
source = intake.gui.sources[0]

source.to_dask().head()
[4]:
Year Population Violent crime total Murder and nonnegligent Manslaughter Legacy rape /1 Revised rape /2 Robbery Aggravated assault Property crime total Burglary ... Violent Crime rate Murder and nonnegligent manslaughter rate Legacy rape rate /1 Revised rape rate /2 Robbery rate Aggravated assault rate Property crime rate Burglary rate Larceny-theft rate Motor vehicle theft rate
0 1960 179323175 288460 9110 17190 NaN 107840 154320 3095700 912100 ... 160.9 5.1 9.6 NaN 60.1 86.1 1726.3 508.6 1034.7 183.0
1 1961 182992000 289390 8740 17220 NaN 106670 156760 3198600 949600 ... 158.1 4.8 9.4 NaN 58.3 85.7 1747.9 518.9 1045.4 183.6
2 1962 185771000 301510 8530 17550 NaN 110860 164570 3450700 994300 ... 162.3 4.6 9.4 NaN 59.7 88.6 1857.5 535.2 1124.8 197.4
3 1963 188483000 316970 8640 17650 NaN 116470 174210 3792500 1086400 ... 168.2 4.6 9.4 NaN 61.8 92.4 2012.1 576.4 1219.1 216.6
4 1964 191141000 364220 9360 21420 NaN 130390 203050 4200400 1213200 ... 190.6 4.9 11.2 NaN 68.2 106.2 2197.5 634.7 1315.5 247.4

5 rows × 22 columns
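
The rate columns in the output above appear to be the corresponding totals scaled to 100,000 inhabitants, which matches the "Rate per 100k" label in the catalog metadata. The following sketch checks that assumption in plain Python against the 1960 and 1961 rows shown above. The helper name rate_per_100k is introduced here for illustration and is not part of Intake or the data set.

```python
def rate_per_100k(total: int, population: int) -> float:
    """Scale an absolute count to a rate per 100,000 inhabitants (one decimal)."""
    return round(total / population * 100_000, 1)


# (year, population, violent crime total, robbery, burglary), copied from the table
rows = [
    (1960, 179_323_175, 288_460, 107_840, 912_100),
    (1961, 182_992_000, 289_390, 106_670, 949_600),
]

for year, population, violent, robbery, burglary in rows:
    print(
        year,
        rate_per_100k(violent, population),
        rate_per_100k(robbery, population),
        rate_per_100k(burglary, population),
    )
```

For both years the computed values agree with the Violent Crime rate, Robbery rate and Burglary rate columns printed above (for 1960: 160.9, 60.1 and 508.6).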

[5]:
source.gui

[6]:
intake.gui.source.description

[7]:
cat = intake.open_catalog("./us_crime/us_crime.yaml")

cat.gui