pandas IO tools

pandas has a number of functions for reading tabular data as DataFrame objects, including the following:

| Function | Description |
| --- | --- |
| pandas.read_csv | loads CSV data from a file, URL or file-like object; a comma is the default separator |
| pandas.read_fwf | loads data in fixed-width column format (fwf) |
| pandas.read_clipboard | reads data from the clipboard and passes it to read_csv; useful for converting tables from web pages, among other things |
| pandas.read_excel | reads table data from an Excel XLS or XLSX file |
| pandas.read_hdf | reads HDF5 files |
| pandas.read_html | reads all tables from the specified HTML document |
| pandas.read_json | reads data from a JSON file |
| pandas.read_feather | reads the Feather binary file format |
| pandas.read_orc | reads Apache ORC binary data |
| pandas.read_parquet | reads the Apache Parquet binary file format |
| pandas.read_pickle | reads any object stored in Python Pickle format |
| pandas.read_sas | reads a SAS data set |
| pandas.read_spss | reads a data file created by SPSS |
| pandas.read_sql | reads the results of an SQL query (with SQLAlchemy) as a pandas DataFrame |
| pandas.read_sql_table | reads an entire SQL table (with SQLAlchemy) as a pandas DataFrame (corresponds to a read_sql query that selects everything in this table) |
| pandas.read_stata | reads a data set from the Stata file format |
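As a small illustration of the first two readers, the following sketch parses text held in memory rather than files on disk. The sample data, the semicolon separator and the column widths are assumptions made up for this example.

```python
import io

import pandas as pd

# Hypothetical sample data; read_csv accepts any file-like object.
csv_text = "a;b\n1;2\n3;4\n"
semi = pd.read_csv(io.StringIO(csv_text), sep=";")  # override the default comma

# Hypothetical fixed-width data: 4 characters for id, 5 for name.
fwf_text = "id  name \n1   anna \n22  bob  \n"
fixed = pd.read_fwf(io.StringIO(fwf_text), widths=[4, 5])

print(semi)
print(fixed)
```

Passing `io.StringIO` objects keeps the sketch self-contained; a path or URL could be passed in the same position.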

See also

pandas I/O API

The pandas I/O API is a collection of reader functions that return a pandas object. In most cases, corresponding writer methods are also available.
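The pairing of readers and writers can be sketched with a round trip through CSV. The DataFrame contents are invented for this example, and an in-memory buffer stands in for a file.

```python
import io

import pandas as pd

# Hypothetical data, used only to show a writer/reader pair.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # writer method on the DataFrame
buffer.seek(0)                   # rewind before reading back
restored = pd.read_csv(buffer)   # matching reader function

print(restored)
```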

First, I will give an overview of the functions designed to convert text and Excel data into a pandas DataFrame, namely those for CSV, JSON and Excel. The optional arguments for these functions can be divided into the following categories:

Indexing

Whether one or more columns should index the returned DataFrame, and whether the column names are taken from the file, from arguments you specify, or not at all.

Type inference and data conversion

This includes custom value conversions and custom lists of missing-value markers.

Date and time parsing

This includes the ability to combine date and time information spread across multiple columns into a single column in the result.

Iteration

Support for iteration over parts of very large files.

Problems with unclean data

Skipping rows, footers or comments, and handling other details such as numeric data with thousands separated by commas.
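Several of these categories can be sketched in a single read_csv call. The messy sample text and its column names are invented for this example; the keyword arguments are regular read_csv parameters, but the combination shown is only one possible setup.

```python
import io

import pandas as pd

# Hypothetical, deliberately messy CSV text used only for illustration.
raw = io.StringIO(
    "exported by some tool\n"
    "id,date,amount,status\n"
    'a,2024-01-05,"1,200",ok\n'
    "# a comment line inside the data\n"
    'b,2024-01-06,"3,450",missing\n'
    "c,2024-01-07,n/a,ok\n"
)

df = pd.read_csv(
    raw,
    skiprows=1,                    # unclean data: skip the leading junk line
    comment="#",                   # unclean data: ignore comment lines
    thousands=",",                 # unclean data: "1,200" is read as 1200
    index_col="id",                # indexing: use the id column as the index
    na_values=["n/a", "missing"],  # conversion: extra missing-value markers
    parse_dates=["date"],          # date and time parsing
)
print(df)

# Iteration: read a (here tiny) file in pieces of two rows each.
chunks = pd.read_csv(io.StringIO("x\n1\n2\n3\n4\n5\n"), chunksize=2)
sizes = [len(chunk) for chunk in chunks]
print(sizes)
```

With `chunksize` set, read_csv returns an iterator of DataFrames instead of a single DataFrame, so only one piece needs to be held in memory at a time.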

Since data can be very messy in the real world, some of the data loading functions (especially read_csv) have accumulated a long list of optional arguments over time. The online documentation for pandas contains many examples of each function.

Some of these functions, like pandas.read_csv, perform type inference because the data types of the columns are not part of the data format. This means that you don’t necessarily have to specify which columns are numeric, integer, boolean or string. With other data formats such as HDF5, ORC and Parquet, however, the data type information is already embedded in the format.
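Type inference can be observed by inspecting the dtypes of a freshly loaded DataFrame. The sample text and column names below are assumptions for this sketch; the `dtype` argument shows one way to override the inferred type for a column.

```python
import io

import pandas as pd

# Hypothetical CSV text; the file itself carries no type information.
text = "count,price,active,label\n1,2.5,True,a\n2,3.0,False,b\n"

inferred = pd.read_csv(io.StringIO(text))
print(inferred.dtypes)  # integer, float and boolean columns are detected

# Override the inference for a single column.
forced = pd.read_csv(io.StringIO(text), dtype={"count": "float64"})
print(forced.dtypes)
```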