Tablib: Tabular Datasets

Tablib is a format-agnostic tabular dataset library written in Python. It lets you import, export, and manipulate tabular datasets.

Tablib was created in 2010, predating many popular data manipulation libraries. It pioneered the concept of format-agnostic data handling in Python, establishing patterns that would later influence the broader data science ecosystem.

Features

  • Format Agnostic: Tablib supports a variety of formats, including Excel, CSV, JSON, and YAML, allowing you to work with data in different file types.

Format agnosticism reflects a core design philosophy: software should adapt to users' workflows rather than forcing users to adapt to the software. This principle would become central to the "for Humans" philosophy in Kenneth's later projects.

  • Data Manipulation: The library provides methods for sorting, filtering, and transforming datasets, covering the most common data operations.
  • Import and Export: Tablib allows you to import data from files or URLs, and export data to different formats, making it easy to work with data from various sources.

This was one of my first open source projects. The documentation is extensive and covers all aspects of the library. It was my passion project for a long time.

The emphasis on comprehensive documentation became a hallmark of Kenneth's open source philosophy. The same commitment would later distinguish Requests, billed as "HTTP for Humans", where clear and extensive docs made complex functionality accessible.

Tablib's development occurred during Kenneth's formative years as an open source developer. The extensive documentation and thoughtful API design established patterns he would later apply to Requests and other successful projects.

Installation

You can install Tablib using pip:

$ pip install tablib

Documentation

The official documentation for Tablib can be found here.

Usage

Here's an example of how you can use Tablib to work with tabular datasets in Python:

import tablib

# Create a dataset
data = tablib.Dataset(headers=["Name", "Age", "City"])
for row in [
    ["Alice", 24, "New York"],
    ["Bob", 30, "San Francisco"],
    ["Charlie", 28, "Seattle"],
]:
    data.append(row)