
Working with Tables in Terminal

Table operations including querying, validation, statistics, and schema management for tabular data files.

The fairspec table command provides utilities for working with tables:

  • describe - Get table statistics and summary information
  • query - Query tables using SQL syntax
  • validate - Validate table data against a Table Schema
  • infer-schema - Automatically infer Table Schema from table data
  • render-schema - Render Table Schema as HTML or Markdown documentation
  • validate-schema - Validate a Table Schema file
  • infer-format - Detect table format automatically
  • script - Interactive REPL session with a loaded table

Get statistical summary information about a table:

# Describe a CSV file
fairspec table describe data.csv
# Describe a remote table
fairspec table describe https://example.com/data.csv
# Describe from a dataset
fairspec table describe --from-dataset dataset.json --from-resource sales
# Output as JSON
fairspec table describe data.csv --json

Returns statistics for each column including:

  • count - Number of non-null values
  • null_count - Number of null values
  • mean - Average value (numeric columns)
  • std - Standard deviation (numeric columns)
  • min - Minimum value
  • max - Maximum value
  • median - Median value (numeric columns)

Options:

  • --from-dataset <path> - Load table from dataset descriptor
  • --from-resource <name> - Specify resource name from dataset
  • --debug - Show debug information
  • --json - Output as JSON

All standard format options are available (see Format Options section below).
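
With --json, the per-column statistics are emitted as structured data. The exact shape may vary between fairspec versions; the snippet below is only an illustration built from the field names listed above, not the guaranteed output format.

{
  "id": { "count": 100, "null_count": 0, "mean": 50.5, "std": 29.01, "min": 1, "max": 100, "median": 50.5 },
  "name": { "count": 100, "null_count": 0, "min": "Product A", "max": "Product C" }
}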

Execute SQL queries on tables using Polars SQL engine:

# Basic query
fairspec table query data.csv "SELECT * FROM self WHERE age > 25"
# Aggregate data
fairspec table query sales.csv "SELECT region, SUM(amount) as total FROM self GROUP BY region"
# Filter and sort
fairspec table query users.csv "SELECT name, email FROM self WHERE active = true ORDER BY name"
# Query from dataset resource
fairspec table query --from-dataset dataset.json --from-resource users \
"SELECT * FROM self WHERE created_at > '2024-01-01'"

Notes:

  • Use self as the table name in queries
  • Supports SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, JOIN, etc.
  • Full Polars SQL syntax supported
  • Results are output as formatted tables

Options:

  • --from-dataset <path> - Load table from dataset descriptor
  • --from-resource <name> - Specify resource name from dataset
  • --debug - Show debug information
  • --json - Output as JSON
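
In --json mode the query result can be piped into other tools. A minimal sketch, assuming the result is written as JSON to stdout and that jq is installed:

# Pretty-print the JSON query result with jq, or redirect it to a file
fairspec table query sales.csv \
  "SELECT region, SUM(amount) AS total FROM self GROUP BY region" \
  --json | jq '.'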

Validate table data against a Table Schema:

# Validate with explicit schema
fairspec table validate data.csv --table-schema schema.json
# Validate with inferred schema
fairspec table validate data.csv
# Validate from dataset (uses embedded schema)
fairspec table validate --from-dataset dataset.json --from-resource users
# Output validation report as JSON
fairspec table validate data.csv --table-schema schema.json --json

Returns a validation report with:

  • valid - Boolean indicating if validation passed
  • errors - Array of validation errors (if any)

Example validation errors:

{
  "valid": false,
  "errors": [
    {
      "type": "table/constraint",
      "propertyName": "age",
      "rowNumber": 5,
      "message": "value 200 exceeds maximum of 150"
    },
    {
      "type": "table/type",
      "propertyName": "email",
      "rowNumber": 12,
      "message": "invalid email format"
    }
  ]
}

Options:

  • --table-schema <path> - Path to Table Schema file
  • --from-dataset <path> - Load table from dataset descriptor
  • --from-resource <name> - Specify resource name from dataset
  • --debug - Show debug information
  • --json - Output as JSON
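
Validation also works well in CI. The sketch below assumes the command exits with a non-zero status when validation fails; verify this against your fairspec version before relying on it.

#!/usr/bin/env sh
# Fail the pipeline if production data violates the schema
# (assumes a non-zero exit code on validation failure)
if ! fairspec table validate production.csv --table-schema schema.json --json > report.json; then
  echo "Validation failed, see report.json" >&2
  exit 1
fi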

Automatically generate a Table Schema from table data:

# Infer schema from local file
fairspec table infer-schema data.csv
# Infer from remote file
fairspec table infer-schema https://example.com/data.csv
# Save inferred schema to file
fairspec table infer-schema data.csv --json > schema.json
# Infer with custom options
fairspec table infer-schema data.csv --sample-rows 1000 --confidence 0.95

Options:

  • --sample-rows <number> - Number of rows to sample for inference (default: 100)
  • --confidence <number> - Confidence threshold for type detection (0-1, default: 0.9)
  • --keep-strings - Keep original string types instead of inferring
  • --column-types <json> - Override types for specific columns
  • --comma-decimal - Treat comma as decimal separator
  • --month-first - Parse dates as month-first (MM/DD/YYYY)
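
Type overrides help when inference guesses wrong, for example a zip code column that should stay a string. A sketch combining the options above (the file and column names are hypothetical):

# Keep "zip" as a string, force "amount" to number, and sample more rows
fairspec table infer-schema orders.csv \
  --sample-rows 1000 \
  --column-types '{"zip": "string", "amount": "number"}' \
  --json > orders-schema.json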

The inferred schema automatically detects:

  • Column types (string, integer, number, boolean, date, datetime, etc.)
  • Required columns (based on the absence of missing values)
  • Enum values for columns with limited distinct values
  • Numeric constraints (minimum, maximum)
  • String patterns
  • Missing value indicators

Given this CSV data:

id,name,price,quantity,active,created_at
1,Product A,19.99,100,true,2024-01-15
2,Product B,29.99,50,false,2024-01-20
3,Product C,39.99,75,true,2024-02-01

Infer the schema:

fairspec table infer-schema products.csv --json

Generated schema:

{
  "properties": {
    "id": { "type": "integer" },
    "name": { "type": "string" },
    "price": { "type": "number" },
    "quantity": { "type": "integer" },
    "active": { "type": "boolean" },
    "created_at": { "type": "date" }
  },
  "required": ["id", "name", "price", "quantity", "active", "created_at"]
}

Render a Table Schema as human-readable HTML or Markdown documentation:

# Render as Markdown
fairspec table render-schema schema.json --to-format markdown
# Render as HTML
fairspec table render-schema schema.json --to-format html
# Save to file
fairspec table render-schema schema.json --to-format markdown --to-path schema.md
fairspec table render-schema schema.json --to-format html --to-path schema.html

Output formats:

  • markdown - Generates Markdown documentation with column descriptions, types, and constraints
  • html - Generates styled HTML table documentation

Options:

  • --to-format <format> (required) - Output format (markdown or html)
  • --to-path <path> - Save to file instead of stdout
  • --silent - Suppress output messages
  • --debug - Show debug information
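
For the products schema inferred earlier, the Markdown output is essentially a table of columns, types, and constraints. The exact layout is up to the renderer, so treat the following as an illustration only:

| Column     | Type    | Required |
| ---------- | ------- | -------- |
| id         | integer | yes      |
| price      | number  | yes      |
| created_at | date    | yes      |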

Validate that a Table Schema file is valid:

# Validate a schema file
fairspec table validate-schema schema.json
# Validate from remote source
fairspec table validate-schema https://example.com/schema.json
# Output as JSON
fairspec table validate-schema schema.json --json

This validates that the schema itself:

  • Is valid JSON
  • Complies with the Table Schema specification
  • Has correct property definitions
  • Uses valid column types and constraints

Example output for a valid schema:

{
  "valid": true,
  "errors": []
}

Or if invalid:

{
  "valid": false,
  "errors": [
    {
      "type": "schema/invalid",
      "message": "Invalid column type: 'txt' (did you mean 'text'?)"
    }
  ]
}

Options:

  • --silent - Suppress output messages
  • --debug - Show debug information
  • --json - Output as JSON

Automatically detect the format of a table file:

# Infer format from file
fairspec table infer-format data.csv
# Infer from remote file
fairspec table infer-format https://example.com/data.xlsx
# Output as JSON
fairspec table infer-format data.parquet --json

The command can detect:

  • csv - Comma-separated values
  • tsv - Tab-separated values
  • json - JSON format
  • jsonl - JSON Lines (newline-delimited JSON)
  • xlsx - Excel spreadsheet
  • ods - OpenDocument Spreadsheet
  • parquet - Apache Parquet
  • arrow - Apache Arrow/Feather
  • sqlite - SQLite database

Example output for a CSV file:

{
  "name": "csv",
  "delimiter": ",",
  "quoteChar": "\""
}
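
Because the JSON output includes the detected format name, it can drive simple scripting. A sketch assuming jq is available and the top-level "name" field shown above:

# Branch on the detected format
name=$(fairspec table infer-format mystery-file --json | jq -r '.name')
if [ "$name" = "csv" ]; then
  fairspec table describe mystery-file
fi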

Start an interactive REPL session with a loaded table:

# Load table and start REPL
fairspec table script data.csv
# Script table from dataset
fairspec table script --from-dataset dataset.json --from-resource users

The session provides:

  • fairspec - Full fairspec library
  • table - Loaded table (LazyDataFrame)

Example session:

fairspec> table
LazyDataFrame { ... }
fairspec> await table.collect()
DataFrame { ... }
fairspec> await table.select(["name", "age"]).collect()
DataFrame { ... }
fairspec> await table.filter(pl.col("age").gt(25)).collect()
DataFrame { ... }
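
Collected results can be handled with the regular Polars DataFrame API. The lines below are a sketch that assumes the REPL exposes the nodejs-polars API implied by the LazyDataFrame above; method availability may differ:

fairspec> const adults = await table.filter(pl.col("age").gt(25)).collect()
fairspec> adults.head(5)
DataFrame { ... }
fairspec> adults.writeCSV("adults.csv")  // assumes nodejs-polars writeCSV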

All table commands support these format options for loading data:

  • --format <name> - Format name (csv, tsv, etc.)
  • --delimiter <char> - Column delimiter (default: ,)
  • --line-terminator <chars> - Row terminator (default: \n)
  • --quote-char <char> - Quote character (default: ")
  • --null-sequence <string> - Null value indicator
  • --header-rows <numbers> - Header row indices (e.g., [1,2])
  • --header-join <char> - Character to join multi-row headers
  • --comment-rows <numbers> - Comment row indices to skip
  • --comment-prefix <char> - Comment line prefix (e.g., #)
  • --column-names <names> - Override column names (JSON array)
  • --json-pointer <pointer> - JSON pointer to data array (e.g., /data/users)
  • --row-type <type> - Row format: object or array
  • --sheet-number <number> - Sheet index (0-based)
  • --sheet-name <name> - Sheet name
  • --table-name <name> - Table name in database
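
These options compose. For example, a semicolon-delimited file with a two-row header, or a JSON file whose rows live under a nested pointer, could be loaded like this (file names are illustrative; value syntax follows the hints above):

# Semicolon-delimited file with a two-row header joined by "_"
fairspec table describe report.txt \
  --format csv --delimiter ";" --header-rows "[1,2]" --header-join "_"
# JSON document whose row array sits at /data/users
fairspec table describe users.json --json-pointer /data/users --row-type object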

All table commands support these schema-related options:

  • --sample-rows <number> - Sample size for type inference
  • --confidence <number> - Confidence threshold (0-1)
  • --keep-strings - Don’t infer types, keep as strings
  • --column-types <json> - Override types (e.g., {"age":"integer"})
  • --missing-values <values> - Missing value indicators (JSON array)
  • --decimal-char <char> - Decimal separator (default: .)
  • --group-char <char> - Thousands separator (default: ,)
  • --comma-decimal - Use comma as decimal (shorthand)
  • --true-values <values> - Custom true values (JSON array)
  • --false-values <values> - Custom false values (JSON array)
  • --datetime-format <format> - Datetime format string
  • --date-format <format> - Date format string
  • --time-format <format> - Time format string
  • --month-first - Parse dates as month-first
  • --array-type <type> - Array item type
  • --list-delimiter <char> - List delimiter (default: ;)
  • --list-item-type <type> - List item type
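
For example, a European-style export (comma decimals, dot thousands separators, custom missing markers) or US-style dates could be handled with combinations like these (file names and marker values are illustrative):

# Comma decimals, "." as thousands separator, and "NA"/"n/a" treated as missing
fairspec table describe euro-sales.csv \
  --comma-decimal --group-char "." \
  --missing-values '["NA", "n/a", ""]'
# US-style MM/DD/YYYY dates with custom boolean markers
fairspec table infer-schema us-data.csv \
  --month-first --true-values '["Y"]' --false-values '["N"]'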

The following workflows combine these commands for common tasks.

Explore an unknown data file:

# 1. Infer the format
fairspec table infer-format unknown-data.txt
# 2. Get basic statistics
fairspec table describe unknown-data.txt
# 3. Infer the schema
fairspec table infer-schema unknown-data.txt --json > schema.json
# 4. Query the data
fairspec table query unknown-data.txt "SELECT * FROM self LIMIT 10"

Create and validate a schema:

# 1. Create schema from sample data
fairspec table infer-schema sample.csv --json > schema.json
# 2. Validate the schema itself
fairspec table validate-schema schema.json
# 3. Generate documentation
fairspec table render-schema schema.json --to-format markdown --to-path docs.md
# 4. Validate production data
fairspec table validate production.csv --table-schema schema.json

Check data quality:

# Check for data quality issues
fairspec table validate data.csv --table-schema schema.json
# Get detailed statistics
fairspec table describe data.csv
# Query for specific issues
fairspec table query data.csv "SELECT * FROM self WHERE email NOT LIKE '%@%'"
# Find duplicates
fairspec table query data.csv "SELECT id, COUNT(*) as cnt FROM self GROUP BY id HAVING cnt > 1"

Explore data interactively:

# Start interactive session
fairspec table script data.csv
# In REPL:
# - Explore: await table.head(10).collect()
# - Filter: await table.filter(pl.col("status").eq("active")).collect()
# - Aggregate: await table.groupBy("category").agg(pl.sum("amount")).collect()
# - Transform: await table.withColumn(pl.col("price").mul(1.1).alias("new_price")).collect()

Export results for further processing:

# Query and output as JSON
fairspec table query data.csv "SELECT * FROM self" --json > output.json
# Get statistics and save
fairspec table describe large-file.parquet --json > stats.json

By default, output is human-readable with formatted tables:

fairspec table describe data.csv

Output:

#         count  mean   std    min   max
id        100    50.5   29.01  1     100
price     100    29.99  15.43  9.99  99.99
quantity  100    75     28.87  1     150

Machine-readable JSON for automation:

fairspec table describe data.csv --json

Analyze sales data:

# Get overview of sales data
fairspec table describe sales.csv
# Find top customers
fairspec table query sales.csv \
"SELECT customer, SUM(amount) as total FROM self GROUP BY customer ORDER BY total DESC LIMIT 10"
# Validate data quality
fairspec table validate sales.csv --table-schema sales-schema.json

Work with Excel spreadsheets:

# Load Excel data
fairspec table describe report.xlsx --sheet-name "Q1 Sales"
# Query specific sheet
fairspec table query report.xlsx --sheet-name "Q1 Sales" \
"SELECT region, SUM(revenue) FROM self GROUP BY region"
# Validate against schema
fairspec table validate report.xlsx --sheet-name "Q1 Sales" --table-schema schema.json

Work with remote schemas:

# Infer schema from remote data
fairspec table infer-schema https://api.example.com/export.csv --json > remote-schema.json
# Validate local data against remote schema
fairspec table validate local-data.csv --table-schema remote-schema.json

Work with SQLite databases:

# Validate SQLite export
fairspec table validate export.db --table-name users --table-schema expected-schema.json
# Get statistics from database
fairspec table describe export.db --table-name users
# Query database table
fairspec table query export.db --table-name users \
"SELECT status, COUNT(*) FROM self GROUP BY status"