Working with Tables in the Terminal
Table operations including querying, validation, statistics, and schema management for tabular data files.
Available Commands
The `fairspec table` command provides utilities for working with tables:

- `describe` - Get table statistics and summary information
- `query` - Query tables using SQL syntax
- `validate` - Validate table data against a Table Schema
- `infer-schema` - Automatically infer a Table Schema from table data
- `render-schema` - Render a Table Schema as HTML or Markdown documentation
- `validate-schema` - Validate a Table Schema file
- `infer-format` - Detect the table format automatically
- `script` - Start an interactive REPL session with a loaded table
Describe Tables
Get a statistical summary of a table:

```sh
# Describe a CSV file
fairspec table describe data.csv

# Describe a remote table
fairspec table describe https://example.com/data.csv

# Describe a table from a dataset
fairspec table describe --from-dataset dataset.json --from-resource sales

# Output as JSON
fairspec table describe data.csv --json
```

Output

Returns statistics for each column, including:

- `count` - Number of non-null values
- `null_count` - Number of null values
- `mean` - Average value (numeric columns)
- `std` - Standard deviation (numeric columns)
- `min` - Minimum value
- `max` - Maximum value
- `median` - Median value (numeric columns)
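The sketch below computes the statistics listed above for one numeric column held in memory. `describeColumn` is a hypothetical helper, not part of fairspec. It uses the sample standard deviation (dividing by n - 1), which is an assumption. That choice reproduces the `std` of 29.01 shown for an `id` column running from 1 to 100 in the text output example.

```ts
// Hypothetical helper (not fairspec API): summarizes one numeric column
// using the statistics described for `fairspec table describe`.
interface ColumnSummary {
  count: number;
  null_count: number;
  mean: number;
  std: number;
  min: number;
  max: number;
  median: number;
}

// Assumes at least two non-null values; otherwise std/mean become NaN.
function describeColumn(values: Array<number | null>): ColumnSummary {
  const present = values.filter((v): v is number => v !== null);
  const n = present.length;
  const mean = present.reduce((acc, v) => acc + v, 0) / n;
  // Sample variance (n - 1 denominator): an assumption, not confirmed by fairspec docs.
  const variance = present.reduce((acc, v) => acc + (v - mean) ** 2, 0) / (n - 1);
  const sorted = [...present].sort((a, b) => a - b);
  const mid = Math.floor(n / 2);
  const median = n % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  return {
    count: n,
    null_count: values.length - n,
    mean,
    std: Math.sqrt(variance),
    min: sorted[0],
    max: sorted[n - 1],
    median,
  };
}

console.log(describeColumn([1, 2, 3, null, 4]));
```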
Options
- `--from-dataset <path>` - Load the table from a dataset descriptor
- `--from-resource <name>` - Name of the resource within the dataset
- `--debug` - Show debug information
- `--json` - Output as JSON
Format Options
All standard format options are available (see the Format Options section below).
Query Tables
Execute SQL queries on tables using the Polars SQL engine:

```sh
# Basic query
fairspec table query data.csv "SELECT * FROM self WHERE age > 25"

# Aggregate data
fairspec table query sales.csv "SELECT region, SUM(amount) as total FROM self GROUP BY region"

# Filter and sort
fairspec table query users.csv "SELECT name, email FROM self WHERE active = true ORDER BY name"

# Query a dataset resource
fairspec table query --from-dataset dataset.json --from-resource users \
  "SELECT * FROM self WHERE created_at > '2024-01-01'"
```

SQL Syntax
- Use `self` as the table name in queries
- Supports SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, JOIN, and other clauses
- Queries use the Polars SQL dialect
- Results are printed as formatted tables (use `--json` for JSON output)
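The sketch below shows, in plain TypeScript over in-memory rows, what `SELECT region, SUM(amount) as total FROM self GROUP BY region` computes. It only illustrates the SQL semantics; the CLI itself runs queries through Polars. The `Row` shape and the `groupSum` helper are hypothetical.

```ts
// Hypothetical row shape for sales.csv; only `region` and `amount` are used.
interface Row {
  region: string;
  amount: number;
}

// Equivalent of: SELECT region, SUM(amount) as total FROM self GROUP BY region
function groupSum(rows: Row[]): Array<{ region: string; total: number }> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.region, (totals.get(row.region) ?? 0) + row.amount);
  }
  // Map keeps first-seen order; SQL guarantees no order without ORDER BY.
  return [...totals].map(([region, total]) => ({ region, total }));
}

const sales: Row[] = [
  { region: "north", amount: 100 },
  { region: "south", amount: 40 },
  { region: "north", amount: 60 },
];
console.log(groupSum(sales));
```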
Options
- `--from-dataset <path>` - Load the table from a dataset descriptor
- `--from-resource <name>` - Name of the resource within the dataset
- `--debug` - Show debug information
- `--json` - Output as JSON
Validate Tables
Validate table data against a Table Schema:

```sh
# Validate with an explicit schema
fairspec table validate data.csv --table-schema schema.json

# Validate with an inferred schema
fairspec table validate data.csv

# Validate a dataset resource (uses the embedded schema)
fairspec table validate --from-dataset dataset.json --from-resource users

# Output the validation report as JSON
fairspec table validate data.csv --table-schema schema.json --json
```

Validation Report
Returns a validation report with:

- `valid` - Boolean indicating whether validation passed
- `errors` - Array of validation errors (if any)
Example validation errors:
```json
{
  "valid": false,
  "errors": [
    {
      "type": "table/constraint",
      "propertyName": "age",
      "rowNumber": 5,
      "message": "value 200 exceeds maximum of 150"
    },
    {
      "type": "table/type",
      "propertyName": "email",
      "rowNumber": 12,
      "message": "invalid email format"
    }
  ]
}
```

Options
- `--table-schema <path>` - Path to a Table Schema file
- `--from-dataset <path>` - Load the table from a dataset descriptor
- `--from-resource <name>` - Name of the resource within the dataset
- `--debug` - Show debug information
- `--json` - Output as JSON
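If the `--json` report feeds an automated check, it helps to give it a type. The interfaces below are inferred from the example report in this section, not taken from fairspec's published types, so the field names are an assumption. `summarizeErrors` is a hypothetical helper that counts errors per `type`.

```ts
// Shape inferred from the example JSON report; not an official fairspec type.
interface ValidationError {
  type: string;
  propertyName?: string;
  rowNumber?: number;
  message: string;
}

interface ValidationReport {
  valid: boolean;
  errors: ValidationError[];
}

// Count errors per error type, e.g. { "table/type": 3 }.
function summarizeErrors(report: ValidationReport): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const error of report.errors) {
    counts[error.type] = (counts[error.type] ?? 0) + 1;
  }
  return counts;
}

// In practice this text would come from `fairspec table validate ... --json`.
const raw = `{"valid": false, "errors": [
  {"type": "table/constraint", "propertyName": "age", "rowNumber": 5,
   "message": "value 200 exceeds maximum of 150"},
  {"type": "table/type", "propertyName": "email", "rowNumber": 12,
   "message": "invalid email format"}]}`;
const report = JSON.parse(raw) as ValidationReport;
console.log(report.valid, summarizeErrors(report));
```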
Infer Table Schema
Automatically generate a Table Schema from table data:

```sh
# Infer a schema from a local file
fairspec table infer-schema data.csv

# Infer from a remote file
fairspec table infer-schema https://example.com/data.csv

# Save the inferred schema to a file
fairspec table infer-schema data.csv --json > schema.json

# Infer with custom options
fairspec table infer-schema data.csv --sample-rows 1000 --confidence 0.95
```

Schema Inference Options
- `--sample-rows <number>` - Number of rows to sample for inference (default: 100)
- `--confidence <number>` - Confidence threshold for type detection (0-1, default: 0.9)
- `--keep-strings` - Keep columns as strings instead of inferring types
- `--column-types <json>` - Override types for specific columns
- `--comma-decimal` - Treat comma as the decimal separator
- `--month-first` - Parse dates as month-first (MM/DD/YYYY)
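One plausible reading of `--sample-rows` and `--confidence` is this: look at the first N non-empty values of a column, and accept a candidate type only if at least that fraction of them parse as it. The sketch below implements that reading with a few regular expressions. It illustrates the reading only and is not fairspec's actual inference algorithm. `inferType` and its patterns are hypothetical.

```ts
type InferredType = "integer" | "number" | "boolean" | "date" | "string";

// Hypothetical candidate patterns, checked from most to least specific.
const candidates: Array<[InferredType, RegExp]> = [
  ["integer", /^-?\d+$/],
  ["number", /^-?\d+(\.\d+)?$/],
  ["boolean", /^(true|false)$/i],
  ["date", /^\d{4}-\d{2}-\d{2}$/],
];

function inferType(values: string[], sampleRows = 100, confidence = 0.9): InferredType {
  // Consider only the first `sampleRows` non-empty values.
  const sample = values.filter((v) => v !== "").slice(0, sampleRows);
  if (sample.length === 0) return "string";
  for (const [type, pattern] of candidates) {
    const matches = sample.filter((v) => pattern.test(v)).length;
    // Accept the first type whose match ratio reaches the threshold.
    if (matches / sample.length >= confidence) return type;
  }
  return "string";
}

// Columns from the products.csv example in this section
console.log(inferType(["1", "2", "3"])); // integer
console.log(inferType(["19.99", "29.99", "39.99"])); // number
console.log(inferType(["2024-01-15", "2024-01-20", "2024-02-01"])); // date
```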
Generated Schema
The inferred schema automatically detects:
- Column types (string, integer, number, boolean, date, datetime, etc.)
- Required columns based on presence
- Enum values for columns with limited distinct values
- Numeric constraints (minimum, maximum)
- String patterns
- Missing value indicators
Example
Given this CSV data:

```csv
id,name,price,quantity,active,created_at
1,Product A,19.99,100,true,2024-01-15
2,Product B,29.99,50,false,2024-01-20
3,Product C,39.99,75,true,2024-02-01
```

Infer the schema:

```sh
fairspec table infer-schema products.csv --json
```

Generated schema:

```json
{
  "properties": {
    "id": { "type": "integer" },
    "name": { "type": "string" },
    "price": { "type": "number" },
    "quantity": { "type": "integer" },
    "active": { "type": "boolean" },
    "created_at": { "type": "date" }
  },
  "required": ["id", "name", "price", "quantity", "active", "created_at"]
}
```

Render Table Schema
Render a Table Schema as human-readable HTML or Markdown documentation:

```sh
# Render as Markdown
fairspec table render-schema schema.json --to-format markdown

# Render as HTML
fairspec table render-schema schema.json --to-format html

# Save to a file
fairspec table render-schema schema.json --to-format markdown --to-path schema.md
fairspec table render-schema schema.json --to-format html --to-path schema.html
```

Output Formats
- `markdown` - Generates Markdown documentation with column descriptions, types, and constraints
- `html` - Generates styled HTML table documentation
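The sketch below gives a rough idea of what Markdown output involves. It turns the `properties` and `required` keys of a schema, like the example schema above, into a Markdown table. The column layout and the `renderMarkdown` helper are hypothetical, and fairspec's actual Markdown output may be structured differently.

```ts
// Minimal schema shape covering the keys used in the example schema.
interface TableSchema {
  properties: Record<string, { type?: string; description?: string }>;
  required?: string[];
}

// Hypothetical renderer: one Markdown table row per schema property.
function renderMarkdown(schema: TableSchema): string {
  const required = new Set(schema.required ?? []);
  const lines = [
    "| Column | Type | Required | Description |",
    "| --- | --- | --- | --- |",
  ];
  for (const [name, prop] of Object.entries(schema.properties)) {
    const type = prop.type ?? "any";
    const isRequired = required.has(name) ? "yes" : "no";
    lines.push(`| ${name} | ${type} | ${isRequired} | ${prop.description ?? ""} |`);
  }
  return lines.join("\n");
}

console.log(
  renderMarkdown({
    properties: { id: { type: "integer" }, name: { type: "string" } },
    required: ["id"],
  }),
);
```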
Options
- `--to-format <format>` (required) - Output format (`markdown` or `html`)
- `--to-path <path>` - Save to a file instead of printing to stdout
- `--silent` - Suppress output messages
- `--debug` - Show debug information
Validate Table Schema
Check that a Table Schema file is itself valid:

```sh
# Validate a schema file
fairspec table validate-schema schema.json

# Validate a remote schema
fairspec table validate-schema https://example.com/schema.json

# Output as JSON
fairspec table validate-schema schema.json --json
```

Schema Validation
This checks that the schema itself:

- is valid JSON
- complies with the Table Schema specification
- has correct property definitions
- uses valid column types and constraints
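The sketch below covers a small part of these checks: parsing the JSON and rejecting unknown column types. The set of allowed types comes from the types listed under Generated Schema above. It is an assumed subset, not the full list from the Table Schema specification. `checkColumnTypes` is hypothetical, and it only borrows the report shape and the `schema/invalid` error type from the example reports.

```ts
// Assumed subset of valid column types (from the "Generated Schema" list).
const knownTypes = new Set(["string", "integer", "number", "boolean", "date", "datetime"]);

interface SchemaError {
  type: string;
  message: string;
}

type SchemaDoc = { properties?: Record<string, { type?: unknown } | null> } | null;

// Hypothetical check: valid JSON, and every string `type` is a known column type.
function checkColumnTypes(schemaText: string): { valid: boolean; errors: SchemaError[] } {
  let schema: SchemaDoc = null;
  try {
    schema = JSON.parse(schemaText) as SchemaDoc;
  } catch {
    return { valid: false, errors: [{ type: "schema/invalid", message: "Schema is not valid JSON" }] };
  }
  const errors: SchemaError[] = [];
  for (const [name, prop] of Object.entries(schema?.properties ?? {})) {
    const type = prop?.type;
    if (typeof type === "string" && !knownTypes.has(type)) {
      errors.push({ type: "schema/invalid", message: `Invalid column type: '${type}' (column '${name}')` });
    }
  }
  return { valid: errors.length === 0, errors };
}

console.log(checkColumnTypes('{"properties": {"notes": {"type": "txt"}}}'));
```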
Validation Report
```json
{ "valid": true, "errors": [] }
```

Or, if the schema is invalid:

```json
{
  "valid": false,
  "errors": [
    {
      "type": "schema/invalid",
      "message": "Invalid column type: 'txt' (did you mean 'text'?)"
    }
  ]
}
```

Options
- `--silent` - Suppress output messages
- `--debug` - Show debug information
- `--json` - Output as JSON
Infer Format
Automatically detect the format of a table file:

```sh
# Infer the format of a local file
fairspec table infer-format data.csv

# Infer the format of a remote file
fairspec table infer-format https://example.com/data.xlsx

# Output as JSON
fairspec table infer-format data.parquet --json
```

Detected Formats
The command can detect:

- `csv` - Comma-separated values
- `tsv` - Tab-separated values
- `json` - JSON
- `jsonl` - JSON Lines (newline-delimited JSON)
- `xlsx` - Excel spreadsheet
- `ods` - OpenDocument Spreadsheet
- `parquet` - Apache Parquet
- `arrow` - Apache Arrow/Feather
- `sqlite` - SQLite database
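Format detection typically combines the file extension with a look at the content. The sketch below maps extensions to the format names listed above. For an unrecognized extension, it guesses the delimiter by counting candidate characters in the first line. The extension table, the delimiter candidates, and `guessFormat` are assumptions for illustration, not fairspec's documented detection logic.

```ts
interface DetectedFormat {
  name: string;
  delimiter?: string;
}

// Hypothetical extension table using the format names listed above.
const byExtension: Record<string, string> = {
  csv: "csv", tsv: "tsv", json: "json", jsonl: "jsonl", ndjson: "jsonl",
  xlsx: "xlsx", ods: "ods", parquet: "parquet", arrow: "arrow", feather: "arrow",
  sqlite: "sqlite", db: "sqlite",
};

function guessFormat(path: string, firstLine = ""): DetectedFormat {
  const ext = path.split(".").pop()?.toLowerCase() ?? "";
  const name = byExtension[ext];
  if (name === "csv") return { name, delimiter: "," };
  if (name === "tsv") return { name, delimiter: "\t" };
  if (name) return { name };
  // Unknown extension: pick the candidate delimiter that occurs most often.
  const count = (ch: string) => firstLine.split(ch).length - 1;
  const delimiter = [",", "\t", ";", "|"].reduce((best, ch) => (count(ch) > count(best) ? ch : best));
  return { name: delimiter === "\t" ? "tsv" : "csv", delimiter };
}

console.log(guessFormat("data.parquet"));
console.log(guessFormat("unknown-data.txt", "id;name;price"));
```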
Example Output
```json
{ "name": "csv", "delimiter": ",", "quoteChar": "\"" }
```

Interactive Scripting
Start an interactive REPL session with a loaded table:

```sh
# Load a table and start the REPL
fairspec table script data.csv

# Load a table from a dataset
fairspec table script --from-dataset dataset.json --from-resource users
```

Available in Session
- `fairspec` - The full fairspec library
- `table` - The loaded table (a `LazyDataFrame`)
Example Session
```text
fairspec> table
LazyDataFrame { ... }

fairspec> await table.collect()
DataFrame { ... }

fairspec> await table.select(["name", "age"]).collect()
DataFrame { ... }

fairspec> await table.filter(pl.col("age").gt(25)).collect()
DataFrame { ... }
```

Format Options
All table commands support these format options for loading data:
CSV/TSV Options
- `--format <name>` - Format name (`csv`, `tsv`, etc.)
- `--delimiter <char>` - Column delimiter (default: `,`)
- `--line-terminator <chars>` - Row terminator (default: `\n`)
- `--quote-char <char>` - Quote character (default: `"`)
- `--null-sequence <string>` - Null value indicator
- `--header-rows <numbers>` - Header row indices (e.g., `[1,2]`)
- `--header-join <char>` - Character used to join multi-row headers
- `--comment-rows <numbers>` - Indices of comment rows to skip
- `--comment-prefix <char>` - Comment line prefix (e.g., `#`)
- `--column-names <names>` - Override column names (JSON array)
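With `--header-rows [1,2]`, two rows together form the header, and `--header-join` supplies the string that joins them into one name per column. The sketch below shows that idea. It assumes the indices are 1-based, as the `[1,2]` example suggests, and that empty cells (for example, from merged spreadsheet headers) are skipped. `joinHeaderRows` is a hypothetical helper, not fairspec API.

```ts
// Build one column name per column from several header rows.
// Assumes 1-based row indices, as in the `--header-rows [1,2]` example.
function joinHeaderRows(rows: string[][], headerRows: number[], headerJoin = " "): string[] {
  const headers = headerRows.map((index) => rows[index - 1] ?? []);
  const width = Math.max(...headers.map((row) => row.length));
  const names: string[] = [];
  for (let col = 0; col < width; col++) {
    const parts = headers
      .map((row) => (row[col] ?? "").trim())
      .filter((part) => part !== ""); // skip empty cells (assumed behavior)
    names.push(parts.join(headerJoin));
  }
  return names;
}

const rows = [
  ["sales", "sales", "cost"],
  ["q1", "q2", "q1"],
  ["10", "20", "5"],
];
console.log(joinHeaderRows(rows, [1, 2], "_")); // [ 'sales_q1', 'sales_q2', 'cost_q1' ]
```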
JSON Options
- `--json-pointer <pointer>` - JSON pointer to the data array (e.g., `/data/users`)
- `--row-type <type>` - Row format: `object` or `array`
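The `/data/users` example appears to follow the JSON Pointer syntax of RFC 6901. In that syntax, it means "the `users` key inside the `data` key". The sketch below resolves such a pointer against a parsed document. `resolvePointer` is a hypothetical helper written for illustration. It handles the `~0` and `~1` escapes but not the URI-fragment form of pointers.

```ts
// Resolve an RFC 6901 JSON Pointer such as "/data/users" against a parsed document.
function resolvePointer(doc: unknown, pointer: string): unknown {
  if (pointer === "") return doc; // the empty pointer refers to the whole document
  if (!pointer.startsWith("/")) throw new Error(`Invalid JSON pointer: ${pointer}`);
  let current: unknown = doc;
  for (const raw of pointer.slice(1).split("/")) {
    // Unescape per RFC 6901: "~1" -> "/", then "~0" -> "~"
    const token = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (Array.isArray(current)) {
      current = current[Number(token)];
    } else if (current !== null && typeof current === "object") {
      current = (current as Record<string, unknown>)[token];
    } else {
      return undefined; // path runs past a primitive value
    }
  }
  return current;
}

const payload = { data: { users: [{ name: "Ada" }, { name: "Alan" }] } };
console.log(resolvePointer(payload, "/data/users"));
```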
Excel/ODS Options
- `--sheet-number <number>` - Sheet index (0-based)
- `--sheet-name <name>` - Sheet name
SQLite Options
- `--table-name <name>` - Table name in the database
Table Schema Options
All table commands support these schema-related options:
Type Inference
- `--sample-rows <number>` - Sample size for type inference
- `--confidence <number>` - Confidence threshold (0-1)
- `--keep-strings` - Don't infer types; keep values as strings
- `--column-types <json>` - Override types (e.g., `{"age":"integer"}`)
Value Parsing
- `--missing-values <values>` - Missing value indicators (JSON array)
- `--decimal-char <char>` - Decimal separator (default: `.`)
- `--group-char <char>` - Thousands separator (default: `,`)
- `--comma-decimal` - Use comma as the decimal separator (shorthand)
- `--true-values <values>` - Custom true values (JSON array)
- `--false-values <values>` - Custom false values (JSON array)
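The sketch below shows how `--missing-values`, `--decimal-char`, and `--group-char` could apply to a single numeric cell. One reasonable order is to check the missing-value indicators first, then strip group characters and normalize the decimal mark before parsing. That order, the default of treating only the empty string as missing, and the `parseNumberCell` helper are assumptions, not fairspec's implementation.

```ts
interface NumberOptions {
  missingValues?: string[];
  decimalChar?: string;
  groupChar?: string;
}

// Hypothetical parser: null for missing values, NaN for non-numeric cells.
function parseNumberCell(cell: string, options: NumberOptions = {}): number | null {
  // Defaults: "." and "," match the documented defaults; [""] is assumed.
  const { missingValues = [""], decimalChar = ".", groupChar = "," } = options;
  if (missingValues.includes(cell)) return null;
  // Remove every group (thousands) character, then normalize the decimal mark.
  const normalized = cell.split(groupChar).join("").replace(decimalChar, ".");
  if (!/^-?\d+(\.\d+)?$/.test(normalized)) return Number.NaN;
  return Number(normalized);
}

console.log(parseNumberCell("1,234.5")); // 1234.5
console.log(parseNumberCell("1.234,5", { decimalChar: ",", groupChar: "." })); // 1234.5
console.log(parseNumberCell("NA", { missingValues: ["NA"] })); // null
```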
Date/Time Parsing
- `--datetime-format <format>` - Datetime format string
- `--date-format <format>` - Date format string
- `--time-format <format>` - Time format string
- `--month-first` - Parse dates as month-first
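`--month-first` matters for dates such as `01/02/2024`, which is 2 January when read month-first (MM/DD/YYYY) and 1 February when read day-first. The sketch below parses slash-separated dates either way into ISO `YYYY-MM-DD` strings. The slash-only format and the `parseSlashDate` helper are assumptions for illustration.

```ts
// Parse "a/b/yyyy" as MM/DD/YYYY when monthFirst is true, else DD/MM/YYYY.
// Returns an ISO date string, or null if the text is not a valid date.
function parseSlashDate(text: string, monthFirst: boolean): string | null {
  const match = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(text);
  if (!match) return null;
  const first = Number(match[1]);
  const second = Number(match[2]);
  const year = Number(match[3]);
  const month = monthFirst ? first : second;
  const day = monthFirst ? second : first;
  // Reject impossible dates such as 02/30 by round-tripping through Date.UTC.
  const date = new Date(Date.UTC(year, month - 1, day));
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) return null;
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${year}-${pad(month)}-${pad(day)}`;
}

console.log(parseSlashDate("01/02/2024", true)); // 2024-01-02
console.log(parseSlashDate("01/02/2024", false)); // 2024-02-01
```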
Array/List Parsing
- `--array-type <type>` - Array item type
- `--list-delimiter <char>` - List delimiter (default: `;`)
- `--list-item-type <type>` - List item type
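For list columns, a cell such as `1;2;3` is split on the list delimiter, and each item is converted to the list item type. The minimal sketch below supports only `string` and `integer` item types. Trimming whitespace around items is an assumed behavior, and `parseListCell` is a hypothetical helper.

```ts
type ListItemType = "string" | "integer";

// Split a delimited cell into items and convert each to the item type.
function parseListCell(
  cell: string,
  delimiter = ";",
  itemType: ListItemType = "string",
): Array<string | number> {
  if (cell === "") return [];
  const items = cell.split(delimiter).map((item) => item.trim()); // trimming is assumed
  if (itemType === "string") return items;
  return items.map((item) => {
    if (!/^-?\d+$/.test(item)) throw new Error(`Not an integer list item: '${item}'`);
    return Number(item);
  });
}

console.log(parseListCell("1;2;3", ";", "integer")); // [ 1, 2, 3 ]
```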
Common Workflows
Explore Unknown Data

```sh
# 1. Infer the format
fairspec table infer-format unknown-data.txt

# 2. Get basic statistics
fairspec table describe unknown-data.txt

# 3. Infer the schema
fairspec table infer-schema unknown-data.txt --json > schema.json

# 4. Query the data
fairspec table query unknown-data.txt "SELECT * FROM self LIMIT 10"
```

Schema-Driven Validation
```sh
# 1. Create a schema from sample data
fairspec table infer-schema sample.csv --json > schema.json

# 2. Validate the schema itself
fairspec table validate-schema schema.json

# 3. Generate documentation
fairspec table render-schema schema.json --to-format markdown --to-path docs.md

# 4. Validate production data
fairspec table validate production.csv --table-schema schema.json
```

Data Quality Checks
```sh
# Check for data quality issues
fairspec table validate data.csv --table-schema schema.json

# Get detailed statistics
fairspec table describe data.csv

# Query for specific issues
fairspec table query data.csv "SELECT * FROM self WHERE email NOT LIKE '%@%'"

# Find duplicate IDs
fairspec table query data.csv "SELECT id, COUNT(*) as cnt FROM self GROUP BY id HAVING COUNT(*) > 1"
```

Interactive Analysis
```sh
# Start an interactive session
fairspec table script data.csv

# In the REPL:
# - Explore:   await table.head(10).collect()
# - Filter:    await table.filter(pl.col("status").eq("active")).collect()
# - Aggregate: await table.groupBy("category").agg(pl.sum("amount")).collect()
# - Transform: await table.withColumn(pl.col("price").mul(1.1).alias("new_price")).collect()
```

Format Conversion
```sh
# Query and output as JSON
fairspec table query data.csv "SELECT * FROM self" --json > output.json

# Get statistics and save them
fairspec table describe large-file.parquet --json > stats.json
```

Output Formats
Text Output (default)

Human-readable output with formatted tables:

```sh
fairspec table describe data.csv
```

Output:

```text
#         count  mean   std    min   max
id        100    50.5   29.01  1     100
price     100    29.99  15.43  9.99  99.99
quantity  100    75     28.87  1     150
```

JSON Output
Machine-readable JSON for automation:

```sh
fairspec table describe data.csv --json
```

Examples
CSV Data Analysis

```sh
# Get an overview of sales data
fairspec table describe sales.csv

# Find top customers
fairspec table query sales.csv \
  "SELECT customer, SUM(amount) as total FROM self GROUP BY customer ORDER BY total DESC LIMIT 10"

# Validate data quality
fairspec table validate sales.csv --table-schema sales-schema.json
```

Multi-Format Pipeline
```sh
# Describe an Excel sheet
fairspec table describe report.xlsx --sheet-name "Q1 Sales"

# Query a specific sheet
fairspec table query report.xlsx --sheet-name "Q1 Sales" \
  "SELECT region, SUM(revenue) FROM self GROUP BY region"

# Validate against a schema
fairspec table validate report.xlsx --sheet-name "Q1 Sales" --table-schema schema.json
```

Remote Data Validation
```sh
# Infer a schema from remote data
fairspec table infer-schema https://api.example.com/export.csv --json > remote-schema.json

# Validate local data against the schema inferred from the remote data
fairspec table validate local-data.csv --table-schema remote-schema.json
```

Database Export Validation
```sh
# Validate a SQLite export
fairspec table validate export.db --table-name users --table-schema expected-schema.json

# Get statistics from the database
fairspec table describe export.db --table-name users

# Query a database table
fairspec table query export.db --table-name users \
  "SELECT status, COUNT(*) FROM self GROUP BY status"
```