Working with Tabular Data in TypeScript
High-performance data processing and schema validation for tabular data built on nodejs-polars (a Rust-based DataFrame library).
Installation
npm install fairspec

Getting Started
The table package provides core utilities for working with tabular data:
normalizeTable - Convert table data to match a schema
denormalizeTable - Convert normalized data back to raw format
inferTableSchemaFromTable - Automatically infer schema from table data
inspectTable - Get table structure information
queryTable - Query tables using SQL-like syntax
For example:
import { loadCsvTable, inferTableSchemaFromTable } from "fairspec"

const table = await loadCsvTable({ data: "data.csv" })
const schema = await inferTableSchemaFromTable(table)

Basic Usage
Schema Inference
Automatically infer Table Schema from data:
import * as pl from "nodejs-polars"
import { inferTableSchemaFromTable } from "fairspec"

// Create a table from data
const table = pl.DataFrame({
  id: ["1", "2", "3"],
  price: ["10.50", "25.00", "15.75"],
  date: ["2023-01-15", "2023-02-20", "2023-03-25"],
  active: ["true", "false", "true"]
}).lazy()

const schema = await inferTableSchemaFromTable(table, {
  sampleRows: 100, // Sample size for inference
  confidence: 0.9 // Confidence threshold
})

// Result: automatically detected integer, number, date, and boolean types

Table Normalization
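Conceptually, normalizing a table means parsing each raw string cell according to the type its column declares in the schema. The sketch below shows that idea in plain TypeScript on an array of row objects. It does not use fairspec or nodejs-polars, and `parseCell` and `normalizeRows` are hypothetical helper names introduced only for illustration.

```typescript
// Hypothetical, library-free sketch of schema-driven type conversion.
type ColumnType = "integer" | "number" | "boolean" | "date" | "string"
type Schema = { properties: Record<string, { type: ColumnType }> }
type Cell = number | boolean | Date | string | null

// Parse one raw string according to the declared column type.
// Values that do not parse become null.
function parseCell(raw: string, type: ColumnType): Cell {
  switch (type) {
    case "integer":
      return /^-?\d+$/.test(raw) ? Number.parseInt(raw, 10) : null
    case "number": {
      const n = Number(raw)
      return raw.trim() !== "" && Number.isFinite(n) ? n : null
    }
    case "boolean":
      return raw === "true" ? true : raw === "false" ? false : null
    case "date": {
      // Only accept strings shaped like YYYY-MM-DD.
      if (!/^\d{4}-\d{2}-\d{2}$/.test(raw)) return null
      const d = new Date(`${raw}T00:00:00Z`)
      return Number.isNaN(d.getTime()) ? null : d
    }
    case "string":
      return raw
  }
}

// Convert every row, keeping only the columns named in the schema.
function normalizeRows(rows: Record<string, string>[], schema: Schema): Record<string, Cell>[] {
  return rows.map(row => {
    const out: Record<string, Cell> = {}
    for (const [name, column] of Object.entries(schema.properties)) {
      const raw = row[name]
      out[name] = raw === undefined ? null : parseCell(raw, column.type)
    }
    return out
  })
}

const rawRows = [{ id: "1", price: "10.50", active: "true", date: "2023-01-15" }]
const sketchSchema: Schema = {
  properties: {
    id: { type: "integer" },
    price: { type: "number" },
    active: { type: "boolean" },
    date: { type: "date" },
  },
}
const normalizedRows = normalizeRows(rawRows, sketchSchema)
```

The library performs the equivalent conversion on a lazy DataFrame rather than on row objects, so the sketch is only meant to make the idea of "type conversion" concrete.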
Convert table data to match a Table Schema (type conversion):
import * as pl from "nodejs-polars"
import { normalizeTable } from "fairspec"
import type { TableSchema } from "fairspec"

// Create table with string data
const table = pl.DataFrame({
  id: ["1", "2", "3"],
  price: ["10.50", "25.00", "15.75"],
  active: ["true", "false", "true"],
  date: ["2023-01-15", "2023-02-20", "2023-03-25"]
}).lazy()

// Define schema
const schema: TableSchema = {
  properties: {
    id: { type: "integer" },
    price: { type: "number" },
    active: { type: "boolean" },
    date: { type: "date" }
  }
}

const normalized = await normalizeTable(table, schema)
const result = await normalized.collect()

// Result has properly typed columns:
// { id: 1, price: 10.50, active: true, date: Date('2023-01-15') }

Table Denormalization
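Denormalization goes the other way: values whose type the target format cannot store directly are serialized back to strings. The sketch below is a plain-TypeScript illustration under the assumption that a `nativeTypes` list names the types a target format can hold as-is. The helper `denormalizeCell` is hypothetical and is not fairspec's implementation.

```typescript
// Hypothetical sketch: serialize values that the target format cannot store natively.
type NativeType = "string" | "number" | "boolean"
type Value = string | number | boolean | Date | null

function denormalizeCell(value: Value, nativeTypes: NativeType[]): string | number | boolean | null {
  if (value === null) return null
  // This sketch treats every Date as a calendar date and writes it as YYYY-MM-DD.
  if (value instanceof Date) return value.toISOString().slice(0, 10)
  if (typeof value === "string") return value
  // Numbers and booleans stay as they are only if the target supports them.
  return nativeTypes.includes(typeof value as NativeType) ? value : String(value)
}

const csvLike = denormalizeCell(10.5, ["string"]) // stringified, since "number" is not native here
const jsonLike = denormalizeCell(10.5, ["string", "number", "boolean"]) // kept as a number
```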
Convert normalized data back to raw format (for saving):
import { denormalizeTable } from "fairspec"

// Denormalize for saving (converts dates to strings, etc.)
const denormalized = await denormalizeTable(table, schema, {
  nativeTypes: ["string", "number", "boolean"]
})

Advanced Features
Working with Table Schema
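The constraint keywords used in this section (`minimum`, `maximum`, `minLength`, `maxLength`, `pattern`, `enum`) read like their JSON Schema counterparts. Assuming those semantics, checking a single value could be sketched as follows. `checkValue` is a hypothetical helper, and fairspec's own validation may behave differently.

```typescript
// Hypothetical sketch of per-value constraint checking with JSON-Schema-like semantics.
type Constraints = {
  minimum?: number
  maximum?: number
  minLength?: number
  maxLength?: number
  pattern?: string
  enum?: string[]
}

// Returns a description of each violated constraint; an empty array means the value is valid.
function checkValue(value: string | number, c: Constraints): string[] {
  const errors: string[] = []
  if (typeof value === "number") {
    if (c.minimum !== undefined && value < c.minimum) errors.push(`below minimum ${c.minimum}`)
    if (c.maximum !== undefined && value > c.maximum) errors.push(`above maximum ${c.maximum}`)
  } else {
    if (c.minLength !== undefined && value.length < c.minLength) errors.push(`shorter than ${c.minLength}`)
    if (c.maxLength !== undefined && value.length > c.maxLength) errors.push(`longer than ${c.maxLength}`)
    if (c.pattern !== undefined && !new RegExp(c.pattern).test(value)) errors.push("does not match pattern")
    if (c.enum !== undefined && !c.enum.includes(value)) errors.push("not an allowed value")
  }
  return errors
}

const ageErrors = checkValue(200, { minimum: 0, maximum: 150 })
const emailErrors = checkValue("a@b.co", { pattern: "^[^@]+@[^@]+\\.[^@]+$" })
const statusErrors = checkValue("archived", { enum: ["active", "inactive", "pending"] })
```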
Define schemas with column properties and constraints:
import type { TableSchema } from "fairspec"

const schema: TableSchema = {
  properties: {
    id: { type: "integer", minimum: 1 },
    name: { type: "string", minLength: 1, maxLength: 100 },
    email: { type: "string", pattern: "^[^@]+@[^@]+\\.[^@]+$" },
    age: { type: "integer", minimum: 0, maximum: 150 },
    status: { type: "string", enum: ["active", "inactive", "pending"] }
  },
  required: ["id", "name", "email"],
  primaryKey: ["id"]
}

Schema Inference Options
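One plausible reading of the `sampleRows` and `confidence` options shown earlier is: look at up to `sampleRows` values per column and accept a candidate type when at least a `confidence` fraction of them parse as that type. The sketch below implements that heuristic in plain TypeScript. It is an assumption made for illustration, not a description of fairspec's actual inference algorithm, and `inferColumnType` is a hypothetical name.

```typescript
// Hypothetical sketch of sample-based type inference with a confidence threshold.
type Inferred = "integer" | "number" | "boolean" | "date" | "string"

// Candidate types are tried in order, most specific first.
const matchers: [Inferred, (v: string) => boolean][] = [
  ["integer", v => /^-?\d+$/.test(v)],
  ["number", v => /^-?\d+(\.\d+)?$/.test(v)],
  ["boolean", v => v === "true" || v === "false"],
  ["date", v => /^\d{4}-\d{2}-\d{2}$/.test(v)],
]

function inferColumnType(
  values: string[],
  opts: { sampleRows: number; confidence: number } = { sampleRows: 100, confidence: 0.9 },
): Inferred {
  const sample = values.slice(0, opts.sampleRows)
  if (sample.length === 0) return "string"
  for (const [type, matches] of matchers) {
    const hits = sample.filter(v => matches(v)).length
    // Accept the first type that enough sampled values agree with.
    if (hits / sample.length >= opts.confidence) return type
  }
  return "string" // fallback when no candidate reaches the threshold
}

const priceType = inferColumnType(["10.50", "25.00", "15.75"])
const dateType = inferColumnType(["2023-01-15", "2023-02-20", "2023-03-25"])
```

Lowering the threshold makes the heuristic tolerate a few malformed cells, while a threshold of 1 requires every sampled value to parse.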
Customize how schemas are inferred:
import { inferTableSchemaFromTable } from "fairspec"

const schema = await inferTableSchemaFromTable(table, {
  sampleRows: 100, // Number of rows to sample
  confidence: 0.9, // Confidence threshold for type detection
  keepStrings: false, // Keep original string types
  columnTypes: { // Override types for specific columns
    id: "integer",
    status: "categorical"
  }
})

Handling Missing Values
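A missing-value indicator is a raw cell value that should be read as "no value" rather than parsed. As a plain-TypeScript sketch of that idea, indicators can be mapped to null before any type conversion happens. The helper `toNullable` is hypothetical and independent of fairspec.

```typescript
// Hypothetical sketch: replace missing-value indicators with null before parsing.
type Raw = string | number

function toNullable(value: Raw, missingValues: Raw[]): Raw | null {
  // includes() compares strictly, so the number -999 does not match the string "-999".
  return missingValues.includes(value) ? null : value
}

const indicators: Raw[] = ["", "N/A", "null", -999]
const rawColumn: Raw[] = ["3.5", "N/A", -999, "", "7"]
const cleaned = rawColumn.map(v => toNullable(v, indicators))
// cleaned is ["3.5", null, null, null, "7"]
```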
Define missing value indicators:
import type { TableSchema } from "fairspec"

const schema: TableSchema = {
  properties: {
    value: { type: "number" }
  },
  missingValues: ["", "N/A", "null", -999]
}

Primary Keys and Constraints
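A primary key or unique key requires that no two rows share the same combination of values in the listed columns. A minimal plain-TypeScript sketch of such a check follows. `findDuplicateKeys` is a hypothetical helper, not part of fairspec, and it treats null like any other value.

```typescript
// Hypothetical sketch of a uniqueness check over one or more key columns.
type Row = Record<string, string | number | null>

// Returns the indices of rows whose key columns repeat an earlier row.
function findDuplicateKeys(rows: Row[], columnNames: string[]): number[] {
  const seen = new Set<string>()
  const duplicates: number[] = []
  rows.forEach((row, index) => {
    // JSON.stringify keeps 1 and "1" distinct and avoids separator collisions.
    const key = JSON.stringify(columnNames.map(name => row[name] ?? null))
    if (seen.has(key)) duplicates.push(index)
    else seen.add(key)
  })
  return duplicates
}

const users: Row[] = [
  { user_id: 1, email: "a@example.com" },
  { user_id: 2, email: "b@example.com" },
  { user_id: 3, email: "a@example.com" },
]
const duplicateIds = findDuplicateKeys(users, ["user_id"]) // []
const duplicateEmails = findDuplicateKeys(users, ["email"]) // [2]
```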
Define table-level constraints:
import type { TableSchema } from "fairspec"

const schema: TableSchema = {
  properties: {
    user_id: { type: "integer" },
    email: { type: "string" }
  },
  primaryKey: ["user_id"],
  uniqueKeys: [
    { columnNames: ["email"] }
  ]
}

Supported Column Types
Primitive Types
string - Text data
integer - Whole numbers
number - Decimal numbers
boolean - True/false values
Temporal Types
date - Calendar dates
datetime - Date and time
time - Time of day
duration - Time spans
Spatial Types
geojson - GeoJSON geometries
wkt - Well-Known Text geometries
wkb - Well-Known Binary geometries
Complex Types
array - Fixed-length arrays
list - Variable-length lists
object - JSON objects
Specialized Types
email - Email addresses
url - URLs
categorical - Categorical data
base64 - Base64 encoded data
hex - Hexadecimal data
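For reference, the type names listed above can be written as a TypeScript union grouped by category. This is only a sketch derived from the lists in this section. The names `columnTypes`, `ColumnTypeName`, and `categoryOf` are hypothetical, and fairspec's exported types may be organized differently.

```typescript
// Hypothetical grouping of the documented column type names.
const columnTypes = {
  primitive: ["string", "integer", "number", "boolean"],
  temporal: ["date", "datetime", "time", "duration"],
  spatial: ["geojson", "wkt", "wkb"],
  complex: ["array", "list", "object"],
  specialized: ["email", "url", "categorical", "base64", "hex"],
} as const

type Category = keyof typeof columnTypes
// Union of every type name across all categories.
type ColumnTypeName = (typeof columnTypes)[Category][number]

// Look up which category a type name belongs to.
function categoryOf(type: ColumnTypeName): Category {
  for (const category of Object.keys(columnTypes) as Category[]) {
    if ((columnTypes[category] as readonly string[]).includes(type)) return category
  }
  throw new Error(`Unknown column type: ${type}`)
}

const wktCategory = categoryOf("wkt")
```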
Table Type
The package uses LazyDataFrame from nodejs-polars for efficient processing:
import type { Table } from "fairspec"
import * as pl from "nodejs-polars"

// Table is an alias for pl.LazyDataFrame
const table: Table = pl.DataFrame({ id: [1, 2, 3] }).lazy()