# Working with Datasets in TypeScript

Updated May 11, 2026

A dataset is a collection of related data resources (tables and files) with metadata, table schemas, and relationships. The `fairspec` package provides a programming API to load, infer, validate, and save datasets across the supported plugins — local folders, Zip archives, GitHub repositories, Zenodo deposits, and CKAN portals.
## Installation

```bash
npm install fairspec
```

## Getting Started
The dataset API provides these top-level utilities:

- `loadDataset` - Load a dataset descriptor from a local or remote source
- `saveDataset` - Save a dataset to a folder, Zip archive, or external service
- `inferDataset` - Infer a dataset descriptor from a list of files
- `validateDataset` - Validate a dataset descriptor and all its resources
```typescript
import { loadDataset, validateDataset } from "fairspec"

const dataset = await loadDataset("dataset.json")
const report = await validateDataset(dataset)
```

## Loading a Dataset
`loadDataset(source)` dispatches to the first plugin that recognises the source. The bundled plugins resolve `folder`, `zip`, `github`, `zenodo`, and `ckan` sources automatically.
```typescript
import { loadDataset } from "fairspec"

const local = await loadDataset("./datasets/sales/dataset.json")

const remote = await loadDataset(
  "https://example.com/datasets/sales/dataset.json",
)

const folder = await loadDataset("./datasets/sales")
const zip = await loadDataset("./datasets/sales.zip")
const repo = await loadDataset("https://github.com/fairspec/example-dataset")
```

The returned value is a `Dataset` — a JSON-compatible object containing resources, metadata fields, and any extension properties.
```typescript
import { loadDataset } from "fairspec"

const dataset = await loadDataset("dataset.json")

console.log(dataset.title)
console.log(dataset.resources?.length)
```

If no plugin can load the source, `loadDataset` throws.
## Saving a Dataset

`saveDataset(dataset, options)` writes a dataset to a target location. The `target` option selects the plugin (folder path, Zip archive path, or remote URL).
```typescript
import { loadDataset, saveDataset } from "fairspec"

const dataset = await loadDataset(
  "https://example.com/datasets/sales/dataset.json",
)

await saveDataset(dataset, { target: "./local-sales" })
await saveDataset(dataset, { target: "./sales.zip" })

await saveDataset(dataset, {
  target: "./local-sales",
  withRemote: true,
})
```

By default the plugin updates resource paths in the saved descriptor to point at local copies of the files. Set `withRemote: true` to also download remote files into the target folder.
## Inferring a Dataset

`inferDataset(dataset, options?)` walks `dataset.resources` and fills in missing format and schema details by running per-resource inference in parallel. It does not take a path — pass a dataset with at least the `data` paths of each resource.
```typescript
import { inferDataset } from "fairspec"

const dataset = await inferDataset({
  resources: [
    { data: "users.csv" },
    { data: "orders.csv" },
    { data: "products.csv" },
  ],
})

console.log(dataset.resources?.[0]?.tableSchema)
```

Customise the inference per column or per format with the second argument:
```typescript
import { inferDataset } from "fairspec"

const dataset = await inferDataset(
  { resources: [{ data: "products.csv" }] },
  {
    sampleRows: 1000,
    confidence: 0.95,
    columnTypes: { sku: "string" },
  },
)
```

The result is a new `Dataset` object — combine it with `saveDataset` to persist a descriptor:
```typescript
import { inferDataset, saveDataset } from "fairspec"

const dataset = await inferDataset({
  resources: [{ data: "users.csv" }, { data: "orders.csv" }],
})

await saveDataset(dataset, { target: "./inferred-dataset" })
```

## Validating a Dataset
`validateDataset(source)` accepts either a path or an in-memory `Dataset` value. It loads the descriptor, validates every resource against its schema, and checks foreign-key relationships, returning a report.
```typescript
import { validateDataset } from "fairspec"

const report = await validateDataset("dataset.json")

if (report.valid) {
  console.log("Dataset is valid")
} else {
  for (const error of report.errors) {
    console.error(error.type, error)
  }
}
```

When you already have a dataset in memory, pass it directly to skip the load step:
```typescript
import { loadDataset, validateDataset } from "fairspec"

const dataset = await loadDataset("dataset.json")
const report = await validateDataset(dataset)
```

## Working with Resources
`dataset.resources` is an array of `Resource` descriptors. Each resource has a `data` path (local or remote), an optional `fileDialect`, and either a `tableSchema` (table resources) or a `dataSchema` (JSON data resources).
```typescript
import type { Resource } from "fairspec"
import { loadDataset, loadTable } from "fairspec"

const dataset = await loadDataset("dataset.json")

const usersResource = dataset.resources?.find(r => r.name === "users")
if (!usersResource) throw new Error("Missing users resource")

const users = await loadTable(usersResource)
const frame = await users?.collect()
console.log(frame?.toRecords())
```

The same `Resource` shape is what `loadTable`, `loadData`, `loadFile`, `validateTable`, `validateData`, and `validateFile` accept — datasets and individual resources share a single API surface.
## Common Workflows

### Create a dataset from data files
```typescript
import { inferDataset, saveDataset, validateDataset } from "fairspec"

const dataset = await inferDataset({
  resources: [
    { name: "customers", data: "customers.csv" },
    { name: "orders", data: "orders.csv" },
    { name: "products", data: "products.csv" },
  ],
})

dataset.title = "Sales Database Export"
dataset.description = "Customer orders and product catalog"

await saveDataset(dataset, { target: "./sales-dataset" })

const report = await validateDataset("./sales-dataset/dataset.json")
console.log(report.valid)
```

### Clone a remote dataset locally
```typescript
import { loadDataset, saveDataset, validateDataset } from "fairspec"

const remote = await loadDataset("https://example.com/datasets/climate")
await saveDataset(remote, { target: "./climate", withRemote: true })

const report = await validateDataset("./climate/dataset.json")
if (!report.valid) {
  console.error(report.errors)
}
```

### Dataset CI check
```typescript
import { validateDataset } from "fairspec"

const report = await validateDataset("dataset.json")

if (!report.valid) {
  for (const error of report.errors) {
    console.error(`${error.type}: ${JSON.stringify(error)}`)
  }
  process.exit(1)
}
```

## Examples
### Convert a folder to a Zip archive
```typescript
import { loadDataset, saveDataset } from "fairspec"

const dataset = await loadDataset("./sales-dataset")
await saveDataset(dataset, { target: "./sales-dataset.zip", withRemote: true })
```

### Refresh schemas from current data
```typescript
import { loadDataset, inferDataset, saveDataset } from "fairspec"

const dataset = await loadDataset("dataset.json")
const refreshed = await inferDataset(dataset, { sampleRows: 1000 })
await saveDataset(refreshed, { target: "./dataset-v2" })
```

### Aggregate validation errors by resource
```typescript
import { validateDataset } from "fairspec"

const report = await validateDataset("dataset.json")

const byResource = new Map<string, number>()
for (const error of report.errors) {
  const key = String(error.resourceName ?? "?")
  byResource.set(key, (byResource.get(key) ?? 0) + 1)
}

for (const [name, count] of byResource) {
  console.log(`${name}: ${count} error(s)`)
}
```

Created with ❤ and Livemark