TAB-SEB — Related Work


Note on scope. The comparison with Guasch, Lodi & Van Dooren (2022) and the WHOW Toolkit was recommended during the pre-examination review process as evidence of methodological convergence across independent research lines and across domains. The analysis below responds directly to that recommendation, situating TAB-SEB within a broader landscape of declarative, RML-based knowledge graph construction workflows.

Related Approaches

Two external approaches are selected for comparison: a peer-reviewed conference contribution proposing a declarative pipeline for public procurement data, and its subsequent toolkit evolution applied to environmental open data.

Semantic Knowledge Graphs for Distributed Data Spaces: The Public Procurement Pilot Experience

Authors: Guasch, T.; Lodi, G.; Van Dooren, S.
Venue: ISWC 2022 — International Semantic Web Conference
Domain: Public procurement data (EU open data)
Core contribution: Declarative RML-based pipeline for converting heterogeneous procurement records into RDF knowledge graphs within the International Data Spaces (IDS) framework

This paper describes a pilot application of a systematic CSV-to-RDF conversion pipeline within the context of EU public procurement open data. The workflow is formalised as an OWL ontology, uses RML for declarative mapping, and operates within the IDS architectural model. The approach targets machine-actionability and interoperability from the outset, with the workflow specification serving both as documentation and as an executable artefact.

WHOW Toolkit — Water and Health Open data Workflow

Availability: GitHub repository and Zenodo release
Domain: Environmental open data (water quality, health)
Relation to above: Direct evolution of the Guasch et al. pipeline, extended and operationalised
Key additions: Apache Airflow orchestration, SHACL validation, Docker Compose containerisation
Input scope: Heterogeneous formats (XML, JSON, relational databases, CSV)

The WHOW Toolkit extends and operationalises the Guasch et al. approach in service of the WHOW project's environmental open data objectives. It retains the declarative RML mapping paradigm and the OWL workflow formalisation, and adds three components: fully automated orchestration via Apache Airflow, structural validation via SHACL shapes, and a reproducible execution environment via Docker Compose. Among the approaches compared here, this combination makes WHOW the most complete implementation of a declarative RDF construction workflow.

Comparative Analysis

The table below structures the comparison across nine dimensions, grouped into convergences and differences.

Dimension	TAB-SEB	Guasch, Lodi & Van Dooren (ISWC 2022)	WHOW Toolkit
Convergences
Primary goal	Systematic, reproducible conversion of tabular cultural heritage metadata into RDF knowledge graphs, with explicit FAIR assessment	Systematic conversion of heterogeneous procurement records into RDF KGs within the IDS data-space framework	Systematic, reproducible conversion of environmental open data (water, health) into RDF KGs with automated orchestration
Mapping paradigm	✓ Present: Declarative RML via YARRRML; executed by Morph-KGC	✓ Present: Declarative RML mappings	✓ Present: Declarative RML mappings
SHACL validation	◎ Planned: Identified as a future development; not yet integrated into the current specification	Not documented	✓ Present: SHACL shapes used for structural validation of output graphs
FAIR orientation	✓ Present: Explicit FAIR assessment applied (62/79, 78.5%); FAIR compliance is a primary evaluation criterion	~ Implicit: FAIR-oriented metadata quality; not evaluated against an explicit rubric	~ Implicit: FAIR-oriented; open data publication model
Differences
Workflow formalisation	Prose-based: Human-readable blueprint on Protocols.io; CWL encoding identified as a primary future development	OWL ontology: Workflow specified as a formal OWL ontology; machine-readable and registrable	OWL ontology: OWL formalisation retained and extended from Guasch et al.
Orchestration model	Human-in-the-loop: Deliberate design choice, with curatorial oversight at each phase, reflecting GLAM production constraints	Semi-automated: Partially automated pipeline; degree of human intervention not fully specified	Fully automated: Apache Airflow DAG-based orchestration; end-to-end automated execution
Input scope	Tabular / CSV only: Deliberately constrained to tabular/CSV inputs; a principled boundary of the blueprint's current scope	Heterogeneous: XML, JSON, relational databases, CSV	Heterogeneous: XML, JSON, relational databases, CSV
Containerisation	Not yet: Identified as a current limitation; no containerised execution environment provided	Not documented	Docker Compose: Full-stack Docker Compose configuration for reproducible deployment
Domain	Cultural heritage / GLAM (museum objects, digitisation paradata, bibliographic metadata)	Public procurement (EU open data, IDS data spaces)	Environmental open data (water quality, health monitoring)

Positioning

Methodological Convergence

The comparative analysis reveals substantial methodological convergence between TAB-SEB and the Guasch et al./WHOW line of work, most notably in the shared commitment to declarative RML-based mapping as the core mechanism for converting source data to RDF and in the shared orientation towards FAIR data quality. The convergence is significant because it emerges independently across research groups and across three domains (public procurement, environmental data, and cultural heritage), which suggests that the declarative mapping paradigm is consolidating as a cross-domain standard for reproducible knowledge graph construction.
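The declarative paradigm shared by the three approaches can be illustrated with a minimal sketch. It assumes a hypothetical mapping dictionary, example.org IRIs, and sample columns; none of them is taken from TAB-SEB, Guasch et al., or WHOW, and the mapping format is not RML or YARRRML syntax. The sketch shows only the underlying idea: the mapping is data, and a generic engine applies it to each row.

```python
import csv
import io

# Hypothetical mapping: data, not code. It loosely mirrors the RML idea of a
# subject template plus predicate-object pairs, but it is NOT RML or YARRRML.
MAPPING = {
    "subject": "http://example.org/object/{id}",
    "predicate_objects": [
        ("http://purl.org/dc/terms/title", "title"),
        ("http://purl.org/dc/terms/creator", "creator"),
    ],
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def apply_mapping(csv_text: str, mapping: dict) -> list[str]:
    """Generic engine: turn each CSV row into N-Triples lines via the mapping."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = mapping["subject"].format(**row)
        for predicate, column in mapping["predicate_objects"]:
            value = row.get(column, "")
            if value:  # skip empty cells instead of emitting empty literals
                triples.append(
                    f'<{subject}> <{predicate}> "{escape_literal(value)}" .'
                )
    return triples


# Illustrative input only; the second row has no creator.
SAMPLE_CSV = "id,title,creator\n1,Vase,Unknown\n2,Portrait,\n"

if __name__ == "__main__":
    for line in apply_mapping(SAMPLE_CSV, MAPPING):
        print(line)
```

Because the conversion logic lives in the mapping rather than in the engine, the same engine can serve a different dataset by swapping the mapping alone, which is the property that makes declarative workflows portable across domains.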

Principled Divergences

The primary divergences between TAB-SEB and the WHOW line of work fall into two categories. The first comprises aspects that TAB-SEB explicitly acknowledges as current limitations and that directly inform its future development agenda: the absence of a machine-readable workflow formalisation (addressable via CWL encoding) and the absence of a containerised execution environment. The second comprises deliberate design choices that reflect the specific constraints and epistemological requirements of GLAM production contexts. The human-in-the-loop supervision model is not a limitation to be overcome but a principled response to the curatorial judgements necessarily embedded in cultural heritage data processing. Likewise, the restriction to tabular/CSV input is a scope boundary rather than a technical constraint.
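The difference between the two orchestration models can be sketched as a single dependency-ordered runner with a pluggable approval gate. The phase names below are hypothetical and do not reproduce the TAB-SEB blueprint or any WHOW DAG. The standard-library graphlib module stands in for DAG scheduling; this is not the Apache Airflow API.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical phases; each phase maps to the set of phases it depends on.
PIPELINE = {
    "clean_csv": set(),
    "write_mapping": {"clean_csv"},
    "materialise_rdf": {"write_mapping"},
    "assess_fair": {"materialise_rdf"},
}


def run_pipeline(
    dag: dict[str, set[str]],
    run_phase: Callable[[str], None],
    approve: Callable[[str], bool],
) -> list[str]:
    """Run phases in dependency order, asking for approval after each one.

    Returns the phases that ran and were approved. Execution stops at the
    first phase whose output the reviewer rejects.
    """
    completed = []
    for phase in TopologicalSorter(dag).static_order():
        run_phase(phase)
        if not approve(phase):
            break
        completed.append(phase)
    return completed


def auto_approve(phase: str) -> bool:
    """Fully automated model: every phase is accepted without review."""
    return True


def ask_curator(phase: str) -> bool:
    """Human-in-the-loop model: a curator confirms each phase at the console."""
    answer = input(f"Approve output of '{phase}'? [y/N] ")
    return answer.strip().lower() == "y"
```

Passing auto_approve yields end-to-end execution, while passing ask_curator pauses after every phase. Under these assumptions the two models differ only in the gate, not in the dependency structure of the workflow.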

Implications for Future Development

The WHOW Toolkit's combination of OWL workflow formalisation, Apache Airflow orchestration, and SHACL validation constitutes a reference implementation against which TAB-SEB's future development trajectory can be measured. The adoption of CWL encoding (identified in the Future Development section) would partially close the gap in workflow formalisation, while the addition of SHACL validation would align TAB-SEB's quality control apparatus with both WHOW and established semantic web best practice. Full automation via a pipeline orchestrator remains outside TAB-SEB's design scope for as long as the blueprint targets GLAM contexts requiring human curatorial oversight.
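To indicate what structural validation would add, the following sketch mimics one kind of SHACL constraint (a target class combined with a minimum count per property) in plain Python. It is a stand-in for illustration, not SHACL itself, and an actual integration would rely on a SHACL processor and shapes written in RDF. The shape, the class IRI, and the sample triples are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical shape: every ex:Object needs at least one dcterms:title.
# It imitates sh:targetClass with sh:minCount, expressed as a plain dict.
SHAPE = {
    "target_class": "http://example.org/Object",
    "min_count": {"http://purl.org/dc/terms/title": 1},
}


def validate(triples, shape):
    """Return (node, predicate, found, required) for each unmet minimum count."""
    # Focus nodes are all subjects typed with the shape's target class.
    targets = sorted(
        {s for s, p, o in triples if p == RDF_TYPE and o == shape["target_class"]}
    )
    violations = []
    for node in targets:
        for predicate, required in shape["min_count"].items():
            found = sum(1 for s, p, _ in triples if s == node and p == predicate)
            if found < required:
                violations.append((node, predicate, found, required))
    return violations


# Illustrative graph as (subject, predicate, object) tuples; object/2 has no title.
GRAPH = [
    ("http://example.org/object/1", RDF_TYPE, "http://example.org/Object"),
    ("http://example.org/object/1", "http://purl.org/dc/terms/title", "Vase"),
    ("http://example.org/object/2", RDF_TYPE, "http://example.org/Object"),
]

if __name__ == "__main__":
    for violation in validate(GRAPH, SHAPE):
        print(violation)
```

A check of this kind reports which nodes fail which constraint. In a human-in-the-loop workflow that report could be handed to the curator at the end of the conversion phase rather than used to fail an automated run.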