TAB-SEB — Future Development

1. Formalisation via CWL
2. Formalisation of Artefact Relations via PROV-O

1. Formalisation via Common Workflow Language (CWL)

Encoding TAB-SEB as an executable CWL specification to enable machine validation, workflow-engine orchestration, and registration as a computational research object.

Current Limitation: Prose-Based Specification on Protocols.io

TAB-SEB is currently specified as a prose-based, human-readable blueprint documented on Protocols.io. This format serves as an accessible methodological reference that practitioners can follow, adapt, and cite, but it is not machine-actionable. In its present form, the blueprint cannot be validated against formal workflow quality criteria, executed by a workflow engine, or registered as a computational research object in workflow-oriented repositories such as WorkflowHub. Because no machine-readable encoding exists, the logical structure of TAB-SEB (its sequencing of phases, conditional dependencies, and tool invocations) is expressed only implicitly in natural language, which precludes automated reproducibility verification.

This limitation is reflected in TAB-SEB's current FAIR evaluation, which scores the blueprint at 62/79 (78.5%) and identifies the lack of machine-actionable encoding as one of the principal gaps preventing a higher assessment against the applied rubric.

Proposed Development: CWL Encoding

The proposed development is to encode TAB-SEB as a Common Workflow Language (CWL) specification. CWL is an open standard for describing computational workflows in a portable, executor-agnostic format; its adoption would render TAB-SEB executable by compliant workflow engines, registrable on WorkflowHub as a citable software object, and independently evaluable against established workflow quality criteria beyond the FAIR rubric currently applied. Core CWL constructs applicable to TAB-SEB's encoding include:

cwl:Workflow, cwl:CommandLineTool, cwl:WorkflowStep, cwl:inputs, cwl:outputs, cwl:requirements
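To make the constructs listed above concrete, the following is a minimal sketch of how a two-phase pipeline might be written as a CWL Workflow document (here as a Python dictionary, since CWL documents may be serialised as JSON) together with a simple dataflow check. The phase names, parameter ids, and file names are hypothetical placeholders and are not taken from the TAB-SEB blueprint; the check is an illustration of machine validation, not a substitute for a CWL validator.

```python
import json

# Hypothetical two-phase pipeline. Phase names, parameter ids and the
# referenced tool files are illustrative placeholders only.
workflow = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {"source_table": "File"},
    "outputs": {
        "final_artefact": {"type": "File", "outputSource": "phase_two/result"},
    },
    "steps": {
        "phase_one": {
            "run": "phase_one_tool.cwl",
            "in": {"table": "source_table"},
            "out": ["cleaned"],
        },
        "phase_two": {
            "run": "phase_two_tool.cwl",
            "in": {"cleaned_table": "phase_one/cleaned"},
            "out": ["result"],
        },
    },
}


def dangling_sources(wf):
    """Return every data link that points at an undeclared input or step output."""
    # Valid sources are workflow inputs plus "<step>/<output>" identifiers.
    known = set(wf["inputs"])
    for step_id, step in wf["steps"].items():
        known.update(f"{step_id}/{out_id}" for out_id in step["out"])

    problems = []
    for step_id, step in wf["steps"].items():
        for param, source in step["in"].items():
            if source not in known:
                problems.append(f"{step_id}.{param} <- {source}")
    for out_id, out in wf["outputs"].items():
        if out["outputSource"] not in known:
            problems.append(f"outputs.{out_id} <- {out['outputSource']}")
    return problems


print(json.dumps(workflow, indent=2))           # the document an engine would load
print("dangling links:", dangling_sources(workflow))
```

A check of this kind is only possible once the sequencing of phases is stated as explicit data links rather than implied by prose.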

A key architectural advantage of CWL is its explicit separation of workflow logic from execution environment. This separation would allow the different concrete implementations of TAB-SEB — Morph-KGChad and CHAD-ASK — to be represented as interchangeable, independently validated tool wrappers embedded within the same abstract pipeline definition. Each tool wrapper would declare its inputs, outputs, and requirements in a machine-readable form, enabling automated consistency checks between the blueprint's specification and its instantiations. The CWL specification would thus serve both as a formal definition of TAB-SEB and as a harness for cross-implementation comparability.
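The consistency check described above can be sketched as a comparison of the typed interfaces that two tool wrappers declare. In this sketch the two CommandLineTool descriptions stand in for the two implementations; the commands `implementation-a` and `implementation-b`, the parameter ids, and the output file patterns are hypothetical, since the real command-line interfaces are not specified here.

```python
# Two hypothetical wrappers. Commands, parameter ids and glob patterns are
# placeholders; only the overall CommandLineTool layout follows CWL.
wrapper_a = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": ["implementation-a"],
    "requirements": {"ResourceRequirement": {"coresMin": 1}},
    "inputs": {"table": {"type": "File", "inputBinding": {"position": 1}}},
    "outputs": {"result": {"type": "File", "outputBinding": {"glob": "result.out"}}},
}
wrapper_b = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": ["implementation-b", "--run"],
    "requirements": {"ResourceRequirement": {"coresMin": 2}},
    "inputs": {"table": {"type": "File", "inputBinding": {"prefix": "--input"}}},
    "outputs": {"result": {"type": "File", "outputBinding": {"glob": "out/result.out"}}},
}


def signature(tool):
    """Reduce a tool description to its typed input and output parameters."""
    return {
        section: {name: spec["type"] for name, spec in tool[section].items()}
        for section in ("inputs", "outputs")
    }


def interface_mismatches(tool_a, tool_b):
    """List parameters whose presence or type differs between two wrappers."""
    sig_a, sig_b = signature(tool_a), signature(tool_b)
    mismatches = []
    for section in ("inputs", "outputs"):
        for name in sorted(set(sig_a[section]) | set(sig_b[section])):
            type_a = sig_a[section].get(name)
            type_b = sig_b[section].get(name)
            if type_a != type_b:
                mismatches.append((section, name, type_a, type_b))
    return mismatches


# The wrappers differ in command line and resources but share one interface,
# so either could fill the same step of an abstract pipeline.
print(interface_mismatches(wrapper_a, wrapper_b))
```

The comparison deliberately ignores `baseCommand`, bindings, and requirements: those belong to the execution environment, while the typed inputs and outputs are what the abstract pipeline depends on.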

2. Formalisation of Artefact Relations via PROV-O

Typing the relationships between TAB-SEB, its implementations, and its produced artefacts using W3C PROV-O vocabularies to enable machine traversal of the full derivation chain.

Current Limitation: Prose-Described Derivation Chain

The relationships between TAB-SEB (the abstract blueprint), its concrete implementations (Morph-KGChad, CHAD-ASK), and the produced artefacts (CHAD-KG, CSV datasets) are currently described only in natural language prose, dispersed across documentation articles, repository READMEs, and the blueprint itself. This limits the traversability of the blueprint → implementation → artefact derivation chain and prevents systematic cross-implementation comparability at the level of formal metadata. In the absence of typed provenance relations, an automated agent cannot determine which artefact was generated by which process, which implementation instantiates which blueprint phase, or which input datasets were consumed to produce a given output.

This gap is directly relevant to TAB-SEB's FAIR evaluation: the absence of formally typed provenance is among the factors that constrain the score to 62/79 (78.5%), as typed derivation relationships are a prerequisite for several higher-level FAIR indicators concerning reusability and machine-readable provenance.

Proposed Development: PROV-O Vocabulary Adoption

The proposed development is to formalise these relations using PROV-O, the W3C ontology for provenance information. Specific properties identified as applicable to the TAB-SEB derivation chain include:

prov:wasDerivedFrom, prov:wasGeneratedBy, prov:used, prov:wasAssociatedWith, prov:wasInformedBy

Adopting these properties would enable machine-readable traversal of the full derivation chain: from the abstract blueprint specification, through its executable implementations, to the concrete artefacts those implementations produce. It would further support structured cross-implementation comparability: for any two implementations of TAB-SEB, a PROV-O graph would allow automated identification of which phases they share, which artefacts they consume or derive, and where their derivation paths diverge.
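As an illustration of such a traversal, the sketch below stores a small provenance graph as plain subject, predicate, object triples using the PROV namespace IRI, and walks it with the standard library only. All `ex:` identifiers other than the blueprint name are hypothetical, and the assignment of artefacts to implementations is invented for the example. Treating an implementation as both an agent and an entity derived from the blueprint is a modelling assumption of this sketch, not an established design.

```python
PROV = "http://www.w3.org/ns/prov#"

# Hypothetical provenance graph: two implementations of one blueprint, each
# producing its own artefact from the same input dataset.
triples = {
    ("ex:artefact-1", PROV + "wasGeneratedBy", "ex:run-1"),
    ("ex:run-1", PROV + "used", "ex:input-csv"),
    ("ex:run-1", PROV + "wasAssociatedWith", "ex:implementation-a"),
    ("ex:implementation-a", PROV + "wasDerivedFrom", "ex:TAB-SEB"),
    ("ex:artefact-2", PROV + "wasGeneratedBy", "ex:run-2"),
    ("ex:run-2", PROV + "used", "ex:input-csv"),
    ("ex:run-2", PROV + "wasAssociatedWith", "ex:implementation-b"),
    ("ex:implementation-b", PROV + "wasDerivedFrom", "ex:TAB-SEB"),
}


def upstream(node, graph):
    """Collect every resource reachable from node by following PROV links."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for subj, pred, obj in graph:
            if subj == current and pred.startswith(PROV) and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


chain_1 = upstream("ex:artefact-1", triples)
chain_2 = upstream("ex:artefact-2", triples)

# Shared ancestry: what both derivation chains have in common.
print("shared:  ", sorted(chain_1 & chain_2))
# Divergence: what belongs to only one of the two chains.
print("diverges:", sorted(chain_1 ^ chain_2))
```

In practice the triples would be loaded from an RDF serialisation with a dedicated library; the tuple representation is used here only to keep the sketch self-contained.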

A complete PROV-O annotation would also allow TAB-SEB and its instantiations to be registered as interlinked, citable research objects, so that the epistemic dependencies between the blueprint's methodological claims and the empirical artefacts that instantiate them become explicit and machine-verifiable.

Interdependence of the Two Proposed Developments

The two formalisations described above are interdependent and should be pursued in conjunction. CWL encodes the procedural logic of the workflow — the sequence, conditional dependencies, and tool invocations that constitute the blueprint's phases — while PROV-O types the relationships between the workflow's components and the artefacts they produce or consume. Their respective scopes are complementary rather than overlapping: CWL operates at the level of execution structure; PROV-O operates at the level of derivation semantics.

CWL without PROV-O would yield an executable specification without formally typed provenance relations, leaving the artefact derivation chain machine-opaque. PROV-O without CWL would yield typed links without executable grounding, leaving the workflow logic inaccessible to automated verification. Together, they represent the primary route towards elevating TAB-SEB from a prose-documented research protocol to a fully machine-actionable, independently evaluable, and citable research object.
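One way to picture how the two formalisations fit together is to derive provenance statements directly from the step structure of a workflow. The sketch below assumes a CWL-style `steps` mapping with hypothetical phase names and identifiers, and emits one possible set of PROV triples per step; the `run:` and `data:` prefixes are invented for the example, and an actual workflow engine may record provenance quite differently.

```python
PROV = "http://www.w3.org/ns/prov#"

# Hypothetical step structure in the shape CWL uses for workflow steps.
steps = {
    "phase_one": {"in": {"table": "source_table"}, "out": ["cleaned"]},
    "phase_two": {"in": {"cleaned_table": "phase_one/cleaned"}, "out": ["result"]},
}


def provenance_triples(steps):
    """Map each step to an activity and its data links to PROV relations."""
    triples = set()
    for step_id, step in steps.items():
        activity = f"run:{step_id}"
        # Every declared output is an entity generated by the step's activity.
        for out_id in step["out"]:
            triples.add((f"data:{step_id}/{out_id}", PROV + "wasGeneratedBy", activity))
        for source in step["in"].values():
            # Every consumed source is an entity used by the activity.
            triples.add((activity, PROV + "used", f"data:{source}"))
            # A source of the form "<step>/<output>" links two activities.
            producer = source.split("/")[0]
            if "/" in source and producer in steps:
                triples.add((activity, PROV + "wasInformedBy", f"run:{producer}"))
    return triples


for triple in sorted(provenance_triples(steps)):
    print(triple)
```

The execution structure supplies the facts (which step consumed and produced what), while the PROV properties supply their meaning, which is the complementarity described above.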