External Resources

Schematic register of the doctoral project outputs, grouped according to the same macro-structure used in the chapter: data preparation, CSV-to-RDF materialisation, data publication and dissemination, quality control and assessment, and the workflow blueprint.

1. Data Preparation
2. CSV-to-RDF Materialisation
3. Data Publication and Dissemination
4. Quality Control and Assessment
5. Workflow Blueprint
6. Publications
7. Terminological Vademecum

1. Data Preparation: Retrieval, Collection, Harmonisation, Metadata Crosswalking

Templates, schema-alignment materials, and retrieval/harmonisation software that support the preparation of structured tabular inputs before RDF materialisation.

CHANGES Data Collection Templates and Samples

Title
Modelli Spreadsheet CHANGES - Acquisizione e Oggetti
Authors
Arianna Moretti; Sebastian Barzaghi
Type
Tabular templates and sample datasets
Formats
CSV and ODS
Scope
Object metadata and digitisation-process paradata

Standardised spreadsheets for collecting exhibition-object metadata and phase-oriented digitisation paradata in CHANGES-related workflows. The deposit combines operational templates and instructional samples so the tabular layer can be reused before RDF generation.

Digital Damaged Ceramics CHANGES Data Collection Extended Templates

Title
Digital Damaged Ceramics – Data Collection Table Templates + CHANGES Field Names Mapping (JSON)
Authors
Arianna Moretti; Madeleine Daste
Type
Extended templates and schema-mapping resources
Formats
CSV and JSON
Scope
Object metadata, process paradata, header-level harmonisation

Project-specific templates translated into an English field vocabulary and paired with JSON mappings from the original CHANGES headers. The package functions as a lightweight schema-translation layer for consistent ingestion and downstream conversion.
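
How such a header-level mapping might be applied before conversion can be sketched as follows. This is a minimal illustration, not the published package: the mapping entries, the column names, and the `translate_headers` helper are all hypothetical, and the original headers are assumed to be Italian purely for the sake of the example.

```python
import csv
import io
import json

# Hypothetical mapping from original headers to English field names.
# These keys and values are illustrative, not the published JSON mapping.
HEADER_MAPPING_JSON = '{"Titolo": "title", "Autore": "creator", "Materiale": "material"}'


def translate_headers(csv_text: str, mapping: dict[str, str]) -> str:
    """Return the CSV with its header row renamed according to `mapping`.

    Headers missing from the mapping are kept unchanged, so unexpected
    columns stay visible instead of being silently dropped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = [mapping.get(h.strip(), h.strip()) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


mapping = json.loads(HEADER_MAPPING_JSON)
source = "Titolo,Autore,Note\nVaso,Anonimo,frammento\n"
translated = translate_headers(source, mapping)
print(translated)
```

Keeping the mapping in a separate JSON file, rather than in the conversion code, means the same downstream pipeline can ingest tables that use either header vocabulary.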

OpenCitations Data Sources Converter

Authors
Arianna Moretti; Arcangelo Massari; Elia Rizzetto; Marta Soricetti
Type
Python software and package
Purpose
Crosswalk and preprocessing of heterogeneous scholarly sources into OCDM-aligned tabular outputs
Supported sources
Crossref, DataCite, PubMed, OpenAIRE, JaLC, mEDRA
License / runtime
ISC; Python >=3.9,<3.14

Reusable ingestion software that converts multiple bibliographic and citation sources into structured CSV inputs for OpenCitations META and INDEX. Its modular design isolates source-specific logic while reusing shared validation, storage, and normalisation components.

Early Modern BnF Data Collector and Harmoniser

Full title
Early Modern BnF Data Collector and Harmoniser (Actors and Editions Processing, Analysis, and RDF Graph Generation)
Authors
Arianna Moretti; Iiro Tiihonen; Jonas Fischer
Type
Acquisition, harmonisation, analysis, and RDF-generation scripts
Source
Bibliothèque nationale de France SPARQL retrieval workflow
License / runtime
MIT; Poetry project, Python ^3.11

Repository for producing a BnF-derived Early Modern bibliographic dataset and actor layer with explicit harmonisation diagnostics. It combines staged SPARQL retrieval, unification, duplicate analysis, and final RDF graph generation.

2. CSV-to-RDF Materialisation

Software, mappings, configuration artefacts, and semi-automatic tooling used to turn curated tabular data into RDF under an explicit target model.

Morph-KGChad

Full title
Morph-KGChad (Morph-KGC CHANGES Metadata)
Authors
Arianna Moretti; Sebastian Barzaghi
Type
Open-source CSV-to-RDF materialisation pipeline
Target model
CHAD-AP
Core artefacts
YARRRML mappings, INI configuration, UDFs, orchestrator, monitoring, quality checks

Extension of Morph-KGC for reproducible RDF materialisation of CHANGES-aligned tabular datasets. It packages the executable conversion environment needed to generate CHAD-AP-compliant Turtle graphs from object metadata and process paradata.

Morph-KGChad: INI configuration and YARRRML mapping files

Author
Arianna Moretti
Type
Executable configuration release
Contents
1 INI file and 2 YARRRML mapping files
Scope
Digitisation process data and museum object data

Versioned deposit of the configuration layer used to operationalise Morph-KGChad. It preserves the exact mapping and runtime artefacts required to reproduce the materialisation workflow transparently.

Morph-KGChad Digital Damaged Ceramics Extension

Author
Arianna Moretti
Type
Forked conversion stack for a project-specific extension
Adaptations
English headers, structural drift tolerance, extended controlled terms, material modelling
Output extension
Material representation through crm:P45_consists_of

Forked version of the CHANGES conversion pipeline adapted to the Digital Damaged Ceramics data model and documentation choices. It supports translated templates, broader controlled-term coverage, and a pragmatic material-description extension in RDF.

CHAD-ASK: facilitating domain knowledge formalisation in view of LOD conversion

Authors
Arianna Moretti; Sebastian Barzaghi
Type
Survey-to-mapping generation approach and software
Outputs
YARRRML mapping rules and Morph-KGC INI configuration
Environment
Morph-KGChad / Morph-KGC
Documented in
IRCDL 2026 proceedings paper

Semi-automatic method that converts structured questionnaire answers into executable materialisation artefacts. It reframes mapping authoring as guided conceptual elicitation so non-RML users can still contribute directly to the conversion setup.

3. Data Publication and Dissemination

Released datasets, knowledge-graph dumps, dissemination websites, and service endpoints that expose the results as reusable public outputs.

Aldrovandi Digital Twin CSV datasets

Title
CSV Datasets on Exhibited Objects and Digitisation Process from “The Other Renaissance - Ulisse Aldrovandi and the Wonders of the World” Temporary Exhibition Digital Twin
Contact persons
Sebastian Barzaghi; Giulia Renda; Arianna Moretti
Type
Versioned tabular dataset release
Layers
CHO metadata and digitisation-process paradata

Two CSV datasets that function as the tabular input layer for CHAD-KG in the Aldrovandi digital twin workflow. They capture both object-centred description and process-centred provenance in a form designed for direct Morph-KGChad processing.

CHAD-KG: TTL Serialised RDF Dataset of Exhibited Objects and Digitisation Process

Authors
Sebastian Barzaghi; Arianna Moretti
Type
RDF knowledge graph dump
Format
Turtle
Access model
Zenodo release, SPARQL endpoint, static HTML layer

RDF dump generated from the Aldrovandi tabular datasets through Morph-KGChad and expressed in CHAD-AP. It integrates object description and digitisation-process paradata into a FAIR-oriented, versioned semantic publication.

OpenCitations Index (INDEX) — Citation Dataset + Provenance Dataset

Author / maintainer
OpenCitations
Type
Large-scale open citation and provenance dataset
Licence
CC0
Distributions
CSV, N-Triples, Scholix, website services
Additional 2026 dumps
RDF dump 31306081; RDF dump 31353691; data-source N-Triples 24427051

Open citation database distributing citation links and associated provenance as public dumps and query services. The chapter records both the article-based snapshot and the later updates documented on the official download pages.

Digital Damaged Ceramics - RDF and CSV Dataset

Authors
Madeleine Daste; Arianna Moretti
Type
Curated research dataset
Formats
CSV and Turtle
Versioning
Preserved with provenance tracking

Versioned release of the Digital Damaged Ceramics dataset in both tabular and RDF serialisations. It serves simultaneously as a reusable dissemination output and as a reference instance aligned with the project’s extended CHANGES-derived templates.

Digital Damaged Ceramics website toolchain and reproducibility package

Title
Digital Damaged Ceramics - website files, data analysis files, and website generation software (1.0)
Authors
Arianna Moretti; Madeleine Daste
Type
Website codebase and reproducibility package
Key pages
index.html, sevres.html, mic.html, sparql.html, 3d.html, dataset.html, documentation.html
License
MIT for code; CC BY for dataset distributions on the website

Package containing the website codebase, data-analysis assets, and semi-automated site-generation materials used for project dissemination. It doubles as a reusable template for similar catalogue-based museum projects.

4. Quality Control and Assessment

Explicitly documented assessment output used as a reusable pattern for evaluating the FAIRness of cultural-heritage digitisation workflows.

Aldrovandi Digital Twin FAIRness assessment matrix

Full title
Aldrovandi Digital Twin - FAIRness assessment matrix (3-level model for heritage collections; adapted application)
Authors
Sebastian Barzaghi; Alice Bordignon; Bianca Gualandi; Ivan Heibi; Arcangelo Massari; Arianna Moretti; Silvio Peroni; Giulia Renda
Type
Assessment matrix
Location
Table 2 in the Data Intelligence article
Function
Reusable evaluation pattern across objects, object-level metadata, and metadata records

Compact evidence table adapting an existing heritage-oriented FAIR rubric to the Aldrovandi Digital Twin case. It is included as a reusable quality-control pattern for auditable, cross-case comparison of implementation choices.

5. Workflow Blueprint

Persistent methodological resource that coordinates the executable outputs and defines the transferable operating model validated through the case studies.

TABular Semantic Enhancement Blueprint (TAB-SEB)

Author
Arianna Moretti
Type
Versioned workflow blueprint on Protocols.io
Licence
CC0
Partially documented in
Moretti, Arianna. “Defining A Workflow For Semantic Enhancement Of Cultural Heritage Metadata.” IRCDL 2026 proceedings.

Citable, technology-agnostic but execution-oriented blueprint that coordinates tabular preparation, RDF materialisation, dissemination, and re-iterative quality control. It is the stable methodological research object to which the case-study implementations are linked as executable instantiations.

6. Publications

Complete register of peer-reviewed publications produced during the doctoral research: the four contributions on which the dissertation is directly based (Part II), and the additional outputs whose content substantially informs the thesis.

Publications Included in the Thesis (Part II)

I

A Proposal for a FAIR Management of 3D Data in Cultural Heritage: The Aldrovandi Digital Twin Case

Barzaghi, S., Bordignon, A., Gualandi, B., Heibi, I., Massari, A., Moretti, A., Peroni, S., & Renda, G. (2024). Data Intelligence.

Author Contributions

Moretti contributed primarily to metadata management tasks (data collection and formalisation, semantic modelling, software for materialisation and RDF conversion, validation, publication and dissemination with attention to reproducibility documentation). All co-authors participated in writing; coordination and FAIR-specific leadership were primarily managed by Gualandi, while 3D domain aspects were primarily managed by Bordignon.

II

CHAD-KG: A Knowledge Graph for Representing Cultural Heritage Objects and Digitisation Paradata

Barzaghi, S., Heibi, I., Moretti, A., & Peroni, S. (2026). International Journal on Semantic Web and Information Systems, 22(1), 1–46.

Author Contributions

Sebastian Barzaghi and Arianna Moretti contributed equally to the underlying project. Barzaghi led the modelling of the CHAD-AP application profile; Moretti planned, developed, and tested the software enabling RDF materialisation (Morph-KGC extension), contributed substantially to tabular data collection and formalisation, to the definition and publication of reproducible templates, and to the maintenance of the research outputs. Further, Moretti played a primary role (on a par with Barzaghi) in drafting and editing the manuscript.

III

Formalising Cultural Heritage Metadata with a Multidisciplinary Approach: Enriching the CHANGES Workflow for Enhancing a Museum Collection about Ceramics through a FAIR Digitisation Process

Moretti, A., & Daste, M. (2025). The Eurographics Association.

Author Contributions

Daste acted as domain expert (data production and collection; part of 3D digitisation). Moretti handled the technical implementation (extending RDF materialisation software, SPARQL endpoint, predefined queries, website, publication of outputs on GitHub and Zenodo, and contributions to the 3D digital exhibition setup). Metadata formalisation and manuscript writing were shared equally.

IV

The OpenCitations Index: Description of a database providing open citation data

Heibi, I., Moretti, A., Peroni, S., & Soricetti, M. (2024). Scientometrics.

Author Contributions

Moretti contributed on an equal basis with the co-authors to conceptualisation, data curation, formal analysis, methodology, software, validation, visualisation, and writing (original draft and review/editing). She contributed substantively to the (re)design and standardisation of ingestion workflows across sources, including management of the OC-DS-Converter software and source integrations.

Additional PhD Publications and Research Outputs

The publications below describe the main outcomes of the doctoral research. Their content contributes substantially to this thesis; concepts presented in these works were adapted and partially incorporated into the manuscript.

Journal and Book Publications

Barzaghi, S., Bordignon, A., Collina, F., et al. (2025). A Reproducible Workflow for the Creation of Digital Twins in the Cultural Heritage Domain. Transformations: A DARIAH Journal (ahead of print).
DOI
Balzani, R., Barzaghi, S., Bitelli, G., et al. (2024). Saving temporary exhibitions in virtual environments: The Digital Renaissance of Ulisse Aldrovandi – Acquisition and digitisation of cultural heritage objects. Digital Applications in Archaeology and Cultural Heritage, 32, e00309.
DOI
Moretti, A., Soricetti, M., Heibi, I., Massari, A., Peroni, S., & Rizzetto, E. (2024). The Integration of the Japan Link Center's Bibliographic Data into OpenCitations. Journal of Open Humanities Data, 10.
DOI
Malínek, V., Umerle, T., Gray, E., et al. (2024). Open Bibliographical Data Workflows and the Multilinguality Challenge. Journal of Open Humanities Data, 10.
DOI

Conference Publications

Moretti, A. (2026). Defining A Workflow For Semantic Enhancement Of Cultural Heritage Metadata. Proceedings of IRCDL 2026 (CEUR-WS).
In press
Moretti, A., Tiihonen, I. L. I., & Fischer, J. P. (2026). Introducing a data harmonisation workflow exploiting the BNF SPARQL service to produce and disseminate a research-oriented bibliographic dataset concerning the Early Modern period. Proceedings of IRCDL 2026 (CEUR-WS).
In press
Barzaghi, S., Moretti, A., Heibi, I., & Peroni, S. (2026). CHAD ASK: Experimenting with a semi-automatic approach based on online surveys to formalise unstructured knowledge in Linked Data. Proceedings of IRCDL 2026 (CEUR-WS).
In press
Barzaghi, S., Colitti, S., Moretti, A., & Renda, G. (2025). From Metadata to Storytelling: A Framework for 3D Cultural Heritage Visualization on RDF Data. Proceedings of AIUCD 2025.
DOI
Barzaghi, S., Heibi, I., Moretti, A., & Peroni, S. (2025). Developing Application Profiles for Enhancing Data and Workflows in Cultural Heritage Digitisation Processes. The Semantic Web – ISWC 2024 (pp. 197–217).
DOI
Moretti, A., Heibi, I., & Peroni, S. (2024). A Workflow for GLAM Metadata Crosswalk. Proceedings of AIUCD 2024.
DOI
Barzaghi, S., Collina, F., Fabbri, F., et al. (2023). Digitisation of Temporary Exhibitions: The Aldrovandi Case. The Eurographics Association.
DOI

7. Terminological Vademecum

A terminological reference to facilitate the reading of the thesis. Given the interdisciplinary nature of the research, at the crossroads between cultural heritage studies, information sciences, and Semantic Web technologies, certain terms are used with specific meanings that do not always correspond to their common usage within the individual disciplines concerned. Definitions are clustered by conceptual coherence. They are not intended to be exhaustive or universally valid; they serve as practical clarifications to disambiguate concepts frequently mentioned in this work.

Semantic Web Technologies

The Semantic Web was originally conceived by Tim Berners-Lee as an extension of the Web in which information would be read, interpreted, and processed autonomously by machines, enabling computers and human operators to cooperate more effectively (Berners-Lee et al. 2001). The paradigm is coordinated by the World Wide Web Consortium (W3C), the international standards body responsible for the protocols and languages supporting the Semantic Web. At its core, the Resource Description Framework (RDF) is a W3C standard that represents information as triples, each composed of a subject, a predicate, and an object, thus enabling the interconnection of datasets from independent sources (Berners-Lee et al. 2001; Cyganiak et al. 2014). RDF data can be serialised in multiple syntactic formats (serialisations such as Turtle, N-Triples, and JSON-LD), which provide different representations of the same underlying graph model.
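
The triple structure can be made concrete with a small sketch that holds triples in plain Python and writes them out as N-Triples lines. It covers only simple cases (IRIs and plain string literals, no datatypes or language tags); the `Triple` type, the `to_ntriples` helper, and the `https://example.org/` identifiers are assumptions introduced for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str             # IRI of the described resource
    predicate: str           # IRI of the property
    obj: str                 # IRI or plain literal value
    obj_is_iri: bool = True  # False when `obj` is a literal


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as an N-Triples line (simple cases only)."""
    if triple.obj_is_iri:
        obj = f"<{triple.obj}>"
    else:
        # Escape backslashes, double quotes, and newlines inside the literal.
        escaped = (
            triple.obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj = f'"{escaped}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Hypothetical object identifier; the property and type IRIs are real vocabulary terms.
title = Triple("https://example.org/object/1", "http://purl.org/dc/terms/title", "Vase", False)
has_type = Triple(
    "https://example.org/object/1",
    "http://www.cidoc-crm.org/cidoc-crm/P2_has_type",
    "http://vocab.getty.edu/aat/300140803",
)
for line in (to_ntriples(title), to_ntriples(has_type)):
    print(line)
```

The same two triples could equally be written in Turtle or JSON-LD; only the surface syntax would change, not the graph.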

On top of the RDF data model, the W3C standard OWL (Web Ontology Language) supports the expression of complex logical constraints and class hierarchies, providing a richer vocabulary for defining ontologies, including classes, properties, cardinality restrictions, and logical inference rules. Domain knowledge thus becomes not only representable but also open to computational reasoning (W3C 2012). SKOS (Simple Knowledge Organization System), by contrast, offers a less expressive model for encoding thesauri and classification systems. These foundational standards are complemented by domain-specific ontology suites, including SPAR (Semantic Publishing and Referencing), PROV-O (W3C Provenance Ontology), DCAT (Data Catalog Vocabulary), CIDOC-CRM (Conceptual Reference Model for cultural heritage), and CiTO (Citation Typing Ontology), each providing community-agreed vocabularies of classes and properties for specific domains.

Ontology and Controlled Vocabulary

An ontology is a shared, formal conceptualisation of a domain which defines classes, properties, and the logical relationships between them in a machine-interpretable way (Guarino et al. 2009; Gruber 1993). Thus, it provides the structural schema of a knowledge representation by expressing existing categories, how they relate to one another, and what constraints govern their relations. A controlled vocabulary, on the other hand, is a standardised list of preferred terms used to populate the values of descriptive fields. The Getty Art and Architecture Thesaurus (AAT), for instance, supplies normalised terms for object typologies, techniques, and materials; the Union List of Artist Names (ULAN) provides authority records for artists and makers. Ontology and controlled vocabulary operate at complementary layers of the same descriptive infrastructure: while the former defines the property (e.g., crm:P2_has_type), the latter supplies the authoritative term that fills it (e.g., AAT earthenware at http://vocab.getty.edu/aat/300140803).

Related to but distinct from a controlled vocabulary, an authority resource is a reference system that provides globally unique identifiers for named entities (such as persons, organisations, places, and concepts), enabling their unambiguous identification across datasets. An example is the Virtual International Authority File (VIAF), which aggregates authority data for personal and corporate names across library systems worldwide. Wikidata (Vrandečić & Krötzsch 2014) functions as a collaborative, multilingual knowledge base whose entity identifiers are widely reused as authority references in ontology-based descriptions.

Linked (Open) Data

Linked Data refers to structured data published on the Web according to a set of publication principles that make it interlinkable and queryable by machines. The four principles governing Linked Data are:

  • the use of URIs as names for things;
  • the use of HTTP URIs so that those names can be looked up;
  • the provision of useful information when a URI is dereferenced;
  • the inclusion of links to other URIs to enhance data discovery (Bizer, Heath & Berners-Lee 2009).

Although RDF is the primary technology for implementing Linked Data, not every RDF dataset constitutes Linked Data. A graph that uses only blank nodes, without dereferenceable URIs or links to external datasets, does not fulfil the paradigm. Linked Data can be defined as Linked Open Data when the dataset is, in addition, freely accessible on the Web and released under an open licence (Bizer et al. 2009).
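
The distinction between an RDF graph and Linked Data can be approximated with a rough offline check. The sketch below assumes triples are given as strings in N-Triples-like notation (`<...>` for IRIs, `_:x` for blank nodes, quoted strings for literals) and that the dataset's own host name is known; the helper names are hypothetical. It cannot tell whether a URI actually dereferences, which would require network access.

```python
from urllib.parse import urlparse


def http_iri(term: str):
    """Return the IRI inside <...> if it is an HTTP(S) IRI, else None."""
    if term.startswith("<") and term.endswith(">"):
        iri = term[1:-1]
        if urlparse(iri).scheme in ("http", "https"):
            return iri
    return None


def linked_data_hints(triples, local_host: str) -> dict:
    """Report two necessary (not sufficient) signals of Linked Data."""
    subject_iris = [http_iri(s) for s, _, _ in triples]
    object_iris = [http_iri(o) for _, _, o in triples]
    return {
        # At least one described node is named with an HTTP(S) IRI.
        "names_things_with_http_iris": any(subject_iris),
        # At least one object IRI points outside the dataset's own host.
        "links_to_external_datasets": any(
            iri is not None and urlparse(iri).netloc != local_host
            for iri in object_iris
        ),
    }


blank_only = [("_:b0", "<http://purl.org/dc/terms/title>", '"Vase"')]
linked = [
    (
        "<https://example.org/object/1>",  # hypothetical local identifier
        "<http://www.cidoc-crm.org/cidoc-crm/P2_has_type>",
        "<http://vocab.getty.edu/aat/300140803>",
    )
]
print(linked_data_hints(blank_only, "example.org"))
print(linked_data_hints(linked, "example.org"))
```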

Knowledge Graph and KG Materialisation

A knowledge graph is a structured, graph-based representation of entities and their relationships, typically encoded in RDF (Hogan et al. 2022). KG materialisation is the process of generating RDF graphs from source data — in this thesis, primarily from tabular (CSV) inputs — by applying a set of transformation rules. The process can be carried out through declarative mapping languages (e.g., RML, YARRRML), in which the transformation rules are expressed as human-readable, auditable specifications independently of any execution engine, or through code-oriented approaches, where the transformation is hard-coded in a programming language (e.g., Python, PHP). Declarative approaches tend to be more transparent and easier to validate collaboratively, while code-oriented solutions may offer greater flexibility for complex edge cases.
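
A deliberately small, code-oriented sketch of materialisation is given here for orientation. The `RULES` table plays the role that a declarative mapping would play, while the loop stands in for the execution engine; it is not RML or YARRRML syntax and does not reproduce any pipeline described in this register. The column names (`id`, `title`, `type_uri`) and the base IRI are hypothetical.

```python
import csv
import io
from urllib.parse import quote

BASE = "https://example.org/object/"  # hypothetical base IRI
DCT_TITLE = "http://purl.org/dc/terms/title"
CRM_HAS_TYPE = "http://www.cidoc-crm.org/cidoc-crm/P2_has_type"

# Transformation rules: (CSV column, predicate IRI, kind of object term).
RULES = [
    ("title", DCT_TITLE, "literal"),
    ("type_uri", CRM_HAS_TYPE, "iri"),
]


def materialise(csv_text: str) -> list[str]:
    """Turn each CSV row into N-Triples lines by applying RULES."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row['id'].strip(), safe='')}>"
        for column, predicate, kind in RULES:
            value = (row.get(column) or "").strip()
            if not value:
                continue  # empty cells produce no triple
            if kind == "iri":
                obj = f"<{value}>"
            else:
                obj = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
            lines.append(f"{subject} <{predicate}> {obj} .")
    return lines


table = "id,title,type_uri\n1,Vase,http://vocab.getty.edu/aat/300140803\n2,,\n"
graph_lines = materialise(table)
print("\n".join(graph_lines))
```

Even in this toy form, separating the rule table from the loop hints at why declarative mappings are easier to review: the rules can be read and checked without reading the engine.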

Once materialised, an RDF graph is typically stored in a triplestore, a database system optimised for storing and retrieving RDF triples. Triplestores can be queried through a SPARQL endpoint, a web interface that exposes the database's content and allows users and applications to access the graph using the SPARQL Protocol and RDF Query Language (SPARQL). Where direct SPARQL interaction is not appropriate, for example when the intended audience lacks technical expertise or when integration with external applications is required, a REST API (Representational State Transfer Application Programming Interface) can be layered on top of the triplestore. It exposes the data through structured HTTP requests, enabling access via standard web protocols without requiring Semantic Web expertise.
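
As an illustration of how an application addresses a SPARQL endpoint, the sketch below builds (but does not send) a query request in the GET form defined by the SPARQL 1.1 Protocol, asking for JSON results. The endpoint URL is a placeholder and `build_sparql_request` is a hypothetical helper; a real deployment may also require authentication or prefer POST for long queries.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # placeholder, not a real service
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL query request using the protocol's GET form."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask the endpoint for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
# Decode the URL again to confirm the query survives encoding unchanged.
decoded_query = parse_qs(urlparse(request.full_url).query)["query"][0]
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and parsing the JSON body; a REST layer essentially hides this step behind fixed, parameterised routes.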

Metadata Schema, Application Profile, and Crosswalk

A metadata schema is a structured set of elements used to describe resources within a given domain. An application profile is a metadata schema built by selecting, constraining, and, if necessary, extending elements from existing schemas to meet the requirements of a specific application context (Nilsson et al. 2008). The term does not prescribe a single implementation strategy: constraints may be expressed through formal mechanisms such as OWL restrictions or SHACL (Shapes Constraint Language), a W3C standard that defines machine-actionable validation rules (shapes) against which RDF graphs can be checked independently of the underlying ontology. When defining an application profile, encoding constraints as SHACL shapes follows a design logic that keeps the ontological level permissive while enforcing application-specific rules at the validation layer. A metadata schema crosswalk is the systematic mapping of elements, semantics, and syntax from one schema to another, intended to enable interoperability between datasets described by different standards (Chan & Zeng 2006).
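
The idea of enforcing application-specific rules at a separate validation layer can be sketched without any RDF library. The example below checks minimum and maximum property counts per subject, in the spirit of SHACL cardinality constraints; it is not SHACL, it applies the rules to every subject in the graph rather than to a targeted class, and the `SHAPE` table and identifiers are hypothetical.

```python
from collections import defaultdict

DCT_TITLE = "http://purl.org/dc/terms/title"
CRM_HAS_TYPE = "http://www.cidoc-crm.org/cidoc-crm/P2_has_type"

# Hypothetical profile rule set: predicate -> (min count, max count or None).
SHAPE = {
    DCT_TITLE: (1, 1),        # exactly one title
    CRM_HAS_TYPE: (1, None),  # at least one type
}


def validate(triples, shape):
    """Return (subject, predicate, found_count) for every violated rule."""
    counts = defaultdict(lambda: defaultdict(int))
    for subject, predicate, _ in triples:
        counts[subject][predicate] += 1
    violations = []
    for subject in sorted(counts):
        for predicate, (min_count, max_count) in shape.items():
            found = counts[subject][predicate]
            too_few = found < min_count
            too_many = max_count is not None and found > max_count
            if too_few or too_many:
                violations.append((subject, predicate, found))
    return violations


OBJ1 = "https://example.org/object/1"
OBJ2 = "https://example.org/object/2"
graph = [
    (OBJ1, DCT_TITLE, "Vase"),
    (OBJ1, CRM_HAS_TYPE, "http://vocab.getty.edu/aat/300140803"),
    (OBJ2, DCT_TITLE, "Plate"),
    (OBJ2, DCT_TITLE, "Dish"),  # second title breaks the max count
]
violations = validate(graph, SHAPE)
print(violations)
```

Nothing in the graph itself forbids the second title; the rule lives only in the validation layer, which is the separation the application-profile approach relies on.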

Metadata and Paradata

In this thesis, the term metadata refers to descriptive information about an object — such as its dating, authorship, technique, materials, and dimensions. Paradata, on the other hand, denotes the metadata produced by and about the digitisation process itself: the actors involved, the dates of each acquisition and processing phase, the instruments and settings used, and the decisions taken during modelling, post-processing, and publication. The two categories have different provenance, different update cycles, and may require different licensing conditions.

Metadata Provenance and Change Tracking

Metadata provenance refers to the documentation of the origin, history, and transformations of a metadata record — that is, not only where the described object comes from, but how, when, by whom, and based on what sources the metadata record was created or modified (Koster & Woutersen-Windhouwer 2018). Change tracking is the systematic recording of updates to metadata records over time, enabling traceability and reproducibility.
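
A minimal sketch of change tracking compares two versions of a record and logs what changed together with who changed it, when, and on what basis. The field names, the agent and source strings, and the log layout are hypothetical; a production system would more likely express this with a provenance vocabulary such as PROV-O.

```python
def diff_records(old: dict, new: dict) -> list[dict]:
    """List field-level differences between two versions of a record."""
    changes = []
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append({"field": key, "before": before, "after": after})
    return changes


def record_change(log: list, old: dict, new: dict, *, agent: str, timestamp: str, source: str) -> list:
    """Append one provenance entry to `log` if the record actually changed."""
    changes = diff_records(old, new)
    if changes:
        log.append(
            {"agent": agent, "timestamp": timestamp, "source": source, "changes": changes}
        )
    return log


v1 = {"title": "Vase", "material": "ceramic"}
v2 = {"title": "Vase", "material": "earthenware", "dating": "16th century"}

change_log = []
record_change(
    change_log, v1, v2,
    agent="curator-01",                # hypothetical agent identifier
    timestamp="2025-01-15T10:00:00Z",  # supplied by the caller for reproducibility
    source="museum inventory card",    # hypothetical evidence for the update
)
record_change(change_log, v2, v2, agent="curator-01", timestamp="2025-01-16T10:00:00Z", source="none")
print(change_log)
```

Because every entry keeps the previous value, any earlier state of the record can be reconstructed by replaying the log backwards.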

Workflow, Blueprint Workflow, and Executable Implementation

A workflow is a structured sequence of activities that transforms inputs into outputs (Belhajjame et al. 2015). A blueprint workflow is a transferable, technology-agnostic, and environment-independent specification of such a sequence: it defines phases, decision points, and quality-control checkpoints at a level of abstraction that is independent of any specific toolchain. An executable implementation is a concrete instantiation of the blueprint (or of a part of it) in a specific technical environment. In this thesis, executable implementations also served as validation artefacts and evidence of feasibility, although they are not the primary methodological contribution.

Actors

The workflow described in this thesis foresees the involvement of different professional profiles, which may partially overlap and be embodied by the same individual:

  • domain experts, who contribute contextual and interpretative knowledge about the collection and its significance;
  • metadata experts, who focus on descriptive accuracy, authority control, and crosswalk decisions;
  • Semantic Web specialists, who handle RDF formalisation, ontology alignment, data validation and publication;
  • digital humanists, who occupy an integrative role, mediating between domain and technical perspectives while maintaining methodological coherence across the pipeline.

Semantic Web specialists typically also bring metadata expertise, whereas the converse does not necessarily apply: a professional can be a metadata expert without extensive knowledge of Semantic Web technologies. The digital humanist, by contrast, is a hybrid figure whose primary role is to facilitate interaction between technical and domain profiles. In some project configurations, the digital humanist may also be the only team member with a technical background, provided that they can autonomously manage the complexities of the specific technologies involved.

FAIR Principles and Open Science

The FAIR principles (Findability, Accessibility, Interoperability, and Reusability) provide guidelines for data governance applicable across disciplines and data types (Wilkinson et al. 2016).

  • Findability implies that data and metadata are assigned globally unique and persistent identifiers and are registered in searchable resources;
  • Accessibility implies that data can be retrieved via open, standardised protocols, and that metadata remains accessible even when the data itself is no longer available;
  • Interoperability implies that data uses formal, shared knowledge-representation languages and references community-standard vocabularies;
  • Reusability implies that data is richly described, released with a clear licence, and associated with detailed provenance, so that it can be understood and reused by others in future contexts.

Open Science is a scientific movement that seeks to make scholarly knowledge, data, methods, software, and evaluation practices as accessible, transparent, and reusable as possible throughout the research lifecycle (UNESCO 2021). FAIRness and openness are complementary but distinct concepts: neither implies the other. In this thesis, FAIR alignment was treated as a design requirement integrated into each phase of the workflow and assessed against the framework proposed by Wilkinson et al. (2016). Openness was sought throughout the research process by preferring open-source tools to implement the workflow's steps, releasing the developed software under open licences, and making academic material documenting the workflow freely accessible whenever possible.