Modern enterprises face an overwhelming volume of big data: continuously generated at high velocity, in diverse formats, from multiple sources, and stored at significant expense. To derive value for AI training and insightful analytics, this heterogeneous data must be properly linked and processed in machine-readable form. Traditional data ingestion pipelines, such as Extract-Transform-Load (ETL) and Extract-Load-Transform (ELT), materialise structured and unstructured data for query processing, but they become inefficient or overloaded when updates occur frequently. This thesis investigates Virtual Knowledge Graphs (VKGs), also known as Ontology-based Data Access (OBDA), as a virtualisation approach for flexible, efficient management of large-scale structured and unstructured data without materialisation. Our core focus is on integrating multidimensional raster data (e.g., images or continuous spatio-temporal phenomena represented as arrays) with relational/vector data (points, lines, polygons) and linked data. Conventional VKG systems mediate OWL ontologies over relational databases via declarative R2RML mappings, translating SPARQL queries over the ontology into optimised SQL queries. However, they lack native support for the multidimensional nature of raster data and for seamless integration with vector geometries, which limits their use in geospatial settings such as Earth Observation, Geographical Information Systems, and urban planning. To overcome these limitations, we propose OntoRaster, a novel extended VKG framework that enables virtual integration and querying of raster, relational, and vector data. OntoRaster leverages domain-specific ontologies and supports on-the-fly query processing, minimising expensive data transfers by delegating computations to specialised back-ends.
For rigorous evaluation, we introduce comprehensive benchmarks that assess VKG systems such as OntoRaster on integrated raster-vector data processing workloads, using domain ontologies and synthetically generated, scalable datasets that are similar but vary in selected properties (resolution, coverage, pixel density, polygons, etc.). We then explore semantic enrichment for location-based business intelligence by exposing a SQL-accessible knowledge graph to BI tools. In addition, we explore an LLM-based system that uses ontologies to generate raster-based SPARQL queries from natural language questions. Together, these allow business experts to obtain actionable insights for location-based decision-making and risk assessment without requiring advanced geospatial expertise. Overall, this dissertation bridges the Big Data, Semantic Web, AI, and BI communities, advancing scalable, ontology-mediated access to heterogeneous, multidimensional, spatio-temporal geospatial data.
Page Responsible: Frank Drewes 2026-03-18