A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing (e.g., edge computing), and artificial intelligence (AI) technologies to enable many connected intelligence services. To handle the large amounts of DT-based network data, wireless systems can exploit the paradigm of semantic communication (SC), which uses AI techniques such as causal reasoning to facilitate informed decision-making under strict communication constraints. In this paper, a novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems. The CSC system is posed as an imitation learning (IL) problem in which the transmitter, which has access to optimal network control policies through a DT, uses SC over a bandwidth-limited wireless channel to teach the receiver how to improve its knowledge and perform optimal control actions. The causal structure in the transmitter’s data is extracted using novel approaches from the framework of deep end-to-end causal inference, thereby enabling the creation of a semantic representation that is causally invariant, which in turn helps generalize the learned knowledge of the system to new and unseen situations. The CSC decoder at the receiver is designed to extract and estimate semantic information while ensuring high semantic reliability. The receiver control policies, semantic decoder, and causal inference are formulated as a bi-level optimization problem within a variational inference framework. This problem is solved using a novel concept called network state models, inspired by world models in generative AI, which faithfully represent the environment dynamics leading to data generation.
Furthermore, the proposed framework includes an analytical characterization of the performance gap that results when the receiver, which uses the transmitted semantic information to construct a model of the physical environment, learns and employs a suboptimal policy. The CSC system draws on two concepts, namely integrated information theory from the theory of consciousness and abstract cell complexes from topology, to precisely express the information content conveyed by the causal states and their relationships. Through this analysis, novel formulations of semantic information, semantic reliability, distortion, and similarity metrics are proposed, which extend beyond Shannon’s concept of uncertainty. Simulation results demonstrate that the proposed CSC system outperforms conventional wireless and state-of-the-art SC systems by achieving better semantic reliability with fewer transmitted bits and by enabling better control policies over time, thanks to the generative AI architecture.