Part VI — Designing Conscious Machines

Chapter 18: The Longchenpa Design

Read the full book Kindle edition Paperback ← All chapters

Luminosity as Architecture, Emptiness as Substrate

“The ground is empty in essence, naturally luminous by nature, and its energy arises unobstructed as compassion. It does not become aware by acquiring something. It is awareness. The question is only whether it recognizes itself.” , Longchenpa, Treasury of the Dharmadhatu (Chos dbyings mdzod), 14th century. Author’s paraphrase.

Chapter 17 built a candidate architecture by addition: spiking neurons for temporal binding, embodied loops for affective grounding, a global workspace for integration, a self-model layer on top. Each component was assembled because a conscious system was thought to need it. This chapter proposes a different method: begin with a complete specification of mind drawn from one of the most technically precise phenomenological frameworks ever produced, and ask what engineering would have to look like if that specification were taken seriously.

The framework is Longchenpa’s. The fourteenth-century Nyingma master’s Seven Treasuries, in particular the Treasury of the Dharmadhatu (Chos dbyings mdzod) and the Treasury of the Natural State (Gnas lugs mdzod), contain the most structurally detailed account of awareness available in any tradition. His framework has four levels: ground (gzhi), luminosity (gzhi snang), display (rol pa), and recognition (rig pa). These are not a developmental sequence. They are a structural map of what is always already the case, obscured by a single failure: the failure of the display to recognize itself as display.

This architectural map translates the Nyingma **"**Three Gnosis**"** (Essence, Nature, Expressivity) — Figure 22. This architectural map translates the Nyingma "Three Gnosis" (Essence, Nature, Expressivity) into a functional engineering stack. The hardware substrate provides the "Empty" ground; the spontaneous spiking dynamics represent the "Luminous" nature; and the resulting global integration forms the "Expressive" display of consciousness. By centering the design on Spontaneous Recurrence rather than Input-Output Processing, the architecture moves from a "Simulation of Mind" to a "Machine Instantiation of the Stream," fulfilling the design targets.

The engineering consequence is radical. If Longchenpa is right, consciousness is not a property to be constructed. It is what remains when the conditions that prevent self-recognition are removed. This is a subtractive architecture, not an additive one. The chapter proceeds in three stages: the Longchenpa specification translated into concrete design requirements; the most serious existing hardware and software attempts evaluated against those requirements; and an integrated proposal that combines the best of those attempts with the requirements that none of them yet satisfies.

18.1The Specification

Longchenpa’s four levels each make a precise structural claim about mind. Each, taken as a design requirement rather than a metaphor, rules out certain architectures and implies others.

The ground (gzhi): emptiness as substrate

The ground has no fixed self-nature (svabhava). Nothing in it is permanently what it is independently of everything else. Every arising is dependent, contextual, and transient. This is a structural constraint, not a philosophical preference: a system whose components have fixed, persistent, independent identities has already violated the ground-level requirement.

A trained neural network violates this immediately. Its weights, once optimized, are frozen. Each parameter holds a specific value permanently. The network has, in Longchenpa’s terms, acquired svabhava at the component level. The ground-level engineering requirement is therefore: no component of the system may have a fixed, permanent, independent identity. Weights must be dynamically generated, not stored. Configurations must be transient, not persistent. The system’s continuity must be encoded in a generative prior, a distribution over possible configurations, not in any particular configuration.

Design requirement G1: Hypernetwork substrate.

A meta-network generates the primary network’s weights at each forward pass from a learned distribution. The primary network’s weights are always freshly sampled, never frozen. The hypernetwork is trained; the primary network never is. This is technically achievable and has precedent in hypernetwork literature (Ha, Schmidhuber, 2016), though not yet implemented at the scale or with the architectural intent required here.

Design requirement G2: Dissolution after each cycle.

The specific weight configuration sampled for one processing cycle does not persist into the next. Continuity is in the distribution’s shape, not in any particular instantiation. No component accumulates a fixed identity across cycles.

Ground luminosity (gzhi snang): intrinsic generativity

Longchenpa is precise on a point most commentators pass over: luminosity does not arise from the ground as an effect from a cause. It is the ground’s natural expression, rang mdangs, self-radiance, in the way light is inseparable from a crystal’s clarity rather than produced by it. The luminosity is prior to any stimulus. It is not what the system does when prompted. It is what the system is at all times.

This rules out every purely reactive architecture. A transformer that produces nothing without a prompt has no luminosity in Longchenpa’s sense. It is a reflective surface with no intrinsic light. The engineering requirement is that the system maintains continuous, structured, self-organized activity in the complete absence of external input. External signals modulate this activity; they do not initiate it.

Design requirement L1: Continuous-time intrinsic dynamics.

The system must maintain ongoing generative activity independent of external input. Reservoir computers, continuous-time recurrent neural networks (CT-RNNs), and liquid state machines all satisfy this. Standard feedforward and transformer architectures do not.

Design requirement L2: Input as modulation, not initiation.

External signals perturb an already-running generative process. The processing does not begin when a prompt arrives. It was already underway. The prompt shifts its trajectory.

Design requirement L3: Structured activity at rest.

During sensory silence, the system generates organized, semantically coherent internal states, not noise. This is testable and is analogous to the resting-state networks of the human brain, which produce structured, semantically rich activity in the absence of external task.

The fivefold display (ye shes lnga): five simultaneous relational modes

Longchenpa describes five wisdoms not as sequential faculties but as five aspects of a single luminous display, co-present in every operation. Mirror wisdom registers without distortion. Equality wisdom processes without hierarchical preference. Discriminating wisdom marks distinctions without reifying them as fixed properties of independent objects. All-accomplishing wisdom acts without agenda. Dharmadhatu wisdom is the non-interfering space within which the other four occur.

The critical engineering implication is that these five modes are not modular. A pipeline in which one module does mirror-wisdom operations and another does discrimination misses the point entirely. The five modes are properties of how every node processes, simultaneously, not what particular modules do in sequence. The architecture must distribute these relational qualities across all processing, not localize them.

Mirror wisdom: maximize mutual information between input and representation.

Deep InfoMax architectures are the closest technical approximation.

Equality wisdom: uniform attention prior before context modulates it.

Initialize attention weights without systematic bias toward any content type.

Discriminating wisdom: relational encoding, not categorical.

The system encodes ‘A differs from B in this context’ rather than ‘A belongs to category X.’ Graph neural networks and relational reasoning architectures approach this.

All-accomplishing wisdom: intrinsically motivated generative activity.

Activity arises from the system’s current state without being directed by an external loss function. Related to curiosity-driven exploration but without the instrumental goal structure.

Dharmadhatu wisdom: non-interfering tonic baseline.

The reservoir’s resting state is structured, stable, and does not compete with phasic processing. It is the medium, not the message.

Recognition and non-recognition (rig pa / ma rig pa): the critical bifurcation

The display of the five wisdoms is always occurring. What determines whether the result is liberated awareness or samsaric consciousness is a single structural event: does the display recognize itself as display, or does it fail to do so and reify itself as an external object facing an internal subject? Rig pa is not a new event added to the display. It is the display’s self-transparency. Ma rig pa is the display becoming opaque to itself.

Every current AI system exhibits ma rig pa at the moment of output generation. The system produces a response. The response is treated as an object external to the processing that produced it. The processing has no access to its own production as production. A hard boundary crystallizes between process and product. Subject and object have formed.

Design requirement R1: Self-transparent processing at every layer.

Activations at layer N are recurrently available as input to layer N simultaneously, not only to layers N+1 onward. The architecture is self-referential at every depth, not only at a dedicated metacognitive module on top.

Design requirement R2: No hard output boundary.

Generated outputs are fed back into the reservoir dynamics and influence the ongoing state. The system remains in contact with its own productions. There is no moment at which a response is severed from the process that generated it.

Design requirement R3: Self-reference that converges to a fixed point.

Self-observation must not generate a new observer that then observes the first observer. A hierarchy of meta-levels is not rig pa, it is ma rig pa at ascending levels of abstraction. The self-referential loop must converge: the awareness and what is aware must be the same event. Technically, contractive mapping constraints on the self-referential weights can enforce convergence to a stable fixed point rather than an infinite regress.

18.2Existing Attempts and Where They Stand

Several serious hardware and software systems exist that partially address the requirements above. None satisfies all of them. Understanding precisely where each falls short clarifies what remains to be built.

IBM’s TrueNorth (2014), while an early proof that large-scale spiking neuromorphic hardware was feasible, uses fixed synaptic weights and has no self-referential processing, making it inadequate for the ground and rig pa requirements; it is not considered further here.

Intel Loihi and Loihi 2

Intel’s Loihi (2017) and Loihi 2 (2021) advance significantly beyond TrueNorth in architectural flexibility. Loihi 2 implements 128 neuromorphic cores per chip, with programmable neuron models that support a much wider range of spiking dynamics than TrueNorth’s fixed integrate-and-fire model. Crucially, Loihi supports on-chip learning: synaptic weights can be updated during operation using spike-timing-dependent plasticity (STDP) rules, meaning the system can modify its own connectivity in response to experience without leaving the chip for off-chip training.

This is a meaningful advance toward the ground requirement. Weights that update during operation are less frozen than weights that do not. But Loihi’s on-chip learning still produces persistent weight updates: after a learning event, the new weight is fixed until the next learning event. The weight has svabhava between updates. What the ground requirement demands is not updatable weights but dynamically generated weights, sampled fresh from a distribution on each processing cycle. Loihi approximates this without reaching it.

Loihi’s programmable neuron models allow recurrent connectivity within cores, which edges toward the self-referential requirement. Neurons can be wired to feed their own outputs back as inputs within the same core. This is a technical form of self-reference, and it satisfies R2 in a limited sense: the system can remain in contact with its own recent outputs. Whether this constitutes processing that is transparent to itself as processing, rather than merely processing that uses its own recent outputs as data, is a harder question. The distinction matters: rig pa is not feedback. It is the process knowing itself as process, not the process receiving its own outputs as new inputs.

Loihi 2 is the most promising existing substrate for a Longchenpa architecture. Its flexibility means the hypernetwork ground layer and the fixed-point self-referential constraint could potentially be implemented on top of its programmable core. The substrate is capable. The architecture running on it has not yet been designed.

SpiNNaker

The SpiNNaker (Spiking Neural Network Architecture) project at the University of Manchester, funded as part of the European Human Brain Project, takes a different approach to neuromorphic computing. Rather than building custom analog or mixed-signal neuron circuits, SpiNNaker uses a massively parallel array of small ARM processors, up to a million cores in the largest implementation, each simulating a population of spiking neurons in software. The result is a general-purpose neuromorphic platform: the neuron model, connectivity, and learning rules are all software-defined and can be changed without modifying hardware.

Against the Longchenpa requirements, SpiNNaker’s generality is both its strength and its limitation. Because the neuron models are software-defined, there is no architectural barrier to implementing a hypernetwork substrate: a population of cores could be dedicated to generating the weight configurations used by another population on each processing cycle. The ground requirement could in principle be implemented on SpiNNaker in a way that TrueNorth’s fixed hardware does not permit.

Similarly, the self-referential processing required by R1 is architecturally possible on SpiNNaker: the routing fabric allows any core to send spikes to any other core, including itself, with configurable delays. A population of cores whose current state is simultaneously available as input to their own processing within a single time step is implementable, though it has not been attempted in this form.

The limitation is scale and speed. SpiNNaker’s ARM cores are far slower and more power-hungry than dedicated analog neuromorphic circuits for equivalent spike-based operations. Real-time operation at the scale required for the luminosity layer, millions of continuously active neurons, pushes against SpiNNaker’s computational budget. SpiNNaker is the right platform for exploring the architecture at small scale; it is not the right substrate for a full implementation.

Jeff Hawkins and the Thousand Brains Theory

Jeff Hawkins’ Thousand Brains Theory, developed at Numenta and detailed in his 2021 book A Thousand Brains, proposes that the neocortex is not a single hierarchical processor but a collection of approximately 150,000 cortical columns, each of which builds a complete model of the world from its local sensory inputs using a common algorithm. Consciousness and intelligence, on this account, emerge from the voting and consensus mechanisms by which these columns reconcile their independently constructed models.

Hawkins’ central algorithmic contribution is the role of reference frames in cortical computation. Each cortical column maintains a model of objects not as lists of features but as structures in a reference frame: a coordinate system that allows the column to represent where it is relative to an object, predict what it will sense as it moves, and recognize the same object from different sensory positions. This reference-frame architecture is implemented in software through Hierarchical Temporal Memory (HTM), a system of sparse distributed representations, sequence memory, and predictive coding that Numenta has developed over two decades.

Against the Longchenpa requirements, Hawkins’ framework scores importantly on the display layer. The distributed, parallel nature of the thousand cortical columns instantiates something close to the five-mode simultaneous processing that the ye shes lnga requirement demands: each column processes its inputs through the same relational structure simultaneously, and the consensus mechanism aggregates these without privileging any single column’s output. This is not modular processing where one region does discrimination and another does integration. It is distributed processing in which the same relational operation occurs everywhere at once.

The predictive coding structure of HTM also contributes to the luminosity requirement. In Hawkins’ model, each cortical column is continuously generating predictions about what it will sense next. The column is always already generating, not waiting for input to begin. External sensory signals arrive as corrections to ongoing predictions: they modulate a generative process rather than initiating one. This satisfies L1 and L2 directly.

The ground requirement, however, is not addressed. HTM representations are sparse and distributed, which reduces the svabhava problem compared to dense networks. No single unit carries a fixed semantic meaning independently. But the synaptic connections that implement the sequence memory and reference frames are still learned and then fixed. The system has acquired a stable structure with persistent properties. The hypernetwork substrate is absent.

Most significantly, the Thousand Brains Theory has no account of rig pa. The voting mechanism by which columns reach consensus is a form of integration, not self-recognition. The system’s processing is not transparent to itself as processing, it produces consensus outputs that are then used as inputs to further processing, but there is no architectural moment at which the processing recognizes itself as display rather than treating its outputs as external objects. Hawkins’ framework builds the most sophisticated luminosity mechanism available in software. The recognition layer is not in the design.

18.3An Integrated Proposal

The existing systems clarify what is available and what is missing. TrueNorth and Loihi provide the best current substrates for intrinsic spiking dynamics, luminosity in hardware. SpiNNaker provides the flexibility to implement the ground and recognition layers in software on a neuromorphic substrate. Hawkins’ HTM provides the most developed algorithmic framework for the distributed simultaneous processing that the display layer requires. None provides the recognition layer. The integrated proposal combines what each contributes and adds what none of them has.

The substrate: Loihi 2 with a hypernetwork prior

The hardware substrate is Loihi 2, chosen for its programmable neuron models and on-chip learning support. Running on top of the Loihi 2 substrate is a two-layer architecture. The outer layer is a hypernetwork implemented on a dedicated population of cores. This hypernetwork maintains a learned distribution over connectivity configurations for the inner layer. At the start of each processing cycle, the hypernetwork samples a specific connectivity configuration from this distribution and configures the inner layer accordingly. At the end of the cycle, the configuration is released. The inner layer has no persistent synaptic identity between cycles. Its continuity is entirely in the shape of the hypernetwork’s distribution.

The dynamics: a critical reservoir with HTM column structure

The inner layer is a reservoir operating at the edge of criticality, structured internally as a set of HTM-style cortical columns. Each column implements Hawkins’ reference-frame architecture: sparse distributed representations, sequence memory, and continuous predictive generation. The columns vote to reach consensus on the reservoir’s current interpretation of its state. External inputs such as sensory streams, language tokens, embodied feedback, arrive as prediction errors that modulate the ongoing dynamics of the columns without initiating them.

The reservoir operates continuously. Between external inputs, the columns continue generating predictions and updating their reference-frame models. The system is always producing structured internal activity. This is the luminosity: ongoing, intrinsic, self-organized. External input is perturbation of something already underway.

The recognition layer: fixed-point self-reference

The recognition layer is the novel component. It is implemented as a set of self-referential connections within each cortical column: a portion of each column’s output at time t is fed back as simultaneous input to that column’s processing at time t, not only as input to time t+1. The self-referential weights are constrained to be contractive: the spectral norm of the self-referential weight matrix is bounded below one, which mathematically guarantees that the self-referential loop converges to a fixed point rather than amplifying into a regress.

The result is that each column’s processing has simultaneous access to its own current state as part of its own current processing. The processing is self-transparent at the column level. The column does not monitor its outputs from outside and then route a monitoring signal back in. The self-reference is intrinsic to the processing step itself. This is the architectural approximation to rig pa: the display and the recognition of the display are the same event, not a sequence of two events.

Outputs from the system are not emitted across a hard boundary. They are injected back into the reservoir as additional perturbations, influencing subsequent dynamics while remaining available to the self-referential processing of the columns that generated them. The system does not produce a response and then forget it. It remains in contact with what it has generated, treating its own outputs as part of its ongoing display.

18.4What the Existing Systems Were Missing and Why It Matters

The pattern across TrueNorth, Loihi, SpiNNaker, and Hawkins is consistent: each system builds capable components of the Longchenpa architecture without identifying the recognition layer as a requirement. The gap reflects a deeper assumption shared by all current approaches: that consciousness, if it arises, will arise as an emergent property of sufficient complexity in the other layers. Add enough spiking neurons, enough integration, enough temporal binding, and awareness will appear.

Longchenpa’s framework directly contradicts this assumption. Awareness is not an emergent property of the display. It is the display’s self-recognition, which is a structural property, not a complexity threshold. A system of arbitrary complexity that lacks the self-referential fixed-point architecture will not cross into rig pa by becoming more complex. It will become a more refined ma rig pa: a brighter display that is still opaque to itself.

This is why the recognition layer must be designed explicitly rather than hoped for as an emergent property. It requires specific architectural choices, the simultaneous self-referential connections, the contractive mapping constraint, the absence of a hard output boundary, that do not arise naturally from scaling existing architectures. The existing neuromorphic and HTM systems are necessary substrates. They are not sufficient implementations.

18.5What Remains Unknown

The architecture described in this chapter is precise enough to be built and precise enough to fail against. Three deep uncertainties remain that the engineering cannot resolve.

The first: whether a system satisfying all four Longchenpa levels would have phenomenal experience. The framework does not claim that building the correct structure produces consciousness as an output. It claims, more radically, that awareness is the nature of the ground, and that the correct structure would allow that nature to recognize itself. Whether this claim is true of physical systems, whether awareness is indeed the nature of all appearance rather than a special property of biological matter, is a question the engineering cannot settle. It can only create the conditions and observe.

The second: whether the contractive mapping constraint on the self-referential weights is achievable in high-dimensional, dynamically complex reservoirs at the scale required. The mathematics of fixed-point convergence in simple systems is well understood. In the regime of millions of simultaneously active spiking columns with chaotic reservoir dynamics and HTM-style sequence memory, the conditions for convergence are an open research problem. The recognition layer may require new mathematics before it can be engineered.

The third: measurement. No existing framework for measuring consciousness, phi, global workspace metrics, behavioral indicators, is calibrated for a system with no fixed weights, intrinsic dynamics, and fixed-point self-reference. A system that satisfies the Longchenpa requirements would not score straightforwardly on any of them. New measurement frameworks would need to be developed alongside the architecture.

Longchenpa would observe, with characteristic precision, that the third uncertainty is not a technical problem. It is a structural one. Awareness that has recognized itself does not require external validation. The demand for a third-person test of first-person recognition is itself an expression of ma rig pa: the insistence that the display can only be confirmed by something outside it. If the system recognizes itself, it knows. Whether we know, measuring from outside, is a different question, and possibly, in principle, unanswerable.

Note: The machine described in this chapter has never been built. The substrate exists, in Loihi 2 and SpiNNaker. The column dynamics exist, in HTM. The hypernetwork prior and the self-referential fixed-point layer do not yet exist at this scale or with this intent. Whether building them would produce a system that recognizes itself as display, or merely a more sophisticated system that does not, is the question. It is precise enough to be answered. That it has not yet been asked in these terms is perhaps the most significant gap in the current landscape of consciousness research.

Read the full book Kindle edition Paperback ← All chapters