Abstract
Higher education resource allocation faces significant systemic challenges, including difficulty coordinating multiple stakeholders, highly complex organizational structures, and a lack of dynamic adaptability, all of which undermine both the efficiency and the equity of resource utilization. To address these challenges, this study introduces a novel Graph-based Multi-Agent Proximal Policy Optimization (GMAPPO) framework, enhanced with a graph attention mechanism, to enable intelligent and collaborative decision-making in higher education resource management. Designed to reflect the characteristics of the higher education ecosystem, the model uses a structured graph representation to capture the relationships among key entities, including institutions, academic programs, and faculty members. A multi-head graph attention mechanism increases the system's sensitivity to critical nodes, such as under-resourced or emerging disciplines, enabling more targeted and adaptive allocation strategies. The system follows the centralized training with decentralized execution paradigm, combining global state perception with local autonomous decision-making in an actor-critic architecture, and achieves efficient collaboration among agents through policy gradient optimization. It further applies a progressive curriculum learning mechanism that handles resource conflicts and demand fluctuations in stages, together with a phased loss smoothing strategy that jointly optimizes the global value objective and policy consistency, balancing competing educational goals such as efficiency and fairness.
Experiments comparing GMAPPO against mainstream multi-agent algorithms, namely MADDPG (Multi-Agent Deep Deterministic Policy Gradient), QMIX (Q-Mixing networks), and MAPPO (Multi-Agent Proximal Policy Optimization), show that in higher education scenarios GMAPPO achieves course and classroom resource utilization rates of 73.20% and 95.07%, respectively, and a policy compliance rate of 98.5%. In a highly dynamic scenario with a 60% increase in new courses, its response time is only 3.2 s, indicating strong real-time adaptability. Under normal load, its cost-effectiveness is 125.6 yuan per hour and its average attention entropy is as low as 0.21, indicating efficient and focused decision-making in complex resource environments. Through structured feature extraction and dynamic collaborative optimization, GMAPPO provides an efficient, fair, and robust solution for higher education resource allocation.
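The abstract reports an average attention entropy but does not define how it is computed. As a minimal sketch, assuming the metric is the Shannon entropy (natural logarithm) of each node's softmax attention distribution over its neighbors, averaged across nodes, the calculation could look as follows. The function names, node names, and scores are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def mean_attention_entropy(neighbor_scores):
    """Average entropy over all nodes (assumed definition of the metric)."""
    entropies = [attention_entropy(softmax(s)) for s in neighbor_scores.values()]
    return sum(entropies) / len(entropies)


# Hypothetical raw attention scores from two program nodes to three neighbors.
scores = {
    "program_a": [4.0, 0.0, 0.0],  # attention concentrated on one neighbor
    "program_b": [1.0, 1.0, 1.0],  # attention spread evenly
}

focused = attention_entropy(softmax(scores["program_a"]))
uniform = attention_entropy(softmax(scores["program_b"]))
average = mean_attention_entropy(scores)
```

Under this assumed definition, entropy is 0 when a node attends to a single neighbor and reaches ln(k) when attention is spread evenly over k neighbors, so a low average value is consistent with the focused decisions described above.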
