Sage Journals: Discover world-class research

Abstract

Resource allocation in higher education faces systemic challenges: coordinating multiple stakeholders is difficult, organizational structures are highly complex, and allocation schemes adapt poorly to changing demand. These problems undermine both the efficiency and the equity of resource utilization. To address them, this study proposes a Graph-based Multi-Agent Proximal Policy Optimization (GMAPPO) framework, enhanced with a graph attention mechanism, for intelligent and collaborative decision-making in higher education resource management. The model uses a structured graph representation to capture the relationships among key entities, including institutions, academic programs, and faculty members. A multi-head graph attention mechanism increases the model's sensitivity to critical nodes, such as under-resourced or emerging disciplines, enabling more targeted and adaptive allocation strategies. The system adopts centralized training with decentralized execution, integrating global state perception with local autonomous decision-making within an actor-critic architecture. Agents learn to collaborate through policy gradient optimization. The framework also applies a progressive curriculum learning mechanism that handles resource conflicts and demand fluctuations in stages, together with a phased loss smoothing strategy that jointly optimizes the global value objective and policy consistency, balancing competing educational goals such as efficiency and fairness.
In experiments comparing GMAPPO with mainstream multi-agent algorithms, namely MADDPG (Multi-Agent Deep Deterministic Policy Gradient), QMIX (Q-Mixing networks), and MAPPO (Multi-Agent Proximal Policy Optimization), GMAPPO achieves course and classroom resource utilization rates of 73.20% and 95.07%, respectively, in higher education scenarios, with a policy compliance rate of 98.5%. In a highly dynamic scenario with a 60% increase in new courses, its response time is 3.2 s, indicating strong real-time adaptability. Under normal load, its cost-effectiveness is 125.6 yuan per hour and its average attention entropy is 0.21, indicating efficient and focused decisions in complex resource environments. Through structured feature extraction and dynamic collaborative optimization, GMAPPO offers an efficient, fair, and robust approach to higher education resource allocation.
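
To make the multi-head graph attention and the attention entropy metric more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a standard GAT-style attention layer and assumes that "attention entropy" means the Shannon entropy (natural logarithm) of each node's attention distribution over its neighbors, averaged over nodes and heads; the abstract does not give the exact definition. The function names, the toy graph, and all dimensions are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """Leaky ReLU, the nonlinearity commonly applied to GAT attention logits."""
    return np.where(x > 0, x, slope * x)


def multi_head_graph_attention(features, adjacency, weights, attn_vectors):
    """GAT-style multi-head attention (illustrative sketch only).

    features:     (N, F) node features
    adjacency:    (N, N) boolean mask; must include self-loops so every
                  node has at least one neighbor to attend to
    weights:      (H, F, D) per-head projection matrices
    attn_vectors: (H, 2 * D) per-head attention parameters
    Returns (outputs of shape (N, H * D), attention of shape (H, N, N)).
    """
    num_heads, _, head_dim = weights.shape
    outputs, attentions = [], []
    for h in range(num_heads):
        projected = features @ weights[h]                         # (N, D)
        self_score = projected @ attn_vectors[h, :head_dim]       # (N,)
        neighbor_score = projected @ attn_vectors[h, head_dim:]   # (N,)
        # logits[i, j]: how strongly node i attends to node j.
        logits = leaky_relu(self_score[:, None] + neighbor_score[None, :])
        logits = np.where(adjacency, logits, -np.inf)   # mask non-edges
        logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
        exp = np.exp(logits)
        attention = exp / exp.sum(axis=1, keepdims=True)
        outputs.append(attention @ projected)
        attentions.append(attention)
    return np.concatenate(outputs, axis=1), np.stack(attentions)


def mean_attention_entropy(attention):
    """Mean Shannon entropy of the attention rows (assumed definition)."""
    safe = np.where(attention > 0, attention, 1.0)  # treat 0 * log(0) as 0
    entropy = -(attention * np.log(safe)).sum(axis=-1)
    return float(entropy.mean())


# Hypothetical toy graph: 4 entities (e.g. one institution, two programs,
# one faculty member), each linked to itself and two others.
rng = np.random.default_rng(0)
adjacency = np.array(
    [[1, 1, 1, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 1, 1]],
    dtype=bool,
)
features = rng.normal(size=(4, 3))       # 3 input features per node
weights = rng.normal(size=(2, 3, 5))     # 2 heads, 5 output dims each
attn_vectors = rng.normal(size=(2, 10))

outputs, attention = multi_head_graph_attention(
    features, adjacency, weights, attn_vectors
)
print("output shape:", outputs.shape)
print("mean attention entropy:", mean_attention_entropy(attention))
```

Under this assumed definition, the entropy is 0 when every node places all of its attention on a single neighbor and reaches its maximum when attention is spread uniformly, so a low average value is consistent with the abstract's reading of 0.21 as a sign of focused decisions.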

Keywords

multi-agent reinforcement learning; graph attention mechanism; proximal policy optimization; higher education resource allocation; actor-critic architecture

