Graph-Autonomous Mixture of Experts with Hierarchical Multi-Agent PPO for Dynamic Knowledge Graphs

Aygul F. Shaykhulova

Abstract


This paper presents a novel Graph-Autonomous Mixture of Experts (MoE) architecture that combines differentiable knowledge graphs with hierarchical multi-agent Proximal Policy Optimization (PPO). The system autonomously builds a hierarchical knowledge graph from unstructured text, initializes expert agents on graph nodes, and adapts continuously through a two-level reinforcement learning framework. At the meta level, a controller manages expert creation, deletion, and knowledge transfer. At the local level, each expert agent independently decides when to train, specialize, merge, or transfer knowledge. All rewards are derived exclusively from quality metrics (perplexity on validation data, router contribution, and expert diversity), which eliminates hand-crafted reward shaping. In experiments on technical document processing, the system reduces loss by 45.8% over 220 training steps. The experts develop distinct specializations (silhouette score 0.432), and the expert population evolves stably, with a CREATE/DELETE ratio of 8.0. The learned transfer matrix reveals asymmetric knowledge flows, in which distinct donor and recipient experts emerge without being designated in advance. This work lays a foundation for fully autonomous, self-organizing MoE systems that adapt their architecture to the data.
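
The abstract names the three quality metrics behind every reward but does not say how they are computed or combined. The following is a minimal Python sketch under stated assumptions. Perplexity is the exponential of the mean per-token negative log-likelihood on validation data. Router contribution is assumed to be the mean gate probability that the router assigns to an expert. Diversity is assumed to be the mean cosine distance between an expert's embedding and the embeddings of the other experts. The three terms are assumed to be combined linearly. The names `expert_reward` and `RewardWeights` and the weight values are hypothetical and do not come from the author.

```python
import math
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Hypothetical weights: the abstract does not say how the metrics are combined.
    perplexity: float = 1.0
    contribution: float = 0.5
    diversity: float = 0.5


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood) on validation data."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def router_contribution(gate_probs, expert):
    """Assumed definition: mean router (gate) probability given to `expert` per token."""
    return sum(row[expert] for row in gate_probs) / len(gate_probs)


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def diversity(embeddings, expert):
    """Assumed definition: mean cosine distance from `expert` to every other expert."""
    others = [e for i, e in enumerate(embeddings) if i != expert]
    return sum(1.0 - cosine_similarity(embeddings[expert], e) for e in others) / len(others)


def expert_reward(ppl_before, ppl_after, gate_probs, embeddings, expert, weights=None):
    """Hypothetical linear combination of the three quality metrics."""
    w = weights if weights is not None else RewardWeights()
    # Relative validation-perplexity improvement: positive when perplexity drops.
    ppl_gain = (ppl_before - ppl_after) / ppl_before
    return (w.perplexity * ppl_gain
            + w.contribution * router_contribution(gate_probs, expert)
            + w.diversity * diversity(embeddings, expert))
```

For example, `expert_reward(math.exp(2.0), math.exp(1.5), [[0.7, 0.3], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]], 0)` uses the default weights to combine three terms: a relative perplexity gain of 1 - e^(-0.5), a router contribution of 0.6, and a diversity of 1.0. A linear combination ties every term to a measured quantity, which fits the stated goal of avoiding hand-crafted reward shaping.
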

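Both levels of the hierarchy are trained with PPO. The sketch below assumes that each agent has a categorical policy over the discrete actions listed in the abstract. Under that assumption, the standard PPO clipped surrogate loss for a batch of decisions can be written in plain Python. The enum names, `log_softmax`, `ppo_clip_loss`, and the clipping value 0.2 (a common default) are illustrative assumptions. The abstract does not give the paper's network architecture, advantage estimator, or hyperparameters.

```python
import math
from enum import Enum


class LocalAction(Enum):
    # Local-level decisions named in the abstract (hypothetical encoding).
    TRAIN = 0
    SPECIALIZE = 1
    MERGE = 2
    TRANSFER = 3


class MetaAction(Enum):
    # Meta-level controller decisions named in the abstract (hypothetical encoding).
    CREATE = 0
    DELETE = 1
    TRANSFER = 2


def log_softmax(logits, index):
    """Log-probability of action `index` under a categorical policy (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def ppo_clip_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over a batch of decisions.

    loss = -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)), where r = exp(new_logp - old_logp).
    """
    total = 0.0
    for new, old, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

Take a probability ratio of 2. With an advantage of +1, the clipped term caps the objective at 1.2, giving a loss of -1.2. With an advantage of -1, the unclipped term -2 is the smaller one and is kept, giving a loss of 2.0. Taking this pessimistic bound keeps each policy update close to the policy that collected the data.
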

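The abstract reports three diagnostics: the silhouette score, the CREATE/DELETE ratio, and the donor/recipient asymmetry of the transfer matrix. The sketch below shows one way to compute them, but it is illustrative and is not the author's evaluation code. It assumes that the silhouette score is computed with Euclidean distance on vectors (for example, text embeddings) labeled by the expert they were routed to. It also assumes that `transfers[i][j]` counts knowledge transfers from expert i to expert j and that meta-level actions are logged as strings. All function names are hypothetical.

```python
import math
from collections import Counter


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two clusters."""
    n = len(points)

    def mean_dist(i, cluster):
        # Average distance from point i to the members of `cluster`, excluding i itself.
        d = [math.dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == cluster]
        return sum(d) / len(d) if d else None

    scores = []
    for i in range(n):
        a = mean_dist(i, labels[i])
        if a is None:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        b = min(mean_dist(i, c) for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def net_transfer_flow(transfers):
    """Out-flow minus in-flow per expert: > 0 net donor, < 0 net recipient."""
    n = len(transfers)
    return [sum(transfers[i]) - sum(transfers[j][i] for j in range(n)) for i in range(n)]


def create_delete_ratio(meta_actions):
    """Ratio of CREATE to DELETE actions in a meta-controller action log."""
    counts = Counter(meta_actions)
    return counts["CREATE"] / counts["DELETE"] if counts["DELETE"] else math.inf
```

For instance, `net_transfer_flow([[0, 3, 1], [0, 0, 0], [1, 2, 0]])` returns `[3, -5, 2]`, so experts 0 and 2 act as net donors and expert 1 as a net recipient. A log with eight CREATE actions and one DELETE action gives a ratio of 8.0.
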


ISSN: 2307-8162