Two fully-funded PhD scholarships available for commencement in 2026
Funded by the Australian Research Council Discovery Grant "Advancing stochastic optimisation: highly-correlated restless bandit models"
Professor Peter Taylor, Dr Jing Fu and Professor Jose Niño-Mora would like to advertise two fully-funded PhD projects, one to be held at The Royal Melbourne Institute of Technology (RMIT) and the other to be held at The University of Melbourne.
Project 1 (RMIT)
Title: Restless-Bandit-Enhanced Multi-Agent Reinforcement Learning
Project Description: Restless-Bandit-Enhanced Learning (RB-L) is an emerging framework that integrates restless bandit theory with reinforcement learning to tackle the curse of dimensionality, and it is widely applicable in realistic scenarios. This project aims to trade off learning and control in practical settings with unavoidably high-dimensional state and/or action spaces. It will incorporate advanced control and learning algorithms, such as classic multi-agent reinforcement learning. The expected outcomes are scalable learning and control algorithms with provable guarantees on overall performance.
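To give a flavour of the kind of model behind the project, the sketch below simulates a toy restless bandit with many two-state arms and compares a greedy one-step index heuristic against random activation. All transition probabilities, rewards, and function names here are illustrative inventions, not parameters taken from the project itself.

```python
import random

def simulate(policy, n_arms=50, m_active=10, horizon=2000, seed=0):
    """Average per-step reward of `policy` on a toy restless bandit.

    Each arm has states {0, 1}; state 1 pays reward 1 every step.
    A passive arm in state 1 decays to 0 w.p. 0.2; an active arm in
    state 0 recovers to 1 w.p. 0.5 (hypothetical numbers, for
    illustration only).
    """
    rng = random.Random(seed)
    state = [0] * n_arms
    total = 0
    for _ in range(horizon):
        active = policy(state, m_active, rng)   # set of arms to activate
        total += sum(state)                     # reward = number of "good" arms
        for i in range(n_arms):
            if i in active and state[i] == 0 and rng.random() < 0.5:
                state[i] = 1                    # active arm recovers
            elif i not in active and state[i] == 1 and rng.random() < 0.2:
                state[i] = 0                    # passive arm decays
    return total / horizon

def index_policy(state, m, rng):
    """Greedy index heuristic: activate the m arms with the largest
    one-step gain (0.5 for recovering a state-0 arm, 0.2 for
    protecting a state-1 arm from decay)."""
    gain = [0.5 if s == 0 else 0.2 for s in state]
    order = sorted(range(len(state)), key=lambda i: -gain[i])
    return set(order[:m])

def random_policy(state, m, rng):
    """Baseline: activate m arms chosen uniformly at random."""
    return set(rng.sample(range(len(state)), m))
```

Even this crude index rule noticeably outperforms random activation in simulation, which is the intuition behind index-based policies: per-arm priorities sidestep the exponentially large joint state space.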
The prospective HDR student is expected to have fundamental knowledge of stochastic modeling (such as Markov decision processes), probability, reinforcement learning, and linear/convex optimization, and good programming skills for large-scale simulations. Python experience is required.
Project 2 (University of Melbourne)
Title: Optimality in highly-correlated restless bandit models
Project Description: Conventional techniques for analyzing restless bandit models, such as the Whittle relaxation, fluid approximation, and LP-based approximation, focus on proving asymptotic optimality by exploiting various levels of relaxation of the original optimization problem.
A solution of the relaxed problem can reflect intrinsic properties of the original one, and can be used to propose a heuristic policy for it. Often it is possible to show that such a policy is asymptotically optimal. The objective of this project is to extend these methods to highly-correlated restless bandit models.
The project will also consider how well the solutions perform in the non-asymptotic regime.
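To illustrate the relaxation idea in miniature, the sketch below computes a Whittle-style index for a single two-state arm by bisecting on the passive subsidy, i.e. the Lagrange multiplier obtained when the hard activation constraint is relaxed. The transition probabilities, discount factor, and helper names are hypothetical choices for illustration, not quantities from the project.

```python
def value_iteration(P_active, P_passive, reward, subsidy, gamma=0.95, tol=1e-9):
    """Solve the single-arm MDP with a passive subsidy (the relaxed,
    decoupled problem). Returns the value function and the Q-values of
    the active/passive actions in each state."""
    n = len(reward)
    V = [0.0] * n
    while True:
        Qa = [reward[s] + gamma * sum(P_active[s][t] * V[t] for t in range(n))
              for s in range(n)]
        Qp = [reward[s] + subsidy + gamma * sum(P_passive[s][t] * V[t] for t in range(n))
              for s in range(n)]
        newV = [max(a, p) for a, p in zip(Qa, Qp)]
        if max(abs(a - b) for a, b in zip(newV, V)) < tol:
            return newV, Qa, Qp
        V = newV

def whittle_index(state, P_active, P_passive, reward, lo=-10.0, hi=10.0):
    """Bisection on the subsidy: the index of `state` is the subsidy at
    which the active and passive actions are equally attractive there."""
    for _ in range(50):
        mid = (lo + hi) / 2
        _, Qa, Qp = value_iteration(P_active, P_passive, reward, mid)
        if Qa[state] > Qp[state]:
            lo = mid   # activation still preferred: raise the subsidy
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative two-state arm: activating in state 0 triggers recovery
# w.p. 0.5; leaving state 1 passive risks decay w.p. 0.2; reward = state.
P_active = [[0.5, 0.5], [0.0, 1.0]]
P_passive = [[1.0, 0.0], [0.2, 0.8]]
reward = [0.0, 1.0]
indices = [whittle_index(s, P_active, P_passive, reward) for s in (0, 1)]
```

A heuristic for the coupled many-arm problem then activates, at each step, the arms whose current states carry the largest indices; proving when such policies are (asymptotically) optimal is exactly the kind of question the project addresses.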
The prospective HDR student is expected to have fundamental knowledge of stochastic modeling, probability, reinforcement learning, and linear/convex optimization, and good programming skills.
In the first instance, interested applicants should email Dr Jing Fu at jing.fu@rmit.edu.au, Professor Peter Taylor at taylorpg@unimelb.edu.au and Professor Jose Niño-Mora at jnino@est-econ.uc3m.es explaining why they are interested in the projects and providing details of their curriculum vitae and their academic record.
Dr Jing Fu received the B.Eng. degree in computer science from Shanghai Jiao Tong University, Shanghai, China, in 2011, and the Ph.D. degree in electronic engineering from the City University of Hong Kong in 2016. She was a Post-Doctoral Research Associate with the School of Mathematics and Statistics, The University of Melbourne, from 2016 to 2019, and has been a Lecturer (Assistant Professor) in the Department of Electrical and Electronic Engineering, STEM College, RMIT University, since 2020. Her main research interests, both theoretical and numerical, are stochastic optimization, restless bandits, coordinated multi-agent optimization, resource-intensive AI, and coordinated learning and control.
Tutorial presentations:
April 2026, "Restless bandit models with weakly coupled constraints: how to design asymptotically optimal algorithms - fluid approximation", [Online link]
Representative Publications/Articles:
[1] J. Fu*, B. Moran, J. Niño-Mora, “Weakly-Coupled Multi-Action Restless Bandits - Exponential Convergence in Probability”, preprint, arXiv:2604.15683, Apr. 2026.
Impact: It studies the behaviour of a general stochastic process and, most importantly, the design of policies that guarantee its convergence to an ideal trajectory as the problem size increases.
[2] J. Fu*, B. Moran, J. Niño-Mora, “Multi-Action Restless Bandits with Weakly Coupled Constraints: Simultaneous Learning and Control”, preprint, arXiv:2412.03326, Dec. 2024.
Impact: It advances the field by establishing fast convergence of key stochastic processes under simultaneous learning and control in fairly general scenarios, making notable progress on long-standing challenges in stochastic optimisation.
[3] J. Fu*, Z. Wang, J. Chen, “Coordinated Multi-Agent Patrolling with State-Dependent Cost Rates: Asymptotically Optimal Policies for Large-Scale Systems”, accepted by IEEE Transactions on Automatic Control, Dec. 2024.
[Online Available][Code for Sharing]
Impact: For the first time, it proved fast convergence to optimality in high-dimensional spatial optimization.
[4] J. Fu*, X. Wang, Z. Wang, M. Zukerman, “A Restless Bandit Model for Energy-Efficient Job Assignments in Server Farms”, IEEE Transactions on Automatic Control, Vol. 69, Issue 9, pp. 5820 – 5835, Sep. 2024.
Impact: It advances the study of job/traffic scheduling by approaching optimality for large, hard problems without requiring intractable exact solutions.
[5] J. Fu*, B. Moran, P. Taylor, “A Restless Bandit Model for Resource Allocation, Competition and Reservation”, Operations Research, vol. 70, no. 1, Jan.-Feb. 2022.
Impact: It proved a non-trivial condition, which had remained open since 1990, in a range of practical scenarios.
[6] J. Fu*, B. Moran, “Energy-Efficient Job-Assignment Policy with Asymptotically Guaranteed Performance Deviation”, IEEE/ACM Transactions on Networking, vol. 28, no. 3, pp. 1325-1338, Jun. 2020.
Impact: It improves on [5] by proving a significantly tighter bound on the performance degradation of the proposed algorithms, enhancing their real-world impact.

RMIT University acknowledges the people of the Woi wurrung and Boon wurrung language groups of the eastern Kulin Nation on whose unceded lands we conduct the business of the University. RMIT University respectfully acknowledges their Ancestors and Elders, past and present. RMIT also acknowledges the Traditional Custodians and their Ancestors of the lands and waters across Australia where we conduct our business - Artwork 'Sentient' by Hollie Johnson, Gunaikurnai and Monero Ngarigo.
Learn more about our commitment to Indigenous cultures