I am a fourth-year Ph.D. student in Computer Science at the University of California San Diego (UCSD). I am advised by Prof. Kamalika Chaudhuri.
I am primarily interested in the theory and application of topics in sequential decision making, such as bandit learning and reinforcement learning.
I am currently working on theoretically characterizing the benefits of domain adaptation and transfer learning, as well as designing and analyzing sample-efficient algorithms, for several sequential decision making problems.
Before joining UCSD, I obtained my B.S. in Computer Science from the University of Southern California, where I worked with Prof. T. K. Satish Kumar and Prof. Sven Koenig on combinatorial search, planning and scheduling, and constraint satisfaction, among many other subfields of artificial intelligence.
Publications
Zhi Wang, Chicheng Zhang, and Kamalika Chaudhuri.
Thompson Sampling for Robust Transfer in Multi-Task Bandits. To appear at the 39th International Conference on Machine Learning (ICML-2022).
We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly-optimal performance guarantees in a setting where all tasks are solved concurrently, it remains unclear whether Thompson sampling (TS) algorithms, which have superior empirical performance in general, share similar theoretical properties. In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. We provide its frequentist analysis and prove that it is also nearly-optimal using a novel concentration inequality for multi-task data aggregation at random stopping times. Finally, we evaluate the algorithm on synthetic data and show that the TS-type algorithm enjoys superior empirical performance in comparison with the UCB-based algorithm and a baseline algorithm that performs TS for each individual task without transfer.
Chicheng Zhang, Zhi Wang.
Provably Efficient Multi-Task Reinforcement Learning with Model Transfer. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS-2021). A shorter version of this work appeared in the Workshop on Reinforcement Learning Theory at ICML-2021. [arXiv]
We study multi-task reinforcement learning (RL) in tabular episodic Markov decision processes (MDPs). We formulate a heterogeneous multi-player RL problem, in which a group of players concurrently face similar but not necessarily identical MDPs, with a goal of improving their collective performance through inter-player information sharing. We design and analyze an algorithm based on the idea of model transfer, and provide gap-dependent and gap-independent upper and lower bounds that characterize the intrinsic complexity of the problem.
Zhi Wang*, Chicheng Zhang*, Manish Kumar Singh, Laurel D. Riek, and Kamalika Chaudhuri.
Multitask Bandit Learning through Heterogeneous Feedback Aggregation. In Proceedings of the 24th International Conference on
Artificial Intelligence and Statistics (AISTATS-2021). A preliminary version of this work appeared in the Workshop on Real World Experiment Design and Active Learning at ICML-2020. *These authors contributed equally. [arXiv]
In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the ε-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily identical. We develop an upper confidence bound-based algorithm,
RobustAgg
(ε), that adaptively aggregates rewards collected by different players. In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information sharing across players. We complement these upper bounds with nearly matching lower bounds. In the setting where pairwise similarities are unknown, we provide a lower bound, as well as an algorithm that trades off minimax regret guarantees for adaptivity to unknown similarity structure.
Zhi Wang, Liron Cohen, Sven Koenig and T. K. Satish Kumar. The Factored Shortest Path Problem and Its Applications in Robotics. In Proceedings of the 28th International Conference on Automated Planning and Scheduling (ICAPS-2018). [pdf]
Many real-world combinatorial problems exhibit structure in the way in which their variables interact. Such structure can be exploited in the form of "factors" for representational as well as computational benefits. Factored representations are extensively used in probabilistic reasoning, constraint satisfaction, planning, and decision theory. In this paper, we formulate the factored shortest path problem (FSPP) on a collection of constraints interpreted as factors of a highdimensional map.We show that the FSPP is not only a generalization of the regular shortest path problem but also particularly relevant to robotics.We develop factored-space heuristics for A* and prove that they are admissible and consistent. We provide experimental results on both random and handcrafted instances as well as on an example robotics domain to show that A* with factored-space heuristics outperforms A* with the Manhattan Distance heuristic in many cases.
@inproceedings{wang2018,
author = {Zhi Wang and Liron Cohen and Sven Koenig and T. K. Satish Kumar},
title = {The Factored Shortest Path Problem and Its Applications in Robotics},
booktitle = {Proceedings of the Twenty-Eighth International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {527--531},
year = {2018}
}
T. K. Satish Kumar, Zhi Wang, Anoop Kumar, Craig Milo Rogers and Craig A. Knoblock. Load Scheduling of Simple Temporal Networks Under Dynamic Resource Pricing. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-2018). [pdf]
We study load scheduling of simple temporal networks (STNs) under dynamic pricing of resources. We are given a set of processes and a set of simple temporal constraints between their execution times, i.e., an STN. Each process uses a certain amount of resource for execution. The unit price of the resource is a function of time, f(t). The goal is to find a schedule of a given STN that trades off makespan minimization against cost minimization within a user-specified suboptimality bound. We provide a polynomial-time algorithm for solving the load scheduling problem when f(t) is piecewise constant. This has important applications in many real-world domains including the smart home and smart grid domains. We then study the dependency of the unit price of the resource on time as well as the total demand at that time. This leads to a further characterization of tractable, NP-hard, and conjectured tractable cases.
@inproceedings{kumar2018a,
author = {Kumar, T. K. Satish and Wang, Zhi and Kumar, Anoop and Rogers, Craig Milo and Knoblock,Craig A.},
title = {Load Scheduling of Simple Temporal Networks Under Dynamic Resource Pricing},
booktitle = {Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI)},
pages = {6227--6236},
year = {2018}
}
T. K. Satish Kumar, Zhi Wang, Anoop Kumar, Craig Milo Rogers and Craig A. Knoblock. On the Linear Programming Duals of Temporal Reasoning Problems. In Proceedings of the 15th International Symposium on Artificial Intelligence and Mathematics (ISAIM-2018). [pdf]
Temporal reasoning problems occur in many application domains of Artificial Intelligence; therefore, it is important for us to develop algorithms for solving them efficiently. While some problems like Simple Temporal Problems are known to be tractable, some other problems like Disjunctive Temporal Problems are known to be NP-hard. In this paper, we provide a Linear Programming (LP) duality perspective on temporal reasoning problems. In many cases, we show that their LP duals are the commonly-studied flow problems in Graph Theory. Using the general theory of LP duality, we develop novel algorithms for efficiently solving several temporal reasoning problems. We also show that other previously-known efficient methods in temporal reasoning also fit into this perspective of LP duality.
@inproceedings{kumar2018b,
author = {Kumar, T. K. Satish and Wang, Zhi and Kumar, Anoop and Rogers, Craig Milo and Knoblock,Craig A.},
title = {On the Linear Programming Duals of Temporal Reasoning Problems},
booktitle = {Proceedings of the Fifteenth International Symposium on Artificial Intelligence and Mathematics (ISAIM)},
year = {2018},
url = {"http://isaim2018.cs.virginia.edu/papers/ISAIM2018_TK_Satish_Kumar_etal.pdf"}
}
Contact
Office: EBU3B 4142 @ UCSD
Email: zhiwang at eng dot ucsd dot edu Google Scholar dblp