Zhi Wang's Homepage

Preprints

Zhi Wang, Geelon So, and Ramya Korlakai Vinayak. Metric Learning from Limited Pairwise Preference Comparisons.
[arXiv]

We study metric learning from preference comparisons under the ideal point model, in which a user prefers an item over another if it is closer to their latent ideal item. These items are embedded into \( \mathbb{R}^d \) equipped with an unknown Mahalanobis distance shared across users. While recent work shows that it is possible to simultaneously recover the metric and ideal items given \( O(d) \) pairwise comparisons per user, in practice we often have a limited budget of \( o(d) \) comparisons. We study whether the metric can still be recovered, even though it is known that learning individual ideal items is now no longer possible. We show that in general, \( o(d) \) comparisons reveals no information about the metric, even with infinitely many users. However, when comparisons are made over items that exhibit low-dimensional structure, each user can contribute to learning the metric restricted to a low-dimensional subspace so that the metric can be jointly identified. We present a divide-and-conquer approach that achieves this, and provide theoretical recovery guarantees and empirical validation.

Publications

Zhi Wang, Chicheng Zhang, and Kamalika Chaudhuri. Thompson Sampling for Robust Transfer in Multi-Task Bandits. Proceedings of the 39th International Conference on Machine Learning (ICML-2022).
[arXiv]

We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly-optimal performance guarantees in a setting where all tasks are solved concurrently, it remains unclear whether Thompson sampling (TS) algorithms, which have superior empirical performance in general, share similar theoretical properties. In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. We provide its frequentist analysis and prove that it is also nearly-optimal using a novel concentration inequality for multi-task data aggregation at random stopping times. Finally, we evaluate the algorithm on synthetic data and show that the TS-type algorithm enjoys superior empirical performance in comparison with the UCB-based algorithm and a baseline algorithm that performs TS for each individual task without transfer.

Chicheng Zhang, Zhi Wang. Provably Efficient Multi-Task Reinforcement Learning with Model Transfer. Advances in Neural Information Processing Systems 34 (NeurIPS-2021). A shorter version of this work appeared in the Workshop on Reinforcement Learning Theory at ICML-2021.
[arXiv]

We study multi-task reinforcement learning (RL) in tabular episodic Markov decision processes (MDPs). We formulate a heterogeneous multi-player RL problem, in which a group of players concurrently face similar but not necessarily identical MDPs, with a goal of improving their collective performance through inter-player information sharing. We design and analyze an algorithm based on the idea of model transfer, and provide gap-dependent and gap-independent upper and lower bounds that characterize the intrinsic complexity of the problem.

Zhi Wang*, Chicheng Zhang*, Manish Kumar Singh, Laurel D. Riek, and Kamalika Chaudhuri. Multitask Bandit Learning through Heterogeneous Feedback Aggregation. Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS-2021). A preliminary version of this work appeared in the Workshop on Real World Experiment Design and Active Learning at ICML-2020. *These authors contributed equally.
[arXiv]

In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the ε-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily identical. We develop an upper confidence bound-based algorithm,
RobustAgg
(ε), that adaptively aggregates rewards collected by different players. In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information sharing across players. We complement these upper bounds with nearly matching lower bounds. In the setting where pairwise similarities are unknown, we provide a lower bound, as well as an algorithm that trades off minimax regret guarantees for adaptivity to unknown similarity structure.

Zhi Wang, Liron Cohen, Sven Koenig, and T. K. Satish Kumar. The Factored Shortest Path Problem and Its Applications in Robotics. Proceedings of the 28th International Conference on Automated Planning and Scheduling (ICAPS-2018).
[pdf]
Many real-world combinatorial problems exhibit structure in the way in which their variables interact. Such structure can be exploited in the form of "factors" for representational as well as computational benefits. Factored representations are extensively used in probabilistic reasoning, constraint satisfaction, planning, and decision theory. In this paper, we formulate the factored shortest path problem (FSPP) on a collection of constraints interpreted as factors of a highdimensional map.We show that the FSPP is not only a generalization of the regular shortest path problem but also particularly relevant to robotics.We develop factored-space heuristics for A* and prove that they are admissible and consistent. We provide experimental results on both random and handcrafted instances as well as on an example robotics domain to show that A* with factored-space heuristics outperforms A* with the Manhattan Distance heuristic in many cases.

T. K. Satish Kumar, Zhi Wang, Anoop Kumar, Craig Milo Rogers, and Craig A. Knoblock. Load Scheduling of Simple Temporal Networks Under Dynamic Resource Pricing. Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-2018).
[pdf]
We study load scheduling of simple temporal networks (STNs) under dynamic pricing of resources. We are given a set of processes and a set of simple temporal constraints between their execution times, i.e., an STN. Each process uses a certain amount of resource for execution. The unit price of the resource is a function of time, \( f(t) \). The goal is to find a schedule of a given STN that trades off makespan minimization against cost minimization within a user-specified suboptimality bound. We provide a polynomial-time algorithm for solving the load scheduling problem when \( f(t) \) is piecewise constant. This has important applications in many real-world domains including the smart home and smart grid domains. We then study the dependency of the unit price of the resource on time as well as the total demand at that time. This leads to a further characterization of tractable, NP-hard, and conjectured tractable cases.

T. K. Satish Kumar, Zhi Wang, Anoop Kumar, Craig Milo Rogers, and Craig A. Knoblock. On the Linear Programming Duals of Temporal Reasoning Problems. The 15th International Symposium on Artificial Intelligence and Mathematics (ISAIM-2018).
[pdf]
Temporal reasoning problems occur in many application domains of Artificial Intelligence; therefore, it is important for us to develop algorithms for solving them efficiently. While some problems like Simple Temporal Problems are known to be tractable, some other problems like Disjunctive Temporal Problems are known to be NP-hard. In this paper, we provide a Linear Programming (LP) duality perspective on temporal reasoning problems. In many cases, we show that their LP duals are the commonly-studied flow problems in Graph Theory. Using the general theory of LP duality, we develop novel algorithms for efficiently solving several temporal reasoning problems. We also show that other previously-known efficient methods in temporal reasoning also fit into this perspective of LP duality.

Zhi Wang

Preprints

Publications

Teaching

Professional Service

Contact