Online learning of quantum pure states without regret
Seminar author: Lumbreras Zarapico Josep
Event date and time: 07/13/2023 04:00 pm
We present a novel way of learning pure quantum states using online learning techniques from stochastic bandit theory. In this setting, the learner interacts sequentially with an unknown pure quantum state (the environment) by performing single-copy measurements with rank-1 projectors (the actions). The learner's goal is to minimize the expected cumulative regret, which is achieved by selecting measurements with maximal overlap with the unknown state. Previous work observed that the regret scales as the square root of the number of rounds when the bandit algorithm LinUCB is applied directly, and it remained an open question whether this strategy was optimal. We answer this question by presenting a modified version of LinUCB that uses a weighted least-squares estimator and achieves a regret that scales logarithmically in the number of rounds under a geometric assumption. Numerical studies show the logarithmic scaling and confirm that the assumption is satisfied. We also derive information-theoretic lower bounds on the regret that connect quantum state tomography with bandit protocols, and we show a logarithmic lower bound that is almost optimal. Finally, we study a classical, quantum-inspired stochastic linear bandit showing that, contrary to a common belief in classical bandit theory, the square-root regret barrier is not due only to the continuity of the action set: it also arises because the reward distributions have non-zero variance.
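To make the setting concrete, here is a minimal Python sketch, assuming a qubit with real amplitudes so that both the unknown state and each rank-1 projector are described by a single angle. It shows the environment (single-copy projective measurements with Bernoulli outcomes of probability |<a|psi>|^2), the expected cumulative regret, and the fact that the expected reward is linear in the Bloch vector, which is what makes linear bandit algorithms such as LinUCB applicable. The strategy shown is a simple explore-then-commit baseline, not the weighted LinUCB algorithm presented in the seminar; all function names and parameters are hypothetical.

```python
import math
import random


def expected_reward(theta, phi):
    """Probability of outcome 1 when the real qubit state (cos theta, sin theta)
    is measured with the rank-1 projector onto (cos phi, sin phi)."""
    return math.cos(theta - phi) ** 2


def measure(theta, phi, rng):
    """Single-copy projective measurement: 1 with probability |<a|psi>|^2, else 0."""
    return 1 if rng.random() < expected_reward(theta, phi) else 0


def cumulative_regret(theta, actions):
    """Expected cumulative regret. For a pure state the best projector (phi = theta)
    gives reward 1, so each round contributes 1 - |<a_t|psi>|^2."""
    return sum(1.0 - expected_reward(theta, phi) for phi in actions)


def explore_then_commit(theta, n_rounds, n_explore, rng):
    """Hypothetical baseline, NOT the seminar's weighted LinUCB.

    Measures n_explore times along phi = 0 and phi = pi/4, turns the sample means
    into estimates of the Bloch components r_z = cos(2 theta) and r_x = sin(2 theta),
    then commits to the estimated state for the remaining rounds."""
    if 2 * n_explore > n_rounds:
        raise ValueError("exploration budget exceeds the number of rounds")
    p_z = sum(measure(theta, 0.0, rng) for _ in range(n_explore)) / n_explore
    p_x = sum(measure(theta, math.pi / 4, rng) for _ in range(n_explore)) / n_explore
    # Expected reward is linear in the Bloch vector r: p = (1 + n_a . r) / 2,
    # with n_a = z for phi = 0 and n_a = x for phi = pi/4.
    r_z, r_x = 2 * p_z - 1, 2 * p_x - 1
    theta_hat = 0.5 * math.atan2(r_x, r_z)
    actions = ([0.0] * n_explore + [math.pi / 4] * n_explore
               + [theta_hat] * (n_rounds - 2 * n_explore))
    return theta_hat, cumulative_regret(theta, actions)


if __name__ == "__main__":
    rng = random.Random(0)
    theta_hat, regret = explore_then_commit(0.3, n_rounds=100_000, n_explore=2_000, rng=rng)
    print(f"theta_hat = {theta_hat:.4f}, cumulative regret = {regret:.1f}")
```

In the commit phase each round costs sin²(θ̂ − θ), which is quadratic in the estimation error; the exploration rounds, by contrast, pay a fixed regret per round regardless of how much is learned.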