Publication Date:
2023-08-04
Description:
Linear bandit algorithms yield O~(n√T) pseudo-regret bounds on compact convex action sets K⊂Rn and two types of structural assumptions lead to better pseudo-regret bounds. When K is the simplex or an ℓp ball with p∈]1,2], there exist bandits algorithms with O~(√n√T) pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond ℓp balls that enjoy pseudo-regret bounds of O~(√n√T), which answers an open question from [BCB12, §5.5.]. Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than O(√n). However, this comes at the expense of asymptotic rates in T varying between O(√T) and O(T).
Language:
English
Type:
article
,
doc-type:article
Permalink