{"id":5398,"date":"2021-12-05T18:36:56","date_gmt":"2021-12-05T13:06:56","guid":{"rendered":"https:\/\/gtl.csa.iisc.ac.in\/hari\/?page_id=5398"},"modified":"2021-12-06T12:38:09","modified_gmt":"2021-12-06T07:08:09","slug":"can-we-make-thompson-sampling-work-for-mechanism-design-for-stochastic-multi-armed-bandits","status":"publish","type":"page","link":"https:\/\/gtl.csa.iisc.ac.in\/hari\/publications\/research-snippets\/can-we-make-thompson-sampling-work-for-mechanism-design-for-stochastic-multi-armed-bandits\/","title":{"rendered":"Can we make Thompson sampling work for mechanism design for stochastic multi-armed bandits?"},"content":{"rendered":"\n<p>In this work, we consider mechanism design for multi-armed bandit problems and explore, for the first time, a Bayesian approach to the problem. In particular, we propose a Thompson sampling based algorithm to learn the rewards of the strategic agents. The Thompson sampling algorithm achieves better theoretical regret guarantees; however, its randomized nature introduces certain unique, non-trivial challenges for mechanism design. Our main contribution is the TSM-D mechanism, which achieves the same social welfare regret as the Thompson sampling algorithm. We further prove the following properties of TSM-D: (a) the variance in agent utilities asymptotically tends to zero; (b) the mechanism is ex-post individually rational with high probability; and (c) the mechanism is within-period dominant strategy incentive compatible with high probability.<\/p>\n\n\n\n<p><strong>Reference:<\/strong><\/p>\n\n\n\n<p>Ganesh Ghalme, Shweta Jain, Sujit Gujar, and Y. Narahari. Thompson sampling based mechanisms for stochastic multi-armed bandit problems. Proceedings of <strong>AAMAS-2017<\/strong> (16<sup>th<\/sup> International Conference on Autonomous Agents and Multi Agent Systems), Sao Paulo, Brazil, 2017, pp. 
87-95.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this work, we consider mechanism design for multi-armed bandit problems and explore, for the first time, a Bayesian approach to the problem. In particular, we propose a Thompson sampling based algorithm to learn the rewards of the strategic agents. The Thompson sampling algorithm achieves better theoretical regret guarantees; however, its randomized nature introduces certain unique, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":5372,"menu_order":20,"comment_status":"closed","ping_status":"closed","template":"template-researchsnippets.php","meta":{"kt_blocks_editor_width":""},"acf":[],"_links":{"self":[{"href":"https:\/\/gtl.csa.iisc.ac.in\/hari\/wp-json\/wp\/v2\/pages\/5398"}],"collection":[{"href":"https:\/\/gtl.csa.iisc.ac.in\/hari\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/gtl.csa.iisc.ac.in\/hari\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/gtl.csa.iisc.ac.in\/hari\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gtl.csa.iisc.ac.in\/hari\/wp-json\/wp\/v2\/comments?post=5398"}],"version-history":[{"count":3,"href":"https:\/\/gtl.csa.iisc.ac.in\/hari\/wp-json\/wp\/v2\/pages\/5398\/revisions"}],"predecessor-version":[{"id":5514,"href":"https:\/\/gtl.csa.iisc.ac.in\/hari\/wp-json\/wp\/v2\/pages\/5398\/revisions\/5514"}],"up":[{"embeddable":true,"href":"https:\/\/gtl.csa.iisc.ac.in\/hari\/wp-json\/wp\/v2\/pages\/5372"}],"wp:attachment":[{"href":"https:\/\/gtl.csa.iisc.ac.in\/hari\/wp-json\/wp\/v2\/media?parent=5398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
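The abstract above builds on the Thompson sampling algorithm for stochastic multi-armed bandits. As background only, here is a minimal Beta-Bernoulli Thompson sampling sketch; this illustrates the underlying learning algorithm, not the paper's TSM-D mechanism (which additionally handles strategic agents and payments), and the function name and reward means are illustrative choices, not from the paper.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling sketch (illustrative, not TSM-D).

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    mean reward; each round we draw one sample per posterior and pull the
    arm with the largest draw, then update that arm's counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Sample an index from each arm's Beta posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward with the pulled arm's true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures

# Hypothetical instance: three arms with unknown means 0.2, 0.5, 0.8.
total, s, f = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

Over a long horizon the posterior of the best arm concentrates, so it is pulled most of the time; the randomization in the posterior draws is exactly what the paper identifies as the source of non-trivial challenges for mechanism design.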