BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20250928T141759EDT-4830VUsbnS@132.216.98.100
DTSTAMP:20250928T181759Z
DESCRIPTION:TITLE\n\nOn Mixture of Experts in Large-Scale Statistical Machine Learning Applications\n\nABSTRACT\n\nMixtures of experts (MoEs)\, a class of statistical machine learning models that combine multiple models\, known as experts\, to form more complex and accurate models\, have been incorporated into deep learning architectures to improve the ability of these architectures and AI models to capture the heterogeneity of the data and to scale them up without increasing the computational cost. In mixtures of experts\, each expert specializes in a different aspect of the data\, and the experts' outputs are combined by a gating function to produce the final output. Therefore\, parameter and expert estimates play a crucial role by enabling statisticians and data scientists to articulate and make sense of the diverse patterns present in the data. However\, the statistical behavior of parameters and experts in a mixture of experts has remained poorly understood\, owing to the complex interaction between the gating function and the expert parameters.\n\nIn the first part of the talk\, we investigate the performance of the least squares estimators (LSE) under a deterministic MoE model where the data are sampled according to a regression model\, a setting that has remained largely unexplored. We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions. We demonstrate that the rates for estimating strongly identifiable experts\, namely the widely used feed-forward networks with activation functions sigmoid(·) and tanh(·)\, are substantially faster than those of polynomial experts\, which we show to exhibit a surprisingly slow estimation rate.\n\nIn the second part of the talk\, we show that the insights from these theories shed light on understanding and improving important practical applications in machine learning and artificial intelligence (AI)\, including effectively scaling up massive AI models with several billion parameters\, efficiently fine-tuning large-scale AI models for downstream tasks\, and enhancing the performance of the Transformer model\, a state-of-the-art deep learning architecture\, with a novel self-attention mechanism.\n\nZOOM Link
DTSTART:20241101T193000Z
DTEND:20241101T203000Z
SUMMARY:Nhat Ho (University of Texas at Austin)
URL:/mathstat/channels/event/nhat-ho-university-texas-austin-360757
END:VEVENT
END:VCALENDAR