BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20250928T141759EDT-4830VUsbnS@132.216.98.100
DTSTAMP:20250928T181759Z
DESCRIPTION:TITLE\n\nOn Mixture of Experts in Large-Scale Statistical Machine Learning Applications\n\nABSTRACT\n\nMixtures of experts (MoEs)\, a class of statistical machine learning models that combine multiple models\, known as experts\, to form more complex and accurate models\, have been incorporated into deep learning architectures to improve the ability of these architectures and AI models to capture the heterogeneity of the data and to scale them up without increasing the computational cost. In mixtures of experts\, each expert specializes in a different aspect of the data\, and the experts' outputs are combined by a gating function to produce the final output. Therefore\, parameter and expert estimates play a crucial role by enabling statisticians and data scientists to articulate and make sense of the diverse patterns present in the data. However\, the statistical behavior of parameters and experts in a mixture of experts has remained poorly understood\, owing to the complex interaction between the gating function and the expert parameters.\n\nIn the first part of the talk\, we investigate the performance of the least squares estimators (LSE) under a deterministic MoE model where the data are sampled according to a regression model\, a setting that has remained largely unexplored. We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions. We demonstrate that the rates for estimating strongly identifiable experts\, namely the widely used feed-forward networks with activation functions sigmoid(·) and tanh(·)\, are substantially faster than those of polynomial experts\, which we show to exhibit a surprisingly slow estimation rate.\n\nIn the second part of the talk\, we show that the insights from these theories shed light on understanding and improving important practical applications in machine learning and artificial intelligence (AI)\, including effectively scaling up massive AI models with several billion parameters\, efficiently fine-tuning large-scale AI models for downstream tasks\, and enhancing the performance of the Transformer model\, a state-of-the-art deep learning architecture\, with a novel self-attention mechanism.\n\nZOOM Link
DTSTART:20241101T193000Z
DTEND:20241101T203000Z
SUMMARY:Nhat Ho (University of Texas at Austin)
URL:/mathstat/channels/event/nhat-ho-university-texas-austin-360757
END:VEVENT
END:VCALENDAR