June 8, 2023, 09:55–15:00
Room Auditorium 5
Background and objective
The objective of the workshop, organized by MADS, is to foster interest in emerging developments in decision mathematics. The workshop provides an opportunity for PhD students, post-doctoral researchers, and faculty members to interact and discuss both theoretical and empirical contributions, with a special focus this year on data science/ML, games, optimization, and statistics. It will consist of 6 talks and will be held on Thursday June 8, 2023, 9:55-15:05 (UTC+2).
Organizing Committee:
Jérôme Bolte, Abdelaati Daouia, Sébastien Gadat
Conference Secretariat:
Lukas Dargel - "On the link between Multiplicative Competitive Interaction models and Compositional Data regression with a total"
This article sheds light on the relationship between compositional data (CoDa) regression models and multiplicative competitive interaction (MCI) models, which are two approaches for modeling shares. We demonstrate that MCI models are special cases of CoDa models and that a reparameterization links both. Recognizing this relation offers mutual benefits for the CoDa and MCI literature, each with its own rich tradition. The CoDa tradition, with its rigorous mathematical foundation, provides additional theoretical guarantees and mathematical tools that we apply to improve the estimation of MCI models. Simultaneously, the MCI model emerged from almost a century-long tradition in marketing research that may enrich the CoDa literature. One aspect is the grounding of the MCI specification in intuitive assumptions on the behavior of individuals. From this basis, the MCI tradition also provides credible justifications for heteroskedastic error structures -- an idea we develop further and that is relevant to many CoDa models beyond the marketing context. Additionally, MCI models have always been interpreted in terms of elasticities, a method only recently adopted in CoDa. Regarding this interpretation, the change from the MCI to the CoDa perspective leads to a decomposition of the influence of the explanatory variables into contributions from relative and absolute information.
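As a purely illustrative aside (our notation and a standard textbook MCI attraction specification, not necessarily the exact models compared in the talk), the link can be sketched as follows: the MCI share of alternative $i$ among $D$ alternatives is
$$ s_i = \frac{\prod_{k=1}^{K} x_{ik}^{\beta_k}\, e^{\varepsilon_i}}{\sum_{j=1}^{D} \prod_{k=1}^{K} x_{jk}^{\beta_k}\, e^{\varepsilon_j}}, $$
and taking log-ratios with respect to a reference alternative $D$ (an additive log-ratio transform, familiar from CoDa) gives
$$ \log\frac{s_i}{s_D} = \sum_{k=1}^{K} \beta_k \log\frac{x_{ik}}{x_{Dk}} + (\varepsilon_i - \varepsilon_D), $$
i.e. a linear regression in log-ratio coordinates, which illustrates why a reparameterization can embed MCI models into the CoDa framework.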
Philipp Koch - "Using data on famous individuals to learn unknown historical GDP per capita levels"
Historical GDP per capita estimates reveal patterns of long-term economic development across countries, reaching back several centuries or even millennia. Coverage, however, is limited since such reconstructions are based on extensive research digitizing and analyzing historical documents. Here, we build upon these datasets and train a supervised machine learning model with data on biographies of famous individuals living between the 13th and 20th centuries to learn yet unknown GDP per capita levels of countries and regions in Europe and the United States. We extract potential features from the biographies of famous individuals in a location (e.g. number of births and deaths, occupational structure, average age, etc.) and calibrate our model using a sequential forward feature selection algorithm. Using an independent test set, we find that our model makes accurate out-of-sample estimates with an R2 of 0.93. We validate our dataset in two ways: First, we use our regional out-of-sample estimates to recreate the well-established finding that England and the Low Countries experienced larger economic growth than other European countries between 1300 and 1800. Second, we show that our estimates correlate with country-level estimates of average body height in the late 18th century, composite indicators of well-being in 1850, and data on disposable income per capita across French regions in 1864. Together, our estimates not only allow for investigating 700 years of cross-country differences in economic development, but also for comparing the development of European NUTS2 regions (Milan, Montpellier, Stockholm, London, Moscow, etc.) and metropolitan areas in the United States (New York, Boston, etc.). We publish our estimates with appropriate confidence intervals and hope to spur research on topics around long-term economic development with this data.
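As a hedged sketch of one step described above, the calibration by sequential forward feature selection, the toy example below uses scikit-learn's built-in selector with a random-forest regressor on synthetic stand-in features; the feature semantics, model choice, and numbers are our assumptions and not the authors' pipeline.

# Illustrative only: greedy forward feature selection on synthetic stand-ins
# for biography-derived features; not the authors' actual model or data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))   # hypothetical features: births, deaths, occupations, ...
y = X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)  # stand-in for log GDP p.c.

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
# At each step, add the feature that most improves the cross-validated score.
selector = SequentialFeatureSelector(model, n_features_to_select=3,
                                     direction="forward", cv=5)
selector.fit(X_train, y_train)

model.fit(selector.transform(X_train), y_train)
print("selected features:", selector.get_support())
print("out-of-sample R2:", model.score(selector.transform(X_test), y_test))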
Joseph Hachem - "A de-randomization argument for estimating extreme value parameters of heavy tails"
In extreme value analysis, it has recently been shown that one can use a de-randomization trick, replacing a random threshold in the estimator of interest with its deterministic counterpart, in order to estimate several extreme risks simultaneously, but only in an i.i.d. context. In this talk, I will show how this method can be used to handle the estimation of several tail quantities (tail index, expected shortfall, distortion risk measures...) in general dependence/heteroskedasticity/heterogeneity settings, under a weighted $L^1$ assumption on the gap between the average distribution of the data and the prevailing distribution. Particularly interesting examples of application include serially dependent but heteroskedastic frameworks.
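To fix ideas, the snippet below sketches the de-randomization idea for the classical Hill estimator of the tail index: the random threshold (an order statistic) is replaced by a deterministic level u_n, and the average is taken over the actual exceedances. The simulation setup and the ad hoc choice of u_n are ours, not the speaker's construction.

# Illustrative sketch of de-randomizing the Hill estimator's threshold.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
gamma_true = 0.5
X = rng.pareto(1.0 / gamma_true, size=n) + 1.0   # Pareto sample with tail index gamma_true

# Classical Hill estimator: random threshold = the (k+1)-th largest order statistic.
k = 200
X_sorted = np.sort(X)
hill_random = np.mean(np.log(X_sorted[-k:]) - np.log(X_sorted[-k - 1]))

# De-randomized version: replace the order statistic by a deterministic level u_n
# (here the theoretical (1 - k/n)-quantile, known in this toy model).
u_n = (k / n) ** (-gamma_true)
exceedances = X[X > u_n]
hill_deterministic = np.mean(np.log(exceedances / u_n))

print(hill_random, hill_deterministic)   # both should be close to gamma_true = 0.5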
Ngoc Tâm Lê - "Nonsmooth implicit differentiation for machine learning"
A recurrent question with LASSO-type estimators is the choice of the regularization hyperparameter. A classical way to deal with this in the machine learning community is to proceed by cross-validation, which can be formulated as a bi-level optimization problem. In the case of the LASSO estimator, the single hyperparameter can reasonably be optimized by grid search. In order to reduce the bias of the LASSO, a weighted version can be proposed, but it involves optimizing many hyperparameters, which makes grid search inefficient. In this work, we present a recent approach consisting in differentiating the solution of the LASSO estimator with respect to the hyperparameter in order to apply classical gradient methods. In this perspective, we propose a nonsmooth implicit differentiation formula which allows one to differentiate solutions of optimization problems.
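As a hedged sketch of the kind of object at stake, the snippet below differentiates a LASSO solution with respect to its regularization parameter using the classical smooth formula on the active set (a simplification for illustration; it is not the nonsmooth implicit differentiation formula proposed in the talk). The data and the regularization level are invented.

# Illustrative: implicit differentiation of the LASSO solution w.r.t. alpha,
# using the smooth formula on the active set (not the talk's nonsmooth formula).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.normal(size=n)

alpha = 0.1   # sklearn objective: (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1
beta_hat = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_

S = np.flatnonzero(beta_hat)          # active set of the LASSO solution
XS = X[:, S]
sign_S = np.sign(beta_hat[S])
# First-order condition on S: (1/n) * XS.T @ (XS b_S - y) + alpha * sign_S = 0,
# hence d b_S / d alpha = -n * (XS.T XS)^{-1} sign_S, and zero off the support.
dbeta_dalpha = np.zeros(p)
dbeta_dalpha[S] = -n * np.linalg.solve(XS.T @ XS, sign_S)

# This Jacobian can then drive a gradient step on a validation criterion
# (the bi-level / hyperparameter optimization viewpoint mentioned above).
print(dbeta_dalpha[S])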
Estelle Medous - "Optimal Weights for double Many-To-One Generalized Weight Share Method"
In probability surveys, the sampling frame of the target population may not be available. If there is a sampling frame linked to the target population, indirect sampling can be used to draw a sample. The sampling weights can then be determined using the generalized weight share method (GWSM). The GWSM is an attractive method because, in specific cases, there is a set of sampling weights minimizing the variance of the GWSM estimator. However, this method is hard to apply when the links between populations are difficult to retrieve. A solution is to consider an intermediate population, linked to both the sampling frame and the target population, and to use double indirect sampling. The sampling weights can be determined using a double GWSM: the GWSM is applied twice, first between the sampling frame and the intermediate population, then between the intermediate and target populations. The double GWSM allows for a reduction in the number of observed links. Thus, this method is easier to apply than the GWSM, but it deteriorates the precision of the estimator. In specific cases, there is a set of sampling weights minimizing the variance of the double GWSM such that no precision is lost compared to the GWSM while maintaining the easier implementation of the double GWSM. When these weights cannot be computed, as is the case in the French postal traffic survey, an alternative weight can be used to improve the precision of the double GWSM estimator. Results are illustrated through Monte Carlo simulations and an application to the French postal traffic estimation.
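For readers unfamiliar with the GWSM, the snippet below gives a minimal numerical sketch of the basic single-stage weight share step with equal link weights; the link matrix, inclusion probabilities, and y-values are invented for illustration and have nothing to do with the postal traffic survey.

# Minimal sketch of the single-stage generalized weight share method (GWSM)
# with equal link weights; all numbers are invented for illustration.
import numpy as np

# links[j, k]: number of links between frame unit j (rows) and target unit k (cols).
links = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 1, 0],
                  [0, 0, 1]], dtype=float)
pi = np.array([0.5, 0.25, 0.5, 0.25])          # inclusion probabilities in the frame
sampled = np.array([True, False, True, True])  # realized sample in the frame
y = np.array([3.0, 5.0, 2.0])                  # variable of interest on the target units

L_k = links.sum(axis=0)                        # total number of links of each target unit
design_w = np.where(sampled, 1.0 / pi, 0.0)    # design weights of the sampled frame units
w = design_w @ links / L_k                     # GWSM weight: sum_j d_j * l[j, k] / L_k

estimate = np.dot(w, y)                        # GWSM estimator of the total of y
print(w, estimate)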
Etienne De Montbrun - "Certified Multi-Fidelity Zeroth-order Optimization"
We consider the problem of multi-fidelity global optimization, where one can evaluate a function f at various approximation levels (of varying costs), and the goal is to optimize f with the cheapest evaluations possible. In this paper, we study certified algorithms, which are additionally required to output a data-driven upper bound on the optimization error. We first formalize the problem in terms of a min-max game between an algorithm and an evaluation environment. We then propose a certified variant of the MF-DOO algorithm and derive a bound on its cost complexity for any Lipschitz function f. Finally, we prove an f-dependent lower bound showing that this algorithm has a near-optimal cost complexity.
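As a heavily simplified, single-fidelity illustration of what "certified" means here, the sketch below maximizes a Lipschitz function on [0, 1] on a grid and returns, alongside the candidate maximizer, a data-driven upper bound on the optimization error derived from the Lipschitz constant. This is a plain grid-based certificate, not the certified MF-DOO variant studied in the paper.

# Toy certified zeroth-order optimizer on [0, 1]: returns a candidate maximizer
# together with a data-driven bound on the optimization error (Lipschitz argument).
# Not the certified MF-DOO algorithm of the paper.
import numpy as np

def certified_maximize(f, lipschitz_const, n_evals=200):
    xs = np.linspace(0.0, 1.0, n_evals)
    fs = np.array([f(x) for x in xs])
    best = fs.max()
    h = xs[1] - xs[0]
    # On each cell [x_i, x_{i+1}], an L-Lipschitz f is at most
    # (f(x_i) + f(x_{i+1})) / 2 + L * h / 2, which yields a global upper bound.
    cell_upper = (fs[:-1] + fs[1:]) / 2.0 + lipschitz_const * h / 2.0
    upper = max(best, cell_upper.max())
    return xs[fs.argmax()], best, upper - best   # candidate, value, certified error bound

# Example with a 1-Lipschitz function.
x_best, f_best, err_bound = certified_maximize(lambda x: -abs(x - 0.3), 1.0)
print(x_best, f_best, err_bound)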