In recent years, we have seen a tenfold increase in the volume and complexity of digital data acquired for cultural heritage documentation. Meanwhile, open data and open science have become leading trends in the digital humanities. The convergence of these two trends compels us to deliver, in an interoperable fashion, datasets that are highly heterogeneous in both content and format, and to do so in a way that meets the expectations of a broad range of researchers and an even broader public audience. Tackling these issues is one of the main goals of the HeritageS digital platform project, supported by the Intelligence des Patrimoines research program. The platform is designed to allow research projects from many interdisciplinary fields to share, integrate and valorize cultural and natural heritage datasets related to the Loire Valley. In this context, one of our main projects is the creation of the Renaissance Transmedia Lab. Its core element is a website that acts as a hub giving access to various interactive experiences linked to projects on the Renaissance period: augmented web documentary, serious game, virtual reality, and 3D application. We expect to leverage these transmedia experiences to foster better communication between researchers and the public while preserving the quality of the scientific discourse. By presenting our current and upcoming productions, we intend to share our experience with other participants: the preparatory work, and how we collaborate with researchers to produce, in concert, tailor-made experiences that convey the desired scientific discourse while remaining appealing to the general public.
Digital stiffness programmability is achieved with a heterogeneous mechanical metamaterial. The prototype consists of an elastomer matrix containing tessellations of diamond-shaped cavities, selectively confined by semi-rigid plastic beam inserts along their diagonals. Unit-cell perturbations, made by placing or removing individual inserts, reshape the global constitutive relation. Its lower and upper bounds, corresponding respectively to the configuration with all cavities empty and the configuration with all inserts in place, lie far apart thanks to the large gap between the moduli of the elastomer and the inserts. Bidirectional operation is achieved by mixing insert orientations: longitudinal inserts enhance the macroscopic stiffness in compression, and transverse ones in tension. Arranged as digital representations, these local insert states form an explicit encoding of global patterns, enabling systematic stiffness programming with minimal changes in mass, both statically and in situ. These characteristics establish a new paradigm for actively tuning vibration isolation systems to shifts in the resonance of base structures.
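As an illustration of what an explicit digital encoding of insert states could look like, the Python sketch below stores one state per cavity and counts the insert placements and removals needed to move from one pattern to another, together with the net mass change. It assumes, hypothetically, that each cavity holds at most one insert (empty, longitudinal, or transverse) and that all inserts have the same mass; the names `reconfiguration_cost` and `encode` are illustrative and do not come from the prototype.

```python
# Hypothetical per-cavity states: each cavity is assumed to hold at most one insert.
EMPTY, LONGITUDINAL, TRANSVERSE = 0, 1, 2


def encode(pattern):
    """Render a pattern as a compact string: '0' empty, 'L' longitudinal, 'T' transverse."""
    return "".join("0LT"[state] for state in pattern)


def reconfiguration_cost(current, target, insert_mass=1.0):
    """Return (number of place/remove operations, net mass change) between two patterns.

    Assumes identical inserts of mass `insert_mass` (a hypothetical parameter).
    """
    if len(current) != len(target):
        raise ValueError("patterns must cover the same cavities")
    operations = 0
    for a, b in zip(current, target):
        if a == b:
            continue
        # Empty <-> occupied needs one operation; swapping orientation needs
        # a removal followed by a placement.
        operations += 1 if EMPTY in (a, b) else 2
    inserts_before = sum(state != EMPTY for state in current)
    inserts_after = sum(state != EMPTY for state in target)
    return operations, (inserts_after - inserts_before) * insert_mass


current = [EMPTY, LONGITUDINAL, LONGITUDINAL, TRANSVERSE]
target = [LONGITUDINAL, LONGITUDINAL, TRANSVERSE, EMPTY]
print(encode(current), "->", encode(target), reconfiguration_cost(current, target))
```

Under these assumptions, swapping a longitudinal insert for a transverse one costs two operations but leaves the mass unchanged, which is one way to read the claim that reprogramming requires only minimal changes in mass.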
Quantifying and predicting the long-term impact of scientific publications or individual scholars has important implications for many policy decisions, such as evaluating funding proposals and identifying emerging research fields. In this work, we propose an approach based on a Heterogeneous Dynamical Graph Neural Network (HDGNN) to explicitly model and predict the cumulative impact of papers and authors. HDGNN extends heterogeneous GNNs by incorporating temporally evolving characteristics, capturing both the structural properties of the attributed graph and the growing sequence of citations. HDGNN differs significantly from previous models in its ability to model node impact dynamically while taking into account the complex relations among nodes. Experiments on a real citation dataset demonstrate its superior performance in predicting the impact of both papers and authors.
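The combination of heterogeneous structure and temporal evolution can be illustrated with a toy, scalar-valued sketch: at each time snapshot, every node aggregates the states of its neighbors separately per relation type (here hypothetical `cites` and `written_by` relations with made-up weights), and its state is then updated recurrently from its previous value. This is a generic relation-aware message-passing-plus-recurrence pattern, not the HDGNN architecture itself, which would use learned vector representations.

```python
import math
from collections import defaultdict

# Hypothetical, hand-picked per-relation weights (a real model would learn these).
REL_WEIGHTS = {"cites": 0.8, "written_by": 0.5}


def aggregate(edges, h):
    """Relation-specific mean aggregation over one graph snapshot.

    Each edge (src, rel, dst) sends src's current state to dst; messages are
    averaged per relation type, then combined with that relation's weight.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for src, rel, dst in edges:
        sums[dst][rel] += h[src]
        counts[dst][rel] += 1
    msg = {v: 0.0 for v in h}
    for v, per_rel in sums.items():
        for rel, total in per_rel.items():
            msg[v] += REL_WEIGHTS[rel] * total / counts[v][rel]
    return msg


def run_dynamic(snapshots, features, recur=0.6):
    """Process cumulative snapshots in time order with a simple recurrent update."""
    h = dict(features)
    for edges in snapshots:
        msg = aggregate(edges, h)  # computed from states before this step's update
        h = {v: math.tanh(recur * h[v] + msg[v]) for v in h}
    return h


features = {"p1": 0.1, "p2": 0.1, "p3": 0.1, "p4": 0.1, "a1": 0.1, "a2": 0.1}
snapshots = [
    [("p2", "cites", "p1"), ("p1", "written_by", "a1")],                        # year 1
    [("p2", "cites", "p1"), ("p3", "cites", "p1"), ("p1", "written_by", "a1")],  # year 2
]
states = run_dynamic(snapshots, features)
print({v: round(s, 3) for v, s in states.items()})
```

A readout layer would then map each final state to a predicted cumulative impact, for example a citation count; it is omitted here to keep the sketch short.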
Author name ambiguity makes academic information retrieval inadequate and inconvenient, which raises the need for author name disambiguation (AND). Existing AND methods fall into two categories: models that focus on content information to determine whether two papers are written by the same author, and models that focus on relation information, representing it as edges in a network and quantifying the similarity among papers. However, the former require adequate labeled samples and informative negative samples and are ineffective at measuring high-order connections among papers, while the latter need complicated feature engineering or supervision to construct the network. We propose a novel generative adversarial framework that grows the two categories of models together: (i) the discriminative module distinguishes whether two papers are from the same author, and (ii) the generative module selects possibly homogeneous papers directly from the heterogeneous information network, which eliminates complicated feature engineering. In this way, the discriminative module guides the generative module to select homogeneous papers, and the generative module generates high-quality negative samples to train the discriminative module, making it aware of high-order connections among papers. Furthermore, a self-training strategy for the discriminative module and a random-walk-based generation algorithm are designed to make training stable and efficient. Extensive experiments on two real-world AND benchmarks demonstrate that our model significantly outperforms state-of-the-art methods.
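The random-walk-based generation step can be pictured as follows: starting from a query paper, walks over the heterogeneous network (papers connected to author names, venues, and so on) reach other papers, and those visited most often become candidate homogeneous papers. This sketch uses uniform, unlearned transitions and a hypothetical node-naming scheme (`paper:`, `author:`, `venue:` prefixes); in the proposed framework the generator is trained adversarially, so its choices would be guided by the discriminator rather than uniform.

```python
import random
from collections import Counter


def random_walk_candidates(graph, start, walk_length=10, num_walks=20, seed=0):
    """Rank papers reachable from `start` by how often uniform random walks visit them.

    `graph` maps each node name to a list of neighbor names; node types are
    encoded as hypothetical string prefixes such as "paper:" or "author:".
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    counts = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_length):
            neighbors = graph.get(node, [])
            if not neighbors:
                break  # dead end: stop this walk early
            node = rng.choice(neighbors)
            if node.startswith("paper:") and node != start:
                counts[node] += 1
    return counts.most_common()


graph = {
    "paper:p1": ["author:li", "venue:kdd"],
    "paper:p2": ["author:li", "venue:kdd"],
    "paper:p3": ["venue:kdd"],
    "paper:p9": ["author:other"],
    "author:li": ["paper:p1", "paper:p2"],
    "venue:kdd": ["paper:p1", "paper:p2", "paper:p3"],
    "author:other": ["paper:p9"],
}
print(random_walk_candidates(graph, "paper:p1"))
```

Papers in a disconnected part of the network (here `paper:p9`) are never proposed, which reflects how the network structure, rather than hand-built features, constrains the candidates.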
Automated classification of research data metadata by discipline(s) of research can be used in scientometric research, by repository service providers, and in research data aggregation services. Openly available metadata from the DataCite index of research data were used to compile a large training and evaluation set comprising 609,524 records, which is published alongside this paper. These data allow classification approaches, such as tree-based models and neural networks, to be assessed reproducibly. In our experiments with 20 base classes (multi-label classification), multi-layer perceptron models perform best, with an F1-macro score of 0.760, closely followed by Long Short-Term Memory models (F1-macro score of 0.755). Possible applications of the trained classification models include the quantitative analysis of trends towards interdisciplinarity in digital scholarly output and the characterization of growth patterns of research data, stratified by discipline. Both applications can be performed at scale with the proposed models, which are available for re-use.
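For readers unfamiliar with the metric, the F1-macro score reported above is the unweighted mean of per-class F1 scores, so each of the 20 classes counts equally regardless of how often it occurs. A minimal Python sketch for multi-label data, assuming each record's labels are given as a set of class indices and that a class with no true or predicted positives contributes an F1 of 0 (a common convention, though libraries let you change it):

```python
def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 for multi-label predictions.

    y_true, y_pred: sequences of label sets, one set per record.
    """
    f1_scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if c in t and c in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if c not in t and c in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if c in t and c not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n_classes


y_true = [{0}, {0, 1}, {1}]
y_pred = [{0}, {0}, {1}]
print(macro_f1(y_true, y_pred, n_classes=2))
```

Here class 0 is predicted perfectly (F1 = 1) while class 1 misses one positive (F1 = 2/3), giving a macro average of about 0.833; a frequency-weighted average would instead favor the more common classes.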
The simultaneous estimation of many parameters $\eta_i$, based on a corresponding set of observations $x_i$, for $i=1,\ldots, n$, is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations, however, involve heterogeneous data $(x_i, \theta_i)$, where $\theta_i$ is a known nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the Nonparametric Empirical Bayes Smoothing Tweedie (NEST) estimator, which efficiently estimates $\eta_i$ and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. NEST can handle a wider range of settings than previously proposed heterogeneous approaches, as it makes no parametric assumptions about the prior distribution of $\eta_i$. The estimation framework is simple but general enough to accommodate any member of the exponential family of distributions. Our theoretical results show that NEST is asymptotically optimal, while simulation studies show that it outperforms competing methods, with substantial efficiency gains in many settings. The method is demonstrated on a data set measuring the performance gap in math scores between socioeconomically advantaged and disadvantaged students in K-12 schools.
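To make the role of Tweedie's formula concrete, the sketch below treats the normal means case, $x_i \sim N(\mu_i, \sigma_i^2)$ with $\sigma_i$ playing the role of the known nuisance parameter $\theta_i$, where Tweedie's formula gives $E[\mu_i \mid x_i] = x_i + \sigma_i^2 \, f_{\sigma_i}'(x_i)/f_{\sigma_i}(x_i)$ and $f_{\sigma_i}$ is the marginal density of the data. The marginal density and its derivative are estimated by kernel smoothing, weighting each sample by how close its $\sigma_j$ is to $\sigma_i$. This is a simplified illustration in the spirit of NEST, not the published estimator; the Gaussian kernels and the bandwidths `bw_x` and `bw_sigma` are hypothetical choices.

```python
import math


def gaussian_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def tweedie_heteroscedastic(xs, sigmas, bw_x=0.5, bw_sigma=0.5):
    """Toy Tweedie-style estimates of mu_i from x_i ~ N(mu_i, sigma_i^2).

    f_sigma(x) and f_sigma'(x) are estimated with a Gaussian kernel in x, each
    sample j weighted by the closeness of sigma_j to the target sigma. The
    bandwidths are hypothetical, untuned defaults.
    """
    estimates = []
    for x, s in zip(xs, sigmas):
        dens = 0.0   # proportional to f_s(x)
        deriv = 0.0  # proportional to f_s'(x); the shared normalizer cancels in the ratio
        for xj, sj in zip(xs, sigmas):
            w = gaussian_pdf((s - sj) / bw_sigma)  # heterogeneity weight
            u = (x - xj) / bw_x
            k = gaussian_pdf(u) / bw_x             # kernel K_b(x - x_j)
            dens += w * k
            deriv += w * (-u / bw_x) * k           # d/dx K_b(x - x_j)
        # Tweedie's formula, normal case: E[mu | x] = x + sigma^2 * f'(x) / f(x)
        estimates.append(x + s * s * deriv / dens)
    return estimates


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
sigmas = [1.0] * len(xs)
print([round(e, 3) for e in tweedie_heteroscedastic(xs, sigmas)])
```

With this symmetric toy data, the extreme observations are pulled toward the center while the middle one stays at zero, which is the shrinkage behavior that pooling information across samples is meant to produce.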