Understanding electrical energy demand at the consumer level plays an important role in planning the distribution of electrical networks and in offering off-peak tariffs, but observing individual consumption patterns is still expensive. Aggregated load curves, on the other hand, are normally available at the substation level. The proposed methodology separates substation aggregated loads into estimated mean consumption curves, called typical curves, while incorporating information given by explanatory variables. In addition, a model-based clustering approach for substations is proposed, based on the similarity of their consumers' typical curves and covariance structures. The methodology is applied to a real substation load monitoring dataset from the United Kingdom and tested in eight simulated scenarios.
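As a rough illustration of the disaggregation idea only, the sketch below assumes that each substation's load at a given time is the sum of each consumer type's typical curve weighted by the number of consumers of that type, and recovers two typical curves by ordinary least squares at each time point. This is a simplified stand-in rather than the method of the abstract: it ignores explanatory variables and covariance structures, and the function name, the two-type restriction, and the toy data are all assumptions made for illustration.

```python
def estimate_typical_curves(counts, loads):
    """Least squares estimate of two typical curves from aggregated loads.

    counts: list of (n1, n2), the number of type-1 and type-2 consumers
            at each substation (assumed known).
    loads:  list of aggregated load curves, one list of floats per substation.
    Assumed model: load_i(t) = n1_i * curve1(t) + n2_i * curve2(t) + noise.
    """
    # Entries of the 2x2 normal-equation matrix (the same for every time point).
    s11 = sum(n1 * n1 for n1, _ in counts)
    s12 = sum(n1 * n2 for n1, n2 in counts)
    s22 = sum(n2 * n2 for _, n2 in counts)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("consumer counts are collinear; curves not identifiable")

    curve1, curve2 = [], []
    for t in range(len(loads[0])):
        # Right-hand side of the normal equations at time t.
        b1 = sum(n1 * y[t] for (n1, _), y in zip(counts, loads))
        b2 = sum(n2 * y[t] for (_, n2), y in zip(counts, loads))
        # Closed-form solution of the 2x2 linear system.
        curve1.append((s22 * b1 - s12 * b2) / det)
        curve2.append((s11 * b2 - s12 * b1) / det)
    return curve1, curve2


# Toy, noise-free example with made-up curves and consumer counts.
true1 = [1.0, 2.0, 1.5]
true2 = [0.5, 0.5, 3.0]
counts = [(10, 5), (3, 8), (6, 6)]
loads = [[n1 * a + n2 * b for a, b in zip(true1, true2)] for n1, n2 in counts]
est1, est2 = estimate_typical_curves(counts, loads)
```

With noise-free data the estimates coincide with the generating curves up to floating-point error; with noisy data they would only approximate them.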
The identification of precipitation regimes is important for many purposes such as agricultural planning, water resource management, and return period estimation. Since precipitation and other related meteorological data typically exhibit spatial dependency and different characteristics at different time scales, clustering such data presents unique challenges. In this paper, we develop a flexible model-based approach to cluster multi-scale spatial functional data to address such problems. The underlying clustering model is a functional linear model, and the cluster memberships are assumed to be a realization from a Markov random field with geographic covariates. The methodology is applied to a precipitation dataset from China to identify precipitation regimes.
Evolutionary models of languages are usually considered to take the form of trees. With the development of so-called tree constraints, the plausibility of the tree model assumptions can be addressed by checking whether the moments of observed variables lie within regions consistent with trees. In our linguistic application, the data set comprises acoustic samples (audio recordings) from speakers of five Romance languages or dialects. We wish to assess these functional data for compatibility with a hereditary tree model at the language level. A novel combination of canonical function analysis (CFA) with a separable covariance structure provides a method for generating a representative basis for the data. The resulting basis is formed of components which emphasize language differences whilst maintaining the integrity of the observational language groupings. A previously unexploited Gaussian tree constraint is then applied to component-by-component projections of the data to investigate adherence to an evolutionary tree. The results indicate that while a tree model is unlikely to be suitable for modeling all aspects of the acoustic linguistic data, certain features of the spoken Romance languages highlighted by the separable-CFA basis may indeed be suitably modeled as a tree.
Electrical load profiling supports retailers and distribution network operators in having a better understanding of the consumption behavior of consumers. However, traditional clustering methods for load profiling are centralized and require access to all the smart meter data, thus causing privacy issues for consumers and retailers. To tackle this issue, we propose a privacy-preserving distributed clustering framework for load profiling by developing a privacy-preserving accelerated average consensus (PP-AAC) algorithm with proven convergence. Using the proposed framework, we modify several commonly used clustering methods, including k-means, fuzzy C-means, and Gaussian mixture model, to provide privacy-preserving distributed clustering methods. In this way, load profiling can be performed only by local calculations and information sharing between neighboring data owners without sacrificing privacy. Meanwhile, compared to traditional centralized clustering methods, the computational time consumed by each data owner is significantly reduced. The privacy and complexity of the proposed privacy-preserving distributed clustering framework are analyzed. The correctness, efficiency, effectiveness, and privacy-preserving feature of the proposed framework and the proposed PP-AAC algorithm are verified using a real-world Irish residential dataset.
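To make the role of average consensus concrete, here is a minimal sketch of the plain, textbook version: each data owner holds a local value, exchanges values only with its neighbors, and all owners converge to the global mean without any central party. This is not the PP-AAC algorithm of the abstract; it has neither the acceleration nor the privacy-preserving mechanism. The ring topology, the step size, and all names are assumptions chosen for illustration.

```python
def average_consensus(values, neighbors, step=0.3, iterations=200):
    """Plain average consensus on an undirected, connected graph.

    values:    initial local value held by each node.
    neighbors: dict mapping node index -> list of neighboring node indices.
    step:      mixing weight; it must be small enough for the graph at hand
               (0.3 is adequate for the 4-node ring used below).
    """
    x = list(values)
    for _ in range(iterations):
        # Every node moves toward its neighbors' current values, using only
        # information exchanged locally. The list comprehension reads the old
        # x throughout, so the update is synchronous.
        x = [
            xi + step * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)
        ]
    return x


# Four hypothetical data owners arranged in a ring.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
local_values = [1.0, 2.0, 3.0, 6.0]
result = average_consensus(local_values, neighbors)
```

In a distributed k-means setting, a routine of this kind could in principle be used to agree on per-cluster sums and counts, from which each owner computes the same updated centroids locally. How the framework in the abstract actually does this is not specified here.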
Large-scale deployment of smart meters has made it possible to collect sufficient and high-resolution data of residential electric demand profiles. Clustering analysis of these profiles is important to further analyze and comment on electricity consumption patterns. Although many clustering techniques have been proposed in the literature over the years, it is often noticed that different techniques fit best for different datasets. To identify the most suitable technique, standard clustering validity indices are often used. These indices focus primarily on the intrinsic characteristics of the clustering results. Moreover, different indices often give conflicting recommendations which can only be clarified with heuristics about the dataset and/or the expected cluster structures -- information that is rarely available in practical situations. This paper presents a novel scheme to validate and compare the clustering results objectively. Additionally, the proposed scheme considers all the steps prior to the clustering algorithm, including the pre-processing and dimensionality reduction steps, in order to provide recommendations over the complete framework. Accordingly, the proposed strategy is shown to provide better, unbiased, and uniform recommendations as compared to the standard Clustering Validity Indices.
There is increasing appetite for analysing multiple network data. This differs from analysing traditional data sets in that each observation in the data now comprises a network. Recent technological advancements have allowed the collection of this type of data in a range of different applications. This has inspired researchers to develop statistical models that most accurately describe the probabilistic mechanism that generates a network population and to use these models to make inferences about the underlying structure of the network data. Only a few studies developed to date consider the heterogeneity that can exist in a network population. We propose a Mixture of Measurement Error Models for identifying clusters of networks in a network population, with respect to similarities detected in the connectivity patterns among the networks' nodes. Extensive simulation studies show that our model performs well both in clustering multiple network data and in inferring the model parameters. We further apply our model to two real-world multiple network data sets from the fields of Computing (Human Tracking Systems) and Neuroscience.