An agglomerative clustering of random variables is proposed, where clusters of random variables sharing the maximum amount of multivariate mutual information are merged successively to form larger clusters. Compared to previous info-clustering algorithms, the agglomerative approach allows the computation to stop early once clusters of the desired size and accuracy are obtained. An efficient algorithm is also derived based on the submodularity of entropy and the duality between the principal sequence of partitions and the principal sequence for submodular functions.
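For intuition, the following is a minimal sketch of such an agglomerative merge loop, not the paper's algorithm: it greedily merges the pair of clusters whose variables appear most strongly dependent, estimated from discrete samples. The average pairwise mutual information used below is only an illustrative stand-in for the multivariate mutual information computed via submodular optimization, and all function names are hypothetical.

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information (in nats) between two discrete samples."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = {a: np.mean(x == a) for a in set(x.tolist())}
    py = {b: np.mean(y == b) for b in set(y.tolist())}
    return sum((c / n) * np.log((c / n) / (px[a] * py[b]))
               for (a, b), c in joint.items())

def agglomerative_info_clustering(samples, stop_at=1):
    """samples: dict mapping variable name -> sequence of discrete samples."""
    clusters = [[v] for v in samples]                 # start from singletons
    while len(clusters) > stop_at:
        best, best_score = None, -np.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # pairwise-MI proxy for the multivariate mutual information
                score = np.mean([mutual_info(samples[u], samples[v])
                                 for u in clusters[i] for v in clusters[j]])
                if score > best_score:
                    best, best_score = (i, j), score
        i, j = best
        clusters[i] = clusters[i] + clusters[j]       # merge the best pair
        del clusters[j]
        yield [tuple(c) for c in clusters], best_score  # allows early stopping
```

Because the loop yields the partition after every merge, it can be stopped as soon as clusters of the desired size emerge, mirroring the early-stopping advantage described above.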
Motivated by the fact that entities in a social network or biological system often interact by exchanging information, we propose an efficient info-clustering algorithm that can group entities into communities using a parametric max-flow algorithm. This is a meaningful special case of the info-clustering paradigm where the dependency structure is graphical and can be learned readily from data.
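As a rough illustration of the graphical special case, the sketch below clusters entities by dropping dependency edges whose weight (e.g., an estimated pairwise mutual information) falls below a threshold and returning the connected components. This thresholding is only a simplified stand-in for the parametric max-flow computation used in the paper; the weights, the threshold semantics, and the function names are assumptions.

```python
# A minimal sketch, not the paper's parametric max-flow algorithm: given a
# weighted dependency graph (edge weight = assumed strength of the pairwise
# dependency between two entities), return the communities obtained at a
# chosen threshold gamma by keeping only stronger edges and taking connected
# components via union-find.
def communities_at_threshold(n, weighted_edges, gamma):
    """n: number of entities 0..n-1; weighted_edges: list of (u, v, w)."""
    parent = list(range(n))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, w in weighted_edges:
        if w > gamma:                 # keep only sufficiently strong links
            parent[find(u)] = find(v)

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())

# Example: one strong pair, one weak link, another strong pair.
print(communities_at_threshold(4, [(0, 1, 0.9), (1, 2, 0.2), (2, 3, 0.8)], 0.5))
# -> [[0, 1], [2, 3]]
```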
The feature-selection problem is formulated from an information-theoretic perspective. We show that the problem can be efficiently solved by an extension of the recently proposed info-clustering paradigm. This reveals the fundamental duality between feature selection and data clustering, which is a consequence of the more general duality between the principal partition and the principal lattice of partitions in combinatorial optimization.
Hierarchical Agglomerative Clustering (HAC) is one of the oldest but still most widely used clustering methods. However, HAC is notoriously hard to scale to large data sets, as the underlying complexity is at least quadratic in the number of data points and many algorithms for HAC are inherently sequential. In this paper, we propose Reciprocal Agglomerative Clustering (RAC), a distributed algorithm for HAC that uses a novel strategy to efficiently merge clusters in parallel. We prove theoretically that RAC recovers the exact solution of HAC. Furthermore, under clusterability and balancedness assumptions, we show provable speedups in total runtime due to the parallelism. We also show that these speedups are achievable for certain probabilistic data models. In extensive experiments, we show that this parallelism is achieved on real-world data sets and that the proposed RAC algorithm can recover the HAC hierarchy on billions of data points connected by trillions of edges in less than an hour.
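The core merging idea can be sketched as follows under simplifying assumptions (average linkage over points in Euclidean space and a single shared-memory round rather than a distributed implementation): clusters that are mutually nearest neighbours are merged within the same round, which is the parallelism the paper exploits. All names below are hypothetical.

```python
import numpy as np

def avg_linkage(a, b):
    """Average pairwise distance between two clusters of points (row vectors)."""
    return np.mean([np.linalg.norm(p - q) for p in a for q in b])

def rac_round(clusters):
    """One RAC-style round: merge every reciprocal nearest-neighbour pair."""
    k = len(clusters)
    nn = []
    for i in range(k):
        dists = [avg_linkage(clusters[i], clusters[j]) if j != i else np.inf
                 for j in range(k)]
        nn.append(int(np.argmin(dists)))
    merged, used = [], set()
    for i in range(k):
        j = nn[i]
        if i in used or j in used:
            continue
        if nn[j] == i and i < j:                  # reciprocal pair -> merge
            merged.append(np.vstack([clusters[i], clusters[j]]))
            used.update({i, j})
    merged += [clusters[i] for i in range(k) if i not in used]
    return merged
```

In a distributed setting each reciprocal pair can be handled by a different worker, since the pairs are disjoint by construction; repeating such rounds until one cluster remains produces the dendrogram, and the paper proves that this recovers the exact HAC solution.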
This work proposes a new resource allocation optimization and network management framework for wireless networks that uses neighborhood-based optimization rather than fully centralized or fully decentralized methods. We propose hierarchical clustering with a minimax linkage criterion for the formation of the virtual cells. Once the virtual cells are formed, we consider two cooperation models: the interference coordination model and the coordinated multi-point decoding model. In the first model, base stations in a virtual cell decode their signals independently but allocate the communication resources cooperatively. In the second model, base stations in the same virtual cell allocate the communication resources and decode their signals cooperatively. We address the resource allocation problem for each of these cooperation models. For the interference coordination model this problem is an NP-hard mixed-integer optimization problem, whereas for the coordinated multi-point decoding model it is convex. Our numerical results indicate that proper design of the neighborhood-based optimization leads to significant gains in sum rate over fully decentralized optimization, yet may also incur a significant sum-rate penalty compared to fully centralized optimization. In particular, neighborhood-based optimization has a significant sum-rate penalty compared to fully centralized optimization in the coordinated multi-point model, but not in the interference coordination model.
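To make the cell-formation step concrete, here is a minimal sketch of agglomerative clustering with the minimax linkage over illustrative 2-D base-station coordinates: the linkage between two candidate virtual cells is the smallest radius at which some base station in their union covers all the others. The data layout and function names are assumptions, and none of the subsequent resource-allocation machinery is reproduced.

```python
import numpy as np

def minimax_linkage(a, b):
    """Minimax linkage: radius of the smallest ball, centred at one of the
    points in the union of the two clusters, that covers the whole union."""
    u = np.vstack([a, b])
    d = np.linalg.norm(u[:, None, :] - u[None, :, :], axis=-1)
    return d.max(axis=1).min()

def form_virtual_cells(points, num_cells):
    """Greedily merge the pair of cells with the smallest minimax linkage
    until num_cells virtual cells remain. points: (n, 2) array."""
    cells = [points[i:i + 1] for i in range(len(points))]
    while len(cells) > num_cells:
        best, best_d = None, np.inf
        for i in range(len(cells)):
            for j in range(i + 1, len(cells)):
                dij = minimax_linkage(cells[i], cells[j])
                if dij < best_d:
                    best, best_d = (i, j), dij
        i, j = best
        cells[i] = np.vstack([cells[i], cells[j]])
        del cells[j]
    return cells
```

A known property of minimax linkage is that each cluster comes with a prototype point (the minimizing centre); in this setting that base station could plausibly serve as a natural coordination point for its virtual cell.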
This paper proposes a millimeter-wave NOMA (mmWave-NOMA) system that takes into account the end-user signal-processing capabilities, an important practical consideration. The implementation of NOMA in the downlink (DL) direction requires successive interference cancellation (SIC) to be performed at the user terminals, which comes at the cost of additional complexity. In NOMA, the weakest user only has to decode its own signal, while the strongest user has to decode the signals of all other users in the SIC procedure. Hence, the additional implementation complexity required of a user to perform SIC for DL NOMA depends on its position in the SIC decoding order. Beyond-fifth-generation (B5G) communication systems are expected to support a wide variety of end-user devices, each with its own processing capabilities. We envision a system where users report their SIC decoding capability to the base station (BS), i.e., the number of other users' signals a user is capable of decoding in the SIC procedure. We investigate the rate maximization problem in such a system by breaking it down into a user clustering and ordering problem (UCOP), followed by a power allocation problem. We propose a NOMA minimum exact cover (NOMA-MEC) heuristic algorithm that converts the UCOP into a cluster minimization problem over a derived set of valid cluster combinations after factoring in the SIC decoding capability. The complexity of NOMA-MEC is analyzed for various algorithm and system parameters. For a homogeneous system of users that all have the same decoding capability, we show that this equates to a simple maximum number of users per cluster constraint and propose a lower-complexity NOMA best beam (NOMA-BB) algorithm. Simulation results demonstrate the superiority of the proposed algorithms in terms of sum rate over orthogonal multiple access (OMA) and traditional NOMA.
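As a rough illustration of the homogeneous case only, the sketch below groups users by their preferred beam and caps each cluster at D + 1 users, where D is the common SIC decoding capability (in a cluster of that size the strongest user decodes the other D signals). The beam and gain inputs are illustrative assumptions, and this is not the NOMA-MEC or NOMA-BB algorithm itself.

```python
def cluster_users_homogeneous(best_beam, channel_gain, D):
    """best_beam[u]: preferred beam index of user u; channel_gain[u]: its gain;
    D: number of other users' signals every user can decode in SIC."""
    by_beam = {}
    for u, beam in enumerate(best_beam):
        by_beam.setdefault(beam, []).append(u)
    clusters = []
    for beam, users in by_beam.items():
        # order users by decreasing channel gain (the SIC decoding order)
        users.sort(key=lambda u: channel_gain[u], reverse=True)
        # split the beam's users into clusters of at most D + 1 users
        for k in range(0, len(users), D + 1):
            clusters.append((beam, users[k:k + D + 1]))
    return clusters

# Example: 5 users, 2 beams, each user able to decode D = 1 other signal.
print(cluster_users_homogeneous([0, 0, 1, 0, 1], [0.9, 0.5, 0.7, 0.3, 0.8], 1))
# -> [(0, [0, 1]), (0, [3]), (1, [4, 2])]
```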