The use of multi-chip modules (MCM) and/or multi-socket boards is the most suitable approach to increase the computation density of servers while keep chip yield attained. This paper introduces a new coherence protocol suitable, in terms of complexity and scalability, for this class of systems. The proposal uses two complementary ideas: (1) A mechanism that dissociates complexity from performance by means of colored-token counting, (2) A construct that optimizes performance and cost by means of two functionally symmetrical modules working in the last level cache of each chip (D|F-LLC) and each memory controller (D|F-MEM). Each of these structures is divided into two parts: (2.1) The first one consists of a small loosely inclusive sparse directory where only the most actively shared data are tracked in the chip (D-LLC) from each memory controller (D-MEM) and, (2.2) The second is a d-left Counting Bloom Filter which stores approximate information about the blocks allocated, either inside the chip (F-LLC) or in the home memory controller (F-MEM). The coordinated work of both structures minimizes the coherence-related effects on the average memory latency perceived by the processor. Our proposal is able to improve on the performance of a HyperTransport-like coherence protocol by from 25%-to-60%.