Federated learning (FL) refers to adapting a central model using data sets held by multiple remote users. Two common challenges in FL are that the training sets obtained by different users are often heterogeneous, i.e., drawn from different sample distributions, and that large amounts of data must be communicated between the users and the central server over the typically expensive uplink channel. In this work, we formulate an FL problem in which different clusters of users observe labeled samples drawn from different distributions, while operating under constraints on the communication overhead. For such settings, we identify an inherent tradeoff, induced by the combination of statistical heterogeneity and communication constraints, between the ability of the users in each cluster to learn a suitable model and the accuracy with which these models are aggregated into a global inference rule. We propose an algorithm for such communication-aware clustered FL scenarios, based on multi-source adaptation methods, that balances these two performance measures, and we demonstrate that it achieves improved inference compared with conventional federated averaging without inducing additional communication overhead.
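To make the federated averaging baseline and the effect of clustered heterogeneity concrete, the following is a minimal sketch under stated assumptions: a scalar linear model y = w*x, noiseless synthetic data, two clusters of users with different true slopes, and plain local SGD. The helper names (`local_sgd`, `fedavg_round`, `make_user`) and all parameter values are hypothetical and chosen for illustration only. This is not the multi-source adaptation algorithm proposed in this work, and it does not model any communication constraint.

```python
import random


def local_sgd(w, data, lr=0.05, epochs=5):
    """Plain SGD on the squared loss (w*x - y)^2 / 2 for a scalar model y = w*x."""
    for _ in range(epochs):
        for x, y in data:
            grad = (w * x - y) * x
            w -= lr * grad
    return w


def fedavg_round(w_global, user_datasets, lr=0.05, epochs=5):
    """One federated averaging round: every user starts from w_global,
    trains locally, and the server averages the returned models
    weighted by the users' local sample counts."""
    total = sum(len(d) for d in user_datasets)
    new_w = 0.0
    for data in user_datasets:
        w_k = local_sgd(w_global, data, lr, epochs)
        new_w += len(data) / total * w_k
    return new_w


def make_user(slope, n, rng):
    """Hypothetical data generator: noiseless samples y = slope * x, x ~ U(-1, 1)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, slope * x) for x in xs]


rng = random.Random(0)
# Assumed setup: two clusters whose users follow different slopes
# (statistical heterogeneity across clusters).
cluster_a = [make_user(1.0, 20, rng) for _ in range(3)]
cluster_b = [make_user(3.0, 20, rng) for _ in range(3)]
users = cluster_a + cluster_b

w = 0.0
for _ in range(50):
    w = fedavg_round(w, users)
print(round(w, 2))
```

Because each local SGD step moves the model toward its own cluster's slope, the averaged model settles strictly between the two slopes and fits neither cluster exactly. This toy setting illustrates why a single globally averaged model can be a poor fit when clusters of users observe different distributions.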