Linked text representation is critical for many intelligent web applications, such as online advertisement and recommender systems. Recent breakthroughs on pretrained language models and graph neural networks facilitate the development of corresponding techniques. However, the existing works mainly rely on cascaded model structures: the texts are independently encoded by language models at first, and the textual embeddings are further aggregated by graph neural networks. We argue that the neighbourhood information is insufficiently utilized within the above process, which restricts the representation quality. In this work, we propose GraphFormers, where graph neural networks are nested alongside each transformer layer of the language models. On top of the above architecture, the linked texts will iteratively extract neighbourhood information for the enhancement of their own semantics. Such an iterative workflow gives rise to more effective utilization of neighbourhood information, which contributes to the representation quality. We further introduce an adaptation called unidirectional GraphFormers, which is much more efficient and comparably effective; and we leverage a pretraining strategy called the neighbourhood-aware masked language modeling to enhance the training effect. We perform extensive experiment studies with three large-scale linked text datasets, whose results verify the effectiveness of our proposed methods.