Abstract
Knowing the locations of tweets can benefit a wide variety of applications such as venue recommendation, event detection, and disaster monitoring. However, fine-grained tweet geolocation prediction is challenging because tweets are short and therefore may contain no geo-indicative words, or may contain ambiguous, noisy information. Existing solutions either yield unsatisfactory accuracy in practical applications or make predictions that even experts struggle to interpret, failing to engender sufficient trust and actionability for real-world deployment. Our paper presents a tweet geolocation prediction framework, EDGE (Entity-Diffusion Gaussian Ensemble), which delivers predictions that are both accurate and highly interpretable without requiring any additional contextual information such as user profiles or location history. In EDGE, we cast the geolocation problem as a neural network optimization problem by learning probabilistic generative models. Compared with existing works, EDGE has two distinctive features: (1) the inference builds on mining the correlation between non-geo-indicative entities and geo-indicative entities by diffusing their semantic embeddings over a constructed graph neural network (Entity Diffusion), and (2) each prediction result is represented as a Gaussian mixture instead of specific geographical coordinates (Gaussian Ensemble). Extensive experiments on real-world tweet datasets validate the superiority of EDGE over the state of the art in terms of all distance-based and POI-based metrics.
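
To make the second feature concrete, the following is a minimal sketch of what representing a prediction as a Gaussian mixture over coordinates could look like. It is not EDGE's implementation: the names `Component`, `mixture_density`, and `point_estimate`, the diagonal-covariance assumption, and the example coordinates are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One mixture component (hypothetical structure, not from the paper)."""
    weight: float  # mixing weight; weights across components sum to 1
    mean: tuple    # (latitude, longitude) in degrees
    std: tuple     # (std_lat, std_lon); diagonal covariance is an assumption


def component_density(c, point):
    """Density of one axis-aligned 2D Gaussian evaluated at `point`."""
    density = 1.0
    for x, mu, sigma in zip(point, c.mean, c.std):
        z = (x - mu) / sigma
        density *= math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density


def mixture_density(components, point):
    """Weighted sum of the component densities at `point`."""
    return sum(c.weight * component_density(c, point) for c in components)


def point_estimate(components):
    """Collapse the mixture to one coordinate: the component mean with
    the highest mixture density (one simple choice among several)."""
    return max((c.mean for c in components),
               key=lambda m: mixture_density(components, m))


# Made-up prediction with two candidate areas for a single tweet.
prediction = [
    Component(weight=0.7, mean=(40.75, -73.99), std=(0.01, 0.01)),
    Component(weight=0.3, mean=(40.68, -73.94), std=(0.02, 0.02)),
]
best = point_estimate(prediction)
```

Keeping the full mixture preserves the uncertainty of the prediction (several plausible areas with different weights and spreads), while `point_estimate` shows how a single coordinate can still be derived when a distance-based metric requires one.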