Abstract
Natural language processing (NLP) is a set of computational tools for processing text corpora and retrieving the underlying information and features of the data. NLP uses artificial intelligence to extract meaning from language by incorporating machine learning and data visualization. Data visualization makes it possible to identify geometrical and topological features of the data that can provide useful information about the text. Topological data analysis (TDA) provides tools for examining such topological features. The idea behind TDA is to treat each dataset as a set of points in a metric space and to identify clusters, loops and voids formed by nearby points, thereby revealing the shape of the data. Persistent homology is a well-known TDA tool for detecting these features. It applies homology to a filtration of simplicial complexes built from a finite set of points, and a key strength of the technique is its ability to find the relations, relative positions or structures that distinguish these components. In 2013, Zhu used persistent homology to capture loops and clusters for a set of nursery rhymes and children’s stories and to compare adolescent and adult writing. Inspired by Zhu’s work, we examined nursery rhymes from different continents, including Australia, Asia, Africa, Europe and North America, to study patterns of repetition of themes or ideas within a poem.
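As an illustration of how a filtration detects clusters, the following is a minimal sketch and not the pipeline used in this work or in Zhu’s. It assumes a Vietoris-Rips filtration under Euclidean distance and computes only 0-dimensional persistence (connected components) with a union-find structure; detecting loops and voids requires a full persistent homology library. The function name `h0_persistence` and the toy points are hypothetical.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the scales at which clusters merge in a Rips filtration.

    Every point starts as its own component at scale 0. As the scale grows,
    an edge appears between two points once it reaches their distance; if
    that edge joins two different components, one of them "dies" there.
    One component never dies and is not included in the result.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, ordered by the scale at which they enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj      # merge the two components
            deaths.append(dist)  # record the scale at which one component dies
    return deaths


# Toy data (assumed for illustration): two well-separated pairs of points.
toy_points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(h0_persistence(toy_points))  # [1.0, 1.0, 9.0]
```

In this toy example the two short-lived merges at scale 1 correspond to points within each pair joining, while the long-lived merge at scale 9 reflects the two clusters themselves; features that persist over a wide range of scales are the ones read as structure in the data.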