Evolution of NLP Techniques based on the Google Books Corpus

Great Ideas in current Computer Science Research

Computer Science (CS) Research is an emergent and exciting area. Classical parts of CS are being reshaped to fit a more modern concept of computing. One domain that is experiencing a renaissance is Natural Language Processing (NLP). Classical NLP tasks are being expanded to include time-series information allowing us to capture evolutionary dynamics, and not just static information.  For example, the word “bitch” was historically synonymous with a female dog, and more recently became (pejoratively) synonymous with the word “feminist.”

ACM BLOG IMG1Fig1:  The Trend of “Feminist” Over Time and Its Close Relatives

Traditional thesauruses do not contain information on when this synonymy was generated, nor the surrounding events that gave rise to this. This additional information about the historicity of the linguistic change is so innovative that it blurs the boundary between disparate disciplines: NLP and Computational Linguistics. This added dimension also allows us to challenge the foundations of traditional NLP research.

Language is the foundation of civilization. The story of the Tower of Babel in the Bible describes language as the uniting force among humanity, the key to its technological advancement and ability to become like G-d. Speaking one same language, Babel’s inhabitants were able to work together to develop a city and build a tower high enough to reach heaven. Seeing this, G-d mixes up their language, taking away the source of the inhabitants’ power by breaking down their mutual understanding. This story illustrates the power and cultural significance of universal language. Continue reading