
NLP and Text Analysis

NLP (Natural Language Processing)

What is NLP?

Natural Language Processing (NLP) is a field of artificial intelligence (AI) that focuses on enabling computers to understand, interpret, and produce human language in a meaningful way. In practice, computers work with human language primarily as text. Systems that can understand, interpret, and produce text are therefore especially useful to the humanities and social sciences, whose research often centers on analyzing texts.

NLP relies on a number of core tasks. The first is tokenization, which breaks a text into smaller units, typically words or sentences. The second is Part-of-Speech (POS) Tagging, which assigns each word in a sentence its grammatical category (noun, verb, adjective, etc.). Another is Named Entity Recognition (NER), which identifies and classifies entities mentioned in a text, such as people, organizations, and locations. Finally, Sentiment Analysis determines the sentiment or emotional tone of a piece of text: positive, negative, or neutral.
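Two of these tasks, tokenization and sentiment analysis, can be illustrated with a minimal sketch that uses only the Python standard library. The word lists and function names here are hypothetical and chosen for illustration. The lexicon-counting approach is a deliberately simple stand-in for the trained models that real NLP libraries use. POS tagging and NER are left out because they generally require such trained models.

```python
import re

# Toy sentiment lexicon: hypothetical word lists, far smaller than any real resource.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize_words(text):
    """Lowercase the text and return runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def sentiment(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    tokens = tokenize_words(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


text = "I love this book. The ending was terrible!"
print(split_sentences(text))
print(tokenize_words(text))
print([sentiment(s) for s in split_sentences(text)])
```

Scoring each sentence separately labels the first sentence positive and the second negative. Scoring the whole text at once would cancel the two out and return neutral, which is one reason the unit of tokenization matters.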

Commonly Used NLP Models

BERT (Bidirectional Encoder Representations from Transformers)

GPT-3/3.5/4 (Generative Pre-trained Transformer)

ELMo (Embeddings from Language Models)

XLNet

Other Transformer Models (e.g., GPT-2, RoBERTa)

Commonly Used NLP Tools and Libraries

spaCy

NLTK (Natural Language Toolkit)

Gensim

Transformers (Hugging Face)

Scikit-learn

PyTorch and TensorFlow

BERT-based Libraries (Hugging Face's, TensorFlow's...)

Word Embeddings (GloVe, Word2Vec)

Text Analysis

What is Text Analysis?

Text Analysis in digital humanities and computational social sciences involves applying computational techniques to analyze and gain insights from large volumes of textual data. It aims to extract meaningful information, patterns, and trends from texts in order to support research and generate new knowledge in these domains. In doing so, it utilizes technologies such as machine learning, NLP and artificial intelligence.

In digital humanities, Text Analysis is used to explore various aspects of human culture, history, and literature. Researchers may analyze literary works, historical documents, cultural artifacts, and more. Common tasks include sentiment analysis, topic modeling, entity recognition, and the study of linguistic patterns. 
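One simple way to study linguistic patterns in a literary text is a keyword-in-context (KWIC) concordance, which lists every occurrence of a word together with its neighbors. The following is a minimal sketch using only the Python standard library. The function name `kwic`, the `window` parameter, and the sample sentence are assumptions made for illustration, and dedicated concordance tools offer far more options.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left, match, right) context tuples for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            # Up to `window` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


sample = "The whale surfaced. Ahab watched the whale, and the whale watched Ahab."
for left, match, right in kwic(sample, "whale", window=2):
    print(f"{left:>20} [{match}] {right}")
```

Aligning the matches in a column makes recurring phrasings around a word easy to spot by eye.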

In computational social sciences, Text Analysis is employed to study social phenomena, behaviors, and interactions through the lens of text data. This can include analyzing social media posts, online forums, survey responses, and other textual sources. Researchers aim to uncover trends, sentiments, and patterns in the data to gain insights into human behavior and society.
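As a rough illustration of uncovering trends in short texts such as social media posts, the sketch below counts the most frequent terms across a small collection after dropping common function words. The posts, the stop-word list, and the function name `top_terms` are all invented for this example. Real studies would use much larger corpora and more careful preprocessing.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects use longer ones.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "on", "are", "too"}


def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across all documents."""
    tokens = []
    for doc in documents:
        words = re.findall(r"[a-z']+", doc.lower())
        tokens.extend(w for w in words if w not in STOPWORDS)
    return Counter(tokens).most_common(n)


# Made-up example posts, not real data.
posts = [
    "The new bus routes are great",
    "Bus fares are too high",
    "Great news on the bus strike",
]
print(top_terms(posts, n=2))
```

Here "bus" appears in all three posts and "great" in two, so they rank as the top terms. Raw frequency is only a starting point, and techniques such as topic modeling build on the same kind of counts.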

Commonly Used Text Analysis Tools and Software

Voyant Tools

AntConc

MALLET

ATLAS.ti

R and Python Libraries

Gephi
