High-performance dictionary-based sentiment analysis

By Maurits van der Veen in text preprocessing, sentiment analysis

December 10, 2021

Lexicon-based sentiment analysis, using multiple lexica and scaled against representative text corpora.

MultiLexScaled offers easy-to-use, high-quality sentiment analysis. Rather than developing yet another general-purpose sentiment lexicon, we average across 8 widely used ones that have different strengths and weaknesses. We calibrate against a set of representative texts, adjusting each individual lexicon’s score so that its mean is 0 (the neutral point) and its standard deviation is 1. We then rescale the final average so that its standard deviation is also 1, which yields a sentiment measure that is readily interpretable relative to the benchmark used for scaling. MultiLexScaled outperforms other widely used sentiment analysis dictionaries on a range of test sets.
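The scaling steps described above can be illustrated with a minimal sketch. It is not the code from the repository: the function names, the toy lexica, and the way a text is scored against a single lexicon (the mean valence of matched words) are all assumptions made for illustration, and the actual implementation may differ in each of these.

```python
from statistics import mean, pstdev

def score_text(tokens, lexicon):
    """Mean valence of the tokens found in the lexicon (0.0 if none match).
    Assumption: the real per-lexicon scoring rule may be different."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return mean(hits) if hits else 0.0

def fit_scaler(values):
    """Return (mean, std) of the values; a zero std falls back to 1.0."""
    sd = pstdev(values)
    return mean(values), (sd if sd > 0 else 1.0)

def standardized_average(tokens, lexica, per_lexicon):
    """Average of the per-lexicon scores after shifting each to mean 0, std 1."""
    return mean((score_text(tokens, lex) - mu) / sd
                for lex, (mu, sd) in zip(lexica, per_lexicon))

def calibrate(lexica, calibration_texts):
    """Fit per-lexicon scalers, then the std of the averaged score."""
    per_lexicon = [fit_scaler([score_text(t, lex) for t in calibration_texts])
                   for lex in lexica]
    averaged = [standardized_average(t, lexica, per_lexicon)
                for t in calibration_texts]
    _, final_sd = fit_scaler(averaged)
    return per_lexicon, final_sd

def multilex_score(tokens, lexica, per_lexicon, final_sd):
    """Final score: standardized average, rescaled to std 1 on the benchmark."""
    return standardized_average(tokens, lexica, per_lexicon) / final_sd

# Toy data (hypothetical): two tiny lexica and four tokenized calibration texts.
toy_lexica = [
    {"good": 1.0, "bad": -1.0, "great": 2.0},
    {"good": 0.5, "awful": -2.0, "great": 1.0},
]
toy_corpus = [["good", "movie"], ["bad", "awful", "plot"],
              ["great", "cast"], ["plain", "text"]]

scalers, final_sd = calibrate(toy_lexica, toy_corpus)
toy_scores = [multilex_score(t, toy_lexica, scalers, final_sd) for t in toy_corpus]
```

On the calibration texts themselves, the resulting scores have mean 0 and standard deviation 1 by construction, so a score of 1.0 for a new text reads as one standard deviation more positive than the benchmark average.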

The GitHub repository contains all the code and notebooks needed to apply MultiLexScaled to a corpus of texts, along with a paper explaining the method in more detail.
