LIWC
Linguistic Inquiry and Word Count (LIWC) is the gold standard for research in the field of Language Psychology. Created by Dr. James W. Pennebaker at the University of Texas, the software was originally used to examine the therapeutic value of writing by analyzing the frequency of psychologically relevant linguistic features of text. Since its inception, the various LIWC dimensions have been validated and addressed in published research, and LIWC has been the basis for over 25,000 academic publications in a variety of fields. These publications cover topics such as power dynamics, thinking styles, motivations, communication dynamics, personality, consumer behavior, group dynamics, culture, and interpersonal relationships, among others.
LIWC 2015 classifies language into 94 psychologically-relevant categories, and LIWC22 has expanded that number to 102. These categories are defined by curated collections of words. Some categories consist primarily of function words (e.g., articles, pronouns), others focus on content words (e.g., biological processes, positive emotion), and many blend both content and function words (e.g., affiliation).
Content words capture what people are communicating about, which is the focus of traditional Natural Language Processing (NLP) methods like topic modeling and sentiment analysis. Function words capture how people communicate. Though often discarded as “stop words” in NLP, they carry rich psychological signals, reflecting states, traits, and values. Importantly, function words account for about 55% of everyday language and are processed largely subconsciously. For example, in the sentence “We should discuss the future priorities for our team to ensure success,” the words we, should, the, for, our, and to are function words, while discuss, future, priorities, team, ensure, and success are content words. LIWC analyzes both content and function words to detect psychological patterns.
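The split described above can be sketched in a few lines of Python. This is only an illustration: `FUNCTION_WORDS` is a hypothetical, hand-picked set covering just the example sentence, not the LIWC dictionary, and `split_function_content` is a name invented for this sketch.

```python
import re

# Hypothetical, deliberately tiny function-word list for illustration only;
# it is NOT the LIWC dictionary.
FUNCTION_WORDS = {"we", "should", "the", "for", "our", "to"}

def split_function_content(text):
    """Return (function_words, content_words) found in text, in order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    function = [t for t in tokens if t in FUNCTION_WORDS]
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return function, content

sentence = "We should discuss the future priorities for our team to ensure success."
function, content = split_function_content(sentence)

# Share of tokens in the sentence that are function words.
function_share = len(function) / (len(function) + len(content))

print(function)        # ['we', 'should', 'the', 'for', 'our', 'to']
print(content)         # ['discuss', 'future', 'priorities', 'team', 'ensure', 'success']
print(function_share)  # 0.5
```

In this one sentence, half of the twelve words are function words, which is in line with the roughly 55% figure cited for everyday language.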
While early versions of LIWC were developed manually, later updates have added semantic vector networks, thematic analysis, meaning extraction, and other machine-learning techniques, all used in conjunction with human-led analysis and decision-making. Through the Receptiviti API, LIWC can be accessed programmatically, making it possible to integrate psychological language analysis into applications at scale. A sample API response is shown below; sections unrelated to LIWC are abbreviated as `{...}`.
```json
{
  "plan_usage": {
    "word_limit": 250000,
    "words_used": 1282,
    "words_remaining": 248718,
    "percent_used": 0.51,
    "start_date": "2024-01-01T00:00:00Z",
    "end_date": "2024-01-31T23:59:59Z"
  },
  "results": [
    {
      "response_id": "f2ef969d-c96b-4adc-b78b-cd3cba8111f8",
      "language": "en",
      "version": "v1.0.0",
      "summary": {
        "word_count": 3,
        "words_per_sentence": 3,
        "sentence_count": 1,
        "six_plus_words": 0.6666666666666666,
        "capitals": 0.043478260869565216,
        "emojis": 0,
        "emoticons": 0,
        "hashtags": 0,
        "urls": 0
      },
      "personality": {...},
      "social_dynamics": {...},
      "drives": {...},
      "cognition": {...},
      "additional_indicators": {...},
      "sallee": {...},
      "liwc": {
        "analytical_thinking": 0.9325858951175406,
        "clout": 0.5,
        "authentic": 0.01,
        "emotional_tone": 0.99,
        "six_plus_words": 0.6666666666666666,
        "dictionary_words": 0.6666666666666666,
        "function_words": 0,
        "pronouns": 0,
        "personal_pronouns": 0,
        "i": 0,
        "we": 0,
        "you": 0,
        "she_he": 0,
        "they": 0,
        "impersonal_pronouns": 0,
        "articles": 0,
        "prepositions": 0,
        "auxiliary_verbs": 0,
        "adverbs": 0,
        "conjunctions": 0,
        "negations": 0,
        "other_grammar": 0.3333333333333333,
        "verbs": 0,
        "adjectives": 0,
        "comparisons": 0,
        "interrogatives": 0,
        "numbers": 0,
        "quantifiers": 0.3333333333333333,
        "affective_processes": 0.3333333333333333,
        "positive_emotion_words": 0.3333333333333333,
        "negative_emotion_words": 0,
        "anxiety_words": 0,
        "anger_words": 0,
        "sad_words": 0,
        "social_processes": 0,
        "family": 0,
        "friends": 0,
        "female": 0,
        "male": 0,
        "cognitive_processes": 0.3333333333333333,
        "insight": 0,
        "causation": 0,
        "discrepancies": 0,
        "tentative": 0.3333333333333333,
        "certainty": 0,
        "differentiation": 0,
        "perceptual_processes": 0,
        "see": 0,
        "hear": 0,
        "feel": 0,
        "biological_processes": 0,
        "body": 0,
        "health": 0,
        "sexual": 0,
        "ingestion": 0,
        "drives": 0,
        "affiliation": 0,
        "achievement": 0,
        "power": 0,
        "reward": 0,
        "risk": 0,
        "time_orientation": 0,
        "focus_past": 0,
        "focus_present": 0,
        "focus_future": 0,
        "relativity": 0,
        "motion": 0,
        "space": 0,
        "time": 0,
        "personal_concerns": 0,
        "work": 0,
        "leisure": 0,
        "home": 0,
        "money": 0,
        "religion": 0,
        "death": 0,
        "informal_language": 0,
        "swear_words": 0,
        "netspeak": 0,
        "assent": 0,
        "nonfluencies": 0,
        "filler_words": 0,
        "all_punctuation": 0.3333333333333333,
        "periods": 0,
        "commas": 0,
        "colons": 0,
        "semicolons": 0,
        "question_marks": 0,
        "exclamations": 0,
        "dashes": 0,
        "quotes": 0,
        "apostrophes": 0,
        "parentheses": 0,
        "other_punctuation": 0.3333333333333333
      }
    }
  ]
}
```
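Most LIWC scores for a short text are zero, so a common first step is to keep only the measures that fired. The sketch below parses a trimmed copy of the sample response above and filters the `liwc` block. It makes no request to the API; the helper name `nonzero_liwc_scores` is an assumption made for this illustration, and the field names are taken from the sample.

```python
import json

# Trimmed copy of the sample response above; only a few fields are kept.
sample_response = """
{
  "results": [
    {
      "language": "en",
      "summary": {"word_count": 3, "sentence_count": 1},
      "liwc": {
        "analytical_thinking": 0.9325858951175406,
        "function_words": 0,
        "quantifiers": 0.3333333333333333,
        "positive_emotion_words": 0.3333333333333333,
        "tentative": 0.3333333333333333,
        "negations": 0
      }
    }
  ]
}
"""

def nonzero_liwc_scores(response_text):
    """Return one {measure: score} dict per result, keeping only scores above zero."""
    payload = json.loads(response_text)
    return [
        {name: score for name, score in result["liwc"].items() if score > 0}
        for result in payload["results"]
    ]

scores = nonzero_liwc_scores(sample_response)

# Print the measures for the first result, highest score first.
for name, score in sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Iterating over `results` rather than indexing a single entry keeps the helper usable when a response carries more than one analyzed sample.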
Measures
| Category | Measure | Examples |
|---|---|---|
| Summary Language Variables | ||
| Linguistic Dimensions | ||
| Other Grammar | ||
| Psychological Processes | ||
The Relationship between LIWC and Receptiviti
Receptiviti operates as the commercial arm of LIWC (the academic offering). Members of our science and sales teams have worked in Pennebaker’s lab and contributed to various iterations of LIWC. Today, our science team continues to evolve and expand the science, while the customer-facing side of our company supports its application. The science team has developed over 2,600 additional categories using the same validated methodology established in LIWC's original design. Some of these are made available to customers as proportional frameworks (like LIWC Extension), while others are used internally.
Categories are a common way we refer to our proportional measures. The term is derived from the dictionary approach used to design the measures, which, at a high level, involves determining sets of words that are psychometrically related.
Receptiviti’s proportional measures, together with LIWC, serve as the foundational ingredients of Receptiviti’s normed algorithmic measures. Our normed algorithmic measures are never based on black-box machine learning models. Instead, they are built from the ground up using interpretable components: formulas shaped through a combination of theoretical insight and data-driven methods. We rely on transparent techniques such as regression and decision trees, ensuring every part of a measure remains understandable and explainable.
Theory-based methods draw on more than 25,000 published studies that have used LIWC to predict traits, states, behaviors, and other outcomes. These findings inform how we construct each measure. Data-driven methods include statistical approaches such as principal component analysis, regression modeling, and embeddings to test performance and inform and validate design using our internal datasets.
Working with LIWC
The LIWC dictionary is composed of approximately 6,400 words, word stems, and select emoticons.
LIWC measures range from 0 to 1. A score of 0 means that no word from the category appeared in the text. Any score above 0 means that at least one word from the category appeared, and the score is the ratio of words in that category to the total number of words in the submitted text sample.
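As a rough illustration of this proportional scoring, the sketch below divides the number of category hits by the total word count. The word lists in `CATEGORY_WORDS` are hypothetical stand-ins, not the real LIWC dictionary, and the simple regex tokenizer is an assumption rather than the tokenizer the API uses.

```python
import re

# Hypothetical mini-dictionary for illustration; not the real LIWC word lists.
CATEGORY_WORDS = {
    "positive_emotion_words": {"happy", "great"},
    "negations": {"not", "never"},
}

def category_scores(text):
    """Score each category as (words in category) / (total words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    result = {}
    for category, words in CATEGORY_WORDS.items():
        hits = sum(1 for token in tokens if token in words)
        result[category] = hits / total if total else 0.0
    return result

proportions = category_scores("I am happy today")
print(proportions)  # {'positive_emotion_words': 0.25, 'negations': 0.0}
```

One of the four words is in the positive emotion list, giving 0.25; no negation appears, so that score stays at 0. The same arithmetic explains the 0.333 values in a three-word sample.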
When using LIWC, it’s important to remember that some words can fall into more than one category. For example, the word cried is part of five different categories: sadness, negative emotion, overall affect, verbs, and past focus. Hence, if the word cried is found in your text sample, each of these five sub-dictionary scores will be incremented.
As in the example with the word cried, many of the LIWC categories are arranged hierarchically. All sadness words, by definition, belong to the broader negative emotion category, as well as the overall affect words category.
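A minimal sketch of this one-word, many-categories behavior follows. The mapping is built only from the cried example above; the category labels are written for readability and are not confirmed API field names, and `count_categories` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical word-to-category mapping based on the "cried" example above;
# the labels are illustrative, not confirmed API field names.
WORD_CATEGORIES = {
    "cried": ["sadness", "negative_emotion", "affect", "verbs", "focus_past"],
}

def count_categories(tokens):
    """Increment every category a token belongs to, including parent categories."""
    counts = Counter()
    for token in tokens:
        for category in WORD_CATEGORIES.get(token, []):
            counts[category] += 1
    return counts

cat_counts = count_categories(["she", "cried"])
print(len(cat_counts))         # 5
print(cat_counts["sadness"])   # 1
print(cat_counts["affect"])    # 1
```

Because the child category (sadness) and its parents (negative emotion, overall affect) are all incremented by the same token, category scores should not be summed across levels of the hierarchy.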
Word stems can also be captured by LIWC. For example, the dictionary includes the stem hungr*, which allows for any word in your sample that matches the first five letters to be counted as an ingestion word (including hungry, hungrier, hungriest). The asterisk denotes the acceptance of all letters, hyphens, or numbers following its appearance.
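Stem matching of this kind can be approximated with a regular expression. The sketch below assumes, following the description above, that a trailing asterisk accepts any run of letters, digits, or hyphens; `stem_to_regex` is a name invented for this illustration and is not part of LIWC or the Receptiviti API.

```python
import re

def stem_to_regex(entry):
    """Compile a dictionary entry; a trailing * accepts any letters, digits, or hyphens."""
    if entry.endswith("*"):
        pattern = re.escape(entry[:-1]) + r"[a-z0-9-]*"
    else:
        pattern = re.escape(entry)
    return re.compile(pattern, re.IGNORECASE)

hungr = stem_to_regex("hungr*")
candidates = ["hungry", "hungrier", "hungriest", "hunger", "Hungry"]

# fullmatch requires the whole word to fit the stem pattern.
matches = [word for word in candidates if hungr.fullmatch(word)]
print(matches)  # ['hungry', 'hungrier', 'hungriest', 'Hungry']
```

Note that hunger is not matched, because its fifth letter is e rather than r, so it would need its own dictionary entry.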
LIWC API vs LIWC Academic Desktop Processing Differences: What to Expect
When using the API, there are a few key differences from the desktop application. Here’s what you should know:
| Feature | API Behavior | Desktop Behavior |
|---|---|---|
| URLs | Counts the URL as a unit, but does not analyze the words or punctuation inside it. | Counts both punctuation and words within the URL. |
| Hashtags | Counts # under OtherP (other punctuation). Attempts to split the following word unless it matches a dictionary entry exactly. | Doesn’t count # under OtherP. Only scores the word if it matches the dictionary as a whole. |
| Numbers with Punctuation | Treats punctuated numbers as a single item. E.g., 20,000.00 = one number. | Counts each part separately (e.g., 20,000.00 = three numbers). |
| Parentheses | Counts full pairs as one unit, and adds 1 to OtherP for a pair like (some text). | Also counts parentheses in pairs, but adds 2 to OtherP for (some text). |
| Dictionary Bigrams | Counts each word in a bigram (e.g., "each other") separately. | Treats dictionary bigrams as one token, affecting word and six-letter word counts. |
| Titles (Mr., Ms.) | Counts titles both with and without periods as words; the period doesn't trigger a sentence break. | Titles with periods cause a sentence break; both forms are skipped unless matched in the dictionary. |
| OtherP Category | Recognizes more symbols (including mathematical ones). | Fewer symbols are assigned to this category. |
| Ellipses (...) | Groups all ellipsis forms under OtherP as a single unit. | Treats ... as separate punctuation unless encoded as a true ellipsis. |
| Hyphens & Apostrophes | Always splits on hyphens and apostrophes to count words independently. | Does not split, potentially causing proportional measures > 1. |
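To make one of these rows concrete, the sketch below contrasts the two ways of counting punctuated numbers. It is only an approximation written for this page, not the tokenizer used by the API or the desktop application; both function names and both regular expressions are assumptions.

```python
import re

def count_numbers_grouped(text):
    """Count digit runs joined by commas or periods as a single number (API-like)."""
    return len(re.findall(r"\d+(?:[.,]\d+)*", text))

def count_numbers_split(text):
    """Count every digit run separately (desktop-like)."""
    return len(re.findall(r"\d+", text))

sample = "The invoice total was 20,000.00 dollars"
print(count_numbers_grouped(sample))  # 1
print(count_numbers_split(sample))    # 3
```

Differences like this change the numerator of the numbers measure, so scores for number-heavy text may not match exactly between the API and the desktop application.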
Further Reading
- LIWC 2015 User Manual
- LIWC22 User Manual
- The psychological meaning of words: LIWC and computerized text analysis methods
- Dr. James Pennebaker’s TED Talk
- Psychological Aspects of Natural Language Use: Our Words, Our Selves
- What do we know when we LIWC a person? Text analysis as an assessment tool for traits, personal concerns and life stories.
Contact us for further reading or research materials that are specific to your use case.