Knowledge Base
The Knowledge Base explores the deeper scientific and methodological foundations behind our language-based measures and frameworks.
Each topic addresses a focused question about interpretation, context, or methodology, pairing concise summaries with deeper discussions. These entries draw from psycholinguistic theory, statistical validation, and applied research to help you interpret results responsibly and confidently.
Use this section as a companion to the main documentation when you need insight into the reasoning, assumptions, or science that underpins the frameworks.
How do self-reported and peer-reported personality insights compare with language-based personality and psychology insights?
Language-based personality insights correlate more strongly with behavioural outcomes than with self-reports, which are shaped by bias and self-perception. Because language is a direct behavioural signal, it offers a more objective view of how people actually think and interact — behaviour best predicts behaviour, while surveys best predict survey responses.
Detailed explanation
In general, Receptiviti's language measures will correlate more strongly with objective behavioural measures (e.g., job performance) than self-report measures. This is due to the biases inherent in self-report and the methodological differences between behaviour measures (e.g., language measures and behavioural outcome data) and survey inventories (i.e., self-report). Language-based scores are behavioural data in and of themselves. Language is a core behavioural mechanism for interpersonal engagement that reflects how people think, communicate, and respond in the environments that matter. Behaviour is the best predictor of behaviour, and surveys are the best predictors of survey measures due to common method variance.
For more information, the section Correlations with Self-Reports in the Big 5 documentation describes the process of using survey-based self-reports and peer-reports as a method of validating Receptiviti personality measures.
Regarding how survey-based psychological analysis and language-based psychological analysis complement each other, there are several concepts to keep in mind.
Self-reported personality data collected through surveys is not ground truth data. In other words, self-reported personality assessments are not a perfect reflection of a person's “true” personality and, thus, are not a gold-standard that language-based assessments aim to replicate. Rather, self-reported and language-based assessments offer different perspectives on an individual's character, each with their own insights.
Self-reported personality surveys reflect how a person sees themselves (i.e., their self-concept), and, thus, come with various biases and inaccuracies, including:
- Social Desirability Bias: In order to portray themselves favorably, respondents tend to both consciously and subconsciously answer questions in a manner they perceive as socially acceptable rather than provide truthful responses. For instance, individuals often exaggerate positive traits such as friendliness or diligence, while minimizing negative traits like aggression or selfishness.
- Self-Other Knowledge Asymmetry: Individuals have more information with which to judge their own internal traits or behavioural expressions (e.g., feelings, thoughts) than their external ones (e.g., voice, gestures). Peers, by contrast, have less information about others’ internal thoughts and feelings but more information about external-facing behaviours. This asymmetry in information between self and others causes an asymmetry in accuracy when assessing internal and external traits via surveys. Said differently, self-reported personality tends to be more accurate when individuals are reporting traits that are more internally focused (like neuroticism) and less accurate when assessing externalized traits (like agreeableness).
- Reference Group Bias: Individuals’ self-perception and reported personality can be skewed by the characteristics and norms of their immediate social circles. For instance, an individual surrounded by introverts may perceive themselves as extraverted because they are more sociable than those within their immediate social circle. However, when assessed against societal standards, it may be clear that they exhibit characteristics more aligned with introversion.
Unlike self-reported personality, language-based personality insights describe how a person comes across to other people in a specific context (or if enough language data is collected, more generally) based on their verbal/written behaviour. Using language also comes with many benefits, including:
- Minimizing Response Bias: Psycholinguistic analysis focuses on function word usage. Because function words are processed largely subconsciously, it is difficult for individuals to change their language to present themselves in a manner that is not aligned with their “true” personality. Also, when compared to individuals’ self-reported personalities, language-based personality can reveal which traits people are trying to mask or are less conscious of.
- Minimizing Observer Bias: When you use Receptiviti’s psycholinguistic dimensions to assess personality, everyone is measured with the same yardstick. By ensuring respondents are evaluated consistently, the insights derived from personality evaluations can be compared effectively.
- Ease of Scalability: Given that language serves as the primary method of communication and social interaction, collecting large quantities of language data samples is straightforward and more reflective of how a person behaves in real-time/real-world contexts. Moreover, language processing is automated with Receptiviti’s API, so personality insights can be derived across hundreds of individuals in seconds.
- Accounting for Self-Other Knowledge Asymmetry: In the development of our personality frameworks, we consider both peer-reported and self-reported results. We are able to do this because (a) individuals express their more internally focused personality traits through specific linguistic cues, and (b) others are able to infer personality traits through specific linguistic cues.
Here are a few resources that provide further information about the validity of language-based personality assessment:
- An academic book chapter that discusses language-based personality insights ("Natural Language Use as a Marker of Personality").
- A research article entitled “Self-Other Knowledge Asymmetries in Personality Pathology.”
- Our blog post entitled Your Personality Assessment Isn’t Objective overviews the differences between personality assessment via self report vs language.
How does context impact scores? How should context be accounted for?
Receptiviti’s models balance trait-like stability with context-sensitive variability, identifying which language signals reflect enduring psychological patterns and which shift with situation. Context isn’t removed—it’s measured and normed—so insights remain reliable across domains while staying sensitive to setting, tone, and purpose. Adequate sample size (≈900 words or multiple shorter texts) and proper norming ensure results are stable, interpretable, and comparable across contexts.
Detailed explanation
When building Receptiviti algorithms, we use ingredients that are found to be sufficiently psychometrically stable across contexts. The goal is not to eliminate context, but to clarify which aspects of language reflect enduring psychological traits and which capture temporary states. Our psycholinguistic models and LIWC are grounded in decades of research showing that certain language patterns consistently reflect underlying traits, even as surface-level topics, settings, or populations vary. By balancing trait-like stability with context-sensitive (state-like) variability, insights remain both reliable in reflecting psychological patterns and responsive to the nuances of specific situations. This makes it possible to create personas and perform informative longitudinal analyses, such as monitoring behavior change, wellbeing, or group cohesion. In summary, over-controlling for context while developing language-based psychological measures risks stripping away the very signal the measures are designed to detect, since they are built to capture state-dependent responses as well as traits.
Any behavioral indicator of personality is expected to vary across contexts while maintaining rank order stability (e.g., an extravert will be less chatty in a library versus a restaurant, but they’ll still be more talkative in the quiet setting than an introvert in the same context).
Sample size plays an important role as well. The more language data collected from a person, the more representative and reliable the signal becomes, allowing trait-based patterns to emerge despite natural variation in context. For example, analyzing a minimum of 750 to 1,000 words in a job interview often provides a strong signal of how someone is likely to approach their work. Reviewing a few earnings call Q&A sessions can give a meaningful picture of an executive’s leadership style and external communication approach.
For example, the chart below illustrates an analysis comparing excerpts of a given word count from executives and leaders against how the same people use language in general. Above a word count threshold of roughly 900 words, leadership profiles are stable: the excerpt is highly correlated with the baseline.
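The stability analysis described above boils down to correlating an excerpt's profile of measure scores with the same person's long-run baseline profile. The sketch below uses a plain Pearson correlation over a handful of hypothetical measure scores; it illustrates the idea, not Receptiviti's actual pipeline.

```python
from statistics import mean, stdev

def profile_correlation(excerpt_scores, baseline_scores):
    """Pearson correlation between two profiles of measure scores.

    Each argument lists scores for the same ordered set of
    psycholinguistic measures."""
    mx, my = mean(excerpt_scores), mean(baseline_scores)
    n = len(excerpt_scores)
    cov = sum((x - mx) * (y - my)
              for x, y in zip(excerpt_scores, baseline_scores)) / (n - 1)
    return cov / (stdev(excerpt_scores) * stdev(baseline_scores))

# Hypothetical scores on five measures from a ~900-word excerpt,
# versus the same person's long-run baseline.
excerpt = [62.0, 48.5, 71.2, 55.0, 40.3]
baseline = [60.1, 50.0, 69.8, 57.2, 42.0]
r = profile_correlation(excerpt, baseline)  # close to 1: a stable profile
```

A correlation near 1 means the excerpt-based profile closely tracks the baseline; below the ~900-word threshold, this correlation tends to drop.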

For those who do not have samples of text with 900+ words per person, another way to build robust profiles is to collect text data across multiple interactions (moments) or contexts. In this article, we test and outline how many short text samples (100-200 words per sample) and medium-length texts (350-450 words per sample) collected across multiple communication contexts are required to determine a stable personality profile. Additionally, it is important to account for context through base rates, norming, and interpretation.
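One simple way to combine several short samples into a single profile is a word-count-weighted mean per measure, which treats the pool as one long sample. The weighting scheme and numbers below are illustrative assumptions, not Receptiviti's exact aggregation method.

```python
def pooled_score(samples):
    """Word-count-weighted mean of one measure across text samples.

    `samples` is a list of (word_count, score) pairs, one per short
    text. Weighting by word count approximates scoring the pooled
    text as a single sample (an illustrative assumption).
    """
    total_words = sum(wc for wc, _ in samples)
    return sum(wc * score for wc, score in samples) / total_words

# Four hypothetical 100-200 word samples scored on one measure.
samples = [(150, 58.0), (120, 63.5), (200, 55.0), (180, 60.2)]
profile = pooled_score(samples)
```

The pooled value always falls between the lowest and highest per-sample scores, pulled toward the longer samples.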
Context, especially whether the data is written or spoken, can influence the expected base rates for proportion-based category measures. Base rates represent the mean scores of a proportion-based measure (read more about base rates here). They are important to understand for each specific use case or data source type. For instance:
- People typically write more analytically than they speak (Written mean ≈ 0.6; Spoken mean ≈ 0.5).
- CEOs tend to use significantly higher rates of first person plural and clout language than the general population.
Base rates provide a primary method for interpreting proportion-based scores. A score higher than the base rate suggests elevated usage relative to the context the base rate was derived from. A lower score suggests reduced usage relative to that same context.
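As a minimal sketch, this comparison can be coded as a lookup against a table of base rates. The base-rate values below are taken from the examples in this section (written Analytic mean ≈ 0.6; written LIWC Affiliation mean ≈ .02); the function and table names are hypothetical.

```python
# Illustrative written-language base rates drawn from this section.
BASE_RATES = {"analytic": 0.60, "affiliation": 0.02}

def vs_base_rate(measure, score, base_rates=BASE_RATES):
    """Label a proportion-based score relative to the base rate of
    the context the base rate was derived from."""
    base = base_rates[measure]
    if score > base:
        return "elevated"
    if score < base:
        return "reduced"
    return "typical"

label = vs_base_rate("affiliation", 0.065)  # "elevated" relative to written norms
```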
Norming allows customers to baseline their normed measure scores against a dataset that is representative of a particular context. Through norming, customers are able to compare a language sample to others in similar contexts, helping clarify whether the language patterns captured within the sample are situational or indicative of underlying traits. Base rates, or means, are one of the core statistics used to create norming tables for normed measures. Scores for normed measures are standardized using Z-scoring, which transforms raw scores into values that reflect distance from the mean in standard deviation units. These standardized values are then projected onto a 0 to 100 scale, centred at 50. Customers can norm scores against their own data, which allows the norms to directly reflect their unique context, or they can use Receptiviti’s Spoken or Written norms. These norms are curated to represent how people typically express themselves across a variety of common spoken or written contexts (read more about norming options here).
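The Z-scoring step can be sketched as follows. Projecting standardized scores onto a 0 to 100 scale via the normal CDF is one plausible transform, and the norming standard deviation of .015 used here is a made-up illustration rather than a published norming statistic; Receptiviti's exact projection may differ.

```python
import math

def normed_score(raw, norm_mean, norm_sd):
    """Z-score a raw proportion-based score against a norming table,
    then project it onto a 0-100 scale via the normal CDF."""
    z = (raw - norm_mean) / norm_sd          # distance from the norm mean, in SD units
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))  # Phi(z) * 100

# LIWC Affiliation's written base rate is about .02; the SD of .015
# is an illustrative assumption.
score = normed_score(0.065, norm_mean=0.02, norm_sd=0.015)  # z = 3, far above 50
```

A raw score exactly at the norm mean maps to 50, consistent with 50 being the average for all normed measures.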
Consider this example:
| Text | LIWC Affiliation (Proportion-based) | Drives Affiliation (Normed) |
|---|---|---|
| Sample 1: What really impressed me was how naturally the phone fit into a night of hanging out, talking, and just being with friends. We were all piled onto the couch, ordering food, catching up on each other’s lives, swapping stories, teasing each other, and drifting in and out of serious conversations. It felt easy and real, and somehow the phone actually added to that energy instead of pulling us out of it. Someone would grab it to show a photo from a recent trip, FaceTime someone who could not be there so they could still feel included, or pass it around to show a meme that had us all laughing. It flowed in and out of the group like it belonged there. The camera captured those little moments, even in bad lighting or when someone was caught mid-laugh, without needing anyone to stop and adjust settings. The sound was clear enough on speaker that it felt like our friend on the other end of the call was sitting right there with us. Even texting in the group chat, adding reactions, or sharing a random thought felt like part of the bigger conversation happening in the room. What I loved was how it never pulled focus. It supported what we were already doing: talking over each other, finishing each other’s sentences, listening, reconnecting, and staying present. The design made it easy to pass from one person to the next without it feeling like a disruption. It just blended in and kept things flowing. The performance held up without a hitch. No lag, no overheating, no weird glitches even after hours of use. Everything ran smoothly in the background, which let us stay focused on the moment instead of fiddling with settings or waiting for apps to load. In a time when tech can easily make people feel distant or distracted, this felt different. It helped us stay close, kept everyone in the loop, and made it easier to be together in a way that felt natural. For once, the phone felt like part of the group. Not something we were hiding behind or escaping into, but something that brought us closer and helped us stay connected to each other. | .065 | 96.53 |
| Sample 2: What impressed me most was how consistently smooth and responsive the phone felt across everything I used it for. From the moment I turned it on, the setup was fast and intuitive. Within minutes, I had my apps, settings, and preferences in place, and I was able to jump right into my usual routines without any friction. There were no delays, no confusing steps, and no unnecessary prompts. It was refreshingly seamless. The display is one of the best I have used. It is bright, crisp, and incredibly sharp, with excellent color accuracy and clarity. The high refresh rate makes scrolling feel natural, almost like the content is moving with your hand. Whether I was reading articles, flipping through photos, or watching videos, everything looked smooth and felt fast. I put the phone through a full range of tasks, including streaming high-resolution video, using real-time navigation, editing and exporting images, and keeping multiple apps open in the background. Not once did it lag, crash, or freeze. The phone stayed cool the entire time, even during extended use. Battery life held up impressively well. I charged it overnight, used it heavily all day, and still had power left the next morning. During a group call with friends, the audio and video stayed sharp throughout, which made it easy to stay connected and feel part of the moment without technical hiccups getting in the way. The build quality deserves a mention too. The materials feel premium, the design is sleek, and the device has a solid, balanced weight that makes it comfortable to hold for long periods. It is not too heavy, not too light, and it fits naturally in your hand. The camera system performed well in all conditions, including low light and fast motion. Shots came out clear, colors were true to life, and processing was quick without needing extra adjustment. Overall, what stood out was the thoughtfulness behind each feature. Nothing felt tacked on for show. It felt like every part of the phone had a purpose, and that purpose was to make everyday use smoother, faster, and more enjoyable. The performance was reliable from start to finish, and everything just worked. | .005 | 32.79 |
Given that the base rate for LIWC Affiliation in written data is approximately .02 (or 2%), and the average for all normed measures is 50 (due to the normal distribution), the author of Sample 1 demonstrated an extremely high level of affiliation drive, whether using proportion-based or normed scores. In contrast, the author of Sample 2 showed a moderately low affiliation drive. Since we used general written base rates and norms, the interpretation reflects a comparison to (or the context of) how people typically write.
To close the loop, the context of the data does not change the analysis process itself; it informs target sample size and the choice of expected base rates or norms, which serve as the reference points during interpretation. Context is best accounted for at the interpretation stage.
What sample size is required to produce meaningful insights?
Meaningful sample size depends on your goals. Smaller datasets can still yield valuable insights when word counts are sufficient, revealing patterns within individuals or small groups. For more rigorous or quantitative work, larger samples naturally strengthen reliability and generalizability, though the ideal size ultimately depends on your analytical approach.
Detailed explanation
It depends on what you consider meaningful. The ideal sample size ultimately depends on your goals and how statistically rigorous any analysis conducted on top of the Receptiviti output needs to be. Smaller sample sizes can provide useful signals, especially in exploratory or qualitative phases. Receptiviti results can be interpreted meaningfully at the individual sample level, as long as there is sufficient word count. For example, analyzing a single person’s language can provide a reliable view into that individual’s psychological profile or persona. Receptiviti can also provide meaningful insight when used to analyze multiple samples in a small dataset. For example, when analyzing a dataset of language samples derived from 5 to 10 participants, you can:
- Look at convergence and divergence in a group. Do people show a wide range of thinking styles, or are they consistently more intuitive than deliberative?
- Compare predefined subgroups. How do the language profiles of consumers who rate a product highly differ from those who rate it poorly? (see this analysis of Ozempic and Saxenda reviewers) How do the dynamics of two teams with different performance outcomes differ?
- Generate a group-level snapshot. On average, are participants in a focus group about Product X more people-focused or task-focused? Do members of a team display traits that support an innovative culture, such as being open to change and curious?
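A group-level snapshot like those above reduces to simple summary statistics: the mean captures the group tendency, while the sample standard deviation indexes convergence versus divergence. The team scores below are hypothetical.

```python
from statistics import mean, stdev

def group_snapshot(scores):
    """Summarize one measure across a small group: the mean is the
    group-level tendency; the sample SD shows convergence (small SD)
    versus divergence (large SD)."""
    return {"mean": mean(scores), "sd": stdev(scores)}

# Hypothetical normed openness-to-change scores for a six-person team.
team = [72.0, 68.5, 75.1, 70.3, 66.8, 73.4]
snapshot = group_snapshot(team)  # high mean, small SD: a convergent, open group
```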
If you are aiming to conduct statistical analysis on top of scores, recommended sample sizes are guided by the research methods commonly used in your field of work or for that particular data analysis approach. In general, more participants or samples support greater reliability and statistical power (and are more representative of a given population). This is not an aspect of Receptiviti, but rather a consideration of the statistical principles and methodological standards that guide data analysis practices more broadly.
Should I collect or use any metadata?
Detailed explanation
The answer to this question is entirely dependent on the goal of your analysis.
If you are trying, for example, to understand the differences between the dynamics of two teams, then collecting a metadata variable like team name or ID would be required. If you are doing a cluster analysis and would like to produce market segments based on both psychographic and demographic variables, then including metadata like age can be helpful.
If metadata is not explicitly related to what you are interested in, then it is not something to include.
If you are using more advanced statistical analysis methods to pull out statistically significant patterns in the data, it is also an option to include demographics as control variables in your models and analyses.
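As a sketch of the control-variable approach, the ordinary-least-squares fit below estimates the relationship between an outcome and a language-based score while holding age constant. All data here are simulated, and the variable names are illustrative.

```python
import numpy as np

# Simulate data: an outcome driven by a language-based score (slope 0.4)
# and age (slope 0.1), plus noise.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 60, n)
score = rng.normal(50, 10, n)
outcome = 2.0 + 0.4 * score + 0.1 * age + rng.normal(0, 1, n)

# Design matrix: intercept, language-based score, and age as a control.
X = np.column_stack([np.ones(n), score, age])
coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)
intercept, b_score, b_age = coefs  # b_score is the score effect net of age
```

The coefficient on `score` now reflects its association with the outcome after accounting for age; dedicated packages add standard errors and significance tests on top of the same fit.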