I Scored My Dataset - Now What?

Once you have used the Receptiviti API to analyze your language data, you can use a variety of statistical methods and tools to further explore and understand the nuances of your dataset. By applying techniques like z-scoring, rank norming, and statistical tests such as t-tests and ANOVAs, you can identify patterns, differences, and relationships within the data.

Our Receptiviti UI allows you to craft visual representations such as graphs and charts to help make the data accessible and interpretable to a broader audience. Additionally, integrating these findings into your platform can make your insights actionable, providing real-time benefits. Each of these steps adds depth to your analysis, helping transform raw data into valuable insights.

Z-Scoring

Z-scoring is a statistical method used to normalize data by converting raw values into standardized scores that represent how far a data point is from the mean, measured in standard deviations.

For example, in language data analysis, z-scoring could be applied to psycholinguistic measures like word frequency or emotional tone to compare results across different consumer groups on a standardized scale. This ensures that differences are assessed relative to the variability within each dataset, rather than the raw values alone.

We would use z-scoring in this case to enable fair comparisons across groups with different scales or variances, ensuring that insights are consistent and not distorted by differences in measurement units or data distribution.
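
As a minimal sketch of the calculation described above, the following standardizes one group's scores using only the Python standard library. The function name `z_scores` and the sample values are hypothetical and for illustration only; they are not Receptiviti output.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert raw scores to z-scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are identical")
    return [(v - mu) / sigma for v in values]

# Hypothetical emotional-tone scores for one consumer group (illustrative only)
group_scores = [42.0, 55.0, 61.0, 48.0, 70.0, 38.0]
standardized = z_scores(group_scores)

for raw, z in zip(group_scores, standardized):
    print(f"raw={raw:5.1f}  z={z:+.2f}")
```

After the transformation the scores have a mean of 0 and a standard deviation of 1, so a value of +1.0 always means "one standard deviation above this group's average", whatever the original scale was.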

Rank Norming

Rank norming methods are statistical techniques used to transform data into ranked values, making it easier to compare groups without being influenced by outliers or skewed distributions.

For example, in language data analysis, rank norming could be applied to compare the frequency of certain psycholinguistic traits, such as emotional tone or cognitive processing, across multiple consumer groups. Unlike z-scoring, which preserves the distances between values, rank norming keeps only the relative order of values, so an extreme score cannot dominate the comparison.

We would apply rank norming in this case to highlight meaningful patterns in language use while minimizing the impact of extreme values, providing a fair and robust way to compare groups across diverse datasets.
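
One possible way to rank norm a list of scores is sketched below. The function name `rank_norm`, the averaging of tied ranks, and the `(rank - 0.5) / n` scaling to the 0-1 range are assumptions chosen for illustration; other rank-based transformations exist, and the sample values are made up.

```python
def rank_norm(values):
    """Replace each value with its rank, scaled to the open interval (0, 1).

    Tied values share the average of the ranks they occupy.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with the value at position i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [(r - 0.5) / n for r in ranks]

# Hypothetical scores with one extreme value (illustrative only)
scores = [10.0, 20.0, 20.0, 1000.0]
normed = rank_norm(scores)
print(normed)
```

The extreme value of 1000 receives the same rank-normed score it would have received had it been 21, which is how the transformation limits the influence of outliers.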

T-Tests

T-tests are statistical methods used to compare the means of two groups to determine if the difference between them is statistically significant.

For example, in analyzing language data, a t-test could be applied to compare the average level of authenticity in the language of two consumer groups, such as repeat buyers and one-time buyers. By examining the means, we can determine if the observed difference in authenticity is meaningful or likely due to chance.

We would use a t-test in this case to confirm whether the difference in language patterns reflects a true psychological distinction between the two groups, ensuring that any insights are statistically robust and actionable.
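
The comparison above could be sketched as follows. This example assumes Welch's variant of the t-test, which does not require equal variances in the two groups; the source does not specify a variant. The function name `welch_t` and the authenticity scores are hypothetical. The sketch computes only the test statistic and degrees of freedom; obtaining a p-value requires the t distribution, which a statistics library such as SciPy provides (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a's mean
    vb = variance(b) / len(b)  # squared standard error of group b's mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical authenticity scores (illustrative only)
repeat_buyers = [62.0, 58.0, 71.0, 66.0, 60.0]
one_time_buyers = [51.0, 49.0, 57.0, 45.0, 53.0]

t_stat, dof = welch_t(repeat_buyers, one_time_buyers)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A larger absolute t statistic means the difference between the group means is large relative to the variability within the groups.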

ANOVA

Analysis of Variance (ANOVA) is a statistical method used to determine if there are statistically significant differences between the means of three or more groups.

For example, in a study analyzing language data for three consumer groups, ANOVA could be used to compare psycholinguistic measures like agency, authenticity, or emotional tone across the groups. This allows us to identify which psychological traits differ significantly between the consumer groups.

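We would apply ANOVA in this case to test whether the consumer groups differ on each psychological dimension. A significant result indicates that at least one group mean differs from the others; follow-up (post hoc) comparisons are then needed to identify which groups differ, helping us better understand the unique psychologies driving their language patterns.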
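
A one-way ANOVA for a single measure can be sketched as below. The function name `one_way_anova_f` and the segment scores are hypothetical. The sketch returns the F statistic and its degrees of freedom only; converting F to a p-value requires the F distribution from a statistics library.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical authenticity scores for three consumer groups (illustrative only)
segment_a = [52.0, 48.0, 55.0, 50.0]
segment_b = [60.0, 63.0, 58.0, 61.0]
segment_c = [49.0, 51.0, 47.0, 53.0]

f_stat, df_between, df_within = one_way_anova_f([segment_a, segment_b, segment_c])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The F statistic is the ratio of between-group variance to within-group variance, so values well above 1 suggest that the group means differ by more than the spread within groups would explain.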

Visualizations

See our Visualization UI section.

Large Language Models (LLMs)

Large language models (LLMs) are AI systems trained on vast amounts of text data to understand and generate language. However, on their own, LLMs cannot apply established psychological frameworks and produce psychological insights in a way that is credible, repeatable, scientifically rigorous, and rooted in measurement.

Receptiviti's scientifically validated measures provide consistent, reliable, and objective psychological assessment from language. By integrating Receptiviti with LLMs, we can enable automatic summarization and interpretation of Receptiviti scores, making language-based psychological insights easily interpretable and actionable.

We would apply this approach to enhance the scalability and accessibility of Receptiviti insights while ensuring the quality and reliability of results.

Regression Analysis

Regression analysis is a mathematical approach used to identify factors (independent variables) that are related to or predictive of a key outcome (dependent variable). Several types of regression analyses exist, including linear regression and multiple linear regression.

For example, in language data analysis, a linear regression model could be used to determine whether a call center agent's communication style (measured through concrete language, emotional tone, and highly empathetic language) predicts customers' purchase behavior.

We would apply regression analysis in this case to determine which linguistic factors have a significant impact on the behavioral outcome we seek to understand.
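
The simplest case, a linear regression with one predictor, can be sketched with ordinary least squares as below. The function name `fit_line` and the data are hypothetical. A model with several predictors, as in the call center example, would normally be fitted with a statistics library rather than by hand.

```python
from statistics import mean

def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Share of the outcome's variance explained by the fitted line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data: agent emotional-tone score vs. customer spend (illustrative only)
tone_scores = [35.0, 42.0, 50.0, 58.0, 64.0, 71.0]
customer_spend = [110.0, 135.0, 150.0, 190.0, 185.0, 230.0]

slope, intercept, r_squared = fit_line(tone_scores, customer_spend)
print(f"spend = {intercept:.1f} + {slope:.2f} * tone  (R^2 = {r_squared:.2f})")
```

The slope estimates how much the outcome changes for a one-unit change in the linguistic measure, and R² indicates how much of the outcome's variance the measure accounts for.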

K-Means Cluster Analysis

K-means cluster analysis is a machine learning technique used to segment datasets into clusters (groups) based on similarities and differences between data points.

For example, in language analysis, K-means clustering could be applied to a dataset of target consumers to identify distinct market segments within the broader consumer group.

We would apply K-means clustering to characterize the personas of each market segment, enabling marketers to optimize their strategies for better engagement and alignment with their audiences.
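
A bare-bones version of the K-means procedure is sketched below to show the mechanics: assign each point to its nearest centroid, recompute the centroids, and repeat until nothing changes. The function name `kmeans`, the random initialization with a fixed `seed`, and the two-measure profiles are assumptions for illustration. In practice the measures would usually be standardized first and a library implementation would be used.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm.

    Returns (assignments, centroids).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        assignments = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignments, centroids

# Hypothetical (measure_1, measure_2) profiles for six consumers (illustrative only)
profiles = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (9.0, 9.0), (9.1, 8.8), (8.9, 9.2)]
segments, centers = kmeans(profiles, k=2)
print(segments)
print(centers)
```

The resulting centroids summarize the typical profile of each segment, which is the starting point for describing a segment's persona.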

Build a Machine Learning (ML) Model

ML models are algorithms that learn patterns from training data to make predictions or classifications when analyzing new data. Receptiviti scores can be used as features in ML models to enhance predictive capability, accuracy, explainability, and interpretability by incorporating scientifically validated psychological insights from language.

For example, an ML model could use Receptiviti scores such as anger, fear, affiliation drive, and cognitive processing to predict which employees may be experiencing impaired well-being and burnout. These insights could then inform targeted, proactive interventions to support employee engagement.

We would apply this approach when building data-driven predictive models that benefit from psychological insights to enhance decision-making and strategic outcomes.
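
To illustrate how scores can serve as model features, the sketch below trains a small logistic regression classifier by gradient descent. Everything here is an assumption made for illustration: the function names, the two standardized features, the labels, and the choice of logistic regression itself. A real model would use many more examples, a held-out test set, and typically an established ML library.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def predict_proba(weights, bias, x):
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        # Step against the average gradient of the log loss
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical z-scored (anger, fear) features; label 1 = flagged for burnout risk
X = [(-1.0, -0.8), (-0.6, -1.1), (-0.9, -0.3), (0.8, 1.0), (1.1, 0.6), (0.5, 0.9)]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(X, y)
print(f"risk for (1.0, 1.0):   {predict_proba(weights, bias, (1.0, 1.0)):.2f}")
print(f"risk for (-1.0, -1.0): {predict_proba(weights, bias, (-1.0, -1.0)):.2f}")
```

Because each weight is tied to a named psychological measure, the fitted weights can be read directly, which is one way such features support explainability.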

Correlation Analysis

Correlation analysis helps identify which linguistic measures are associated with outcome or metadata variables such as performance, satisfaction, or approval ratings.

Correlations also show the direction of that relationship: whether an increase in the linguistic measure is linked to an increase (positive correlation) or a decrease (negative correlation) in the outcome variable.

These relationships are typically measured using the Pearson correlation coefficient (r), which quantifies the strength and direction of a linear relationship between two variables. Positive values of r indicate that the measure increases as the outcome increases, while negative values indicate that it decreases as the outcome increases. The closer the value is to ±1, the stronger the relationship.

Correlation analysis is especially useful alongside other techniques like z-scoring. It allows you to identify patterns without requiring large sample sizes or complex models, making it an accessible tool for initial exploration and insight generation.

| Absolute r Value | Interpretation in Language-based Psychology Research |
| --- | --- |
| 0.10–0.19 | Weak - likely noise or context specific |
| 0.20–0.29 | Modest but reliable - typical for single LIWC features |
| 0.30–0.39 | Strong - notable for language-based traits |
| 0.40+ | Very Strong - relatively rare when using single-category correlations and generally less realistic when working with language data |
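
The sketch below computes Pearson's r by hand and maps its absolute value to the labels in the table above. The function names and data are hypothetical. Treating each band as starting at its lower bound (0.10, 0.20, 0.30, 0.40) is an assumption, since the table does not say how values between bands, or below 0.10, should be labeled.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def interpret(r):
    """Label |r| using the bands from the table (cut-offs are an assumption)."""
    size = abs(r)
    if size >= 0.40:
        return "very strong"
    if size >= 0.30:
        return "strong"
    if size >= 0.20:
        return "modest but reliable"
    if size >= 0.10:
        return "weak"
    return "below the table's lowest band"

# Hypothetical linguistic measure and satisfaction rating (illustrative only)
measure = [12.0, 30.0, 25.0, 41.0, 18.0, 36.0, 22.0, 28.0]
satisfaction = [3.0, 4.0, 2.0, 5.0, 4.0, 3.0, 3.0, 4.0]

r = pearson_r(measure, satisfaction)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:+.2f} ({direction}, {interpret(r)})")
```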

See our Score Interpretation Guide for further information about score interpretation.