The Likert scale, named after psychologist Rensis Likert (1932), helps measure people’s attitudes, opinions, or feelings by asking them to rate how much they agree or disagree with specific statements.
Each response is given a number so that feelings can be turned into quantitative data.
In its final form, the Likert scale is a five-point (or seven-point) scale used to allow an individual to express how much they agree or disagree with a particular statement.
Key Takeaways
- Definition: A Likert scale is a survey tool that measures how strongly people agree or disagree with a statement, typically using a 5- or 7-point scale. It’s one of the most common methods for capturing attitudes and opinions in research.
- Structure: Each item presents a statement followed by ordered response options, such as “strongly disagree” to “strongly agree.” These responses are treated as numerical values for easier comparison and analysis.
- Purpose: The scale helps researchers quantify subjective opinions, turning feelings and attitudes into measurable data. This makes it possible to track patterns and differences across individuals or groups.
- Design: Effective Likert scales use clear, balanced statements and an equal number of positive and negative response options. Good design improves reliability and reduces bias.
- Analysis: Responses are often averaged or summed to create overall scores representing a participant’s attitude. Researchers may use descriptive statistics or tests like t-tests and ANOVA to interpret the results.

For example, respondents might be asked to rate their agreement with the statement: “I believe that ecological questions are the most important issues facing human beings today.”

A Likert scale assumes that the strength/intensity of an attitude is linear, i.e., lies on a continuum from strongly agree to strongly disagree, and that attitudes can be measured.
For example, each of the five (or seven) responses would have a numerical value that would be used to measure the attitude under investigation.
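This numerical coding step can be sketched as follows (a minimal Python sketch; the response labels and their mapping are illustrative):

```python
# Illustrative mapping of 5-point agreement responses to numerical values.
AGREEMENT_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Undecided": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

responses = ["Agree", "Strongly Agree", "Undecided", "Agree"]
scores = [AGREEMENT_CODES[r] for r in responses]
print(scores)  # [4, 5, 3, 4]
```

Once coded this way, the responses can be compared and analyzed as quantitative data.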
Examples of Items for Surveys
In addition to measuring statements of agreement, Likert scales can measure other variations, such as frequency, quality, importance, and likelihood.
| Variation | Response Options | | | | |
| --- | --- | --- | --- | --- | --- |
| Agreement | Strongly Agree | Agree | Undecided | Disagree | Strongly Disagree |
| Frequency | Always | Often | Sometimes | Rarely | Never |
| Importance | Very Important | Important | Moderately Important | Slightly Important | Unimportant |
| Quality | Excellent | Good | Fair | Poor | Very Poor |
| Likelihood | Almost Always True | Usually True | Occasionally True | Usually Not True | Rarely True |
| Likelihood | Definitely | Probably | Possibly | Probably Not | Definitely Not |
Design
Creating reliable and valid Likert items and response options is a cornerstone of designing effective self-report measures and psychological tests.
The process requires meticulous attention to item wording, response structure, and rigorous psychometric testing to ensure that the instrument consistently and accurately measures the intended construct.
Here is an elaboration on the principles and practices for creating reliable and valid Likert scale instruments:
1. Establishing Reliability and Validity (The Foundations)
Reliability and validity are two fundamental properties that must be demonstrated for any psychological test, including those using Likert scales.
Reliability (Consistency)
Reliability refers to a measure’s ability to produce consistent results under the same conditions.
If a test is reliable, it yields the same result each time it is administered to a specific person or group. To demonstrate reliability:
- Internal Consistency (Cronbach’s α): This is the most common measure for multi-item Likert scales, reflecting the degree to which different items measuring the same thing correlate with one another.
- Target Value: An internal consistency score greater than 0.8 is considered very good. While a value between 0.7 and 0.8 is generally acceptable, lower values may be expected when dealing with diverse psychological constructs. Achieving a perfect score of 1.0 is problematic, as it indicates redundant items that waste the respondent’s time.
- Refinement: To improve internal consistency, researchers check the Corrected Item-Total Correlation. If an item has a correlation of less than approximately 0.3, it should be considered for deletion, as it does not correlate well with the overall scale score. Researchers also check the “Cronbach’s Alpha if Item Deleted” value; if removing an item substantially increases the overall alpha, that item should be deleted to improve the scale’s reliability.
- Test-Retest Reliability: This is tested by administering the measure to the same group of people at two different points in time to see if they yield similar scores.
This is crucial for measures that presume to assess stable characteristics, such as personality or intelligence.
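The internal-consistency checks described above can be illustrated with a small sketch (the data matrix is hypothetical; `cronbach_alpha` and `corrected_item_total` are illustrative helper functions, not a standard library API):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of Likert scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items."""
    totals = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], totals - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Hypothetical data: 6 respondents x 4 items on a 1-5 scale
data = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])
print(round(cronbach_alpha(data), 2))  # 0.96 -- above the 0.8 "very good" threshold
```

Items whose corrected item-total correlation falls below roughly 0.3 would be candidates for deletion.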
Validity (Accuracy)
Validity refers to the degree to which a test actually measures what it is designed to measure. Validity is a critical assurance that the resulting scores accurately reflect the construct of interest.
- Content Validity: This ensures that the questions asked (items) cover the full range and content of the construct intended to be measured.
- To establish content validity, test developers should consult with experts in the field to determine the necessary concepts and themes to be targeted by the questionnaire.
- Construct Validity: This relates to whether the observed measurement truly captures the underlying, unobservable trait or concept (the construct).
- Factor Analysis is a key statistical tool used to assess construct validity by determining if the items load onto the intended theoretical factors or sub-components of the scale. If items are removed during the refinement process, the factor analysis should be rerun to check that the factor structure still holds.
- Criterion/Concurrent Validity: This involves comparing the new Likert scale scores with scores obtained from existing, proven measures that assess the same or a closely related construct. High correlation between the new test and the established test suggests good concurrent validity.
- Face Validity: This addresses whether an assessment tool looks valid on the surface. However, high face validity can be a drawback in certain contexts (e.g., job screening) because when it is very obvious what a test is measuring, test takers can easily bias their responses to appear more favorable (e.g., through impression management).
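Criterion/concurrent validity, for instance, reduces to a simple correlation between the new scale and an established measure (all scores below are hypothetical):

```python
import numpy as np

# Hypothetical total scores on a new Likert scale and on an established
# measure of the same construct, for the same 8 respondents.
new_scale = np.array([14, 9, 19, 12, 6, 17, 15, 10])
established = np.array([30, 22, 41, 27, 15, 38, 33, 24])

# A high Pearson correlation suggests good concurrent validity.
r = np.corrcoef(new_scale, established)[0, 1]
print(f"concurrent validity r = {r:.2f}")
```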
Item and Response Option Design
The construction of the individual items and their corresponding response choices is crucial for maximizing both reliability and validity.
Item Development and Wording
- Iterative and Qualitative Development: Items should be based on prior research, theory, or qualitative data derived from interviews or think-aloud studies. Qualitative methods are often appropriate in the early phases to identify descriptors that are then used as the basis for developing quantitative measures.
- Clarity and Simplicity: Questions should be direct and simple to answer. Complex ideas or behaviors should not be reduced merely to a number or percentage without context.
- Cognitive Interviews: Before finalizing a new instrument, researchers should conduct cognitive interviews with a small sample of respondents (typically 5–25 people).
The purpose is to explore the cognitive processes respondents use when answering, helping the researcher clarify the intent of questions and identify problems with wording, readability, item sequence, or length.
Difficult or confusing words should be identified and rephrased to increase clarity and reduce possible errors in responses.
Response Options and Bias Reduction
- Scale Format: Likert scales typically present a statement and ask respondents to indicate their level of agreement on a numbered scale.
Common ranges include 1 to 5, 0 to 7, or 1 to 7. The numerical responses, though strictly ordinal, are typically treated as if they represent equal differences in the property being measured (interval-scale principles).
- Reverse-Phrased Items (Reverse Coding): It is critical to include reverse-phrased items (or negatively worded items).
- Purpose: Reverse-phrased items help reduce response bias, such as the tendency for respondents to simply agree with every statement (acquiescence). By using both positive and negative phrasing, participants are forced to actually read and consider the items.
- Scoring: Responses to these items must be reverse-coded during analysis to ensure a high score consistently reflects a high level of the construct being measured.
- Avoiding Response Sets: Researchers must be vigilant against response sets (or test-taking attitudes) which can compromise the accuracy of the data.
- To disguise the true purpose of the scale and prevent participants from altering their answers, “filler items” (which are not scored for the construct) can be added.
- Design should aim to mitigate tendencies toward over-reporting (e.g., symptom exaggeration), under-reporting (e.g., faking good), or carelessness in responding.
- Ordering Effects: Researchers must acknowledge that the order of questions may influence the resulting data, particularly in attitude or risk perception surveys. This necessitates taking questionnaire design factors into consideration when interpreting findings.
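Reverse coding, as described above, follows a simple formula, new_score = (scale_min + scale_max) - raw_score; a minimal sketch:

```python
def reverse_code(score: int, scale_min: int = 1, scale_max: int = 5) -> int:
    """Reverse-code a negatively worded item so a high value means a high
    level of the construct: new_score = (scale_min + scale_max) - raw_score."""
    return (scale_min + scale_max) - score

raw = [1, 2, 3, 4, 5]
print([reverse_code(s) for s in raw])  # [5, 4, 3, 2, 1]
print(reverse_code(2, scale_min=1, scale_max=7))  # 6 on a 7-point scale
```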
Scale Structure and Comprehensiveness
Likert scales are frequently part of larger instruments (questionnaires or inventories) used to measure complex constructs.
- Multi-Item Measurement: To measure complex psychological constructs (like personality or job satisfaction), instruments often consist of many items (e.g., 45 items for authenticity, 75 items for systemizing quotient, or over 500 for complex psychopathology tests like the MMPI).
- Balance of Length and Comprehensiveness: Test developers must achieve a balance, striving to make the test as short as possible but also as comprehensive as necessary to capture the phenomenon well. For example, a measure of irritability should use enough items (e.g., 10–20) that assess various aspects of irritability to fully cover the construct.
- Use of Multiple Measures: Because any single scale is imperfect (people can lie, or be influenced by moods or situational factors), well-being scientists and researchers often use self-report scales alongside other assessment methods, such as reports from family or biological measures, to overcome shortcomings and contextualize the data.
Analyzing Data
The response categories in Likert scales have a rank order, but the intervals between values cannot be presumed equal. Therefore, the mean and standard deviation are inappropriate for ordinal data (Jamieson, 2004).
Statistics you can use are:
- Summarize using a median or a mode (not a mean, as it is ordinal-scale data); the mode is probably the most suitable for easy interpretation.
- Display the distribution of observations in a bar chart (it can’t be a histogram because the data is not continuous).
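Both summaries can be computed with the Python standard library (the responses below are hypothetical):

```python
from statistics import median, mode
from collections import Counter

# Hypothetical responses on a 5-point scale (1 = strongly disagree ... 5 = strongly agree)
responses = [4, 5, 3, 4, 2, 4, 5, 3, 4, 1]

print("median:", median(responses))  # 4.0
print("mode:", mode(responses))      # 4
# Per-category counts, i.e., the heights of the bars in a bar chart:
print(sorted(Counter(responses).items()))  # [(1, 1), (2, 1), (3, 2), (4, 4), (5, 2)]
```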
Critical Evaluation
Advantages
The self-report format of Likert scales, particularly when utilized within questionnaires and surveys, offers significant practical and analytical benefits:
- Ease and Cost-Effectiveness: Likert scales are generally straightforward and easy to administer, whether via paper, pencil, or electronic means. They are a simple and cost-effective approach to personality assessment. They allow for the collection of data from large numbers of people in a relatively short amount of time, and they can be completed without a clinician present.
- Increased Generalizability and Representativeness: Since surveys enable researchers to gather data from larger samples, the findings are better able to reflect the actual diversity of the population, enhancing generalizability.
- Objectivity and Standardization: As a type of objective test, Likert items are predetermined and offer a limited range of responses, making them amenable to objective scoring. This standardization means that results can be compared scientifically against a “norm”.
- Statistical Utility: Because the collected data are numerical (quantitative data), they allow for easier comparison and statistical analysis. Researchers frequently calculate descriptive statistics such as the mean and standard deviation. Composite scores derived from multiple items are often treated as quantitative, continuous data, allowing the use of sophisticated statistical tests (like those in regression analysis).
- Access to Internal Information: Self-report measures provide respondents with direct access to their own thoughts, feelings, and motives, information that may not be available to others.
- Promoting Truthfulness (in some contexts): Participants may be more likely to reveal truthful answers in a questionnaire format compared to a face-to-face interview, as the setting is less invasive.
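The statistical utility noted above can be sketched with a comparison of composite (summed) scores between two groups using an independent-samples t-test; all values are hypothetical:

```python
from scipy import stats

# Hypothetical composite Likert scores (sums across items) for two groups.
group_a = [18, 22, 20, 25, 19, 23, 21, 24]
group_b = [15, 17, 14, 19, 16, 18, 13, 17]

# Independent-samples t-test on the composite scores.
t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.4f}")
```

Note that this treats composite scores as interval-level data, an assumption discussed in the limitations below.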
Limitations
The structured and quantitative nature of Likert scales gives rise to limitations in measurement, data interpretation, and flexibility:
- Ordinal Nature vs. Interval Assumptions:
- Likert scale data are fundamentally ordinal. Ordinal data tell us the order in which things happened (e.g., “agree” is higher than “disagree”), but they tell us nothing about the exact differences between values.
- For the data to be treated as interval data (as is required for many common statistical tests), researchers must assume that equal intervals on the scale represent equal differences in the property being measured. For subjective concepts like beauty or helpfulness, ratings depend on subjective feelings, meaning this assumption is often suspect, even though the data are often treated as interval.
- Loss of Rich Detail (Reductionism): Since Likert scales produce numerical data, they inherently miss out on valuable information. If an answer is simply a number or selection on a rating scale, researchers do not know the underlying reasoning or why the participant chose that answer. This can be seen as reductionist, as complex ideas and behaviors are reduced merely to a number or percentage.
- Inflexibility and Lack of Context: Self-report measures offer no opportunities to qualify answers or expand upon and explain what each response means. Furthermore, measuring complex phenomena, such as meaning in life, can be difficult to capture through simplistic quantification.
- Interpretation Complexity (Composite Scores): Even when many items are summed to create a composite score (e.g., a multi-item measure of irritability), interpretation can be difficult because scales may overlap.
- Difficulty for Certain Populations: Using Likert-type scales (e.g., 1 to 10 ratings) implies that the respondent can translate their self-perception, intention, or sensation into meaningful numbers. This can be particularly challenging for populations with lower abstract thinking abilities, such as young children.
Potential Biases and Threats to Validity
The self-report nature of Likert scales means that data can be influenced by a person’s conscious or unconscious intentions, moods, or surrounding context, potentially compromising the validity (accuracy) of the findings.
Intentional and Unintentional Reporting Biases
These biases influence how a person chooses to present themselves or responds to sensitive questions:
- Social Desirability and Faking: Respondents may lie, misremember, or answer questions in a way that makes them look good. This is especially concerning in “high-stakes testing” (e.g., job applications or custody evaluations), where test takers are motivated to present themselves in an overly favorable way. This is known as under-reporting of psychopathology or faking good.
- Self-Enhancement Bias: People are motivated to ignore or downplay less desirable characteristics and focus instead on positive attributes.
- Response Sets/Test-Taking Attitudes: These are systematic patterns of responding that can influence the accuracy of the assessment data:
- Acquiescence: The tendency to agree with nearly every item, regardless of content.
- Over-reporting/Faking Bad: The patient attempts to present themselves in an overly negative or unfavorable light, potentially due to seeking attention or secondary gain.
- Carelessness or Inconsistency: The patient is not paying attention or responding randomly.
- Face Validity Risk: If a test has high face validity (meaning it is obvious what the test is measuring), test takers can easily bias their responses to align with what they think the researcher or interviewer expects. For example, job applicants for a sales manager position might exaggerate their social skills if the test obviously measures gregariousness.
- Bias Blind Spot: Respondents often believe they themselves are less susceptible to biases than the “average American”.
Contextual and Measurement Biases
These biases stem from the structure of the questionnaire or the context in which it is given:
- Reference Group Effect: A person’s self-ratings are often based on how they compare themselves to their immediate sociocultural reference group. This relativistic comparison can distort the rating when compared to an absolute standard of the trait being measured.
- Wording and Ordering Effects (Undue Influence): The sequence and phrasing of questions can significantly influence the resulting data, potentially creating a “measurement artefact”. Researchers must be aware that collecting data or asking questions in a way that influences the response constitutes undue influence. Studies have shown that manipulating the ordering of comparison questions significantly alters reported levels of unrealistic optimism (UO).
- Impact of Mood and Situational Factors: Self-report scales can be influenced by current moods or momentary situational factors. Subjective experiences, such as distress, are likely to bias answers regarding stressors or resources available.
- Outlier Sensitivity: Although the mean is often the preferred measure of central tendency for statistical analysis, it is very sensitive to the effects of outliers.
- Cultural Applicability: While Likert scales and related objective tests are used globally, there is a risk of bias when applying instruments developed in one culture to members of widely divergent ethnic/cultural groups. Cultural differences, such as whether family or friends are the primary source of wellbeing, can affect how items load onto factors, influencing statistical conclusions.
References
Bowling, A. (1997). Research methods in health. Buckingham: Open University Press.
Burns, N., & Grove, S. K. (1997). The practice of nursing research: Conduct, critique, & utilization. Philadelphia: W.B. Saunders and Co.
Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38(12), 1217–1218.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1–55.
Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46(3), 598.