Face validity refers to the degree to which a test appears to measure what it is intended to measure. This is often determined from the perspective of the test-taker or experts.

Face validity is not a technically rigorous form of validity: it does not guarantee that a test actually measures what it is supposed to, and it is primarily a matter of perception.

Even so, it can enhance test-taker cooperation, public acceptance, and clinical utility.

Key Characteristics

Subjective: It is based on a judgment or first impression, not statistical analysis.
Surface-Level: It assesses the relevance and appropriateness of the test items at face value.
Quick Assessment: It’s often the quickest and easiest way to initially check if a measure seems suitable.
Aids Acceptance: Good face validity can increase the confidence and cooperation of the participants because the test seems relevant to them.

What Is Face Validity?

Face validity is the extent to which a test appears to measure what it is intended to measure. In simpler terms, it is about whether a test looks like it measures what it claims to measure (Johnson, 2021).

For example, asking someone if they feel sad or depressed would have face validity as a measure of depression.
Asking someone to solve complex math problems would not have face validity as a measure of depression, even though there might be a correlation between mathematical ability and depression.

Face validity is not a technically rigorous form of validity, and does not rely on established theory for support (Fink, 2010).

It doesn’t guarantee that a test actually measures what it is supposed to measure.

However, it’s still an important consideration, particularly in applied settings, as it can impact test-taker cooperation, public acceptance of test results, and the establishment of rapport in clinical settings.

Importance of face validity

Although face validity does not guarantee that a test measures what it is supposed to, it is still an important consideration for several reasons:

Test-taker cooperation: A test with good face validity can increase the likelihood of test-taker cooperation. If a test appears relevant and meaningful to the test-taker, they are more likely to take it seriously and provide honest and accurate responses. Conversely, if test-takers do not see the relevance of the test, they may be less motivated to put in effort or may even try to sabotage the test.
Public acceptance: Face validity can also enhance the public’s acceptance of a test. If a test appears to be measuring what it is supposed to, people are more likely to trust the results. This is particularly important for tests that are used to make important decisions, such as educational placement or employment selection.
Clinical utility: In clinical settings, face validity is important for establishing rapport with clients. If a client feels that the assessment tools being used are relevant to their concerns, they are more likely to trust the clinician and engage in the treatment process.

Tests with good face validity can give participants and researchers alike confidence that the results of the assessment are fair and equitable (Johnson, 2021).

Face validity can be used to eliminate subpar research quickly.

For example, a researcher reviewing a paper on the link between vaccinations and autism in children may identify several shortcomings in the design of the experiment, leading to the paper’s rejection for lack of face validity.

Additionally, face validity is important for establishing other types of validity.

It is often treated as a first step toward establishing the content validity of a test, which is defined as “the extent to which a test covers all important aspects of the domain being measured” (Siraj et al., 2021).

How can face validity be improved?

Here are ways to improve face validity:

Ensure test items are clear and understandable. Test-takers are more likely to perceive a test as valid if they understand what the items are asking. If the instructions or items are confusing, test-takers may feel that the test is not a fair assessment of their abilities or traits.
Use language that is appropriate for the target audience. Test items should be written using language that is familiar and understandable to the people who will be taking the test. For example, a test for children should use simpler language than a test for adults.
Make sure the test items are relevant to the purpose of the test. Test-takers are more likely to see a test as valid if they believe that the items are measuring something that is important. For example, if a test is being used to select employees for a particular job, the items should be related to the skills and knowledge required for that job.
Explain the purpose of the test to test-takers. When test-takers understand why they are being asked to take a test, they are more likely to cooperate and take the test seriously. This can be done by providing test-takers with a brief explanation of the test purpose before they begin.
Consider the cultural context of the test-takers. A test that is considered to have face validity in one culture may not be considered to have face validity in another culture. For example, a test that asks about personal experiences may not be appropriate in a culture where it is considered taboo to talk about such things.

Who should measure face validity?

Face validity is a subjective judgment about whether a test appears to measure what it is supposed to measure.

Because the judgment is subjective, it is best to have multiple people assess face validity, as different people may have different perspectives on what is important for measuring a construct.

Determining face validity often involves considering the opinions of both test-takers and experts:

Test-takers: The perspective of the test-taker is important for face validity because if a test doesn’t appear relevant or meaningful to the person taking it, they may not be motivated to put in effort or may even try to sabotage the test. For example, older adults may be less willing to participate in memory tests if they perceive the tasks as trivial or meaningless.
Experts: Expert judgment is also important for face validity because experts in the field can assess whether the test content appears to align with the construct being measured. For instance, content specialists can examine test items to determine if they are relevant to the intended domain of measurement.

For example, researchers might ask test-takers to rate the relevance and clarity of test items, or they might convene a panel of experts to review the test content and provide feedback on its face validity.

It is also important to note that face validity is not static; that is, what is considered face valid for measuring a construct can change over time.

For example, a personality test measuring “masculinity” and “femininity” that was developed in the 1950s may not be considered face valid today, as society’s understanding of gender has changed significantly since then.

As such, it is important to reassess the face validity of a measure regularly and update the measure where needed.

How to measure face validity

Because face validity is a subjective judgment, measuring it means collecting opinions in a structured way. It does not guarantee that a test actually measures what it is supposed to measure.

However, it can be a useful tool for improving the quality of a test by identifying items that are unclear or irrelevant to the test-takers. Common approaches include:

Gather subjective judgments from test-takers. This can be done by asking test-takers to rate the relevance and clarity of test items. For example, a researcher could ask test-takers to rate how much they agree with the statement, “This test item seems to measure what it is supposed to measure.”
Convene a panel of experts to review the test content and provide feedback on its face validity. Experts in the field can assess whether the test content appears to align with the construct being measured. For example, content specialists can examine test items to determine if they are relevant to the intended domain of measurement.
Use a systematic approach to assess content relevance. This can involve having experts match each item to the domain facet that they think the item best represents. Factor analysis or multidimensional scaling of relevance ratings by multiple judges can also be used to document the nature and degree of consensus reached and to uncover any differing points of view about the relevance of specific domain facets or item content.
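
The first and third approaches above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a standard procedure: the 1 to 5 rating scale, the cutoff of 3.5, the example items, and the function names flag_items and facet_consensus are all hypothetical. The consensus measure is a simple modal-agreement share, used here in place of the factor analysis or multidimensional scaling mentioned above.

```python
from collections import Counter
from statistics import mean


def flag_items(ratings, cutoff=3.5):
    """Return items whose mean rating falls below the cutoff.

    ratings maps an item to a list of 1-5 agreement ratings (assumed
    scale) with the statement "This test item seems to measure what
    it is supposed to measure."
    """
    return {
        item: round(mean(scores), 2)
        for item, scores in ratings.items()
        if mean(scores) < cutoff
    }


def facet_consensus(assignments):
    """For each item, return the most-chosen facet and the share of experts who chose it."""
    result = {}
    for item, facets in assignments.items():
        facet, count = Counter(facets).most_common(1)[0]
        result[item] = (facet, count / len(facets))
    return result


# Hypothetical test-taker ratings for two items.
ratings = {
    "I feel sad most days": [5, 4, 5, 4],
    "I enjoy solving puzzles": [2, 3, 1, 2],
}

# Hypothetical facet assignments from three experts.
assignments = {
    "I feel sad most days": ["mood", "mood", "mood"],
    "I enjoy solving puzzles": ["interest", "cognition", "interest"],
}

print(flag_items(ratings))
print(facet_consensus(assignments))
```

Items that are flagged, or on which experts disagree about the facet, are candidates for rewording or removal rather than proof that the item is invalid.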

When should you test face validity?

Face validity is often measured during the early stages of test development, as it can give researchers an idea of whether or not the content and format of a test are appropriate for measuring the desired construct.

However, it is important to note that face validity is only a preliminary step in assessing the overall validity of a test; other types of validity (e.g., content validity, predictive validity) must also be assessed to determine whether or not a test actually works (Fink, 2010).

Face Validity vs Content Validity

Face validity and content validity are distinct but related concepts in the field of psychometrics. While both relate to the perceived appropriateness of a test, they differ in their scope and focus.

Face validity is a superficial assessment of whether a test looks like it measures what it intends to measure. It is primarily based on the perceptions of test-takers and non-experts.
Content validity, on the other hand, is a more rigorous evaluation that considers how well the test items represent the entire domain or universe of content that the test is designed to measure.

Content validity necessitates a careful examination of the test content by subject matter experts to determine its alignment with the construct being measured.

Face validity focuses more on appearances and perceptions than on a systematic evaluation of the test content.

For example, a depression questionnaire that asks about symptoms like sadness and loss of interest would have face validity because these symptoms are commonly associated with depression.

However, content validity would require ensuring that the questionnaire adequately covers all the key aspects of depression as defined by experts and diagnostic criteria.

Face validity is often considered a subtype of content validity, meaning that a test with good content validity will typically also have good face validity.

However, the reverse is not always true. A test can appear to measure what it is supposed to (face validity) but may not actually cover the full breadth of the construct (content validity).
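
A small coverage check can make this gap concrete. The sketch below assumes a hypothetical list of domain facets and a hypothetical expert mapping of items to facets; neither is taken from a real diagnostic manual, and uncovered_facets is a name invented for this example.

```python
def uncovered_facets(domain_facets, item_facets):
    """Return the domain facets that no test item has been mapped to."""
    covered = set(item_facets.values())
    return [facet for facet in domain_facets if facet not in covered]


# Hypothetical facets of the construct, as an expert panel might define them.
domain_facets = ["low mood", "loss of interest", "sleep changes", "fatigue"]

# Hypothetical expert mapping of each questionnaire item to one facet.
item_facets = {
    "I feel sad most days": "low mood",
    "I no longer enjoy my hobbies": "loss of interest",
}

print(uncovered_facets(domain_facets, item_facets))
# prints ['sleep changes', 'fatigue']
```

Both items look relevant on their face, yet two facets have no items at all, which is the kind of gap a content validity review is meant to catch.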

Here’s a table summarizing the key differences between face validity and content validity:

Feature	Face Validity	Content Validity
Definition	The degree to which a test appears to measure what it is supposed to measure.	The extent to which a test adequately samples the domain or universe of content that it is intended to measure.
Focus	Superficial appearance and perceptions of the test.	Systematic evaluation of test content.
Perspective	Test-takers, non-experts	Subject matter experts
Rigor	Subjective, less rigorous	Objective, more rigorous
Relationship	Often considered a subtype of content validity.	Can have good content validity without good face validity.
Examples	A depression questionnaire asking about sadness and loss of interest has face validity.	A math test that covers all the topics taught in a course has content validity.
Measurement	Subjective ratings from test-takers and experts.	Expert review of test items and their alignment with the construct; examination of item response consistencies and the empirical domain structure.
Importance	Important for test-taker cooperation, public acceptance, and clinical utility.	Crucial for ensuring that a test is a valid measure of the construct of interest.
Limitations	Does not guarantee actual validity; subjective judgments can vary.	Can be challenging to establish for complex constructs; requires careful consideration of the domain and its boundaries.
Other points	Some experts believe that face validity isn’t “real” validity. It has also been suggested that face validity is a facet of external validity, since it can be affected by context effects and situational factors.	None noted.

In essence, while face validity can be a useful initial indicator of a test’s appropriateness, it is content validity that provides a more robust and reliable assessment of whether a test truly measures what it is intended to measure.

References

Fink, A., Peterson, P. L., Baker, E., & McGaw, B. (2010). International encyclopedia of education. Elsevier.

Johnson, E. (2021). Face validity. In Encyclopedia of autism spectrum disorders (p. 1957). Cham: Springer International Publishing.

McDermott, R. (2011). Internal and external validity. Cambridge handbook of experimental political science, 27-40.

Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational measurement: Issues and practice, 14(4), 5-8.

Rubio, D. M. (2005). Content validity.

Siraj, S., Stark, W., McKinley, S. D., Morrison, J. M., & Sochet, A. A. (2021). The bronchiolitis severity score: An assessment of face validity, construct validity, and interobserver reliability. Pediatric pulmonology, 56(6), 1739-1744.
