Ky Jackson, BS
University of Nebraska Omaha
University of Nebraska Medical Center’s Munroe-Meyer Institute

Sarah Frampton, PhD, BCBA-D, LBA
University of Nebraska Omaha

ASAT's Science Corner

Accurate measurement is essential to assessing the extent to which an intervention is effective, whether that intervention is biomedical, educational, or psychosocial. Unlike other conditions that rely on automated instruments such as laboratory tests or scans to diagnose and inform treatment, the diagnosis and treatment of autism spectrum disorder (ASD) largely rely on data collected and interpreted by humans through real-time observations of behavior. Unfortunately, whenever people are tasked with collecting data, unreliable or inaccurate records are possible (Ledford, 2017). Errors or inconsistencies within collected information affect what is known as the internal validity of a treatment or intervention (Ledford, 2017). Internal validity is the extent to which changes in the behaviors of interest can be attributed to the intervention being implemented, rather than other factors (Frampton, 2024). When errors in measurement procedures or tools result in inconsistent data, this is referred to as an instrumentation threat to internal validity (Kratochwill et al., 2010).

Types of Instrumentation Errors

There are three main types of instrumentation errors that can threaten a treatment’s internal validity: measurement errors, observer drift, and observer bias. To understand these errors, let’s look at a common example from a classroom. A teacher is working with their class on raising hands instead of shouting out. The teacher is aiming to have the students learn to raise their hand above their head and pause, without shouting out, until the teacher can call on them. The table below shows an example and a non-example of a definition of this behavior as written by the teacher.

Example Definition of Hand Raising: Student is seated at their workspace and extends their arm upwards, so their hand is above their head. The student keeps their hand extended upwards and remains seated quietly until called on.

Non-Example Definition of Hand Raising: Student puts their hand up.

Measurement errors occur when there is any type of deviation of recorded data from the “true” value (i.e., what actually occurred; Johnston et al., 2020). Measurement errors can be caused by unclear definitions of the behavior under observation. In our hand-raising example, the non-example does not specify whether the student needs to sit (or where they need to sit), or that the hand-raise must continue until they are called on. Without these details, measurement errors could occur when data collection begins. Measurement errors can also be caused by difficult or complex data collection procedures. For example, if the teacher attempted to take data on every student in the class at the same time, it would be nearly impossible to accurately capture all their hand raises, leaving the teacher with incomplete or otherwise questionable data (e.g., data from the left side of the class are more accurate than data from the right).

Observer drift occurs when, over time, observers unintentionally begin to interpret and apply the definition of a target behavior differently from the original definition, which introduces unintended errors into data collection (Cooper et al., 2020; Ledford, 2017). After a few weeks, the teacher in our example begins counting instances when students hold their hand up for only a few seconds or wave their hand wildly. Clearly, the teacher’s criteria for what counts as correct hand-raising have drifted off target. The data will now show an increase in hand-raising. However, rather than reflecting an actual increase in hand-raising, the “increase” is simply the result of the teacher’s broadening definition of the behavior being observed.

As another example, perhaps the teacher in our example initially counted instances when students would vocalize a little while holding up their hand but eventually began to require complete silence. Here, the teacher’s criteria for what counts as correct hand-raising also drifted off target. The data will now show that there is a decrease in hand raising. However, rather than there actually being a decrease in hand-raising, the “decrease” is simply the result of the teacher’s increased expectations of the behavior being observed.

Observer bias occurs when observers hold conscious or unconscious biases (expectations, preferences, beliefs, etc.) that influence how data are recorded or interpreted (Ledford, 2017). When collecting data, observer bias may cause observers to unintentionally record data that conform to their biases, rather than the reality of what occurred. In our example, the teacher may have an expectation that girls raise their hands more than boys. As a result, the teacher might unintentionally create more opportunities for the girls to raise their hands than for the boys, skewing the intervention outcomes.

Types of Instrumentation Errors

Measurement Error: Any type of deviation of a recorded measurement from the “true” value which occurred (Johnston et al., 2020).

Observer Drift: Over time, observers may unintentionally begin to incorrectly interpret or deviate from original definition of target behavior (Cooper et al., 2020; Ledford, 2017).

Observer Bias: Conscious or unconscious bias that influences how observers record or interpret data (Ledford, 2017).

Detecting and Minimizing Instrumentation Errors

Researchers and clinicians should take steps to prevent instrumentation errors from occurring, and to detect quickly any that do occur, through careful planning and data analysis (Ledford, 2017). The most commonly used indicator of measurement quality is interobserver agreement (IOA), which is the degree to which two or more independent observers report the same observed values after measuring the same event (Cooper et al., 2020). In other words, do two people agree that the behavior in question occurred (or did not occur) at a particular time? Following data collection, the observers' records are systematically compared to determine if there are any inconsistencies. High IOA suggests the measurement system (definition of behaviors, collection type, etc.) was reliable, and increases the believability of the collected data (Cooper et al., 2020). Poor IOA could suggest that instrumentation threats exist and need to be addressed promptly, especially before any treatment decisions are made based on these data.
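To make the idea of comparing two observers' records concrete, here is a minimal sketch in Python of one common way to summarize agreement: the percentage of observation intervals in which both observers made the same record (occurred or did not occur). The article does not prescribe a specific formula, so the function name and the sample records below are hypothetical and for illustration only.

```python
def interval_by_interval_ioa(observer_a, observer_b):
    """Percentage of intervals in which both observers made the same record.

    Each list holds one True/False entry per observation interval
    (True = the behavior was scored as occurring in that interval).
    """
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must score the same number of intervals")
    if not observer_a:
        raise ValueError("At least one interval is required")
    # An agreement is any interval where the two records match,
    # whether both scored an occurrence or both scored a nonoccurrence.
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


# Hypothetical records for ten intervals (not data from the article).
teacher = [True, True, False, True, False, False, True, True, False, True]
paraeducator = [True, False, False, True, False, True, True, True, False, True]

ioa = interval_by_interval_ioa(teacher, paraeducator)
print(f"Interval-by-interval IOA: {ioa:.0f}%")  # prints "Interval-by-interval IOA: 80%"
```

In this made-up record the observers disagree in two of ten intervals, so agreement is 80%. Which intervals they disagree on can be as informative as the percentage itself when deciding what to review.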

Explicit and systematic training of observers is another essential step in preventing threats to internal validity when collecting data (Repp et al., 1988). Training presents an opportunity to review definitions and practice data collection until consistency is observed, a process referred to as calibration. Ongoing re-calibration throughout data collection helps to safeguard against errors from observer drift (Repp et al., 1988).

In our hand-raising example, following an initial training, the teacher and a paraeducator both collected data during the same class period. The teacher counted 75 instances of the students raising their hands. The paraeducator, however, counted only 10 instances. Clearly, there were major discrepancies between the teacher’s and the paraeducator’s data, resulting in low IOA. After meeting to discuss these data, the teacher determined that the paraeducator had been adhering to the precise definition set at the outset of the study while the teacher’s scoring had drifted. They both re-read the definition and agreed to follow it closely. They also decided to break the observation period down into 5-minute intervals so they could more precisely compare their data. The following day, the teacher and paraeducator took data again and both recorded five instances of hand-raising in the first 5-minute interval and eight in the second 5-minute interval, showing perfect agreement. In the third interval, the teacher recorded 10 instances and the paraeducator recorded eight. Though not perfect agreement, this would still be considered a high degree of agreement (Kratochwill et al., 2010). These IOA data suggest the measurement system is now reliable, and the data collected can be used to make decisions about next intervention steps.
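The counts in this example can be turned into agreement percentages. The article does not state which IOA formula the teacher and paraeducator used, so the sketch below assumes two calculations commonly described for count data: total count IOA (smaller count divided by larger count, times 100) and the mean of that value across intervals. The function names are hypothetical.

```python
def total_count_ioa(count_a, count_b):
    """Smaller count divided by larger count, as a percentage."""
    if count_a < 0 or count_b < 0:
        raise ValueError("Counts cannot be negative")
    if count_a == count_b:
        return 100.0  # identical counts, including the case where both are zero
    return 100.0 * min(count_a, count_b) / max(count_a, count_b)


def mean_count_per_interval_ioa(counts_a, counts_b):
    """Average of the per-interval count agreement percentages."""
    if len(counts_a) != len(counts_b) or not counts_a:
        raise ValueError("Both observers need the same, nonzero number of intervals")
    per_interval = [total_count_ioa(a, b) for a, b in zip(counts_a, counts_b)]
    return sum(per_interval) / len(per_interval)


# First observation: 75 (teacher) versus 10 (paraeducator) for the whole period.
first_day = total_count_ioa(75, 10)

# Following day: counts for the three 5-minute intervals described above.
teacher_counts = [5, 8, 10]
paraeducator_counts = [5, 8, 8]
second_day = mean_count_per_interval_ioa(teacher_counts, paraeducator_counts)

print(f"First day, total count IOA: {first_day:.1f}%")
print(f"Following day, mean count-per-interval IOA: {second_day:.1f}%")
```

Under these assumed formulas, the first day comes out at roughly 13% agreement, while the three intervals on the following day yield 100%, 100%, and 80%, for a mean of about 93%. Breaking the session into intervals, as the teacher and paraeducator did, is a more conservative check than comparing session totals, because two observers can reach similar totals while counting different events.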

Conclusion

Poor instrumentation is a threat to all data-based decision-making, whether in the context of a research study or intervention services. If data are unreliable, it becomes difficult (if not impossible) to determine if an intervention is working or if changes are needed. If you stepped on the bathroom scale every day and saw a wildly different reading each time, it would be extremely difficult to know what (if any) lifestyle changes to make. Making sure the scale at least measures in a consistent way each day is necessary to determine if your efforts are resulting in desired outcomes. In the same way, careful efforts must be made by clinicians and researchers to achieve reliable measurement so that data can be used to produce real, meaningful progress for individuals with autism receiving treatment.

References

Cooper, J. O., Heron, T. E., & Heward, W. L. (2020). Applied behavior analysis (3rd ed.). Pearson Education Inc.

Frampton, S. (2024). An overview of internal validity: Was it really the treatment that made a difference? Science in Autism Treatment, 21(08).

Johnston, J. M., Pennypacker, H. S., & Green, G. (2020). Strategies and tactics of behavioral research (4th ed.). Routledge.

Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single-case designs technical documentation. https://ies.ed.gov/ncee/wwc/Document/229

Ledford, J. R. (2018). No randomization? No problem: Experimental control and random assignment in single case research. American Journal of Evaluation, 39(1), 71-90. https://doi.org/10.1177/1098214017723110

Repp, A. C., Nieminen, G. S., Olinger, E., & Brusca, R. (1988). Direct observation: Factors influencing the accuracy of observers. Exceptional Children, 55(1), 29-36. https://doi.org/10.1177/00144029880550

Reference for this article:

Jackson, K., & Frampton, S. F. (2025). Science Corner: Instrumentation as a threat to internal validity. Science in Autism Treatment, 22(1).
