Sampling Protocols: How do I know that my sample represents the actual population?
Sampling is an important consideration for all types of data collection. For audience research and summative evaluations in particular, the sample from which data is collected must represent the actual population. That is, the visitors who participate in a questionnaire or interview should match the entire population of visitors. For instance, if the population of program visitors is 75% female, the sample should include approximately the same percentage of females. When the study sample closely mirrors the museum’s visiting population, the sample has external validity. And when there is external validity, we can draw conclusions from a study’s results and generalize them to the entire population.
There are several protocols RK&A follows to work towards external validity. First, to select study participants, we use a random sampling method, and most often, a continuous random selection method. To follow the method, we instruct data collectors to position themselves in a designated recruitment location (e.g., museum or exhibition exit) and ask them to visualize an imaginary line on the floor. Once they are in place, we instruct data collectors to select the first person who crosses the line. If two people cross the line at the same time, we ask data collectors to select the person closest to them. After the data collector finishes surveying or interviewing the selected person, the data collector returns to their recruitment location and selects the very next person to cross the line. It is important for data collectors to follow this protocol every time so as not to introduce bias into the sample. For instance, data collectors should not move the imaginary line or decide to delay recruiting because the person crossing the line looks unfriendly.
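The selection rule above can be sketched in a few lines of Python. This is only an illustration, not RK&A's tooling: the function name `select_visitors`, the crossing times, and the fixed `interview_seconds` are all hypothetical, and the sketch assumes every approach (whether the visitor accepts or declines) keeps the data collector away from the line for the same amount of time.

```python
def select_visitors(crossing_times, interview_seconds):
    """Return the indices of visitors the data collector approaches.

    crossing_times: times (in seconds) at which visitors cross the
        imaginary line, sorted in ascending order. If two visitors share
        a time, the one listed first is treated as the closer one.
    interview_seconds: assumed fixed time the collector is busy per approach.
    """
    selected = []
    free_at = 0.0  # moment the collector is back at the recruitment spot
    for index, crossed_at in enumerate(crossing_times):
        # Take the very next person to cross once the collector is free;
        # never skip someone or wait for a "friendlier" visitor.
        if crossed_at >= free_at:
            selected.append(index)
            free_at = crossed_at + interview_seconds
    return selected


# Hypothetical crossing times, in seconds after the shift starts.
crossings = [5, 20, 50, 200, 310, 320, 700]
print(select_visitors(crossings, interview_seconds=300))
```

The point of the sketch is that selection depends only on when the data collector becomes free and who crosses next, never on anything about the visitor.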
Second, we record observable demographics (e.g., approximate age) and visit characteristics (e.g., presence of children in the group) of any visitor who is invited to participate in the study but declines. We also record the reason these recruited visitors provide for declining (e.g., parking meter is about to run out). These data points are important for confirming or rejecting the external validity of the sample, because we compare the demographic and visit characteristics of those who participated in the study with those of visitors who declined. While the data points for comparison are limited, they are still informative. For instance, a trend we have observed is that visitors aged 35 to 54 are the most likely to decline participation, so their voices are often underrepresented. The same goes for visitors with children, who may be a subset of the 35 to 54 age group; they are often underrepresented in visitor studies. Knowing where your sample may be lacking is important context when interpreting the results.
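One simple way to make this comparison is to compute the share of approached visitors who declined within each group. The sketch below assumes a hypothetical recruitment log with one record per approached visitor; the field names (`age_group`, `participated`), the function name `refusal_rates`, and the records themselves are invented for illustration and are not real study data.

```python
from collections import Counter


def refusal_rates(log, key):
    """Share of approached visitors who declined, per value of `key`."""
    approached = Counter(record[key] for record in log)
    declined = Counter(
        record[key] for record in log if not record["participated"]
    )
    # Counter returns 0 for groups in which nobody declined.
    return {group: declined[group] / approached[group] for group in approached}


# Made-up recruitment log: one entry per visitor who was approached.
log = [
    {"age_group": "18-34", "participated": True},
    {"age_group": "18-34", "participated": True},
    {"age_group": "18-34", "participated": False},
    {"age_group": "18-34", "participated": True},
    {"age_group": "35-54", "participated": False},
    {"age_group": "35-54", "participated": True},
    {"age_group": "35-54", "participated": False},
    {"age_group": "35-54", "participated": False},
    {"age_group": "55+", "participated": True},
    {"age_group": "55+", "participated": True},
]

for group, rate in refusal_rates(log, "age_group").items():
    print(f"{group}: {rate:.0%} declined")
```

A group with a noticeably higher refusal rate than the others is a group whose voice is likely underrepresented in the final sample. The same function works for any recorded characteristic, such as presence of children, by passing a different `key`.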
For these two reasons, we aim to systematically recruit visitors for audience research and evaluation studies. Even for studies that use standardized questionnaires, we hire data collectors who use a random selection protocol to recruit participants and track information about those who decline. As such, we do not recommend using survey kiosks to collect data, since visitors self-select to complete the survey and cannot be compared to those who decided not to complete it (and if you think kiosks may be preferable because you could boost the number of surveys collected, see my earlier post on sample sizes). As always, there are exceptions to the general rules described above. Yet our goal is always to use protocols that promote external validity and to document threats to it, because what you don’t know can hurt you.