Murari Suvedi, Associate Professor
Department of Agricultural and Extension Education
409 Agriculture Hall
Michigan State University
East Lansing, MI 48824
We are in an era of accountability, and the demand for program evaluation information is growing. Because the need to support local developmental programs is increasing while resources are limited, competition among agencies has increased. This has resulted in greater expectations for efficiency and for accountability reports on organizational performance. Elected officials, the media, and the public have become much more demanding about accountability and about receiving quality services in return for tax dollars and donations to private foundations. The U.S. Congress, state legislatures, local legislative bodies, foundations, and other funding agencies are increasingly demanding information on how program funds were used and what those programs produced. The following are examples of questions stakeholders frequently ask:
"We gave you 500,000 dollars over the last three years. What did your agency do with it?"
"We have supported your agency for the last 15 years. Why should we continue this support?"
"Are the programs of your agency effective?"
"What are you doing to improve or terminate ineffective programs?"
"What new programs need to be developed to meet the needs and problems of the people you intend to serve?"
Evaluation helps answer these questions. The main purpose of evaluation is to improve the quality of a program or a project by identifying its strengths and weaknesses. Evaluation is an important part of Extension programming. Extension programs, no matter how large or small, need to be reviewed or assessed to see if they accomplished the stated objectives. Through evaluation processes, we find out what impact the program had on the audience. How did they react? What did they learn? Was the program worth the time, money, and resources? Should this program be continued, expanded, or terminated? Evaluations provide information for decisions concerning future programming. The information is useful for fine-tuning the program, and it is used to communicate important facts to key individuals or groups who are concerned about our service. Evaluation results are also useful for formal reporting.
There is no widely agreed-upon definition of evaluation. Some authors equate evaluation with measurement. Others define evaluation as the assessment of the extent to which program objectives have been attained. For some, evaluation is synonymous with professional value judgment, whereas others argue that it is essentially a political activity. After a careful review of all these viewpoints, Worthen and Sanders (1987) defined evaluation as "determination of a thing's value." Case, Andrews and Werner (1988) provide a fairly comprehensive definition of evaluation. According to them, "to evaluate is to make an explicit judgement about the worth of all or part of a program by collecting evidence to determine if acceptable standards have been met." This definition of evaluation has two key terms. Standards are ideals or desired qualities or conditions against which actual objectives are to be measured. Evidence is the information necessary to help us confirm whether or not the required standards have been met by the program. For example, adoption of no-till practices in a watershed is the standard, and the percent of farmers adopting the no-till practice within the first five years of the project is the evidence.
Evaluation is the process of assigning "worth" or determining the "value" of a program or activity. When we evaluate, we collect information about a program's actual inputs and/or outcomes, compare that information to some preset standards or expectations, and then make a judgement about the program or activity. It should be noted that the standards or desired qualities or conditions against which program outcomes are measured come straight from the written goals and objectives of the program.
For practical purposes, evaluations can be classified in two broad categories:
Process evaluation: It focuses on providing information for program improvement, modification, and management. Some call this formative evaluation.
Impact evaluation: The focus is on determining program results and effectiveness (merit and worth). It serves the purpose of making major decisions about a program: continuation, expansion, reduction, and funding. Sometimes it is also referred to as summative evaluation.
What is the Role of Extension Managers and Educators in Program Evaluation?
Extension managers and educators have a primary role to play in the evaluation of educational programs. Each time we plan a program, we identify the program to be conducted, write objectives we want to accomplish through the program, decide on a plan of action to meet the objectives, and develop an evaluation plan to see if the objectives were accomplished. A meaningful plan for evaluation clearly outlines the standards for evaluation, that is, the criteria on which the program is to be evaluated. It also provides guidelines on what information should be gathered about the program, and how, when, where, and by whom.
As educational programmers, we should be knowledgeable about the basic principles and processes of evaluation so that we can empower people to perform many evaluative functions for our educational programs. Many educators plan and conduct their own evaluations. However, if we believe in a participatory mode of educational programming, we should function more like a midwife than a mother. We need to guide others in the process and be able to better assess and utilize the results of external evaluations.
When Should We Conduct an Evaluation?
At what point in the program's life should we conduct an evaluation? Before the program is conceived? During the planning stage, before the program is conducted? During the implementation of the program? Or should we evaluate when a program concludes? It is appropriate to consider collecting evidence for program evaluation at all of these stages. Evaluation information is frequently gathered at the following programming stages:
Project design stage. This is the most common stage of conducting evaluation. The form of evaluation data gathered at this stage is a needs assessment. The information is used to determine program content and set the program goals.
Program start-up. Evaluation information gathered at the beginning of the program helps us to establish a baseline from which changes in the participants or the impact on the community can be tracked. This usually involves carrying out a pretest or gathering baseline data on selected indicators.
In-progress or formative evaluation. This type of evaluation is conducted during the planning and implementation of a program to help make immediate changes or adjustments in the program and to prepare for summative evaluation. Formative evaluation helps programmers find the strengths and weaknesses in a program while it is still going on. Therefore, this type of evaluation is helpful for program improvement.
Program wrap-up or summative evaluation. This takes place at the end of a program. Frequently this is the only evaluation conducted during the life cycle of an educational program. It sums up what has occurred in the program, asks for end-of-program reactions and attempts to assess success in meeting program objectives. It is used for program accountability purposes.
Follow-up. This evaluation is conducted after participants have finished their involvement with the program. This type of evaluation looks for the longer term benefits of a program.
We do not necessarily need to be research workers in order to evaluate our educational programs. Evaluations range widely, from casual everyday assessments to scientific research. Perfect accuracy is not necessary, nor is it attainable. The evaluation should be structured to serve as a learning process. Evaluation principles can be applied by all persons. We should be careful, however, in the use of evaluation principles to improve our judgements and decisions.
Assessing the Feasibility and Likely Usefulness of Evaluation
Program evaluation involves the measurement of program performance and communication of this information to policy makers, managers, and staff. Some programs are much easier to evaluate than others. Usually, Extension programs with multiple goals, objectives, and operational elements are more difficult to evaluate. For example, the goals of an Extension project might include water quality improvement, growth management, and public education in a fairly large geographical area. Numerous confounding variables can be involved in the evaluation; differentiating between change induced by an Extension program and change induced by other factors is often difficult. Some aspects of change readily lend themselves to quantification, while other aspects do not. When planning an evaluation, we need to think about the following questions:
1. Can the program be evaluated? Does the program have clear objectives or well-defined activities? Do you have enough resources (money, time, and expertise) to undertake the evaluation? Are there clearly defined standards and indicators? Will the evaluation information be used? If your answer to these questions is "no," then it is not feasible to undertake an evaluation of your program without first clarifying these issues.
2. What information is needed for your evaluation? Who needs it? What level of evidence is desired? Rockwell and Bennett (1994) provide an evaluation model called "Targeting Outcomes of Programs (TOP): an integrated approach to planning and evaluation." TOP uses a single model to target outcomes, track the extent to which they are achieved, and evaluate program performance toward achieving them. You have to decide whether your evaluation should focus on process, i.e., gathering data about program implementation such as resources used, activities undertaken, and participant involvement (who and how many participated in the program), or on impact, i.e., collecting information on people's reactions, changes in knowledge, attitudes, and behaviors, and ultimately changes in social, economic, and environmental conditions. Evidence of program impact becomes stronger as this hierarchy is ascended. Patton (1986) also provides a pyramid of evaluation design and study criteria (see the appendix) which can be helpful in determining the level of evidence desired for an evaluation.
3. How do we get that information? What method (or combination of methods) is appropriate? There are several methods of evaluation data collection, and each has its own advantages and disadvantages. There is no one best method! We have to consider relative merits. Selection of a method should be influenced by (a) the type of information desired, (b) time availability, (c) cost, and (d) our own level of expertise in using the method.
Evaluability assessment is an important initial step. It involves key policy makers, managers and staff members in activities that clarify program intent and program reality, including the criteria to be used in evaluating the program. It also increases the likelihood of utilizing evaluation results for program improvement.
Who Should Be Involved in Conducting an Evaluation?
Extension projects have different program components or aspects. Which aspects of the project need evaluating? To answer this question, we need to ask a related question: who wants to know what, and for what purpose? Answering these questions involves (a) identifying the stakeholders of your project, (b) determining their stake, that is, the concerns that stakeholders have about the project, and (c) defining the aims, that is, identifying clear evaluation goals by examining the reasons for conducting the evaluation in light of stakeholder interest, feasibility, and usefulness.
First, you need to determine who the stakeholders in your Extension project are. Generally, policy makers in state and local governments, program managers and directors, educators like you, donors and funding agency representatives, citizens, local leaders, and representatives from collaborating agencies are interested in the outcome of our project. We need to involve them from the very beginning to ensure that the evaluation addresses their concerns. This also increases the chances that evaluation results will be used to improve the project.
Evaluation involves five major steps:
Planning: It begins with conducting an evaluability assessment. First, we need to review the purpose statement and intended outcomes of the project being evaluated. Then we need to identify the activities designed to achieve the intended outcomes. Based on this review, we ask questions like: Can the project be evaluated? What information is needed? How do we get the information? How will we use the information?
Gathering information: It involves a decision about the use of primary sources (i.e., the original documents, the first reporting of the facts) or secondary sources (i.e., those that bring together facts from primary sources) of data. It also involves a decision as to the types of sources (i.e., respondents, subjects, or documents) to be used for data collection.
Summarizing information: The information gathered needs to be summarized in the form of charts, tables, a case study, or a short vignette. Accurate and unbiased interpretation of findings is the key to a successful evaluation.
Comparing to standards: Often we tend to collect evidence and make judgements about a project without making explicit the standards that are implicit in our assessment. Standards should be stated well ahead of time so that everyone involved knows the criteria on which the project will be evaluated. Standards should not remain hidden in the mind of the evaluator. As program evaluators, we need to be objective in saying whether the project accomplished what it intended to accomplish. Standards make evaluations more objective.
For example: We will be successful if 50% of the target households participate in groundwater educational programs and half of these participants adopt at least one groundwater stewardship practice within the first three years of the project.
A standard involves both an indicator (a marker that can be observed to show that something has changed) and a level of evidence of change (e.g., percent adoption).
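The groundwater example can be sketched as a small calculation that compares evidence against preset standards. This is only an illustrative sketch: the household counts are hypothetical, and the names `Standard`, `share`, and `meets` are invented for the illustration rather than taken from any evaluation model cited here.

```python
from dataclasses import dataclass


@dataclass
class Standard:
    """An indicator paired with the level of evidence required for success."""
    indicator: str
    required_share: float  # expressed as a fraction, e.g. 0.50 for 50%


def share(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def meets(standard: Standard, observed_share: float) -> bool:
    """Judge the evidence: does the observed share reach the standard?"""
    return observed_share >= standard.required_share


# Standards stated ahead of time (taken from the groundwater example).
participation = Standard("target households participating in programs", 0.50)
adoption = Standard("participants adopting at least one practice", 0.50)

# Hypothetical evidence gathered after three years.
target_households = 400
participating_households = 220
adopting_households = 121

participation_share = share(participating_households, target_households)
adoption_share = share(adopting_households, participating_households)

project_successful = (
    meets(participation, participation_share)
    and meets(adoption, adoption_share)
)

print(f"Participation: {participation_share:.0%} of target households")
print(f"Adoption: {adoption_share:.0%} of participants")
print("Standards met" if project_successful else "Standards not met")
```

Writing the standards down before the data are collected, as the `Standard` records do here, keeps the judgement from depending on criteria that exist only in the evaluator's mind.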
Determining "worth": Evaluation essentially involves value judgement. Judgement is reached by comparing evidence against predetermined standards. This is usually documented in the form of an evaluation report. The findings are shared with the stakeholders in various formats: written report, newspaper article, videotape, radio talk, television show, and others.
Evaluation is both an art and a science. The art of evaluation involves working with the management to agree upon purposes and users of results, and creating a design and gathering information that are appropriate for a specific situation and a particular policy making context. The science of evaluation involves determining standards and developing indicators, selecting methods appropriate to gather information in a systematic way, and analyzing information to assist in determining the value of the program in an objective manner.
Methods of Gathering Evaluation Data
Evaluation data can be gathered from primary and secondary sources. Primary sources involve original documents, the first reporting of the facts, and the first grouping of the raw data. Secondary sources bring together facts from primary sources.
Evaluation data are generally gathered from respondents (e.g., participants, facilitators, project managers, local residents, experts) by asking questions and collecting, orally or in writing, their deliberate responses. Sometimes, evaluation information is gathered from a subject by watching what happens in a specific context or situation (e.g., how participants react during workshops, or how nitrate levels change in a river system over time). Evaluation information can also be gathered from documents (e.g., learning materials developed by project staff, minutes of meetings, correspondence, local newspaper cuttings) by examining their constituent parts.
How do we decide what sources to use? It depends on several factors. First, we need to consider the availability of the sources. Are they inexpensive to reach? Do we have easy access to them? Are they likely to cooperate? Second, we need to consider the credibility of the sources. Are they in the best position to report on the event? Are they likely to remember accurately? Is there reason to suspect that they might exaggerate or underplay the truth? Is their information likely to be incomplete? Do they possess the general background knowledge required to respond competently to the evaluation questions we want answered?
There is no one best method! We have to consider relative merits. Selection of a method should be influenced by the type of information desired, time availability, and the cost of using the method. Many methods could be used, but you should choose those that provide the most useful information, those you and/or your staff have the skill to use, and those that are within your budget. Last but not least, consider whether the information collected will be viewed as credible and accurate and will be useful to your organization.
The Census and Vital Statistics Record: A population census is taken every 10 years, and the information is available for public use. Census and vital statistics records are readily available at minimal cost and can be subscribed to on a regular basis. This information is available for each county and state.
Content Analysis: It involves using existing documentary information such as letters, diaries, photographs, records, receipts, reports, proceedings of meetings or hearings, and newspaper articles or editorials. These types of descriptive data provide insights into a program that cannot be observed in any other way. The information is readily available at minimal cost. Such information can be accessed on a continuing basis.
Participant Observation: Developed by anthropologists, participant observation is a method that is well described by its name. Rather than remaining detached, the participant observer lives with, eats with, works with, plays with, and may even join in rituals with the people he or she is studying. Throughout, the participant observer systematically records information on these activities.
True participant observation requires the investigator to immerse him/herself in the life of the community being studied. This method is especially useful in the assessment of long term effects on local residents of a new industry or development program. It is very useful in determining reasons for community conflicts or misunderstandings, assessing community needs and problems, and finding acceptable ways of involving people in problem solving.
Participant observation alone will rarely provide enough information for a program evaluator. More detailed information usually must be elicited by interviewing informants. Such interviews may be particularly valuable for learning about local people's beliefs, values, motivations, power relationships, etc. Documenting complex human behavior requires a highly skilled observer, because the quality of the information is subject to the observer's biases.
There are some significant ethical issues involved with the study of other humans through participant observation. Many people do not enjoy being observed; it may make them feel self-conscious, awkward, or embarrassed. People have a right not to be observed if they do not want to be. Therefore, it is important to observe the following guidelines while practicing participant observation:
1. Whenever possible, ask people's permission to observe them. You can say something as simple as, "I'm very interested in learning about such-and-such from you. May I write down a few notes about our conversation?" You should assure them that their names will never be used in the report.
2. In some cases, it will not be possible to ask permission-- there may be too many people, the action may be temporary, the people may be at some distance, etc. In these cases, observe and record only behavior that is enacted in public.
3. If anyone objects to your observation or data recording, you must respect their wishes and stop your activities immediately.
An observation schedule is typically a checklist or rating scale on which the occurrences of particular, pre-identified events or features can be noted. This method is useful for gathering information about particular physical behaviors. It provides an opportunity to collect information directly about what is actually occurring in natural settings. It yields very objective information. The more structured the schedule, the easier it is to summarize and analyze the data.
This method has some limitations when one needs to gather information about people. The evaluator has less control over the situation in a natural environment. If the group is aware of being observed, behavior may be affected, which is known as the "Hawthorne effect." Different observers may not record events consistently. If the evaluator chooses to be involved in the activity, he or she may lose objectivity.
A case study can provide in-depth information on a single unit, group, or organization through personal discussion, mutual interaction, observation, or review of existing documents. The evaluator captures the total scene of the situation. Yin (1984) describes case studies as either explanatory, descriptive, or exploratory. With an explanatory purpose, a case study is designed to test and explain causal links in real-life programs whose complexity cannot be captured by a survey. For descriptive purposes, a case study can be used to describe the real-life context where a program takes place. Finally, a case study can be exploratory: if a program has no clear set of outcomes, it can help identify performance measures or pose hypotheses for further evaluative work.
Case study has several advantages. It allows depth of insight into relationships and personal feelings; the information can be effectively used in combination with other methods such as survey and observation; and it can be tailored to a specific situation.
The major disadvantage of the case study is the subjectivity of the information. People sometimes question the evaluator's bias. In addition, it is time consuming and requires extensive amounts of data. The findings cannot necessarily be generalized to a larger community, because a case study focuses on a limited number of cases.
The key informant approach is based on obtaining information, over time, from a community resident who is in a position to know a community well. Key informants in a community may include school superintendents, Extension educators, local leaders, church officials, local business leaders, and members of service clubs such as Lions, Kiwanis, or Optimist International. These people could provide fairly representative information on how a program or project is serving its intended beneficiaries. The evaluator should, however, recognize the limitation of this approach: it does not use random selection of subjects and is thus subject to information bias and problems of representativeness.
Surveys have become very popular methods of collecting evaluative data. Surveys are used to measure people's opinions, attitudes, beliefs, behaviors, reactions, and attributes in response to specific questions. A survey can provide the distribution of some characteristic in a population, and it can usually accomplish that by surveying only a portion of the people (or units) in that population. Some qualitative methods, such as focus group interviews, in-depth case studies, and ethnography, cannot do the job that surveys can do.
Surveys have several advantages. They are moderate in cost and make it relatively easy to reach large numbers of people. They allow for anonymity of responses. Evaluators can also ask fairly complex questions about respondents' attitudes and behaviors. Data can be requested from records and other sources. Surveys allow time for respondents to reflect on events and report subsequent changes and feelings. The usefulness of survey data can be enhanced if the information is combined with other methods, e.g., observation or case study.
Surveys can be conducted by mail, by telephone, in mixed mode (mail and telephone), or administered in a group setting such as a workshop or classroom. Each method has its own advantages and disadvantages.
One can ask, "Which one is better, mail or telephone?" Mail is the method of choice when: (a) the sample is large, (b) visual display of questions is needed, (c) the educational level of respondents is high, (d) respondents are dispersed over a large geographical area, and (e) the budget is low. Mail surveys have been very popular in the past two decades. If designed properly, they can generate valid and reliable information. A mail survey, however, should be avoided if the target population has low education, survey questions are open-ended, or the sampling frame is inadequate or not available.
Telephone survey is the method of choice when: (a) respondents are widely dispersed geographically, (b) speed in data collection is essential, (c) sample size is small, and (d) cost is not a big factor. Telephone surveys may yield a higher response rate than mail surveys. To some extent, interviewers can explain questions not understood by the respondents. Telephone surveys, however, should be avoided if we need to ask long and complex questions or if bias from excluding people without telephones cannot be tolerated. They may cost more than a mailed questionnaire, they require good interviewing skills and clear, simple questions, and they carry a natural bias in favor of people who have listed numbers and are usually at home. If the respondent is unfamiliar with the caller, there might be indifference or poor cooperation.
Evaluators could use "mixed-mode surveys" to collect some data by mail and some by telephone when (a) one method won't get an adequate response rate, and/or (b) they are faced with sampling problems. Dillman (1994) warns that mixed-mode surveys should be avoided when key evaluation questions involve attitude and/or social desirability.
Surveys may produce inaccurate results because of the following four types of data collection errors:
1. Coverage error, which results from not allowing every person in the study population to have an equal (or known) chance of being sampled for the study. This error can be minimized by using an up-to-date, accurate list of the population to be studied.
2. Sampling error, which results from the fact that only some members of the study population are asked to provide survey information. Sampling errors could be controlled by using random sampling to select members of the study population.
3. Measurement error, which results from obtaining inaccurate answers to survey questions. In other words, are the data valid (true)? Measurement errors may occur because (a) questions are not clearly stated, (b) instructions are vague or unclear, (c) respondents tend to give socially acceptable answers, (d) respondents do not possess the correct information, or (e) respondents deliberately lie. We can control measurement errors by using suitable, reliable, and valid instruments.
4. Nonresponse error, which results from some people in the survey sample not responding, and their being different from those who respond. A low response rate has been a frequent problem with mail surveys. Dillman (1994) suggests a social exchange concept to guide survey design and improve the response rate. The principal idea behind the social exchange approach is to increase the perceived rewards of responding (e.g., make answering interesting, support respondents' values, provide token incentives), decrease the perceived costs (e.g., time, embarrassment, mental effort), and promote recipients' trust that the rewards will, on balance, outweigh the costs (e.g., by showing investment and legitimate, trustworthy sponsorship).
When subjects can't be located or fail to respond, we could consider the following options as suggested by Miller and Smith (1984):
i) Double-dip: List and number the non-respondents; draw a random sample of them (10-20%); obtain their responses by phone, interview, etc.; and statistically compare respondents to non-respondents. If there is no difference, collapse the data. If they differ, develop a proportionately weighted formula to get "adjusted" data. We can then say the results are true for the sample; since the sample should be representative of the population, the results are valid for the population.
ii) Compare early to late respondents. If no difference, results could be generalized to the population.
iii) Compare respondents to non-respondents on known characteristics. If no difference, results could be generalized to the population.
iv) Compare respondents to population on known characteristics. If no difference, results could be generalized to the population.
v) Ignore non-respondents: In this case we can generalize only to respondents, i.e., the results are not true for the sample or the population.
As program evaluators, we should make a special effort to minimize all four types of errors, or hold them to acceptable levels, when designing the evaluation.
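The "double-dip" option for handling nonresponse can be sketched in a few lines of Python. This is a rough illustration, not the procedure as specified by Miller and Smith (1984): the scores are hypothetical, the function names are invented, a permutation test stands in for whatever statistical comparison an evaluator would actually choose, and the "proportionately weighted formula" is assumed to mean weighting each group's mean by its share of the original sample.

```python
import random
from statistics import fmean


def draw_followup_sample(non_respondent_ids, fraction=0.15, seed=0):
    """Draw a simple random sample (10-20%) of non-respondents to follow up."""
    rng = random.Random(seed)
    k = max(1, round(len(non_respondent_ids) * fraction))
    return rng.sample(non_respondent_ids, k)


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


def double_dip_estimate(respondent_scores, followup_scores,
                        n_non_respondents, alpha=0.05):
    """Return (estimated sample mean, p-value) after the follow-up."""
    p = permutation_p_value(respondent_scores, followup_scores)
    if p >= alpha:
        # No detectable difference: collapse the two groups.
        return fmean(list(respondent_scores) + list(followup_scores)), p
    # Groups differ: weight each mean by its share of the original sample
    # (an assumed reading of "proportionately weighted").
    n_resp = len(respondent_scores)
    total = n_resp + n_non_respondents
    adjusted = (n_resp * fmean(respondent_scores)
                + n_non_respondents * fmean(followup_scores)) / total
    return adjusted, p


# Hypothetical survey: 100 sampled, 60 responded on a 1-5 rating item.
respondent_scores = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4] * 6
non_respondent_ids = list(range(1, 41))
followup_ids = draw_followup_sample(non_respondent_ids)
followup_scores = [2, 3, 2, 3, 2, 3]  # hypothetical answers obtained by phone

estimate, p_value = double_dip_estimate(
    respondent_scores, followup_scores, len(non_respondent_ids)
)
print(f"Followed up with {len(followup_ids)} non-respondents")
print(f"p = {p_value:.4f}, adjusted mean = {estimate:.2f}")
```

In this made-up case the followed-up non-respondents rate the program noticeably lower than the respondents, so the adjusted mean falls below the respondents-only mean, which is the kind of nonresponse error the procedure is meant to expose.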
Survey instruments can also be administered in special group situations, such as at the end of a workshop, seminar, or class. This approach has two major advantages: (a) there is little or no cost in reaching respondents, and (b) the purpose of getting information can be clearly explained. The disadvantages include: (a) limited generalizability of information to a larger population, (b) it takes time away from the regular program, (c) group mood or setting at the time may affect responses, and (d) it does not allow for long-term reactions and changes.
A personal interview consists of the oral collection of information from one individual at a time through personal contact. The major advantages are: (a) you can see the respondent and hear the responses, (b) it is more personal than other methods, so you can pick up nonverbal clues and ask for clarification, (c) you are more likely to get hold of hard-to-reach respondents, and (d) the response rate is usually very high.
Personal interviews have several disadvantages. They are costly, especially when the respondents are dispersed over a wide geographic area. It may be hard to keep respondents on track, and some respondents may feel uneasy when confronted with the interviewer. The method also requires a trained interviewer.
Case, Andrews and Werner (1988) offer the following guidelines for conducting interviews:
1. Clarify the purpose of the interview with respondents. They want to know what is expected of them, how they are selected, and if they can see the results of the evaluation. Always emphasize the confidentiality and anonymity of their responses. You may want to ask respondents for permission to take a few notes.
2. Take notes as you proceed with the interview. Sometimes the exact words people use may be important.
3. Focus respondents' attention on the question. If respondents want to talk about something else, politely but firmly refer them back to the question.
4. Ask all questions, and check whether you have skipped any. Be flexible when unexpected problems arise.
5. Don't cut respondents off in mid-answer, even if their remarks do not seem relevant.
6. Respect the respondent's right to refrain from answering a question.
Tests, or examinations, are tools to measure an individual's level of knowledge, understanding, and ability to apply knowledge. They can provide an indication of the level of knowledge and of other changes related to a particular program. They are relatively easy to implement and can usually be carried out in a group setting.
The major disadvantage of tests in out-of-school settings is that people, mainly adults, often resist attempts to test their knowledge. In addition, the setting may influence the test results. If the goal of our project or program is to bring about behavioral change, then tests alone are not a sufficient measure, because knowledge gain may be unrelated to behavior. It is also difficult to construct a reliable and valid test.
Holding an informal conversation with a respondent is usually done in person, although such contact could occur by telephone. You could gather fairly accurate information through such conversations with program participants.
Informal conversations are casual and short, and they usually occur as opportunities arise. Respondents may be more relaxed, and they may respond more spontaneously than in a structured interview. The major limitation of this approach is that note-taking may not be possible and there may not be an opportunity to ask more than a few questions.
A log is a chronological record of significant events. Typically, it consists of a brief description of progressive events or steps followed, with times and dates. Descriptions should be as objective (non-interpretive) as possible. The log can be used to monitor progress and chart actual procedures followed. It provides a brief, easy-to-read, overall picture of significant events.
One has to understand the limitations of logs as an evaluation data collection method. First, a log provides only certain information about the activity of a project. The briefness and compactness of the log may provide misleading information at times, because details may be missing. Second, the chronological presentation may not always be the format which offers the most insight into events.
Focus groups are often used in marketing research to find out what a particular segment of the public needs and what it will consume. In recent years, this technique has frequently been used to identify community needs and issues; to obtain citizens' perceptions on a defined area of interest in a permissive, nonthreatening environment in order to generate program alternatives; and to assess the impacts of a particular program on individuals and communities. Focus group interviewing uncovers information on human perceptions, feelings, opinions, and thoughts.
According to Krueger (1994), the focus group is a special type of group in terms of purpose, size, composition, and procedures. A focus group is typically composed of seven to ten participants and members are selected because they have certain characteristics in common that relate to the topic of the focus group.
Focus groups should be conducted by a skilled interviewer. The interviewer should create a permissive environment in the focus group that nurtures different perceptions and points of view, without pressuring participants to vote, plan or reach consensus. Krueger (1994) suggests that the discussion needs to be relaxed, comfortable and often enjoyable for participants as they share their ideas and perceptions. The group discussions should be conducted several times with similar types of participants to identify trends and patterns in perceptions. Careful and systematic analysis of the discussions provides clues and insights as to how a product or service is perceived.
A program evaluator may consider the following steps in conducting the focus group interview:
1. Consider your purpose. Why do you want to conduct focus group interviews? Who are the users of this information? Why do they want the information? Develop a tentative plan, including the resources needed.
2. Identify the questions to be asked in the interview. Establish the context for each question. Arrange the questions in a logical sequence.
3. Arrange a suitable meeting place in a convenient location. It could be a meeting room in the Courthouse, at a local restaurant, or school. The location should be easy to find, relatively free from outside distractions, and have tables and chairs arranged with participants facing each other. Arrange a tape recorder and check if it records, whether it needs new batteries, blank tapes, etc.
4. Identify the audience who will be interviewed. Invite them well in advance. Explain to them the purpose of the meeting and how they can contribute. Reconfirm their availability to participate in the session. Prepare a name tag for each participant.
5. Identify a trained moderator (and an assistant) to conduct the focus group interview. The moderator must be mentally alert and free from distractions. He or she should help create a warm and friendly environment. The moderator should direct the discussion and keep it flowing, and take few notes.
6. Arrange the meeting room for the interview. Check the seating arrangements. Individuals who talk a great deal and might dominate the discussion should be seated to the moderator's side; shy and quiet participants should be placed across from the moderator.
7. Conduct the focus group interviews. The moderator should again explain the purpose. He or she assures the participants of the confidentiality and anonymity of their responses. The moderator tape-records the interview. When the session has ended, he or she checks the tape to be sure it adequately captured the group discussion.
8. Immediately following the focus group interview, the moderator and assistant moderator discuss the common experiences and perceptions that surfaced during the interview. They should review the tape together before the next focus group interview is conducted.
9. Analyze the results by transcribing the taped discussion and summarizing what was said by the participants. The interpretation should focus on meaning: what do the findings mean to you? Are the findings of value to the stakeholders? What recommendations are in order? Try to provide a summary of the focus group interview rather than lengthy transcriptions of the discussion.
10. Prepare a short report and share the findings with your stakeholders.
The focus group interview is a relatively fast and fairly cheap technique for collecting evaluative data. If conducted properly, it tends to reduce the distance between project or program personnel and the intended beneficiaries. It stimulates dialog among participants. On the other hand, focus group interviews are very easily misused. They are easy to set up, but require skill to moderate. Data interpretation is tedious, and capturing the major issues that surfaced without bias is often difficult. Further, results may not be generalizable to the target population.
How to Select a Sample for Evaluation Data Collection?
Evaluation of Extension projects usually involves first-hand collection of data from people. The collection of data essentially involves decisions about the population and a sampling plan. First, let us understand the concepts of population and sample.
A population is defined as a group of individual persons, objects or items having a characteristic in common. It is the total group from which samples are taken for statistical measurement. Examples of populations include all Americans voting in the 1996 Presidential Election, the college freshman class of 2000, and all Americans over 60 years of age.
A sample is a portion or subset of a larger group called a population. A good sample is a miniature version of the population -- just like it, only smaller (Fink, 1995). The best sample is representative, or a model, of the population. A sample is representative of the population if important characteristics (e.g., age, educational level, ethnicity, income) are distributed similarly in both groups. Sampling is the selection of a smaller number of units from among the whole group concerned in such a manner that they can be used to make estimates about the whole group.
What advantage does a sample possess over a complete count? Or why sample? The answer in brief is that it is cheaper in terms of time, money, materials, and effort. Results can be accurate and precise. It can obtain data that could not possibly be available otherwise.
Sampling methods are usually divided into two types:
(1) Random or probability sampling: This provides a statistical basis for saying that a sample is representative of the target population. Samples are based on random selection of units. Every member of the target population has a known, nonzero probability of being included in the sample. It eliminates subjectivity in choosing a sample. It is a "fair" way of getting a sample.
(2) Purposeful sampling: Units are chosen based on judgement regarding the characteristics of the target population and the needs of the study. Some members of the target population may have a greater chance of being chosen than others. Survey findings may not be applicable to the target population.
There are several types of random or probability samples. The following are the ones most frequently used by program evaluators.
Simple random sampling: All the individuals in the population have an equal and independent chance of being selected as a member of the sample. We need a list of eligible units comprising a population from which to sample. This list is called a sampling frame. Members of the population are selected one at a time and independently. Once they have been selected, they are not eligible for a second chance and are not returned to the pool. One can use a computer-generated list of random numbers to select the sample. A table of random numbers is sometimes employed with a random starting point to identify numbered subjects.
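As a minimal sketch of drawing a simple random sample by computer, the example below uses Python's standard random module. The sampling frame of numbered households, the function name, and the seed are hypothetical and serve only as illustration.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size units without replacement; every unit is equally likely."""
    if sample_size > len(sampling_frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: a numbered list of 4,500 households.
frame = [f"household-{i:04d}" for i in range(1, 4501)]
sample = simple_random_sample(frame, 450, seed=1996)
print(len(sample))   # 450 households, none selected twice
print(sample[:3])    # the first three households drawn
```

Because the draw is made without replacement, a selected unit is not returned to the pool, which matches the description above.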
Systematic sampling: All members in the population are placed on a list for random selection and every nth person is chosen after a random starting place is selected. Suppose you have a list of 4500 households living in a watershed for which a sample of 450 is to be selected for water quality monitoring. Dividing 4500 by 450 yields 10, indicating that you have to select one out of every 10 households. To systematically sample from the list, a random start is needed. You can toss a die to get a number, or consider the month of the year you were born. Suppose you were born in March, the 3rd month of the year. This means that the 3rd name on the list is selected first, then the 13th, 23rd, 33rd, 43rd, and so on until 450 names are selected.
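The watershed example can be sketched in a few lines of Python. The function name is hypothetical, and the households are represented simply by their positions on the list.

```python
def systematic_sample(sampling_frame, sample_size, start):
    """Select every k-th unit, where k = population size // sample size.

    start is the 1-based position of the first unit chosen (1 <= start <= k).
    """
    interval = len(sampling_frame) // sample_size
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the sampling interval")
    # Convert the 1-based start to a 0-based index, then step by the interval.
    return sampling_frame[start - 1::interval][:sample_size]

households = list(range(1, 4501))       # positions 1..4500 on the list
chosen = systematic_sample(households, 450, start=3)
print(chosen[:5])    # [3, 13, 23, 33, 43]
print(len(chosen))   # 450
```

In practice the start could be drawn with random.randint(1, interval) rather than fixed at 3.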
Stratified sampling: To assure that certain subgroups in the population will be represented in the sample in proportion to their numbers in the population, each subgroup, called a "stratum," is separately numbered and a random sample is selected from each stratum. A definite rationale should exist for selecting any strata. Stratified sampling is more complicated than simple random sampling, and using many strata can lead to a large and expensive sample.
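A rough sketch of proportional stratified sampling follows. The strata names and sizes are invented for illustration, and the allocation simply rounds each stratum's proportional share, so the total may differ from the target by a unit or two when the shares are not whole numbers.

```python
import random

def proportional_stratified_sample(strata, total_sample_size, seed=None):
    """Draw a random sample from each stratum in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Allocate in proportion to the stratum's share of the population.
        n = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, n)
    return sample

# Hypothetical strata within a population of 1,000 producers.
strata = {
    "crop farmers": [f"crop-{i}" for i in range(600)],
    "dairy farmers": [f"dairy-{i}" for i in range(300)],
    "fruit growers": [f"fruit-{i}" for i in range(100)],
}
result = proportional_stratified_sample(strata, 100, seed=7)
print({name: len(units) for name, units in result.items()})
# {'crop farmers': 60, 'dairy farmers': 30, 'fruit growers': 10}
```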
Cluster sampling: The unit of sampling is not the individual but rather a naturally occurring group of individuals such as a classroom, neighborhood, club, and so on. The clusters are randomly selected, and all members of the selected clusters are included in the sample. Cluster sampling is used in large-scale evaluations involving surveys.
Matrix sampling: A sample of people receives one sampling of questions, and other samples receive other samplings of questions.
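Cluster selection can be sketched as follows: whole clusters are drawn at random, and every member of a drawn cluster enters the sample. The club names and members are hypothetical.

```python
import random

def cluster_sample(clusters, number_of_clusters, seed=None):
    """Randomly select whole clusters and keep all of their members."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(clusters), number_of_clusters)
    return {name: list(clusters[name]) for name in selected}

# Hypothetical clusters: youth clubs and their members.
clubs = {
    "Club A": ["Ann", "Ben", "Cal"],
    "Club B": ["Dee", "Eli"],
    "Club C": ["Fay", "Gus", "Hal", "Ida"],
    "Club D": ["Jan", "Kit"],
}
picked = cluster_sample(clubs, 2, seed=3)
for club, members in picked.items():
    print(club, members)
```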
Evaluators may have to choose purposeful or nonprobability samples for various reasons. An accurate listing of the population may not be available, resources to develop a sampling frame may be limited, or obtaining cooperation from potential respondents may be perceived to be difficult. A purposeful sample may be chosen to be sure to include a wide variety of people based on a number of critical characteristics. Sometimes, individuals are specifically chosen to represent a specific characteristic. More frequently, evaluators choose nonprobability samples because they can be conveniently assembled. A purposive sample does not rely on random selection of units. The following are frequently used purposeful or nonprobability samples:
Accidental sampling: This is the weakest type of sample, but it is the easiest to get. "Man-in-the-street" interviews are typical of accidental samples. The evaluator usually uses the first five or 10 people who happen along and are willing to talk.
Reputational sampling: It involves selecting specific people to respond to a survey or to be interviewed about an issue. The choice of an individual depends on someone's judgement of who is and who is not a "typical" representative of the population.
Convenience sampling: A convenience sample consists of a group of individuals that is readily available for data collection. Households living near parks or schools, or persons working in a particular factory or business, are chosen because of convenience.
Snowball Sampling: This type of sampling relies on previously identified members of a group to identify other members of the population. As newly identified members name others, the sample snowballs. This technique is useful when a population listing is unavailable.
Determining Sample Size
Several factors need to be considered when determining sample size. Cost is a factor. We often need to work with the smallest sample that can still offer us adequate data. The characteristics of the population also affect sample size. If the population under study is homogeneous, i.e., people possess similar characteristics, it may require a smaller sample. Sample size is also determined by the size of the population and the type of analysis to be implemented.
The confidence level and the margin of error of findings are important factors in determining sample size. In general, a 95% confidence level gives us the security we need when drawing conclusions from our sample. The margin of error is a matter of choice. If we want to be relatively safe in our conclusions, then a 5% margin is acceptable. In general, more subjects are needed for a .01 alpha test than a .05 alpha test, and two-tailed tests require larger sample sizes than one-tailed tests.
Sampling error is large when the sample is small. Therefore, researchers suggest that the best answer to the question of size is to use as large a sample as possible. The following table may be used to determine sample size based on a 5% margin of error.
Population size   Sample size (+/-5%)   Population size   Sample size (+/-5%)
10                10                    275               163
15                14                    300               172
20                19                    325               180
30                28                    350               187
40                36                    375               194
50                44                    400               201
65                56                    450               212
75                63                    500               222
90                73                    1000              286
100               81                    2000              333
125               96                    3000              353
150               110                   4000              364
175               122                   5000              370
200               134                   6000              375
225               144                   8000              381
250               154                   10000             385
275               163                   100000            398
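The source does not state how the table values were computed. As an assumption, a commonly used simplified formula (often attributed to Yamane), n = N / (1 + N e^2), where N is the population size and e the margin of error, gives values close to the table; a few entries differ by one. The sketch below is an approximation for populations not listed, not the table's documented method.

```python
def approximate_sample_size(population_size, margin_of_error=0.05):
    """Simplified formula n = N / (1 + N * e**2), rounded to the nearest whole unit.

    Assumption: the table does not name its formula; this one only comes close to it.
    """
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return round(n)

for size in (500, 1000, 2000, 10000):
    print(size, approximate_sample_size(size))
# 500 222
# 1000 286
# 2000 333
# 10000 385
```

For a population of 100 the formula gives 80, whereas the table lists 81, so the table should take precedence where the two disagree.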
Constructing Evaluation Instruments
Developing an instrument is a critical step in evaluation. The instrument reflects the type and quality of data we are collecting for our evaluation. While designing instruments, the evaluator should focus on the goals and objectives of the program to be evaluated; consider the activities undertaken to meet each objective and the audience served or reached; review the standards for program evaluation and identify indicators of program merit for each program objective; and determine how he or she is going to gather needed information. Based on these factors, the evaluator has to start the actual construction of the tools, or instruments, to collect information.
The first step in constructing an evaluation instrument is to list several questions or jot down ideas or items relevant to the indicator of program merit. As far as practicable, questions should be simple and unambiguous, and they must not lead the respondent to any particular response. Usually, more questions should be drafted than will be needed, because many will be rejected by the reviewers during the validity check.
After specific items have been drafted, they must be assembled into instruments. The format depends on the type of instrument. Mail questionnaires are worded differently from interview schedules. Self-reporting instruments such as mail questionnaires must be well organized and self-explanatory. Telephone interviews are again structured differently from face-to-face interviews. In telephone interviews the respondent never sees the form, so visual appearance is not a concern, except to the extent that interviewers need a clear understanding of how to use the instrument and record data. Questions should read well and be easily understood.
Surveys are popular tools of evaluation. Mail surveys are often used to determine a program's impact on people's reactions, knowledge, opinions, skills, attitudes, and practice changes. A good questionnaire has the following characteristics:
It is easy to read. It is short. As a questionnaire increases in length, respondents are less likely to complete it or, if they do, are less likely to answer the questions carefully.
It informs respondents about the purpose of the evaluation, explains why their cooperation is needed, and provides clear directions for completing and returning the questionnaire.
The questions are organized in a logical order. It starts with familiar, easy questions. These are followed by items pertinent to the purpose. Questions with similar content or response formats are grouped together. Demographic information is asked for near the end of the questionnaire.
Branching is clear. If some respondents are not expected to answer all questions, then indicate clearly where branching occurs.
It uses boldface or capitalized key words to reduce the chance of misreading instructions.
It asks both closed and open-ended questions. Usually a good questionnaire ends with an open-ended question, such as "Are there any other comments or concerns you would like to mention?"
The title should reflect the content of the instrument. It should be concise and written in language easily understood by the respondents.
It comes with a cover letter. The cover letter should explain the purpose of the study and convince the respondent that the study is useful and their participation is important to the success of the study.
It should also explain the presence of an identifying number on the questionnaire in a straightforward, honest manner (to facilitate the sending of follow-ups). Further, it should explain who should answer the questionnaire and how their responses will be kept confidential. Finally, it should provide directions on how and when to return the questionnaire.
It has an attractive front cover. The front cover has the study title, a graphic illustration, directions for completing the questionnaire, and the name and address of the study sponsor.
Establishing Validity and Reliability
One of the most important steps in developing an evaluation instrument is to establish its validity and reliability. According to Mueller (1986), validity and reliability are the benchmark criteria for assessing the quality of instruments.
Validity asks the question, "Does the instrument measure what it purports to measure?" The following procedure is recommended to establish the validity of an instrument:
1. Clearly define what it is that you want to measure (e.g., reactions, knowledge level, people involvement, behavior change, etc.).
2. Prepare a draft of your instrument in consultation with colleagues. Search for existing instruments related to your topic of interest to use as a guide in developing your own instrument. You may use similar question formats and response categories.
3. Identify 5-7 persons to serve as a panel of experts for reviewing your instrument in terms of content, format, and audience appropriateness. Remember that the members of the panel should be familiar with the purpose of the study. Ask the panel of experts to review the instrument and give feedback.
4. Revise the instrument by incorporating the suggestions offered by the panel of experts.
5. Field test the instrument to find out its suitability and clarity. Select about 10 persons who are similar to the target audience to participate in the field test. Watch people complete the questionnaire. Watch for hesitation, erasures or skipped questions. Seek verbal feedback after you have watched them complete the instrument. If some respondents appear confused or hesitant to answer, find out why. Review the instrument for clarity, content, wording, and length. Based on the feedback, revise your instrument.
Reliability asks the question, "Does the instrument consistently yield the same results with the same group of people under the same conditions?" Reliability looks for consistency, accuracy, and dependability of an instrument. Usually, reliability is established by conducting a pilot test.
A test-retest method is used to establish reliability. This method involves administering the same instrument twice to the same group after a certain time interval has elapsed. According to this method, 15-20 persons having characteristics similar to the target audience (but different from the field test group described earlier in the discussion of validity) are asked to complete the entire instrument. After about two weeks, the same instrument is readministered to the same group of people. Responses to each question are compared in pairs, i.e., the first and second answers from the same individual are compared. A high degree of agreement (70 percent or higher) between the paired scores indicates that the instrument is reliable (Neito and Henderson, 1995).
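The paired comparison can be sketched as a simple percent-agreement calculation for one question. The answers below are invented for 15 hypothetical respondents, and the function name is an assumption for illustration.

```python
def percent_agreement(first_answers, second_answers):
    """Percentage of respondents who gave the same answer both times."""
    if len(first_answers) != len(second_answers):
        raise ValueError("both administrations must cover the same respondents")
    matches = sum(a == b for a, b in zip(first_answers, second_answers))
    return 100 * matches / len(first_answers)

# Hypothetical answers to one question from the same 15 people, two weeks apart.
first = ["yes", "no", "yes", "yes", "no", "yes", "no", "no",
         "yes", "yes", "no", "yes", "yes", "no", "yes"]
second = ["yes", "no", "yes", "no", "no", "yes", "no", "yes",
          "yes", "yes", "no", "yes", "no", "no", "yes"]

agreement = percent_agreement(first, second)
print(agreement)          # 80.0
print(agreement >= 70)    # True: meets the 70 percent guideline
```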
Internal consistency methods of estimating reliability are becoming popular mainly because they require only a single administration of an instrument. There are several internal-consistency methods of establishing reliability. Some frequently used methods are the following:
(a) Split-Half Procedure: This procedure involves scoring two halves (usually odd items versus even items) of a test separately for each person and then calculating a correlation coefficient for the two sets of scores. The coefficient indicates the degree to which the two halves of the test provide the same results, and hence describes the internal consistency of the instrument (Fraenkel and Wallen, 1996).
(b) Kuder-Richardson Approaches: This approach utilizes three pieces of information about a test-- the number of items in the test, the mean, and the standard deviation. It assumes that the items are of equal difficulty. Manual computation of reliability coefficient is fairly complicated. Computer programs are commonly available for testing various types of reliability coefficients.
(c) Alpha Coefficient: Cronbach's alpha is another procedure to check the internal consistency of an instrument. This procedure is followed in calculating the reliability of items that are not scored right versus wrong. It is appropriate for establishing the reliability of questions asked on a scale, such as the Likert-type scales used to measure people's reactions, attitudes, or perceptions.
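The three internal-consistency methods above can be sketched with the Python standard library. This is an illustration under stated assumptions rather than a substitute for a statistical package: the Kuder-Richardson function uses the formula usually labeled KR-21, which matches the description of needing only the number of items, the mean, and the standard deviation; the choice of population variance there is a convention; and all data and function names are hypothetical.

```python
import statistics
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)

def split_half_correlation(item_scores):
    """Correlate each person's odd-item total with their even-item total."""
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)

def kr21(number_of_items, total_scores):
    """KR-21: needs only the item count and the mean and variance of total scores."""
    k = number_of_items
    mean = statistics.mean(total_scores)
    variance = statistics.pvariance(total_scores)  # population variance (a convention)
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * variance))

def cronbach_alpha(responses):
    """Cronbach's alpha for scaled items (rows = people, columns = items)."""
    k = len(responses[0])
    item_variances = [statistics.variance(item) for item in zip(*responses)]
    total_variance = statistics.variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical right (1) / wrong (0) scores: 6 people, 6 test items.
right_wrong = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
# Hypothetical Likert-type responses (1-5): 5 people, 4 items.
likert = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
]

print("split-half r:", round(split_half_correlation(right_wrong), 2))
print("KR-21:", round(kr21(6, [sum(p) for p in right_wrong]), 2))
print("Cronbach's alpha:", round(cronbach_alpha(likert), 2))
```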
A frequently asked question about reliability is, "What value of the reliability coefficient is adequate to establish an instrument's reliability? Is a reliability coefficient of 0.50 good or bad?" A useful rule of thumb is that reliability should be at least .70 and preferably higher.
Evaluation data may be qualitative or quantitative in nature. Qualitative data include information which we typically collect by allowing questions to be answered in a person's own words. Information collected through open-ended survey questions, case studies, informal interviews and observation, and focus group interviews is qualitative data. Analysis of qualitative data is sometimes tedious and may involve a lengthy process. This is because respondents answer in their own words, and it is often difficult to categorize, classify, and tabulate responses without losing their meaning. Analysis of qualitative data requires synthesis. It requires strong interpretative skills and human insight instead of "volume" of responses. Ethnograph software is frequently used in analyzing qualitative data. Once classified and/or tabulated, qualitative data provide in-depth insights about the project outcomes.
Quantitative data provide numerical values for each response. They are typically collected by providing a preset range of options from which respondents choose the most appropriate answer to a particular question. The instrument consists of closed-ended questions where the range of possible responses is predetermined. Quantitative data are easy to tabulate and analyze. Most frequently, descriptive statistics, including measures of variability (i.e., frequency, range, percentile, standard deviation) and measures of central tendency (i.e., mean or average, median, mode), are used to present the findings. The correlation coefficient is used to determine the linear relationship between variables. Cross-tabulations are used to determine associations. Evaluators use the t-test to determine differences in mean scores between two groups, and analysis of variance (ANOVA) to determine differences in mean scores when three or more groups are involved. Computer software such as Statistical Package for Social Sciences (SPSS PC+), Minitab, and Mystat is used to analyze quantitative data.
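As a small illustration of these statistics, the sketch below uses Python's standard statistics module in place of the packages named above. The scores are invented, the function names are hypothetical, and the t statistic shown is the equal-variance (pooled) form, which is only one of several variants; a statistical package would also report the associated probability.

```python
import statistics
from math import sqrt

def describe(scores):
    """Basic measures of central tendency and variability."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        "standard deviation": statistics.stdev(scores),
    }

def pooled_t_statistic(group_a, group_b):
    """t statistic for two independent groups, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical knowledge scores for participants and non-participants.
participants = [4, 5, 3, 4, 5, 4]
non_participants = [3, 3, 2, 4, 3, 2]

print(describe(participants))
print("t =", round(pooled_t_statistic(participants, non_participants), 2))
```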
Program evaluators usually make use of both types of data. They enrich their evaluation by using a combination of both quantitative and qualitative methods of data analysis. Their analysis is accompanied by visuals such as charts, diagrams, vignettes, etc.
Interpretation of evaluation results involves deciding what the data tell us about our project, particularly in light of the standards on which the evaluation is based. Often we tend to collect evidence and make judgements about a project without referring back to the original standards. Judgement should always be made by comparing evidence against predetermined standards. While making a judgement, we need to make sure that the impacts we want to attribute to the project are in fact produced by the project and are not the result of other independent factors. The judgement also involves assessment. How much impact is enough? When will we say the project achieved its standards? As program evaluators, we need to be objective in saying whether the project accomplished what it intended to accomplish. If there are any unintended outcomes of the project worth mentioning, we also need to mention them in our findings.
Sharing Evaluation Results With Stakeholders
Evaluation findings are usually documented in the form of an evaluation report. These findings need to be shared with the stakeholders. There is no fixed format. We can use written, oral, or visual formats. Evaluators use a variety of formats, e.g., written reports, newspaper articles, videotapes, radio talks, television shows, and others.
Reporting of evaluation should be clear and candid. We should know our audience and keep our audience in mind (i.e., how much time do they have? what is their level of education? etc.) when preparing reports. The report should highlight the most important points. It should be short and to the point. Whenever possible, use visuals to present information.
If a written report is the choice of format, keep the readers in mind. Avoid the use of technical words. Remember that most readers do not read the entire report. Provide an executive summary at the beginning highlighting major findings, conclusions and recommendations.
Good evaluations are those whose results are used by stakeholders in their decision making. Programs are continued, altered, or modified based on evaluation results. Sometimes special action plans emerge to meet the needs of a specific group of people as a result of evaluation. Remember, good evaluations involve stakeholders at all stages-- planning, implementation and utilization of results.
Archer, T. and Layman, J. (1991). "Focus Group Interview." Evaluation Guide Sheet, Ohio Cooperative Extension Service.
Bennett, C. and Rockwell, K. (1995). "Targeting Outcomes of Programs (TOP): An Integrated Approach to Planning and Evaluation." Class notes in AG*SAT Graduate Course in Program Evaluation in Adult Education and Training, University of Nebraska-Lincoln.
Case, R.; Andrews, M. and Werner, W. (1988). How can we do it? An evaluation training package for development educators. British Columbia, Canada: Research and Development in Global Studies.
Contant, C. K. (1993). "Assessing What and Why: Designing and Using Evaluations Effectively for Local Level Programs." Paper presented at the Rural Nonpoint Source Pollution in the Upper Midwest Conference, March.
Dillman, D. A. (1995). "Survey Methods." Class notes in AG*SAT Graduate Course in Program Evaluation in Adult Education and Training, University of Nebraska-Lincoln.
Fink, A. (1995). How to Sample in Surveys. Thousand Oaks, California: Sage.
Fraenkel, J. R. and Wallen, N. E. (1996). How to Design and Evaluate Research in Education. New York: McGraw-Hill Inc.
Krueger, R. A. (1994). Focus Groups: A Practical Guide for Applied Research. 2nd edition. Thousand Oaks, California: Sage.
Mueller, D. J. (1986). Measuring social attitudes. New York: Teachers College Press.
Neito, R. and Henderson, J. L. (1995). Establishing Validity and Reliability (draft). Ohio State Cooperative Extension.
Patton, M. Q. (1986). Utilization-focused evaluation. Newbury Park, California: Sage.
Wholey, J. S.; Hatry, H. P. and Newcomer, K. E. (eds.). (1994). Handbook of practical program evaluation. San Francisco: Jossey-Bass Publishers.
Worthen, B. R. and Sanders, J. R. (1987). Educational evaluation: alternative approaches and practical guidelines. New York: Longman.
Yin, R. K. (1984). Case study research: design and methods. Applied Social Research Methods Series. Vol. 5. Newbury Park, California: Sage.