Chapter III - Supporting Data: Collection and Presentation
Since the documentation of teaching must be selective to avoid unnecessary bulk in a tenure file and to increase the efficiency of the review process for all involved, it is imperative that the supporting data presented be collected in a way that reduces bias, is representative and yields valid measurements. It is also important to understand that the manner in which the data is presented can itself bias the evaluative outcome. Fortunately, this is an area that has stimulated a great deal of research. This chapter will provide a synthesis of some of that work, focusing on two major sources of evaluation data: students and peers.
Use of Student Evaluation Data
It has been argued that students are not valid sources of evaluation information,
that their numerical and written responses on questionnaires used to make
tenure and promotion decisions are based on superficial criteria, like
appearance and popularity. This assumption has not been empirically supported.
"Based on the findings of the meta-analysis, we can safely say that
student ratings of instruction are a valid index of instructional effectiveness.
Students do a pretty good job of distinguishing among teachers on the
basis of how much they have learned. Thus, the present study lends support
to the use of ratings as one component in the evaluation of teaching effectiveness.
Both administrators and faculty should feel secure that to some extent
ratings reflect an instructor's impact on students."1
Key in the use of student data is the notion that, as a data source, it
is only one component available to a committee making an informed judgment.
When incorporated into a thorough teaching analysis, student evaluation data
is useful not only because it represents the learner's perspective, but also
because it can round out the picture of a candidate's teaching quality when
presented in relation to peer data and data supplied from the candidate's
own perspective.
Student data can be solicited and presented in both quantitative and qualitative forms. Quantitative data, in the form of numerical student evaluation questionnaire scores, was the more prevalent form in the tenure files analyzed during the preparation of the report Evaluation and Recognition of Teaching. Qualitative student evaluation data, in the form of written letters of evaluation, appeared less often, although when such data was provided it took up a lot of space in the file, as in one case where a tenure file contained 88 student evaluation letters.
The potential to statistically manipulate quantitative data is a mixed blessing. Quantitative data can be very efficient in that many teaching factors from many individual perspectives can be presented in relatively little space. However, questionnaire items must be validated to ensure that what is being measured by the question is, in fact, what the question purports to measure. In the analysis of tenure files that was conducted at Cornell, many of the guidelines listed below were not followed, which resulted in a student picture of the candidate's teaching that was not only limited but also possibly biased.
Instrumentation
Research in the area of student evaluation of instruction has resulted
in the publication of more than 2,500 studies. Much has been learned about
proper questionnaire design. One finding is that the purpose of the evaluation
should determine the format and kinds of questions included in the evaluation
instrument. In general, summative evaluation questionnaires designed for tenure
and promotion decisions and administered at the end of the semester contain fewer
items than formative questionnaires, which may be administered while course
instruction is still ongoing to allow the instructor to make necessary modifications.
Summative instruments focus on global items ("Overall, how would
you rate the quality of the instructor's teaching?") and use evaluative
scales (Excellent, Good, Fair, Poor... or Strongly Agree...Strongly Disagree)
rather than frequency scales (Frequently, Somewhat Frequently, Rarely,
Never...) which may be more appropriate for behaviorally-based items in a
formative questionnaire.
The use of "core" items allows an individual's scores to be compared to scores aggregated over a group, such as a department or a college's faculty. Core items address more generic aspects of teaching that are not influenced as much by course design, class size, or field of study. Core items enable the development of normative scores so an individual can be validly compared to his or her peers. Examples of such core items that have been validated through controlled quantitative methods include:
- The instructor is well prepared for class.
- The instructor has a thorough knowledge of the subject.
- The instructor communicated his/her subject well.
- The instructor stimulated interest in the course subject.
- The instructor is one of the best Cornell teachers I have known.
- The instructor clearly interprets abstract ideas and theories.
- The instructor demonstrates a favorable attitude toward students.
- The instructor is willing to experiment and be flexible.
- The instructor encourages students to think for themselves.2
Administration of Questionnaires
Research on questionnaire validity suggests that if the following guidelines
are followed for administering student evaluation questionnaires, reliability
and validity of results will be improved.
- response format should be clear and consistent
- students should remain anonymous
- students should be given adequate time to complete the questionnaire
- students should not be allowed to discuss their ratings while the questionnaire is being administered
- questionnaires should be administered during the last 4 weeks of the semester (but not on the last day and not during or after an exam)
- someone other than the one being evaluated should administer the questionnaire, or at the very least, the one being evaluated should leave the room
- a student should collect the questionnaires and mail them to an independent office for scoring
- 80% minimum attendance of the student population in a course is necessary on the day an evaluation is administered
- don't use a numeric questionnaire in courses with fewer than 10 students (use open-ended, written response items instead.)3
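The two numeric thresholds in the guidelines above (80% minimum attendance and a minimum of 10 students for a numeric questionnaire) lend themselves to a simple automated check. The following is a minimal sketch only, not a procedure taken from the cited research; the function name and its parameters are hypothetical, and it assumes that the 10-student minimum applies to enrollment.

```python
def check_administration(enrolled: int, present: int) -> list[str]:
    """Return warnings for the numeric guidelines; an empty list means none apply.

    Assumptions (not from the source): the 10-student threshold is applied
    to enrollment, and attendance is measured against enrollment.
    """
    warnings = []
    if enrolled < 10:
        warnings.append(
            "fewer than 10 students: use open-ended, written response items "
            "instead of a numeric questionnaire"
        )
    if enrolled > 0 and present / enrolled < 0.80:
        warnings.append("attendance below 80% of the students in the course")
    return warnings


# Hypothetical course: 40 enrolled, 30 present on evaluation day (75%).
for message in check_administration(enrolled=40, present=30):
    print(message)
```

The remaining guidelines (anonymity, timing, who administers and collects the forms) are procedural and would still need to be confirmed by the person administering the questionnaire.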
Recently, there has been a move toward on-line, electronically tabulated student evaluations of teaching. A study carried out in 2003 found:
Of the responding institutions, 17 percent reported using the Internet in some capacity to collect student evaluation data for face-to-face courses. This percentage was higher than previous findings in this area (Hmieleski, 2000) and suggests that using the Internet for student evaluation of face-to-face courses has increased. In addition, another 10 percent indicated that their institutions planned to initiate Internet evaluations of face-to-face courses in 2003. Eighteen percent of the respondents were in the process of reviewing Internet options. The remainder had decided against using the Internet for collecting student feedback.4
On-line student evaluation of teaching systems have the advantage of a faster turn-around time to return the feedback to the instructor. In addition, qualitative, open-ended responses to questions can be reported with the assurance that the students' identities will be anonymous. The primary disadvantage of on-line student evaluations is the potential for lower response rates. A series of studies carried out at Brigham Young University during a three-year period from 1997 to 2000 showed a slow increase in response rates for students' on-line evaluations of teaching, from an initial rate of 40% to 62% in 2000. It has been suggested that several factors may contribute to the increase in response rate to student on-line teaching evaluations:
student access to computers, amount and quality of communication to teachers and students regarding the online rating system, communication to students regarding how student ratings are used, and faculty and student support of the online rating system.5
Reporting scores
How summative evaluation scores are reported in a tenure file or in the
tenure/promotion process can bias that process, either positively or negatively.
Some general principles for proper questionnaire score reporting include:
- report the frequency distribution for each item
- don't carry mean scores beyond one decimal place
- multiple sets of scores should be provided for each type of course (survey, lab, seminar) and collected over a period of time
- narrative (qualitative) data from the candidate, colleagues or chair about the contextual circumstances of the quantitative student rating scores aids in their interpretation
- normative data sets should be established yearly for course type (elective, required, lecture, lab, etc.) on both a department level and college level for comparison with a tenure candidate's own scores.
- appropriate normative data should be provided wherever possible
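The first two principles above can be illustrated with a short calculation. This is a sketch under stated assumptions rather than a prescribed reporting format: the numeric coding of the Excellent to Poor scale (4 to 1), the function name and the sample responses are all hypothetical.

```python
from collections import Counter

# Assumed numeric coding of an evaluative scale; the source does not fix one.
SCALE = {4: "Excellent", 3: "Good", 2: "Fair", 1: "Poor"}


def summarize_item(responses: list[int]) -> dict:
    """Report the frequency distribution and the mean (one decimal) for one item."""
    counts = Counter(responses)
    n = len(responses)
    distribution = {label: counts.get(score, 0) for score, label in SCALE.items()}
    # Mean is carried to one decimal place only, as the guideline recommends.
    mean = round(sum(responses) / n, 1) if n else None
    return {"n": n, "distribution": distribution, "mean": mean}


# Hypothetical responses to a single global item.
summary = summarize_item([4, 4, 4, 3, 3, 3, 2, 2, 1])
print(summary["distribution"])  # {'Excellent': 3, 'Good': 3, 'Fair': 2, 'Poor': 1}
print(summary["mean"])          # 2.9
```

Reporting the distribution alongside the mean lets a reviewer see whether a middling mean reflects uniform "Good" ratings or a split between very high and very low ones.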
Figure 7 below is an example of a simple format for reporting student evaluation scores for a single course.
Figure 7
Figure 8 is an example of a visually clear way of reporting a candidate's
relative standing in relation to departmental normative data.

Figure 8
A word of caution is in order when considering which normative data should be used in comparing an individual instructor's teaching evaluation scores. Class size, course level, course type (lab versus lecture/discussion) and status (elective versus required) should all be taken into consideration. More important, however, is the effect of creating a general group norm and comparing all individual faculty to that norm. Such a policy can create an atmosphere of competition between members of the faculty. An alternative policy that may stimulate more of an atmosphere of collaboration about teaching is to compare an individual's student evaluation scores for each course with previous evaluation scores for that same course taught by the same instructor. Such a policy could avoid unfair comparisons and focus more on the individual's improvement over time.
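One way the historical comparison described above might be computed is sketched below. The text does not prescribe a method; using the mean of earlier offerings as the baseline is an assumption made here for illustration, and the function name and scores are hypothetical.

```python
def change_from_history(current_mean: float, past_means: list[float]):
    """Difference between the current mean score and the instructor's own
    earlier means for the same course (baseline = average of past offerings,
    an assumed choice). Returns None when there is no history."""
    if not past_means:
        return None
    baseline = sum(past_means) / len(past_means)
    # One decimal place, consistent with the reporting guideline for means.
    return round(current_mean - baseline, 1)


# Hypothetical global-item means for the same course over four offerings.
print(change_from_history(3.6, [3.1, 3.3, 3.2]))  # 0.4
```

A positive value indicates improvement relative to the instructor's own record for that course; it says nothing about standing relative to colleagues, which is the point of the alternative policy.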
Qualitative data is less generalizable and harder to aggregate because it is more open-ended. Its potential bulkiness can be reduced through a synthesis by an objective individual familiar with the case, such as a department head. For others who must review this kind of synthesized data and who are less familiar with the candidate's situation, like deans and reviewers outside of the department, a supplementary reflective statement from the candidate synthesizing the student letters can be useful in concert with the department head's report. If these reports are well written and address major developmental issues in the candidate's teaching practice, the time necessary to write them is well justified, especially if their creation leads to improved practice. The work of synthesizing can be spread out over time, on a year-by-year basis, as part of an annual review process. However, a more efficient system of dealing with qualitative student evaluation data is to have the individual faculty member summarize and report the major themes in the students' comments, both positive and negative, that may have affected teaching practice, for each course taught, each semester.
During the preparation of the report Evaluation and Recognition of Teaching, the deans of the colleges were interviewed. One dean raised the issue of anonymity of student evaluation data. Quantitative questionnaire scores, combined with letters of recommendation, provide a good balance of general and specific information. However, letters in their original form do not preserve the anonymity of the student. While students, either undergraduate or graduate, are still taking courses with the candidate they are in what one dean called the candidate's "power web." Students may be less candid in their written remarks if they know the candidate may identify them at some point while reviewing the evaluation data. If letters by students are returned to someone other than the candidate (the department head or Ad Hoc chair, for example), if they are then typed and summarized by an independent person (such as a member of a departmental standing committee on teaching), and if students are informed that these precautions are being taken when they are asked to write their letters, the validity of their responses will be enhanced. This issue is also eliminated through the use of on-line student evaluation of teaching systems, as mentioned above.
An example of the department chair synthesizing the relevant comments from undergraduate student reviewers who were asked to write letters of recommendation is included below.
. . . undergraduates uniformly describe him as an unusually effective, conscientious, enthusiastic teacher who enables students to do their best work, master difficult subject-matter, and gain confidence in their own intellectual abilities.
This [student quote from a review letter] clear and convincing testimony describes the experience of all the students who wrote to us from the courses he taught in spring 1988 and in fall 1989. Since the most disturbing aspect of some of the student responses two years ago was the suggestion that he could be authoritarian and coercive in his teaching, we are reassured by all these letters which suggest precisely the opposite.
It seems clear that like many young assistant professors [candidate] was too demanding in his first dealings with graduate students, imposing admirable but often excessive standards of professionalism both in the classroom and as a special committee member, and expecting his students to share his commitment to his own projects. As the letter from [student] suggests, however, he has since become more realistic and flexible. And all the letters attest that he is always extremely conscientious and helpful.
One should conclude, I think that [candidate] is an intellectually stimulating and enabling graduate teacher, with an expertise and commitment that many of our students find particularly valuable, one who has had trouble finding the appropriate mode in which to exercise authority, but who has now learned to do so.6
The usefulness and reliability of student letters of evaluation, whether undergraduate or graduate, can be improved if specific criteria are communicated when letters are solicited to help focus the students. If the students are all requested to respond to the same questions, reliability will be enhanced and it will be easier to summarize all the letters. The following is an example of the kinds of questions about teaching that can be used to aid students in writing evaluation letters:
- Factual Knowledge: How well did the instructor help you acquire and integrate new terms, information and methods? Please give explicit examples where possible.
- Concepts and Principles: How well did the instructor organize the material covered into a comprehensive whole? Were important concepts and principles from theory interrelated? Please give explicit examples where possible.
- Application: Do you feel that the instructor's teaching and course structure enabled you to apply what you learned in the course to concrete problems? Were you able to generalize beyond the text? Please give explicit examples where possible.
- Motivation: Did you feel the instructor was sufficiently motivated about the subject matter to excite your own interest in it? Describe how the instructor communicated a sense of enthusiasm about teaching.
- Self Understanding: To what degree did the instructor help you become more aware of yourself as a learner? Describe specific experiences where the instructor contributed to your feeling empowered in your ability to learn.
- Improvement of Instruction: Did the instructor seek out information from you and experiment with ways of improving his or her teaching? To what degree was the instructor open to feedback on improving the course? How confident are you in the instructor's ability to continually develop as a teacher? Please be as specific as possible.
To avoid biasing faculty opinion of a candidate's teaching effectiveness, student letters, in any form, summarized or not, should not be available to the voting faculty until all file data on both teaching and research have been assembled into the tenure file. This is true for all data: everyone voting on the candidate should have the same body of data on which to base an informed and unbiased decision.
Peer Data
Evaluation of the candidate's teaching by peers is a practice that has become more prevalent in tenure and promotion decisions over the last 30 years7 and has taken on an increasingly significant role in the review process. Effective peer review depends to a large degree on the explicitness of the criteria by which candidates are to be judged. Colleagues and peers are necessary contributors to evaluating a tenure candidate's teaching. They are best qualified to evaluate the candidate's breadth and depth of subject matter knowledge, course design skills and assessment strategies for determining how well students have learned the material. The information necessary for colleagues and peers to evaluate these kinds of skills must be thorough without being redundant. The candidate can help in peer evaluation by supplying the kind of information described in Chapter 2.
To be most effective the peer evaluation process should be neutral, open, relatively unthreatening and structured, all of which can be enhanced through the use of standardized rating and observation procedures and criteria. Standardization is a precaution prompted by evidence that colleagues' ratings may not be statistically reliable. In one study by Centra8 the average correlation between colleagues was .26 per item. Another study9 revealed the potential for positive bias in peer evaluation: fifty-four teachers were each evaluated on the basis of 2 classroom visits by each of 3 different colleagues, and 94% of all ratings fell in the top 2 categories of a 5-point scale.
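The two statistics mentioned above, the average correlation between colleagues and the share of ratings in the top categories, can be computed for a department's own peer ratings. The sketch below is illustrative only and does not reproduce the method of the cited studies: it assumes each colleague has rated the same set of instructors on a single item using a 5-point scale, and the rater labels and ratings are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length rating lists.
    Raises ZeroDivisionError if either rater gave identical ratings throughout."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_pairwise_correlation(ratings_by_rater: dict[str, list[int]]) -> float:
    """Average the correlation over every pair of colleagues."""
    pairs = list(combinations(ratings_by_rater.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def share_in_top_two(ratings_by_rater: dict[str, list[int]]) -> float:
    """Fraction of all ratings that fall in the top 2 categories (4 or 5)."""
    all_ratings = [r for ratings in ratings_by_rater.values() for r in ratings]
    return sum(r >= 4 for r in all_ratings) / len(all_ratings)


# Hypothetical data: three colleagues rate the same five instructors on one item.
ratings = {
    "rater_a": [5, 4, 4, 5, 3],
    "rater_b": [4, 5, 4, 4, 5],
    "rater_c": [5, 5, 3, 4, 4],
}
print(round(mean_pairwise_correlation(ratings), 2))
print(round(share_in_top_two(ratings), 2))
```

A low average correlation combined with a high share of top-category ratings would echo the pattern reported above: raters who agree little with one another yet rate nearly everyone favorably.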
The entire review process by peers should be governed by a set of procedures established within the department. Examples of such procedures include:
. . . peer ratings should be used in conjunction with student ratings . . . dimensions [of teaching] should be decided upon in advance . . . [the] procedure should guarantee the anonymity and independence of the rater . . . at least three colleagues be chosen to rate an instructor's teaching . . . these raters . . . may come from . . . an elected committee of the college faculty whose function is to evaluate teaching. . . . raters do not meet together and preferably do not know who else is involved in the evaluation process. Rather, each judge independently rates the instructor on the preselected dimensions and submits the ratings to the dean [or department head], who then compiles a pooled rating for each dimension.10
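The pooling step described in the passage above might look like the following sketch. The quoted procedure does not say how the pooled rating is calculated; taking the mean of the independent ratings for each dimension is an assumption made here, and the dimension names and scores are hypothetical.

```python
from statistics import mean


def pool_ratings(independent_ratings: list[dict[str, int]]) -> dict[str, float]:
    """Combine independently submitted ratings into one pooled rating per
    preselected dimension (pooled = mean, an assumed choice)."""
    if len(independent_ratings) < 3:
        raise ValueError("the procedure calls for at least three raters")
    dimensions = independent_ratings[0].keys()
    return {
        dim: round(mean(rating[dim] for rating in independent_ratings), 1)
        for dim in dimensions
    }


# Hypothetical submissions from three raters who did not confer.
submissions = [
    {"subject knowledge": 4, "course design": 3},
    {"subject knowledge": 5, "course design": 4},
    {"subject knowledge": 4, "course design": 4},
]
print(pool_ratings(submissions))
# {'subject knowledge': 4.3, 'course design': 3.7}
```

Because only the pooled figures leave the dean's or department head's office, the individual submissions stay anonymous, which is consistent with the independence the procedure is meant to guarantee.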
Developing these procedures and the questions used to review the candidate can be a useful accomplishment of a departmental standing committee on the peer review of teaching.
Qualification of Peer Reviewers
How peer reviewers are selected is another critical factor in establishing validity in peer review. No one should be placed in a position to review or observe a colleague for tenure or promotion decisions who is not qualified to carry out that task. A very consistent finding of peer observation studies is that observers should have some kind of training which prepares them for that responsibility. Peers are typically capable of evaluating subject matter knowledge, what must be taught by the candidate, whether the appropriate methodology is being employed for teaching specific content areas, the degree to which the candidate has applied adequate and appropriate evaluation techniques for course objectives, and the degree to which professional behavior has been exhibited according to current ethical standards. It is critical to understand that in a summative review process, classroom observations by faculty peers are impractical. To obtain a representative observational data set, several classes in each course the instructor teaches must be observed, and a standard set of criteria must be used. In addition, there is the matter of observer reliability. To obtain reliable observation data, observers must be trained in the process and there must be "at least 3, and preferably four members on a peer review team."11 A more practical approach is to involve a peer mentor who can work with the instructor in a formative manner, over a period of time, observing classes, meeting to discuss classroom technique, and helping the instructor to set goals and to determine progress in reaching those goals. This process can be documented through the use of classroom observation and teaching feedback forms like those found on-line on our Teaching Materials page.
The most appropriate focus of peer reviewers is the instructor's teaching materials, as outlined in Chapter 2. Discussion of experimental teaching approaches, their modification based on feedback and how best to document effective improvement of practice is where peer collaboration can be of help, particularly when the peers are knowledgeable in the course content. Documenting such collaboration and measurable changes over time should be a major focus of a peer mentor process.
Footnotes
1. Cohen, Peter. "Student Ratings of Instruction and Student Achievement: A Meta-analysis of Multisection Validity Studies." Review of Educational Research, Fall 1981, Vol. 51, No. 3, pg. 305.
2. Sell, G. R. & Chism, N. Assessing Teaching Effectiveness for Promotion and Tenure: A Compendium of Reference Materials, Center for Teaching Excellence, Ohio State University, January 1988.
3. Ibid.
4. Hoffman, K.M. "Online Course Evaluation and Reporting in Higher Education" in Online Student Ratings of Instruction, New Directions for Teaching and Learning, Sorenson, D.L. & Johnson, T.D., eds., No. 96, Winter 2003, pg. 27.
5. Johnson, T.D. "Online Student Ratings: Will Students Respond?" in Online Student Ratings of Instruction, New Directions for Teaching and Learning, Sorenson, D.L. & Johnson, T.D., eds., No. 96, Winter 2003, pg. 51.
6. A Report of the Select Committee, Evaluation and Recognition of Teaching. Cornell University, Ithaca, N.Y., Jan. 14, 1992. Appendices, pg. 21.
7. Seldin, P. Changing Practices in Faculty Evaluation. Jossey-Bass, 1984.
8. Centra, J. Determining Faculty Effectiveness. Jossey-Bass, 1979.
9. Centra, J.A. "Colleagues as Raters of Classroom Instruction." Journal of Higher Education, 46, 1975, 327-337.
10. Cohen, Peter & McKeachie, Wilbert. "The Role of Colleagues in the Evaluation of College Teaching." Improving College and University Teaching, Vol. 28, No. 4, pg. 150.
11. Arreola, R.A. Developing a Comprehensive Faculty Evaluation System, 2nd Ed. Anker Publishing, Bolton, MA, 2000, pg. 76.