The purpose of this evaluation study is to identify problems and suggest modifications in the NIH Consensus Development Program. The current program consists of three-day conferences in which experts assess medical technologies for issues of efficacy, safety, conditions of use, and other related topics (e.g., costs and social impact). Eight consensus conferences held between 1980 and 1982 were studied in depth using a variety of methods; five of the conferences were investigated concurrently. In addition, archival material was examined for all but one of the 33 conferences held up to that time, and four planning meetings for future conferences were observed. The delay in publishing our findings provided an opportunity to examine the changes introduced by NIH; it also allowed us to avoid the criticism, leveled at numerous prior evaluations, of finding fault with programs that are still developing. NIH adopted many of the recommendations in our evaluation report and has investigated others. Based on our evaluation and more recent evidence, however, we conclude that the major problem that was uncovered (selection bias, particularly with respect to the choice of questions and panelists) remains a significant threat to the credibility of the consensus process. More specifically, the results indicate that controversial issues cannot be properly addressed within the present conference format, although addressing such issues was one of its major purposes. Recommendations for improving the consensus process are presented, as are their implications for a larger set of consensus activities that are currently being conducted.