28

CASP AND OTHER COMMUNITY-WIDE ASSESSMENTS TO ADVANCE THE FIELD OF STRUCTURE PREDICTION

Jenny Gu and Philip E. Bourne

In the early 1990s, the community recognized that methods for structure determination from sequence information were proliferating, creating the need to benchmark these developments, both to gauge the utility of the algorithms to the biological community and to measure progress in the developing field of structure prediction. In part, this need also came from users of these tools concerned by overly optimistic claims of prediction performance. In 1994, John Moult pioneered the idea that the only way to objectively assess the utility of these tools was to conduct a blind experiment in which predictions were made on protein structures not yet publicly available, but with available sequences (Moult et al., 1995). The Critical Assessment of Structure Prediction (CASP) was born and, through this assessment, confidence from the user community can be said to have been reclaimed. Even a prediction of limited accuracy can be useful if the user knows what to expect and the method has been cross-validated. The CASP experiment consists of three parts: defining the types of predictions to be performed (this has changed over time), the collection of prediction targets, and the evaluation of the performance of each predictor. For the first competition, the field was evaluated in three categories: comparative modeling, threading, and ab initio prediction.

To clarify for individuals interested in reading the original background papers, there have been changes in nomenclature over the years of the CASP experiment. The term “comparative modeling” has become interchangeable with “homology modeling,” and “threading” has been replaced by the term “fold recognition.” “Ab initio structure prediction” is sometimes referred to as “new fold recognition” to reflect the underlying methodologies. Part of the reason for this change, as we shall see, has come about through the role that the ever-increasing number of experimental structures plays in just about every case.

Stated simply, the categories are based on the level of sequence identity between the protein sequence to be modeled and the potential structural homologue (template). If the protein has high sequence identity, homology modeling is used. Conversely, ab initio structure prediction is reserved for cases where no known structural homologues exist. The ab initio category has also come under recent scrutiny because successful methods use knowledge-based approaches, where that knowledge is derived from existing structures. Thus, the ab initio category has recently been redefined as “new fold recognition” (Moult et al., 2005). True ab initio approaches are now limited to numerical simulation techniques using traditional empirical potentials.

Nomenclature aside, CASP has served as a metric by which advances in structure prediction are measured and has undoubtedly accelerated developments in the field. So much so that similar assessments (some would say competitions) now take place in the field of docking (CAPRI; Janin et al., 2003).

The primary goal of CASP is to evaluate the performance of bona fide blind predictions of structures. Participating teams are given a period of several weeks to complete their models, while automatic servers are given 48 h. Targets are obtained from different experimental groups with PDB structures not yet released to the public. Several different evaluation measures are employed to gauge the success and performance of structure prediction methods. Evaluation is conducted by expert assessors in the field, and conclusions are then released to the public and published in special editions of the journal Proteins: Structure, Function, and Bioinformatics, summarizing results for the year, including strengths, weaknesses, and where improvements can be made. Submissions for assessment are handled by the Protein Structure Prediction Center (Zemla et al., 2001), dedicated to this community-wide effort.

While this may appear to be a competition between structural predictors to identify the best performing methods, it should instead be considered a collaborative effort that provides the community with a set of principles for improving standards in the field of structure prediction (Moult, 2006). Significant advances are still needed before predicted protein structures are comparable to those obtained experimentally. The four major challenges confronting the field at this time are (i) to produce models close to those obtained experimentally when high sequence homology exists; (ii) to improve alignments between the unknown and the template; (iii) to develop a better refinement process to construct models of remotely related proteins; and (iv) to construct a reliable scheme to discriminate among possible model solutions generated from a template-free algorithm.

How can the best approach to structure prediction and the best models be judged? Devising appropriate statistical measures is important to firmly establish progress in the field and to show that the proposed models do indeed represent the true protein in vitro and in vivo. Through the cumulative lessons gained from each CASP, several criteria are deemed necessary to conduct successful benchmarking experiments. First, a large test set that is agreed to by the community should be used. This test set must be independent of the training set used to train the methods in question. Second, error estimation must be reliable and continuously questioned and improved to develop proper measures. Third, all measures have inherent bias, and therefore independent tests of accuracy must be performed. Finally, the results must be open and freely available to the community, as must the source code of participating methods. Even though there may be intellectual property issues, access to the software is necessary to rigorously check that the methods perform as stated.

TABLE 28.1. Available Community-Wide Benchmarking Services

| Benchmarking Service | Brief Description | Reference |
| --- | --- | --- |
| CASP | Critical assessment of structure prediction. Categories include homology modeling, fold recognition, and de novo structure prediction; disorder, domain, and functional predictions are also included | Moult et al. (1995) |
| CAFASP | Critical assessment of fully automated structure prediction. Evaluation of methods that do not utilize expert interpretation | Fischer et al. (1999) |
| EVA | Continuously and automatically analyses protein structure prediction servers in “real time” | Eyrich et al. (2001) |
| EVAcon | Continuous evaluation of inter-residue contacts in structure prediction | Grana et al. (2005a), Grana et al. (2005) |
| LiveBench | Similar to EVA but differs in methods of evaluation | Bujnicki et al. (2001) |
| CAPRI | Critical assessment of predicted interactions; evaluation of protein-protein docking | Janin et al. (2003) |
| FORCASP | Discussion forum for CASP participants | |

As the value of CASP has come to be appreciated by the community over the years, the project expanded benchmarking efforts to evaluate docking, domain boundary identification, protein disorder prediction, and functional prediction. An important addition introduced along the way was fully automated servers, as opposed to predictions requiring significant human intervention. Table 28.1 summarizes the major community-wide benchmarking services.

The result of this ongoing community experiment has yielded interesting snapshots of performance and progress in the field over the years. Here, we review the history of CASP since its inception and highlight some important events that have helped to advance and fuel the field. Details of the technical strategies themselves, such as homology modeling, fold recognition, ab initio structure prediction, disorder prediction, domain boundary identification, and functional annotation are found in Chapters 30–32, 38, 20, and 21, respectively.

Participation from the community is absolutely necessary for the success of CASP (Moult et al., 1995). In CASP1, 34 groups participated from March through October 1994 with targets expiring at different dates during that period. A total of 34 predictions were submitted for 7 test candidates in comparative modeling; 20 targets were used for threading and ab initio structure predictions with 66 and 29 predictions submitted, respectively.

The meeting to evaluate the performance of the first CASP experiment (CASP1) was held at the Asilomar conference center, California, in December 1994. It was determined during this first evaluation that there was a need to develop better measures to gauge the success of predictions. Simple statistical tests did not evaluate all the features needed for successful model prediction. Moreover, it was established that results were biased by the amount of time each group spent on a single prediction, which is not a true measure of the method itself. To circumvent this issue, from CASP2 onwards, participants were permitted to submit several predictions for each target.

Comparative modeling approaches were found to be challenged by obtaining the proper sequence alignment of target to template, and loop regions, insertions, and deletions were not well modeled (Mosimann, Meleshko, and James, 1995). The inclusion of models containing fundamental errors suggested that minimal testing of model coordinates should be required before papers referencing these models are accepted for journal publication. The most disturbing finding was the presence of D-chirality for some of the Cα carbons and incorrect chirality for some beta-branched threonine and isoleucine residues. Out of the 43 predicted structures, 14 contained an example of this incorrect enantiomorph, most likely arising during the energy minimization step used to refine these models. A significant number of predictions had the planar peptide dihedral angle deviating by more than ±15° from the plane (well-refined structures do better, within ±6° of the plane). The distribution of φ, ψ dihedral angles often deviated significantly from the groupings typically seen in a Ramachandran plot made using experimental protein structures. As expected, the root-mean-square deviation (RMSD) between model and target was lowest for targets with high sequence identity to the template.
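These geometric sanity checks are straightforward to automate. The sketch below is a minimal illustration, not any assessor's actual code: assuming numpy and per-residue backbone coordinates, it flags nonplanar peptide bonds via the ω dihedral and likely D-residues via the handedness of a triple product around Cα. The function names and the majority-sign heuristic are our own illustrative choices.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four 3D points."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1          # component of b0 orthogonal to b1
    w = b2 - np.dot(b2, b1) * b1          # component of b2 orthogonal to b1
    return np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))

def nonplanar_peptides(ca, c, n, tol=15.0):
    """Indices i where the omega dihedral Ca(i)-C(i)-N(i+1)-Ca(i+1) deviates
    from planarity (180 deg trans or 0 deg cis) by more than tol degrees."""
    bad = []
    for i in range(len(ca) - 1):
        omega = dihedral(ca[i], c[i], n[i + 1], ca[i + 1])
        if min(abs(abs(omega) - 180.0), abs(omega)) > tol:
            bad.append(i)
    return bad

def suspect_d_residues(n, ca, c, cb):
    """Flag residues whose Ca handedness differs from the majority.
    The triple product (N-Ca) x (C-Ca) . (Cb-Ca) has a consistent sign for
    L-amino acids, so minority-sign residues are likely D-enantiomers."""
    signs = np.sign([np.dot(np.cross(n[i] - ca[i], c[i] - ca[i]), cb[i] - ca[i])
                     for i in range(len(ca))])
    majority = np.sign(signs.sum()) or 1.0
    return [i for i, s in enumerate(signs) if s != majority]
```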

Structure prediction of proteins with little or no sequence homology to proteins of known structure required the use of threading methods. Assessment of threading methods indicated that, in many cases, they were capable of identifying the correct fold, but alignment was an issue (Lemer, Rooman, and Wodak, 1995). A major contributor to the success of fold recognition was the importance of hydrophobic interactions in defining the protein core. Fold identification was considered correct if the predicted structure aligned to the template with an RMSD ≤ 3 Å based on Cα atoms. Different methods were better depending on the type of fold, which raised the idea of using consensus methods, an idea we will come back to shortly.
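The ≤3 Å Cα criterion presupposes an optimal least-squares superposition of the aligned positions. A minimal sketch of such an RMSD computation, assuming numpy and using the closed-form Kabsch/SVD solution (this is the standard textbook technique, not the assessors' specific implementation):

```python
import numpy as np

def ca_rmsd(P, Q):
    """Least-squares RMSD between two (N, 3) arrays of aligned Ca coordinates.

    After centering, the minimal residual under an optimal proper rotation is
    sum|P|^2 + sum|Q|^2 - 2*(s1 + s2 + d*s3), where s1..s3 are the singular
    values of the 3x3 covariance matrix and d corrects for reflections.
    """
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)                 # SVD of covariance matrix
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt)) # avoid improper rotations
    e = (P ** 2).sum() + (Q ** 2).sum() - 2.0 * (S[0] + S[1] + d * S[2])
    return np.sqrt(max(e, 0.0) / len(P))

# Under the CASP1 threading criterion, a fold would count as recognized when
# ca_rmsd(model_ca, template_ca) <= 3.0 over the aligned Ca positions.
```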

Ab initio structure prediction to determine novel protein folds faced significant challenges during CASP1 (Defay and Cohen, 1995). Accurate tertiary structure prediction was not possible with these methods. However, limited success was attained when generating protein folds and motifs that were recognizably similar to other known structures. A useful development was the identification of sequence similarity within the sequence, which often translates to structural symmetry and is likely to be the result of ancient duplication and fusion events. Approximate ab initio predictions could be made for very small proteins through exhaustive conformational searches. Multiple sequence alignments (MSAs) also helped improve distance matrix approaches that use sequence variability information to make inferences about specific contact potentials.

Finally, for CASP1, although the focus was on tertiary structure prediction, the performance of secondary structure predictors could be inferred since they were a part of many methods (Rost and Sander, 1995). Helices were predicted well, although sometimes too long, with strand and coil predictions underperforming; helix content tended to be overpredicted, and nonsoluble proteins were also poorly handled.

“CASP1 was a cathartic experience, and we have emerged from it with a new and sharper sense of direction” (Moult et al., 1995). With newly identified limitations in the various prediction methods, evaluation strategies were revisited 2 years later during CASP2 (Moult et al., 1997). Organizers collected 42 structures and participants submitted solutions for 34 of these. Finding suitable targets was something of a problem, as the era of high throughput structure determination was yet to begin. More protein targets were available for evaluating ab initio structure prediction, therefore permitting proper statistical evaluation. A new docking category was introduced. For its debut, seven small ligands that bind to four different proteins (protein-ligand docking) and one protein-protein complex (protein-protein docking) were obtained. While this data set was too small to properly evaluate the state of docking methods at the time, 13 groups did participate and a total of 56 predictions were submitted. Overall participation doubled from 34 to 70 groups, with the total number of submitted predictions approaching 1000. Evaluation measures were improved and were more sensitive in judging poor quality models and in singling out particular features of a model that reflected the strengths and weaknesses of specific steps in the algorithms, such as loop modeling, alignment accuracy, or correct topology.

The development of new evaluation criteria aimed not to identify the best performing predictor, but to identify the strengths and weaknesses of each method (Venclovas et al., 1997). The three considerations were (1) gauging performance at the different stages of modeling; (2) distinguishing results from easily modeled versus difficult regions; and (3) eliminating the effects of possible experimental uncertainties. It was shown that there had been some improvement since CASP1, particularly in side chain accuracy. Large loop errors were still found in comparative modeling, often reflecting poor alignments in these regions. The performance of threading methods was the hardest to estimate, as it continued to be plagued by alignment issues. Finally, the geometry of the models was significantly improved, with no D-amino acids (Martin, MacArthur, and Thornton, 1997). No method came out as clearly superior, but assessors were deemed to provide good justification for their decisions (Levitt, 1997).

Threading methods were difficult to evaluate because they involved two separate criteria: fold recognition and model accuracy. Attempts to gauge performance used weighted averages of fold recognition and alignment accuracy with and without normalization for target difficulty (Levitt, 1997). Using these measures, it was clear that threading predictions had improved significantly based on the number of successful predictions from a large number of groups using easier targets. Generally, fold recognition methods outperformed simple sequence alignment against the database of known folds. The average RMSD of one predictor between model and target structures was 5.1 Å. The best models for two easy targets had RMSDs of 2.9 and 4.2 Å, respectively, with a structural alignment containing more than 80% of the same residues.

Submission of ab initio predictions included secondary structure, three-dimensional coordinate sets, modes of oligomerization, and residue and secondary structure segment contact patterns (Lesk, 1997). Secondary structure predictions continued to perform well with the new blind test set; however, tertiary structure predictions achieved only fragmentary success. Predictions of contacts between residues and elements of secondary structure were not consistent.

Many of the small molecule-ligand complexes used to assess docking methods involved serine proteases (Dixon, 1997). Overall, predictions for the small molecule targets performed well, with at least one submitted prediction per target within an RMSD of 3 Å. The predictors, however, did not perform consistently, and correct predictions close to the target solution were not necessarily the top ranking solutions. The protein-protein target proved to be more difficult in that there was a significantly larger space of conformations and orientations to consider, and surface binding sites were more difficult to identify than binding clefts.

By CASP2, the community of predictors had grown so large that an automated system that could process and verify predictions according to several different evaluation criteria was required. This gave rise to the Prediction Center where predictions could be uploaded and evaluated with the results presented in tabular and graphical form (Zemla et al., 1999). By the time of CASP3 (Moult et al., 1999), other benchmarking services had appeared. These included CAFASP for fully automated prediction (Fischer et al., 1999), EVA (Eyrich et al., 2001), and LiveBench (Bujnicki et al., 2001) (Table 28.1).

EVA (Koh et al., 2003) is a Web server providing automated evaluation of the accuracy of automated protein structure prediction methods, but differs in procedural details from CAFASP. Rather than conducting evaluations every 2 years, evaluations are conducted and updated automatically each week. Secondary structure prediction, contact prediction, comparative protein structure modeling, and fold recognition methods are evaluated. Target proteins to be predicted are collected daily from proteins newly submitted to the Protein Data Bank (PDB) and compared once a week to experimental structures. The results are published on the Web server, where a measure of sustained performance as well as a ranking of methods is given. It is argued that this approach is preferable because larger data sets are used and therefore better statistics and ranking schemes can be obtained (Eyrich et al., 2003).

LiveBench (Bujnicki et al., 2001; Rychlewski, Fischer, and Elofsson, 2003) also provides continuous benchmarking for automated, publicly available protein fold recognition servers to measure progress in the field. This evaluation approach differs from CAFASP in that it assesses structure prediction performance using newly released targets that do not show significant sequence similarity to proteins already in the PDB. In the strict sense, this is not a blind test, but the fold recognition algorithms tested presumably have not seen these new structures before in the training set. The advantage of LiveBench over CAFASP is that it provides a larger test set from which to make comparisons between the different algorithms.

The introduction of other benchmarking experiments indicated the growing interest in sharing research findings in the pursuit of improved structural predictions. The conclusion from CAFASP1 was that no automated method at the time proved to be markedly superior (Fischer et al., 1999). More important, this conclusion highlighted the wide gap in model quality produced by automated servers compared to those derived with expert interpretation at the time of CAFASP1. This gap has narrowed since then, and like the early CASPs, CAFASP has also identified requirements for improved blind tests going forward.

CASP3 saw further improvements, particularly in the alignment of target and template sequences. Encouragingly, ab initio prediction showed improvements in the accuracy of prediction of fragments up to 60 residues. Fold recognition, by contrast, seemed to show a greater improvement between CASP1 and CASP2 than between CASP2 and CASP3 (Sippl et al., 1999). As in the previous CASPs, numerical evaluation methods continued to be refined to provide better measures of progress and success.

Heralded as “the 2000 Olympic Games of protein structure prediction” (Fischer, Elofsson, and Rychlewski, 2000), CASP4 participants were faced with more stringent evaluation criteria. In summary, alignment of sequences for comparative modeling continued to be a problem (Venclovas et al., 2001). Secondary structure prediction, according to the three-state accuracy measure (Q3), appeared to have reached a limit. Contact predictions were approaching a useful level of accuracy, and the overall success rate for remote homologue detection had improved, successfully identifying a large fraction of the folds in the blind set. Significant improvements in de novo prediction were seen, with CASP4 showing particularly good predictions for small proteins with the correct topology (Schonbrun, Wedemeyer, and Baker, 2002). The success of the algorithms could be attributed to a combination of knowledge- and physics-based methods.
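For reference, Q3 is simply the percentage of residues whose predicted three-state label (helix/strand/coil) matches the observed state. A minimal sketch (the function name is ours):

```python
def q3(predicted, observed):
    """Three-state accuracy: percent of residues whose predicted H/E/C
    secondary structure state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must be the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

# Example: q3("HHHHECCC", "HHHECCCC") -> 75.0 (6 of 8 residues match)
```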

The evaluation methods used by CASP4 and related services continued to be challenged (Marti-Renom et al., 2002; Moult et al., 2002). The reliability of the rankings of protein structure modeling methods was assessed using the parametric Student's t test and the nonparametric Wilcoxon signed rank test of the statistical significance of the difference between paired samples. With these tests, the top eight methods of prediction could not be distinguished. The target sequences used for CASP4 were analyzed and shown not to distinguish between the top eight methods, given the standard deviation of the difference in model quality. The results suggested that CASP needed to be supplemented by assessments made by other evaluation services that are automated, continuous in time, and based on several criteria applied to a large number of methods.
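Such a comparison treats the two methods' scores on each common target as paired observations. A minimal sketch with SciPy, assuming per-target quality scores (e.g., GDT_TS) for two methods over the same targets; the significance threshold is illustrative, not the value used in the cited analyses:

```python
import numpy as np
from scipy import stats

def methods_distinguishable(scores_a, scores_b, alpha=0.05):
    """Paired tests of whether methods A and B differ in model quality.

    scores_a, scores_b: per-target quality scores for the same targets in
    the same order. Returns the p-values of the parametric Student's t test
    and the nonparametric Wilcoxon signed rank test on the paired
    differences, plus a verdict at the chosen significance level.
    """
    scores_a, scores_b = np.asarray(scores_a), np.asarray(scores_b)
    _, p_t = stats.ttest_rel(scores_a, scores_b)   # Student's t on paired samples
    _, p_w = stats.wilcoxon(scores_a, scores_b)    # Wilcoxon signed rank test
    return p_t, p_w, (p_t < alpha and p_w < alpha)
```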

By CASP4, while the best human-assisted methods continued to outperform automated servers (Sippl et al., 2001), automated consensus metapredictors were very successful in fold recognition. CAFASP2 revealed that the most significant progress in fold recognition came with the development of metaservers incorporating prediction results from several independent methods to generate consensus models (Kinch et al., 2003; Venclovas et al., 2003). The performance gap between automated and manual methods narrowed, with about one fourth of the top 30 performing groups in the fold recognition category using fully automated servers. Moreover, the consensus servers that incorporated predictions from multiple fold recognition servers outperformed individual servers alone (Fischer et al., 2001; Schonbrun, Wedemeyer, and Baker, 2002). These metapredictors performed approximately 30% better than the best of the 60 independent servers participating in CAFASP3 (Fischer et al., 2003). Top performing metapredictors were comparable to the best 5-10 human CASP predictors. Prediction of multidomain proteins, however, remained a challenge (Kinch et al., 2003). Nevertheless, the consensus methods recognized earlier clearly had the potential to advance the field further (Schonbrun, Wedemeyer, and Baker, 2002).

CASP5 saw the recognition of the importance of protein disorder predictors (Melamud and Moult, 2003). Structural disorder observed in structures is reported as missing atoms. Several groups had postulated that this property is encoded in the protein sequence and were able to develop tools to recognize such signals, using disorder in existing structures as a training set. While the exact definition of disorder was in question, six participating groups successfully identified disordered regions within the blind test set without too much overprediction.

CASP6 saw a record number of groups making predictions. Starting with 34 groups during CASP1, the number of groups increased to 70, 98, 163, 216, and then to 266 for CASP6 (Moult et al., 2005). At CASP6, a total of 41,283 models were deposited, of which 32,703 could be assessed: 23,119 had coordinate sets, with a further 4484 alignments converted to coordinates for assessment. The submitted predictions also included 1397 with residue contacts, 1293 with domain assignments, and 990 with function predictions, all new classes of prediction. Finally, 1769 disorder predictions were made.

Assessment of the comparative modeling category (Tress et al., 2005) indicated that identification of the best structural template to use for modeling remained a big challenge. Predictors still produced incorrect models containing tangles in the backbone and beta whorls that are not observed in nature. A statistically significant difference in performance between the best performing methods and the rest of the participants was observed. Once again, the differences stemmed from the methods used to select and align templates, as well as from the use of expert knowledge. The technique shared by many successful groups was the detection of templates with 3D-Jury (Ginalski et al., 2003), followed by alignment improvement before direct modeling.
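The idea behind 3D-Jury-style consensus is that models many independent servers agree on structurally are more likely to be correct. A simplified sketch, assuming a precomputed pairwise similarity matrix between candidate models (the published method uses a MaxSub-like count of Cα pairs within 3.5 Å; the threshold here is illustrative):

```python
import numpy as np

def jury_scores(sim, threshold=40.0):
    """3D-Jury-style consensus scoring (simplified sketch).

    sim: (M, M) symmetric matrix of pairwise structural similarities
    between M candidate models collected from independent servers.
    Each model's score is the sum of its similarities to all other models
    above the threshold; the highest-scoring model is the consensus pick.
    """
    masked = np.where(sim > threshold, sim, 0.0)
    np.fill_diagonal(masked, 0.0)        # a model does not vote for itself
    return masked.sum(axis=1)

# consensus_model = candidate_models[np.argmax(jury_scores(sim))]
```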

The most important new evaluation introduced with CASP6 was an assessment of side chain orientations and the identification of biologically important sites. Conclusions from this analysis showed that side chain packing improvements could come only at the expense of rotamer accuracy, indicating the need for improved refinement techniques. Predictions for functionally important sites showed, surprisingly, an overall better performance in determining structural orientations. Closer inspection of these sites suggested that local structural factors contribute dominantly to the observed orientation in the final target structure and match well with the template that was used. Steady but modest progress was observed for comparative modeling and homologous fold recognition for difficult targets. The sequence relationship underlying the superposition between model and target still affects alignment accuracy.

New measures to evaluate sequence-dependent and sequence-independent alignment methods were used to gauge the success of fold recognition algorithms (Wang, Jin, and Dunbrack, 2005). Alignment, once again, remained the bottleneck in successful prediction; however, it was noted that more time to include biological and functional information would in all likelihood improve predictions.

Disordered regions were once again included, with twenty participating groups in this category (Jin and Dunbrack, 2005). Assessment of protein disorder in otherwise ordered structures in the blind set was limited to segments that are often short. One group clearly performed better than the other methods, identifying 75% of disordered residues, albeit with high overprediction. Overall, about 17% of the residues were identified as being disordered. Other groups predicted disordered residues at specificities higher than 90%, but correctly identified only half the disordered residues in the blind set.

Results of 3D contact predictions were reported with measures that are accepted as standards in the field: accuracy, coverage, and a score representing the average distance between strict contacts (Xd) (Grana et al., 2005a). Top performing methods used genetic programming and neural networks trained with different input information. The blind test set was too small to draw conclusions about progress and the limits of performance; the reader is referred to other services such as EVAcon (Grana et al., 2005) for a better interpretation. A potentially interesting conclusion was that contact prediction methods perform better on average than 3D prediction methods when applied to difficult targets.
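Accuracy and coverage have simple definitions over the sets of predicted and native contact pairs. A minimal sketch, assuming native contacts are defined in the usual way (e.g., Cβ-Cβ distance below roughly 8 Å) and that short-range pairs are excluded; the Xd distance-distribution score is omitted for brevity:

```python
def contact_accuracy_coverage(predicted, native, min_separation=6):
    """Accuracy and coverage of predicted residue-residue contacts.

    predicted: iterable of (i, j) residue-index pairs proposed as contacts.
    native: iterable of (i, j) pairs in contact in the experimental
            structure (commonly Cbeta-Cbeta distance below ~8 Angstroms).
    Pairs closer than min_separation in sequence are ignored, since local
    contacts are trivially easy to predict.
    """
    pred = {tuple(sorted(p)) for p in predicted if abs(p[0] - p[1]) >= min_separation}
    nat = {tuple(sorted(p)) for p in native if abs(p[0] - p[1]) >= min_separation}
    true_positives = len(pred & nat)
    accuracy = true_positives / len(pred) if pred else 0.0  # fraction of predictions that are real
    coverage = true_positives / len(nat) if nat else 0.0    # fraction of real contacts found
    return accuracy, coverage
```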

Two new components of CASP, domain boundary and functional prediction, were added to address the needs of structural genomics. Domain boundary prediction is crucial for modeling larger protein structures; it proved to be a difficult task and could be achieved reliably only when a related structural template was available to use as a reference (Chapter 20). Functional predictions were evaluated to address the growing need of structural genomics to understand the functional role of uncharacterized protein structures (Soro and Tramontano, 2005). The task of functional annotation is challenged by findings showing that common evolutionary origin does not necessarily confer shared function (Devos and Valencia, 2000; Rost, 2002). The discovery of proteins that perform multiple functions further complicates the situation (Jeffery, 2003a; Jeffery, 2003b). Last, this aspect of CASP differs from structural prediction in that the function of the target protein may well still be unknown, and thus functional predictions remain speculative.

An objective of functional prediction is to provide experimentalists working on the target proteins with useful information. Each predictor is required to provide the following information for each target: (1) GO categories for molecular function, biological process, and cellular component; (2) binding information; (3) location of the binding site; (4) role of residues; and finally, (5) posttranslational modifications. Free text comments are allowed at the end of the submitted prediction file. Twenty-six groups participated, with each group allowed to submit up to 5 ranked functional predictions for each target. A total of 1235 predictions were made. Conclusions from CASP6 were that (1) the experiment should be limited to enzymes or proteins predicted to be enzymes; (2) function should be described in terms of EC numbers, which are less ambiguous than GO annotations; and (3) a general description of each method should be made available to the biological community to facilitate evaluation and further methods development.

One metric useful for summarizing the history of CASP is GDT_TS (Zemla et al., 1999), which analyzes the superposition of structures (Kryshtafovych et al., 2005). This metric has approximately doubled since the start of CASP. The most difficult targets, with sequence identities of less than 20%, remain difficult to model, with only 20% of sequences correctly aligned for the best models. Accuracy was limited by three factors: differences in main-chain conformation, the number of targets in the midrange of difficulty (30-50% sequence identity) for structure prediction, and remote evolutionary relationships. Progress for predictions using targets with high sequence similarity to known templates was difficult to quantify, but automated server performance has improved. Perhaps most important is the improvement over time in fold recognition. CASP6 included the first report of a successful model for a small protein refined from a backbone RMSD of 2.2 Å down to 1.6 Å with many core side chains correctly oriented (Moult, 2005). This significant improvement is dominated by methods from Baker's group, using the software Rosetta (Bradley et al., 2005) and Robetta (Chivian et al., 2005).
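GDT_TS rewards partial correctness that a single RMSD value would obscure: it averages the percentage of Cα atoms within 1, 2, 4, and 8 Å of their target positions. A minimal sketch over one fixed superposition (the official implementation searches many superpositions and keeps, per cutoff, the largest fraction found, so this sketch gives a lower bound):

```python
import numpy as np

def gdt_ts(model_ca, target_ca):
    """Approximate GDT_TS from already-superposed model and target coordinates.

    model_ca, target_ca: (N, 3) arrays of Ca positions under one superposition.
    GDT_TS = average over cutoffs 1, 2, 4, 8 Angstroms of the percentage of
    residues whose Ca lies within the cutoff of its target position.
    """
    dist = np.linalg.norm(model_ca - target_ca, axis=1)  # per-residue deviation
    return 100.0 * np.mean([(dist <= c).mean() for c in (1.0, 2.0, 4.0, 8.0)])
```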

At the time of writing, CASP7 had concluded but the outcomes were yet to be published. CASP7 introduced four new challenges devised during CASP6 (Moult et al., 2005). These challenges are (1) to model structures of single-residue mutants; (2) to model structure changes associated with specificity changes within protein families; (3) to directly focus on improving refinement methods and thus produce a 0.5 Å RMSD improvement in the Cα accuracy of models based on sequences with greater than 30% identity; and (4) to devise scoring functions that will reliably pick the most accurate models from a set of candidate structures produced by the current new fold methods.

The latter point is of particular interest since it introduces a new category, namely the ability of groups to assess their own models as opposed to relying on techniques defined by the CASP organizers (Cozzetto et al., 2007). Participants were asked to provide an index of quality for individual models as well as an index of the expected correctness of each residue. A method to predict model quality is useful for two purposes: first, to select the best model among the plausible choices, and second, to assign an absolute quality value to each individual model. Results suggest that it is possible to create methods that distinguish the best solution within a set of plausible models.

Concerning the previously used categories, CASP7 showed that intramolecular residue-residue contacts inferred from 3D structure predictions are similar in accuracy to those predicted by contact prediction methods. The latter approach does not construct a protein structure model; nevertheless, it performs better for some targets in identifying interacting residues (Izarzugaza et al., 2007). Domain boundary predictions were more consistent when the target had a suitable structural template to use as a reference (Tress et al., 2007). Disorder predictors continue to be of interest to the community, even though developing a proper evaluation criterion remains elusive (Bordoli, Kiefer, and Schwede, 2007). Overall, participating methods have generally improved their accuracy in identifying disordered residues, but the improvements are not significantly better than the best method from the previous round, CASP6.

In the functional prediction category, submissions were made for GO molecular function terms, Enzyme Commission numbers, and ligand binding sites (Lopez et al., 2007). The results were disappointing in that there were few participants in this category and the test was not a purely blind study. As a relatively new category, some improvements in organization need to be made before a true assessment of the value of functional predictors can be made.

As with any scientific endeavor, CASP is an evolving process and will continue to serve the community by creating new standards to be met by those in the field of structure prediction. We have tried to convey that CASP has undoubtedly accelerated the field through a focused effort that has continued to challenge participants. The adoption of this type of critical assessment effort in other fields is the best testament to the success of CASP.

A process such as that undertaken during the seven CASP experiments is possible only through the participation of the community, both as predictors and service providers in assessing the work of others. CASP has been a great success and a testament to everyone in the field.

PredictProtein: http://www.predictprotein.org/

CASP—Protein Structure Prediction Center: http://predictioncenter.gc.ucdavis.edu/

CAFASP: http://www.cs.bgu.ac.il/~dfischer/CAFASP5/

CAPRI: http://capri.ebi.ac.uk/

FORCASP: http://www.forcasp.org/

EVA: http://cubic.bioc.columbia.edu/eva/

EVAcon (continuous evaluation service for protein contact prediction; Grana et al., 2005): http://cubic.bioc.columbia.edu/eva/con/index.html

LiveBench: http://meta.bioinfo.pl/livebench.pl