4
COMPUTATIONAL ASPECTS OF HIGH-THROUGHPUT CRYSTALLOGRAPHIC MACROMOLECULAR STRUCTURE DETERMINATION
INTRODUCTION
The desire to understand biological processes at a molecular level has led to the routine application of X-ray crystallography. However, significant time and effort are usually required to solve and complete a macromolecular crystal structure. Much of this effort is in the form of manual interpretation of complex numerical data using a diverse array of software packages and the repeated use of interactive three-dimensional graphics. The need for extensive manual intervention leads to two major problems: significant bottlenecks that impede rapid structure solution (Burley et al., 1999) and the introduction of errors due to subjective interpretation of the data (Mowbray et al., 1999). These problems present a major impediment to the success of structural genomics efforts (Burley et al., 1999; Montelione and Anderson, 1999) that require the whole process of structure solution to be as streamlined as possible. See Chapter 40 for a detailed description of structural genomics. The automation of structure solution is thus necessary, as it offers the opportunity to produce minimally biased models in a short time. Recent technical advances are fundamental to achieving this automation and make high-throughput structure determination an attainable goal.
HIGH-THROUGHPUT STRUCTURE DETERMINATION
Automation in macromolecular X-ray crystallography has been a goal for many researchers. The field of small-molecule crystallography, where atomic resolution data are routinely collected, is already highly automated. As a result, the current growth rate of the Cambridge Structural Database (CSD) (Allen, Kennard, and Taylor, 1983) is more than 15,000 new structures per year. This is approximately 10 times the growth rate of the Protein Data Bank (PDB) (Berman et al., 2000). See Chapters 11-13 for further details of structural databases. Automation of macromolecular crystallography can significantly improve the rate at which new structures are determined. The goal of automation moved to a position of prime importance with the development of the concept of structural genomics (Burley et al., 1999; Montelione and Anderson, 1999) and the routine application of high-resolution macromolecular crystallography to study protein-ligand complexes for drug discovery (Nienaber et al., 2000). To exploit the information present in the rapidly expanding sequence databases, it has been proposed that the structural database must also grow. Increased knowledge about the relationship between sequence, structure, and function will allow sequence information to be used to its full extent. The success of structural genomics requires macromolecular structures to be solved at a rate significantly faster than at present. This high-throughput structure determination depends on automation to reduce the bottlenecks related to human intervention throughout the whole crystallographic process. Automation of structure solution from the experimental data relies on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures traditionally performed by manual intervention, and finally, the development of software packages that allow a tight integration between these algorithms. Truly automated structure solution requires the computer to make decisions about how best to proceed in the light of the available data.
The automation of macromolecular structure solution applies to all the procedures involved, from data collection through to structure refinement. There have been many technological advances that make macromolecular X-ray crystallography easier. In particular, cryoprotection to extend crystal life (Garman, 1999), the availability of tunable synchrotron sources (Walsh et al., 1999a), high-speed CCD data collection devices (Walsh et al., 1999b), and the ability to incorporate anomalously scattering selenium atoms into proteins have all made structure solution much more efficient (Walsh et al., 1999b). The desire to make structure solution more efficient has led to investigations into the optimal data collection strategies for multiwavelength anomalous diffraction (MAD) (Gonzalez et al., 1999; Gonzalez, 2007) and phasing using single anomalous diffraction with sulfur or halide ions (Dauter and Dauter, 1999; Dauter et al., 1999). It has been shown that MAD phasing using only two wavelengths can be successful (Gonzalez et al., 1999). The optimum wavelengths for such an experiment are those that give a large contrast in the real part of the anomalous scattering factor (e.g., the inflection point and a high-energy remote). However, it has also been shown that, in general, a single wavelength collected at the anomalous peak is sufficient to solve a macromolecular structure (Rice, Earnest, and Brunger, 2000). Such an approach minimizes the amount of data to be collected, increases the efficiency of synchrotron beamlines, and is becoming a more widely used technique.
The first step of structure solution, once the raw images have been processed, is assessment of data quality. The intrinsic quality of the data must be quantified and the appropriate signal extracted. Observations that are in error must be rejected as outliers. Some observations will be rejected at the data-processing stage, where multiple observations are available. However, if redundancy is low, then probabilistic methods can be used (Read, 1999). The prior expectation, given either by a Wilson distribution of intensities or by model-based structure-factor probability distributions, is used to detect outliers. This method is able to reject strong observations that are in error, which tend to dominate the features of electron density and Patterson maps. This method could also be extended to the rejection of outliers during the model refinement process.
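As an illustration of this idea, the following sketch flags improbably strong acentric reflections under a simple Wilson prior. It is a minimal stand-in for the probabilistic treatment cited above, not an implementation of it: it assumes a single resolution shell (so the mean intensity approximates the Wilson expectation), uses numpy, and applies a bare tail-probability cutoff.

```python
import numpy as np

def wilson_outliers(intensities, p_cut=1e-6):
    """Flag acentric reflections that are improbably strong under Wilson
    statistics.  All reflections are assumed to lie in one resolution
    shell, so the expected intensity is approximated by the shell mean.
    For an acentric reflection, P(I > I0) = exp(-I0 / <I>); an observation
    is flagged when this tail probability drops below p_cut."""
    intensities = np.asarray(intensities, dtype=float)
    tail_prob = np.exp(-intensities / intensities.mean())
    return tail_prob < p_cut

# Toy usage: one corrupted, very strong measurement among ordinary ones.
rng = np.random.default_rng(0)
obs = rng.exponential(scale=100.0, size=1000)  # Wilson-like acentric intensities
obs[10] = 5000.0                               # corrupted observation (e.g., a zinger)
print(np.nonzero(wilson_outliers(obs))[0])     # expected output: [10]
```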
When using isomorphous substitution or anomalous diffraction methods for experimental phasing, the relevant information lies in the differences between the multiple observations. In the case of anomalous diffraction, these differences are often very small, being of the same order as the noise in the data. In general, the anomalous differences at the peak wavelength are sufficient to locate the heavy atoms, provided that a large enough anomalous signal is observed (Grosse-Kunstleve and Brunger, 1999; Weeks et al., 2003). However, in less routine cases it can be very important to extract the maximum information from the data. One approach used in MAD phasing is to analyze the data sets to calculate FA structure factors, which correspond to the anomalously scattering substructure (Terwilliger, 1994). Several programs are available to estimate the FA structure factors: XPREP (Bruker, 2001), MADSYS (Hendrickson, 1991), and SOLVE (Terwilliger and Berendzen, 1999a). In another approach, a specialized procedure for the normalization of structure factor differences arising from either isomorphous or anomalous differences has been developed to facilitate the use of direct methods for heavy atom location (Blessing and Smith, 1999).
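Whether the anomalous signal is large enough can be judged roughly from summary statistics of the Friedel differences. The sketch below (plain numpy; the array names are illustrative) reports the mean anomalous difference relative to the mean amplitude and relative to its propagated error; programs such as XPREP or SOLVE perform far more careful, resolution-dependent analyses.

```python
import numpy as np

def anomalous_signal(f_plus, f_minus, sig_plus, sig_minus):
    """Crude diagnostics of the anomalous signal strength.
    Returns (<|dF|>/<F>, <|dF|/sigma(dF)>), where dF = |F+| - |F-|."""
    f_plus, f_minus = np.abs(f_plus), np.abs(f_minus)
    d_anom = np.abs(f_plus - f_minus)
    sig_d = np.sqrt(np.asarray(sig_plus) ** 2 + np.asarray(sig_minus) ** 2)
    f_mean = 0.5 * (f_plus + f_minus)
    return d_anom.mean() / f_mean.mean(), (d_anom / sig_d).mean()
```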
Merohedral twinning of the diffraction data can make structure solution difficult and in some cases impossible. Twinning occurs when a crystal contains multiple diffracting domains that are related by a simple transformation, such as a twofold rotation about a crystallographic axis, a phenomenon that occurs in certain space groups or under certain combinations of cell dimensions and space group symmetry (Parsons, 2003). As a result, the observed diffraction intensities are the sum of the intensities from the distinctly oriented domains. Fortunately, the presence of twinning can be detected at an early stage by the statistical analysis of structure-factor distributions (Yeates, 1997). If the twinning is only partial, it is possible to detwin the data. Perfect twinning typically makes structure solution using experimental phasing methods difficult, but the molecular replacement method and refinement (see later) can still be used successfully.
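The intensity statistics used to detect twinning, and the algebra used to detwin a partial twin, are simple enough to sketch. The example below assumes acentric data and twin-related intensity pairs that have already been identified; it shows the second-moment test of Yeates (1997) and the standard inversion of the two-by-two twinning equations for a known twin fraction.

```python
import numpy as np

def second_moment(intensities):
    """<I^2>/<I>^2 for acentric data: close to 2.0 for untwinned data and
    close to 1.5 for a perfect merohedral twin."""
    i = np.asarray(intensities, dtype=float)
    return (i ** 2).mean() / i.mean() ** 2

def detwin(i1_obs, i2_obs, alpha):
    """Recover the true intensities of a twin-related pair for a partial
    twin with twin fraction alpha < 0.5, by inverting
        I1_obs = (1 - alpha) * I1 + alpha * I2
        I2_obs = alpha * I1 + (1 - alpha) * I2."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("detwinning requires a twin fraction below 0.5")
    denom = 1.0 - 2.0 * alpha
    i1 = ((1.0 - alpha) * i1_obs - alpha * i2_obs) / denom
    i2 = ((1.0 - alpha) * i2_obs - alpha * i1_obs) / denom
    return i1, i2
```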
HEAVY ATOM LOCATION AND COMPUTATION OF EXPERIMENTAL PHASES
The location of heavy atoms in isomorphous replacement or the location of anomalous scatterers was traditionally performed by manual inspection of Patterson maps. However, in recent years labeling techniques such as selenomethionyl incorporation have become widely used. This leads to an increase in the number of atoms to be located, rendering manual interpretation of Patterson maps extremely difficult. As a result, automated heavy atom location methods have proliferated. The programs SOLVE (Terwilliger and Berendzen, 1999a) and CNS (Brunger et al., 1998; Grosse-Kunstleve and Brunger, 1999) use Patterson-based techniques to find a starting heavy atom configuration that is then completed using difference Fourier analyses. Shake-and-Bake (SnB) (Weeks and Miller, 1999), SHELX-D (Sheldrick and Gould, 1995), and HySS (Grosse-Kunstleve and Adams, 2003) use direct-methods phase refinement in reciprocal space combined with modifications in real space. SnB refines phases derived from randomly positioned atoms, while SHELX-D derives starting phases by automatic inspection of the Patterson map. All of these methods have been used with great success to solve substructures with more than 60 selenium sites; SHELX-D and SnB have been used to find up to 150 and 160 selenium sites, respectively. The HySS program from PHENIX provides a high degree of automation, terminating the search once a successful solution has been found.
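The Patterson-based strategies mentioned above rest on the fact that a map computed from squared (difference) amplitudes with all phases set to zero contains peaks at the interatomic vectors of the substructure. The sketch below computes an anomalous difference Patterson map on a numpy FFT grid and lists its largest non-origin peaks; it omits symmetry expansion, scaling, and sharpening, and is only an illustration of the principle used by programs such as SOLVE or HySS.

```python
import numpy as np

def anomalous_difference_patterson(hkl, delta_f, grid=(48, 48, 48)):
    """Build a difference Patterson map from Miller indices (N, 3) and
    anomalous differences |F+| - |F-| (N,).  Peaks correspond to vectors
    between anomalous scatterers."""
    coeffs = np.zeros(grid, dtype=complex)
    for (h, k, l), df in zip(hkl, delta_f):
        coeffs[h % grid[0], k % grid[1], l % grid[2]] = df ** 2
        coeffs[-h % grid[0], -k % grid[1], -l % grid[2]] = df ** 2  # Friedel mate
    return np.fft.ifftn(coeffs).real

def top_peaks(patterson, n=10):
    """Fractional coordinates of the n largest non-origin grid peaks."""
    flat = patterson.ravel().copy()
    flat[0] = -np.inf                      # suppress the origin peak
    order = np.argsort(flat)[::-1][:n]
    return np.array(np.unravel_index(order, patterson.shape)).T / patterson.shape
```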
After the heavy atom or anomalously scattering substructure has been located, experimental phases can be calculated and the parameters of the substructure refined. A number of modern maximum likelihood based methods for heavy atom refinement and phasing are readily available: MLPHARE (Otwinowski, 1991), CNS (Brunger et al., 1998), SHARP (de La Fortelle and Bricogne, 1997), SOLVE (Terwilliger and Berendzen, 1999a), and Phaser (McCoy, Storoni, and Read, 2004). Programs such as PHENIX (Adams et al., 2002) have the advantage of fully integrating heavy atom location (using HySS), site refinement/phasing (using SOLVE or Phaser), and automated choice of heavy atom hand.
DENSITY MODIFICATION
Often the raw phases obtained from the experiment are not of sufficient quality to proceed with structure determination. However, there are many real space constraints, such as solvent flatness, that can be applied to electron density maps in an iterative fashion to improve initial phase estimates. This process of density modification is now routinely used to improve experimental phases prior to map interpretation and model building. However, due to the cyclic nature of the density modification process, where the original phases are combined with new phase estimates, introduction of bias is a serious problem. The γ correction was developed to reduce the bias inherent in the process and has been applied successfully in the method of solvent flipping (Abrahams, 1997); it is also implemented in the CNS package. The γ correction has been generalized to the γ perturbation method in the DM program, part of the CCP4 suite (Collaborative Computational Project 4, 1994), and can be applied to any arbitrary density modification procedure, including noncrystallographic symmetry (NCS) averaging and histogram matching (Cowtan, 1999). After bias removal, histogram matching is significantly more powerful than solvent flattening for comparable volumes of protein and solvent (Cowtan, 1999). More recently, a reciprocal space maximum likelihood formulation of the density modification process has been devised and implemented in the program RESOLVE (Terwilliger, 2000; Terwilliger, 2002a; Terwilliger, 2003a). This method has the advantage that a likelihood function can be directly optimized with respect to the available parameters (phases and amplitudes), rather than indirectly through a weighted combination of starting parameters with those derived from flattened maps. In this way, the problem of choosing weights for phase combination is avoided. The concept of statistical density modification has been developed further in the program PIRATE (Cowtan, 2004), where many different probability distributions are used to classify the density.
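The core of classical density modification is a simple alternation between real and reciprocal space, which the sketch below illustrates for solvent flattening alone. It assumes observed amplitudes and starting phases already sampled on an FFT grid and a precomputed boolean solvent mask, and it deliberately omits the γ correction, phase-probability weighting, histogram matching, and NCS averaging that make real implementations effective.

```python
import numpy as np

def solvent_flatten(f_obs, phases, solvent_mask, n_cycles=10):
    """Iterative solvent flattening (minimal form).
    f_obs        : 3D array of observed amplitudes on the FFT grid
    phases       : 3D array of starting phases in radians, same grid
    solvent_mask : boolean 3D array, True where the map is solvent
    Each cycle: synthesize the map, flatten the solvent region, transform
    back, keep the modified phases, and re-impose the observed amplitudes."""
    coeffs = f_obs * np.exp(1j * phases)
    for _ in range(n_cycles):
        rho = np.fft.ifftn(coeffs).real
        rho[solvent_mask] = rho[solvent_mask].mean()   # flatten the solvent
        new_phases = np.angle(np.fft.fftn(rho))
        coeffs = f_obs * np.exp(1j * new_phases)       # restore |F_obs|
    return np.angle(coeffs)
```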
MOLECULAR REPLACEMENT
The method of molecular replacement is commonly used to solve structures for which a homologous structure is already known. As the database of known structures expands as a result of structural genomics efforts, this technique will become more and more important. The method attempts to locate a molecule, or fragments of a molecule, whose structure is known, in the unit cell of an unknown structure for which experimental data are available. To make the problem tractable, it has traditionally been broken down into two consecutive three-dimensional search problems: a search to determine the rotational orientation of the model followed by a search to determine the translational position of the rotated model (Rossmann and Blow, 1962). The method of Patterson correlation (PC) refinement is often used to optimize the rotational orientation prior to the translation search, thus increasing the likelihood of finding the correct solution (Brunger, 1997). With currently available programs, structure solution by molecular replacement usually involves significant manual input. Recently, however, methods have been developed to automate molecular replacement. One approach has used the exhaustive application of traditional rotation and translation methods to perform a complete six-dimensional search (Sheriff, Klei, and Davis, 1999). More recently, less time-consuming methods have been developed. The EPMR program implements an evolutionary algorithm to perform a very efficient six-dimensional search (Kissinger, Gehlhaar, and Fogel, 1999). A Monte Carlo simulated annealing scheme is used in the program Queen of Spades to locate the positions of molecules in the asymmetric unit (Glykos and Kokkinidis, 2000). To improve the sensitivity of any molecular replacement search algorithm, maximum likelihood methods have been developed in the Phaser program (Read, 2001; Storoni, McCoy, and Read, 2004; McCoy et al., 2005). The traditional scoring function of the search is replaced by a function that takes into account the errors in the model and the uncertainties at each stage. This approach greatly improves the chances of finding a correct solution using the traditional approach of rotation (Storoni, McCoy, and Read, 2004) and translation searches (McCoy et al., 2005). In addition, the method performs anisotropic correction of the experimental data and a statistically correct treatment of simultaneous information from multiple search models using multivariate statistical analysis (Read, 2001). This allows information from different structures to be used in highly automated procedures, while minimizing the risk of introducing bias. In the future, molecular replacement algorithms may permit experimental data to be exhaustively tested against all known structures to determine whether a homologous structure is already present in a database, which could then be used as an aid in structure determination.
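The speed of modern translation searches comes from the convolution theorem: the overlap of a model density with a target density can be evaluated for every grid translation at once with a pair of FFTs. The sketch below shows only this core trick on numpy arrays; the likelihood-based targets used in Phaser, space group symmetry, and packing checks are all omitted.

```python
import numpy as np

def translation_search(rho_target, rho_model):
    """Score every grid translation t of the model against the target via
        C(t) = sum_x rho_target(x) * rho_model(x - t),
    computed for all t simultaneously with FFTs.  Returns the best shift
    in fractional coordinates and its score."""
    corr = np.fft.ifftn(np.fft.fftn(rho_target) *
                        np.conj(np.fft.fftn(rho_model))).real
    best = np.unravel_index(np.argmax(corr), corr.shape)
    return np.array(best) / np.array(corr.shape), corr.max()
```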
MAP INTERPRETATION
The interpretation of the initial electron density map, calculated using either experimental phasing or molecular replacement methods, is often performed in multiple stages (described later), the final goal being the construction of an atomic model. If the interpretation cannot proceed to an atomic model, it is often an indication that the diffraction data collection must be repeated with improved crystals. Alternatively, repeating previous computational steps in data analysis or phasing may generate revised hypotheses about the crystal, such as a different space group symmetry or estimate of unit cell contents. Clearly, completely automating the process of structure solution will require that these eventualities are taken into consideration and dealt with in a rigorous manner.
The first stage of electron density map interpretation is an overall assessment of the information in a given map. The standard deviation of the local root-mean-square electron density can be calculated from the map. This variation is high when the electron density map has well-defined protein and solvent regions and is low for maps calculated with random phases (Terwilliger, 1999; Terwilliger and Berendzen, 1999b). A similar, more discriminating, analysis can be performed by the calculation of the skewness of the histogram of electron density values in the unit cell (Podjarny, 1976). It has also been shown that the correlation of the local root-mean-square density in adjacent regions in the unit cell can be used as a measure of the presence of distinct, contiguous solvent and macromolecular regions in an electron density map (Terwilliger and Berendzen, 1999c).
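Both indicators are straightforward to compute from a map sampled on a grid, as in the sketch below (numpy; the box size is arbitrary and crystal symmetry is ignored): the skewness of the density histogram and the spread of the local root-mean-square density over coarse boxes.

```python
import numpy as np

def map_skewness(rho):
    """Skewness of the electron density histogram: near zero for maps with
    random phases, positive for well-phased protein maps."""
    d = rho - rho.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

def local_rms_variation(rho, box=8):
    """Standard deviation of the local RMS density over non-overlapping
    boxes: high when the map has distinct protein and solvent regions,
    low for near-random maps."""
    rho = rho - rho.mean()
    nx, ny, nz = (s // box * box for s in rho.shape)
    blocks = rho[:nx, :ny, :nz].reshape(
        nx // box, box, ny // box, box, nz // box, box)
    local_rms = np.sqrt((blocks ** 2).mean(axis=(1, 3, 5)))
    return local_rms.std()
```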
Currently, the process of analyzing an experimental electron density map to build the atomic model is a time-consuming, subjective process and almost entirely graphics based. Sophisticated programs such as COOT (Emsley and Cowtan, 2004), O (Jones et al., 1991), XtalView (McRee, 1999), QUANTA (Oldfield, 2000), TurboFrodo (Jones, 1978), and MAIN (Turk, 2000) are commonly used for manual rebuilding. These greatly reduce the effort required to rebuild models by providing libraries of side-chain rotamers and peptide fragments (Kleywegt and Jones, 1998), map interpretation tools, and real space refinement of rebuilt fragments (Jones et al., 1991). However, it has been shown that there are substantial differences among the models built manually by different people when presented with the same experimental data (Mowbray et al., 1999). The majority of time spent in completing a crystal structure is in the use of interactive graphics to manually modify the model. This manual modification is required either to correct parts of the model that are incorrectly placed or to add parts of the model that are currently missing. This process is prone to human error because of the large number of degrees of freedom of the model and the possible poor quality of regions of the electron density map.
Although interactive graphics systems for manual model building have made the process dramatically simpler, there have also been significant advances in making the process of map interpretation and model building truly automated. One route to automated analysis of the electron density map is the recognition of larger structural elements, such as α-helices and β-strands. Location of these features can often be achieved even in electron density maps of low quality using exhaustive searches in either real space (Kleywegt and Jones, 1997) or reciprocal space (Cowtan, 1998; Cowtan, 2001), the latter having a significant advantage in speed because the translation search for each orientation can be calculated using a fast Fourier transform. The automatic location of secondary structure elements from skeletonized electron density maps can be combined with sequence information and databases of known structures to build an initial atomic model with little or no manual intervention from the user (Oldfield, 2000). This method has been seen to work even at relatively low resolution (dmin ~ 3.0 Å). However, the implementation is still graphics based and requires user input. A related approach in the program MAID also uses a skeleton generated from the electron density map as the starting point for locating secondary structure elements (Levitt, 2001). Trial points are extended in space by searching for connected electron density at the Cα-Cα distance (approximately 3.7 Å) with standard α-helical or β-strand geometry. Real space refinement of the generated fragments is used to improve the model. Both of these methods suffer from the limitation that they do not combine the model building process with the generation of improved electron density maps derived from the starting phases and the partial models.
To completely automate the model building process, methods have been developed that combine automated identification of potential atomic sites in the map with model refinement. In the ARP/wARP system, an iterative procedure is used that describes the electron density map as a set of unconnected atoms from which protein-like patterns, primarily the main-chain trace from peptide units, are extracted. From this information and knowledge of the protein sequence, a model can be automatically constructed (Perrakis, Morris, and Lamzin, 1999). This powerful procedure, known as warpNtrace in ARP/wARP, can gradually build a more complete model from the initial electron density map and in many cases is capable of building the majority of the protein structure in a completely automated way. Unfortunately, this method currently has the limitation of requiring relatively high-resolution data (dmin < 2.3 Å). Data that extend to this resolution are available for less than 60% of the ~16,500 X-ray structures in the PDB. Therefore, other approaches have been developed to automatically interpret maps at lower resolution (Holton et al., 2000; Terwilliger, 2002b; Terwilliger, 2003b; Terwilliger, 2003c; Terwilliger, 2003d). In the PHENIX system (Adams et al., 2002), the combination of secondary structure fragment location and fragment extension by RESOLVE (Terwilliger, 2003d) with iterated structure refinement by phenix.refine (Afonine, Grosse-Kunstleve, and Adams, 2005) for map improvement provides an automated model building method that is relatively insensitive to resolution and is typically capable of building 70% or more of a structure even at 3.0 Å. With this technology, it has now been possible to investigate the variability of models by building many models against the same data (DePristo, de Bakker, and Blundell, 2004; DePristo et al., 2005; Terwilliger et al., 2007).
Methods have recently been developed for the automated location and fitting of small molecules into difference electron density maps, a process critical to the crystallographic screening of potential therapeutic compounds bound to their target molecules. These methods have used reduction of the difference electron density to a simpler representation (Zwart, Langer, and Lamzin, 2004; Aishima et al., 2005) or systematic searching against the density map with rigid fragments of the small molecule (Terwilliger et al., 2006). This is still an active area of research, where problems of small molecule disorder and partial occupancy present significant challenges to robust automation.
REFINEMENT
In general, the atomic model obtained by automatic or manual methods contains some errors and must be optimized to best fit the experimental diffraction data and prior chemical information. In addition, the initial model is often incomplete, and refinement is carried out to generate improved phases that can then be used to compute a more accurate electron density map. However, the refinement of macromolecular structures is often difficult for several reasons. First, the data to parameter ratio is low, creating the danger of overfitting the diffraction data. This results in a good agreement of the model with the experimental data even when the model contains significant errors. Therefore, the apparent ratio of data to parameters is often increased by incorporation of chemical information, that is, bond length and bond angle restraints obtained from ideal values seen in high-resolution structures (Hendrickson, 1985). Second, the initial model often has significant errors, due to the limited quality of the experimental data or a low level of homology between the search model and the true structure in molecular replacement. Third, local (false) minima exist in the target function; the more numerous and deeper these minima are, the more likely refinement is to fail. Fourth, model bias in the electron density maps complicates the process of manual rebuilding between cycles of automated refinement.
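How restraints supplement a low data-to-parameter ratio can be made concrete with a toy least-squares target that adds a weighted geometry term to an X-ray residual; modern programs replace the X-ray term with a maximum likelihood function and use many more restraint classes. The function below is only a sketch under those simplifications, with illustrative argument names.

```python
import numpy as np

def restrained_target(f_obs, f_calc, bond_lengths, ideal_lengths,
                      bond_sigma=0.02, weight=1.0):
    """Toy restrained refinement target E = E_xray + weight * E_geom."""
    e_xray = np.sum((np.abs(f_obs) - np.abs(f_calc)) ** 2)
    e_geom = np.sum(((np.asarray(bond_lengths) -
                      np.asarray(ideal_lengths)) / bond_sigma) ** 2)
    return e_xray + weight * e_geom
```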
Methods have been devised to address these difficulties. Cross-validation, in the form of the free R-value, can be used to detect overfitting (Brunger, 1992). The radius of convergence of refinement can be increased by the use of stochastic optimization methods such as molecular dynamics-based simulated annealing (Brunger, Kuriyan, and Karplus, 1987). Most recently, improved targets for refinement of incomplete, error-containing models have been obtained using the more general maximum likelihood formulation (Murshudov, Vagin, and Dodson, 1997; Pannu et al., 1998). The resulting maximum likelihood refinement targets have been successfully combined with the powerful optimization method of simulated annealing to provide a very robust and efficient refinement scheme (Adams et al., 1999). For many structures, some initial experimental phase information is available from either isomorphous heavy atom replacement or anomalous diffraction methods. These phases represent additional observations that can be incorporated in the refinement target. Tests have shown that the addition of experimental phase information greatly improves the results of refinement (Pannu et al., 1998; Adams et al., 1999). It is anticipated that the maximum likelihood refinement method will be extended further to incorporate multivariate statistical analysis, thus allowing multiple models to be refined simultaneously against the experimental data without introducing bias (Read, 2001).
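For reference, the free R-value is computed exactly like the conventional R-value but over a small test set of reflections that is never used in refinement; a divergence between the two signals overfitting. A minimal sketch (numpy; amplitudes and a boolean test-set flag are assumed) follows.

```python
import numpy as np

def r_factors(f_obs, f_calc, test_set):
    """Return (R_work, R_free) with R = sum|F_obs - F_calc| / sum F_obs,
    where test_set marks the cross-validation reflections."""
    f_obs, f_calc = np.abs(f_obs), np.abs(f_calc)
    test_set = np.asarray(test_set, dtype=bool)

    def r(sel):
        return np.sum(np.abs(f_obs[sel] - f_calc[sel])) / np.sum(f_obs[sel])

    return r(~test_set), r(test_set)
```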
The refinement methods used in macromolecular structure determination work almost exclusively in reciprocal space. However, there has been renewed interest in the use of real space refinement algorithms that can take advantage of high-quality experimental phases from anomalous diffraction experiments or NCS averaging. Tests have shown that the method can be successfully combined with the technique of simulated annealing (Chen, Blanc, and Chapman, 1999).
The parameterization of the atomic model in refinement is of great importance. When the resolution of the experimental data is limited, it is appropriate to constrain bond lengths and angles to their ideal chemical values, leaving torsion angles as the only conformational degrees of freedom. This torsion angle representation decreases overfitting and improves the radius of convergence of refinement (Rice and Brunger, 1994). If data are available to a high enough resolution, additional atomic displacement parameters can be used. Macromolecular structures often show anisotropic motion, which can be resolved at a broad spectrum of levels ranging from whole domains down to individual atoms. The use of the fast Fourier transform to refine atomic anisotropic displacement parameters in the program REFMAC has greatly improved the speed with which such models can be generated and tested (Murshudov et al., 1999). The method has been shown to improve the crystallographic R-value and free R-value, as well as the fit to geometric targets, for data with resolution higher than 2 Å. New programs, such as phenix.refine (Afonine, Grosse-Kunstleve, and Adams, 2005), are being developed with the explicit goal of increasing the automation of structure refinement, which still remains a significant bottleneck in structure completion.
VALIDATION
Validation of macromolecular models and their experimental data (Vaguine, Richelle, and Wodak, 1999) is an essential part of structure determination (Kleywegt, 2000). This is important during both the structure solution process and coordinate and data deposition at the Protein Data Bank, where extensive validation criteria are also applied (Berman et al., 2000). More recently, the MolProbity structure validation suite has been developed (Lovell et al., 2003; Davis et al., 2004). This applies numerous geometric validation criteria to assess both global and local correctness of the model. This information can be readily used to correct errors in the model. See Chapters 14 and 15 for more descriptions of validation methods based on stereochemistry and atomic packing. In the future, the repeated application of validation criteria in automated structure solution will help avoid errors that may still occur as a result of subjective manual interpretation of data and models.
CHALLENGES TO AUTOMATION
Noncrystallographic Symmetry
It is not uncommon for macromolecules to crystallize with more than one copy in the asymmetric unit. This leads to relationships between atoms in real space and between diffraction intensities in reciprocal space. These relationships can be exploited in the structure solution process. However, the identification of NCS is generally a manual process. A method for automatic location of proper NCS (i.e., a rotation axis) has been shown to be successful even at low resolution (Vonrhein and Schulz, 1999). A more general approach to finding NCS relationships uses skeletonization of electron density maps (Spraggon, 1999). A monomer envelope is calculated from the solvent mask generated by solvent flattening. The NCS relationships between monomer envelopes can then be determined using standard molecular replacement methods. When a model is being built automatically, it has been shown that the NCS relationships can be extracted from the local features of the electron density map (Pai, Sacchettini, and Ioerger, 2006).
These methods could be used in the future to automate the location of NCS operators and the determination of molecular masks. In the case of experimental phasing using heavy atoms or anomalous scatterers, it is possible to locate the NCS from the sites themselves (Lu, 1999; Terwilliger, 2002c). The RESOLVE program automates this process such that NCS averaging can be performed automatically as part of the phase improvement procedure. NCS information can also be used in structure refinement (Kleywegt, 1996) to either decrease the number of refined parameters (NCS constraints) or increase the number of restraints (NCS restraints). The CNS program implements both of these methods, and most other refinement programs implement NCS restraints. It should be noted that NCS can sometimes be very close to crystallographic symmetry. For example, a translational relationship between the molecules in the asymmetric unit can lead to very weak reflections that may be interpreted as systematic absences resulting from crystallographic centering, or rotational symmetry in a molecular complex may mimic additional crystallographic rotations. As a result, a higher symmetry space group may be assigned than the true crystal symmetry. These possible complications should always be considered during structure solution, as they can lead to stalled R-factors in structure refinement.
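The distinction between NCS restraints and constraints can be illustrated with a toy penalty term: restraints add a term that penalizes deviations of each NCS copy from the average of all copies, whereas constraints would instead refine a single copy together with the NCS operators. The sketch below assumes the copies have already been superposed onto a common frame and uses an arbitrary weighting.

```python
import numpy as np

def ncs_restraint_term(copies, sigma=0.05):
    """Toy NCS restraint energy for coordinates of shape
    (n_copies, n_atoms, 3), already superposed onto a common frame."""
    copies = np.asarray(copies, dtype=float)
    deviations = copies - copies.mean(axis=0)
    return np.sum(deviations ** 2) / sigma ** 2
```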
Disorder
Except in the rare case of very well-ordered crystals of extremely rigid molecules, disorder of one form or another is a component of macromolecular structures. This disorder may take the form of discrete conformational substates for side chains (Wilson and Brunger, 2000), surface loops, or small changes in the orientation of entire molecules throughout the crystal. The degree to which this disorder can be identified and interpreted typically depends on the quality of the diffraction data. With low to medium resolution data, dual side-chain conformations are occasionally observed. With high-resolution data (1.5Å or better), multiple side-chain and main-chain conformations are often seen. The challenge for automated structure solution is the identification of the disorder and its incorporation into the atomic model without the introduction of errors due to misinterpretation of the data. Disorder of whole molecules within the crystal, as a result of small differences in packing between neighboring unit cells, cannot be visualized in electron density maps. However, the effect on refinement statistics such as the R and free-R values can be significant because no single atomic model can fit the observed diffraction data well. One approach to the problem is to simultaneously refine multiple models against the data (Burling and Brunger, 1994). An alternative approach is the refinement of translation-libration-screw (TLS) parameters for whole molecules or subdomains of molecules (Winn, Isupov, and Murshudov, 2001). This introduces only a few additional parameters to be refined while still accounting for the majority of the disorder. However, it still remains a challenge to automatically identify subdomains. The use of normal modes as an alternative parameterization for the molecular flexibility has the potential for refinement of the structures at much lower resolution (Delarue and Dumas, 2004; Poon et al., 2007), while also avoiding the need to identify subdomains.
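The appeal of the TLS parameterization is that a rigid group contributes only 20 parameters (the T, L, and S tensors), from which an anisotropic displacement tensor can be generated for every atom in the group. The sketch below evaluates the standard Schomaker-Trueblood expression under simplifying assumptions (units, the S-trace convention, and symmetry are ignored), so it should be read as an illustration rather than the exact formula used by any particular program.

```python
import numpy as np

def tls_aniso_u(t, l, s, xyz, origin):
    """Anisotropic displacement tensor of an atom at position xyz generated
    by rigid-body T (translation), L (libration) and S (screw) tensors:
        U(r) = T + A L A^T + A S + S^T A^T,
    where r = xyz - origin and A is the antisymmetric matrix with A w = w x r."""
    x, y, z = np.asarray(xyz, dtype=float) - np.asarray(origin, dtype=float)
    a = np.array([[0.0,   z,  -y],
                  [-z,  0.0,   x],
                  [ y,   -x, 0.0]])
    t, l, s = (np.asarray(m, dtype=float) for m in (t, l, s))
    return t + a @ l @ a.T + a @ s + s.T @ a.T
```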
CONCLUSIONS
Over the last decade, there have been many significant advances toward automated structure determination. Programs such as PHENIX (Adams et al., 2002) and AutoSHARP (Vonrhein et al., 2006) combine large functional blocks in an automated fashion. The program CNS (Brunger et al., 1998) provides a framework in which different algorithms can be combined and tested, using a powerful scripting language. The CCP4 suite (Collaborative Computational Project 4, 1994) provides a large number of separate programs that can be easily run from a graphical user interface.
Some progress toward full automation has been made by linking together existing programs, which is typically achieved using scripting languages and/or the World Wide Web. However, long-term robust solutions, such as the PHENIX system (Adams et al., 2002), fully integrate the latest crystallographic algorithms within a modern computer software environment. Eventually, complete automation will require structure solution to be intimately associated with data collection and processing. When automated software permits the heavy atom location and phasing steps of structure solution to be performed in a few minutes, it will enable real-time assessment of diffraction data as it is collected at synchrotron beamlines. Map interpretation will also need to become significantly faster than it is at present, with initial analysis of the electron density taking minutes rather than the hours or days currently required.