Thursday, June 4, 2009

Ingredients for a design space based on probability of success

Previous posts have referred to work by DynoChem and others to provide tools to quantify uncertainty in model predictions and translate that into the (joint) probability of successfully meeting several specifications, such as CQAs, at a particular set of processing conditions (factors, or process parameters). The question of how best to calculate this probability, for any process model and set of experimental data is not straightforward to answer.

Many readers will be at least casually aware of alternative schools of thought in the statistics community, namely 'frequentist' - the statistics that most of us learned in school and university and use to a degree every day and 'Bayesian'. The former calculates probability from the frequency of observing a certain outcome; the latter refines an initial subjective estimate of probability (the 'prior') using new information from observations. Good discussions of these alternative approaches are available all over the web and elsewhere; e.g. http://www.rasmusen.org/x/2007/09/25/bayesian-vs-frequentist-statistical-theory/; and for a longer read http://nb.vse.cz/kfil/elogos/science/vallverdu08.pdf.

Whatever about the specifics and relative merits of these approaches, both provide useful insight for design space development by taking explicit account of uncertainty and risk in a multivariate system and published examples of both, as well as their inclusion in regulatory filings, will become increasingly common. Members of DynoChem Resources can access knowledge base articles and other useful materials in this context.

In this posting I am concerned with what goes before the probability calculations; specifically the modelling effort and data to support it. Unless the underlying data and modeling are sound, probability calculations, however advanced the calculation procedure, will have little or no meaning.

With the emphasis on chemical reactions in API synthesis (e.g. final step) and after the solvent, catalyst and reagents have been selected, important ingredients in the mixture, whatever statistical approach is ultimately used are:

1. upfront thinking on a mechanistic basis to determine factors and settings for initial screening experiments; supported by prior data if relevant data exist (see previous posts on process schemes);
2. screening experiments in which the process is followed by taking multiple samples; some of these experiments should screen for physical rate limitations and aim to determine whether physical or chemical phenomena are 'rate-limiting';
3. characterization experiments, in which factors affecting the limiting phenomena are studied across a range of settings; the extremities and some centre-points (with replication) may be adequate for a mechanistic model; a larger set of experiments may be required using a statistically designed (DOE) program of experiments; responses Y are measured as a function of factors X;
4. a modeling effort alongside 3 in which the relationship between Y and X is captured in either a mechanistic or DOE model, or both; the lack of fit and other statistics relating to model uncertainty are quantified; further experiments to reduce uncertainty may be merited and/or improvements in the experimental or analytical technique; data from a portion of experiments should be used for model development and the remaining experiments for model verification; ultimately a single model should fit all of the reliable data; the mechanistic model in particular may be used to extrapolate to determine 'optimum' conditions outside the ranges studied to date; note that experimental data can be one of the least reliable inputs to a model, for a host of practical reasons; unreliability of experimental data (e.g. lack of mole or mass balance) may only be noticed if the model has a mechanistic basis;
5. criticality studies, to determine the proximity to edge of failure for limiting factors; these can leverage a mechanistic model if one exists; otherwise will require further experiments to extrapolate or mimic likely failure modes;
6. factor space exploration; this may be a very broad, full factorial, exploration with a mechanistic model, or a narrower exploration using a further set of DOE experiments; in either case, model uncertainty and/or experimental error are taken into account; with the mechanistic model only, we can add formulas for derived responses that were not or cannot easily be measured (e.g. pass time, fail time); an important feature of a mechanistic model is that one set of model parameters fits all responses, not one set per response.
7. design space definition; for a limited set of factors, this defines the relationship among their ranges that produces product of acceptable quality; until recently, overlapping response surfaces for each CQA was considered adequate; a more reliable approach is to calculate the probability of success across the factor space, leading to a direct estimate of the associated risk of failure and a narrower design space; here the relative merits of Bayesian and frequentist statistics may become relevant;
8. confirmatory experiments that operating within the design space provides the required level of assurance of quality;
9. with a mechanistic model only: demonstrate to colleagues, management, regulators, manufacturing and quality control that a high level of process understanding has been achieved, otherwise the mechanistic model would not fit the data; justify the scale-independence of the design space; demonstrate the impact of scale-up on the CQA by predicting performance in large scale equipment.

The models developed above may be leveraged pre- and post-NDA in many other ways, including to guide process development, achieve yield or other business objectives, facilitate technology transfer and be used at-line. Mechanistic models in particular also offer new ways to define design space to maximize flexibility and be tolerant to minor process upsets.

Keen Bayesian statisticians reading the above will notice that a high degree of prior knowledge is used to develop these guidelines and to carry out the associated experimental and mechanistic modeling work; in that sense there is something very Bayesian about how mechanistic models are developed.

In the mechanistic approach, modeling takes place alongside experiments and new information leads to refinements in the model. The probability that the model is valid is thereby continually refined upwards as new data are included, following Bayes' theorem.

New data also add degrees of freedom to the model, leading to ultimately sharper definition of probability distributions for model responses, important for design space definition.