Tuesday, June 30, 2009

Model verification - a proposal

I have been meaning to post on model verification for a while, and cumulative interactions with customers have led to this proposal. See the Twitter postings alongside for relevant material from the FDA and recent additions to DynoChem Resources. To automatically keep abreast of these, I encourage you to follow DynoChem on Twitter.

Proposed approach to model verification:

  1. Model development and parameter fitting should be based on a 'training set' of experimental data, that is, a subset of all available data.
  2. Verification should in general be completed against a separate set of experimental data, probably testing the limits of the model (e.g. points at the corners of the anticipated region of validity).
  3. Those data do not need to be from 'designed' or perfect experiments; in fact, inclusion of spiked or otherwise perturbed experiments can be highly valuable and informative.
  4. Verification should be described, presented and qualified as being 'to within E%' or 'within E response units'. E may vary across the factor space.
  5. E is not arbitrary, but equal to the prediction band width for that response, with confidence level 1-alpha, where alpha may be 5% (95% confidence) or perhaps 1% (99% confidence).
  6. The limits of applicability of the statement in 4 above (i.e. the region of factor space that is covered) should be defined.

Usage of prediction band widths (or 'prediction intervals') in this way allows a statistically sound statement to be made about the level of verification of any model in which parameters have been fitted. During model development, E reduces as the model 'improves', i.e. the fit improves and uncertainty is reduced. When the model is mechanistic, there is often little risk of 'overfitting' (there are typically many residual degrees of freedom relative to the small number of fitted parameters) and the quality of mechanistic understanding, together with the collection of good data, are the main factors that improve the fit. Bear in mind also that in a mechanistic model, a single set of parameters fits all responses (not separate models for each response) and the fit is judged against multiple samples, not just end-points.
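As a rough illustration of how a value of E might be obtained in practice, the sketch below fits a simple first-order kinetic model to hypothetical data in Python (scipy) and derives a prediction band half-width at a new condition by the delta method. The model, the data, the 95% level and all variable names are assumptions for illustration only; this is not DynoChem's internal implementation.

```python
# Minimal sketch (hypothetical model and data): estimate a prediction band
# half-width E for a fitted first-order kinetic model via the delta method.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t as t_dist

def conc(time, c0, k):
    """First-order decay: C(t) = C0 * exp(-k * t)."""
    return c0 * np.exp(-k * time)

# Hypothetical training-set data (time in min, concentration in mol/L)
time_obs = np.array([0.0, 10.0, 20.0, 40.0, 60.0, 90.0])
c_obs = np.array([1.00, 0.74, 0.55, 0.31, 0.17, 0.07])

popt, pcov = curve_fit(conc, time_obs, c_obs, p0=[1.0, 0.03])
resid = c_obs - conc(time_obs, *popt)
dof = len(c_obs) - len(popt)          # residual degrees of freedom
sigma2 = np.sum(resid**2) / dof       # residual variance

def prediction_half_width(t_new, alpha=0.05):
    """Half-width of the (1 - alpha) prediction interval at t_new (delta method)."""
    eps = 1e-6
    J = np.empty(len(popt))           # numerical Jacobian w.r.t. fitted parameters
    for i in range(len(popt)):
        p_hi, p_lo = popt.copy(), popt.copy()
        p_hi[i] += eps
        p_lo[i] -= eps
        J[i] = (conc(t_new, *p_hi) - conc(t_new, *p_lo)) / (2 * eps)
    var_pred = J @ pcov @ J + sigma2  # parameter uncertainty + observation noise
    return t_dist.ppf(1 - alpha / 2, dof) * np.sqrt(var_pred)

t_new = 30.0
y_hat = conc(t_new, *popt)
E_abs = prediction_half_width(t_new)
print(f"Prediction at t = {t_new} min: {y_hat:.3f} +/- {E_abs:.3f} mol/L "
      f"(E = {100 * E_abs / y_hat:.1f}%)")
```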

E needs to be small if users are going to operate near the CQA limit (or another important limit), but can be relatively larger if not. So the verification level required for a model to be useful has an element of fitness for purpose.

Prediction bands take account of 'lack of fit' and are correspondingly wider for responses that fit poorly than for those that fit well. For a CQA with an upper limit (typical for an impurity, for example), one could therefore say mathematically that a model is verified and fit for QbD purposes if:

average response * (1 + E%) is comfortably less than the CQA limit

The above expression is also equivalent to evaluating the probability that the CQA will be less than its limit; that probability increases if E is low and/or the average response is well below the CQA upper limit. That probability is therefore itself an indicator of the degree of model verification achieved.
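For illustration, the short sketch below converts a prediction band half-width E at a given operating condition into an approximate probability of meeting a CQA upper limit, assuming the prediction error is roughly t-distributed. All of the numbers (predicted level, E, limit, degrees of freedom) are hypothetical.

```python
# Minimal sketch (illustrative numbers only): turn a prediction band half-width
# E into an approximate probability of meeting a CQA upper limit.
from scipy.stats import t as t_dist

y_hat = 0.12        # predicted impurity level (% w/w), hypothetical
E_abs = 0.04        # 95% prediction band half-width at this condition, hypothetical
limit = 0.20        # CQA upper limit (% w/w), hypothetical
dof   = 8           # residual degrees of freedom from the fit
alpha = 0.05

# Back out the prediction standard error from the band half-width
se_pred = E_abs / t_dist.ppf(1 - alpha / 2, dof)

# Probability that the realised response falls below the CQA limit
p_pass = t_dist.cdf((limit - y_hat) / se_pred, dof)
print(f"P(CQA below limit) ~ {p_pass:.3f}")
```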

Of course, in a good mechanistic model, E will be small for all responses, not just a CQA. Focusing on reducing E will improve process understanding of the whole system, and both prediction and confidence band response surfaces may be drawn to guide experimentation towards this goal; see previous posts.

Follow DynoChem on Twitter

By all accounts, Twitter seems to be an excellent way to keep people informed of developments and we are starting to use it to communicate with DynoChem users. If you follow this blog, or are a member of DynoChem Resources or our Google Group, I encourage you to follow DynoChem on Twitter.

I have been using Twitter for several months and find the short postings (about 1 sentence long) a very efficient way to catch up on news quickly. It's easy to opt in or out and better enables a two-way relationship than e.g. RSS feeds. There are lots of Twitter clients for smartphones; I use TweetDeck.

Thursday, June 11, 2009

Presentations from DynoChem User Meeting 2009 now available to download

The DynoChem User Meeting 2009 was held in Philadelphia on 13-14 May. Presentations from companies such as Abbott, Amgen, AstraZeneca, Chemagis, Merck, GSK, Pfizer and Wyeth may now be downloaded from DynoChem Resources (login required).

Thursday, June 4, 2009

Ingredients for a design space based on probability of success

Previous posts have referred to work by DynoChem and others to provide tools to quantify uncertainty in model predictions and translate that into the (joint) probability of successfully meeting several specifications, such as CQAs, at a particular set of processing conditions (factors, or process parameters). The question of how best to calculate this probability, for any process model and set of experimental data, is not straightforward to answer.

Many readers will be at least casually aware of the alternative schools of thought in the statistics community, namely 'frequentist' (the statistics most of us learned at school and university and use to some degree every day) and 'Bayesian'. The former calculates probability from the frequency of observing a certain outcome; the latter refines an initial subjective estimate of probability (the 'prior') using new information from observations. Good discussions of these alternative approaches are available on the web and elsewhere, e.g. http://www.rasmusen.org/x/2007/09/25/bayesian-vs-frequentist-statistical-theory/ and, for a longer read, http://nb.vse.cz/kfil/elogos/science/vallverdu08.pdf.

Whatever the specifics and relative merits of these approaches, both provide useful insight for design space development by taking explicit account of uncertainty and risk in a multivariate system, and published examples of both, as well as their inclusion in regulatory filings, will become increasingly common. Members of DynoChem Resources can access knowledge base articles and other useful materials in this context.

In this posting I am concerned with what comes before the probability calculations: specifically, the modeling effort and the data to support it. Unless the underlying data and modeling are sound, probability calculations, however advanced the calculation procedure, will have little or no meaning.

With the emphasis on chemical reactions in API synthesis (e.g. the final step), and after the solvent, catalyst and reagents have been selected, the important ingredients in the mixture, whatever statistical approach is ultimately used, are:

1. upfront thinking on a mechanistic basis to determine factors and settings for initial screening experiments; supported by prior data if relevant data exist (see previous posts on process schemes);
2. screening experiments in which the process is followed by taking multiple samples; some of these experiments should screen for physical rate limitations and aim to determine whether physical or chemical phenomena are 'rate-limiting';
3. characterization experiments, in which factors affecting the limiting phenomena are studied across a range of settings; the extremities and some centre-points (with replication) may be adequate for a mechanistic model, while a larger, statistically designed (DOE) program of experiments may otherwise be required; responses Y are measured as a function of factors X;
4. a modeling effort alongside step 3, in which the relationship between Y and X is captured in either a mechanistic or DOE model, or both; the lack of fit and other statistics relating to model uncertainty are quantified; further experiments to reduce uncertainty, and/or improvements in the experimental or analytical technique, may be merited; data from a portion of the experiments should be used for model development and the remaining experiments for model verification; ultimately a single model should fit all of the reliable data; the mechanistic model in particular may be used to extrapolate and determine 'optimum' conditions outside the ranges studied to date; note that experimental data can be one of the least reliable inputs to a model, for a host of practical reasons, and unreliability of experimental data (e.g. lack of mole or mass balance) may only be noticed if the model has a mechanistic basis;
5. criticality studies, to determine the proximity to edge of failure for limiting factors; these can leverage a mechanistic model if one exists; otherwise will require further experiments to extrapolate or mimic likely failure modes;
6. factor space exploration; this may be a very broad, full factorial, exploration with a mechanistic model, or a narrower exploration using a further set of DOE experiments; in either case, model uncertainty and/or experimental error are taken into account; with the mechanistic model only, we can add formulas for derived responses that were not or cannot easily be measured (e.g. pass time, fail time); an important feature of a mechanistic model is that one set of model parameters fits all responses, not one set per response.
7. design space definition; for a limited set of factors, this defines the relationship among their ranges that produces product of acceptable quality; until recently, overlapping response surfaces for each CQA were considered adequate; a more reliable approach is to calculate the probability of success across the factor space, leading to a direct estimate of the associated risk of failure and a narrower design space (a minimal numerical sketch follows this list); here the relative merits of Bayesian and frequentist statistics may become relevant;
8. confirmatory experiments that operating within the design space provides the required level of assurance of quality;
9. with a mechanistic model only: demonstrate to colleagues, management, regulators, manufacturing and quality control that a high level of process understanding has been achieved, otherwise the mechanistic model would not fit the data; justify the scale-independence of the design space; demonstrate the impact of scale-up on the CQA by predicting performance in large scale equipment.
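As a minimal illustration of the probability-of-success calculation in step 7, the sketch below samples uncertain parameters of a toy two-response model by Monte Carlo and maps the joint probability of meeting an impurity limit and a yield target across a two-factor grid. The response expressions, parameter distributions and limits are all hypothetical; real work would use the fitted mechanistic model and its estimated parameter uncertainty.

```python
# Minimal sketch (hypothetical model and numbers): probability of success across
# a two-factor space by Monte Carlo sampling of uncertain model parameters.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted parameters (mean, std) for an impurity and a yield response
k_imp = rng.normal(0.008, 0.001, size=2000)    # impurity formation factor
k_yld = rng.normal(0.92, 0.02, size=2000)      # yield plateau factor

def impurity(temp_C, time_h, k):
    return k * temp_C * time_h / 10.0          # toy impurity response surface

def yield_pct(temp_C, time_h, k):
    return 100.0 * k * (1.0 - np.exp(-0.5 * time_h))   # toy yield response surface

imp_limit, yld_limit = 0.5, 80.0               # CQA upper limit and yield lower limit

temps = np.linspace(20, 60, 9)                 # factor 1: temperature (deg C)
times = np.linspace(1, 8, 8)                   # factor 2: reaction time (h)
p_success = np.zeros((len(temps), len(times)))

for i, T in enumerate(temps):
    for j, th in enumerate(times):
        ok = (impurity(T, th, k_imp) <= imp_limit) & (yield_pct(T, th, k_yld) >= yld_limit)
        p_success[i, j] = ok.mean()            # joint probability of meeting both specs

# Conditions with, say, p_success >= 0.95 would form a probability-based design space
print(np.round(p_success, 2))
```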

The models developed above may be leveraged pre- and post-NDA in many other ways, including to guide process development, achieve yield or other business objectives, facilitate technology transfer and be used at-line. Mechanistic models in particular also offer new ways to define design space to maximize flexibility and be tolerant to minor process upsets.

Keen Bayesian statisticians reading the above will notice that a high degree of prior knowledge is used to develop these guidelines and to carry out the associated experimental and mechanistic modeling work; in that sense there is something very Bayesian about how mechanistic models are developed.

In the mechanistic approach, modeling takes place alongside experiments and new information leads to refinements in the model. The probability that the model is valid is thereby continually refined upwards as new data are included, following Bayes' theorem.
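For illustration only, the sketch below applies Bayes' theorem sequentially, starting from an assumed prior and assumed likelihoods, to update the probability that the model is adequate as each new verification point falls inside or outside its prediction band. All of the numbers are hypothetical.

```python
# Minimal sketch (illustrative numbers only): sequential Bayesian update of the
# probability that the model is adequate, one verification experiment at a time.
p_valid = 0.5                 # neutral prior before any verification data
p_hit_if_valid = 0.95         # chance a new point lands in the 95% band if the model is adequate
p_hit_if_invalid = 0.30       # assumed chance of the same if the model is inadequate

# Outcomes of five hypothetical verification experiments (True = inside the band)
outcomes = [True, True, True, False, True]

for inside in outcomes:
    like_valid = p_hit_if_valid if inside else 1 - p_hit_if_valid
    like_invalid = p_hit_if_invalid if inside else 1 - p_hit_if_invalid
    # Bayes' theorem: posterior proportional to likelihood times prior
    p_valid = like_valid * p_valid / (like_valid * p_valid + like_invalid * (1 - p_valid))
    print(f"inside band: {inside}  ->  P(model adequate) = {p_valid:.3f}")
```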

New data also add degrees of freedom to the fit, ultimately leading to sharper probability distributions for model responses, which is important for design space definition.
