Mutagenicity assessment of drug impurities according to the ICH M7 (R1) guideline and the role of expert review
The evaluation of the genotoxic potential of drug impurities has been the focus of regulatory recommendations issued by the United States Food and Drug Administration (FDA, 2008), the European Medicines Agency (EMA, 2006 and 2010), and the International Conference on Harmonisation (ICH, 2014, 2015). In 2017, the ICH issued the revised M7 (R1) guideline, which provides a framework for the assessment and control of mutagenic impurities in pharmaceuticals.
In the absence of reliable experimental mutagenicity and/or carcinogenicity data, the guideline recommends a computational toxicology assessment to predict the outcome of the bacterial mutagenicity assay (Ames test). This assessment should use two (Q)SAR prediction methodologies that complement each other: one should be expert rule-based and the other statistical-based. The guideline does not recommend any specific model; it states only that the (Q)SAR models “should follow the general validation principles set forth by the Organisation for Economic Co-operation and Development (OECD)”. The models most commonly used in the pharmaceutical sector are commercially available, e.g., Leadscope Genetox Expert Alerts and Derek Nexus (Lhasa Limited) for the expert rule-based models, and CASE Ultra (MultiCASE, Inc.) and Sarah Nexus (Lhasa Limited) for the statistical models.
The outcome of the two types of prediction should allow the impurity to be assigned to one of the five ICH M7 classes defined in the guideline: Class 1 (known mutagenic carcinogens); Class 2 (known mutagens with unknown carcinogenic potential); Class 3 (compounds with an alerting structure unrelated to the structure of the drug substance); Class 4 (compounds with an alerting structure, where the same alert is present in the drug substance or in compounds related to the drug substance that have been tested and are non-mutagenic); Class 5 (compounds with no structural alerts, or with an alerting structure and sufficient data to demonstrate lack of mutagenicity or carcinogenicity). Impurities in Class 4 or 5 do not require further action, Class 1 impurities have to be controlled “at or below compound-specific acceptable limit”, and Class 2 or 3 impurities have to be controlled at or below the Threshold of Toxicological Concern (TTC) limits.
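The relationship between class and control requirement described above can be written down as a small lookup. The following Python sketch is only an illustration of that mapping; the names `M7Class` and `control_action` are hypothetical and do not come from the guideline, and the returned strings are shorthand for the requirements summarised in the text.

```python
from enum import IntEnum


class M7Class(IntEnum):
    """The five ICH M7 impurity classes as summarised in the text (names are illustrative)."""
    KNOWN_MUTAGENIC_CARCINOGEN = 1
    KNOWN_MUTAGEN_UNKNOWN_CARCINOGENICITY = 2
    ALERT_UNRELATED_TO_DRUG_SUBSTANCE = 3
    ALERT_SHARED_WITH_TESTED_NON_MUTAGEN = 4
    NO_ALERT_OR_SHOWN_NON_MUTAGENIC = 5


def control_action(m7_class: M7Class) -> str:
    """Return the control requirement associated with an ICH M7 class."""
    if m7_class is M7Class.KNOWN_MUTAGENIC_CARCINOGEN:
        return "control at or below compound-specific acceptable limit"
    if m7_class in (
        M7Class.KNOWN_MUTAGEN_UNKNOWN_CARCINOGENICITY,
        M7Class.ALERT_UNRELATED_TO_DRUG_SUBSTANCE,
    ):
        return "control at or below TTC limit"
    # Classes 4 and 5 require no further action.
    return "no further action"


if __name__ == "__main__":
    for cls in M7Class:
        print(f"Class {cls.value}: {control_action(cls)}")
```

The sketch covers only the last step; assigning an impurity to a class still rests on the experimental data, the (Q)SAR predictions and the expert review.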
However, (Q)SAR models that adhere to the OECD principles do not always provide clear positive or negative predictions. Amberg et al. (2016) describe three main reasons for this. First, the model may consider the impurity out-of-domain because its structural features are not adequately covered by the model (third OECD validation principle). Second, the prediction may be categorized as equivocal or indeterminate because the evidence is weak or conflicting, so that a definitive prediction cannot be made with adequate confidence. Finally, a model may be technically unable to process certain types of chemicals, e.g., metal complexes. For these situations, the ICH guideline proposes using expert knowledge “to provide additional supportive evidence for any positive, negative or inconclusive prediction and provide a rationale to support the final conclusion”.
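A minimal Python sketch of how a pair of (Q)SAR outcomes might be triaged is given below. It is a simplification built on assumptions: the `Outcome` categories mirror the situations listed above, while the function name `triage` and the returned labels are hypothetical and are not taken from the guideline or from any (Q)SAR software. The sketch only flags where a clear call is missing; it does not replace the expert review, which the guideline allows for any prediction.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results from a single (Q)SAR model (illustrative categories)."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    EQUIVOCAL = "equivocal"          # weak or conflicting evidence
    OUT_OF_DOMAIN = "out-of-domain"  # structure not adequately covered by the model
    NOT_PROCESSED = "not processed"  # e.g., metal complexes


CLEAR_CALLS = {Outcome.POSITIVE, Outcome.NEGATIVE}


def triage(expert_rule_based: Outcome, statistical: Outcome) -> str:
    """Return a provisional label for the two complementary predictions."""
    results = (expert_rule_based, statistical)
    if any(r not in CLEAR_CALLS for r in results):
        # At least one model gave no clear positive or negative call.
        return "no clear call: expert review needed"
    if expert_rule_based is statistical:
        # Both models agree on a clear call.
        return f"both {expert_rule_based.value}"
    # One clear positive and one clear negative.
    return "conflicting: expert review needed"


if __name__ == "__main__":
    print(triage(Outcome.NEGATIVE, Outcome.NEGATIVE))
    print(triage(Outcome.POSITIVE, Outcome.NEGATIVE))
    print(triage(Outcome.NEGATIVE, Outcome.OUT_OF_DOMAIN))
```

The labels are deliberately not mapped onto ICH M7 classes, because that step depends on the supporting evidence and rationale supplied by the expert.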
The expert review of ambiguous outcomes should examine the basis for each prediction, which may even provide grounds for discarding the prediction itself. Amberg et al. (2016) have proposed a series of principles and procedures for the expert review. In their view it is critical to 1) determine whether there are shared alerts with known negative compounds (e.g., from structurally similar compounds; ICH M7 Class 4), 2) assess the relevance of the features or underlying data from the statistical models, 3) identify potential analogues from public or in-house sources, and 4) assess the strength of the prediction when only a single methodology has generated one.
A more recent article by Amberg et al. (2019) further supports the value of the expert review and proposes using additional (Q)SAR models to strengthen the results when the initial prediction is out-of-domain. In addition, the authors present a thorough analysis of proprietary data to help understand the likelihood of misclassifying a mutagenic impurity as non-mutagenic for different combinations of (Q)SAR results. The results clearly indicate that when the out-of-domain predictions are either negative or indeterminate, the likelihood of misclassifying a mutagenic impurity as non-mutagenic is similar to that observed with two clear negative results. Overall, the expert rule-based models seem to be the key driver of the expert review when the results conflict.
It is evident that a well-established procedure for assessing the genotoxic potential of drug impurities has been put in place in the pharmaceutical industry. However, there is still room for improvement, particularly for cases with conflicting results. For example, would read-across, as proposed by the OECD and ECHA, be accepted by regulatory agencies as an additional approach to assess impurities? And would the regulatory agencies promote the use of publicly available (Q)SAR tools to meet the needs of small and medium-sized enterprises in the pharmaceutical sector? It should be noted that several publicly available (Q)SAR models (e.g., Toxtree, US EPA T.E.S.T., and CAESAR) have been accepted for genotoxicity assessments by ECHA under the REACH regulation and by EFSA (Benigni et al., 2019).