Pathogenic EGFR variant modeling in MCF10A cells
Assessing the pathogenicity of EGFR variants in high throughput requires experimental models enabling the linkage of genotype to phenotype. We first set out to assess the impact of primary EGFR mutations in MCF10A cells, a non-tumorigenic mammary epithelial cell line derived from benign fibrocystic breast tissue. These cells represent an attractive model because they harbor wild-type EGFR and rely on its signaling for growth. Indeed, MCF10A cells overexpressing hyperactive EGFR variants found in patients have been shown to proliferate in the absence of EGF supplementation, while cells expressing wild-type EGFR remain quiescent41,42. We, thus, envisioned that introducing EGFR-activating mutations into the genome of MCF10A cells could render edited cells EGF independent, allowing them to outcompete unedited cells when deprived of EGF and enabling us to connect genotype to phenotype in a pooled screen.
First, we evaluated base editing efficiency in MCF10A cells by lentiviral delivery of the CBE BE3.9max or the ABE ABE8e along with an sgRNA predicted to introduce a mutation at the ‘gatekeeper’ residue Thr790. Deep sequencing of the target exon revealed efficient editing at the target site for both base editors, with 97% (BE3.9max) and 95% (ABE8e) of reads containing at least one base change 6 days after infection (Extended Data Fig. 1a). Although the total fraction of edited alleles remained stable over time, the relative proportions of editing products changed between day 6 and day 13: alleles containing two or three edits became enriched at the expense of single edits. This is likely explained by sequential editing of less-preferred bystander bases over time, once the preferred base conversion has already been installed on most alleles. This potentially complicates the accurate prediction of variant effects but can also broaden the mutational spectrum that can be assessed in a single experiment.
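The shift from single to multiple edits can be quantified by counting, for each sequencing read, how many positions inside the editing window differ from the reference. The sketch below assumes reads have already been aligned without gaps to the amplicon reference and that the window coordinates are known; the sequences, window and function names are hypothetical. In practice, dedicated tools such as CRISPResso2 are commonly used for this step.

```python
from collections import Counter


def count_edits(read: str, reference: str, window: range) -> int:
    """Number of positions inside `window` where a gap-free aligned read differs from the reference."""
    if len(read) != len(reference):
        raise ValueError("read and reference must be aligned to the same length")
    return sum(1 for i in window if read[i] != reference[i])


def edit_class_fractions(reads, reference, window):
    """Fraction of reads carrying 0, 1, 2, ... edits inside the window."""
    counts = Counter(count_edits(r, reference, window) for r in reads)
    total = sum(counts.values())
    return {n_edits: counts[n_edits] / total for n_edits in sorted(counts)}


# Toy example (not real data): C-to-T edits accumulating at successive window positions.
reference = "ACCCGTACG"
reads = ["ACCCGTACG", "ATCCGTACG", "ATTCGTACG", "ATTTGTACG"]
window = range(1, 5)

fractions = edit_class_fractions(reads, reference, window)
edited_fraction = 1 - fractions.get(0, 0.0)  # reads with at least one base change
print(fractions, edited_fraction)
```

Comparing such fractions between two time points would reveal the enrichment of multi-edit alleles described above.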
Base editing scanning identifies loss-of-function (LOF) variants
To evaluate the potential of CBEs and ABEs in a MAVE context, we used both to interrogate a spectrum of EGFR mutations. We designed a base editing variant-scanning library composed of 1,496 unique sgRNAs targeting all EGFR exons (Fig. 1a). We also included 200 sgRNAs targeting an EGFR intronic region, 206 nontargeting sgRNAs as negative controls and 103 sgRNAs targeting splice sites of essential genes as positive controls. To explore a broad range of genetic variants accessible by base editing, we cloned this library under the control of a U6 promoter into two different lentiviral backbones expressing either ABE8e or BE3.9max. Deep sequencing of the two resulting all-in-one base editing sgRNA libraries showed no dropouts and a narrow sgRNA distribution, with skew ratios of 1.6 (Extended Data Fig. 1b).
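The skew ratio summarizes how uniformly sgRNAs are represented in a library. A frequently used definition is the ratio of the 90th to the 10th percentile of per-sgRNA read counts; the sketch below assumes that definition, since the exact one used here is not stated in this section. The toy counts are illustrative, not screen data.

```python
import numpy as np


def skew_ratio(counts, upper=90, lower=10):
    """Ratio of the upper to the lower percentile of per-sgRNA read counts."""
    counts = np.asarray(counts, dtype=float)
    low = np.percentile(counts, lower)
    if low == 0:
        raise ValueError("lower percentile is zero; the library likely has dropouts")
    return float(np.percentile(counts, upper) / low)


# Library composition described in the text.
n_sgrnas = {
    "EGFR exons": 1496,
    "EGFR intron": 200,
    "nontargeting": 206,
    "essential splice sites": 103,
}
library_size = sum(n_sgrnas.values())  # 2,005 sgRNAs in total

# Toy counts: half of the sgRNAs at 10 reads, half at 20 reads.
toy_counts = [10] * 5 + [20] * 5
print(library_size, skew_ratio(toy_counts))  # 2005 2.0
```

A perfectly uniform library has a skew ratio of 1, so values close to 1 indicate a narrow distribution.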
We then delivered both libraries to MCF10A cells by lentiviral infection (Fig. 1b). Each infection was performed in duplicate at a low multiplicity of infection (MOI) with an average coverage of 500 cells per sgRNA. After puromycin selection of transduced cells and EGF deprivation, genomic DNA was extracted and sgRNA libraries were prepared and sequenced while ensuring appropriate coverage and depth. The sgRNA counts were then quantified from sequencing data using a custom script and normalized to nontargeting sgRNAs with MAGeCK43. We then confirmed library coverage and replicate correlation at every time point (Extended Data Fig. 1b), overall indicating a robust dataset warranting further analysis.
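Normalizing to nontargeting sgRNAs relies on the assumption that these controls do not change in abundance on average, so differences in their counts between samples reflect sequencing depth. The sketch below illustrates that idea in simplified form (rescaling each sample so its median control count matches, then computing a pseudocounted log2 fold change); it is not MAGeCK's implementation, and the function names and toy counts are hypothetical.

```python
import numpy as np


def normalize_to_controls(counts, is_control):
    """Rescale each sample (column) so that the median count of control sgRNAs
    is identical across samples. `counts` has shape (n_sgrnas, n_samples)."""
    counts = np.asarray(counts, dtype=float)
    mask = np.asarray(is_control, dtype=bool)
    control_medians = np.median(counts[mask], axis=0)
    target = control_medians.mean()
    return counts * (target / control_medians)


def log2_fold_change(norm_counts, ref_col, test_col, pseudocount=1.0):
    """Per-sgRNA log2 fold change of `test_col` relative to `ref_col`."""
    ref = norm_counts[:, ref_col] + pseudocount
    test = norm_counts[:, test_col] + pseudocount
    return np.log2(test / ref)


# Toy data: two control sgRNAs and one targeting sgRNA; the second sample was sequenced twice as deeply.
counts = [[100, 200], [100, 200], [100, 50]]
is_control = [True, True, False]
norm = normalize_to_controls(counts, is_control)
lfc = log2_fold_change(norm, ref_col=0, test_col=1, pseudocount=0.0)
print(lfc)  # controls at 0, targeting sgRNA at -2
```

Without the control-based rescaling, the depth difference alone would shift every sgRNA by one log2 unit.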
We started our analysis by comparing the relative abundances of sgRNAs between the plasmid library and the late (day 19) time point, which should reveal sgRNAs that impact MCF10A viability. As expected, with both base editors we observed no change in the nontargeting negative control sgRNAs but a strong depletion of the positive control sgRNAs targeting splice sites of essential genes (Extended Data Fig. 1c). The ABE8e and BE3.9max screens distinguished essential splice site controls from other sgRNAs with areas under the curve (AUCs) of 0.9 and 0.94, respectively, confirming the ability of our pooled library to efficiently introduce edits and elicit measurable phenotypes (Extended Data Fig. 2a).
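The AUC measures how well sgRNA depletion separates the essential splice-site controls from all other sgRNAs. The sketch below computes it through its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen positive control is more depleted than a randomly chosen other sgRNA, counting ties as one half. Scoring sgRNAs by their negative LFC is an assumption for illustration, and the toy values are not screen data.

```python
import numpy as np


def depletion_auc(lfc, is_positive):
    """ROC AUC for separating positive controls from other sgRNAs,
    scoring each sgRNA by its depletion (-LFC)."""
    lfc = np.asarray(lfc, dtype=float)
    mask = np.asarray(is_positive, dtype=bool)
    pos = -lfc[mask]
    neg = -lfc[~mask]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)


lfc = [-3.0, -2.0, 0.1, -2.5, 0.0]
is_positive = [True, True, False, False, False]
print(depletion_auc(lfc, is_positive))  # 5 of 6 pairs ranked correctly
```

The pairwise matrix has one entry per positive/negative pair, which is small for about 100 positive controls against about 2,000 other sgRNAs.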
Encouragingly, multiple sgRNAs predicted to introduce splice site, nonsense and missense mutations in EGFR were depleted on day 19 with both base editors (Extended Data Fig. 1c). LOF variants are indeed expected to reduce the viability of MCF10A cells, which rely on EGFR signaling for growth. Together, these data confirm efficient base editing of endogenous loci in MCF10A cells and our capacity to robustly detect LOF variants in EGFR.
Identification of oncogenic EGFR mutations
Toward identifying pathogenic EGFR variants that lead to constitutive EGFR signaling in MCF10A cells, we compared sgRNA library distributions in nontreated and EGF-deprived cells. This analysis revealed a spectrum of enriched and depleted EGFR variants, largely localized to specific functional protein domains (Fig. 1c and Extended Data Fig. 2b). Among enriched hits, we observed some of the sgRNAs previously identified as impacting cell viability, suggesting that, in nontreated samples, cells with decreased viability are more rapidly outgrown by healthy cells. To focus our analysis on activating mutations in EGFR and avoid biasing our results toward confounding LOF variants impacting MCF10A viability, we restricted the analysis to variants preserving cell viability in the presence of EGF (log fold change (LFC) > −0.6 when comparing plasmid and late time points) (Fig. 1d and Extended Data Fig. 2b). This approach identified 19 hits that, while spanning the protein, mostly localized to the tyrosine kinase domain, which is responsible for EGFR autophosphorylation and contains the residues most commonly impacted by mutations in NSCLC. Of the 19 hits, 10 were found in ClinVar and labeled as either pathogenic (2) or VUS (8). A further 2 were found in the Catalogue of Somatic Mutations in Cancer (COSMIC) and 7 were not listed in either database (Supplementary Table 1).
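The filtering step can be expressed as two thresholds: the LFC > −0.6 viability cutoff stated in the text, and an enrichment cutoff for EGF deprivation. The second threshold, the `Guide` record and the example values are hypothetical placeholders, since the hit-calling criterion is not specified in this section.

```python
from dataclasses import dataclass


@dataclass
class Guide:
    name: str
    lfc_viability: float  # day 19 vs plasmid, with EGF
    lfc_egf: float        # EGF-deprived vs nontreated


def egf_independence_hits(guides, viability_cutoff=-0.6, enrichment_cutoff=1.0):
    """Keep sgRNAs that preserve viability with EGF, then rank those enriched without EGF."""
    viable = [g for g in guides if g.lfc_viability > viability_cutoff]
    hits = [g for g in viable if g.lfc_egf > enrichment_cutoff]
    return sorted(hits, key=lambda g: g.lfc_egf, reverse=True)


guides = [
    Guide("sg_A", 0.1, 2.0),
    Guide("sg_B", -1.2, 3.0),  # excluded: loss of viability with EGF
    Guide("sg_C", 0.0, 0.5),   # viable but not enriched
    Guide("sg_D", -0.5, 1.5),
]
print([g.name for g in egf_independence_hits(guides)])  # ['sg_A', 'sg_D']
```

Applying the viability filter first prevents strongly depleted LOF sgRNAs, such as `sg_B`, from appearing as apparent hits through the outgrowth effect described above.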
Our screen identified well-known rare pathogenic variants such as Thr790Met and Pro596Ser, which are both reported as pathogenic in ClinVar. Surprisingly, most other hits introduce mutations that were previously observed in tumor samples but are currently not considered pathogenic. For example, Ser720Phe is absent from ClinVar but was previously observed in people with NSCLC44,45 and is adjacent to Gly719, a kinase domain residue commonly impacted by mutations19. Several additional hits not considered pathogenic, such as Val765Ile [BE3.9max] and Val769Ile;Ser768Asn [BE3.9max], also affect the αC-helix and the αC–β4 loop, respectively, which are key regulatory structures for the activation of this domain. Lastly, Ile715Val;Lys716Gly [ABE8e] impacts the Lys716 ubiquitination site46 and Tyr727Cys;Lys728Glu [ABE8e] affects a phosphorylation site47, suggesting a role for these post-translational modifications in EGFR regulation and stability.
Interestingly, the co-occurrence of certain screen hits provides insights into the role of autoregulation and phosphorylation in EGFR activation. For example, Asp587 and Lys609 are known to form a salt bridge further stabilized by a loop containing Pro596. This interaction is known to contribute to the autoinhibitory conformation of the inactive receptor48, suggesting that its disruption may favor EGFR constitutive activation and constitute a mechanism of oncogenic transformation. To validate this result, we delivered the Asp587Asn [BE3.9max] lentiviral vector to MCF10A cells and sequenced the target genomic locus in nontreated and EGF-deprived cells (Extended Data Fig. 2c). Deep sequencing data analysis confirmed that the predicted Asp587Asn substitution was the most prevalent in both treatment arms. Additionally, alleles harboring this variant appeared enriched under EGF deprivation while the wild-type allele was depleted, confirming that Asp587Asn confers a growth advantage to cells in the absence of EGF.
Our screen also identified three enriched sgRNAs predicted to introduce mutations at the Asp314 residue in the extracellular domain, all corresponding to variants currently classified as VUS. Surprisingly, targeted sequencing of cells transduced with the Asp314Asn [BE3.9max] construct revealed that the predicted amino acid substitution was depleted under EGF deprivation (Extended Data Fig. 2d). By contrast, another edit, located outside of the BE3.9max editing window and introducing the previously uncharacterized Glu317Lys substitution, was enriched. This result highlights the importance of validating individual base editing screen hits to account for unanticipated editing products.
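Allele-level validation of this kind amounts to comparing the fraction of amplicon reads carrying each allele between treatment arms. A minimal sketch, assuming allele read counts have already been tabulated from the sequencing data; the allele names and counts are hypothetical and do not come from the study.

```python
import math


def allele_log2_enrichment(untreated, treated, pseudocount=0.5):
    """log2 ratio of each allele's read fraction in the treated vs untreated arm.
    Inputs map allele name -> read count; a pseudocount avoids division by zero."""
    alleles = sorted(set(untreated) | set(treated))
    n = len(alleles)
    total_u = sum(untreated.values()) + pseudocount * n
    total_t = sum(treated.values()) + pseudocount * n
    result = {}
    for allele in alleles:
        frac_u = (untreated.get(allele, 0) + pseudocount) / total_u
        frac_t = (treated.get(allele, 0) + pseudocount) / total_t
        result[allele] = math.log2(frac_t / frac_u)
    return result


untreated = {"wild_type": 500, "predicted_edit": 300, "unexpected_edit": 200}
treated = {"wild_type": 250, "predicted_edit": 250, "unexpected_edit": 500}
scores = allele_log2_enrichment(untreated, treated, pseudocount=0.0)
print(scores)  # wild type depleted (-1.0), unexpected edit enriched
```

Scoring every observed allele, rather than only the predicted product, is what allows an unanticipated edit to surface as the enriched one.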
Our most surprising result was a cluster of hits in the C-terminal tail. Among them, we found two exon 25 splice donor mutations predicted to truncate the receptor after exon 25 (EGFRΔEx26–28). Although autophosphorylation of this domain is essential for signal transduction by wild-type EGFR stimulated with EGF, it appears not to be required for the downstream signaling of mutant EGFR canonically associated with oncogenesis49. On the other hand, different truncated EGFR variants have been identified in glioblastoma50,51 but not NSCLC and have been reported to increase receptor activation through the loss of an autoinhibitory region of the C-terminal tail52,53. Another study showed the activating potential of EGFRΔEx26–28 in NIH-3T3 cells, which has been attributed to the role of the C-terminal tail in receptor internalization and degradation. This indicates that, although activating EGFR mutations are tumor-type specific, the MCF10A cell line is sensitive to pathogenic EGFR variants spanning cancer types, thereby offering new opportunities to interrogate multiple aspects of EGFR and its role in signaling and cancer.
To validate that C-terminal truncations lead to EGFR activation, we delivered one of the exon-25-truncating sgRNAs into MCF10A cells and let selected cells grow in the absence of EGF for 5 days before measuring their viability (Fig. 1e). Infected cells edited to express EGFRΔEx26–28 displayed higher EGF-independent growth than uninfected cells. Deep sequencing of the target site also revealed the enrichment of alleles with mutated splice donor sites, corroborating the base editing screen result (Extended Data Fig. 2e).
Taken together, these results demonstrate the ability of base editing mutational scanning to identify known and unknown EGFR-activating and likely oncogenic variants. Notably, in spite of the limited mutational spectrum introduced by each base editor, our screens identify key domains, residues and post-translational modifications likely involved in receptor regulation and stability. In sum, our screens expand the number of likely pathogenic variants in both the tyrosine kinase and the extracellular domains.
Drug-resistant variant discovery in MCF10A cells
Encouraged by these results, we then expanded our screening approach to evaluate the sensitivity of EGFR variants to clinically approved TKIs in the MCF10A cell line. We repeated both base editing screens in the same conditions but replaced the EGF depletion step with treatment with either gefitinib or osimertinib, which are first-generation and third-generation TKIs, respectively (Fig. 2a). Both drugs compete with adenosine triphosphate (ATP) to bind the tyrosine kinase active site but through different inhibition mechanisms10,13, raising the possibility of dissecting drug-specific resistance mechanisms. We deployed our base editing variant-scanning methodology to explore this.
We delivered the BE3.9max and ABE8e EGFR variant-scanning libraries to MCF10A cells at low MOI and selected for infected cells with puromycin. We then applied osimertinib or gefitinib treatment for 8 days before sgRNA library preparation and sequencing. Library quantification was performed as described above, and we confirmed high replicate correlation (Extended Data Fig. 3a,b).
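Low-MOI transduction trades cell numbers for mostly single integrations. Assuming Poisson-distributed integrations, the fraction of transduced cells is 1 - exp(-MOI), which sets how many cells must be infected to reach the stated 500-fold coverage of the 2,005-sgRNA library. The MOI of 0.3 below is a hypothetical value chosen for illustration; the actual MOI is not given in this section.

```python
import math


def cells_to_transduce(n_guides: int, coverage: int, moi: float) -> int:
    """Cells to infect so that transduced cells cover each sgRNA `coverage` times,
    assuming Poisson-distributed integrations."""
    transduced_fraction = 1 - math.exp(-moi)
    return math.ceil(n_guides * coverage / transduced_fraction)


def multiple_integration_fraction(moi: float) -> float:
    """Fraction of transduced cells carrying more than one integration (Poisson)."""
    p0 = math.exp(-moi)
    p1 = moi * math.exp(-moi)
    return (1 - p0 - p1) / (1 - p0)


n_cells = cells_to_transduce(n_guides=2005, coverage=500, moi=0.3)
print(n_cells, round(multiple_integration_fraction(0.3), 3))  # about 3.87 million cells, ~0.143
```

Lowering the MOI further reduces multiple integrations but increases the number of cells that must be infected to preserve coverage.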
We started our analysis by comparing sgRNA enrichment between treated and nontreated samples while accounting for variant fitness (Fig. 2b and Extended Data Fig. 4a). Encouragingly, we observed that, with both drug treatments, a majority of enriched hits were located in the tyrosine kinase domain and included well-known drug-resistant variants. For example, Thr790Met is known as the most prevalent gefitinib-resistant variant and we observed a strong enrichment of the Thr790Met;Gln791Ter [BE3.9max] sgRNA under gefitinib treatment. Importantly, this same variant is not enriched under osimertinib treatment, which is expected as this molecule was specifically developed to counter its emergence. Unexpectedly, osimertinib selection led to the enrichment of different predicted variants at the same position, namely Thr790Ala;Gln791Arg [ABE8e].
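One way to account for variant fitness when scoring drug resistance is to regress each sgRNA's LFC under drug on its LFC without drug and keep the residual, so that sgRNAs that simply grow faster or slower regardless of treatment are not called resistant or sensitive. This linear-residual approach is an illustrative assumption, not necessarily the model used in this study, and the toy data are synthetic.

```python
import numpy as np


def fitness_corrected_scores(lfc_untreated, lfc_treated):
    """Residuals of a linear fit of treated LFC on untreated LFC.
    Positive residuals indicate enrichment beyond what fitness alone predicts."""
    x = np.asarray(lfc_untreated, dtype=float)
    y = np.asarray(lfc_treated, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)


# Toy data: treated LFC tracks fitness, except one sgRNA that is enriched under drug.
lfc_untreated = np.arange(10, dtype=float) * 0.2 - 1.0
lfc_treated = lfc_untreated.copy()
lfc_treated[4] += 3.0
scores = fitness_corrected_scores(lfc_untreated, lfc_treated)
print(int(np.argmax(scores)))  # 4
```

With real screens, a robust fit (for example one that down-weights outliers) may be preferable, since strong hits themselves pull an ordinary least-squares line.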
We set out to validate this result in a follow-up experiment and measured the viability of MCF10A cells infected with individual sgRNA and base editor pairs followed by TKI treatment for 5 days (Extended Data Fig. 4b). As predicted by the screen, cells infected with Thr790Ala;Gln791Arg [ABE8e] showed higher resistance to osimertinib than to gefitinib, while cells infected with Thr790Met;Gln791Ter [BE3.9max] showed the opposite resistance profile; neither Thr790Ala nor Gln791Arg is listed in ClinVar. Deep sequencing of the genomic target site after drug treatment revealed the enrichment of the Thr790Met edit in the case of gefitinib and of the Thr790Ala;Gln791Arg double-mutant allele in the case of osimertinib. In the latter case, alleles containing Thr790Ala alone were depleted, suggesting that Gln791Arg is likely driving the resistance phenotype (Extended Data Fig. 4c). This confirms both the specificity and the sensitivity of our base editing variant screening approach and highlights the need to understand drug resistance at single-base resolution.
Under gefitinib selection, all enriched screen hits were located in the tyrosine kinase domain, more specifically around the ATP-binding pocket, which constitutes the binding site of both drugs (Fig. 2b, Extended Data Fig. 4d and Supplementary Table 2). In addition to the Thr790 gatekeeper residue, we identified resistance mutations not classified as pathogenic in ClinVar that affect Val726, Met766 and Thr854, which are all in direct contact with the receptor-bound molecule (Fig. 2d and Extended Data Fig. 5a). We speculate that mutations impacting these residues can directly affect the binding affinity of gefitinib. Importantly, Thr854Ala is classified as VUS, Val726Ala is not listed in either database and Met766Thr has been shown to confer gefitinib resistance in vitro54 but is not listed in ClinVar. To validate the Met766Thr hit, we delivered a lentiviral construct containing the Met766Thr [ABE8e] sgRNA to MCF10A cells and treated them with gefitinib for 7 days, under conditions similar to those of the screen. Deep sequencing of the target exon revealed an enrichment of reads containing the Met766Thr edit under gefitinib treatment while wild-type alleles were depleted, thus confirming the resistance phenotype (Extended Data Fig. 5b).
Similarly to gefitinib, the top three hits for osimertinib affect residues found in the ATP-binding pocket of the tyrosine kinase domain (Fig. 2d, Extended Data Fig. 5a and Supplementary Table 3). For example, Val845, which is mutated to alanine by the Val845Ala [ABE8e] hit, interacts with the Phe795 and Gly796 residues adjacent to Cys797, the osimertinib covalent binding site, hinting at a resistance mechanism involving this residue. Although Val845Ala is not listed in ClinVar, the similar Val845Leu variant has conflicting reports of pathogenicity in the database. We set out to confirm this phenotype by individual sgRNA delivery and target exon sequencing after drug treatment, which revealed the enrichment of the Val845Ala edit and a strong depletion of wild-type alleles under osimertinib selection, thus validating this hit (Extended Data Fig. 5c).
The remaining hits were found throughout the tyrosine kinase and C-terminal domains, highlighting distinct EGFR regulation mechanisms. For example, enriched sgRNAs were found to affect Lys852 and Gln791, which interact with each other in the osimertinib-bound receptor to form a hydrogen-bond network with residues Asp1012 and Asp1014 of the C-terminal domain. Neither of these variants is listed in ClinVar, although perturbing this network by replacing Gln791 with a hydrophobic residue has been predicted to destabilize osimertinib binding55. Interestingly, in the inactive conformation, Lys852 is also known to directly interact with another hit, Glu1005, which is part of a C-terminal ‘electrostatic hook’ that inhibits kinase domain activity. Mutations impacting Glu1005 and Asp1006 have been shown to increase the activity of unstimulated EGFR in vitro56, potentially promoting the observed resistance phenotype. Targeted amplicon sequencing-based validation of the Lys852Gly [ABE8e] hit confirmed the resistance phenotype with both drugs (Extended Data Fig. 5d). However, contrary to our base editing outcome prediction, deep sequencing data revealed the enrichment of alleles substituting the lysine at position 852 with a glutamic acid instead of the expected glycine.
Taken together, the base editing variant-scanning results highlight key intramolecular interactions between EGFR residues involved in the regulation of enzymatic activity, providing new insights into drug-dependent resistance mechanisms. Interestingly, while the top resistance mutations for both drugs impact residues in the ATP-binding pocket, osimertinib-resistant hits are also found in the C-terminal domain, hinting at both shared and divergent resistance mechanisms between the drugs. If substantiated by clinical validation, these insights may in the future help guide clinical decision making.
Variant scanning allows for drug prioritization
In addition to drug-resistant variants, our screening data identify variants that likely increase drug sensitivity, revealed by sgRNAs that are depleted under drug selection. Indeed, when comparing the LFCs of each sgRNA and base editor pair under gefitinib and osimertinib selection, we observed that variant sensitivities differ between the two drugs, with some mutations having opposite effects (Fig. 2c). For example, Val845Ala [ABE8e] appeared strongly resistant to osimertinib but sensitive to gefitinib. Conversely, other variants such as Val726Ala [ABE8e], Val769Ala [ABE8e] and Met766Thr [ABE8e] appeared to confer different levels of resistance to gefitinib but to be sensitive to osimertinib. All of these variants are located within the ATP-binding pocket, which suggests that their differential drug sensitivities result from specific direct interactions with each molecule.
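This comparison can be framed as placing each sgRNA and base editor pair in a two-dimensional space of gefitinib and osimertinib LFCs and flagging pairs with opposite effects. The symmetric cutoff and the example values below are hypothetical and only illustrate the classification.

```python
def call_effect(lfc: float, cutoff: float = 1.0) -> str:
    """Classify a single drug LFC as resistant, sensitive or neutral."""
    if lfc >= cutoff:
        return "resistant"
    if lfc <= -cutoff:
        return "sensitive"
    return "neutral"


def compare_drugs(lfc_gefitinib: float, lfc_osimertinib: float, cutoff: float = 1.0):
    """Return per-drug calls and whether the two drugs have opposite effects."""
    gef = call_effect(lfc_gefitinib, cutoff)
    osi = call_effect(lfc_osimertinib, cutoff)
    opposite = {gef, osi} == {"resistant", "sensitive"}
    return gef, osi, opposite


print(compare_drugs(-1.5, 2.0))  # ('sensitive', 'resistant', True)
print(compare_drugs(2.2, 1.4))   # ('resistant', 'resistant', False)
```

Variants called resistant to both drugs fall in the shared quadrant discussed next, whereas opposite calls point to drug-specific interactions.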
We also identified variants that appeared to be resistant to both drugs, such as Lys852Glu [ABE8e], which is currently not listed in ClinVar or COSMIC. This residue is known to be involved in reciprocal interactions with residues of the C-terminal tail and to contribute to EGFR autoinhibition. This suggests that common resistance mechanisms can emerge when EGFR autoinhibition is disrupted. Taken together, these results provide new insights into EGFR variant-dependent drug sensitivities, which may help guide therapeutic decisions in the future for clinicians faced with EGFR variants for which clinical data are currently absent.
Distinct sensitivities of primary and compound EGFR mutations
In cancer patients, TKIs are used to counteract the activity of hyperactive EGFR mutants, such as the common Leu858Arg substitution or exon 19 deletion20. We set out to apply our base editing variant-scanning pipeline to evaluate the impact of secondary EGFR mutations on drug sensitivity in the NSCLC-derived PC-9 cell line. These cells represent an attractive model because they are sensitive to TKIs and harbor EGFRΔGlu746–Ala750, which is the most prevalent EGFR deletion in lung cancer45. Additionally, the introduction of the Thr790Met variant in PC-9 cells with base editing has previously been shown to lead to a strong gefitinib resistance phenotype57. We, thus, performed a base editing scanning screen using the same EGFR-targeting sgRNA library and experimental and computational workflows as demonstrated for MCF10A cells.
We started our analysis by comparing sgRNA counts between plasmid and day 19 conditions and confirming high replicate correlation, no shift in the negative control sgRNA population and the depletion of positive control sgRNAs targeting essential splice sites (Extended Data Figs. 6a,b and 7a,b). As was the case in the wild-type EGFR MCF10A cells, in EGFR-mutant PC-9 cells, we also identified a spectrum of EGFR secondary mutations imparting fitness effects. While many of the fitness-altering mutations were shared between MCF10A and PC-9 cells, we identified unique subsets for each cell line (Extended Data Fig. 7c). Interestingly, tyrosine kinase variants appear to have a stronger impact on viability in EGFR-mutant PC-9 compared to wild-type EGFR MCF10A cells, which we speculate could be because of reduced resilience of mutant EGFR or oncogene addiction.
Next, we set out to characterize the impact of EGFR secondary mutations on TKI drug resistance (Fig. 3a and Extended Data Figs. 7d and 8a). Similarly to MCF10A cells, we observed that the most enriched gefitinib-resistant hits are located in the tyrosine kinase domain, while top osimertinib hits span both the tyrosine kinase and the C-terminal domains. We, thus, continued our analysis by comparing shared hits between both cell lines. While this revealed a broad range of shared and distinct variants spanning EGFR, we noticed that the most commonly impacted positions appeared to be the same regardless of the initial EGFR genotype (Fig. 3b). For example, in both cell lines, Thr790Met;Gln791Ter [BE3.9max] and Ile853Val;Thr854Ala [ABE8e] are strongly enriched under gefitinib treatment while Thr790Ala;Gln791Arg [ABE8e] and Val845Ala [ABE8e] are enriched with osimertinib.
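Comparing hits between cell lines at the level of residue positions, rather than exact sgRNAs, captures the observation that the same positions recur regardless of the EGFR background. A minimal sketch that parses residue numbers from variant labels written in this document's style; the two hit lists are hypothetical inputs reusing variant names from the text, not the screens' full hit sets.

```python
import re

RESIDUE = re.compile(r"[A-Z][a-z]{2}(\d+)")


def residue_positions(label: str) -> set:
    """Residue numbers mentioned in a variant label such as 'Thr790Met;Gln791Ter'."""
    return {int(m) for m in RESIDUE.findall(label)}


def shared_positions(hits_a, hits_b):
    """Positions hit in both lists, only in the first, and only in the second."""
    pos_a = set().union(*(residue_positions(h) for h in hits_a))
    pos_b = set().union(*(residue_positions(h) for h in hits_b))
    return pos_a & pos_b, pos_a - pos_b, pos_b - pos_a


mcf10a = ["Thr790Met;Gln791Ter", "Ile853Val;Thr854Ala", "Met766Thr"]
pc9 = ["Thr790Met;Gln791Ter", "Ile853Val;Thr854Ala", "Gly719Gly;Ser720Phe"]
shared, only_mcf10a, only_pc9 = shared_positions(mcf10a, pc9)
print(sorted(shared))  # [790, 791, 853, 854]
```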
To confirm a selection of these insights, we delivered the Ile853Val;Thr854Ala [ABE8e] construct to PC-9 cells and treated infected cells with gefitinib or osimertinib for 7 days. Deep sequencing of the targeted region revealed that reads corresponding to the unedited alleles were strongly depleted under drug treatment, thus confirming efficient growth inhibition in unedited cells, as also previously observed in MCF10A cells (Fig. 3c and Extended Data Fig. 5e). By contrast, with both drugs we observed the enrichment of reads containing the Ile853Val;Thr854Ala double edit, while Thr854Ala alone was enriched under gefitinib selection but not osimertinib. Thr854Ala is currently classified as VUS and has been previously observed in people treated with first-generation TKIs58. Our data, thus, suggest that these residues are important for the activity of both drugs, whereas Thr854Ala is sufficient to drive resistance to gefitinib but not osimertinib.
We then compared sgRNAs that were differentially enriched across the two cell lines. For example, Thr790Ala;Gln791Arg [ABE8e] and Gly719Gly;Ser720Phe [BE3.9max] appeared to confer gefitinib resistance only in PC-9 cells and not in MCF10A cells, suggesting differences in tyrosine kinase conformation and ligand interactions between the two cell lines. In a follow-up experiment, we delivered the Thr790Ala;Gln791Arg [ABE8e] sgRNA to PC-9 cells and confirmed the enrichment of the Thr790Ala variant under drug treatment and its role in gefitinib resistance in this cell line (Extended Data Fig. 8b). Similarly, His773Arg [ABE8e], currently classified as likely pathogenic, was enriched with both drugs in PC-9 cells but not in MCF10A cells. Validation of this construct confirmed the resistance phenotype under osimertinib selection, but allele enrichment after gefitinib treatment was not statistically significant, possibly because of lower resistance to this molecule (Extended Data Fig. 8c).
Surprisingly, with both drugs and both base editors, we noticed the enrichment of sgRNAs predicted to disrupt the exon 25 splice donor, leading to the EGFRΔEx26–28 truncation. Interestingly, this truncation was found to drive EGFR activation in our MCF10A screen. However, it appears to confer drug resistance only in PC-9 cells, hinting at a resistance mechanism requiring prior receptor hyperactivation or a specific receptor conformation resulting from the ΔGlu746–Ala750 deletion. Individual sgRNA follow-up experiments confirmed an enrichment of edited alleles with a mutated splice donor site under gefitinib and osimertinib treatment (Fig. 3d). Taken together, these data suggest that secondary truncation of a constitutively active EGFR variant such as EGFRΔGlu746–Ala750 maintains its downstream signaling and might constitute a mechanism of resistance to first-generation and third-generation TKIs.
Together, these results highlight the importance of evaluating drug resistance variants in relevant genomic contexts, including pre-existing EGFR mutations, and confirm the relevance of sequencing the target site to validate base editor screen hits.
Interrogation of patient-derived variants with prime editing
Base editing screens enable the sensitive identification of phenotype-inducing variants at accessible nucleotides but remain limited in the mutation spectrum they can introduce. In particular, ABE8e and BE3.9max coupled with wild-type SpCas9 can introduce only about 17.6% and 18.6% of coding EGFR variants listed in ClinVar and COSMIC, respectively (Extended Data Fig. 9a). While this limited capacity is partly due to protospacer-adjacent motif (PAM) restriction, which can be circumvented through alternative Cas enzymes59, most of the mutations simply cannot be introduced because base editing chemistry, which installs only transition mutations, cannot convert between all codons. For instance, one of the most prevalent EGFR-activating variants, Leu858Arg, is absent from our base editing screens because its codon can only be targeted by BE3.9max, for which the predicted edit is synonymous.
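The chemistry constraint can be made concrete by enumerating every amino acid reachable from a codon when a single sgRNA installs only transitions on one strand (C:G to T:A for CBEs, A:T to G:C for ABEs). This sketch ignores PAM availability, editing-window position and edits spilling into neighboring codons, and it assumes the Leu858 reference codon is CTG (Leu858Arg is commonly annotated as c.2573T>G, a CTG to CGG change).

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Sense-strand base changes a single sgRNA can install, depending on which strand
# the protospacer lies on (CBE: C->T or G->A; ABE: A->G or T->C).
EDIT_RULES = {
    "CBE": ({"C": "T"}, {"G": "A"}),
    "ABE": ({"A": "G"}, {"T": "C"}),
}


def reachable_amino_acids(codon: str, editor: str) -> set:
    """Amino acids produced by editing any non-empty subset of editable bases in `codon`."""
    outcomes = set()
    for rule in EDIT_RULES[editor]:
        sites = [i for i, base in enumerate(codon) if base in rule]
        for mask in product((False, True), repeat=len(sites)):
            if not any(mask):
                continue
            edited = list(codon)
            for i, flip in zip(sites, mask):
                if flip:
                    edited[i] = rule[edited[i]]
            outcomes.add(CODON_TABLE["".join(edited)])
    return outcomes


leu858 = "CTG"  # assumed reference codon
print(reachable_amino_acids(leu858, "CBE"))  # {'L'}: synonymous only
print(reachable_amino_acids(leu858, "ABE"))  # {'P'}
```

According to the text, only BE3.9max had an sgRNA able to reach this codon in the screen; the ABE output shows that even with a suitable PAM, arginine would remain out of reach, because CTG to CGG requires a T-to-G transversion.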
To explore a more clinically relevant mutational space, we set out to leverage prime editing, which can introduce all possible base substitutions, as well as short insertions and deletions35. First, we established an MCF10A cell line harboring an MLH1 gene knockout, which has been shown to drastically improve prime editing efficiency36, and expressing the PEmax enzyme (Extended Data Fig. 9b,c). We then tested prime editing efficiency in this cell line using lentiviral delivery of engineered pegRNAs (epegRNAs, henceforth referred to as pegRNAs)37 designed to introduce 14 EGFR mutations found in variant databases or our base editing screen results. The quantification of edited allele fractions by deep sequencing on days 7, 14 and 22 after transduction revealed drastically different prime editing efficiencies across targets, with 5 pegRNAs yielding less than 2% of edited alleles while 5 others performed beyond 25% at day 22 (Fig. 4a). The highest editing efficiency was 66%, obtained for the Ala289Val variant. Interestingly, most targets appeared to have reached editing saturation at day 14 with the exception of Gly719Ser, which progressed from 7.2% to 26.3% between days 7 and 22, potentially suggesting a growth advantage of cells with this variant in the presence of EGF. By contrast, the edited fraction of Thr363Ile, a mutation previously reported in glioblastoma, decreased over time, suggesting a negative impact on cell fitness.
Encouraged by these results, we designed a pegRNA library introducing all variants impacting EGFR exons and untranslated regions (UTRs) listed in the ClinVar and COSMIC databases (Fig. 4b). With the goal of directly comparing base editing and prime editing technologies, we additionally included 815 variants introduced in our previous base editing screens. Of the 3,191 unique EGFR variants found, 2,952 (~92%) could be introduced by prime editing with a maximum distance of 20 nucleotides between the edit and nick site. The remaining 239 variants could not be targeted because no spacer was available in close proximity or because the corresponding pegRNAs contained poly(T) or BsmBI restriction sites that would preclude their expression or cloning, respectively. For each accessible variant, barcoded pegRNAs were designed with different combinations of primer binding site (PBS) lengths to understand how this parameter impacts library production and maximize our chances of successful editing. Whenever possible, additional synonymous mutations were added alongside the desired edit, which has been shown to increase prime editing efficiency60. In these cases, pegRNAs harboring only the intended edit or the synonymous variant were also added to the library to control for unexpected effects on these combinations. The resulting library was cloned into a lentiviral vector expressing a puromycin resistance marker and green fluorescent protein (GFP) (Fig. 4b and Extended Data Figs. 9d–f).
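The sequence-level exclusion criteria described above are straightforward to screen for during design. The sketch below assumes BsmBI-based cloning (as the text implies), treats four or more consecutive T's as a Pol III termination signal, and uses a hypothetical 0-based coordinate convention for the nick-to-edit distance in which `nick_pos` is the first base 3' of the nick; real design tools apply these checks to the full expressed pegRNA sequence.

```python
BSMBI_SITES = ("CGTCTC", "GAGACG")  # BsmBI recognition site and its reverse complement


def passes_cloning_filters(seq: str, max_t_run: int = 3) -> bool:
    """Reject sequences with a poly(T) run longer than `max_t_run` (a Pol III
    termination signal) or an internal BsmBI site (which would interfere with cloning)."""
    s = seq.upper()
    if "T" * (max_t_run + 1) in s:
        return False
    return not any(site in s for site in BSMBI_SITES)


def within_reach(edit_pos: int, nick_pos: int, max_distance: int = 20) -> bool:
    """True if the edit lies at or 3' of the nick on the nicked strand, within `max_distance` nt."""
    return 0 <= edit_pos - nick_pos <= max_distance


print(passes_cloning_filters("GATCCGTCTCAA"), passes_cloning_filters("GATTTGA"), within_reach(37, 17))
# False True True
```

Variants failing either check for every available spacer would fall into the untargetable set described in the text.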
Upon sequencing the pegRNA plasmid library, we observed a bias between EGFR-targeting pegRNAs with different PBS lengths (Fig. 4c). Interestingly, this effect is not observed for nontargeting pegRNAs, which have a scrambled extension not complementary to their spacer. We speculate that a longer PBS negatively impacts pegRNA cloning because the PBS, unlike the reverse transcription template (RTT), is complementary to the pegRNA spacer, which could result in the formation of secondary structures during oligo library amplification or cloning. While this bias should be considered when designing large pegRNA libraries, especially when including a broad range of PBS lengths, we nonetheless decided to move forward with our small library, as its global bias remained moderate (skew ratio = 7.7).
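The complementarity invoked above follows from how the PBS is designed. For SpCas9, the nick falls 3 nt upstream of the PAM (between protospacer positions 17 and 18), and the PBS is the reverse complement of the spacer bases immediately 5' of the nick, so every additional PBS nucleotide adds one more base of potential pairing between the pegRNA extension and its own spacer. A minimal sketch; the spacer sequence is arbitrary.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def design_pbs(spacer: str, pbs_length: int, nick_index: int = 17) -> str:
    """PBS for a 20-nt SpCas9 spacer: reverse complement of the `pbs_length` bases
    immediately 5' of the nick (0-based `nick_index` is the first base 3' of the nick)."""
    if not 1 <= pbs_length <= nick_index:
        raise ValueError("PBS length must be between 1 and the number of bases 5' of the nick")
    return reverse_complement(spacer[nick_index - pbs_length:nick_index])


spacer = "GACTGCATGCAAGTCGTACG"  # arbitrary 20-nt example
for length in (8, 10, 13):
    pbs = design_pbs(spacer, length)
    # The PBS pairs perfectly with the spacer stretch it was derived from.
    assert reverse_complement(pbs) == spacer[17 - length:17]
    print(length, pbs)
```

The RTT, by contrast, templates sequence 3' of the nick, which lies outside the spacer region upstream of the nick and so does not pair with the spacer in the same way.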
We then set out to introduce our library of patient-derived mutations in EGF-dependent MCF10A cells and assess their EGFR activation potential. We delivered our pegRNA library to MCF10A∆MLH1 cells stably expressing PEmax36 and let them accumulate edits for 14 days before EGF deprivation. Cells were harvested after 8 days of selection followed by pegRNA barcode sequencing. After confirming high replicate correlations (Extended Data Fig. 9g), we compared pegRNA counts between the nontreated and EGF-deprived arms of the screen. As expected, we observed no change in the distributions of nontargeting constructs, while the most enriched pegRNAs introduced intended edits with or without additional synonymous mutations (Extended Data Fig. 10a).
Our screen revealed both known and unexpected hits. For example, we observed an enrichment of pegRNAs introducing multiple pathogenic mutations affecting the Ala289 and Thr263 residues in the extracellular domain (Fig. 4d and Extended Data Fig. 10b). Additionally, we identified Thr363Ile, a variant present in COSMIC and predicted to be pathogenic but absent from ClinVar. In these cases, individual pegRNAs of different extension lengths introducing the same edit were enriched together, suggesting that these hits did not arise from technical noise. These extracellular domain mutations are not frequently found in epithelial cell-derived breast cancers61 (the origin of MCF10A cells) but are most frequently found in glioblastoma, with Ala289Val and Ala289Asp being the most common62,63 (Extended Data Fig. 10c).
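The redundancy argument (several pegRNAs with different extension lengths enriching together) suggests a simple aggregation rule from pegRNAs to variants. The LFC cutoff, the minimum number of concordant pegRNAs and the variant names below are hypothetical.

```python
from collections import Counter


def call_variant_hits(pegrna_scores, lfc_cutoff=1.0, min_pegrnas=2):
    """pegrna_scores: iterable of (variant, lfc) pairs, one per pegRNA.
    A variant is called only if at least `min_pegrnas` pegRNAs are enriched."""
    enriched = Counter(variant for variant, lfc in pegrna_scores if lfc >= lfc_cutoff)
    return sorted(v for v, n in enriched.items() if n >= min_pegrnas)


pegrna_scores = [
    ("variant_1", 2.1), ("variant_1", 1.8), ("variant_1", 0.2),  # two of three PBS designs enriched
    ("variant_2", 3.0),                                          # single enriched pegRNA
    ("variant_3", 1.2), ("variant_3", 1.1),
]
print(call_variant_hits(pegrna_scores))  # ['variant_1', 'variant_3']
```

Requiring concordance trades some sensitivity (a strong single-pegRNA signal such as `variant_2` is dropped) for robustness against technical noise.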
Surprisingly, our screen did not capture common mutations impacting the tyrosine kinase domain such as Leu858Arg or exon 19 deletions, likely because of low editing efficiencies observed for these variants. By contrast, our screen identified individual pegRNAs introducing exon 20 insertions affecting the αC–β4 loop of the tyrosine kinase domain. These constitute a largely uncharacterized category of oncogenic mutations found in lung cancer and associated with resistance to first-generation and third-generation TKIs20,64. While Asn771_Pro772insVal and Asn771_Pro772insHis are classified as drug resistant and pathogenic, respectively, we also identify variants not listed in ClinVar such as His773_Val774 duplication or His773_Val774delinsLeuMet.
To validate a selection of our hits, we delivered individual pegRNAs to PEmax-expressing MCF10A∆MLH1 cells and measured the fraction of edited alleles after 8 days of EGF deprivation (Fig. 4e). While initial prime editing efficiencies for most pegRNAs varied from 11% to 59%, we observed the enrichment of alleles harboring Ala289Asp/Thr/Val, Thr263Pro and all tested exon 20 insertions in the absence of EGF. Interestingly, although the allelic fraction of the His773_Val774dup edit reached only 2.6% after selection, its initial editing efficiency was just 0.1%, representing a 23-fold enrichment and suggesting a strong phenotypic effect of this variant.
Next, we set out to evaluate the impact of additional synonymous variants on pegRNA enrichment and the sensitivity of our screen. We, thus, compared the enrichment of pegRNAs introducing only intended edits with that of constructs harboring additional synonymous mutations (Extended Data Fig. 10d). This revealed that the majority of hits were enriched with both vector types while showing similar or increased enrichment in the presence of synonymous edits. Surprisingly, Val774Met, Ser1026Gly and Gly804Gln were enriched only in the absence of synonymous edits, while Pro1090Leu and Gln581Leu/Lys were only enriched in their presence. Individual validation of these variants failed to demonstrate their enrichment in the absence of EGF, suggesting that they are false positives resulting from experimental noise (Extended Data Fig. 10e). Taken together, these results suggest that additional synonymous edits can serve to increase both prime editing efficiency and library redundancy, thus increasing the confidence in hits enriched in both their presence and their absence.
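This confidence argument can be sketched as a concordance check between the two pegRNA designs for each variant, where the `both` set corresponds to hits supported with and without additional synonymous edits. The cutoff and example values are hypothetical; as reported above, several single-design hits did not validate individually.

```python
def synonymous_concordance(lfc_without, lfc_with, cutoff=1.0):
    """Split variants by whether they are enriched with pegRNAs lacking (`lfc_without`)
    or carrying (`lfc_with`) additional synonymous edits. Inputs map variant -> LFC."""
    hits_without = {v for v, lfc in lfc_without.items() if lfc >= cutoff}
    hits_with = {v for v, lfc in lfc_with.items() if lfc >= cutoff}
    return {
        "both": hits_without & hits_with,  # supported by both designs
        "without_only": hits_without - hits_with,
        "with_only": hits_with - hits_without,
    }


lfc_without = {"variant_1": 2.0, "variant_2": 1.5, "variant_3": 0.1}
lfc_with = {"variant_1": 2.6, "variant_2": 0.3, "variant_3": 1.4}
groups = synonymous_concordance(lfc_without, lfc_with)
print(groups)
```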
While many pathogenic mutations were identified as hits in our prime editing screen, future improvements to the prime editing technology will likely increase the sensitivity of the assay. Taken together, prime editing screens represent a promising avenue to expand genetic diversity in mutational scans and reveal pathogenic mutations undiscoverable by base editing, such as short insertions.