Int&in - Home

Welcome to the Int&in web server for intein split site activity prediction.

Int&in, as the name easily reveals, is dedicated to inteins. In particular, split inteins.

Inteins are "intervening proteins": proteins found within other proteins, albeit transiently. As a matter of fact, inteins eventually excise themselves from their precursors in a process known as protein splicing (Hirata et al., 1990; Kane et al., 1990). This is an autocatalytic process, meaning that it does not need any co-factor, which concludes itself with the formation of a peptide bond between the two polypeptides initially bordering the intein (referred to as N- and C-exteins; see Figure 1A). In nature, we find so-called contiguous inteins, encoded by a single gene, as well as split inteins, encoded by two separate genes. In this case, we speak of N- and C-terminal intein fragments (Figure 1B). These fragments must first form a complex to become splice-competent; therefore, the affinity between the intein fragments clearly plays an important role in defining the splicing competency of split inteins. Regardless whether contiguous or split, inteins use a common series of chemical reactions to splice themselves out of the host. For details about this process, please refer to Di Ventura and Mootz, 2019.

Figure 1. Schematic representation of the protein splicing reaction carried out by contiguous (A) and split (B) inteins.

The Int&in web server executes a machine learning algorithm to predict whether a certain split site will give rise to two intein fragments able to assemble a functional intein competent of protein splicing. We call such sites active. Those that give rise to a split intein unable to splice are called inactive.

This explains why we chose the name "Int&in": to signify the two intein fragments (Int, in) coming together.

Why would one want to predict active split sites in inteins?
Inteins are used in protein engineering for various applications such as protein semisynthesis, labeling, and circularization (Topilina and Mills, 2014; Di Ventura and Mootz, 2019; Wang et al., 2022). In particular, split inteins enable the ligation of separate polypeptides (it could entire proteins, or protein domains or even small motif, such as nuclear localization signals, degrons etc) with no or minimal scarring -that is, without the addition or with the addition of only few amino acids originally absent from the protein of interest-, reducing the chance of compromising the function of the final protein product (Wang et al., 2022).

For instance, we recently developed a method called SiMPl that uses split intein-mediated reconstitution of enzymes that confer resistance towards antibiotics to select pure populations of cells carrying two distinct constructs with a single antibiotic in bacteria and mammalian cells (Palanisamy et al., 2019; Palanisamy et al., 2021).

Albeit there are many naturally split inteins, researchers have strived (and continue to do so) to create new split inteins either out of contiguous ones, or starting from split inteins, with the specific aim to obtain two intein fragments with low affinity for each other, which would require interacting partners to come in physical proximity and become splice-competent. This has allowed for the creation of so-called conditional inteins where the interaction between the intein fragments was triggered by an external input, such as a small chemical or light (Mootz and Muir, 2002; Böcker et al., 2019). Split inteins split at artificial new sites giving rise to two fragments having low affinity for each other were also used to develop methods to detect protein-protein interactions (Yao et al., 2020). Finally, in protein semisynthesis, where one intein fragment is chemically synthesised, it is also beneficial to have as short a fragment as possible. While one could resort to inteins naturally asymmetrically split, with one fragment being particularly short, having a method to predict where to split any intein of choice to obtain short N- or C-fragments would be very powerful.

So far, split sites in inteins have been heuristically determined...

With Int&in there is no need anymore to do trial-and-error experiments!

The algorithm
Int&in is based on a machine learning algorithm that predicts with high accuracy active and inactive split sites in inteins. It uses a logistic regression classifier on a number of sequence and structural properties to evaluate each amino acid of a given intein sequence for its potential to be an active split site. The model was trained on a dataset created by us of 126 split sites generated using the gp41-1, Npu DnaE and CL inteins and validated using 97 split sites extracted from the literature. It achieves an accuracy of 0.79 and 0.78 for the training and validation sets, respectively.

To know more about the details behind the algorithm please refer to Schmitz et al., 2023.

Before using the algorithm we advise you to read the tutorial.

Enjoy splitting your favourite intein!

Funding