Proto is not affiliated with RIMD. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
MAFFT (Katoh and Standley, 2013) is a multiple sequence alignment program that constructs an alignment through progressive alignment along a guide tree followed by optional iterative refinement. Pairwise distances between input sequences are first estimated rapidly using either k-mer counting or a Fast Fourier Transform that detects homologous segments in compositionally transformed sequences. A guide tree is built from these distances, sequences are progressively aligned along the tree, and the alignment is optionally refined by an iterative cycle that repeatedly removes and re-aligns subsets of sequences.
MAFFT exposes several algorithm variants that differ in pairwise scoring and refinement strategy.
- FFT-NS-i is the default progressive method with iterative refinement on FFT-derived distances and is appropriate for large datasets.
- L-INS-i (localpair) performs local pairwise alignment with iterative refinement and is appropriate for sequences with one alignable domain flanked by variable regions.
- G-INS-i (globalpair) performs global pairwise alignment with iterative refinement and is appropriate for sequences of similar length.
- E-INS-i (genafpair) is a local-alignment variant that handles sequences with multiple conserved domains separated by long unalignable regions.
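The k-mer counting step in the Background description can be illustrated with a minimal Python sketch. It is a simplified stand-in, not MAFFT's actual distance formula: the function names, the default k of 3, and the "shared k-mers over the shorter sequence" score are all assumptions chosen for readability.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Return 1 - (shared k-mers / k-mers in the shorter sequence).

    Simplified illustration only; MAFFT's own scoring differs.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())  # multiset intersection of k-mers
    shortest = min(sum(ca.values()), sum(cb.values()))
    # Sequences shorter than k have no k-mers; treat them as maximally distant.
    return 1.0 - shared / shortest if shortest else 1.0


def distance_matrix(seqs, k=3):
    """Map each unordered pair of sequence ids to its k-mer distance."""
    return {
        (x, y): kmer_distance(seqs[x], seqs[y], k)
        for x, y in combinations(seqs, 2)
    }
```

A guide tree would then be built from the resulting pairwise distances, for example by a standard hierarchical clustering method.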
Learning Resources
- MAFFT software homepage (Osaka University). Official distribution site and user documentation for the command-line program that this toolkit invokes.
- MAFFT algorithm comparison (Osaka University). A side-by-side comparison of the alignment algorithm variants that the align_method field selects.
- MAFFT online server (Osaka University). Hosted entry point to the same MAFFT pipeline, useful for a quick browser-based alignment before scripting against the tool.
Tools
MAFFT Alignment (mafft-align)
Performs multiple sequence alignment over two or more input sequences using the bundled mafft command-line program. The selected algorithm variant is controlled by the align_method configuration field. The tool returns a typed MSA object containing the aligned sequences and their identifiers, with helpers for column statistics and serialisation to FASTA or A3M.
API Reference
Config: MafftConfig
- align_method: "auto" (MAFFT picks by input size), "localpair" (L-INS-i), "globalpair" (G-INS-i), or "genafpair" (E-INS-i). Available options: auto, localpair, globalpair, genafpair.
- max_iterations: 0 = no refinement; ~1000 enables the full *-INS-i pipelines with *pair methods.
- extra_args: mafft CLI tokens for niche flags (e.g. ["--retree", "3", "--reorder"]).
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: MafftOutput
Applications
This tool is appropriate for any analysis that benefits from a multiple sequence alignment of homologous protein or nucleotide sequences. Common downstream uses include phylogenetic-tree inference, conservation analysis over alignment columns to identify functionally important residues, homology modelling against a related reference, motif and domain discovery across a protein family, and variant-effect analysis in the context of the conserved structural and functional positions revealed by the alignment.
Usage Tips
- align_method="auto" is the default and lets MAFFT select an algorithm based on input size. Use localpair for sequences with a single conserved domain flanked by variable regions, globalpair for full-length homologs of similar length, and genafpair for multi-domain sequences separated by long unalignable regions. The *pair variants run in O(N^2) time and are appropriate for up to a few hundred sequences.
- max_iterations=0 (the default) skips iterative refinement. Raise it to enable the full *-INS-i refinement pipeline when paired with one of the *pair methods. A value around 1000 is appropriate for high-accuracy alignments of small to medium datasets.
- threads=1 is the default; raise it on large alignments. MAFFT parallelises both the all-against-all distance computation and the iterative refinement passes, so increasing the thread count yields substantial wall-time reductions on alignments of hundreds of sequences or more.
- Inputs must contain at least two non-empty sequences. The input validator hard-errors otherwise. Auto-generated identifiers default to seq_0, seq_1, and so on when sequence_ids is omitted.
- extra_args accepts verbatim mafft CLI tokens. Pass any CLI flag not exposed as a typed field through this list (for example ["--retree", "3", "--reorder"] to rebuild the guide tree three times and emit sequences in aligned order). Tokens are inserted before the input FASTA path and take precedence over MAFFT’s own defaults.
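The tips above can be sketched as a small command builder. This is a hedged illustration, not the toolkit's actual implementation: prepare_inputs, build_mafft_command, and METHOD_FLAGS are hypothetical names, and the mapping from config fields to the mafft flags --auto, --localpair, --globalpair, --genafpair, --maxiterate, and --thread is an assumption about how such a wrapper might work.

```python
# Hypothetical helpers; the names and the field-to-flag mapping are assumptions.
METHOD_FLAGS = {
    "auto": "--auto",
    "localpair": "--localpair",
    "globalpair": "--globalpair",
    "genafpair": "--genafpair",
}


def prepare_inputs(sequences, sequence_ids=None):
    """Validate sequences and fill in seq_0, seq_1, ... when ids are omitted."""
    if len(sequences) < 2 or any(not s for s in sequences):
        raise ValueError("need at least two non-empty sequences")
    if sequence_ids is None:
        sequence_ids = [f"seq_{i}" for i in range(len(sequences))]
    if len(sequence_ids) != len(sequences):
        raise ValueError("sequence_ids must match sequences in length")
    return list(sequence_ids), list(sequences)


def build_mafft_command(input_fasta, align_method="auto", max_iterations=0,
                        threads=1, extra_args=None):
    """Assemble an argv list, placing extra_args just before the input path."""
    if align_method not in METHOD_FLAGS:
        raise ValueError(f"unknown align_method: {align_method!r}")
    cmd = ["mafft", METHOD_FLAGS[align_method]]
    if max_iterations > 0:  # 0 means no iterative refinement
        cmd += ["--maxiterate", str(max_iterations)]
    cmd += ["--thread", str(threads)]
    cmd += list(extra_args or [])  # verbatim CLI tokens
    cmd.append(str(input_fasta))
    return cmd
```

For example, build_mafft_command("in.fasta", "localpair", 1000, 4, ["--reorder"]) yields an argv list equivalent to an L-INS-i run with four threads; the list could be passed to subprocess.run, with standard output captured as the aligned FASTA.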
Toolkit Notes
These apply to every MAFFT tool in this toolkit (mafft-align).
- Outputs are returned as typed MSA objects. The msa field of MafftOutput exposes the aligned sequences, their identifiers, alignment dimensions, column-level conservation statistics, and gap-statistics properties. The result serialises to FASTA or A3M through the standard export interface.
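To make the column-level statistics concrete, the following sketch computes per-column gap fractions and a simple conservation score from plain aligned strings, and writes FASTA text. It does not use the toolkit's MSA class, whose attribute names are not documented here; gap_fractions, column_conservation, and to_fasta are hypothetical helpers, and the conservation score (frequency of the most common non-gap residue) is one assumed definition among several in common use.

```python
from collections import Counter


def _check_aligned(aligned):
    """Ensure all rows of the alignment have the same length."""
    if len({len(row) for row in aligned}) > 1:
        raise ValueError("aligned sequences must have equal length")


def gap_fractions(aligned):
    """Fraction of '-' characters in each alignment column."""
    _check_aligned(aligned)
    n = len(aligned)
    return [col.count("-") / n for col in zip(*aligned)]


def column_conservation(aligned):
    """Frequency of the most common non-gap residue in each column."""
    _check_aligned(aligned)
    scores = []
    for col in zip(*aligned):
        residues = [c for c in col if c != "-"]
        if not residues:  # all-gap column
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


def to_fasta(ids, aligned):
    """Serialise identifiers and aligned rows as FASTA text."""
    return "".join(f">{i}\n{s}\n" for i, s in zip(ids, aligned))
```

Columns with high conservation and a low gap fraction are typical candidates for functionally important positions in the downstream analyses listed under Applications.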
