Raw sequences were aligned against all germline VH and JH gene segments available in IgBLAST. Each sequence was assigned the VH and JH segments with the highest number of overlapping nucleotides, assuming that no insertions or deletions had occurred. Because we did not attempt to identify DH at this stage, we defined L as the distance between VH and JH: the germline VH and JH genes were positioned on the observed sequence, and L was taken as the number of nucleotides between the end of VH and the beginning of JH. See Figure.
Take for example the following sequences:
>1_11_10_8_27_34_11_8_GO4TH0H01ARBNQ rank=0093876 x=194.0 y=308.0 length=477;IGHV1-18 286 2 1;IGHD6-13 13 0 1;IGHJ6 53 0 1;CARVSSSSWHYYGMDVW;;IGHM 113 0 1
TGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGA GCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAG TGTCTAGCAGCAGCTGGCACTACTACGGT
>45_34_2_1_46_40_7_9_GO4TH0H01BQI53 rank=0047658 x=595.0 y=505.0 length=503;IGHV3-11 279 2 1;;IGHJ2 49 0 1;;;IGHM 99 1 1
TAGTATCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGA ACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGCGAGAGACGATAGCA GTGGCTCCCACGACTATTGGTACTTCGATCTCT…
For example, the first sequence was classified as V=1 and J=10, with a distance of 10 nucleotides between them, and the second as V=45 and J=2, with a distance of 13 nucleotides.
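The assignment step can be sketched in Python as an ungapped placement search. This is a minimal illustration, not the pipeline used here. It assumes that "overlapping nucleotides" means identical bases at aligned positions, and that germline sequences are supplied as a plain name-to-sequence dictionary (in practice they would come from the IgBLAST germline set). The function names and toy sequences are hypothetical.

```python
from typing import Dict, Tuple


def best_ungapped_placement(read: str, germline: str) -> Tuple[int, int]:
    """Slide `germline` along `read` without gaps (no insertions/deletions).

    Returns (matches, offset): the number of identical overlapping
    nucleotides at the best placement, and the read coordinate where
    germline position 0 lands. The offset may be negative if the germline
    starts before the read (e.g. an unsequenced 5' end of VH).
    """
    best_matches, best_offset = -1, 0
    for offset in range(-len(germline) + 1, len(read)):
        matches = sum(
            1
            for i, base in enumerate(germline)
            if 0 <= offset + i < len(read) and read[offset + i] == base
        )
        if matches > best_matches:  # ties keep the earliest placement
            best_matches, best_offset = matches, offset
    return best_matches, best_offset


def assign_v_j_and_distance(
    read: str, v_genes: Dict[str, str], j_genes: Dict[str, str]
) -> Tuple[str, str, int]:
    """Assign the VH and JH with the most identical overlapping nucleotides
    and return (v_name, j_name, L), where L is the number of read positions
    between the 3' end of the placed VH and the 5' start of the placed JH."""
    v_name, (_, v_offset) = max(
        ((name, best_ungapped_placement(read, seq)) for name, seq in v_genes.items()),
        key=lambda item: item[1][0],
    )
    j_name, (_, j_offset) = max(
        ((name, best_ungapped_placement(read, seq)) for name, seq in j_genes.items()),
        key=lambda item: item[1][0],
    )
    v_end = v_offset + len(v_genes[v_name])  # first read position after VH
    distance = j_offset - v_end  # negative if VH and JH placements overlap
    return v_name, j_name, distance


if __name__ == "__main__":
    # Hypothetical toy germlines, not real IGHV/IGHJ sequences.
    v_genes = {"V_a": "ACGTACGTAA", "V_b": "TTTTTTTTTT"}
    j_genes = {"J_a": "TTGGCC", "J_b": "AAAAAA"}
    read = "ACGTACGTAA" + "GGGAA" + "TTGGCC"
    print(assign_v_j_and_distance(read, v_genes, j_genes))  # ('V_a', 'J_a', 5)
```

Ties are resolved in favor of the first gene and earliest placement encountered, which is an arbitrary choice of this sketch.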
To identify sequences originating from the same clone, we first grouped all sequences according to their VH and JH usage and L (the distance between VH and JH). Since somatic hypermutations (SHM) occurring during clonal expansion are not expected to introduce insertions or deletions, all sequences descending from the same founder cell should share the same distance between VH and JH.
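Grouping by VH, JH and L amounts to a dictionary keyed on the three values. The sketch below assumes a hypothetical record layout of (read ID, V name, J name, L) tuples produced by the assignment step; the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (read_id, v_name, j_name, distance)
Assignment = Tuple[str, str, str, int]


def group_by_v_j_distance(
    assignments: Iterable[Assignment],
) -> Dict[Tuple[str, str, int], List[str]]:
    """Group read IDs by (VH, JH, L). SHM changes bases but not length,
    so members of one clone are expected to share all three keys."""
    groups: Dict[Tuple[str, str, int], List[str]] = defaultdict(list)
    for read_id, v_name, j_name, distance in assignments:
        groups[(v_name, j_name, distance)].append(read_id)
    return dict(groups)


if __name__ == "__main__":
    records = [
        ("r1", "V1", "J10", 10),
        ("r2", "V1", "J10", 10),
        ("r3", "V1", "J10", 12),
        ("r4", "V45", "J2", 13),
    ]
    print(group_by_v_j_distance(records))
    # {('V1', 'J10', 10): ['r1', 'r2'], ('V1', 'J10', 12): ['r3'], ('V45', 'J2', 13): ['r4']}
```

Reads with the same VH and JH but a different L (r3 above) fall into separate groups and are therefore never compared when building trees.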
TGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGA GCACAGCCTACATGGAGCCGAGGAGCCTGAGATCCGACGACACGGCCGCGTATTACTGTGCGAGA AGGGACGATTTTTGGAGTGGTTACTACGGT
TGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGA GCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGA GTGGATAGTTCGGGGAGGTACTACTACGGT
TGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGA GCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGA AGGGACGATTTTTGGAGTGGTTACTACGGT
TGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGA GCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGA GACGCCGATTTTTGGAGTGGTTACTACGGT
For example, the group above consists of sequences with V=1, J=10 and a distance of 10 nucleotides between the V and J germline sequences. All sequences within a group were aligned together with an artificial sequence composed of the group's germline V and J sequences, separated by a number of gaps equal to the group's V-J distance. For this group, the artificial VJ sequence was built from germline V=1 and J=10 with 10 gaps between them.
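Building the per-group input for the multiple alignment can be sketched as follows. The function names, the record name `germline_VJ`, and the default gap character are illustrative assumptions. Whether a given aligner keeps or discards gap characters in unaligned input should be checked against its documentation; the `filler` parameter allows a different placeholder character to be used.

```python
from typing import List, Tuple


def artificial_vj(v_germline: str, j_germline: str, distance: int, filler: str = "-") -> str:
    """Germline VH, then `distance` filler characters, then germline JH."""
    if distance < 0:
        # Overlapping VH/JH placements would need trimming; not handled here.
        raise ValueError("negative V-J distance is not handled in this sketch")
    return v_germline + filler * distance + j_germline


def group_fasta(
    reads: List[Tuple[str, str]], v_germline: str, j_germline: str, distance: int
) -> str:
    """FASTA text for one group: the artificial VJ sequence first, then the reads."""
    records = [("germline_VJ", artificial_vj(v_germline, j_germline, distance))] + reads
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    print(artificial_vj("ACGT", "TTGG", 3))  # ACGT---TTGG
```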
The sequences were aligned using MUSCLE 3.6, and a phylogenetic tree was built with one of two methods from the PHYLIP 3.69 package: maximum parsimony for groups containing fewer than 100 sequences, and neighbor joining otherwise. The resulting trees were cut into independent subtrees at branches carrying more than 4 mutations between 2 sequences. Each subtree was defined as a clone.
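The method choice and the tree-cutting rule can be sketched as below. This assumes the PHYLIP output has already been parsed into a rooted tree whose branch lengths are mutation counts (parsing is not shown), and it interprets "more than 4 mutations between 2 sequences" as a single branch carrying more than 4 mutations. `Node`, `tree_method` and `cut_into_clones` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None  # sequence ID for leaves; None for internal nodes
    mutations: int = 0  # mutations on the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def tree_method(n_sequences: int) -> str:
    """Tree-building method chosen by group size, as described in the text."""
    return "maximum parsimony" if n_sequences < 100 else "neighbor joining"


def cut_into_clones(root: Node, max_mutations: int = 4) -> List[List[str]]:
    """Cut every branch carrying more than `max_mutations` mutations and
    return the sorted leaf names of each resulting subtree (one per clone)."""
    clones: List[List[str]] = [[]]  # clone 0 holds the root's subtree
    stack = [(root, 0)]  # (node, index of the clone it belongs to)
    while stack:
        node, clone_idx = stack.pop()
        if node.name is not None:
            clones[clone_idx].append(node.name)
        for child in node.children:
            if child.mutations > max_mutations:
                clones.append([])  # long branch: child starts a new clone
                stack.append((child, len(clones) - 1))
            else:
                stack.append((child, clone_idx))
    return [sorted(clone) for clone in clones if clone]  # drop leafless parts


if __name__ == "__main__":
    tree = Node(children=[
        Node(mutations=1, children=[Node("s1", 2), Node("s2", 1)]),
        Node("s3", 3),
        Node("s4", 7),
    ])
    print(cut_into_clones(tree))  # [['s1', 's2', 's3'], ['s4']]
```

The traversal uses an explicit stack rather than recursion so that deep trees from large neighbor-joining groups do not hit Python's recursion limit. If the artificial germline sequence is a leaf of the tree, it ends up in one of the returned lists and can be filtered out by name.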
For example, sequences L_16, L_13 and L_25 were classified as originating from the same clone, whereas L_15 was assigned to a different clone.