Clustering TCR datasets

Here, we analyze common patterns in TCR alpha sequences and look specifically for a recently discovered CDR sequence motif published by Mudd et al.:

SIFNT LYKAGEL CA[G/A/V]XNYGGSQGNLIF
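
As a minimal sketch of how this motif could be checked programmatically, the Python snippet below assumes that the three parts are the CDR1, CDR2 and CDR3 sequences in that order, that [G/A/V] means one of G, A or V, and that X stands for any single amino acid. The function name matches_motif is hypothetical and not part of the tool.

```python
import re

# Assumed reading of the motif (not confirmed by the tool's documentation):
#   CDR1 = SIFNT, CDR2 = LYKAGEL, CDR3 = CA[G/A/V]XNYGGSQGNLIF
# where X is any one of the 20 standard amino acids.
CDR1 = "SIFNT"
CDR2 = "LYKAGEL"
CDR3_PATTERN = re.compile(r"CA[GAV][ACDEFGHIKLMNPQRSTVWY]NYGGSQGNLIF")


def matches_motif(cdr1, cdr2, cdr3):
    """Return True if all three CDRs match the motif exactly (hypothetical helper)."""
    return (
        cdr1 == CDR1
        and cdr2 == CDR2
        and CDR3_PATTERN.fullmatch(cdr3) is not None
    )


if __name__ == "__main__":
    print(matches_motif("SIFNT", "LYKAGEL", "CAGANYGGSQGNLIF"))   # exact-length CDR3
    print(matches_motif("SIFNT", "LYKAGEL", "CAGAANYGGSQGNLIF"))  # one residue longer
```

Because fullmatch requires the whole CDR3 to fit the pattern, a CDR3 that is longer than the motif is rejected by this strict check.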

The final results of the COVID19 clustering can be accessed here. The healthy clustering results are accessible here.

Clustering COVID19 data

First, we will have a look at COVID19 datasets. Access the Cluster page and choose “TCR alpha” as Clustering Mode. The datasets table will update with the available datasets. Select the following via the checkbox in the last column:

  • Bacher-2020

  • Bieberich-2021

  • Liao-2020

  • Meckiff-2020

  • Notarbartolo-2021

  • Ramaswamy-2021

  • Sureshchandra-2021

  • Wen-2020

  • ZhangF-2020

  • ZhangJY-2020

Make sure to set sequence identity to 90 and coverage to 90. The input form should look like this:

../_images/tcr-cluster-form-covid.png
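
To give a feel for what these two thresholds mean, here is a small Python illustration. The exact definitions used by the clustering backend are not stated here, so the sketch assumes one common convention: identity is the percentage of matching residues among aligned (non-gap) columns, and coverage is the percentage of the longer sequence that is aligned. The function identity_and_coverage and the "-" gap character are assumptions made for this example.

```python
def identity_and_coverage(aligned_a, aligned_b, gap="-"):
    """Compute (identity %, coverage %) for two pre-aligned sequences.

    Assumed definitions (illustrative only, may differ from the tool):
      identity = matching columns / columns where both sequences have a residue
      coverage = columns where both have a residue / length of the longer sequence
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0, 0.0

    matches = sum(1 for x, y in pairs if x == y)
    len_a = sum(1 for x in aligned_a if x != gap)
    len_b = sum(1 for y in aligned_b if y != gap)

    identity = 100.0 * matches / len(pairs)
    coverage = 100.0 * len(pairs) / max(len_a, len_b)
    return identity, coverage


if __name__ == "__main__":
    # One substitution in 15 residues, no gaps.
    print(identity_and_coverage("CAGANYGGSQGNLIF", "CAVANYGGSQGNLIF"))
    # One extra residue in the second sequence.
    print(identity_and_coverage("CAGA-NYGGSQGNLIF", "CAGAANYGGSQGNLIF"))
```

Under these assumed definitions, both example pairs would exceed thresholds of 90 for identity and coverage.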

Click “Cluster” and wait for the result to appear. This should only take a few minutes. The results should look like this:

../_images/tcr-cluster-results-covid.png

We can see that many clusters, including the largest one, exhibit motifs from invariant TCRs (i.e., MAIT-like and iNKT cells). The second-largest cluster, however, contains the above-mentioned public Spike-protein-targeting motif. As with the Search function, additional metadata can be downloaded by clicking the Download Expanded Results button. There are a few more clusters of interest, which we can find by filtering the table by the expected CDR sequences (in the upper right corner). Note that some of these do not conform to the motif definition because they are longer:

../_images/tcr-cluster-mudd-covid.png

Clustering healthy data

For comparison, let’s also have a look at healthy (pre-pandemic) data. To run the clustering yourself, select the following datasets on the Cluster page:

  • Bacher-2020

  • Gao-2022

  • Luo-2022

  • Notarbartolo-2021

  • Ramaswamy-2021

  • Sureshchandra-2021

  • Wen-2020

  • ZhangF-2020

  • ZhangJY-2020

The input form should look like this:

../_images/tcr-cluster-form-healthy.png

After a few minutes, the following results should appear:

../_images/tcr-cluster-results-healthy.png

Again, we see that most clusters have invariant receptors. This time, no major clusters exhibit the public Spike-protein-targeting motif. We can find some smaller ones by filtering the table by the expected CDR sequences. As it turns out, only one cluster contains the correct sequence motif:

../_images/tcr-cluster-mudd-healthy.png