Clustering TCRαβ and GEX

This example shows how to cluster T cells using joint information from TCRαβ sequences and gene expression (GEX) data. We use the human huARdb v2 reference dataset as an example.

Load the reference dataset

import tcr_deep_insight as tdi
import torch

gex_reference_adata = tdi.data.human_gex_reference_v2()

Searching for cGxTr clusters without any constraints

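Search for TCR clusters without considering disease type or HLA information. Clustering is based on the similarity of the TCR sequences together with the GEX data. The calls below take tcr_reference_adata as input; it is assumed to be the TCR-level AnnData object for the reference dataset, and its construction is not shown in this example.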

tdi_cluster_result = tdi.tl.cluster_tcr(
  tcr_reference_adata,
  max_distance=4.,
  max_cluster_size=100,
  n_jobs=16
)
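
To give some intuition for max_distance and max_cluster_size, the sketch below applies a distance cutoff and a size cap around a single anchor point in a toy 2-D embedding. It is an illustration only: the Euclidean metric, the toy coordinates, and the helper name neighbors_within are assumptions, and the search that cluster_tcr actually performs may differ.

```python
import math

def neighbors_within(points, anchor_idx, max_distance, max_cluster_size):
    """Return indices of points within max_distance of the anchor,
    nearest first, truncated to max_cluster_size (anchor included)."""
    anchor = points[anchor_idx]
    # Pair each distance with its index so that sorting orders by distance.
    dists = [(math.dist(anchor, p), i) for i, p in enumerate(points)]
    within = sorted(d for d in dists if d[0] <= max_distance)
    return [i for _, i in within[:max_cluster_size]]

# Toy embedding (made up): three nearby points and one distant outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (10.0, 10.0)]

members = neighbors_within(points, anchor_idx=0, max_distance=4.0, max_cluster_size=100)
capped = neighbors_within(points, anchor_idx=0, max_distance=4.0, max_cluster_size=2)
print(members)  # the outlier at (10, 10) is excluded by the distance cutoff
print(capped)   # the size cap keeps only the two nearest points
```

A smaller max_distance yields tighter clusters, while max_cluster_size bounds how many neighbors a single cluster can absorb.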

Searching for cGxTr clusters with at least one common HLA allele

If HLA information is available in the .obs table of the TCR AnnData object, it can be included in the clustering.

# If HLA information is available in tcr_reference_adata.obs:
# map each HLA locus to the .obs columns holding its two alleles
include_hla_keys = {
  'A': ['A_1','A_2'],
  'B': ['B_1','B_2'],
  'C': ['C_1','C_2']
}
# Per locus, map each row index to that cell's pair of alleles.
# hla_map is built for inspection only; it is not passed to cluster_tcr below.
hla_map = {
  key: dict(
    zip(
      range(len(tcr_reference_adata.obs)),
      tcr_reference_adata.obs.loc[:,val].to_numpy()
    )
  ) for key,val in include_hla_keys.items()
}

tdi_cluster_result_hla = tdi.tl.cluster_tcr(
  tcr_reference_adata,
  max_distance=4.,
  max_cluster_size=100,
  n_jobs=16,
  include_hla_keys=include_hla_keys
)
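
To make the shape of hla_map concrete, here is a minimal sketch that builds it from a toy pandas DataFrame standing in for the .obs table. The allele values and the helper share_allele are hypothetical and only illustrate what "at least one common HLA allele" could mean; whether cluster_tcr compares alleles per locus or across loci is not specified here.

```python
import pandas as pd

# Toy stand-in for the .obs table; allele values are made up for illustration.
obs = pd.DataFrame({
    "A_1": ["A*02:01", "A*01:01", "A*02:01"],
    "A_2": ["A*24:02", "A*03:01", "A*11:01"],
    "B_1": ["B*07:02", "B*08:01", "B*15:01"],
    "B_2": ["B*40:01", "B*44:02", "B*35:01"],
})
include_hla_keys = {"A": ["A_1", "A_2"], "B": ["B_1", "B_2"]}

# Same construction as in the tutorial: locus -> {row index -> array of two alleles}.
hla_map = {
    key: dict(zip(range(len(obs)), obs.loc[:, cols].to_numpy()))
    for key, cols in include_hla_keys.items()
}

def share_allele(i, j):
    """Hypothetical helper: True if rows i and j share an allele at any locus."""
    return any(set(hla_map[k][i]) & set(hla_map[k][j]) for k in hla_map)

print(list(hla_map["A"][0]))  # the two HLA-A alleles of row 0
print(share_allele(0, 2))     # rows 0 and 2 both carry A*02:01
print(share_allele(0, 1))     # rows 0 and 1 have no allele in common
```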

Searching for cGxTr clusters with unique disease type

If the .obs table contains a disease-type column, pass its name as label_key to include disease type in the clustering. TCRs in the same cluster are then constrained to have the same disease type. For more details, see the tcr_deep_insight.tl.cluster_tcr() function documentation.

tdi_cluster_result_disease = tdi.tl.cluster_tcr(
  tcr_reference_adata,
  label_key='disease_type',
  max_distance=4.,
  max_cluster_size=100,
  n_jobs=16
)
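
The disease-type constraint can be sanity-checked on any table of cluster assignments. The sketch below is a generic check on toy data, not part of the library: the function name clusters_with_mixed_labels, the cluster IDs, and the labels are all hypothetical, and it makes no assumption about the structure of the object returned by cluster_tcr.

```python
from collections import defaultdict

def clusters_with_mixed_labels(cluster_ids, labels):
    """Return {cluster_id: sorted labels} for clusters holding more than one label."""
    seen = defaultdict(set)
    for cid, lab in zip(cluster_ids, labels):
        seen[cid].add(lab)
    return {cid: sorted(labs) for cid, labs in seen.items() if len(labs) > 1}

# Toy assignments (made up): every cluster holds a single disease label.
cluster_ids = ["c0", "c0", "c1", "c1", "c2"]
labels = ["disease_A", "disease_A", "disease_B", "disease_B", "disease_A"]
violations = clusters_with_mixed_labels(cluster_ids, labels)
print(violations)  # empty dict: the constraint holds

# A cluster mixing two labels is reported.
mixed = clusters_with_mixed_labels(["c0", "c0"], ["disease_A", "disease_B"])
print(mixed)
```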

Searching for cGxTr clusters with constraints on TCRs

We can also require TCRs in the same cluster to share the same TRBV gene segment, the same CDR3β length, or both.

tdi_cluster_result_tcr = tdi.tl.cluster_tcr(
  tcr_reference_adata,
  label_key='disease_type',
  max_distance=4.,
  max_cluster_size=100,
  same_trbv=True,
  same_cdr3b_length=True,
  n_jobs=16
)

Equivalent constraints are available for the alpha chain through the arguments same_trav and same_cdr3a_length.
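
One way to picture the β-chain constraints: TCRs are only eligible to fall in the same cluster if they land in the same (TRBV, CDR3β length) bucket. The sketch below groups a few toy TCRs this way. The records and field names are made up for illustration, and the bucketing is a conceptual picture rather than the library's implementation.

```python
from collections import defaultdict

# Toy TCR records (hypothetical IDs and sequences).
tcrs = [
    {"id": "t1", "TRBV": "TRBV19", "CDR3b": "CASSIRSSYEQYF"},
    {"id": "t2", "TRBV": "TRBV19", "CDR3b": "CASSIRSAYEQYF"},
    {"id": "t3", "TRBV": "TRBV19", "CDR3b": "CASSIRSSYEQYFG"},
    {"id": "t4", "TRBV": "TRBV7-9", "CDR3b": "CASSLRSSYEQYF"},
]

# Bucket by (TRBV gene, CDR3b length); only TCRs in one bucket may co-cluster.
buckets = defaultdict(list)
for tcr in tcrs:
    buckets[(tcr["TRBV"], len(tcr["CDR3b"]))].append(tcr["id"])

for key, ids in sorted(buckets.items()):
    print(key, ids)
```

Here t1 and t2 share a bucket, t3 is separated by its longer CDR3β, and t4 is separated by its TRBV gene.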

Combining the clustering constraints

The constraints can be combined.
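
As a sketch of what a combined call might look like, the keyword arguments used in the earlier examples can be collected in one place and passed together. All argument names come from the examples above; that every combination is accepted in a single call is an assumption based on the statement above, and the wrapper run_combined_clustering is a hypothetical helper.

```python
# Keyword arguments gathered from the individual examples in this tutorial.
cluster_kwargs = dict(
    label_key="disease_type",          # same disease type within a cluster
    include_hla_keys={                 # .obs columns holding the HLA alleles
        "A": ["A_1", "A_2"],
        "B": ["B_1", "B_2"],
        "C": ["C_1", "C_2"],
    },
    max_distance=4.0,
    max_cluster_size=100,
    same_trbv=True,                    # beta-chain constraints
    same_cdr3b_length=True,
    same_trav=True,                    # alpha-chain constraints
    same_cdr3a_length=True,
    n_jobs=16,
)

def run_combined_clustering(tcr_reference_adata):
    """Hypothetical wrapper: call cluster_tcr with all constraints at once."""
    import tcr_deep_insight as tdi  # imported lazily; only needed when called
    return tdi.tl.cluster_tcr(tcr_reference_adata, **cluster_kwargs)
```

Each added constraint narrows which TCRs may be grouped together, so a combined call will generally return fewer or smaller clusters than an unconstrained one.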

For more information, please refer to the tcr_deep_insight.tl.cluster_tcr() function documentation.