Abstract
Background:
Galectin-3 (LGALS3) is an important glycoprotein involved in the malignant transformation of thyrocytes acting in the extracellular matrix, cytoplasm, and nucleus where it regulates TTF-1 and TCF4 transcription factors. Within LGALS3 gene, a common single-nucleotide polymorphism (SNP) (c.191C>A,
Methods:
A case/control association study in 1223 controls and 1142 unrelated consecutive DTC patients was carried out to evaluate the association between rs4644-P64H and the risk of DTC. We used the nonmalignant cell line Nthy-Ori (rs4644-C/A) and the CRISPR/Cas9 technique to generate isogenic cells homozygous for either rs4644-A/A or rs4644-C/C. The transcriptomes of the derivative and unmodified parental cells were then analyzed by RNA-seq. Differentially expressed genes were validated by quantitative reverse transcription PCR and further tested in the parental Nthy-Ori cells after LGALS3 gene silencing, to investigate whether the expression of target genes depended on galectin-3 levels.
Results:
The rs4644 AA genotype was associated with a reduced risk of DTC (compared with CC, ORadj = 0.66; 95% confidence interval = 0.46–0.93; Pass = 0.02). We found that rs4644 affects galectin-3 as a transcriptional coregulator. Among 34 genes affected by rs4644, HES1, HSPA6, SPC24, and NHS were of particular interest since their expression was rs4644-dependent (CC>AA for the first and AA>CC for the others) also in 574 thyroid tissues of the Genotype-Tissue Expression (GTEx) biobank. Moreover, the expression of these genes was regulated by LGALS3 silencing. Using the proximity ligation assay in Nthy-Ori cells, we found that the interaction between galectin-3 and TTF-1 was genotype dependent.
Conclusions:
Our data show that in thyroid, rs4644 is a trans-expression quantitative trait locus that can modify the transcriptional expression of downstream genes, through the modulation of TTF-1.
Introduction
Genetic studies have shown several loci associated with predisposition to differentiated thyroid carcinoma (DTC) (1–6). However, these loci could explain only a small part of the attributable risk, and more predisposing factors need to be explored (7). Within this context, galectin-3 (encoded by the LGALS3 gene) is an ideal candidate to be evaluated. This protein has a pivotal role in the regulation of basic cellular functions at the nuclear, cytoplasmic, and extracellular level (8). In normal human thyroid, the expression of galectin-3 is low or absent, but it increases with tumor progression as occurs in other types of cancer (9–18).
For this reason, it was studied as an effective serum biomarker for DTC (19–21). The overexpression of galectin-3 is one of the drivers of the malignant transformation of several tissues, including thyroid (22–24). Thus, the activity of galectin-3 drew the interest of many researchers, eliciting a large volume of studies mostly aimed at elucidating its role in the extracellular matrix (25,26). In the nucleus, however, galectin-3 affects gene expression by regulating splicing (27,28) and gene transcription (29,30). Galectin-3 is not itself a transcription factor (TF), but it binds to specific TFs and modulates their activities (31–34). In particular, in breast cancer cells, galectin-3 interacts with the CRE/SP1 sites (32), TCF4 (33,34), and beta-catenin, thereby activating Wnt signaling (35) and the transcription of important cancer genes (32,34,36). Nuclear galectin-3 was shown to be an important mediator of the beta-catenin/Wnt pathway also in thyroid cancer (TC) (37,38). Furthermore, in TC cell lines, galectin-3 interacts with TTF-1 (26) and colocalizes with TCF4 to orchestrate the transcription of downstream genes involved in thyroid differentiation and proliferation (39–41). At codon 64 of galectin-3, there is a common genetic polymorphism (c.191C>A,
The aim of the present work was to evaluate the role of rs4644 as a risk factor for DTC and its effects on the transcriptome in a model of thyrocytes engineered by the CRISPR/Cas9 system.
Materials and Methods
Case/control association study: statistical power, subjects, and genotyping
The case/control association study was carried out in 1142 patients with DTC and 1223 controls presenting consecutively to the University Hospital of Cisanello (Pisa, Italy). Volunteers provided written informed consent, and the study was approved by the local Ethics Committee. Cases and controls completed a self-administered questionnaire, which collected the main covariates at the time of blood sampling. Genotyping was performed with a TaqMan assay. All details are reported in the Supplementary File.
Cell cultures and gene editing
The nonmalignant human thyroid cell line Nthy-ori-3-1 (Nthy-Ori; Sigma-Aldrich, Saint Louis, MO) was used for the in vitro assays and for gene editing. These cells, immortalized with SV40, were grown in medium RPMI 1640 supplemented with 10% fetal bovine serum (EuroClone SpA, Milan, Italy).
Nthy-Ori cells have a heterozygous C/A genotype, which allows conversion into either homozygous genotype (A/A or C/C) with the CRISPR/Cas9 gene editing system in a single step. Gene edited cells were named ORI-AA, ORI-CC, and ORI-wild type (WT). All the experimental details for the creation of engineered cell lines are reported in the Supplementary File.
RNA-seq analysis and validation
Two independent experiments were performed. A total of 400 ng of RNA for each cell line was used to perform RNA-seq for profiling and comparing transcripts (by Eurofins Genomics S.r.L., Vimodrone, Milan). The reads from the three samples were aligned to the human genome hg38/GRCh38 (UCSC) as reference (annotations: Gencode v22, Ensembl 80).
The statistically different RNA-seq genes were also validated by quantitative reverse transcription PCR (RT-qPCR) using beta-actin, GAPDH, and HPRT1 as reference genes. Data were analyzed with Bio-Rad CFX Manager 3.1. The primer sequences are reported in Supplementary Table S1. All experiments were repeated three times.
Silencing of LGALS3 and Western blot analysis
The same genes analyzed by RT-qPCR in ORI-CC, ORI-AA, and ORI-WT cells were tested in unmodified Nthy-Ori cells following LGALS3 gene silencing. Primers are reported in Supplementary Table S1. The silencing of LGALS3 was performed by using 10 nM of a pool of specific siRNAs for LGALS3 (Qiagen, Valencia, CA) (Supplementary Table S1). AllStars siRNA (SI03650318; Qiagen) was used, at the same concentration, as negative control. Neon Transfection System (Thermo Fisher Scientific, Waltham, MA) was used for siRNA electroporation following the manufacturer's protocol. Mouse primary monoclonal anti-LGALS3 (clone 1C1B2, 1:2000; Proteintech, Rosemont) and anti-actin (Clone C4, 1:5000; Millipore, MA) were used as primary antibodies, while IgG-HRP (1:5000; Santa Cruz Biotechnology, Inc., Santa Cruz) was used as a secondary antibody to test the silencing.
Furthermore, we evaluated the level of galectin-3 in ORI-AA and ORI-CC samples, as previously described (49). Anti-galectin-3 antibody (anti-galectin-3 (H160), SC-20157; Santa Cruz Biotechnology, Inc.) was used at 1:1000 dilution. The immunocomplexes were detected using an HRP-conjugated secondary antibody (donkey anti-rabbit 1:10,000), and the immunoblots were developed by using the ECL detection system. Details of the experimental methods are provided in the Supplementary File.
Proximity ligation assay
Protein/protein interaction was evaluated by proximity ligation assay (PLA, Duolink kit; Sigma Aldrich, MO) using specific antibodies to detect protein targets, and specific DNA primers linked to the antibodies. The multistep assay included the hybridization phase followed by DNA amplification with fluorescent probes. The amplified DNA was detected as dots, so a few interacting molecules can produce a visible signal, making the assay highly sensitive. We validated the interaction between galectin-3 and TTF-1 in ORI-AA, ORI-CC, and ORI-WT cells. Primary antibodies against TTF-1 (rabbit 1:50; Cell Signaling Technology) and galectin-3 (clone 1C1B2, Mouse 1:100; Proteintech) were used. Dots were counted in 25 cells per cell line in two independent experiments.
In silico analyses
Five hundred and seventy-four thyroid samples reported in the Genotype-Tissue Expression (GTEx) portal were evaluated to determine any correlation between RNA-seq or RT-qPCR data and the different tissue-specific expressions of genes. Relevant parameters were the normalized effect size (NES), that is, the slope of the linear regression calculated as the effect of the alternative allele, ALT, compared with the reference allele, REF, in the human genome, and the statistical significance of the regression analysis. The transcription factor binding sites within the promoters of the genes of interest were investigated by Gene Promoter Miner tool.
Statistical analyses
Departure from the Hardy–Weinberg equilibrium was tested with the χ2 goodness-of-fit test (one degree of freedom, type-I error = 0.05). The association analysis was performed with multivariate logistic regression analysis. RNA-seq reads were aligned and analyzed with the TopHat and Cufflinks packages. Differentially expressed genes were detected with the cummeRbund package by applying the false discovery rate correction. Details of these analyses are provided in the Supplementary File.
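The Hardy–Weinberg check described above can be illustrated with a minimal sketch. It assumes a biallelic SNP with three genotype counts; the function name `hwe_chi2` and the counts in the example are hypothetical and are not the study data.

```python
import math

def hwe_chi2(n_cc: int, n_ca: int, n_aa: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_cc + n_ca + n_aa
    p = (2 * n_cc + n_ca) / (2 * n)  # frequency of the C allele
    q = 1.0 - p                      # frequency of the A allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_cc, n_ca, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts (C/C, C/A, A/A), NOT taken from the study.
chi2, p_value = hwe_chi2(500, 400, 100)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A p-value above the chosen type-I error (0.05 here) would indicate no detectable departure from equilibrium.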
Results
In Supplementary Figure S1, the flowchart summarizes how the study was carried out. The main results of the case/control association study are reported in Table 1. The population respected the Hardy–Weinberg equilibrium (p = 0.36). The controls were significantly older than the cases (52.98 vs. 42.01), because the control group was mainly composed of volunteer blood donors. Increased risk of DTC was associated with sex (females vs. males: ORadj = 2.93; 95% confidence interval [CI] = 2.53–3.79; Pass < 10−6) and smoking habit (smokers vs. nonsmokers: ORadj = 1.34; CI = 1.13–1.59; Pass = 8.00 × 10−4), whereas no differences were found by body mass index.
Main Covariates of the Population Analyzed in the Case/Control Association Study with Their Odds Ratios and 95% Confidence Intervals
According to the codominant model, the table reports the calculated ORs and CI of the LGALS3-rs4644 genotypes C/A and A/A versus the reference (the common homozygotes C/C) adjusted for the covariates and the results of the MAX trend tests. Association analyses were provided for the total DTC samples and for the subsets of PTC and FTC histotypes.
BMI, body mass index; CI, 95% confidence interval; DTC, differentiated thyroid carcinoma; FTC, follicular thyroid carcinoma; ORs, odds ratios; PTC, papillary thyroid carcinoma.
The uncommon homozygotes AA showed a statistically significant decreased risk of DTC (ORadj = 0.66 [CI = 0.46–0.93]; Pass = 0.02). The heterozygotes showed an intermediate risk with the MAX trend-test statistically significant for the additive model (Ptrend = 4.21 × 10−3). The papillary thyroid carcinoma (PTC) and follicular thyroid carcinoma (FTC) histotypes showed a similar trend with only PTC being statistically significant.
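As a consistency check on the reported estimate, the p-value can be approximately recovered from the odds ratio and its confidence interval. The sketch below assumes that the interval is a symmetric Wald interval on the log-odds scale, which is usual for logistic regression but is not stated in the text; the function name `p_from_or_ci` is hypothetical.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the Wald SE, z statistic and two-sided p-value from an OR and its 95% CI."""
    # Assumption: the interval is log(OR) +/- z_crit * SE on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return se, z, p_value

# Values reported for the AA genotype versus CC.
se, z, p_value = p_from_or_ci(0.66, 0.46, 0.93)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

Under this assumption the sketch yields a p-value of roughly 0.02, in line with the reported Pass.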
We studied whether the LGALS3 rs4644-P64H variants could affect gene transcription. Therefore, Nthy-Ori cells carrying the desired genotypes were engineered by CRISPR/Cas9 (Fig. 1). As control, a replicate of Nthy-Ori cells was subjected to the same process but without the administration of the donor DNA. Figure 2 shows the cDNA sequencing of the ORI-WT (Fig. 2A), of the ORI-CC (Fig. 2B), and of the ORI-AA (Fig. 2C). As shown in Fig. 2D, the splicing was unaffected by the persisting LoxP site within the intron III after the activation of the Cre recombinase. The same sequencing confirmed also the expression of the correct genotype. Furthermore, Western blot showed that the total amount of galectin-3 was expressed at the same level in both the derivative cell lines (Supplementary Fig. S2).

Scheme of the vector used for LGALS3 gene editing, double nickase CRISPR/Cas9 vectors (

Sanger sequencing electropherograms showing the gene editing (
The global transcriptomes of ORI-CC and ORI-AA cells were analyzed by the RNA-seq. About 99% of the reads were mapped to the human reference Homo sapiens genome assembly GRCh38 (hg38) (Genome Reference Consortium;

Volcano plot showing the relative expression (fold-change, Y-axis) of the genes measured in the ORI-AA and in ORI-CC cells and the statistical significance of the comparisons (X-axis). Genes showing a statistically significant different expression level between ORI-CC and ORI-AA cells are circled.
List of 39 Differentially Expressed Genes in ORI-AA and ORI-CC Cells Selected for Further Validation and Experimentation According to the Criteria Specified in the Text
Group A and Group B included genes overexpressed in ORI-AA or ORI-CC. Genes in Group C are involved in thyroid carcinogenesis; Group D lists the genes with the highest differential expression. For each sample, the genes are listed with their expression values in FPKM.
These genes showed almost negligible expression, which did not allow any calculation to be performed.
The values have been replaced with a minimal number (10−6) to allow a rough estimate (denoted with the signs “<” or “>”) of the fold-change.
FC, fold-change; FPKM, fragments per kilobase per million mapped reads.
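The substitution of a minimal value for negligible expression, mentioned in the table notes, can be sketched as follows. Whether the fold-change is reported as a plain ratio or as a log2 ratio is not stated here, so the use of log2 is an assumption, and the function name `log2_fold_change` and the FPKM values are hypothetical.

```python
import math

FLOOR = 1e-6  # minimal value substituted for negligible expression (from the table note)

def log2_fold_change(fpkm_a: float, fpkm_b: float, floor: float = FLOOR):
    """Return (log2 ratio, floored); floored marks rough estimates that relied on the floor."""
    floored = fpkm_a < floor or fpkm_b < floor
    a = max(fpkm_a, floor)
    b = max(fpkm_b, floor)
    return math.log2(a / b), floored

# Hypothetical FPKM values, not taken from Table 2.
print(log2_fold_change(8.0, 2.0))  # both samples expressed
print(log2_fold_change(0.0, 4.0))  # one sample below the floor: only a bound ("<" or ">")
```

Returning the flag alongside the ratio keeps floored estimates distinguishable from measured ones, mirroring the "<" and ">" signs used in the table.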
To increase the power of the study to detect true positives, we included 15 additional low-confidence genes for replication. The first eight were chosen because they were statistically significant at a nominal level of 0.05 and were known to be involved in thyroid carcinogenesis (Table 2, Group C: NOTCH3, DNMT3B, CTGF, CRYAB, IL1A, TERC, TNFAIP6, and DKK1). The remaining seven (Table 2, Group D: C2orf16, CTB-32H22.1, DUSP6, HSPA6, NHS, SPC24, and SPATA5) were not statistically significant at any level but showed the highest differential expression (fold-change >3.00 or <−3.00, arbitrary threshold). The complete comparisons among cell lines are reported in Supplementary Table S2. The selected 39 genes underwent validation with RT-qPCR, which confirmed the results of the RNA-seq analysis with the exception of 5 genes in groups C and D, that is, HSPB8, TNFAIP6, CTB-32H22.1, DUSP6, and SPATA5, which displayed an opposite trend compared with the RNA-seq and were not considered further (Supplementary Table S3).
To evaluate whether the expression of the remaining 34 genes could correlate with the number of variant alleles, a simple linear regression model was used. The gene expression level was considered the dependent variable and the ORI-AA, ORI-WT, and ORI-CC cell lines (carrying 2, 1, and 0 variant alleles, respectively) as the independent quantitative variable. This analysis showed that the expression of 29 genes correlated with the allele dosage (Table 3), with ORI-WT intermediate between ORI-AA and ORI-CC cells. The expression of the remaining five genes (ANO2, GABBR1, RP11-672A2.1, C2orf16, and NHS) in ORI-WT was similar to that found in ORI-AA or ORI-CC, suggesting that, in some cases, the heterozygous state behaves like a homozygous one, as generally occurs for dominant traits.
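The allele-dosage regression can be illustrated with a minimal ordinary least squares sketch that returns the intercept, slope, and standard error of the slope. The dosage coding follows the main text (ORI-CC = 0, ORI-WT = 1, ORI-AA = 2 variant alleles); the function name `dosage_regression` and the expression values are hypothetical.

```python
import math

def dosage_regression(dosage, expression):
    """Ordinary least squares of expression on allele dosage.

    Returns the intercept, the slope and the standard error of the slope.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(dosage, expression))
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    return intercept, slope, se_slope

# Hypothetical relative expression values, three replicates per cell line.
dosage = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # ORI-CC, ORI-WT, ORI-AA
expression = [1.0, 1.1, 0.9, 1.6, 1.5, 1.4, 2.0, 2.1, 1.9]

intercept, slope, se_slope = dosage_regression(dosage, expression)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, se = {se_slope:.3f}")
```

A slope clearly different from zero relative to its standard error corresponds to expression that tracks the number of variant alleles, with the heterozygous line falling between the two homozygous lines.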
List of 34 Genes Analyzed by Real-Time qPCR
The statistical significance refers to the linear regression model using the allele dosage of rs4644 as the independent variable, with ORI-AA = 0, ORI-WT = 1, and ORI-CC = 2 variant alleles. The model is reported as intercept, slope, and standard error (se) of the slope.
se, standard error; WT, wild type.
The parental Nthy-Ori cells were treated with control siRNA (siCtrl) or silenced with siLGALS3 (Supplementary Fig. S3), and the mRNA expression of the 34 target genes was evaluated by RT-qPCR. Following LGALS3 gene silencing, 22 genes showed a statistically significant differential expression after the correction for multiple testing, whereas 2, CTGF and RNU1-27P, were significant at the nominal level of p = 0.05 (Fig. 4).

Genes differentially expressed between cells treated with siCtrl and siLGALS3 (*q-value <0.05; **q-value <0.01; ***q-value <0.001).
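The distinction between nominal significance and significance after correction for multiple testing can be sketched with a false discovery rate adjustment. The text does not name the procedure used to obtain q-values, so the Benjamini–Hochberg method below is an assumption; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical p-values, not taken from Fig. 4.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
q_values = benjamini_hochberg(p_values)
for p, q in zip(p_values, q_values):
    print(f"p = {p:.3f}  q = {q:.5f}")
```

In this example the third and fourth tests are below 0.05 at the nominal level but their q-values exceed 0.05, which is the kind of situation described for genes that are significant only nominally.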
When the promoters of the deregulated genes were analyzed, GPMiner showed 11 TFs in common (Supplementary Table S4). Among them, TTF-1 and TCF4 were the most interesting given their known interaction with galectin-3. By PLA, we showed that the interaction between TTF-1 and galectin-3 was genotype dependent with average dots of 3.98 ± 0.36 for ORI-AA and 1.66 ± 0.65 for ORI-CC (p-value = 0.04) (Supplementary Fig. S4). We evaluated further the relationship between rs4644-P64H genotypes and the expression of the 22 target genes in 574 thyroid tissues using the GTEx project data set. Our in vitro findings were confirmed for HES1 (p-value = 0.029; NES = −0.083), HSPA6 (p-value = 0.0097; NES = 0.095), NHS (p-value = 0.027; NES = 0.087), and SPC24 (p-value = 0.031; NES = 0.098) (Supplementary Fig. S5).
Discussion
We analyzed the rs4644 single-nucleotide polymorphism (SNP), which is completely conserved across phylogenetically distant organisms and was predicted by previous studies (42–44) to affect the activity of the encoded galectin-3. We found that the A-allele (H64) was associated with a statistically significant decreased risk of DTC following adjustment for covariates. This SNP has also been found to be associated with the risk of prostate, cervical, breast, and gastric carcinoma, and glioma (35,37–39,50,51). In particular, in prostate cancer and glioma, the A-allele showed a protective effect as observed here in DTC (48,51).
Since nuclear galectin-3 coregulates gene expression, we determined whether there could be differences in gene regulation due to rs4644, providing a rationale for the predisposition to DTC. We found that 39 genes were differentially expressed; most of them were dependent on the allele dosage. These genes belong to processes (apoptosis: BIRC3, BAG3, HSPA1B; angiogenesis: C3, HSPB1; adhesion: C3, SPP1, EGR1) or signaling pathways (beta-catenin/Wnt: NFATC4, C3; MAPK: DUSP6, IL1A, HSPA6, HSPA1B, HSPB1; PI3K-Akt: SPP1; TNF: BIRC3) often deregulated in tumors.
The CC-genotype was associated with increased expression of genes related to the initiation and progression of pancreas ductal adenocarcinoma (HES1) (52), endometrial carcinoma (ANO2) (53), prostate/skin/kidney cancer (EGR1) (54,55), and thyroid carcinoma (DUSP6) (56,57). The protective AA-genotype was associated with increased expression of genes with antimalignant effects in colorectal cancer (GABBR1) (58) and TC (NOTCH3) (59), or with tumor suppressor activities (CRYAB and IL1A) (60–62). Among the 39 genes, 22 were shown to be also dependent upon the expression level of galectin-3 in gene silencing experiments.
The analysis of the promoter regions of the 22 genes showed that there are common binding sites for 11 transcription factors, among which were TTF-1 and TCF4. These are well-known TFs interacting with galectin-3 in the nucleus of thyrocytes (39). These previous observations were further corroborated by the PLA analysis, showing an increase in interaction between TTF-1 and galectin-3 in ORI-AA compared with the ORI-CC cells, providing the mechanism for the transcriptomic differences observed in the isogenic cell lines. Kim et al. (50) demonstrated that the germ line variant H64 increases nuclear accumulation of beta-catenin promoting TCF4 transcriptional activity, reinforcing the notion that also TCF4 may be involved in the interaction with galectin-3.
Further analysis of 574 thyroid tissues from the GTEx portal confirmed that the rs4644 A-allele was associated with increased expression of HSPA6, NHS, and SPC24, and the C-allele with higher expression of HES1, supporting the view that the SNP behaves as a true trans-eQTL. HSPA6 is a molecular chaperone involved in the protection of the proteome from stress, and the folding and transport of newly synthesized polypeptides. A study of 552 DTC cases and 752 controls showed that the SNP rs9427401 within HSPA6 was associated with increased risk of DTC (63). SPC24 is involved in the mitotic checkpoint pathway and TC tissues showed high levels of the protein (64). SPC24 knockdown in anaplastic TC cell lines inhibited cell growth and invasiveness. In nude mice xenograft models, these cells formed smaller tumors compared with SPC24 expressing cells (64). HES1 encodes a transcriptional repressor involved in cell differentiation, proliferation, invasion, metastasis, and progression of several types of cancer, including TC (65,66). In thyroid-derived TPC-1, BCPAP, and 8505C cell lines, it has been shown that Hes1 belongs to the Notch signaling pathway, one of the most important pathways for the control of differentiation, proliferation, and apoptosis of thyrocytes (67,68). The expression of NOTCH3, the most important regulator of the Notch pathway in thyroid (69), was found to be dependent on LGALS3 genotypes and galectin-3 levels in our study.
In conclusion, LGALS3 genotypes are associated with susceptibility to DTC. LGALS3 genotypes lead to changes in the expression of downstream genes. H64 and P64 galectin-3 have different binding affinities for transcription factors, in particular for TTF-1 and TCF4.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This work was supported by the regular budget of the University of Pisa (Ex60%).
Supplementary Material
Supplementary File
Supplementary Figure S1
Supplementary Figure S2
Supplementary Figure S3
Supplementary Figure S4
Supplementary Figure S5
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
Supplementary Table S4
