Khai Pang Leong1,3, Mei Yun Yong1, Liuh Ling Goh2, Chia Mun Woo1, Chia Wei Lim3, Ee Tzun Koh1

1Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Tan Tock Seng, Singapore
2Molecular Diagnostic Laboratory, Tan Tock Seng Hospital, Tan Tock Seng, Singapore
3Personalized Medicine Service, Tan Tock Seng Hospital, Tan Tock Seng, Singapore

Keywords: Genetics, rheumatoid arthritis, sequencing


Objectives: This study aims to uncover variants of large effect size and allele frequency below 5% by sequencing all extant genes associated with rheumatoid arthritis (RA) in a homogeneous patient cohort.

Patients and methods: This retrospective study was conducted between January 2001 and December 2017. We selected Chinese RA patients positive for anti-citrullinated peptide antibody (ACPA). All the 128 known candidate genes identified through genome-wide association studies were sequenced in 48 RA patients (15 males, 33 females; mean age 53.32±8.98 years; range, 32 to 75 years) and 45 controls (11 males, 34 females; mean age 32.18±9.54; range, 21 to 57 years). The exonic regions of these genes were sequenced. The resultant data were analyzed for association using single variant association and pathway-based association enrichment tests. The genetic burden due to low-frequency variants was assessed with the C-alpha test. The candidate variants that showed significant association were validated in a larger cohort of 500 RA cases (71 males, 429 females; mean age 48.6±12.2 years; range, 24 to 92 years) and 500 controls (66 males, 434 females; mean age 32.3±10.1 years; range, 21 to 73 years).

Results: Thirty-nine variants in 21 genes were identified using single variant association analysis and C-alpha test, with stepwise filtering. Among these, the missense variant in interleukin-6 signal transducer (IL-6ST) 5:55260065 (p.Cys47Phe) was significantly associated with RA in Chinese patients in Singapore.

Conclusion: Our results suggest that a mutation in IL-6ST (5:55260065) confers risk of RA in Chinese patients in Singapore.


Rheumatoid arthritis (RA) is a chronic systemic inflammatory disease manifesting as joint pain, swelling, and destruction. The discovery of the association of RA and human leukocyte antigen (HLA) Dw4 suggested a genetic basis for the pathogenesis.[1] Subsequently, most of the disease-susceptibility genetic variants have been identified through genome-wide association studies (GWASs). Hitherto, more than 100 genes have been implicated in the pathogenesis of RA.[2]

The heritability of RA is estimated to be 55%,[3] but only 65% of this, or 36% of the total disease liability, can be explained by summing the effects of the single nucleotide polymorphisms (SNPs).[4] This missing heritability has been explained by rare variants of large effect size.[5] However, sequencing the exons of 25 genes in six autoimmune diseases failed to reveal any disease-associated rare variants.[6] Diogo et al.[7] sequenced the exons of 25 RA-associated genes in 500 RA cases and 650 controls of European ancestry and found two susceptibility variants, interleukin (IL)-2 receptor alpha and IL-2 receptor beta. Bang et al.[8] failed to identify any rare variants after sequencing the exons of 398 genes, including 106 known RA loci, in 1,217 Korean RA patients and 717 controls. Li et al.[9] sequenced the exomes of 58 RA patients and 66 controls and identified five susceptibility genes, NCR3LG1, RAP1GAP, CHCHD5, HIPK2, and DIAPH2.

We decided to sequence the exons of all 128 candidate genes known to be associated with RA together with their flanking non-coding regions, in a homogeneous patient cohort. We were interested in identifying novel risk variants of large effect size. We obtained these genes for targeted sequencing from a meta-analysis of 22 studies[10] and the RAvariome database.[11] Thus, in this study, we aimed to uncover variants of large effect size and allele frequency below 5% by sequencing all extant genes associated with RA in a homogeneous patient cohort.

Patients and Methods

This retrospective study was conducted at Tan Tock Seng Hospital between January 2001 and December 2017. The RA patients and healthy controls were derived from our prospective RA Registry and Healthy Control Tissue Bank, respectively. All participants were at least 21 years of age at study entry and fulfilled the 1987 American College of Rheumatology or the American College of Rheumatology/ European League Against Rheumatism 2010 classification criteria for RA.[12,13] We selected patients of Chinese descent who were positive for the anti-cyclic citrullinated peptide antibody (ACPA) to minimize heterogeneity.[14] We sequenced the 128 genes in 48 RA patients (15 males, 33 females; mean age 53.32±8.98 years; range, 32 to 75 years) (group 1A) and 45 healthy controls (11 males, 34 females; mean age 32.18±9.54; range, 21 to 57 years) (group 1B) to discover potential susceptibility variants. Subsequently, we verified these RA-susceptibility single nucleotide variants (SNVs) by genotyping an independent group of 500 Chinese subjects with RA (71 males, 429 females; mean age 48.6±12.2 years; range, 24 to 92 years) (group 2A) and 500 healthy controls (66 males, 434 females; mean age 32.3±10.1 years; range, 21 to 73 years) (group 2B). The study protocol was approved by the National Healthcare Group Ethics Committee (NHG DSRB 2014/01141). A written informed consent was obtained from each participant. The study was conducted in accordance with the principles of the Declaration of Helsinki.

We calculated the power of our study with 48 patients and 45 controls and significance level of 5%.[15] If the susceptibility variant has a prevalence of 5% in the controls and 20% in the patients, the power is 62.4%. If the variant has a prevalence of 5% in the controls and 30% in the patients, the power is then 92%. If the variant has a prevalence of 2% in the controls and 20% in the patients, the power is 83.1%.

Library preparation of the genomic deoxyribonucleic acid was performed with the NimbleGen SeqCap EZ kit (Roche, Penzberg, Germany). The targeted next-generation sequencing (NGS) panel was designed to capture the exons and the 5’ and 3’ untranslated region (UTR) up to 2kb upstream (Table 1). We sequenced the coding exons and flanking noncoding regions using Illumina MiSeq (San Diego, CA, USA). Amplification of the libraries was performed and assessed using a bioanalyzer (Agilent Technologies Inc., Santa Clara, CA, USA) and Qubit fluorometer (ThermoFisher Scientific, Waltham, MA, USA). The indexed samples were pooled at a final concentration of 8 pmol/L, and then parallel sequenced using MiSeq.

The resulting FASTQ files were aligned to the human reference genome (hg19), followed by duplicate removal by Picard Tools version 1.129. The Genome Analysis Toolkit version 3.3 was used for local realignment around indels, base recalibration, and variant recalibration and genotyping. Poor quality reads were removed and variants were selected based on quality score >30, coverage >10, and Hardy-Weinberg equilibrium (HWE) >0.001. Variants were annotated with dbSNP, UCSC, ClinVar, OMIM, KEGG, Pfam, Ensembl, ESP, 1000G, CADD, Polyphen, SIFT, ENCODE, HPRD, COSMIC, GERP, FitCons, and VISTA database using GEnome MINIing (GEMINI). The single-variant GWAS was performed using standard allelic testing implemented in PLINK tool. For multiple variant testing, the C-alpha variance-component test implemented in GEMINI was used with multiple allele frequencies thresholds.[16]

Candidate variants were prioritized through association analysis, selecting those with minor allele frequency (MAF) below 5% or predicted to be protein altering (i.e., missense, nonsense, frameshift or canonical splice site changes). Validation was performed in 500 cases and 500 controls using Sequenom’s MassARRAY system (Agena Biosciences, San Diego, CA, USA). One SNP, rs11567882 (GATA3), was genotyped using TaqMan (Applied Biosystems, Foster City, CA, USA) in the same set of 1,000 samples.

Statistical analysis

We calculated the power of our study with 48 patients and 45 controls and significance level of 5%.[15] If the susceptibility variant has a prevalence of 5% in the controls and 20% in the patients, the power is 62.4%. If the variant has a prevalence of 5% in the controls and 30% in the patients, the power is then 92%. If the variant has a prevalence of 2% in the controls and 20% in the patients, the power is 83.1%.


The overall target enrichment and NGS yielded an average of 1,472,141 reads; 88% of these reads mapped to the targeted regions. The mean read length was 148.57 and GC content was 44.34%. After excluding variants that were off target, coverage lower than 10x and quality score below 30, we obtained a set of 3,512 variants. Among these, 2,193 (62.4%) were low-frequency variants with MAF <5%; of which 1,237 (35.2%) were novel to the 1,000 genome project dataset.

We used three strategies for identifying potential susceptibility variants. First, we examined 148 variants with significant association (p<0.05); 70 of them were low-frequency variants of MAF <5%. Fifty-three of the variants reside in the non-coding region, upstream of 5’UTR or 3’UTR. Two long non-coding ribonucleic acid variants and six synonymous variants were excluded. We further removed 32 HLA-DRB1 variants for subsequent analysis, except for rs9269688 (odds ratio [OR] 2.49), rs3180268 (OR 2.97), and rs77637983 (OR 0.43). Second, we conducted a pathway enrichment analysis using KEGG. Loss-of-function variants were identified in HLA-DRB1 rs9269957 and rs9269958 of the control group. We did not observe any other pathway for coding variants for RA in which a significant enrichment existed at p<0.05. Third, we investigated gene-based tests for the potential involvement of rare variants and RA susceptibility by using the C-alpha test. We identified an association between low-frequency variants and RA in five genes with p< 0.05. There were nine non-synonymous low-frequency variants associated with RA. Other than PXK rs199881366, the other variants appeared to be protective. Thirty-nine low-frequency variants with significant association were identified for subsequent validation (Table 2).

In group 2A of 500 ACPA-positive RA patients, 85.5% were female, the baseline disease activity score-28 was 3.11±1.24, and 91.33% were positive for the rheumatoid factor. Group 2B consisted of healthy controls, of whom 13.2% were female.

Out of the 39 SNVs shortlisted for validation, 18 failed the MassARRAY multiplex assay design. Twenty-two SNVs were validated in 500 cases and controls, with 21 SNVs genotyped using iPLEX mass spectrometry (Agena Bioscience, San Diego, CA, USA) and one SNP using TaqMan chemistry (Table 3). On the MassARRAY system, results for three HLA-DRB1 SNVs and AHNAK2 were unsatisfactory due to cluster skewing and inefficient primer binding. Eight SNVs were found to be monomorphic. One SNV showed statistically significant association: the missense variant in IL-6 signal transducer (IL-6ST) rs777853685 (p.Cys47Phe) (p=0.0194).


We sequenced 128 known RA susceptibility genes to determine if there are low-frequency variants within the coding and flanking noncoding regions and found one novel variant in IL-6ST. This gene was first identified as a susceptibility factor through meta-analysis.[17] Our variant is ~171kb away from the previously reported rs6859219.

The IL-6 is produced in the synovium of RA patients.[18,19] Variants in the IL-6 gene are associated with RA. IL-6ST is a 130 kDa signal-transducing component of the IL-6 receptor for IL-6. It is also involved in the signalling of ciliary neurotrophic factor, leukemia inhibitory factor, and oncostatin M. IL-6ST variant at position 5:55260065 (rs777853685) has a low population frequency of 0.3% in Genome Aggregation Database and multiple lines of computational evidence support a deleterious effect on gene product with Deleterious Annotation of Genetic Variants score of 0.8728. The alteration at this position represents a polar amino acid (cysteine) substitution with a non-polar hydrophobic residue (phenylalanine), which is likely to impact secondary protein structure. A pLI 0.998 score indicates that the gene is extremely intolerant to loss of function mutation. A gain-of-function variant of the other 80 kDa component of the IL-6 receptor is associated with RA.[20] How a loss-of-function mutation in IL-6ST contributes to the pathogenesis of RA is unknown.

The problem of the missing heritability of RA remains unresolved. Alternative explanations are that the rare variants lie outside the known susceptibility genes in the intronic regions, or that we have underestimated the contribution of the common variants (necessitating re-examination of the role of twin studies or re-calculation of the contribution of SNPs).[21] Beyond genetics, the pathogenesis of RA remains unresolved after decades of devoted research.[22]

A shortcoming of this study is the small number of patients in the discovery set. The strengths of this study are the single ethnicity of the research participants and the uniform ACPA status.

In conclusion, our results suggest that a mutation in IL-6ST 5:55260065 confers risk of RA in Chinese patients in Singapore.

The TTSH Rheumatoid Arthritis Study Group

Angela Li-Huan Chan, Grace Yin Lai Chan, Madelynn Tsu-Li Chan, Faith Li-Ann Chia, Hiok Hee Chng, Choon Guan Chua, Hwee Siew Howe, Ee Tzun Koh, Li Wearn Koh, Kok Ooi Kong, Weng Giap Law, Samuel Lee Shang Ming, Khai Pang Leong, Tsui Yee Lian, Xin Rong Lim, Jess Mung Ee Loh, Mona Manghani, Justina Wei Lynn Tan, Sze-Chin Tan, Teck Choon Tan, Claire Teo Min-Li, Bernard Yu-Hor Thong.

Conflict of Interest

The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.

Financial Disclosure

This work is supported by the research grants from the National Healthcare Group Small Innovative Grant (SIG/15036) and Singapore National Medical Research Council (NMRC/CG/017/2013).


We thank Ms Jocelyn Gay and Ms Joo Yong Ong for data collection.


  1. Stastny P. Mixed lymphocyte cultures in rheumatoid arthritis. J Clin Invest 1976;57:1148-57.
  2. Wellcome Trust Case Control Consortium. Genomewide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:661-78.
  3. MacGregor AJ, Snieder H, Rigby AS, Koskenvuo M, Kaprio J, Aho K, et al. Characterizing the quantitative genetic contribution to rheumatoid arthritis using data from twins. Arthritis Rheum 2000;43:30-7.
  4. Stahl EA, Wegmann D, Trynka G, Gutierrez-Achury J, Do R, Voight BF, et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat Genet 2012;44:483-9.
  5. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature 2009;461:747-53.
  6. Hunt KA, Mistry V, Bockett NA, Ahmad T, Ban M, Barker JN, et al. Negligible impact of rare autoimmune-locus coding-region variants on missing heritability. Nature 2013;498:232-5.
  7. Diogo D, Kurreeman F, Stahl EA, Liao KP, Gupta N, Greenberg JD, et al. Rare, low-frequency, and common variants in the protein-coding sequence of biological candidate genes from GWASs contribute to risk of rheumatoid arthritis. Am J Hum Genet 2013;92:15-27.
  8. Bang SY, Na YJ, Kim K, Joo YB, Park Y, Lee J, et al. Targeted exon sequencing fails to identify rare coding variants with large effect in rheumatoid arthritis. Arthritis Res Ther 2014;16:447.
  9. Li Y, Lai-Han Leung E, Pan H, Yao X, Huang Q, Wu M, et al. Identification of potential genetic causal variants for rheumatoid arthritis by whole-exome sequencing. Oncotarget 2017;8:111119-29.
  10. Okada Y, Wu D, Trynka G, Raj T, Terao C, Ikari K, et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 2014;506:376-81.
  11. Nagai Y, Imanishi T. RAvariome: a genetic risk variants database for rheumatoid arthritis based on assessment of reproducibility between or within human populations. Database (Oxford) 2013;2013:bat073.
  12. Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315-24.
  13. Aletaha D, Neogi T, Silman AJ, Funovits J, Felson DT, Bingham CO 3rd, et al. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum 2010;62:2569-81.
  14. Viatte S, Massey J, Bowes J, Duffus K; arcOGEN Consortium, Eyre S, Barton A, et al. Replication of Associations of Genetic Loci Outside the HLA Region With Susceptibility to Anti-Cyclic Citrullinated Peptide-Negative Rheumatoid Arthritis. Arthritis Rheumatol 2016;68:1603-13.
  15. Ryan TP. Sample size determination and power. Hoboken: John Wiley & Sons, Inc.; 2013.
  16. Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, et al. Testing for an unusual distribution of rare variants. PLoS Genet 2011;7:e1001322.
  17. Stahl EA, Raychaudhuri S, Remmers EF, Xie G, Eyre S, Thomson BP, et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat Genet 2010;42:508-14.
  18. Houssiau FA, Devogelaer JP, Van Damme J, de Deuxchaisnes CN, Van Snick J. Interleukin-6 in synovial fluid and serum of patients with rheumatoid arthritis and other inflammatory arthritides. Arthritis Rheum 1988;31:784-8.
  19. Swaak AJ, van Rooyen A, Nieuwenhuis E, Aarden LA. Interleukin-6 (IL-6) in synovial fluid and serum of patients with rheumatic diseases. Scand J Rheumatol 1988;17:469-74.
  20. Marinou I, Walters K, Winfield J, Bax DE, Wilson AG. A gain of function polymorphism in the interleukin 6 receptor influences RA susceptibility. Ann Rheum Dis 2010;69:1191-4.
  21. Golan D, Lander ES, Rosset S. Measuring missing heritability: inferring the contribution of common variants. Proc Natl Acad Sci U S A 2014;111:E5272-81.
  22. Boissier MC, Biton J, Semerano L, Decker P, Bessis N. Origins of rheumatoid arthritis. Joint Bone Spine 2020;87:301-6.