Genetic Variants in Circadian Rhythm Genes and Self-Reported Sleep Quality in Women with Breast Cancer

Introduction: Women diagnosed with breast cancer (BC) are at increased risk of sleep deficiency. Approximately 30–60% of these women report poor sleep during and following surgery, chemotherapy, radiation therapy, and anti-estrogen therapy. The purpose of this study was to examine the relationship between genetic variation in circadian rhythm genes and self-reported sleep quality in women with BC. Methods: This cross-sectional study recruited women with a first diagnosis of breast cancer at five sites in Nebraska and South Dakota. Sixty women were included in the study. Twenty-six circadian genes were selected for exome sequencing using the Nextera Rapid Capture Expanded Exome kit. A total of 414 variants had a minor allele frequency of ≥5% and were included in the exploratory analysis. The association between Pittsburgh Sleep Quality Index (PSQI) score and genetic variants was determined by two-sample t-test or ANOVA. Results: Twenty-five variants were associated with the PSQI score at p < 0.10, of which 19 were significant at p < 0.05, although the associations did not reach statistical significance after adjustment for multiple comparisons. Variants associated with PSQI were from genes CSNK1D & E, SKP1, BHLHE40 & 41, NPAS2, ARNTL, MYRIP, KLHL30, TIMELESS, FBXL3, CUL1, PER1 & 2, and RORB. Two of these were a synonymous variant and a missense variant in the BHLHE40 and TIMELESS genes, respectively. Conclusions: These exploratory results suggest an association of genetic variants in circadian rhythm pathways with self-reported sleep in women with BC. Testing this association is warranted in a larger replication population.


Introduction
Breast cancer remains the second leading cause of cancer deaths among women [1]. While overall survivorship has increased over time, sleep deficiency is one of the most frequent and distressing symptoms reported by women with breast cancer and has a negative impact on quality of life and functional status [2, 3]. About 30-60% of women with breast cancer report problems sleeping at diagnosis, and the percentage increases during chemotherapy treatments [4,5]. Sleep disorders are among the main adverse events of aromatase inhibitors that lead to drug discontinuation [6]. Several predictors of sleep deficiency have been identified, but the mechanisms responsible for poor sleep in patients with cancer are poorly understood [7,8].
Significant heritability of sleepiness, usual bedtime, and usual sleep duration has been discovered [9], which suggests that genetic factors may make some individuals more susceptible to sleep disturbance. A series of publications detail associations between cytokine gene variations and self-reported sleep or symptom clusters that included sleep in patients with cancer [10][11][12][13][14]. Also, evidence suggests cytokine dysregulation is associated with sleep disturbance in humans [15].
Circadian clocks synchronize physiological and behavioral rhythms with time. Dysregulated expression of circadian clock-related genes is greatly affected by polymorphic variants and has been associated with cancer [16]. A report by Truong and team [17] examined breast cancer risk, night work, and circadian clock gene polymorphisms. The team examined 577 validated single nucleotide polymorphisms (SNPs) in 23 circadian clock genes in a large sample of breast cancer cases and controls. Two SNPs in retinoic acid receptor-related orphan receptor (RORA; rs1482057 and rs12914272) were associated with breast cancer in the whole sample and among postmenopausal, but not premenopausal, women. The authors concluded that the results support the hypothesis that circadian clock gene variants modulate breast cancer risk.
Little attention, however, has focused on genetic associations between circadian clock genes and sleep deficiency in patients with cancer. Two systematic reviews [18,19] summarize genomic variants associated with cancer-related fatigue, but no circadian clock genes are included.
Based on this knowledge, the purpose of this exploratory study was to examine associations between self-reported sleep quality index values and genetic variants in 26 circadian clock genes in women with breast cancer.

Methods
Design
A cross-sectional feasibility study design was used. The parent study examined data from the Breast Cancer Collaborative Registry (BCCR) questionnaire to understand risk factors predicting sleep quality in patients with breast cancer [20].

Study population
The BCCR was used to locate cases collected by UNMC/Nebraska Medicine, Omaha, NE from January 2008 to January 2017. Inclusion criteria in the parent study were: 1) women with a first breast cancer diagnosis; and 2) at any phase of the cancer trajectory. Additional inclusion criteria for this exploratory study were: 3) completed the Pittsburgh Sleep Quality Index (PSQI) in the BCCR questionnaire and 4) had a blood sample that had been analyzed using exome sequencing. Exclusion criteria were: 1) a diagnosis of recurrent breast cancer, and 2) male sex. The Institutional Review Board (IRB) of the University of Nebraska Medical Center approved the study. Women were invited to participate during routine oncology appointments. At enrollment, patients provided informed consent for use of the data in clinical studies.

Breast Cancer Collaborative Registry (BCCR) Questionnaire
The BCCR, which is a part of the integrated Cancer Repository for Cancer Research (iCaRe2), was developed in collaboration with breast cancer experts, and research questions were standardized to satisfy the needs of all the centers [21]. The questionnaire contains standard data items that provide a comprehensive review of the patient's demographic, medical, tumor, lifestyle, environmental, quality of life, and sleep quality factors that could influence breast cancer diagnosis and survivorship. Demographic data include variables such as participant's age, race/ethnicity, marital status, and educational status. Medical data include height/weight/BMI and a list of chronic conditions but no comorbidity index; gynecologic data such as menstrual status, pregnancy, breast-feeding, and birth control; and breast cancer data such as therapies received, functional changes, and symptoms since surgery or completing therapy. Tumor data include stage and receptor status. Lifestyle data include history of smoking, alcohol consumption, and physical activity. Environmental factors include annual household income and history of night or rotating shiftwork. Measures of physical and mental health status and subjective sleep quality complete the questionnaire. More information about the BCCR is published elsewhere [20]. All participants completed the BCCR questionnaire either at a clinic appointment or at home and returned it by United States Postal Service.

Sleep
Subjective sleep quality during the past month was measured using the 19-item Pittsburgh Sleep Quality Index (PSQI) [22,23]. A global score and seven component scores were obtained, including sleep quality, sleep latency, sleep duration, habitual sleep efficiency, sleep disturbances, sleeping medication use, and daytime dysfunction. Components are scored on a 0-3 scale and combined with equal weights, yielding a global score (0-21). Higher scores indicate more severe complaints and poor sleep quality. Cronbach's alpha for the global PSQI was reported as 0.80 and was 0.71 in this study. A global PSQI score >5 has a sensitivity of 89.6% and a specificity of 86.5% in identifying poor sleepers. Optional questions 10-11 were not included.
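The scoring rule described above can be illustrated with a minimal Python sketch. It assumes the seven component scores have already been derived from the questionnaire items; the component key names and the function names are hypothetical and chosen only for illustration. The cutoff of 5 is the one stated in the text.

```python
# Sketch of PSQI global scoring: seven components, each scored 0-3,
# summed with equal weights into a global score of 0-21.
# Key names below are illustrative, not taken from the instrument.
PSQI_COMPONENTS = (
    "sleep_quality",
    "sleep_latency",
    "sleep_duration",
    "habitual_sleep_efficiency",
    "sleep_disturbances",
    "sleeping_medication_use",
    "daytime_dysfunction",
)


def psqi_global_score(components: dict) -> int:
    """Sum the seven component scores after checking each is 0, 1, 2, or 3."""
    missing = [name for name in PSQI_COMPONENTS if name not in components]
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    total = 0
    for name in PSQI_COMPONENTS:
        score = components[name]
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be scored 0-3, got {score!r}")
        total += score
    return total


def is_poor_sleeper(global_score: int, cutoff: int = 5) -> bool:
    """A global score above the cutoff (>5 in the text) flags a poor sleeper."""
    return global_score > cutoff


if __name__ == "__main__":
    example = dict(zip(PSQI_COMPONENTS, (1, 2, 1, 0, 1, 0, 1)))
    score = psqi_global_score(example)
    print(score, is_poor_sleeper(score))
```

In this sketch a respondent with any missing component is rejected rather than imputed; how incomplete questionnaires were handled in the study is not stated in the text.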

Genetic Analysis
Genomic DNA was isolated from blood and sequenced for 128 participants from the parent study. Twenty-six circadian genes were selected for analysis based on results from the 2008 Sleep Research Society Presidential Task Force on Sleep/Circadian Rhythm SNP Gene Array Initiative and the report by Truong [17]. Exome sequencing was performed using the Nextera Rapid Capture Expanded Exome kit (Illumina, San Diego, CA). Target DNA included exons, untranslated regions (UTRs), and miRNAs. Following the manufacturer's suggested protocol, 50 ng of genomic DNA from each sample was subjected to "tagmentation" to generate a genome-wide library of fragments. The targeted content was captured by hybridization of the library to the oligonucleotides provided by the manufacturer. The resultant exome library for each sample was quantified by qPCR, and 10 pM of the pooled libraries were loaded at three samples per lane on an Illumina HiSeq2500 DNA sequencer, and 150 bp paired-end runs were performed.

Bioinformatic Methods
We used an established variant calling pipeline based on the bcbio-nextgen Python toolkit (https://github.com/bcbio/bcbio-nextgen) for the exome sequencing data. Initially, raw sequencing reads in FASTQ format were trimmed with the fqtrim tool (https://ccb.jhu.edu/software/fqtrim) to remove adapters, terminal unknown bases (Ns), and low-quality 3' regions (Phred score <30). The quality of the trimmed sequence reads was assessed using the quality control tool FastQC [24]. The trimmed reads passing FastQC were aligned to the hg19 reference genome with the Burrows-Wheeler Aligner [25] and further processed through the GATK pipeline [26,27] for base quality score recalibration, INDEL realignment, and duplicate marking, according to GATK's best practices recommendations [27,28]. Four variant callers, MuTect [29], freebayes [30], VarDict [31], and VarScan [32], were used to call variants from the sequencing data. All the called germline variants from the 128 blood samples were saved into 128 Variant Call Format (VCF) files. We further wrote a Perl script to extract variants within the range of the 26 candidate genes (with 1 kb flanking) from the 128 germline VCF files and a Python script to format the extracted variants into an Excel table for follow-up association analyses.
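The region-based extraction step can be sketched as follows. This is not the authors' Perl script; it is a minimal Python illustration that assumes uncompressed, tab-delimited VCF text and a hypothetical `regions` mapping of gene name to (chromosome, start, end). The gene name and coordinates in the usage example are placeholders, not real hg19 positions.

```python
# Sketch: keep VCF records that fall inside candidate gene intervals
# extended by a flanking window (1 kb in the text).
# Function names and the `regions` structure are illustrative assumptions.

def in_flanked_region(chrom, pos, region, flank=1000):
    """True if (chrom, pos) lies within the region widened by `flank` bp per side."""
    r_chrom, start, end = region
    return chrom == r_chrom and (start - flank) <= pos <= (end + flank)


def extract_variants(vcf_lines, regions, flank=1000):
    """Return (gene, chrom, pos, id, ref, alt) tuples for records in any region."""
    kept = []
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        for gene, region in regions.items():
            if in_flanked_region(chrom, pos, region, flank):
                kept.append((gene, chrom, pos, fields[2], fields[3], fields[4]))
                break  # report each record once, under the first matching gene
    return kept


if __name__ == "__main__":
    # Placeholder gene and coordinates, for illustration only.
    regions = {"GENE_A": ("chr3", 5000, 6000)}
    vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr3\t4500\tvar1\tA\tG\t50\tPASS\t.",
        "chr3\t3999\tvar2\tC\tT\t50\tPASS\t.",
    ]
    for record in extract_variants(vcf, regions):
        print(record)
```

For whole-exome files, a dedicated VCF parser and an interval index would likely be preferable to the linear scan shown here.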

Data Analysis
Due to positive skew, the primary outcome of sleep index value (PSQI) was log transformed to meet normality assumptions. Genetic variants with a minor allele frequency (MAF) less than 5% were excluded in the analysis. For each genetic variant, the association between log-transformed sleep index value (PSQI) and the genetic variant was determined by two-sample t-test or ANOVA. SAS software version 9.4 (SAS Institute Inc., Cary, NC) was used for all analyses. Linkage disequilibrium was determined using Haploview software [33].
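As an illustration of the per-variant comparison, the sketch below log-transforms PSQI scores and computes a pooled-variance two-sample t statistic for two genotype groups. The analyses in the study were run in SAS; this Python version is only a sketch. Several details are assumptions not stated in the text: the natural logarithm, the optional `offset` for scores of zero, and the choice of the pooled (equal-variance) form of the test. A p-value would additionally require the t distribution, for example from a statistics library such as SciPy (`scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, variance


def log_transform(scores, offset=0.0):
    """Natural-log transform; `offset` is a hypothetical shift for zero scores."""
    return [math.log(s + offset) for s in scores]


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("zero pooled variance; t statistic is undefined")
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n_a + n_b - 2


if __name__ == "__main__":
    # Made-up PSQI scores for two genotype groups, for illustration only.
    homozygous = log_transform([4, 6, 5, 7, 3])
    heterozygous = log_transform([8, 9, 6, 11])
    print(pooled_t_statistic(heterozygous, homozygous))
```

When a variant has three genotype groups, the same idea extends to a one-way ANOVA, which is the alternative named in the text.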

Demographic and Clinical Characteristics of Participants
Participants' baseline demographic and clinical characteristics were representative of the breast cancer population (Table 1). Women's mean age was 58.6 (SD = 13.6; range 27-85; median 59.6) years and they were predominantly Non-Hispanic whites (88.3%); married (62.7%); had some post-secondary education (74.1%); and were diagnosed at Stage I or Stage II breast cancer (81.4%).

Selection of Genetic Variants
Sequencing data from 26 circadian rhythm genes were obtained from 128 subjects; however, only 60 subjects had both sequencing data and self-reported PSQI scores. For these 60 subjects, we identified 5,279 genetic variants, of which 4,865 were excluded in the analysis because of a minor allele frequency (MAF) less than 5%. The remaining 414 variants were analyzed for their association with PSQI scores (continuous variable). Figure 1 illustrates the STROBE (Strengthening the Reporting of Observational studies in Epidemiology) diagram and the final sample for analysis.
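The MAF filter can be sketched as below, assuming biallelic variants with genotypes coded as the number of alternate alleles per subject (0, 1, or 2) and `None` for a missing call. This coding and the function names are illustrative assumptions. With 60 subjects and no missing calls there are 120 alleles, so a 5% threshold corresponds to at least 6 copies of the minor allele.

```python
# Sketch of minor allele frequency (MAF) filtering for biallelic variants.
# Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.

def minor_allele_frequency(genotypes):
    """MAF = min(p, 1 - p), where p is the alternate allele frequency."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this variant")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(variants, threshold=0.05):
    """Keep variants whose MAF is at least `threshold` (5% in the text)."""
    return {
        variant_id: genotypes
        for variant_id, genotypes in variants.items()
        if minor_allele_frequency(genotypes) >= threshold
    }


if __name__ == "__main__":
    # Hypothetical variant IDs and genotype vectors for 60 subjects.
    variants = {
        "common_variant": [1] * 12 + [0] * 48,  # 12 / 120 alleles
        "rare_variant": [1] * 5 + [0] * 55,     # 5 / 120 alleles
    }
    print(sorted(filter_by_maf(variants)))
```

Taking the minimum of p and 1 - p matters when the reference allele is the rarer one in the sample, as in the third case covered by the filter.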

Association between Genetic Variants and PSQI Score
Tables 2 and 3 list 25 genetic variants that were associated with the global PSQI score at p < 0.10, and 19 of these were significant at p < 0.05. The associations did not meet statistical significance after adjustment for multiple comparisons, possibly because of the exploratory nature of the study (a large number of comparisons with a small sample). These genetic variants were found throughout the genome (chromosomes 2, 3, 5, 7, 9, 11, 12, 13, 22) and represented 15 genes: CSNK1D & E, SKP1, BHLHE40 & 41, NPAS2, ARNTL, MYRIP, KLHL30, TIMELESS, FBXL3, CUL1, PER1 & 2, and RORB. Most variants were found in intronic and untranslated regions except for two, which were a synonymous variant and a missense variant in the BHLHE40 and TIMELESS genes, respectively. Mean log-transformed PSQI scores were higher among heterozygous subjects, relative to those with the homozygous genotype, for 10 polymorphisms; for the remaining variants, scores were lower. Linkage disequilibrium was observed only on chromosome 3 (rs908078 vs rs34870629, rs34883305, rs74439275; r² = 0.52) and chromosome 5 (rs2110585 vs rs3815506, rs73791514; r² = 0.85).
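The text does not name the adjustment method used for multiple comparisons. As an illustration only, the sketch below shows two common choices, the Bonferroni threshold and Benjamini-Hochberg adjusted p-values; neither is claimed to be the method applied in this study. With 414 tests and alpha = 0.05, the Bonferroni per-test threshold is 0.05 / 414, or about 1.2 × 10^-4, which shows how demanding such a correction is for a sample of 60.

```python
# Sketch of two common multiple-comparison adjustments (illustrative only;
# the adjustment method used in the study is not specified in the text).

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # stay monotone non-decreasing in the rank of the raw p-value.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 414))
    # Made-up p-values, for illustration only.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```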

Discussion
Studies have reported that 30-60% of breast cancer patients have poor sleep quality before receiving adjuvant chemotherapy and continue to have these symptoms even one year after the start of chemotherapy [34,35]. However, there is much variability in sleep quality symptoms among breast cancer patients, and it is not known why certain patients develop these symptoms and others do not. While environmental factors influence sleep, a growing body of evidence suggests genetic modulation of sleep quality [36]. Its genetic regulation is substantiated by the identification of polymorphisms in specific sleep disorders and the existence of familial sleep disorders [15]. Twin studies have shown sleep heritability (h²) of 0.30-0.50. However, no study was located that evaluated the association of genetic variants in circadian pathway genes and sleep quality among breast cancer patients.
Findings from this exploratory study suggest that circadian genes may play a role in sleep quality in women with breast cancer. Twenty-five genetic variants were associated with the global PSQI score. The genetic variants found were throughout the genome (chromosomes 2, 3, 5,7,9,11,12,13,22) (Figure 2). Then PER and CRY form a negative feedback loop that represses their own transcription by acting on the heterodimer complex [37]. There is also evidence that TIMELESS is required for circadian regulation and interacts with CRY and PER proteins. The ARNTL heterodimers also induce another regulatory loop that activates RORA & B and subsequent transcription of ARNTL. Many other circadian proteins undergo post-translational modifications that affect the function of the feedback loops, including phosphorylation, acetylation, sumoylation and ubiquitination (CSNK1D & E, FBXL3, SKP1, CUL1).
Previous studies have investigated genetic markers of sleep in the general population using circadian candidate gene and genome-wide association (GWA) study designs. PER3 variants, especially the variable number tandem repeat (VNTR), have been associated with many phenotypes including diurnal preference and sleep loss/circadian misalignment. In our study we did not find an association of the VNTR with sleep quality; however, this lack of replication may be due to the small size of the current study. In a candidate gene study, ARNTL (rs3816358) and NPAS2 (rs3768984) were associated with later actigraphic sleep and wake onset time in an elderly male population (n = 2,527) [38]. ARNTL was also found to be associated with sleep duration in a GWA study, though at a locus 40 kb upstream of the gene, rs41348446 [39]. We also found associations of ARNTL and NPAS2 with sleep quality, although at different loci than in previous studies.
Most variants found in this study are located in intronic and untranslated regions. The functional significance of these variants is unclear due to their possible linkage with other polymorphisms nearby. We found a missense variant in TIMELESS that was associated with poorer sleep quality as assessed by PSQI. While no studies have documented this association, the missense variant could potentially alter protein folding and interaction with PER and CRY, and thereby inhibit the primary transcription/translation loop in the circadian pathway, thus resulting in sleep disturbance.
Circadian genes may directly affect an individual's susceptibility to sleep disturbances. In addition, studies have found that genetic variation in circadian genes is a risk factor for breast cancer, most likely by impacting the biological pathways that regulate DNA damage and repair, carcinogen metabolism and/or detoxification, cell-cycle progression, and apoptosis. One of the first epidemiologic studies correlated PER3 variants with increased risk of breast cancer [3]. This circadian-cancer link was confirmed in a meta-analysis showing an association between risk of cancer and variants in NPAS2, RORA, RORB, and CLOCK [40][41][42]. As this study included only women with breast cancer, the link between breast cancer risk and genetic variants could not be ascertained.
There are several strengths and limitations of this study. To our knowledge, it is the first to include an extensive selection of variants and genes in the circadian rhythm pathway in association with self-reported sleep in a sample of women with breast cancer. We identified 5,279 genetic variants in 26 circadian genes. We used statistical methods to identify the association between self-reported sleep quality and circadian-related genetic variants. However, findings from this study must be interpreted with caution due to the small sample size. Larger studies replicated in several populations are needed to fully understand the biological implications of circadian pathway genes and their role in sleep disturbance among breast cancer patients. Our results also indicate that the exome sequencing methodology detected not only coding polymorphisms in the genome, but also a significant number of non-coding variants. We have since modified our sequencing protocol to more precisely target exomes and will use this newer technology to increase the probability of detecting functional coding genetic variants in a larger study. Another limitation of this study is that sleep quality was measured only subjectively. The PSQI was completed at only one time point, and timing varied among participants. Future studies could focus on patterns of sleep and sleep-wake activity rhythm using objective measures and/or a biomarker such as melatonin, and their association with circadian genes.

Conclusions
Despite these limitations, findings from this study provide preliminary evidence for a role of circadian rhythm pathway genes in sleep quality among women with breast cancer. We conclude that these results merit further studies using larger sample sizes and more precise exome sequencing technology to allow for confirmatory analyses and identification of functional genetic variants, respectively. This research team is seeking funding to conduct a larger study in the near future.