A Functional Data Analysis Approach for Circadian Patterns of Activity of Teenage Girls

Background: Longitudinal or time-dependent activity data are useful for characterizing circadian activity patterns and for identifying physical activity differences among multiple samples. Statistical methods designed to analyze activity data from multiple samples are desired, and related software is needed to perform the analysis. Methods: This paper introduces a functional data analysis (fda) approach to perform a functional analysis of variance (fANOVA) for longitudinal circadian activity count data and to investigate the association of covariates such as weight or body mass index (BMI) with physical activity. The fANOVA approach is developed to study and characterize the activity patterns of adolescent school girls in multiple age groups. The fANOVA is applied to the physical activity data of adolescent girls in three grades (i.e., grades 10, 11, and 12) from the NEXT Generation Health Study 2009–2013. To test whether there are activity differences among girls of the three grades, a functional version of the univariate F-statistic is used. To investigate whether there is a longitudinal (or time-dependent activity count) difference between two samples, functional t-tests are used to test: (1) activity differences between grade pairs; (2) activity differences between low-BMI girls and high-BMI girls of the NEXT study. Results: Statistically significant differences existed among the physical activity patterns of adolescent school girls in different grades. Girls in grade 10 tended to be less active than girls in grades 11 and 12 between 5:30 and 9:30. Significant differences in physical activity were detected between low-BMI and high-BMI groups from 8:00 to 11:30 for grade 10 girls, and low-BMI girls in grade 10 tended to be more active. Conclusions: The fda approach is useful in characterizing time-dependent patterns of actigraphy data. For two-sample data defined by weight or BMI values, fda can identify differences between the two time-dependent samples of activity data. Similarly, fda can identify differences among multiple physical activity time-dependent datasets. These analyses can be performed readily using the fda R program.


Introduction
Longitudinal or time-dependent activity data are useful for characterizing physical activity patterns and for identifying activity differences among multiple samples. Statistical methods designed to analyze activity data collected annually are desired, and related software is needed to perform routine data analysis. In particular, methods that characterize the temporal trends and differences of the activity data are important and needed [8]. It is traditional in circadian research to employ simple cosinor models or harmonic modeling approaches to detect the 24-hour patterns in activity data and to compare amplitude and phase shifts between groups of interest [7,9,10]. However, analyses of these data could benefit from functional-based approaches.
In this paper, we develop a functional data analysis (fda) approach to measure and analyze physical activity patterns in adolescent girls [5]. The functional data analysis focuses on an overall comparison of the two curves with temporal or point-wise (at multiple time points during the 24 hours) comparisons between groups. An advantage of the functional data analysis approach is that since all the tests are permutation-based, it is less sensitive to distributional assumptions. One may perform a functional analysis of variance (fANOVA) to detect the temporal activity difference of two or multiple samples. As examples, fANOVA methods are used to explore: (1) the role of body mass index (BMI) on the activity patterns among teenage girls over time; and (2) the differences in the activity patterns of teenage girls over consecutive years. As a second research objective, we investigate the variability by race and family affluence in activity for the adolescent girls.
Specifically, we utilize and analyze the physical activity data of adolescent girls in grades 10, 11, and 12 from the NEXT Generation Health Study 2009-2013. The NEXT Generation Health Study is a longitudinal study investigating the health behaviors of adolescents. The study contains a nationally-representative sample of U.S. students followed from grades 10-12. The goals of the NEXT study include (1) to identify the trajectory of adolescent health status and health behaviors from mid-adolescence through the post high school year; (2) to examine individual predictors of the onset of key adolescent risk behaviors and risk indicators during this period; (3) to identify family, school, and social/environmental factors that promote or sustain positive health behaviors; and (4) to identify transition points in health risk and risk behaviors and changes in family, school, and social/environmental precursors to these transitions.
The remainder of this article is organized as follows. In Section 2, we introduce the physical activity data of the NEXT study and briefly outline the fANOVA methods we need. In Section 3, we present the results of the analysis of the NEXT study. Section 4 discusses the findings and offers some remarks.

Data
In the NEXT Generation Health Study, activity counts were measured using Actiwatch2 devices manufactured by Respironics Inc. (http://www.actigraphcorp.com/company/). In this study, we analyze physical activity data of adolescent girls in grades 10, 11, and 12, collected with Actiwatches over 3 consecutive years. Only female student data are used for our analysis. The demographic information of subjects in the three grades is provided in Table 1. In grade 10, the data of 95 students are available; in grades 11 and 12, the activity data of 85 and 84 students, respectively, are available because of some dropouts. Race/ethnicity includes four categories: "Hispanic", "African American", "Asian", and "White". The number of "Asian" students is very small (2 in grade 10, 1 in grade 11, and 2 in grade 12), and the numbers in the other three categories range from 23 to 35 and vary slightly among the three grades.
The study data analyzed are a subsample of the NEXT data that oversampled overweight children. This subsample of 550 adolescents (NEXT-Plus Cohort) consisted of 50% overweight individuals. Two BMI groups, "Low BMI Group" vs. "High BMI Group", are defined via pre-assigned categories based on percentiles of the 550 adolescents: "Low BMI Group" was defined by < 95th percentile and "High BMI Group" was defined by >= 95th percentile. This corresponds to the upper and lower 20% percentiles of the study data in this paper (Table 1). Family Affluence includes three categories: "Low Affluence", "Moderate Affluence", and "High Affluence". Parental Education 1 and 2 describe the highest level of education completed by each respective guardian in the household; education level is reported using a seven-point scale.
Physical activity counts were measured every 30 seconds using the Actiwatch2 devices. The Actiwatch activity monitor contains an omni-directional sensor sensitive to 0.01 g (0.098 m·s⁻²) and is capable of detecting acceleration in two planes. The sensor integrates the degree and speed of motion and produces an electrical current that varies in magnitude, such that an increase in speed and motion produces an increase in voltage, which is stored as activity counts. For this study, activity monitors were placed on participants' wrists. Differences in time zones were accounted for to make sure that the time interval of physical activity counts started at 0:00 for each individual. Only full weekday 24-hour records from 0:00 to 23:59.5 with activity count sums greater than 30,000 are included. To make the activity data comparable, activity counts were collected in 30-second epochs for 7 consecutive days. To facilitate data analysis, the activity counts were summed over every 15 minutes to give 4 observations per hour. In total, each student has N = 4 × 24 = 96 counts each day for the 7 consecutive days. Thus, each student has an accumulated activity count at each time point t_i each day, where t_i = i/4, i = 0, 1, 2, . . . , N - 1 = 95. Usable data are averaged over the 7 consecutive days at each time point for one full day, so the final data consist of N = 96 observations for each individual.
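The preprocessing described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names `sum_into_bins` and `daily_profile` are hypothetical, each day is assumed to arrive as a list of 2,880 thirty-second epoch counts starting at 0:00, and the weekday and time-zone handling is assumed to have been done beforehand.

```python
def sum_into_bins(epoch_counts, epochs_per_bin=30):
    """Sum consecutive 30-second epoch counts into 15-minute bins.

    Thirty epochs of 30 seconds make one 15-minute bin, so a full day of
    2880 epochs becomes 96 binned counts.
    """
    if len(epoch_counts) % epochs_per_bin != 0:
        raise ValueError("number of epochs must be a multiple of epochs_per_bin")
    return [
        sum(epoch_counts[start:start + epochs_per_bin])
        for start in range(0, len(epoch_counts), epochs_per_bin)
    ]


def daily_profile(days, min_daily_total=30000):
    """Average the binned counts over the usable days of one participant.

    `days` is a list of days, each a list of epoch counts for a full
    24-hour record. A day is kept only if its total count is greater than
    `min_daily_total`. Returns None when no day is usable.
    """
    usable = [sum_into_bins(day) for day in days if sum(day) > min_daily_total]
    if not usable:
        return None
    n_bins = len(usable[0])
    return [sum(day[j] for day in usable) / len(usable) for j in range(n_bins)]
```

With seven usable days per participant, `daily_profile` returns the N = 96 averaged counts that serve as the discrete observations for that individual.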

Functional Data Analysis
Consider a sample of $n$ subjects, indexed by $i = 1, \ldots, n$. For individual $i$, assume that $N$ activity counts are available at times $t_1, \ldots, t_N$. The activity count at $t_j$ is denoted by $y_i(t_j)$, $j = 1, \ldots, N$, so the activity counts of individual $i$ can be summarized as $Y_i = (y_i(t_1), \ldots, y_i(t_N))'$ [1]. The activity profile $y_i(t)$ of individual $i$ is a function of time $t$, which can be estimated from $Y_i$. To estimate the activity function $y_i(t)$ from the activity counts $Y_i$, we use an ordinary least squares smoother [1][2][3][4][5]. Specifically, let $\phi_k(t)$, $k = 1, \ldots, K$, be a series of $K$ basis functions, such as B-spline or Fourier basis functions. Let $\Phi$ denote the $N \times K$ matrix containing the values $\phi_k(t_j)$, $j \in \{1, \ldots, N\}$. Using the discrete realizations $Y_i$, we estimate the activity function $y_i(t)$ by the ordinary least squares smoother
$$\hat{y}_i(t) = \phi(t)' (\Phi'\Phi)^{-1} \Phi' Y_i, \tag{1}$$
where $\phi(t) = (\phi_1(t), \ldots, \phi_K(t))'$. The estimate $\hat{y}_i(t)$ smooths the activity pattern over time. In this article, we consider the Fourier basis functions (here indexed from 0 to $K - 1$): $\phi_0(t) = 1$, $\phi_{2r-1}(t) = \sin(2\pi r t/N)$, and $\phi_{2r}(t) = \cos(2\pi r t/N)$, $r = 1, \ldots, (K-1)/2$, where $K$ is a positive odd integer. In our analysis of the physical activity data of the NEXT study, $K$ is set to 25. We also tried other values, such as $K$ = 21, 23, and 27, and the results are similar to those for $K = 25$. One may use B-spline basis functions, but the activity data are likely to be periodic, so Fourier basis functions are more natural [1][2][3][4][5].
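To make the smoothing step concrete, the following is a minimal pure-Python sketch of a least squares fit on a Fourier basis, solving the normal equations for the basis coefficients. It is an illustration under stated assumptions, not the fda R package implementation: the function names are hypothetical, and the `period` argument is an assumption that lets time be measured either in bin indices (period N) or in hours (period 24).

```python
import math


def fourier_basis(t, K, period):
    """Return the K Fourier basis values at time t: 1, sin, cos, sin, cos, ..."""
    if K < 1 or K % 2 == 0:
        raise ValueError("K must be a positive odd integer")
    values = [1.0]
    for r in range(1, (K - 1) // 2 + 1):
        angle = 2.0 * math.pi * r * t / period
        values.append(math.sin(angle))
        values.append(math.cos(angle))
    return values


def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_fourier_coefficients(times, y, K, period):
    """Least squares coefficients from the normal equations (Phi'Phi) c = Phi'Y."""
    Phi = [fourier_basis(t, K, period) for t in times]
    gram = [[sum(row[a] * row[b] for row in Phi) for b in range(K)]
            for a in range(K)]
    rhs = [sum(row[a] * yj for row, yj in zip(Phi, y)) for a in range(K)]
    return solve_linear_system(gram, rhs)


def evaluate_smooth(coefs, t, period):
    """Evaluate the smoothed curve at time t as the basis values times coefs."""
    basis = fourier_basis(t, len(coefs), period)
    return sum(c * b for c, b in zip(coefs, basis))
```

For example, with `times = [i / 4 for i in range(96)]` and `period=24.0`, calling `fit_fourier_coefficients(times, y, K=25, period=24.0)` gives the 25 coefficients of one individual's smoothed profile, which `evaluate_smooth` can then evaluate at any time of day.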
We are interested in whether there are activity differences between the two BMI groups and between grade pairs for the adolescent girls at every time point. To investigate differences between two groups, we use permutation t-tests. Let $[x_{11}(t), \ldots, x_{1n_1}(t)]$ and $[x_{21}(t), \ldots, x_{2n_2}(t)]$ be two sub-samples of activity functions with sample sizes $n_1$ and $n_2$. For each time value $t$, we consider the absolute value of a t-statistic to evaluate the difference,
$$T(t) = \frac{|\bar{x}_1(t) - \bar{x}_2(t)|}{\sqrt{\frac{1}{n_1}\mathrm{Var}[x_1(t)] + \frac{1}{n_2}\mathrm{Var}[x_2(t)]}}, \tag{2}$$
where $\bar{x}_1(t)$ and $\bar{x}_2(t)$ are the mean functions, and $\mathrm{Var}[x_1(t)]$ and $\mathrm{Var}[x_2(t)]$ are the variance functions, of $x_{1i}(t)$ and $x_{2i}(t)$, respectively.
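As a hedged illustration of the point-wise t-statistic, the sketch below evaluates it on curves that have already been smoothed and sampled on a common time grid. The function names are hypothetical, and the use of the sample variance with divisor n - 1 is an assumption of this sketch rather than a detail stated in the paper.

```python
import math


def mean_function(curves):
    """Pointwise mean of a list of equally long curves."""
    n = len(curves)
    return [sum(curve[j] for curve in curves) / n for j in range(len(curves[0]))]


def variance_function(curves):
    """Pointwise sample variance (divisor n - 1, an assumption of this sketch)."""
    n = len(curves)
    means = mean_function(curves)
    return [
        sum((curve[j] - means[j]) ** 2 for curve in curves) / (n - 1)
        for j in range(len(means))
    ]


def t_statistic(group1, group2):
    """Absolute two-sample t-statistic at every time point.

    Each group is a list of curves evaluated on the same time grid. The
    statistic is undefined at a time point where both variances are zero.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_function(group1), mean_function(group2)
    v1, v2 = variance_function(group1), variance_function(group2)
    return [
        abs(a - b) / math.sqrt(s1 / n1 + s2 / n2)
        for a, b, s1, s2 in zip(m1, m2, v1, v2)
    ]
```

Applied to the low-BMI and high-BMI curves of one grade, `t_statistic` returns one value per time point, which is the observed curve that the permutation critical values are compared against.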
To test whether there are activity differences among the adolescent girls of the three grades, we use a functional version of the univariate F-statistic. Let $y_i(t)$, $i = 1, \ldots, n$, be a sample consisting of three sub-samples of activity count functions from grades 10, 11, and 12. In addition, let $x_{ij}$ take the value 1 or 0 to indicate whether the activity count function is from grade $j$, with $j = 1$ for grade 10, $j = 2$ for grade 11, and $j = 3$ for grade 12. For instance, $x_{i1} = 1$ indicates that $y_i(t)$ is from grade 10, and $x_{i1} = 0$ indicates that $y_i(t)$ is from grade 11 or 12. Note that one and only one of $x_{i1}$, $x_{i2}$, and $x_{i3}$ is equal to 1, so the sum $x_{i1} + x_{i2} + x_{i3}$ is equal to 1 for all $i$. Consider the following functional linear model of the functional activity data,
$$y_i(t) = \beta_0(t) + \sum_{j=1}^{3} x_{ij}\,\beta_j(t) + \varepsilon_i(t), \tag{3}$$
where $\beta_0(t)$ is the functional intercept, $\beta_j(t)$ is the functional regression coefficient of $x_{ij}$, and $\varepsilon_i(t)$ is the functional error term. The functional version of the univariate F-statistic is defined by
$$F(t) = \frac{\mathrm{Var}[\hat{y}(t)]}{\frac{1}{n}\sum_{i=1}^{n}\left[y_i(t) - \hat{y}_i(t)\right]^2}, \tag{4}$$
where $\hat{y}(t)$ are the predicted values from the functional linear model (3) [5].
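For a one-way layout such as the three grades, the predicted value for each curve is simply the mean curve of its group, so the F-statistic can be sketched without fitting a regression. The code below is an illustration under that simplification; the function name `f_statistic` is hypothetical, and the divisor n in both the numerator and denominator is an assumption of this sketch.

```python
def f_statistic(curves, labels):
    """Pointwise F-type statistic for a one-way functional layout.

    `curves` is a list of curves on a common time grid and `labels` gives
    the group (for example the grade) of each curve. At each time point
    the statistic is the variance of the fitted values divided by the
    mean squared residual, where the fitted value of a curve is its group
    mean. It is undefined where all residuals are zero.
    """
    n = len(curves)
    n_points = len(curves[0])

    # Collect the curves of each group.
    groups = {}
    for curve, label in zip(curves, labels):
        groups.setdefault(label, []).append(curve)

    # Pointwise mean curve of each group.
    group_means = {
        label: [sum(c[j] for c in members) / len(members) for j in range(n_points)]
        for label, members in groups.items()
    }

    f_values = []
    for j in range(n_points):
        fitted = [group_means[label][j] for label in labels]
        grand_mean = sum(fitted) / n
        var_fitted = sum((f - grand_mean) ** 2 for f in fitted) / n
        mean_sq_resid = sum((c[j] - f) ** 2 for c, f in zip(curves, fitted)) / n
        f_values.append(var_fitted / mean_sq_resid)
    return f_values
```

Because the permutation test compares the observed curve only with its own permuted versions, the choice of divisor rescales all of them equally and does not change the resulting p-values.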
To find a critical value for each statistic, we use a permutation test. We perform the following procedure: (1) randomly shuffle the group labels of the smoothed activity functions; (2) recalculate $T(t)$ or $F(t)$, and its maximum over $t$, with the new labels. Repeating this many times constructs a null distribution corresponding to no activity difference, which provides a reference for evaluating the observed statistic $T_{\mathrm{Obs}}(t)$ or $F_{\mathrm{Obs}}(t)$. In our analysis, we use the default of 200 random shuffles. As suggested by Ramsay, Hooker, and Graves, p-values are calculated in two ways: a global test and a point-wise test. The global test provides a single p-value, which is the proportion of permutations whose maximum of $T(t)$ or $F(t)$ over all time points $t$ is larger than the maximum of $T_{\mathrm{Obs}}(t)$ or $F_{\mathrm{Obs}}(t)$. The point-wise test provides a curve of p-values, which at each time point $t$ is the proportion of permutation $T(t)$ or $F(t)$ values that are larger than the observed $T_{\mathrm{Obs}}(t)$ or $F_{\mathrm{Obs}}(t)$.
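The permutation procedure can be sketched as follows. This is a minimal illustration, not the fda R package routine: the names `abs_t_curve` and `permutation_test` are hypothetical, the fixed `seed` is there only for reproducibility, and ties are counted as exceedances (a slightly conservative assumption of this sketch). Any function that maps curves and labels to a point-wise statistic can be passed in; a two-group t-statistic is included so that the block runs on its own.

```python
import math
import random


def abs_t_curve(curves, labels):
    """Absolute two-sample t-statistic at each time point (labels 0 and 1)."""
    g0 = [c for c, lab in zip(curves, labels) if lab == 0]
    g1 = [c for c, lab in zip(curves, labels) if lab == 1]
    stats = []
    for j in range(len(curves[0])):
        x0 = [c[j] for c in g0]
        x1 = [c[j] for c in g1]
        m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
        v0 = sum((x - m0) ** 2 for x in x0) / (len(x0) - 1)
        v1 = sum((x - m1) ** 2 for x in x1) / (len(x1) - 1)
        stats.append(abs(m0 - m1) / math.sqrt(v0 / len(x0) + v1 / len(x1)))
    return stats


def permutation_test(curves, labels, statistic, n_perm=200, seed=0):
    """Global and point-wise permutation p-values for a point-wise statistic."""
    rng = random.Random(seed)
    observed = statistic(curves, labels)
    observed_max = max(observed)
    n_points = len(observed)

    max_exceed = 0
    pointwise_exceed = [0] * n_points
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # step (1): shuffle the group labels
        permuted = statistic(curves, shuffled)  # step (2): recompute
        if max(permuted) >= observed_max:
            max_exceed += 1
        for j in range(n_points):
            if permuted[j] >= observed[j]:
                pointwise_exceed[j] += 1

    return {
        "observed": observed,
        "global_p": max_exceed / n_perm,
        "pointwise_p": [count / n_perm for count in pointwise_exceed],
    }
```

The same function covers the multi-group case when an F-type statistic is passed as `statistic`; critical values at a level such as 0.05 could be obtained analogously from the quantiles of the permuted maxima and of the permuted point-wise values.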

Results
Data display and smoothed activity functions. We smoothed each individual's activity counts by the least squares smoother defined in relation (1). Figure 1 shows the activity patterns and activity differences by grade for subject 909010214. In the figure, activity counts over time are shown by black dots in the left-hand plots (a), (b), and (c). In the right-hand plots (d), (e), and (f), the differences in activity counts between the three pairs of grades are shown by black dots. The smoothed Fourier expansion in each plot is shown by the red solid curve. Figure 1 suggests that there are activity differences among the three grades, since the differences are not always around 0 in the daytime.
Smoothed activity patterns and difference among grades. Figure 2 shows the smoothed activity patterns of girls by grade and for all activity data combined. In plots (a), (b), and (c), the individual smoothed Fourier expansions of the activity data of grades 10, 11, and 12 are shown, respectively. In plot (d), the smoothed Fourier expansions of the combined activity data of the three grades are shown. Each individual's smoothed activity pattern is represented by a black line. The mean activity pattern across all subjects is shown by the red line in each plot. In plots (b) and (c), there is an activity peak between 5:30 and 9:30 for girls in grades 11 and 12, but there is no peak in that time interval for the grade 10 girls in plot (a). Figure 3 shows the observed F-statistic and t-statistic results and the related permutation critical values for the relationship between activity counts and grade. The red solid curve represents the observed statistic F(t) or T(t) at each time point, and the green dashed and blue dotted lines correspond to permutation critical values for the maximum and point-wise statistics at a significance level α = 0.05, respectively. When F(t) or T(t) is above the green dashed or blue dotted line, the two or three grades being compared have significantly different mean activity patterns at those time points. The global critical value (green dashed line) is preferred since it is more conservative. Thus, we use the global critical value to check whether there are activity differences in a time interval.
The plot (a) in Figure 3 shows that there are activity differences among the three grade girls in two time intervals: one from 5:30 to 9:30, and the other from 22:00 to 2:00. The plots (b) and (c) of Figure 3 show that the physical activity pattern in girls of grade 10 is significantly different from the patterns observed in girls of grades 11 and 12 in these two time intervals. The plot (d) of Figure 3 reveals that the physical activity patterns in girls of grades 11 and 12 are not significantly different from each other. Combining the results of Figures 2 and 3, the grade 10 girls have lower activity than girls of grades 11 and 12 between 5:30 and 9:30. Therefore, physical activity patterns in grade 10 girls differ significantly from the activity patterns in girls of both grades 11 and 12, which is consistent with the smoothed activity patterns in Figure 2.
Smoothed activity patterns and difference between BMI groups. Figure 4 shows the smoothed activity patterns by BMI status in the three grades. In plots (a) and (b), the individual smoothed Fourier expansions of grade 10 girls are shown for the low-BMI and high-BMI groups, respectively. In plots (c) and (d), the individual smoothed Fourier expansions of grade 11 girls are shown for the low-BMI and high-BMI groups, respectively. In plots (e) and (f), the individual smoothed Fourier expansions of grade 12 girls are shown for the low-BMI and high-BMI groups, respectively. From plots (a) and (b) for grade 10 girls, the smoothed activity functions of the low-BMI group are generally higher than those of the high-BMI group. Figure 5 shows the observed t-statistic results and the related permutation critical values for the activity counts of the low-BMI and high-BMI groups in the three grades. There is a significant activity difference between the low-BMI and high-BMI groups in grade 10 girls [plot (a) in Figure 5], but not in girls of grades 11 and 12 [plots (b) and (c) in Figure 5]. The difference in grade 10 girls occurs between 8:00 and 11:30. Ogbagaber et al. (2012) [7] found a significant difference between the low-BMI and high-BMI groups in grade 10 girls, and the result in plot (a) of Figure 5 confirms that finding.
Activity differences among three categories of race/ethnicity and among family affluence. In our analysis, we investigated whether there are activity differences among three categories of race/ethnicity: "Hispanic", "African American", and "Asian and White" (i.e., "Asian" students are combined into a single category with "White" students due to the small number of Asians). In addition, we examined activity differences among the three categories of family affluence. No significant differences were found for the race/ethnicity or family affluence categories, since the observed F-statistic is below the green dashed critical values in Figures S.1 and S.2 in the Supplementary Materials.

Discussion
This paper introduced a functional data analysis approach to perform an fANOVA for physical activity count data and to investigate the impact of covariates such as weight or BMI on physical activity. The fANOVA approach was developed to study changes in circadian rhythms across longitudinal follow-ups, i.e., to study and characterize the patterns of physical activity of girls across multiple grades. The fANOVA was applied to analyze the physical activity data of adolescent girls in grades 10, 11, and 12 from the NEXT Generation Health Study 2009-2013. In addition, we examined activity differences among three categories of race/ethnicity and among the three categories of family affluence. To investigate whether there is a temporal difference between two samples, functional t-tests were used to test the following: (1) activity differences between grade pairs; and (2) activity differences between low-BMI girls and high-BMI girls of the NEXT study. To obtain critical values for the tests, one may perform permutation tests, which avoid relying on a normal distribution assumption for the count data. Statistically significant differences existed among the activity patterns over time for adolescent school girls. The tenth grade girls tended to be less active than the girls in grades 11 and 12 between 5:30 and 9:30 and between 22:00 and 2:00. Significant differences in physical activity were detected between the low-BMI and high-BMI groups from 8:00 to 11:30 for grade 10 girls, and low-BMI girls in grade 10 tended to be more active. For the school girls in grades 11 and 12, no significant difference existed in the physical activity patterns between the low-BMI and high-BMI groups. No significant differences were found for the race/ethnicity or family affluence categories.
The difference over time may be due to the way subjects were accrued. The original 95 participants were girls measured during the summer months (Tables S.1 and S.2 in the Supplementary Materials). In subsequent grades, some girls were measured in the spring or winter and were likely in school, working, or engaged in different activities than in the 10th grade. Also, 11th- and 12th-grade girls may have had more independence and less structure in their daily activities than girls in the 10th grade. These may be reasons that the tenth grade girls tended to be less active. Activity tends to decline during high school, as reported in Allison et al. (2007) [10]. The functional data analysis approach was proposed and used in Wang et al. (2011) [8] to analyze the physical activity data of two groups. The models and methods of this paper are similar to those of Wang et al. (2011) [8]. In our analysis, we analyzed physical activity data of two groups and of three groups, and the methods can be used to analyze data with a larger number of groups. Ogbagaber et al. (2012) [7] developed a harmonic shape invariant model to estimate circadian cycles and to analyze the grade 10 data of the NEXT study. The fda approach is flexible, and the R package fda makes the proposed methods easy to implement. In addition, it is possible to adjust for covariates, e.g., by adding them to the functional linear model (3).
We address the first three goals of the NEXT study outlined in the Introduction by analyzing physical activity data with functional data analysis approaches: (1) to study the trajectory of physical activity of adolescent girls; (2) to examine the effect of risk indicators such as BMI; and (3) to examine the effect of family affluence and race/ethnicity on physical activity. The analytic approaches presented in this paper are very general and can be applied to similar problems, such as comparing the physical activity data of males and females in addition to BMI status. In this analysis of the NEXT study, we only have data from adolescent girls. In other studies, data from multiple sub-samples may be available, and the methods from this paper can be very useful for future analyses.

Acknowledgment
Three anonymous reviewers and the editor, Dr. Refinetti, provided very good and insightful comments that helped us improve the manuscript. We thank Dr. Giles Hooker for his kind and prompt responses to our many questions and inquiries about his R package fda. Dr. Denise Haynie, Dr. Candice Grayton, and Dr. Kaigang Li kindly explained the data structure of the NEXT study to us to facilitate our analysis.
This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov). This study was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Maryland, USA.

Supplementary Materials for A Functional Data Analysis Approach for Circadian Patterns of Activity of Teenage Girls

Appendix A. Activity Measurement Recording Counts by Months
In Table S.1, we provide the sub-sample sizes in grades 10, 11, and 12 by recording month. In Table S.2, we provide the sub-sample sizes in grades 10, 11, and 12 by summer recording (i.e., months 6, 7, 8) vs. non-summer recording (i.e., months 1-5, 9-12).

Grade 10
Grade