TY - JOUR
T1 - Microsatellites’ mutation modeling through the analysis of the Y-chromosomal transmission
T2 - Results of a GHEP-ISFG collaborative study
AU - Antão-Sousa, Sofia
AU - Gusmão, Leonor
AU - Modesti, Nidia M.
AU - Feliziani, Sofía
AU - Faustino, Marisa
AU - Marcucci, Valeria
AU - Sarapura, Claudia
AU - Ribeiro, Julyana
AU - Carvalho, Elizeu
AU - Pereira, Vania
AU - Tomas, Carmen
AU - de Pancorbo, Marian M.
AU - Baeta, Miriam
AU - Alghafri, Rashed
AU - Almheiri, Reem
AU - Builes, Juan José
AU - Gouveia, Nair
AU - Burgos, German
AU - Pontes, Maria de Lurdes
AU - Ibarra, Adriana
AU - da Silva, Claudia Vieira
AU - Parveen, Rukhsana
AU - Benitez, Marc
AU - Amorim, António
AU - Pinto, Nadia
N1 - Publisher Copyright: © 2023 The Authors
PY - 2024/3
Y1 - 2024/3
N2 - The Spanish and Portuguese Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) organized a collaborative study on mutations of Y-chromosomal short tandem repeats (Y-STRs). New data from 2225 father-son duos and data from 44 previously published reports, corresponding to 25,729 duos, were collected and analyzed. Marker-specific mutation rates were estimated for 33 Y-STRs. Although highly dependent on the analyzed marker, mutations compatible with the gain or loss of a single repeat were 23.2 times more likely than those involving a greater number of repeats. Longer alleles (relative to the modal one) were nearly twice as mutable as shorter ones. Among longer alleles, the loss of repeats was nearly twice as likely as the gain. Conversely, shorter alleles showed a symmetrical trend, with repeat gains being twofold more frequent than losses. A positive correlation between paternal age and mutation rate was observed, strengthening previous findings. A machine learning approach, via logistic regression analyses, allowed the establishment of algebraic formulas for estimating the probability of mutation depending on paternal age and allele length for DYS389I, DYS393 and DYS627. Algebraic formulas could also be established with allele length as the only predictor for the DYS19, DYS389I, DYS389II-I, DYS390, DYS391, DYS393, DYS437, DYS439, DYS449, DYS456, DYS458, DYS460, DYS481, DYS518, DYS533, DYS576, DYS626 and DYS627 loci. For the remaining Y-STRs, no statistical significance was observed, probably as a consequence of the small effective size of the available subsets, a common difficulty in modeling rare events such as mutations. The amount of data used in the different analyses varied widely, depending on how the data were reported in the publications analyzed. This reveals a regrettable waste of produced data, due to inadequate communication of results, and supports an urgent need for publication guidelines for mutation studies.
KW - Microsatellites
KW - Mutation
KW - Mutation modeling
KW - Mutation rate estimation
KW - Y chromosome
KW - Y-STRs
UR - http://www.scopus.com/inward/record.url?scp=85183424006&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183424006&partnerID=8YFLogxK
U2 - 10.1016/j.fsigen.2023.102999
DO - 10.1016/j.fsigen.2023.102999
M3 - Article
C2 - 38181588
AN - SCOPUS:85183424006
SN - 1872-4973
VL - 69
JO - Forensic Science International: Genetics
JF - Forensic Science International: Genetics
M1 - 102999
ER -