Abstract
Keywords
Introduction
An approach to preventing neurodevelopmental disorders is to gain better understanding of how neurodevelopment is coordinated and then to identify interference from environmental, genomic, and epigenomic factors. The development of the nervous system requires tight regulation and coordination of multiple functions essential to protect and nourish neurons. As the nervous system develops, the immune system, the circulatory system, and cranial and skeletal systems must all undergo synchronized and coordinated development. Neurodevelopmental disorders follow the disruption of this coordination.
A significant advance in genome sequence–level resolution of balanced cytogenetic abnormalities greatly improves the ability to document changes in regulation and dosage for genes critical for the neurologic system. Based on DNA sequence analyses, some chromosome rearrangements have been identified as causing individual congenital disorders because they disrupt genes essential for normal development.1–3 There is poor understanding and no effective treatment for many of these overwhelming abnormalities. Signs and symptoms include autism, microcephaly, macrocephaly, behavioral problems, intellectual disability, tantrums, seizures, respiratory problems, spasticity, heart problems, hearing loss, and hallucinations. 1 Because the abnormalities do not correlate well with eventual outcome, genetic counseling is difficult and uncertain. 3
In congenital neurologic disease, inheritance is usually autosomal dominant so the same chromosomal abnormalities occur in every cell. The genetic events that lead to most neurodevelopmental disorders are not understood 4 but several maternal infections and other lifestyle factors are known to interfere.
DNA homology between humans and other species is well established, and DNA swapping between vertebrates and invertebrates has been reported. An early draft of the human genome found human genomes have undergone lateral gene transfer to incorporate microorganism genes. 5 Lateral transfer from bacteria may have generated many candidate human genes. 6 Genome-wide analyses in animals found up to hundreds of active genes generated by horizontal gene transfer. Fruit flies and nematodes have continually acquired foreign genes as they evolved. Although these transfers are thought to be rarer in primates and humans, at least 33 previously unreported examples of horizontally acquired genes were found. 6 These findings argue that horizontal gene transfer continues to occur to a larger extent than previously thought. Transferred genes that survive have been largely concerned with metabolism and make important contributions to increasing biochemical diversity. 7 A picture of humans as a large ecosystem of human and non-human DNAs is emerging.
The present work implicates foreign DNA largely from infections as a cause of the chromosome anomalies that cause birth defects. Infections replicate within the human central nervous system by taking advantage of immune deficiencies such as those traced back to deficient microRNA production 8 or other gene damage. Disseminated infections can then interfere with the highly active DNA break repair process required during meiosis. The generation of gametes by meiosis is the most active period of recombination, which occurs at chromatin positions enriched in epigenetic marks. Hundreds of double-strand breaks accompany meiotic recombination. 9 Gametes with errors in how this recombination occurs cause chromosome anomalies in the zygote. In contrast to oocytes, meiotic recombination in sperm cells occurs continuously after puberty.
The exact DNA sequences of known pathogenic rearrangements in individual, familial, and recurrent congenital disorders1–3 make it possible to test for association with foreign DNA. Even rare developmental disorders can be screened for homology to infections within altered epigenomes and chromatin structures. Considering effects of foreign DNAs can assist prenatal and genetic counseling, diagnosis, prevention, and early intervention.
The results showed that DNA abnormalities in some neurodevelopmental disorders closely match DNA in multiple infections that extend over long linear stretches of human DNA and often resemble repetitive human DNA sequences. Massive changes in the identity and distribution of sets of homologous alien DNAs that accompany chromosome abnormalities may drive and stabilize them. Removing competition by changing the sets of foreign DNAs may encourage pathogens.
The affected human sequences are shown to exist as linear clusters of genes closely spaced in 2 dimensions. Interference from infection can also delete or damage human gene clusters and change epigenetic functions that coordinate neurodevelopment. This microbial interference accounts for immune, circulatory, and structural deficits that accompany neurologic deficits.
Congenital neurodevelopmental disorders are thus viewed as resulting from an assault on human DNA by foreign DNA and perhaps subject to selection based on their similarity to host DNA. It is important to remember that effects in 2 dimensions can sometimes alter 3-dimensional topology as well. 10
Testing and verifying predictions from a viable model may spur the development of methods for identifying contributions from infections in intractable rare disorders that are not now available. Convergent arguments from testing predictions based on any proposed model might lessen the effects of limitations on currently available technology.
Materials and Methods
Data sources
DNA sequences from acquired congenital disorders were from published whole genome sequences at chromosome breakpoints and rearrangement sites.1–3 Comparison with multiple databases of microbial sequences determined whether there was significant human homology. Emphasis was on cases with strong evidence that a particular human chromosome rearrangement was pathologic for the congenital disorder. Patients in the 3 major studies1–3 used in this analysis were 98 females and 144 males. Of the patients with background information available, most (46) were younger than the age of 10, 7 were in the range of 10 to 20, 6 were in 20 to 40, and 1 patient was older than 40 years.
Testing for homology to microbial sequences
Hundreds of different private rearrangements in patients with different acquired congenital disorders were tested for homology 11 against nonhuman sequences from microorganisms known to infect humans as follows: viruses (taxid: 10239) and retroviruses including HIV-1 (taxid: 11676); human endogenous retroviruses (taxids: 45617, 87786, 11745, 135201, 166122, 228277, and 35268); bacteria (taxid: 2); Mycobacteria (taxid: 85007); fungi (taxid: 4751); and chlamydias (taxid: 51291).
Because homologies represent interspecies similarities, “Discontinuous Megablast” was most frequently used, but long sequences were sometimes tested against highly similar microbial sequences. Significant homology (indicated by homology score) occurs when microbial and human DNA sequences have more similarity than expected by chance (E value ⩽ e-10). 12 Confirmation of microorganism homologies was done by reverse testing multiple variants of complete microorganism genomes against human genomes and by extending the original analyses to 20 000 matches.
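The significance criterion above can be illustrated with a minimal sketch. It assumes the hits are available in the default 12-column tabular layout of BLAST+ (`-outfmt 6`), which is not necessarily how the searches in this study were exported. The function name `significant_hits` and the two example rows are hypothetical and serve only to show the E value cutoff being applied.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # significance threshold stated in the text

# Default column order of BLAST+ tabular output (-outfmt 6); assumed input format.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return hits whose E value is at or below the cutoff."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        if float(row["evalue"]) <= cutoff:
            kept.append({
                "query": row["qseqid"],
                "subject": row["sseqid"],
                "percent_identity": float(row["pident"]),
                "evalue": float(row["evalue"]),
            })
    return kept


# Hypothetical rows: one strong match and one that fails the cutoff.
example = (
    "bp1\tmicrobeA\t83.0\t300\t51\t0\t1\t300\t1000\t1299\t2e-40\t180\n"
    "bp1\tmicrobeB\t70.0\t60\t18\t0\t10\t69\t500\t559\t0.003\t35.1\n"
)
hits = significant_hits(example)
```

Only the first hypothetical row passes, because its E value (2e-40) is below the e-10 threshold while the second (0.003) is not.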
Various literature analyses have placed Alu repeats into 8 subfamilies having consensus sequences (GenBank; accession numbers U14567-U14574). Microbial sequences were independently compared with all 8 consensus Alu sequences and with 442 individual AluY sequences. Plots of chromosome locations of repetitive elements were from the UCSC genome browser (GRCh38).
Chromosome localizations
The positions of microbial homologies in human chromosomes were determined using BLAT or BLAST. Comparisons were also made to cDNAs based on 107 186 Reference Sequence (RefSeq) RNAs derived from the genome sequence with varying levels of transcript or protein homology support. Tests for contamination by vector sequences in these nontemplated sequences were also carried out with the BLAST program. Inserted sequences were also compared with Mus musculus GRCm38.p4 [GCF_000001635.24] chromosome plus unplaced and unlocalized scaffolds (reference assembly in Annotation Release 106). Homology of inserted sequences to each other was tested using the Needleman and Wunsch algorithm. Lists of total homology scores for microbes vs human chromosome rearrangements were compared by the Mann-Whitney test.
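For readers unfamiliar with the Needleman and Wunsch algorithm mentioned above, the following is a minimal sketch of global alignment by dynamic programming. The scoring values (match = 1, mismatch = -1, gap = -1) and the function name `needleman_wunsch` are assumptions chosen for illustration; the parameters actually used in this study are not stated.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions with a simple linear gap penalty.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


nw_score, aligned_a, aligned_b = needleman_wunsch("ACGT", "AGT")
```

Under these assumed scores, aligning `ACGT` with `AGT` gives three matches and one gap. Production comparisons of long inserted sequences would normally use an established alignment tool rather than this quadratic-memory sketch.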
Results
Interdependent functions are clustered together on the same chromosome segment
The nervous system has a close relationship to structures essential for immunity, circulation, cell barriers, and protective enclosures. Genes essential for all these functions must develop in concert so chromosome segments deleted in neurodevelopmental disorders may be critical for this coordination. Genes for these related functions are located close to each other on the same linear segment of a chromosome (Figure 1). For example, deletions at 4q34 in patient DGAP161 are shown in Figure 1. The genes within the 4q34 deletion in each of the 4 categories tested are color coded in Figure 1. Figure 1 is representative of 6 other chromosome bands that were also tested and gave similar results: 2q24.3, 6q13-6q14.1, 10p14-10p15.1, 13q14.2, 18p11.22-p11.32, and 19q12-q13.1. Deletion of these clustered arrangements has been correlated with serious neurodevelopmental disorders. 1 Alternatively spliced forms of the same gene may encode for pleiotropic functions that must be synchronized and coordinated among diverse cells. Multiple functions for the same gene in different cell types are commonly found. Hormonal signaling represents a major control mechanism. 13

Chromosome 4q34 as a typical example of close relationships between nervous system genes and genes for other essential developmental functions.
Many deleted gene clusters include long stretches of DNA strongly related to foreign DNAs
To investigate the chromosomal segment deletions that likely cause neurodevelopmental disorders, homologies to infection were tested in sequences within and flanking deleted clusters. Strong homologies to infections were interspersed. To demonstrate the extent of these relationships, a deleted 4q34 chromosome segment (Patient DGAP161) 1 was tested for homology to microbes and then compared to repetitive elements in the same region.
Figure 2 shows that stretches of homology to microorganisms are distributed along a 500,000 bp segment at the 5’ end of chromosome 4q34. For comparison, the distribution of repetitive elements is shown below the plots of homologous foreign DNAs.

Example of total homologies to microbial sequences which are dispersed throughout the normal 4q34 chromosome segment deleted in patient DGAP161.
Enormous effects on microbial homologies accompany changes in junction sequences between chromosome bands caused by deletions
Figure 3 represents a snapshot of how local homologies to foreign DNAs shift when a chromosome segment is deleted. Figure 3 (top) represents 200,000 bps on each side of the normal 4q33-4q34 junction and its change after the 4q34 deletion to the new junction 4q33-4q35. Around the breakpoint and 3′ to it, microbial homologues become very different. Although the representations of microbial homologies as red rectangles appear to be small on Figure 3 (top), the matching sequences actually extend for hundreds of base pairs. Normally, there are multiple homologies to a variety of microorganisms at the breakpoint (yellow rectangle) which may help destabilize the area. After the deletion, the junction formed now has new and strong homology to pathogens

Snapshot of very large differences in local microbial homologies in one 200 kb section of chromosome 4q34.
Deleted segments in familial chromosome anomalies point toward a general mechanism for infection as a cause of neurodevelopmental disorders
Genome sequencing of an entire family may be necessary 3 because some family members carry balanced chromosomal translocations but do not have neurodevelopmental disease. In 3 of the 4 families with familial balanced chromosomal translocations, patient-specific unbalanced deletions were found but the results did not overlap any database of human reference genomes. 3 A disease-associated deletion in the study of Aristidou et al 3 (family 2) was tested by comparing equivalent numbers of bps at the junction sequence created by the deletion vs the original chromosome junction sequence without the deletion (GRCh37: Chr16:49,741,265-49,760,865).
In Figure 4, the changes are enormous, involving new distributions (Mann-Whitney,

A structural variant unique to an affected member of family 2 in Aristidou et al 3 has massive changes in the distribution and identities of homologous microorganisms when compared with the unaffected mother.
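The Mann-Whitney comparisons of homology score distributions reported in this work can be sketched as follows. This is a pure Python illustration that ranks the pooled scores (average ranks for ties), computes the U statistic, and derives a two-sided P value from the normal approximation without tie or continuity correction. The function name `mann_whitney_u` and the example score lists are hypothetical and are not data from the study.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, two-sided P) for samples x and y.

    P uses the normal approximation without tie or continuity
    correction, so it is only a rough guide for small samples.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for tied values
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_sided


# Hypothetical total homology scores at two junction sequences.
scores_original = [1, 2, 3]
scores_deleted = [4, 5, 6]
u_stat, p_value = mann_whitney_u(scores_original, scores_deleted)
```

A statistics package with exact small-sample P values and tie correction would be preferable for real analyses; the sketch only makes the ranking logic explicit.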
Epigenetic functions of mutated and deleted genes in neurodevelopmental disorders relate neurologic deficits to deficits in the immune system, the circulatory system, and structural genes.
Genes in bold type are related to epigenetic control; checkmark indicates identical gene with mainly epigenetic functions is listed in column 2. Other genes that do not completely match column 2 are listed individually.
Some chromosome regions with microbial homologies are only deleted in affected family members in families that share a recurrent translocation
Recurrent de novo translocations between chromosomes 11 and 22 have so far only been detected during spermatogenesis and have been attributed to palindromic structures that induce genomic instability. 2 The recurrent breakpoint t(8;22)(q24.13;q11.21) 2 was tested to determine whether palindromic rearrangements might arise because infection interferes with normal chromatin structures.
Figure 5 shows strong homology to bacterial and viral sequences in a family with a recurrent translocation. DNA sequences from an unaffected mother carry a balanced translocation rearrangement 2 with different homologies than are present in affected cases (Figure 5). The distributions of homologous microorganisms are clearly different for the unaffected mother vs affected Case 12 Der(8) (Mann-Whitney,

Changes in alien DNA homologies in an affected child born to a mother with a recurrent translocation.
Microbial DNA homologies in areas around a mutated epigenetic driver gene
In some patients with neurodevelopmental disorders, a chromosomal anomaly disrupts a critical driver gene with strong evidence that the disrupted driver contributes to the disease. 1 The genes identified as underlying phenotypic drivers of congenital neurologic diseases include chromatin modifiers. 1 In agreement with this designation, Table 1 shows that most pathogenic driver genes are more specifically epigenetic factors (at least 45 of the 66 patients listed in Table 1). Using a value of 815 as a rough estimate of the total number of epigenetic factors in the human genome 42 containing 20 000 genes, the probability that association between neurodevelopmental patients and epigenetic modifications occurs by chance is
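One way to frame this chance calculation is as a binomial tail probability: if each disrupted driver gene were drawn at random from 20 000 genes, of which 815 are epigenetic factors, how likely would it be to observe at least 45 epigenetic factors among 66 patients? The sketch below rests on that assumption; the binomial model and the function name `binomial_tail` are illustrative choices, and the test actually used in the study may differ.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures taken from the text: 815 epigenetic factors among about 20 000 genes,
# and at least 45 of 66 patients with an epigenetic driver gene.
p_epigenetic = 815 / 20000
chance_probability = binomial_tail(45, 66, p_epigenetic)
```

Under these assumptions the tail probability is vanishingly small, because the background rate of epigenetic factors is only about 4% while the observed fraction is about 68%.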
Pathogenic driver gene mutations caused by large chromosome deletions amplify their effects because of epigenetics
Because most identified driver genes of neurodevelopmental disorders 1 are epigenetic factors (Table 1), the functions they control in individual patients and in families with members affected by neurodevelopmental disorders 3 were compared with genes in pathogenic chromosomal deletions. Parts of pathogenic chromosome deletions affected these kinds of critical neurodevelopmental driver genes. 1 Like clustered chromosomal deletions, virtually all pathogenic driver genes have strong effects on the immune system, angiogenesis, circulation, and craniofacial development. Figure 6 summarizes how the functions of damaged epigenetic drivers are distributed and shows that all 46 gene drivers of neurodevelopment have pleiotropic effects. By comparison, pleiotropy has been documented for 44% of 14 459 genes in the GWAS catalog. 43 By this standard, neurodevelopmental driver genes are disproportionately pleiotropic (

As a result of chromosome anomalies, driver genes truncated or deleted in congenital neurodevelopmental disorders are mainly epigenetic regulators or effectors. The pie chart shows the percentages of 46 driver genes that have the epigenetic functions indicated. Loss of these driver gene functions then impacts a group of functions that must be synchronized during the complex process of neurodevelopment. These are the same general functions lost in deleted gene clusters.
Clear evidence of a non-human insertion
In 48 patients, 1 multiple infection-matching sequences were included in chromosomal anomalies generated by balanced chromosomal translocations (data not shown). Sequences around individual breakpoints were tested for microbial insertions by first comparing the sequences with human and then with microbial DNA. For example, chromosome breakpoint 2 in patient DGAP154 matched the human X chromosome in 2 segments separated by a gap in the sequence (Figure 7). The gap did not match human sequences but did correspond to nematodes and yeast-like fungi, suggesting one or more of these organisms had inserted foreign DNA into patient DGAP154 (Figure 7). More frequently, however, other breakpoints in DGAP154 chromosomes matched many microbial sequences. A simple example of one of these alignments around DGAP154 breakpoint 3 shows that many microorganisms align with human DNA. The similarity between critical human DNA epigenetic factors and microbial DNA (which is more abundant) can set up competitions during recombination and break repair (Figure 7 bottom). Large changes in the sets of homologous foreign DNAs also accompany chromosome gene rearrangements that affect driver genes.

Foreign DNA sequences can compete with human DNA at epigenetic regulators around breakpoints.
Multiple infections identified by homology match signs and symptoms of neurodevelopmental disorders
These kinds of alignments suggest candidates that can contribute to the signs and symptoms in each individual (Table 2). Eight patients have growth retardation, and 20 of 48 patients had impaired speech. 1 Multiple infections can cause these problems. For instance, HIV-1 causes white matter lesions associated with language impairments and also harms fetal growth. There are nearly 50 matches to HIV-1 DNA in the chromosome anomalies of 35 patients.
Recurrent infections found to have homology to chromosomal abnormalities in neurologic birth defects can cause developmental defects.
Abbreviations: CMV, cytomegalovirus; CNS, central nervous system; EBV, Epstein-Barr virus; HTLV, Human T-cell lymphotropic virus.
Within chromosome anomalies, stealth viruses have about 35 matching sequences. Stealth viruses are mostly herpes derivatives that emerge in immunosuppressed patients such as cytomegalovirus (CMV). Stealth virus 1 (Table 2) is Simian CMV with up to 95% sequence identity to isolates from human patients. First trimester CMV infection can cause severe cerebral abnormalities followed by neurologic symptoms. 49 CMV is also a common cause of congenital deafness and visual abnormalities. Twenty-seven of 48 neurodevelopmental patients had hearing loss. Herpes simplex virus is another stealth virus that directly infects the central nervous system and can cause seizures (reported for 9 patients).
Chromosome anomalies in patient DGAP159 have strong homology to
Signs and symptoms in patient DGAP159 are consistent with known neurodevelopmental effects of bacterial meningitis including hearing loss, developmental delay, speech failure, and visual problems.
Tests for artifacts in matches to human-microbial DNA
Genome rearrangements for patient data 1 produced 1986 matches to microbial sequences (range = 66%-100%) with E ⩽ e-10 and a mean value of 83% identity. About 190 Alu sequences resembled microbial sequences, supporting the idea that homologies among repetitive human sequences and microbes are real. Correspondence between microbial sequences and multiple human repetitive sequences increases possibilities that microbial sequences can interfere with essential human processes. Contamination of microbial DNA sequences by human Alu elements 50 was ruled out by comparing about 450 AluJ, AluS, and AluY sequences with all viruses and bacteria in the National Center for Biotechnology Information (NCBI) database.
Tests for DNA sequence artifacts
To further test the possibility that some versions of these microbial sequences were sequencing artifacts or contaminated by human genomes, microbial genomes were (reverse) tested for homology to human genomes. Similarities to human sequences were found across multiple strains of the same microorganism (Table 3). For example, an Alu homologous region of the HIV-1 genome (bps = 7300-9000) in 28 different HIV-1 isolates was compared with human DNA. All 28 HIV sequences matched the same region of human DNA, at up to 98% identity. In contrast, only 1 of 20 zika virus sequences matched humans, and zika virus was not considered further.
Independent evidence that microbial genomes have regions of homology to human DNA as predicted by results.
A model for infection interference in neurodevelopment
Autosomal dominant inheritance of neurodevelopmental disorders containing microbial DNA suggests interference with gamete generation in 1 parent. The mechanism proposed in Figure 8 is based on significant changes in microbial homologies on multiple human chromosomes. Large amounts of foreign DNA present during human meiosis, with its many double-strand breaks during the most active period of recombination, produce defective gametes. Errors in spermatogenesis underlie a prevalent and recurrent gene rearrangement that causes intellectual disability and dysmorphism (Emanuel syndrome). 2 In contrast, recombination in ova occurs in fetal life and then meiosis is arrested until puberty. 51

Soon after conception, erasure of epigenetic marks generates pluripotent stem cells and then the epigenome is reprogrammed.
The resemblance of foreign DNA to host background DNA may be a major factor in selecting infection and in human ability to clear the infection. Only 1 rare defective gamete is modeled in Figure 8 but the male generates 4 gametes during meiosis beginning at puberty. Only 1 gamete survives in the female because 3 polar bodies are generated. In neurodevelopmental disorders, massive changes in similarities to foreign DNAs accompany chromosome anomalies such as deletions and insertions. Foreign DNA can insert itself or interfere with epigenomic marking or with break repair during meiosis. A preexisting balanced chromosomal translocation in the family 3 increases the chances of generating a defective gamete during meiosis.
Discussion
Long stretches of DNA in many foreign DNAs match millions of repetitive human DNA sequences. Individual microorganisms also match nonrepetitive sequences. Human infections may be selected for and initially tolerated because of these matches. It is almost impossible to completely exclude the possibility of sequencing artifacts or contamination of microbial sequences with human sequences. However, rather than reflecting widespread, wholesale error due to human DNA contamination in many laboratories over many years, microbial homologies more likely suggest that DNA sequences in the microbiome have been selected because they are homologous to regions of human DNA. This may be a driving force behind the much slower evolution of human repetitive DNAs.
Infections such as exogenous or endogenous retroviruses are known to insert into DNA hotspots. 10 Foreign DNAs are proposed to drive neurodevelopmental anomalies because humans harbor large numbers of foreign DNAs. Changes in the composition of foreign DNAs can stabilize rearrangements and favor pathogens (Figure 8). Thus, the genetic background of an individual may be a key factor in determining the susceptibility to infection and to the effects of infection. At the genetic level, this suggests selective pressures for infections to develop and use genes that are similar to human versions and to silence or mutate genes that are immunogenic. Infection genomes evolve rapidly on transfer to a new host. 52 The presence of genes in infections that have long stretches of identity with human genes makes the infection more difficult to recognize as nonself. For example, there is no state of immunity to
Neurons interact with cells in the immune system, sensing and adapting to their common environment. These interactions prevent multiple pathological changes. 54 Many genes implicated in neurodevelopmental diseases reflect strong relationships between the immune system and the nervous system. It was always possible to find functions within the immune system for genes involved in neurodevelopmental disorders (Figures 1, 4, and 6). Damage to genes essential to prevent infection leads to more global developmental neurologic defects including intellectual disability. These homologies include microbes known to produce teratogens. Analysis of mutations within clusters of genes deleted in neurodevelopmental disorders predicts loss of brain-circulatory barriers, facilitating infections. Damage to cellular genes essential for autophagy may lead to abnormal pruning of neural connections during postnatal development.
Aggregated gene damage accounts for immune, circulatory, and structural deficits that accompany neurologic deficits. Other gene losses listed in Table 1 and in deleted chromosome segments (Figure 1) account for deficits in cardiac function, cell barriers, bone structure, skull size, muscle tone, and many other nonneurologic signs of neurodevelopmental disorders.
The arrangement of genes in clusters converging on the same biological process may simplify the regulation and coordination between neurons and other genes during neurodevelopment and neuroplasticity. Genes that are required for related functions and need coordinated regulation have been shown to be organized into individual topologically associated domains. 55 Neurons are intimately connected to chromatin architecture and epigenetic controls. 56 A disadvantage of the clustered arrangement of co-regulated accessory genes is that homology to microbial infections or other causes of chromosome anomalies anywhere in the cluster can then ruin complex coordinated neurological processes.
The results in Table 1 and Figure 6 emphasize the role of epigenetic factors in neurodevelopmental diseases. Chromatin modifier genes are disproportionately affected in patients with neurodevelopmental disorders 1 and include 2 types of modifiers. Epigenetic factors signal chromatin remodelers, which are large multi-protein complexes. Epigenetic factors are responsible for differentiation from pluripotent states; chromatin remodeling also has major roles in developmental stage transitions. There are 5 families of chromatin remodelers that all control access to DNA within nucleosomes, exchanging and repositioning them. Chromatin remodeling arrays contain an ATPase subunit resembling motor proteins 57 and are distinct from epigenetic factors.
Epigenetic regulators that affect multiple functions required by the same process make their mutation especially critical. Longer range developmental interactions in chromosome regions exacerbate the effects of infection. Mutations or deletions (Figure 1 and Table 1) show that this effect can occur in neurodevelopmental disorders. Functions that must be synchronized are grouped together on the same chromosome region and can be lost together.
Microbial DNA sequences are unlikely to be contaminants or sequencing artifacts. They are all found connected to human DNA in disease chromosomes; for example, multiple microbial sequences from different laboratories are all homologous to the same Alu sequence. Alu element–containing RNA polymerase II transcripts (AluRNAs) determine nucleolar structure and rRNA synthesis and may regulate nucleolar assembly as the cell cycle progresses and as the cell adapts to external signals. 58 HIV-1 integration occurs with some preference near or within Alu repeats. 59 Alu sequences are largely inactive retrotransposons, but some human-microbial homologies detected may be due to insertions from Alu or other repetitive sequences. Neuronal progenitors may support de novo retrotransposition in response to the environment or maternal factors. 60
Their variability and rarity make neurologic disorders difficult to study by conventional approaches. The techniques used here can improve prenatal and genetic counseling. However, a limitation is the inability to unequivocally identify 1 infection and to absolutely distinguish infection by 1 foreign DNA from multiple infections.
Conclusions
DNAs in some congenital neurodevelopmental disorders closely match multiple infections that extend over long linear stretches of human DNA and often involve repetitive human DNA sequences. The affected human sequences are shown to exist as linear clusters of genes closely spaced in 2 dimensions.
Interference from infection and foreign DNAs can delete or damage human gene clusters and alter the epigenome. This interference accounts for immune, circulatory, and structural deficits that accompany neurologic deficits.
Neurodevelopmental disorders are proposed to begin when parental infections cause insertions or interfere with epigenetic markings and meiosis. Shifts in homologous sets of foreign DNAs can be massive and may drive chromosomal rearrangements.
Congenital neurodevelopmental disorders are thus viewed as resulting from an assault on human DNA by foreign organisms. Recognizing and considering these effects can improve prenatal and genetic counseling.
