Human Y-chromosome gene classification using Fractal Dimension & Shannon Entropy

Added by Todd Holden

Publication date: 2014

Field: Biology

Language: English

Authors: Todd Holden, JianMin Ye

Genomics

All genes on the human Y-chromosome were studied using fractal dimension (FD) and Shannon entropy (SE). Clear outlier clusters were identified. Among these were 6 sequences that have since been withdrawn as CDSs and 1 additional sequence that is not in the current assembly. A methodology was developed for ranking the sequences by their deviation from the average values of FD and SE. Sequences scoring among the 10% largest deviations had an abnormally high likelihood of coming from centromeric or pseudoautosomal regions and a low likelihood of coming from X-transposed regions. lncRNA sequences were also enriched among the outliers. In addition, the expressed genes previously identified for evolutionary study tended not to show large deviations from the average.

Keywords: Y-chromosome; Shannon dinucleotide entropy; fractal dimension; centromeric genes; gene degradation; lncRNA
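
The abstract does not give formulas for either measure or for the ranking score, so the following is only a minimal sketch under stated assumptions. It assumes SE is the Shannon entropy (in bits) of overlapping dinucleotide frequencies, as the keywords suggest. It also assumes FD is estimated with Higuchi's method on a purine/pyrimidine DNA walk, which is one common choice and not necessarily the authors' method. Finally, it assumes the deviation score is the Euclidean distance from the mean in z-scored (FD, SE) space. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

BASES = set("ACGT")


def dinucleotide_entropy(seq: str) -> float:
    """Shannon entropy (bits) of overlapping dinucleotide frequencies (max 4 bits)."""
    seq = seq.upper()
    # Skip any dinucleotide containing a non-ACGT symbol such as N.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in BASES and seq[i + 1] in BASES]
    if not pairs:
        return 0.0
    total = len(pairs)
    return -sum((c / total) * math.log2(c / total) for c in Counter(pairs).values())


def dna_walk(seq: str) -> list[float]:
    """ASSUMED mapping: purine (A/G) steps +1, pyrimidine (C/T) steps -1; others skipped."""
    pos, walk = 0, []
    for base in seq.upper():
        if base in "AG":
            pos += 1
        elif base in "CT":
            pos -= 1
        else:
            continue
        walk.append(float(pos))
    return walk


def higuchi_fd(x: list[float], kmax: int = 8) -> float:
    """Higuchi fractal dimension: least-squares slope of log L(k) versus log(1/k)."""
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # 0-based starting offset
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k]) for i in range(1, steps + 1))
            lengths.append(dist * (n - 1) / (steps * k) / k)  # Higuchi normalization
        if lengths and mean(lengths) > 0:  # a zero curve length has no logarithm
            log_inv_k.append(math.log(1.0 / k))
            log_len.append(math.log(mean(lengths)))
    if len(log_inv_k) < 2:
        raise ValueError("series too short or too flat for the chosen kmax")
    mx, my = mean(log_inv_k), mean(log_len)
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den


def rank_by_deviation(features: dict[str, tuple[float, float]],
                      top_fraction: float = 0.10) -> list[str]:
    """ASSUMED score: distance from the mean in z-scored (FD, SE) space; returns top IDs."""
    ids = list(features)
    fds = [features[g][0] for g in ids]
    ses = [features[g][1] for g in ids]
    mu_fd, sd_fd = mean(fds), pstdev(fds) or 1.0  # guard against zero spread
    mu_se, sd_se = mean(ses), pstdev(ses) or 1.0
    score = {g: math.hypot((features[g][0] - mu_fd) / sd_fd,
                           (features[g][1] - mu_se) / sd_se) for g in ids}
    ranked = sorted(ids, key=score.get, reverse=True)
    n_top = max(1, math.ceil(round(top_fraction * len(ids), 9)))  # round avoids float drift
    return ranked[:n_top]
```

With per-gene sequences in a dict, one would compute `(higuchi_fd(dna_walk(s)), dinucleotide_entropy(s))` for each sequence and pass the resulting dict to `rank_by_deviation`. Z-scoring puts the two measures on a common scale, so neither dominates the distance simply because of its units.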

Mapping the spectrum of 3D communities in human chromosome conformation capture data

Sang Hoon Lee, Yeonghoon Kim, Sungmin Lee (2018)

Several experiments show that the three-dimensional (3D) organization of chromosomes affects genetic processes such as transcription and gene regulation. To better understand this connection, researchers developed Hi-C, a method that detects pairwise physical contacts between all chromosomal loci. Hi-C data show that chromosomes are composed of 3D compartments spanning a variety of scales. However, these cross-scale structures are challenging to detect systematically, so most studies have designed scale-specific methods, chiefly to study topologically associated domains (TADs) and A/B compartments. To go beyond this limitation, we tailor a network community detection method that finds communities in compact fractal globule polymer systems. Our method allows us to scan continuously through all scales with a single resolution parameter. Our main findings are as follows. (i) Polymer segments belonging to the same 3D community need not be consecutive along the polymer chain; in other words, several TADs may belong to the same 3D community. (ii) CTCF, a loop-stabilizing protein thought to play a major role in TAD formation, correlates well with community borders at only one level of organization. (iii) TADs and A/B compartments are traditionally treated as two weakly related 3D structures and are detected with different algorithms; with our method, we detect both simply by adjusting the resolution parameter. We therefore argue that they represent two specific levels of a continuous spectrum of 3D communities rather than distinct structural entities.
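
The abstract does not state the objective the authors tailored for fractal globule polymers. As a hedged illustration of how a single resolution parameter can scan across scales, the sketch below uses the standard generalized (Reichardt-Bornholdt) modularity with a configuration-model null term. This is an assumed stand-in, not the paper's model. The 4-locus contact map and the partition names are hypothetical. A low resolution parameter favors one large community, an intermediate value favors the two tightly linked pairs, and a high value splits the map into single loci.

```python
def generalized_modularity(A: list[list[float]], labels: list[int], gamma: float = 1.0) -> float:
    """Q(gamma) = (1/2m) * sum_ij [A_ij - gamma * k_i * k_j / 2m] * delta(c_i, c_j)."""
    n = len(A)
    k = [sum(row) for row in A]  # weighted degree (total contacts) of each locus
    two_m = sum(k)               # twice the total contact weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - gamma * k[i] * k[j] / two_m
    return q / two_m


def best_partition(A: list[list[float]], candidates: dict[str, list[int]], gamma: float) -> str:
    """Name of the candidate partition with the highest Q(gamma)."""
    return max(candidates, key=lambda name: generalized_modularity(A, candidates[name], gamma))


# Hypothetical toy contact map: two tightly linked pairs joined by a weaker contact.
A = [[0, 10, 0, 0],
     [10, 0, 5, 0],
     [0, 5, 0, 10],
     [0, 0, 10, 0]]
candidates = {
    "whole": [0, 0, 0, 0],
    "pairs": [0, 0, 1, 1],
    "singletons": [0, 1, 2, 3],
}
for gamma in (0.1, 1.0, 5.0):
    print(gamma, best_partition(A, candidates, gamma))
# 0.1 whole
# 1.0 pairs
# 5.0 singletons
```

Because the null term is scaled by the resolution parameter, raising it penalizes large communities more heavily. This is the mechanism that lets a single parameter move from compartment-like to domain-like groupings in this toy case.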

Genomics; Statistical Mechanics; Biological Physics

Size and location of radish chromosome regions carrying the fertility restorer Rfk1 gene in spring turnip rape

T. Niemela, M. Seppanen, F. Badakshi (2012)

In spring turnip rape (Brassica rapa L. ssp. oleifera), the most promising F1 hybrid system would be the Ogu-INRA CMS/Rf system. A Kosena fertility restorer gene, Rfk1, a homologue of the Ogura restorer gene Rfo, was successfully transferred from oilseed rape into turnip rape, where it restored fertility in female lines carrying Ogura CMS. The trait was, however, unstable in subsequent generations. The physical location of the radish chromosomal region carrying Rfk1 was investigated using GISH (genomic in situ hybridization) and BAC-FISH (bacterial artificial chromosome fluorescence in situ hybridization). Metaphase chromosomes were hybridized with radish DNA as the genomic probe and with the BAC64 probe, which is linked to the Rfo gene. Both probes gave a signal in chromosome spreads of the turnip rape restorer line 4021-2 Rfk but not in the negative control line 4021B. The GISH analyses clearly showed that the turnip rape restorer plants were either monosomic (2n=2x=20+1R) or disomic (2n=2x=20+2R) addition lines carrying one or two copies of a single alien chromosome region originating from radish. In the BAC-FISH analysis, double-dot signals were detected in sub-terminal parts of the radish chromosome arms, showing that Rfk1 was located on this additional radish chromosome. The disomic addition lines detected were found to be unstable for turnip rape hybrid production. BAC-FISH sometimes also showed weak signals on two turnip rape chromosomes, and BLAST analysis verified a region homologous to Rfk1 on chromosome 9 of the B. rapa A genome. In the future, this homologous region of the A genome could be substituted with the radish chromosome region carrying Rfk1.

Genomics; Populations and Evolution

Relationships among the nucleotide content of human genome sequence, gene structure, and gene expression features (PhD synopsis)

Diana Duplij, Institute of Molecular Biology and Genetics, Kiev (2010)

The dissertation studies associations between functional elements of the human genome and their nucleotide structure, using asymmetry in nucleotide content (skew, bias) as the main structural feature. Nucleotide content asymmetry differs significantly between human exons and introns. Exon sequences show a purine bias (an excess of A and G over C and T), whereas introns show a keto-amino skew (an excess of G and T over A and C). The extent of these biases depends on gene expression patterns, and the strongest intronic keto-amino skew is found in the introns of housekeeping genes. Introns, whose sequences are subject to weak DNA repair, preferentially accumulate AT->GC and CG->TA substitutions. A comparative analysis of the genes encoding cytochrome P450 2E1 in Homo sapiens and representative mammals was carried out, and a cladistic tree was constructed from the similarity of Cyp2e1 coding sequences. New programming tools for mining and analyzing NCBI database sequences were developed, resulting in the construction of a dedicated database.
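
The two skews described above can be sketched as simple count ratios. The summary does not give the normalization used in the dissertation, so this sketch assumes each skew is divided by the total A+C+G+T count, giving values between -1 and 1. The function name and the two short fragments are hypothetical and are not data from the dissertation.

```python
from collections import Counter


def nucleotide_skews(seq: str) -> dict[str, float]:
    """ASSUMED normalization: each skew is divided by the total A+C+G+T count."""
    counts = Counter(seq.upper())
    a, c, g, t = (counts[b] for b in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T")
    return {
        "purine": (a + g - c - t) / total,      # > 0: excess of purines A, G over C, T
        "keto_amino": (g + t - a - c) / total,  # > 0: excess of keto G, T over amino A, C
    }


# Hypothetical fragments chosen to show each skew in isolation (not real data).
print(nucleotide_skews("AAGGAGCT"))  # {'purine': 0.5, 'keto_amino': 0.0}
print(nucleotide_skews("GTTGTAGT"))  # {'purine': 0.0, 'keto_amino': 0.75}
```

Both skews are computed from the same four counts. A sequence can therefore be purine-rich without any keto-amino skew, and the reverse also holds, as the two fragments show.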

Genomics

Regularization Strategies for Hyperplane Classifiers: Application to Cancer Classification with Gene Expression Data

Erik Andries (2006)

Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.

Genomics

The dichotomy structure of Y chromosome Haplogroup N

Kang Hu, Shi Yan, Kai Liu (2015)

Haplogroup N-M231 of the human Y chromosome is a common clade from Eastern Asia to Northern Europe and one of the most frequent haplogroups in Altaic- and Uralic-speaking populations. Using newly discovered bi-allelic markers from high-throughput DNA sequencing, we substantially improved the phylogeny of Haplogroup N, identifying 16 subclades defined by 33 SNPs. More than 400 males belonging to Haplogroup N from 34 populations in China were successfully genotyped and compared with populations from Northern Asia and Eastern Europe. All N samples fell into one of two clades. The first, N1-F1206 (including the former N1a-M128, N1b-P43 and N1c-M46 clades), was found mostly in Altaic-, Uralic-, Russian- and Chinese-speaking populations. The second, N2-F2930, is common in Tibeto-Burman- and Chinese-speaking populations. Our detailed results suggest that Haplogroup N has developed in the region of China since the final stage of the Late Paleolithic.

Populations and Evolution