GEN: Big impact of big data mining on cancer treatment

Release date: 2016-05-13

Tumor DNA sequencing efforts are approaching 20,000 genomes

In a panel discussion at the World Economic Forum in Davos, Switzerland, in January 2016, US Vice President Joe Biden asked clinicians and researchers to name the barriers to fighting cancer. Several important topics emerged, and the most prominent was “big data”: specifically, how it is collected, analyzed and applied.

Researchers say that "big data" is valuable because large data sets contain significant information that can be extracted through analysis. The larger the sample, the easier it is to detect subtle effects that are hard to find in small samples. Other researchers ask how big "big data" really needs to be, but agree that the more data, the better.

Keith Perry, chief information officer and senior vice president of St. Jude Children's Research Hospital in the United States, believes that "big data" carries three additional layers of meaning: the variety of data types, the speed at which data is generated, and the degree to which data is integrated. In his view, many current databases do not interface with each other because they are produced by separate prevention, research, and clinical departments, and there is currently no platform capable of integrating these different structures and centralizing the information.

Dr. Narayan Desai of Ericsson, citing his 2015 news article, said that the basic question genomics will have to solve is how the data is generated. Although current data collection and analysis capabilities are limited, they should be put to good use, because increasingly accessible sequencing will lead to explosive growth in the available information, much of it widely dispersed, and traditional data mining will struggle to cope with it.

Impact 1: Hidden weaknesses

Recently, some scientists have suggested that targeted and creative use of existing data can guide clinical practice. Professor Nevan Krogan of the University of California, San Francisco (UCSF) said that genomics has changed cancer treatment far more than earlier genetics did. Sequencing providers may believe that the more money is invested, the clearer the results will be, but in his view that is not the case: we have now reached the saturation point for extracting useful information.

Massive amounts of data already exist for a variety of cancers. Despite the data boom, Professor Krogan believes that the amount of data needed for a breakthrough in cancer treatment has already been reached. The sheer volume of new data mainly reveals the astonishing diversity of cancer: even a single tumor can contain thousands of unique genetic mutations, which makes it harder for researchers to figure out which genes are driving the disease.

Professor Krogan and colleagues published an article in Molecular Cell on May 21, 2015, arguing that rather than simply accumulating more data, researchers need to identify the relationships within existing data more closely. The article set out the Cancer Cell Map Initiative (CCMI), which aims to systematically map the interactions between cancer genes and how they relate to disease and health, producing a "road map" of the mutant genes and proteins in cancer cells.

Impact 2: Tumor samples

The Cancer Cell Map Initiative (CCMI) brings together leading biomedical scientists at the University of California, San Diego (UCSD) and leading cell structure scientists at the University of California, San Francisco (UCSF) to interpret tumor genomic information.

Professor Ideker of the University of California, San Diego, said that DNA sequencing of cancers has reached nearly 20,000 genomes, but that it is still difficult to analyze the genetic networks of the cancer genome because "no two tumor patients look very similar at the genetic level." Projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have begun systematically profiling thousands of tumors at multiple levels, including mRNA and microRNA expression, DNA copy number and methylation, and DNA sequence.

There is a strong need for methods that integrate and interpret these multiple layers of molecular information to gain insight into the processes that drive tumor progression. Medical institutions also urgently need to address the fact that companies analyzing tumor genes often cannot obtain clinically relevant data, which can lead to improper conclusions.

Impact 3: Subnetwork analysis

Subnetwork analysis requires an integrative approach, in particular one that combines gene expression data with databases of known protein interactions in order to identify subnetworks or pathways. The unit of analysis is not a single gene or protein but the aggregate expression of the genes or proteins within each subnetwork of a very large interaction network.
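The idea of scoring a whole subnetwork rather than a single gene can be illustrated with a minimal sketch. It assumes one simple aggregation rule (z-score each gene across patients, then sum over the subnetwork's genes and divide by the square root of the gene count); the article does not say which rule the researchers used. The gene names, the toy values, and the function names `zscore_rows` and `subnetwork_activity` are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_rows(expression):
    """Standardize each gene's expression across patients (mean 0, std 1)."""
    normalized = {}
    for gene, values in expression.items():
        mu = mean(values)
        sigma = pstdev(values)
        # A gene with constant expression carries no signal; map it to zeros.
        normalized[gene] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return normalized


def subnetwork_activity(expression, subnetwork):
    """Return one aggregate score per patient for the genes in a subnetwork."""
    z = zscore_rows(expression)
    members = [g for g in subnetwork if g in z]
    if not members:
        raise ValueError("no subnetwork genes found in expression data")
    n_patients = len(z[members[0]])
    k = len(members)
    # Assumed aggregation rule: sum of z-scores scaled by sqrt(k).
    return [sum(z[g][j] for g in members) / sqrt(k) for j in range(n_patients)]


# Hypothetical toy data: three genes measured in four patients.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [5.0, 5.0, 5.0, 5.0],
}
scores = subnetwork_activity(expression, ["GENE_A", "GENE_B"])
```

Each entry of `scores` summarizes how strongly the chosen subnetwork is expressed in one patient, so patients can be compared at the level of a pathway instead of gene by gene.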

The researchers said that these subnetworks can identify the differences in gene expression that underlie different clinical behavior in different patient groups. Compared with traditional analysis, the method demands considerable knowledge of bioinformatics, statistics and protein structure, but it can explain the molecular pathways behind differences in gene expression, and the data it uses already exists.

Dr. Ideker and fellow bioinformatics experts say that for most patients at intermediate risk of breast cancer, traditional factors are not strongly predictive, and about 70-80% of lymph node-negative patients receive adjuvant chemotherapy that is unnecessary. Many of the current risk factors may be secondary manifestations rather than primary mechanisms of disease. The remaining challenge is to identify new markers that are more directly related to the disease and that predict the risk of metastasis in individual patients more accurately.

Impact 4: Impact on prognosis

The researchers' latest findings support the use of gene network analysis to provide prognostic information. For example, Dr. Chang and colleagues at the University of California, San Diego (UCSD) published a paper in Blood in 2012 that used gene network analysis to predict the characteristics of patients with chronic lymphocytic leukemia (CLL), a disease marked by the accumulation of monoclonal B cells in the blood, bone marrow, and secondary lymphoid tissue.

Specifically, the researchers used subnetwork-based analysis of gene expression profiles to group patients by their risk of chronic lymphocytic leukemia progression. The clinical course of the disease varies widely: some patients remain asymptomatic for many years, while others develop more severe symptoms shortly after diagnosis.
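As a rough illustration of grouping patients by a subnetwork score, the sketch below splits a cohort at the median score. This is an assumed, deliberately simple rule and not the classifier used in the study, which the article does not describe. The patient IDs, the score values, and the function name `stratify_by_median` are hypothetical, and the labels refer only to the score, not to a validated risk level.

```python
from statistics import median


def stratify_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score."""
    cutoff = median(scores.values())
    # Scores equal to the cutoff fall in the 'low' group.
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical subnetwork activity scores for five patients.
scores = {"P1": -1.2, "P2": 0.3, "P3": 1.8, "P4": -0.4, "P5": 0.9}
groups = stratify_by_median(scores)
```

In practice, whether a high or a low score corresponds to faster progression would have to be established from follow-up data before any such grouping could inform treatment.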

Because standard therapies are associated with significant toxicity, current practice is to withhold treatment until there is clear evidence of disease progression or disease-related complications, so accurate prediction is critical. Microarray (gene chip) studies have shown that surrogate markers, such as IGHV mutation status, can be used as prognostic factors for chronic lymphocytic leukemia.

The expression levels of the predictive subnetworks changed over time but showed strong similarity at later time points. Big data mining has thus become a potential route for guiding treatment strategy and monitoring in cancers such as chronic lymphocytic leukemia.

Source: Bio-Exploration
