Bioinformatics analysis identifies key genes of prognostic value in lung cancer

Lung cancer is the most common human malignancy worldwide and can be divided into different types of carcinoma according to pathological features. Advances in medical science and technology have led to the identification of several lung cancer-related marker genes, including EGFR (epidermal growth factor receptor), BRAF (B-Raf proto-oncogene), RAS (RAS proto-oncogene, GTPase) and HER2 (human epidermal growth factor receptor 2). However, the biomarkers and key genes associated with different types of lung cancer are still poorly understood. In this study, we analyzed a GEO (Gene Expression Omnibus) dataset and identified 28 upregulated and 125 downregulated intersection DEGs (differentially expressed genes) among AC (adenocarcinoma), PTC (primary typical carcinoid), PLCC (primary large cell carcinoma), PLCNC (primary large cell neuroendocrine carcinoma) and PSCLC (primary small cell lung carcinoma). Through PPI (protein-protein interaction) network analysis, we identified 14 genes among the DEGs, namely MFAP4 (microfibril-associated protein 4), PDZD2 (PDZ domain containing 2), FBLN1 (fibulin 1), FBLN5 (fibulin 5), EFEMP1 (EGF containing fibulin extracellular matrix protein 1), KDR (kinase insert domain receptor), S1PR1 (sphingosine-1-phosphate receptor 1), CAV1 (caveolin 1), GRK5 (G protein-coupled receptor kinase 5), EDNRA (endothelin receptor type A), EDNRB (endothelin receptor type B), CALCRL (calcitonin receptor-like receptor), PTGER4 (prostaglandin E receptor 4) and ADRB1 (adrenoceptor beta 1), which were downregulated in different subtypes of lung cancer and whose low expression was associated with poor survival outcomes. In addition, most of the screened DEGs demonstrated good predictive ability in LUAD (lung adenocarcinoma) and LUSC (lung squamous cell carcinoma). Among them, MFAP4 was found to suppress cell proliferation, migration and angiogenesis.
In summary, we propose MFAP4, PDZD2, FBLN1, FBLN5, EFEMP1, KDR, S1PR1, CAV1, GRK5, EDNRA, EDNRB, CALCRL, PTGER4 and ADRB1 as potential prognostic markers in lung cancer patients.


Introduction
Lung cancer is the leading cause of cancer-related death worldwide. In non-small cell lung cancer (NSCLC), genes are increasingly recognized as biomarkers with a pivotal role in diagnosis, treatment and prognosis estimation, gradually surpassing the significance of histological classification [1]. Fewer than half of lung adenocarcinoma cases exhibit targetable driver gene mutations such as EGFR, BRAF and HER2 or rearrangements involving ALK (ALK receptor tyrosine kinase) and ROS1. For other NSCLC cases lacking these specific molecular markers, particularly non-squamous NSCLC, the only treatment option is conventional platinum-based dual therapy [2]. Despite significant advancements in chemotherapy, radiotherapy, surgery and immunotherapy for lung cancer, the 5-year survival rate for NSCLC patients remains low at 23%. To address this, comprehensive studies are necessary to uncover the molecular mechanisms underlying lung cancer and develop effective therapeutic strategies.
The field of bioinformatics has progressed rapidly, particularly with advancements in microarray technology, which have enabled the widespread use of high-throughput gene expression analysis platforms to identify differentially expressed genes (DEGs) involved in tumorigenesis [3,4]. Through the application of microarray technology, an increasing number of DEGs associated with lung cancer have been discovered [5,6]. Nevertheless, it is important to acknowledge that identifying DEGs through this technology has certain limitations regarding sample size, TNM (Tumor, Node, Metastasis) classification, gender, ethnicity and other factors, which should be accounted for when interpreting the results of DEG analyses [3].
In this study, we aimed to identify key genes associated with lung cancer by analyzing microarray data obtained from the GEO database using bioinformatics analysis. We investigated the relevant Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways specific to lung cancer and constructed networks to identify related key genes. Furthermore, by analyzing data from The Cancer Genome Atlas (TCGA) encompassing lung adenocarcinoma and lung squamous cell carcinoma, we examined the expression of these key genes in NSCLC and evaluated their impact on patients' survival. In addition, we also explored the application of these potential candidate biomarkers in the diagnosis of NSCLC, providing valuable insights into potential targets and prognostic markers for enhancing the diagnosis and treatment of lung cancer.

PCA, volcano plot and Venn diagram analysis
PCA (principal component analysis) was performed as previously described [7].
The GEO dataset GSE1037, which consists of cDNA microarray data covering 40,386 elements, was used to analyze the gene expression profiles of 38 surgically resected samples of lung neuroendocrine tumors and 11 SCLC (small cell lung cancer) cell lines (accessed via: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1037). The data were analyzed with the limma package in R after batch correction. PCA was used to visualize the differences between the normal group (primary normal lung tissue, n = 19) and the cancer group, including PTC (primary typical carcinoid, n = 12), AC (adenocarcinoma, n = 12), PLCNC (primary large cell neuroendocrine carcinoma, n = 8), PSCLC (primary small cell lung carcinoma, n = 15) and PLCC (primary large cell carcinoma, n = 14).
In the volcano plot, the vertical lines mark |log2(fold change)| ≥ 2, the threshold for upregulated or downregulated DEGs, while the horizontal line marks a significance level of p < 0.05. Significantly upregulated genes are denoted by red dots, while significantly downregulated genes are represented by blue dots. By using a Venn diagram, we identified 28 upregulated genes and 125 downregulated genes that were differentially expressed in all cancer groups, based on the intersection of the gene sets.
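The thresholding and intersection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `classify_degs` and `intersect_degs`, the input layout (gene mapped to a log2 fold change and a p-value) and the toy values are hypothetical and are not taken from GSE1037 or from the original limma workflow.

```python
def classify_degs(stats, lfc_cutoff=2.0, p_cutoff=0.05):
    """Split one tumor-vs-normal comparison into up- and downregulated sets.

    stats maps gene -> (log2 fold change, p-value).
    """
    up = {g for g, (lfc, p) in stats.items() if p < p_cutoff and lfc >= lfc_cutoff}
    down = {g for g, (lfc, p) in stats.items() if p < p_cutoff and lfc <= -lfc_cutoff}
    return up, down


def intersect_degs(per_subtype):
    """Return the genes that are up (or down) in every subtype (the Venn core)."""
    ups, downs = zip(*(classify_degs(stats) for stats in per_subtype.values()))
    return set.intersection(*ups), set.intersection(*downs)


# Toy values invented for illustration only.
toy = {
    "AC": {"GENE_A": (2.5, 0.01), "GENE_B": (-3.0, 0.001), "GENE_C": (2.1, 0.20)},
    "PTC": {"GENE_A": (2.2, 0.03), "GENE_B": (-2.4, 0.02), "GENE_C": (2.6, 0.01)},
}
common_up, common_down = intersect_degs(toy)
print(sorted(common_up), sorted(common_down))  # ['GENE_A'] ['GENE_B']
```

In this toy input, GENE_C passes the fold-change cutoff in both subtypes but fails the p-value cutoff in AC, so it drops out of the intersection.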

GO/KEGG pathway enrichment analysis and protein-protein interaction analysis
The expression of the 14 key genes in lung tumor and nontumor tissues was analyzed using the FactoMineR package in R, and the results were visualized using heat maps. GO-KEGG analysis (https://www.genome.jp/kegg/) was performed as previously described [8]. The online platform Metascape (https://metascape.org/) and R language version 4.3.1 (Beagle Scouts, Murray Hill, NJ, USA) were used for enrichment analysis of the intersecting DEGs described above. PPI (protein-protein interaction) analysis was performed on the STRING online platform (https://string-db.org/).

Correlation and cluster analysis of differential gene expression
The correlation of key gene expression between normal lung tissue and lung carcinoma tissue was analyzed using the clusterProfiler package of R language [9, 10].
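The text does not state which correlation coefficient was used. As a minimal sketch, assuming a Pearson correlation between the expression vectors of two genes across the same samples, the pairwise calculation could look like the following. The names `pearson_r` and `correlation_matrix` and the toy expression values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(expr):
    """Pairwise Pearson r for every gene pair in {gene: [values per sample]}."""
    return {(g1, g2): pearson_r(v1, v2)
            for g1, v1 in expr.items() for g2, v2 in expr.items()}


# Toy expression values invented for illustration only.
toy_expr = {
    "GENE_X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_Y": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_Z": [10.0, 8.0, 6.0, 4.0, 2.0],
}
corr = correlation_matrix(toy_expr)
print(corr[("GENE_X", "GENE_Y")], corr[("GENE_X", "GENE_Z")])
```

A rank-based coefficient such as Spearman's would be a reasonable alternative when expression values are not approximately normally distributed.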

ROC curve analysis
ROC (Receiver Operating Characteristic) curves of the DEGs were analyzed using the R language based on the TCGA-GTEx (Genotype-Tissue Expression) data. The TCGA data were RNA-seq data, whereas the GEO data were derived from cDNA microarrays.
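For readers unfamiliar with ROC analysis, the area under the curve (AUC) equals the probability that a randomly chosen tumor sample receives a higher score than a randomly chosen normal sample, with ties counted as one half. The sketch below computes the AUC directly from that definition. It is an illustration only: the function name `roc_auc` and the toy values are hypothetical, and negating expression so that low expression scores as "tumor-like" is an assumption made here because the candidate genes are reported as downregulated in tumors.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = tumor, 0 = normal. Higher score means "more tumor-like".
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Toy expression values invented for illustration only.
expression = [8.1, 7.9, 7.5, 5.2, 4.8, 7.7]
labels = [0, 0, 0, 1, 1, 1]
# Assumption: the gene is lower in tumors, so negate expression to get a score.
scores = [-e for e in expression]
auc = roc_auc(scores, labels)
print(round(auc, 3))  # 0.889
```

Here one tumor sample (7.7) has higher expression than one normal sample (7.5), so 8 of the 9 tumor-normal pairs are ranked correctly.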

Cell proliferation, migration and angiogenesis
Cell proliferation and migration assays were conducted following previously described methods [12]. The indicated cell lines were seeded into a 96-well plate to assess cell viability at various time points (0 h, 24 h, 48 h and 72 h). Additionally, the same cell lines were seeded into a transwell chamber after incubating for 24 h, following which the number of migrated cells was counted. For angiogenesis, human umbilical vein endothelial cells (HUVECs) were cultured in a medium obtained from lung cancer cells transfected with the designated plasmids for 48 h to conduct the tubule formation experiment. The cells were washed, detached, seeded and imaged as previously described [13].

Statistical analysis
The SPSS v22.0 software (IBM, Armonk, NY, USA) was used for statistical analyses. The Student's t-test and one-way ANOVA (Analysis of variance) were performed, and data are presented as means ± SEM (standard error of the mean) of three independent experiments. A p-value of 0.05 or less was considered significant.
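As a worked illustration of the summary statistics named above, the following sketch computes the mean ± SEM of one group and the pooled-variance Student's t statistic for two groups of three replicates. The original analysis was run in SPSS; the function names `mean_sem` and `student_t` and the toy measurements are hypothetical, and the p-value lookup is omitted.

```python
from math import sqrt
from statistics import mean, variance


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    return mean(values), sqrt(variance(values) / len(values))


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Toy measurements from three independent experiments, invented for illustration.
control = [1.0, 1.2, 0.8]
treated = [0.5, 0.6, 0.4]
m, sem = mean_sem(control)
t_stat, df = student_t(control, treated)
print(f"control = {m:.2f} ± {sem:.3f}, t = {t_stat:.3f}, df = {df}")
```

The resulting t statistic would then be compared against the t distribution with `df` degrees of freedom to obtain the p-value.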

Analysis of differentially expressed genes (DEGs) in lung cancer based on GEO data
Bioinformatics methods were applied to analyze DEGs from GSE1037 to identify key genes related to the prognosis of lung cancer patients. After batch correction, the data were analyzed by PCA (Fig. 1A). The results showed significant differences between the different kinds of lung cancer, including AC, PLCC, PLCNC, PSCLC and PTC, and the normal tissue group. We analyzed the DEGs between the lung cancer and normal groups by volcano plot and identified 743 upregulated and 769 downregulated genes in PTC, 313 upregulated and 199 downregulated genes in AC, 454 upregulated and 501 downregulated genes in PLCNC, 325 upregulated and 1014 downregulated genes in PSCLC, and 537 upregulated and 416 downregulated genes in PLCC (Fig. 1B). From these, 28 upregulated and 125 downregulated intersection DEGs were identified among the AC, PTC, PLCC, PLCNC and PSCLC samples via Venn analysis (Fig. 1C).
FIGURE 1. Analysis of differentially expressed genes (DEGs) in lung cancer based on GEO data. (A) PCA was used to analyze the difference between normal tissue and lung cancer, including AC (adenocarcinoma), PTC (primary typical carcinoid), PLCC (primary large cell carcinoma), PLCNC (primary large cell neuroendocrine carcinoma) and PSCLC (primary small cell lung carcinoma). (B) Volcano plots were used to assess DEGs between normal tissue and different types of lung cancer. Blue: genes with decreased expression; red: genes with increased expression; grey: genes with no significant change in expression. (C) Venn diagram of the intersection genes among AC, PTC, PLCC, PLCNC and PSCLC.

Differential gene enrichment analysis, cluster analysis and protein-protein interaction (PPI) network analysis
Enrichment analysis was performed on the above intersection DEGs (Fig. 2A), and the results indicated that the screened DEGs were associated with angiogenesis, cell-substrate adhesion, circulatory system processes and locomotion. KEGG enrichment analysis showed that the DEGs were mainly involved in the MAPK pathway. Additionally, the PPI network analysis identified 14 genes among the DEGs, namely MFAP4, PDZD2, FBLN1, FBLN5, EFEMP1, KDR, S1PR1, CAV1, GRK5, EDNRA, EDNRB, CALCRL, PTGER4 and ADRB1 (Fig. 2B).
Next, we analyzed the expression of the selected 14 DEGs in tumor tissues and normal tissues of lung cancer patients and found that all were downregulated in lung carcinoma tissues compared to normal lung tissues (Fig. 3A). Moreover, the PPI analysis (Fig. 2B) demonstrated a consistent interaction pattern among the 14 DEGs (Fig. 3B). Notably, our analysis using R revealed that nearly all of the 14 DEGs displayed significant correlations with each other (Fig. 3C).

Analysis of differential gene expression in different subtypes of lung cancer from GEO and TCGA datasets
Based on the GEO microarray dataset GSE1037, we analyzed the expression levels of the 14 DEGs in AC, PTC, PLCC, PLCNC and PSCLC by box plot (Fig. 4) and found no significant difference in PDZD2 between the different subtypes of lung cancer and the normal group. The expression of MFAP4 was downregulated in all kinds of lung cancer except for PTC, and PTGER4 was downregulated in all subtypes of lung cancer except for PTC and PLCNC. Additionally, FBLN1, FBLN5, EFEMP1, KDR, S1PR1, CAV1, GRK5, EDNRA, EDNRB, CALCRL and ADRB1 were all significantly decreased in AC, PTC, PLCC, PLCNC and PSCLC compared to the normal group (Fig. 4). Similarly
Collectively, our data indicate that these 14 DEGs were downregulated in different subtypes of lung cancer.

Influence of DEGs on lung adenocarcinoma and lung squamous cell carcinoma survival
Next, we analyzed the relationship between the expression of the 14 DEGs and survival rates. As shown in Fig. 6, for LUAD, low expression of MFAP4, PDZD2, FBLN1, FBLN5, EFEMP1, KDR, S1PR1, CAV1, GRK5, EDNRA, EDNRB, CALCRL, PTGER4 and ADRB1 was associated with poor survival. In contrast, for LUSC, only MFAP4 and PTGER4 were positively correlated with survival rate, and the expression of the other genes showed no association with survival (Fig. 6 and Supplementary Fig. 1).
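The survival comparison rests on Kaplan-Meier estimates, in which the survival probability is multiplied by (1 - deaths / number at risk) at each observed death time, while censored patients simply leave the risk set. A minimal sketch of the estimator follows; the function name `kaplan_meier` and the toy follow-up data are hypothetical, and the comparison of high- and low-expression groups (for example by log-rank test) is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Toy follow-up data (months) invented for illustration only.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

With these toy data the estimate steps down at months 5, 8 and 12, and the two censored patients contribute to the risk set only until their last follow-up.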

Effects of upregulated MFAP4 on angiogenesis and MAPK pathway in lung cancer cells
According to the GO/KEGG enrichment analysis, the DEGs were significantly enriched in angiogenesis, cell adhesion and movement, and the MAPK pathway, so we sought to confirm the role of MFAP4 at the cellular level. Among the screened genes, MFAP4 showed the largest differential expression and the most significant association with patient survival and was thus selected for validation. As shown in Fig. 8A, MFAP4 expression was decreased in A549 and SK-MES-1 cells compared with BEAS-2B cells. MFAP4 overexpression inhibited cell proliferation in A549 and SK-MES-1 cells compared to the control cell lines, whereas the knockdown of MFAP4 promoted cell proliferation in the indicated cell lines (Fig. 8B) [14]. Moreover, cell migration was decreased in A549 and SK-MES-1 cells following transfection with MFAP4, while it was increased in MFAP4-deficient A549 and SK-MES-1 cell lines (Fig. 8C). In addition, conditioned medium from MFAP4-overexpressing cells reduced angiogenesis in HUVECs, and the knockdown of MFAP4 facilitated angiogenesis in HUVECs (Fig. 8D). Mechanistically, the overexpression of MFAP4 suppressed the phosphorylation of ERK1/2, p38 and JNK (c-Jun N-terminal kinases) in A549 and SK-MES-1 cells, whereas MFAP4 deficiency increased the levels of p-ERK1/2, p-p38 and p-JNK in A549 and SK-MES-1 cell lines (Fig. 8E). Collectively, these data suggest that MFAP4 negatively regulates cell proliferation, migration and angiogenesis.
FIGURE 5. Analysis of differential gene expression in lung adenocarcinoma and lung squamous cell carcinoma based on the TCGA database. GEPIA (gene expression profiling interactive analysis) online software was used to analyze the expression of MFAP4 (microfibril-associated protein 4), PDZD2 (PDZ domain containing 2), FBLN1 (fibulin 1), FBLN5 (fibulin 5), EFEMP1 (EGF containing fibulin extracellular matrix protein 1), KDR (kinase insert domain receptor), S1PR1 (sphingosine-1-phosphate receptor 1), CAV1 (caveolin 1), GRK5 (G protein-coupled receptor kinase 5), EDNRA (endothelin receptor type A), EDNRB (endothelin receptor type B), CALCRL (calcitonin receptor-like receptor), PTGER4 (prostaglandin E receptor 4) and ADRB1 (adrenoceptor beta 1) in LUAD (lung adenocarcinoma) and LUSC (lung squamous cell carcinoma).
FIGURE 6. The influence of differential genes on the survival and prognosis of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). Kaplan-Meier plot analysis was performed to determine the relationship between key gene expression and survival in LUAD and LUSC. p < 0.05.

Discussion
Lung cancer is a highly prevalent and deadly disease worldwide, causing an estimated 1.6 million deaths annually [15]. Most lung cancer cases are classified as NSCLC, which is associated with low survival rates [16]. The complexity, histopathology and clinical features of lung cancer make it challenging to understand the underlying mechanisms of its development and progression [17]. In this study, we used bioinformatics methods and screened 14 significant DEGs, namely MFAP4, PDZD2, FBLN1, FBLN5, EFEMP1, KDR, S1PR1, CAV1, GRK5, EDNRA, EDNRB, CALCRL, PTGER4 and ADRB1, in different lung cancer subtypes, which were found to be involved in angiogenesis, cell adhesion and locomotion pathways. In addition, these correlated DEGs were downregulated in lung cancer, and their expression was positively associated with patient survival.
FIGURE 7. Significance of differential genes from the TCGA database in lung cancer diagnosis. ROC (receiver operating characteristic) curve analysis was used to assess the accuracy of each prognostic gene in different types of lung cancer. AUC (Area Under the ROC Curve) ≥0.9, CI (confidence interval): 0.8-0.9.
Pulmonary neuroendocrine tumors are commonly encountered in surgical pathology and can be classified into four types, namely primary typical carcinoid (TC), atypical carcinoid, small cell lung cancer (SCLC) and large cell neuroendocrine carcinoma (LCNEC). Together, these tumors account for approximately 20% of all primary lung tumors and approximately 25% of neuroendocrine tumors [18]. The diagnosis of pulmonary neuroendocrine tumors is difficult because of their morphological similarities with other entities. Firstly, identifying their neuroendocrine characteristics can be challenging, especially in biopsies with limited tissue samples and in intraoperative frozen sections. Secondly, true mitoses are hard to distinguish from nuclear pyknosis, and focal necrosis may be easily overlooked, leading to difficulties in classifying tumors into distinct categories. Thirdly, the clinical distinction between primary tumors and metastases may not always be evident. The classification criteria are specifically designed for primary neuroendocrine tumors of the lung, and it remains unclear how these criteria apply to metastatic tumors. In the present study, bioinformatics analysis was performed to analyze the common intersecting DEGs in five subtypes of lung cancer (AC, PTC, PLCC, PSCLC, PLCNC).
Cancer is characterized by its high viability, metastatic potential and tendency to relapse. Therefore, research efforts often focus on understanding the role of DEGs in cell proliferation, migration, invasion and apoptosis in different kinds of cancer. Similarly, we demonstrated that the selected DEGs were involved in angiogenesis, cell adhesion and locomotion pathways, which indicates their important role in the progression and development of lung cancer. Here, MFAP4, PDZD2, FBLN1, FBLN5, EFEMP1, KDR, S1PR1, CAV1, GRK5, EDNRA, EDNRB, CALCRL, PTGER4 and ADRB1 were associated with tumorigenesis by GO-KEGG enrichment analysis.
LUAD and LUSC are the two most common types of NSCLC, and LUSC is the main pathological type of lung malignancy [19,20,21]. Currently, less than 10% of patients with advanced metastatic LUSC can live more than 5 years [22]. A previous study demonstrated that CEP55 is a diagnostic marker and independent prognostic factor in LUAD and LUSC [23]. The downregulated expression of miR-126-3p was associated with the occurrence and progression of LUAD [24]. GPR115 strongly correlated with poor prognosis of LUAD but not LUSC [25]. High expression of GSDMD indicated a poor prognosis in LUAD but not in LUSC [26]. Similarly, we found that MFAP4 and PTGER4 were associated with the survival rate of LUSC patients, while the expression of MFAP4, FBLN1, FBLN5, EFEMP1, KDR, S1PR1, CAV1, GRK5, EDNRB, CALCRL and ADRB1 was positively correlated with the survival of LUAD patients. However, the underlying function of these DEGs in LUSC and LUAD needs further research. MFAP4 has generally been identified as a tumor suppressor in human cancers: its overexpression is significantly associated with better prognosis in breast cancer, and it has been reported to be a downstream target gene regulated by miR-147b that exerts an anti-proliferative function in LUAD cell lines [14,27]. Moreover, Han et al. [28] reported that the C8orf34-as1/miR-671-5p/MFAP4 regulatory cascade played a critical role in LUAD cell migration. Here, we also confirmed the same role of MFAP4 in LUSC and LUAD. However, the underlying molecular mechanism of MFAP4 in lung cancer progression needs more research.
FIGURE 8. [...] were cultured with a transfected medium of lung cancer cells to conduct tubule formation experiments, with quantitative data attached. (E) The related protein expression was measured by western blotting. Quantitative protein expression (MFAP4, p-ERK1/2, p-p38 and p-JNK (c-Jun N-terminal kinases)) data are attached. Error bars represent data from three independent experiments (mean ± SD). ##p < 0.01. **p < 0.01. #compared to the control group. *compared to the shNC group.

AVAILABILITY OF DATA AND MATERIALS
The data are contained within this article and supplementary material.

AUTHOR CONTRIBUTIONS
DS and LS designed the study, supervised the data collection, analyzed the data, interpreted the data, prepared the manuscript for publication and reviewed the draft. All authors have read and approved the manuscript.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
Not applicable.

ACKNOWLEDGMENT
Not applicable.

FUNDING
This research received no external funding.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

SUPPLEMENTARY MATERIAL
Supplementary material associated with this article can be found, in the online version, at https://oss.jomh.org/files/article/1685899010770911232/attachment/Supplementary%20material.docx.