Knowledge structure and emerging trends of AR variants in prostate cancer: a bibliometric analysis based on CiteSpace and VOSviewer

Prostate cancer is a common urological malignancy that often progresses to castration-resistant prostate cancer (CRPC) after hormone therapy. Studies have shown that this progression is related to androgen receptor (AR) splice variants. This work employs a bibliometric approach to explore the knowledge structure and emerging trends of AR variants in prostate cancer. The literature from 2000 to 2021 was obtained from the Web of Science Core Collection (WoSCC).


Introduction
Prostate cancer (PCa) is a common malignancy among elderly men. PCa incidence increases yearly as the population ages, resulting in a heavy disease burden [1]. Although the PCa mortality rate is decreasing as therapies improve, the disease still threatens the health of middle-aged and elderly men [2]. In Western countries, PCa remains a public health concern. PCa is strongly hormone-sensitive at the early stage. Most PCa progresses to the castration-resistant stage under androgen deprivation therapy (ADT) and/or androgen receptor signaling inhibitors [3]. Patients at this stage develop primary or secondary resistance to androgen- or androgen receptor (AR)-targeted agents, which is associated with multiple underlying mechanisms [4][5][6][7]. The incidence of PCa has also been reported to be associated with germline mutations and polymorphisms of the AR gene [8]. One of the mechanisms contributing to castration resistance is the expression of AR splice variants, which are truncated isoforms of AR lacking the ligand-binding domain in the C-terminus [9]. AR variants have thus become a focus of prostate cancer research. Studies have demonstrated the expression of AR variants and the correlation between their presence and progression to castration resistance in vitro and in vivo [10][11][12]. The overexpression of AR-V7 in patients with metastatic castration-resistant prostate cancer (mCRPC) is associated with shorter survival and resistance to enzalutamide and abiraterone treatment [7]. Patients with AR aberrations, including AR-V7, have worse outcomes than patients without AR aberrations when treated with ADT with or without apalutamide for metastatic castration-sensitive PCa [13]. To date, no bibliometric study has summarized the knowledge structure and research progress of AR variants in PCa over recent decades.
Bibliometrics analyzes the current research status and future trends of a field. The method has been used in life science fields such as medicine, biology, and public health to explore the current status, content, and priority research hotspots [14][15][16]. The results of bibliometric analyses can help investigators understand the current state and future prospects of a research field.
In this study, publications related to AR variants in PCa from 2000 to 2021 were collected. Bibliometric analysis was conducted to identify the current research concerns, global trends, and future challenges of this field.

Materials and methods
The Web of Science Core Collection (WoSCC) was used as the original data source, with "androgen receptor splice variant" and "prostate cancer" as search terms. The detailed search strategy, using the topic field tag (TS), was (TS = (androgen receptor variant*) OR TS = (AR variant*)) AND (TS = (prostate cancer*) OR TS = (prostate carcinoma*) OR TS = (prostate neoplasm*)). The search returned 1682 relevant records for the period from 01 January 2000 to 31 December 2021. Only original research articles and reviews were included to ensure data accuracy and validity. After screening, 1503 publications were retained and downloaded in plain text format as "full records with cited references".
The data from WoSCC were imported into Excel, CiteSpace, and VOSviewer (Excel, Microsoft Office LTSC Professional Plus 2021, Microsoft, Washington State, United States; CiteSpace, 6.1.R6, Drexel University, PA, United States; VOSviewer, 1.6.18, Centre for Science and Technology Studies, Leiden University, the Netherlands). The publications were analyzed by author, country, research institution, journal, keywords, and citations. In CiteSpace 6.1.R6, the time span was set to 2000 to 2021 with a time slice of 1 year. The selection criterion was set to the top 10% of most cited or most frequently occurring items per slice, and the maximum number of selected items in each time slice was set to 100. Pathfinder, pruning sliced networks, and pruning the merged network were all selected as pruning options. Default settings were kept for the remaining network-construction options. Author, institution, country, keyword, reference, cited author, and cited journal were analyzed separately as node types. In the burst-word detection, γ (range 0 to 1) was set to 0.5 and the minimum duration was set to 1. If fewer than 20 burst words were retrieved, the γ value was decreased in steps of 0.1 until 20 words were retrieved. The graphs generated by CiteSpace consist of nodes and the links between them. Node size reflects the frequency of the node in the corresponding analysis module, and node color reflects time. Links represent co-occurrence or co-citation between nodes; link thickness represents the closeness of the connection between two nodes, and link color represents the time of the connection. Nodes with centrality >0.1 have a purple outer ring, indicating that the node has significant influence in the module.
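The γ adjustment rule described in this section can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not CiteSpace's own code: `detect_bursts` is a hypothetical callable standing in for one burst-detection run, the lower bound `floor` is an added safety guard that the text does not mention, and `toy_detector` returns made-up terms only so that the loop can be exercised.

```python
from typing import Callable, List, Tuple


def tune_gamma(detect_bursts: Callable[[float], List[str]],
               gamma: float = 0.5,
               step: float = 0.1,
               target: int = 20,
               floor: float = 0.1) -> Tuple[float, List[str]]:
    """Lower gamma in fixed steps until at least `target` burst terms are found.

    `floor` is an assumed guard so the loop always terminates.
    """
    terms = detect_bursts(gamma)
    # rounding avoids floating-point drift such as 0.30000000000000004
    while len(terms) < target and round(gamma - step, 10) >= floor:
        gamma = round(gamma - step, 10)
        terms = detect_bursts(gamma)
    return gamma, terms


def toy_detector(gamma: float) -> List[str]:
    """Hypothetical stand-in: a smaller gamma yields more burst terms."""
    n_terms = int(round((1.0 - gamma) * 30))
    return [f"term{i}" for i in range(n_terms)]


gamma, terms = tune_gamma(toy_detector)
```

With the toy detector, the loop starts at γ = 0.5 and stops at the first γ value that yields at least 20 terms.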
In VOSviewer, maps were created from the bibliographic data. The threshold was set to 5 articles, and only linked nodes were retained in the VOSviewer-generated graphs. The layout parameters were set to "2" for attraction and "−1" for repulsion, and all other settings were left at their defaults.
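The two filtering rules above (a minimum of 5 articles per item, and retention of linked nodes only) can be illustrated with a short sketch. It assumes the threshold is inclusive (5 or more) and uses hypothetical item names and counts; it is not VOSviewer's implementation.

```python
from typing import Dict, List, Tuple


def filter_network(doc_counts: Dict[str, int],
                   links: List[Tuple[str, str]],
                   min_docs: int = 5) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Keep items with at least `min_docs` documents, then drop unlinked items."""
    # step 1: apply the document threshold (assumed inclusive)
    kept = {name for name, count in doc_counts.items() if count >= min_docs}
    # step 2: keep only links whose two endpoints both survived
    kept_links = [(a, b) for a, b in links if a in kept and b in kept]
    # step 3: retain only nodes that still take part in at least one link
    linked = {name for pair in kept_links for name in pair}
    return sorted(linked), kept_links


# hypothetical toy input
counts = {"A": 7, "B": 5, "C": 4, "D": 9}
edges = [("A", "B"), ("B", "C"), ("C", "D")]
nodes, kept_edges = filter_network(counts, edges)
```

In this toy input, item C falls below the threshold, which leaves item D without any link, so only A and B remain in the map.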

Publication distribution by year
Of the 1503 papers, 1172 (77.98%) were original articles and 331 (22.02%) were reviews. Publication numbers by year and trends regarding AR splice variants in PCa are shown in Fig. 1. The number of papers in this field has gradually increased since 2000 and grew exponentially from 2010 to 2016. The number was 4.8 times higher in 2017 than in 2010, and it peaked at 161 in 2019. The trend line shows that the publication number fluctuated slightly from 2000 to 2021 but generally increased yearly. To further predict the publication trend of AR splice variants in PCa, a second-order polynomial (quadratic) function was fitted to the data from 2000 to 2021, with goodness of fit assessed by the coefficient of determination R². The fitted function was y = 0.3307x² − 0.0036x + 11.318 (R² = 0.9067, where y and x are the number of annual publications and the year, respectively), and its inflection point had not yet appeared. The results indicate rapid publication growth in recent years. The number of related citations has also increased annually. Research activity has grown since 2018, resulting in an exponential increase in the number of annual citations.
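The reported trend function can be checked with a few lines of code. The sketch below evaluates the fitted quadratic and its slope, and includes a generic R² helper. It assumes that x is a year index with 2000 mapped to 1 and 2021 mapped to 22, which the text does not state explicitly; the function names are illustrative.

```python
from typing import Sequence

# coefficients as reported for the fitted trend line
A, B, C = 0.3307, -0.0036, 11.318


def fitted_publications(x: float) -> float:
    """Predicted annual publications for year index x (assumed: 2000 -> 1)."""
    return A * x ** 2 + B * x + C


def slope(x: float) -> float:
    """First derivative of the fitted quadratic."""
    return 2 * A * x + B


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# the slope stays positive over the whole assumed index range 1..22
always_growing = all(slope(x) > 0 for x in range(1, 23))
```

Under this indexing the slope is positive for every year in the study window, because the vertex of the parabola lies at x = 0.0036 / (2 × 0.3307) ≈ 0.005, before the first data point. This is consistent with the statement that the curve has not yet turned downward.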

Countries and regions
The publications came from 65 countries and regions. Table 1 lists the top 10 countries and regions by number of publications, together with their centrality, citation number, and total link strength. The United States had the most publications (n = 661, 43.98%), followed by China (n = 164, 10.91%) and Canada (n = 88, 8.09%). The United States thus published the largest number of papers, with the highest number of citations and the strongest total link strength, which indicates the great influence of the country's research within the field.
Collaborations between countries and regions are shown in Fig. 2, which contains 65 nodes and 402 links. Node size represents the number of papers published by each country, and red circles outside the nodes represent the number of papers published by that country in recent years. Purple circles outside the nodes indicate high centrality, which reveals an important position in the cooperative network. The top five countries in terms of centrality are Australia (centrality = 0.56), England (0.47), Denmark (0.44), France (0.41), and Germany (0.40). Table 1 and Fig. 2 show that the United States dominates research on AR splice variants in PCa and exceeds all other countries in publication number. However, in terms of collaborations between countries, Australia had the highest centrality and cooperated with multiple countries.

Organizations
According to the CiteSpace analysis, 503 organizations worldwide were involved in this field. Table 2 shows the top 10 organizations by number of publications, together with their centrality, citations, and total link strength. The University of Washington had the highest publication number (n = 75, 4.99%), followed by the University of British Columbia (n = 60, 3.99%) and the Fred Hutchinson Cancer Research Center (n = 49, 3.26%). Two of the top 10 institutions had more than 5000 citations: the University of Washington (n = 7987) and the University of Minnesota (n = 6677). The University of Washington had the highest total link strength (n = 287). The University of Minnesota published fewer than 40 articles but its citations exceeded 6500, suggesting that this institution is at a developmental stage but already has significant impact. All of the top 10 organizations were from the United States except the University of British Columbia, which is in Canada. This finding is consistent with the country-level distribution described above.
Co-cited journals are journals cited by publications in the AR splice variant field; they constitute the knowledge base of the relevant studies. The top 10 co-cited journals in the field are listed in Table 4. The top three co-cited journals were Cancer Research (n = 940), Clinical Cancer Research (n = 696), and Proceedings of the National Academy of Sciences of the United States of America (n = 678). The journal with the highest impact factor (IF) was the New England Journal of Medicine (IF = 176.079). The centrality analysis of cited journals showed that no journal had a centrality >0.1, indicating that this field has no representative journal in terms of co-citations.
TABLE 3. Top 10 journals related to androgen receptor splice variants in prostate cancer from 2020 to 2021.

Authors and co-cited authors
In total, 691 authors contributed to the 1503 publications. The collaborations between these authors are shown in Fig. 4A, which includes 691 nodes and 2247 links. The size of an author node represents the number of papers published by that author. Red circles outside the nodes represent the number of publications by that author in recent years. Purple circles outside the nodes mark authors with high centrality, that is, authors who hold an important position in the collaboration network. Martin Gleave had a centrality >0.1 (0.11), which indicated that he collaborated closely with other authors. When articles by two authors are cited together in one study, the two authors have a co-citation relationship. Fig. 4B shows a map of co-cited authors in related publications. Cluster analysis in VOSviewer identified five clusters with Dehm, S (red); Hu, R (yellow); Scher, HI (purple); Beltran, H (blue); and Hsing, AW (green) as the core authors. Authors in the same cluster had similar research directions.

Keywords
Keywords are a high-level summary of the literature, and their frequency of occurrence reflects research directions and hotspots. Keywords that differed only in letter case or hyphenation, as well as abbreviations and keywords with similar meanings, were merged.
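The merging step can be sketched as a small normalization routine. This is an illustration under stated assumptions: the `SYNONYMS` table is hypothetical, since the mappings actually used are not listed, and the function names are not taken from CiteSpace or VOSviewer.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical synonym table; the real mappings are not given in the text.
SYNONYMS = {
    "pca": "prostate cancer",
    "prostate carcinoma": "prostate cancer",
    "crpc": "castration resistant prostate cancer",
}


def normalize_keyword(raw: str) -> str:
    """Fold case, unify hyphenation, and map abbreviations or synonyms."""
    key = raw.strip().lower()
    key = re.sub(r"[\-\u2010\u2011]", " ", key)  # hyphen variants become spaces
    key = re.sub(r"\s+", " ", key)               # collapse repeated whitespace
    return SYNONYMS.get(key, key)


def merged_counts(keywords: Iterable[str]) -> Counter:
    """Count keyword occurrences after merging equivalent spellings."""
    return Counter(normalize_keyword(k) for k in keywords)


counts = merged_counts([
    "Prostate-cancer", "prostate cancer", "PCa",
    "Castration-Resistant Prostate Cancer", "CRPC",
])
```

Counting after normalization prevents one concept from being split across several low-frequency spellings, which would otherwise distort the co-occurrence and burst analyses.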
A cluster analysis places related keywords in the same cluster, which indicates the current research status. A keyword timeline analysis shows when each keyword first appeared and how long it persisted. Identifying keywords whose frequency changes within a certain period can reveal the hotspots of a research field at that stage. Combining these analyses with the keyword map to summarize the evolution of hotspots is valuable for guiding future research directions. In Fig. 5B, the modularity (Q) was 0.4509 (>0.3) and the weighted mean silhouette (S) was 0.748 (>0.7), indicating that the cluster structure is significant and the clustering is convincing. The keyword clustering map (Fig. 5B) and timeline view (Fig. 5C) revealed the following eight clusters: "polymorphism", "growth", "binding", "neuroendocrine prostate cancer", "abiraterone", "differentiation", "androgen", and "gene expression". Keyword clustering and timeline analyses revealed that the research period of the clusters ranged from 2000 to 2021. The clusters "polymorphism" and "growth" were concentrated before 2004, and the clusters "binding" and "neuroendocrine prostate cancer" were concentrated after 2004. "Neuroendocrine prostate cancer" and "abiraterone" were mainly studied after 2010 and continue to be studied, whereas the remaining clusters were studied more evenly over time.
Burst-word detection identifies keywords whose frequency of occurrence rises sharply within a given period. It can be used to study past research hotspots and predict potential future directions. The top 25 keywords with the strongest citation bursts are shown in Fig. 6. "Androgen receptor gene" had the longest burst, from 2000 to 2008, with a burst strength of 8.31. The two keywords with the strongest bursts were "abiraterone" and "AR-V7", with strengths of 15.72 and 15.14, respectively, and their bursts lasted from 2018 to the present. Other recent keywords with citation bursts included "degradation", "liquid biopsy", and "resistance".

Publications and co-cited references
An analysis of co-cited references reveals current research hotspots in the field. Table 6 lists the 10 most-cited papers between 2000 and 2021. Table 7

Discussion
Bibliometrics is important for researchers and scholars as a discipline that studies research trends, directions, and hotspots over time. PCa is the second most commonly diagnosed malignancy and the fifth leading cause of cancer mortality in men worldwide [41]. It not only causes physical impairment but also increases society's economic burden, making it a public health issue in most countries. In recent years, AR variants have become an important hotspot in PCa research. This study elucidates the past and current research hotspots in this field and predicts future research trends and directions. It is the first study to summarize the knowledge structure and research progress of AR variants in PCa through bibliometric analysis.
The publication distribution by year shows an increase in the number of papers in this research area over 20 years, with a peak in 2019. The citation number of these publications continued to increase from 2000 to 2021. The analysis of the publication growth curve shows that the inflection point has not yet appeared, which indicates that the number of publications in this field will continue to increase. Eight of the top 10 countries publishing these articles are Western countries, which is consistent with the high incidence of PCa in these areas. The United States published the most papers, with the highest number of citations and the strongest total link strength. The country with the second highest number of publications is China. The incidence and mortality of PCa are relatively low in China; however, PCa incidence has been increasing in recent years [42]. Canada contributed approximately 8% of the publications in this field. Australia has the highest centrality, reflecting more collaborations with other countries. The author and co-cited author analysis helps identify the scholars with high influence on research into AR variants in PCa. Martin Gleave was the most productive author, with 30 documents, and collaborated most closely with other authors. Five clusters of co-cited authors were identified. Authors in the same cluster may have similar research directions or strong cooperative relationships. These authors have focused on molecular biology, immunology, genetics, cell biology mechanisms, and related therapeutic aspects of prostate cancer.
The research focus related to AR variants in PCa can be explored through co-occurring keywords. The five most common co-occurring keywords are "androgen receptor", "prostate cancer", "expression", "progression", and "splice variant". "Expression" and "progression" indicate the research focus of this field over more than 20 years. "Expression" in relation to AR variants covers two aspects. One is the expression of AR splice variants in PCa tissue and cell lines. In the past few years, at least 20 isoforms of AR variant mRNAs have been found to be expressed in PCa. The other is the gene expression profiles identified in PCa expressing AR variants. Many gene expression profiles have been identified, owing to the different isoforms of AR variants and their heterogeneity [9]. Another research focus is the role and mechanism of AR variants in PCa progression. AR splice variants lack the C-terminal ligand-binding domain in their amino acid structure owing to truncation by a cryptic exon. They therefore do not respond to conventional ADT and are correlated with PCa progression, especially CRPC. The keyword clustering and timeline analysis reveal "binding", "neuroendocrine prostate cancer", and "abiraterone" as the three clusters identified since 2010. AR binding to DNA is a fundamental mechanism by which AR modulates PCa development and progression. The differential binding mechanism of AR variants may promote CRPC progression and abiraterone resistance [43]. The most cited publications and co-cited references indicate the current research focus on AR variants in PCa. Antonarakis ES and Hu R published the most cited articles, which were also the most co-cited references. In the article by Antonarakis, AR-V7, one of the AR variant isoforms, was detected in circulating tumor cells (CTCs) of CRPC patients.
AR-V7-positive patients had lower prostate-specific antigen (PSA) response rates and shorter PSA progression-free survival, clinical or radiographic progression-free survival, and overall survival, which suggested that AR-V7 may be associated with resistance to enzalutamide and abiraterone [7]. Hu R et al. [17] identified seven AR variant transcripts and found that higher AR-V7 expression predicted biochemical recurrence after surgical treatment. AR-V7 was constitutively active in driving the expression of canonical androgen-responsive genes, suggesting a novel mechanism for CRPC development. Both studies were from Dr. Jun Luo's team at Johns Hopkins University School of Medicine. These findings suggested the mechanism by which AR-V7 promotes CRPC progression, and the association of AR-V7 with enzalutamide and abiraterone resistance in mCRPC patients.
Predicting future research directions is vital for research. Citation burst detection can be used to explore research trends, and recent ongoing bursts can partially reveal future trends [44]. "Abiraterone" and "AR-V7" were identified as the two keywords with the strongest citation bursts, both with strengths >15. This finding suggests that the association between AR-V7 and resistance to abiraterone in PCa is an emerging trend. Abiraterone is an inhibitor of cytochrome P450, family 17, subfamily A, polypeptide 1 (CYP17A1) that depletes adrenal and intratumoral androgens by suppressing androgen synthesis [45]. Studies have shown improved survival with abiraterone, and the agent is approved by the Food and Drug Administration for treating mCRPC [35,46]. Abiraterone was a breakthrough in mCRPC treatment; however, some patients had primary resistance to the agent with no PSA response, while other patients acquired secondary resistance after an initial response [7]. One plausible explanation for abiraterone resistance is the presence of AR variants, a finding supported by some preclinical studies [47]. AR-V7, also known as AR3, was the other keyword identified with a strong citation burst.
AR-V7 has a truncated C-terminal ligand-binding domain (LBD) resulting from the inclusion of a novel cryptic exon 3 in the mRNA [9,40]. It is the most common androgen receptor splice variant and the only known variant encoding a functional protein product that is detectable in clinical samples [10,25]. Many studies have investigated the association between AR-V7 and abiraterone resistance. Over the past 20 years, the reference with the strongest citation burst was the paper by Antonarakis ES, which was also the most cited paper and co-cited reference. The authors reported the association between AR-V7 and resistance to abiraterone and enzalutamide in mCRPC in a prospective study [7]. The burst of the recent keyword "liquid biopsy" lasted until the end of 2021. A liquid biopsy is the non-invasive analysis of biomarkers in biological fluids (such as blood, plasma, urine, cerebrospinal fluid, and saliva) to allow the detection and longitudinal follow-up of cancers [48,49]. Compared with traditional tissue biopsies for prostate cancer, liquid biopsies are non-invasive, capture PCa heterogeneity, and are convenient for follow-up and for evaluating efficacy during PCa treatment. The biomarkers obtained from liquid biopsy include circulating cell-free tumor DNA (ctDNA), circulating cell-free tumor RNA (ctRNA), proteins, peptides, metabolites, CTCs, and extracellular vesicles (EVs). In studies investigating the clinical significance of AR-V7 in PCa patients, most of the AR-V7 mRNA or protein measurements came from liquid biopsies, especially from CTCs [38,39,50,51]. A systematic review and meta-analysis of 37 studies investigated the clinical significance of AR-V7 detected in liquid biopsies. The analysis found that AR-V7 positivity detected by liquid biopsy was associated with poor overall survival (OS), progression-free survival (PFS), and prostate-specific antigen progression-free survival (PSA-PFS). A subgroup analysis showed that AR-V7 positivity was more strongly associated with poor prognosis in mCRPC treated with androgen receptor signaling inhibitors (ARSi) than in mCRPC treated with taxanes [52].
FIGURE 6. Top 25 keywords with the strongest citation bursts.
Recent co-cited references, especially those still in their burst period, indicate current trends and future directions. The paper by Antonarakis ES showed the strongest burst from 2018 to the end of 2021 (Fig. 7) [36]. This work explored the clinical significance of AR-V7 mRNA detection in CTCs of men with mCRPC. CTC+/AR-V7+ patients were more likely to have poor clinical characteristics, including Gleason scores ≥8 and metastatic disease at diagnosis. These results confirmed the negative prognostic impact of AR-V7 detection in CTCs of CRPC patients treated with abiraterone or enzalutamide. The results further suggested that a CTC/AR-V7 biomarker panel might be useful in predicting the response to hormonal therapy in first- and second-line settings.
The second strongest citation burst by the end of 2021 was for the study by Sharp A et al. in 2019 [37]. The authors performed a cross-institutional study to determine nuclear AR-V7 protein expression in tissue biopsies and autopsy samples from primary and metastatic PCa rather than in liquid biopsies. AR-V7 expression was rare in primary PCa but emerged in response to primary ADT and increased further in response to abiraterone or enzalutamide therapy. AR-V7-negative disease was associated with a better PSA response and overall survival, indicating that AR-V7 is an important marker of response to endocrine therapy and is related to prognosis.
Another paper with a strong citation burst in the past 3 years was Armstrong, 2019 [38]. The authors reported the PROPHECY study, a multicenter, prospective, blinded study of 118 men with high-risk mCRPC starting abiraterone or enzalutamide treatment. The detection of AR-V7 in CTCs by two blood-based assays was associated with shorter PFS and OS, and AR-V7-positive mCRPC had fewer PSA or soft tissue responses to abiraterone or enzalutamide. This study suggested that AR-V7 was a strong predictor of clinical outcomes in mCRPC treated with abiraterone or enzalutamide.
FIGURE 7. Top 50 references with the strongest citation bursts.
In two articles with strong citation bursts in the past 3 years, Scher H evaluated whether nuclear-localized AR-V7 protein in CTCs was a treatment selection marker for mCRPC. First, the positivity criteria for AR-V7 were investigated. Patients who had AR-V7 protein in the nuclei of CTCs were likely to survive longer on taxane-based chemotherapy; however, AR-V7 with unknown protein localization was less predictive. Thus, nuclear-specific localization of AR-V7 was required to evaluate its predictive benefit. Scher H then reported the association between AR-V7 status and the prognosis of mCRPC treated with either ARSis or taxanes in a multi-institutional cohort study. Patients with AR-V7-positive CTCs had better overall survival with taxanes than with ARSis, whereas AR-V7-negative patients had better overall survival with ARSis. Nuclear-localized AR-V7 could therefore be used to select between an ARSi and a taxane and to provide benefit for individual patients.
Abida W described a comprehensive integrative analysis of genomic and transcriptomic profiles, histology, and clinical outcomes for 429 mCRPC patients [53]. A high frequency of genomic alterations in AR, including AR splice variants, was identified. AR-V7 levels were increased in tumors exposed to taxanes and ARSi therapy, but no association was identified between AR-V7 expression and a shorter time on first-line ARSi or shorter overall survival.
This bibliometric study of prostate cancer-related androgen receptor splice variants between 2000 and 2021 analyzes research hotspots and future trends in this field; however, limitations remain. Because of publication timing, the value of some newly published or high-impact literature may not yet be fully reflected, which introduces a certain lag into the analysis. Only the Web of Science Core Collection database was selected as the data source. This database has considerable influence among academic journal databases; however, some high-quality papers may not be included. In future, more literature may be included for a more detailed study of this field.

Conclusions
This bibliometric study of AR variants in PCa between 2000 and 2021 visualizes the literature from the Web of Science Core Collection to better understand research trends and directions. Over the 22 years investigated, the number of publications increased substantially and has not yet reached its inflection point. The most influential country is the USA, and the most influential institution is the University of Washington. About 27% of the articles were published in the top 10 journals (about 1% of the journals involved), most of which specialize in urology and oncology. Moreover, the research foundation of this field comes from high-impact journals. Keyword co-occurrence analysis shows that research has focused on "expression" and "progression" over the past two decades. The most cited papers and co-cited references relate to the clinical significance of AR-V7 and its mechanism of promoting CRPC. Citation burst analysis indicates that the research focus has shifted from the bench to the bedside, with the keywords "abiraterone" and "AR-V7" having the strongest citation bursts. The current citation bursts suggest that the trend and future direction are toward evaluating AR-V7 status in PCa clinical samples and elucidating its contribution to CRPC progression.

AVAILABILITY OF DATA AND MATERIALS
Data presented in this study are available on request from the corresponding author.

AUTHOR CONTRIBUTIONS
LW, TTJ and SMC conducted data collection. KY, FB and ZJL were involved in data analysis. KY, JL and XL wrote the manuscript. JL and XL were involved in project development and supervision.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
Not applicable.

ACKNOWLEDGMENT
We appreciate the support of Li Rong from the Department of Plastic and Aesthetic Surgery, The First Hospital of Jilin University.

FUNDING
This study was funded by Science and Technology Development Project of Jilin Province, China (20200201315JC), and Bethune Urological Oncology Special Grant, Beijing Bethune Charity Foundation (mnzl202022).

CONFLICT OF INTEREST
The authors declare no conflict of interest.