Introduction

Non-coding RNAs (ncRNAs) are now recognized as functionally important molecules, rather than mere “junk sequences”. They have been confirmed to play regulatory roles in neural processes1. Many studies have demonstrated that ncRNAs are crucial participants in a wide range of biological processes, including cell differentiation, genomic imprinting2, and gene expression regulation3. Circular RNAs (circRNAs) are an important type of ncRNAs, which were first discovered in the 1970s4. In early time, circRNAs were deemed as the low abundance byproducts produced by RNA splicing5,6, which were not focused by investigators. This situation has changed with the advancement of biomedical research and development of RNA deep sequencing high-throughput technologies. Many studies have reported the biological significance of circRNAs7,8.

CircRNAs can act as RNA-binding proteins (RBPs) or miRNA sponges, thereby suppressing the biological effects of miRNA-targeted molecules9,10,11. Furthermore, aberrant expression of circRNAs has been closely associated with various pathological processes12,13 and its stability has help them as important biomarkers and targets of various diseases14,15,16,17. At present, circRNAs have become an important research field in RNA research. Similar to the study of messenger RNA (mRNA), a widely investigated RNA type containing the information for translating the genetic message from genomic DNA into proteins, it is essential to determine the functions of circRNAs. It is known that the subcellular localizations of circRNAs are important clues for inferring their functions, which is similar to mRNAs and proteins. In this era, traditional biochemical experiments are popular to determine the subcellular localizations of circRNAs. However, they are not perfect. For example, these methods are expensive and time-consuming. Designing efficient computational models is a feasible alternative, which can accelerate the determination of subcellular localizations of circRNAs and have low costs. To date, few computational models have been designed in this regard.

The prediction of subcellular localizations of proteins and mRNAs is a hot topic in bioinformatics. Lots of computational models have been proposed18,19,20,21,22,23,24,25,26. The early methods extract features from the essential properties of proteins and mRNAs. Then, a classification algorithm was employed to construct the model. The recently proposed models further add a procedure to generate high-level features, such as convolutional neural network (CNN), graph neural network, thereby improving the prediction efficiency. In recent years, some computational models for predicting subcellular localizations of ncRNAs are proposed, especially the models for miRNAs27,28,29,30,31,32,33,34,35,36. Although these models exhibit high performance, they cannot be directly used for predicting subcellular localizations of circRNAs as some differences exist between miRNAs and circRNAs. The models for predicting subcellular localizations of circRNAs are limited to our knowledge. Asim et al. proposed the first circRNA subcellular localization model, Circ-LocNet37. It extracted features from RNA sequences and adopted five traditional classification algorithms for prediction. Zeng et al. designed cell line-specific models, CellCircLoc, to detect circRNA subcellular localizations38. It was constructed using some deep learning technologies, such as CNNs, transformer blocks. Above two models only considered the circRNAs with exact one subcellular localization. In fact, several circRNAs have multiple subcellular localizations, reducing the application values of above models. Thus, it is still necessary to design efficient computational models for predicting subcellular localizations of circRNAs, especially for circRANs with multiple subcellular localizations. Inspired by the successful applications of deep learning algorithms and large language models (LLMs) in tackling various problems39,40,41,42,43,44,45,46, especially the classification problems, it is expected that efficient computational models for predicting subcellular localizations of circRNAs can be constructed based on various deep learning algorithms and LLMs.

In this study, we designed a deep learning- and LLM-based computational model, namely CircLoc, for the prediction of subcellular localizations of circRNAs. To obtain more complete representations of circRNAs, several feature types were extracted, which were derived from circRNA sequences and networks. A deep learning algorithm, graph attention auto-encoder (GATE)47, was employed to further generate high-level circRNA features. All features were processed by a self-attention layer and then fed into a fully connected layer to make predictions. The ten-fold cross-validation shown that the average AUC and AUPR were 0.7856 and 0.4055, respectively. Such performance was better than that of the models constructed by traditional multi-label classification algorithms and the miRNA subcellular localization prediction models, which were slightly transformed for predicting circRNA subcellular localizations. Some ablation tests were conducted to prove the reasonableness of CircLoc. Finally, the results on two test datasets proved that CircLoc had an ability to identify latent localizations of new circRNAs.

Materials and methods

Benchmark dataset

A well-defined dataset is essential for constructing efficient classification models. In this study, we employed the human circRNAs and their subcellular localizations reported in RNALocate (version 3.0, http://www.rnalocate.org/)48. The original dataset contains 131,208 human circRNAs involving 16 distinct subcellular localizations. Among these localizations, seven of them have few human circRNAs, which are insufficient for constructing reliable models. Therefore, these localizations were excluded. For the rest nine subcellular localizations, two localizations (Extracellular exosome and Extracellular vesicle) contained too many circRNAs (more than 50000 circRNAs), which were much more than other localizations. Inclusion of them yielded an extremely imbalanced dataset. Thus, they were also removed. Furthermore, we removed the circRNAs without sequence information. Finally, the constructed dataset, denoted as \(S\left(3.0\right)\), contained 1486 human circRNAs covering seven subcellular localizations, including Chromatin, Cytoplasm, Cytosol, Membrane, Nucleolus, Nucleus, and Nucleoplasm. The circRNA distribution across these seven localizations is shown in Table 1. For formulation, let us denote the circRNA sets for seven localizations as \({S}_{\left(\text{C}\text{h}\text{r}\text{o}\text{m}\text{a}\text{t}\text{i}\text{n}\right)}\),\({S}_{\left(\text{C}\text{y}\text{t}\text{o}\text{p}\text{l}\text{a}\text{s}\text{m}\right)}\), \({S}_{\left(\text{C}\text{y}\text{t}\text{o}\text{s}\text{o}\text{l}\right)}\), \({S}_{\left(\text{M}\text{e}\text{m}\text{b}\text{r}\text{a}\text{n}\text{e}\right)}\), \({S}_{\left(\text{N}\text{u}\text{c}\text{l}\text{e}\text{o}\text{l}\text{u}\text{s}\right)}\), \({S}_{\left(\text{N}\text{u}\text{c}\text{l}\text{e}\text{u}\text{s}\right)}\), and \({S}_{\left(\text{N}\text{u}\text{c}\text{l}\text{e}\text{o}\text{p}\text{l}\text{a}\text{s}\text{m}\right)}\). Then, the dataset \(S\left(3.0\right)\) can be represented by.

Table 1 Breakdown of circRNA datasets.
$$\text{S}\left(3.0\right)={S}_{\left(\text{C}\text{h}\text{r}\text{o}\text{m}\text{a}\text{t}\text{i}\text{n}\right)}\cup{S}_{\left(\text{C}\text{y}\text{t}\text{o}\text{p}\text{l}\text{a}\text{s}\text{m}\right)}\cup{S}_{\left(\text{C}\text{y}\text{t}\text{o}\text{s}\text{o}\text{l}\right)}\cup{S}_{\left(\text{M}\text{e}\text{m}\text{b}\text{r}\text{a}\text{n}\text{e}\right)}\cup{S}_{\left(\text{N}\text{u}\text{c}\text{l}\text{e}\text{o}\text{l}\text{u}\text{s}\right)}\cup{S}_{\left(\text{N}\text{u}\text{c}\text{l}\text{e}\text{u}\text{s}\right)}\cup{S}_{\left(\text{N}\text{u}\text{c}\text{l}\text{e}\text{o}\text{p}\text{l}\text{a}\text{s}\text{m}\right)}$$
(1)

It can be observed from Table 1 that the sum of the numbers of circRNAs in seven localizations was larger than the number of different circrNAs, meaning that some circRNAs belong to more than one localization. To illustrate this phenomenon, an upset graph was plotted, as shown in Fig. 1(A). It can be found that several circRNAs belong to two or more localizations. Thus, assigning localizations to circRNAs is a multi-label classification problem when localizations are termed as classes and circRNAs as samples.

Fig. 1
Fig. 1The alternative text for this image may have been generated using AI.
Full size image

Upset graph to show the intersections of two circRNA datasets for seven subcellular localizations. (A) Upset graph on the circRNA dataset S(3.0) retrieved from RNALocate 3.0; (B) Upset graph on the circRNA dataset S(2.0) retrieved from RNALocate 2.0.

To further test the model, we also employed the human circRNAs and their subcellular localizations collected in RNALocate (version 2.0)49. Through the same data cleaning process, 880 human circRNAs were obtained. They were also labelled by above seven subcellular localizations. The number of circRNAs in each localization is also listed in Table 1 and an upset graph was plotted, as illustrated in Fig. 1(B), to show the intersection of circRNAs on seven localizations. This dataset was denoted as \(S\left(2.0\right)\). This dataset and \(S\left(3.0\right)\) were served as training datasets to build and evaluate the model. Furthermore, based on the differences between \(S\left(3.0\right)\) and \(S\left(2.0\right)\), we constructed one test dataset, denoted as TD1. This dataset consisted of 606 newly added human circRNAs in \(S\left(3.0\right)\), i.e., \({TD}_{1}=S\left(3.0\right)-S\left(2.0\right)\). The model built on \(S\left(2.0\right)\) would be applied to this test dataset, thereby evaluating the generalization ability of the model. The numbers of circRNAs in TD1 on seven subcellular localizations are listed in Table 1. It can be found that there were no circRNAs on three localizations (Chromatin, Membrane, and Nucleoplasm), few circRNAs (less than 10) owned two localizations (Cytosol and Nucleolus), only two localizations (Cytoplasm and Nucleus) had considerable circRNAs.

As the completeness of TD1 was not very satisfied, we searched another comprehensive database, CSCD2 (http://geneyun.net/CSCD2/)50, which contains a large number of circRNAs (~ 2.9 millions). We extracted the circRNAs with subcellular localizations and compared them with circRNAs in \(S\left(3.0\right)\), resulting in 732 additional circRNAs. These circRNAs constituted another test dataset, denoted as TD2. The numbers of circRNAs in this dataset on seven subcellular localizations are listed in Table 1. It can be found that each localization was assigned to some circRNAs, suggesting the completeness of TD2.

In this study, we constructed and evaluated the model mainly on dataset \(S\left(3.0\right)\), including parameter optimalization, ablation study, and comparison with other models. The datasets \(S\left(2.0\right)\) and TD1 as well as \(S\left(3.0\right)\) and TD2 would be used to evaluate the generalization ability of the proposed model.

Construction of circRNA sequence features

The RNA sequences are always the first-hand material to investigate RNA-related problems. An RNA sequence with length L can be represented by

$$S={R}_{1},{R}_{2},{R}_{3},\cdots,{R}_{L}$$
(2)

where \({R}_{i}\) denotes the i-th residue in the sequence and L represents the length of the sequence. Some essential properties of RNAs are contained in this sequence. To date, several methods have been proposed to encode the properties of RNAs contained in the sequences into fixed-length vectors51,52,53,54. In this study, we first downloaded the sequences of human circRNAs from circBase (http://www.circbase.org/)55. Then, three methods or models were employed to extract circRNA sequence features, yielding three feature types.

Features yielded by k-mer

The k-mer is a widely used method for encoding DNA, RNA, and protein sequences56. The features yielded by this method partly reflect the main components in the sequence. Given a circRNA sequence with length L, as formulated by Eq. 2, k-mer subsequences can be extracted from this sequence using window sliding technique and its number is L-k + 1. The frequencies of possible k-mer subsequences are counted and constitute the features yielded by k-mer. It is clear that there are 4k possible k-mer subsequences as circRNA sequences consist of 4 bases (A, U, C, G). Here, we set k = 2, 3, 4, 5, thereby generating 1360 (42+43+44+45) sequence features. For convenience, these features were called k-mer features.

Features yielded by reverse compliment k-mer

Reverse compliment K-mer (k-RevcKmer) is a type of deformation of k-mer57,58, which considers the reverse complement of RNA sequences. The reverse complement k-length contiguous subsequences are discarded. For 2-mer, there are 16 possible k-mer sequences; whereas the 2-RevcKmer only keeps 10 subsequences. The numbers of different k-RevcKmer subsequences are reported in Liu et al.’s study51. In this study, we still set k to 2, 3, 4, and 5, obtaining 10, 32, 136, and 512, respectively, features. Accordingly, 690 (10 + 32+136 + 512) features were obtained. These features were called k-RevcKmer features.

Features yielded by RNAErnie model

In recent years, LLMs have been successfully applied to learn massive amounts of unlabeled natural language data. They can deeply investigate the large-scale data and generally yield informative embeddings. For RNA, some LLMs have been proposed59,60,61. Here, we employed a recently proposed pretrained RNA LLMs, namely RNAErnie60. It is built upon the Enhanced Representation through Knowledge Integration (ERNIE) framework and incorporates multi-layer and multi-head transformer blocks. The circRNA sequences were fed into this LLM for generating informative features. The dimension was set to 768. For convenience, these features were called RNAErnie features.

The detailed information of circRNA sequence features is listed in Table 2.

Table 2 Information on circRNA features.

Construction of circRNA network features

The circRNA sequence features mainly focused on the essential properties of a single circRNA, which cannot reflect all aspects of circRNAs. In recent years, network is deemed to be efficient in organizing research objects and provides an effective form to conduct investigations62,63,64. In this study, we extracted circRNA features from various networks containing circRNAs.

Network construction

Five networks were constructed in this section. Each network included circRNAs as nodes and some networks further contained other objects, such as drugs, diseases, miRNAs, and RBPs.

circRNA sequence similarity network

As mentioned above, RNA sequences are the first-hand material to investigate RNA-related problems. In this study, the circRNA sequences were adopted to measure the similarity of circRNAs. Smith-Waterman algorithm65 was employed to calculate the similarity score of two circRNA sequences. For circRNAs c1 and c2, their similarity score yielded by Smith-Waterman algorithm was denoted as \(\text{s}\text{p}\left({c}_{1},{c}_{2}\right)\). Let the sequences of c1 and c2 be \({a}_{1}{a}_{2}\cdots{a}_{n}\) and \({b}_{1}{b}_{2}\cdots{b}_{m}\). Waterman algorithm constructs a matrix \(H\in{R}^{(n+1)\times(m+1)}\) with \(H\left(k,0\right)=H\left(0,l\right)=0\ (0\le{k}\le{n}\ \:and\ \:0\le{l}\le{m})\). The value \(H(i,j)\) indicates the maximum similarity of two segments ending in \({a}_{i}\) and \({b}_{j}\). The matrix \(H\) is filled using a dynamic programming algorithm defined as

$$H\left(i,j\right)=\text{m}\text{a}\text{x}(H\left(i-1,j-1\right)+s\left({a}_{i},{b}_{j}\right),{max}_{k\ge1}\left(H\left(i-k,j\right)-{W}_{k}\right),{max}_{l\ge1}\left(H\left(i,j-l\right)-{W}_{l}\right),0)$$
(3)

where \(s\left({a}_{i},{b}_{j}\right)\) is the match or mismatch score of two bases, \({W}_{k}\) is the penalty of the gap of length k. After the matrix \(H\) is produced, \(\text{s}\text{p}\left({c}_{1},{c}_{2}\right)\) is defined as the maximum value in \(H\). This similarity score was refined using the following equation:

$$\text{S}\left({c}_{1},{c}_{2}\right)=\frac{\text{s}\text{p}\left({c}_{1},{c}_{2}\right)}{\sqrt{\text{s}\text{p}\left({c}_{1},{c}_{1}\right)\cdot\text{s}\text{p}\left({c}_{2},{c}_{2}\right)}}$$
(4)

where \(\text{S}\left({c}_{1},{c}_{2}\right)\) represented the final sequence similarity score between c1 and c2, which was between 0 and 1. After calculating the sequence similarity score of any two circRNAs, the circRNA sequence similarity network was built. The 1486 circRNAs were defined as nodes in this network and two circRNAs were connected if and only if their sequence similarity score was larger than zero, yielding 1,103,355 edges. Furthermore, this score was assigned to the corresponding edge as its weight. This obtained network was denoted as \({N}_{s}\).

circRNA-disease association network

In recent years, an increasing number of studies have demonstrated that circRNAs play crucial roles in the initiation and progression of various diseases66,67,68,69 and many prediction models have been built to predict circRNA-disease associations70,71,72,73,74,75. Thus, the circRNA-disease associations can partly describe the functions of circRNAs, thereby giving new materials to encode circRNAs. In this study, we retrieved circRNA-disease associations from circR2Disease (version 2.0, http://bioinfo.snnu.edu.cn/CircR2Disease_v2.0) database76. After restricting to 1486 circRNAs, we obtained 1006 circRNA-disease associations (not all circRNAs have associated diseases), covering 177 diseases. Then, the circRNA-disease association network was constructed, which defined 1486 circRNAs and 177 diseases as nodes. The edges connecting two circRNAs were same as those in \({N}_{s}\) and edges connecting circRNAs and diseases were determined by above-mentioned circRNA-disease associations. This network was denoted as \({N}_{ci-di}\).

circRNA-drug association network

Similar to diseases, recent studies have confirmed that circRNAs also play critical roles in various drug response mechanisms. For instance, circ-PVT1 has been shown to promote paclitaxel resistance in gastric cancer cells77 and high expression of circCELSR1 significantly reduces the sensitivity of ovarian cancer cells to paclitaxel78. These findings suggested that the circRNA-drug associations not only provide insights into their potential biological functions but also may serve as valuable features for predicting circRNA subcellular localizations. In view of this, we downloaded the circRNA-drug sensitivity associations from Deng et al.’s study79. The restricting operation on circRNAs yielded 850 circRNA-drug sensitivity associations covering 217 drugs. Then, the circRNA-drug association network was built. It set 1486 circRNAs and 217 drugs as nodes. The edges in \({N}_{s}\) were also included in this network and obtained circRNA-drug sensitivity associations formed the edges connecting circRNAs and drugs. This network was denoted by \({N}_{ci-dr}\).

circRNA-miRNA association network

MicroRNAs (miRNAs) are a well-studied class of ncRNAs whose subcellular localization patterns have been systematically characterized. In recent years, regulatory relationships between circRNAs and miRNAs have also been increasingly elucidated. CircRNAs can act as “miRNA sponges” by adsorbing miRNAs, thereby attenuating their regulatory effects on target genes and blocking miRNA-mediated repression80. This mechanism highlights the close relations between circRNAs and miRNAs. Employment of miRNAs may be helpful to predict the subcellular localizations of circRNAs. In view of this, we sourced the circRNA-miRNA associations from circBank database (www.circbank.cn)81. These associations were also restricted to 1486 circRNAs, resulting in 6893 circRNA-miRNA associations, which covered 1781 miRNAs. Then, a circRNA-miRNA association network was constructed, where 1486 circRNAs and 1781 miRNAs were defined as nodes. This network also included the edges in \({N}_{s}\) to show the relations between circRNAs. The above-obtained circRNA-miRNA associations defined the edges connecting circRNAs and miRNAs. We denoted this network as \({N}_{ci-mi}\).

circRNA-protein association network

RBPs participate in many biological processes. These proteins can be useful indictors to suggest the properties of RNAs. This study employed the RBPs of circRNAs collected in CircInteractome (https://circinteractome.irp.nia.nih.gov/)82, which were used as circRNA-protein associations. Likewise, these associations were restricted to 1486 circRNAs, yielding 2827 circRNA-protein associations, which covered 37 RBPs. Accordingly, a circRNA-protein association network was constructed. The 1486 circRNAs and 37 RBPs were defined as nodes. The restricted circRNA-protein associations determined the edges connecting circRNA and RBP nodes. Also, the edges in \({N}_{s}\) were also included in this network. Let us denote this network as \({N}_{ci-RBP}\).

This section constructed five circRNA networks. Their detailed information is listed in Table 3.

Table 3 Information of five networks.

Features yielded by node2vec from networks

Above-constructed networks contained abundant association information between circRNAs or between circRNAs and other objects. It is essential to convert this information into numeric vectors. The network embedding algorithms provide effective ways to tackle this problem. Here, the widely used network embedding algorithm, node2vec83, was adopted. This algorithm samples lots of paths in a given network and then adopts word2vec with SkipGram to yield node embeddings. The procedure of sampling paths is the core of this algorithm. Suppose there is a path starting at node \({u}_{0}=y\) and it has been extended to the i-th node \({u}_{i-1}=v\). The next node of this path is determined by the possibility, defined as below

$$P\left({u}_{i}=x\mid{u}_{i-1}=v\right)=\left\{\begin{array}{c}\frac{{\pi}_{vx}}{Z}\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\text{}\text{i}\text{f}\text{ } v\text{}\text{ a}\text{n}\text{d }x\text{ are adjacent}\\0\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\text{}\text{o}\text{t}\text{h}\text{e}\text{r}\text{w}\text{i}\text{s}\text{e}\text{}\end{array}\right.$$
(5)

where \({\pi}_{vx}\) is the unnormalized transition probability between v and x, and Z is the normalization constant, which is defined as the sum of transition probabilities between v and other nodes. \({\pi}_{vx}\) is defined as \({\pi}_{vx}={\alpha}_{pq}(t,x)\cdot{w}_{vx}\), where \({w}_{vx}\) is the weight of edge \((v,x)\) and \({\alpha}_{pq}(t,x)\) is computed by

$${\alpha}_{pq}(t,x)=\left\{\begin{array}{c}\frac{1}{p}\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\text{}\text{i}\text{f}\text{ }{d}_{tx}=0\\1\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\text{}\text{i}\text{f}\text{ }{d}_{tx}=1\\\frac{1}{q}\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\hspace{0.25em}\text{}\text{i}\text{f}\text{ }{d}_{tx}=2\end{array}\right.$$
(6)

where t is the node prior to v, \({d}_{tx}\) represents the distance from t to x, p and q determine the probabilities of return t and moving away from t, which are generally called return and in-out parameters. The path stops extending until its length reaches the predefined length parameter (l). After the predefined number (m) of paths starting from each node have been sampled, paths are deemed as sentences and nodes are treated as words, which constitute the corpus. The word2vec with SkipGram is applied to this corpus for generating the node embeddings.

In this study, we applied node2vec to above five networks. From circRNA sequence similarity network, we obtained the circRNA similarity features. From other four association networks, the circRNA disease, drug, miRNA, and RBP network features were obtained. The dimensions of above features were all set to 128. The information of above circRNA features is listed in Table 2.

Features improved by GATE

Features yielded by node2vec always contain some noises. Thus, they should be refined. GATE47 is a type of graph learning algorithm. Different from the network embedding algorithms, it can fuse the raw node embeddings and topology of the given network to yield more informative embeddings. GATE contains encoder and decoder procedures. The encoder updates the raw embedding of each node by considering the embeddings of its neighbors. The decoder tries to recover the raw embeddings of nodes as perfect as possible.

Generally, several layers are contained in the encoder procedure. Let \({x}_{i}={h}_{i}^{\left(0\right)}\) be the raw embeddings of the i-th node and \({h}_{i}^{\left(k\right)}\) be the embeddings after the k-th layer. In the k-th layer, GATE calculates the relevance between the i-th node and its neighbor (j-th node for example) by

$${e}_{ij}^{\left(k\right)}=\text{S}\text{i}\text{g}\text{m}\text{o}\text{i}\text{d}\left({v}_{s}^{(k{)}^{T}}\sigma\left({W}^{\left(k\right)}{h}_{i}^{(k-1)}\right)+{v}_{r}^{(k{)}^{T}}\sigma\left({W}^{\left(k\right)}{h}_{j}^{(k-1)}\right)\right)$$
(7)

where \({W}^{\left(k\right)}\in{\mathbb{R}}^{{d}^{\left(k\right)}\times{d}^{(k-1)}}\), \({v}_{s}^{\left(k\right)}\in{\mathbb{R}}^{{d}^{\left(k\right)}}\), and \({v}_{r}^{\left(k\right)}\in{\mathbb{R}}^{{d}^{\left(k\right)}}\) are the trainable parameters. \(\sigma\) is the activation function. Then, the softmax is adopted to normalize the relevance, formulated by

$${\alpha}_{ij}^{\left(k\right)}=\frac{\text{exp}\left({e}_{ij}^{\left(k\right)}\right)}{\sum_{l\in{\mathcal{N}}_{i}}\text{e}\text{x}\text{p}\left({e}_{il}^{\left(k\right)}\right)}$$
(8)

where \({\alpha}_{ij}^{\left(k\right)}\) represents the attention coefficient of the j-th node relative to i-th node in the k-th encoder layer, and \({\mathcal{N}}_{i}\) denotes the closed neighborhood of the i-th node. Then, the new embedding of the i-th node is updated by

$${h}_{i}^{\left(k\right)}=\sum_{j\in{\mathcal{N}}_{i}}{\alpha}_{ij}^{\left(k\right)}\sigma\left({W}^{\left(k\right)}{h}_{j}^{(k-1)}\right)$$
(9)

If there are L layers in the encoder procedure, the output of the L-th layer is deemed as the learnt embeddings through GATE, which is denoted as \({h}_{i}={h}_{i}^{\left(L\right)}\) for the i-th node.

The decoder procedure contains the same number of layers in the encoder procedure. It evaluates the quality of \({h}_{i}\) by recovering \({x}_{i}\) using the similar structure. \({h}_{i}\) is the input of the decoder procedure and denoted by \({\widehat{h}}_{i}^{\left(L\right)}\). Likewise, the attention coefficient of the j-th node relative to i-th node in the k-th layer is computed by

$${\widehat{e}}_{ij}^{\left(k\right)}=\text{S}\text{i}\text{g}\text{m}\text{o}\text{i}\text{d}\left({\widehat{v}}_{s}^{(k{)}^{T}}\sigma\left({\widehat{W}}^{\left(k\right)}{\widehat{h}}_{i}^{\left(k\right)}\right)+{\widehat{v}}_{r}^{(k{)}^{T}}\sigma\left({\widehat{W}}^{\left(k\right)}{\widehat{h}}_{j}^{\left(k\right)}\right)\right)$$
(10)
$${\widehat{\alpha}}_{ij}^{\left(k\right)}=\frac{\text{exp}\left({\widehat{e}}_{ij}^{\left(k\right)}\right)}{\sum_{l\in{\mathcal{N}}_{i}}\text{e}\text{x}\text{p}\left({\widehat{e}}_{il}^{\left(k\right)}\right)}$$
(11)

where \({\widehat{W}}^{\left(k\right)}\in{\mathbb{R}}^{{d}^{(k-1)}\times{d}^{\left(k\right)}}\), \({\widehat{v}}_{s}^{\left(k\right)}\in{\mathbb{R}}^{{d}^{(k-1)}}\), and \({\widehat{v}}_{r}^{\left(k\right)}\in{\mathbb{R}}^{{d}^{(k-1)}}\) are the trainable parameters in the k-th decoder layer, \(\sigma\) and sigmoid are same as those in Eq. 7, and \({\mathcal{N}}_{i}\) is same as that in Eq. 8. Accordingly, the embedding of the i-th node after the k-th decoder layer is updated by

$${\widehat{h}}_{i}^{(k-1)}=\sum_{j\in{\mathcal{N}}_{i}}{\widehat{\alpha}}_{ij}^{\left(k\right)}\sigma\left({\widehat{W}}^{\left(k\right)}{\widehat{h}}_{j}^{\left(k\right)}\right)$$
(12)

The output of the last layer in decoder procedure is collected as the reconstructed node embeddings. This embedding of the i-th node is represented by \({\widehat{x}}_{i}\).

As a self-supervised algorithm, GATE adopts the following loss function to assess the quality of the output of encoder procedure

$$\text{Loss\:}=\sum_{i=1}^{N}{\parallel{x}_{i}-{\widehat{x}}_{i}\parallel}_{2}-\lambda\sum_{j\in{\mathcal{N}}_{i}}\text{l}\text{o}\text{g}\left(\frac{1}{1+\text{e}\text{x}\text{p}\left(-{\text{h}}_{i}^{T}{\text{h}}_{j}\right)}\right)$$
(13)

where N is the number of nodes, \(\lambda\) is a parameter used to control the weights of two loss parts.

In this study, GATE was used to improve the circRNA disease, drug, miRNA, and RBP network features. In this procedure, the edges in circRNA sequence similarity network were first filtered by setting a binarization threshold, denoted by T. The edges representing weak circRNA-circRNA associations were discarded. Then, it was fed into GATE. In this way, we obtained the high-level circRNA disease, drug, miRNA, and RBP features.

Self-attention layer

Several features were extracted for each circRNA, as listed in Table 2. They were concatenated to constitute a representation of each circRNA. Then, a self-attention layer was adopted to learn the weights between features for better representing the internal structure of all features. This operation can help the model capture complex dependencies between features, thereby enhancing model’s performance.

The self-attention layer employs three weight matrices \({W}_{Q}\), \({W}_{K}\), and \({W}_{V}\). They can conduct linear transformations on the input features \(X\), as follows:

$$\left\{\begin{array}{c}Q=X\cdot{W}_{Q}\\K=X\cdot{W}_{K}\\V=X\cdot{W}_{V}\end{array}\right.$$
(14)

The outputs of Eq. 14 are used to calculate the attention score matrix \(A\)

$$A=\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}$$
(15)

where \({d}_{k}\) stands for the dimension of \(Q\) or \(K\). A softmax operation is applied to \(A\) to produce the attention weight matrix \(M\), i.e., \(M=\text{s}\text{o}\text{f}\text{t}\text{m}\text{a}\text{x}\left(A\right)\). Finally, the output \(Y\) of the self-attention layer is obtained by \(V\) and \(M\)

$$Y=MV$$
(16)

In this study, the representations of circRNAs were refined by a self-attention layer, yielding better representations of circRNAs.

Prediction and optimization

The circRNA representations processed by the self-attention layer were fed into a fully connected layer for making predictions. The hidden layers adopted ReLU as the activation function and sigmoid was employed as the activation function of the output layer. To evaluate the predicted results, a loss function was necessary. In this study, the loss function was set as the binary cross-entropy, which is defined as

$$L=-\frac{1}{N\times{C}}\sum_{i=1}^{N}\sum_{j=1}^{C}\left[{y}_{ij}\text{l}\text{o}\text{g}\left({\widehat{y}}_{ij}\right)+\left(1-{y}_{ij}\right)\text{l}\text{o}\text{g}\left(1-{\widehat{y}}_{ij}\right)\right]$$
(17)

where \({y}_{ij}\)and \({\widehat{y}}_{ij}\) represent the observed label and predicted probability for the i-th sample on the j-th label, respectively, N and C denote the number of samples and labels, respectively. Several trainable parameters were included when training the model, these parameters were optimized through Adam optimizer84.

Evaluation metrics

Cross-validation is widely used to assess the performance of classification models85,86,87,88,89. In this method, training samples are randomly and equally divided into multiple parts. Each part is singled out as test set one by one, whereas other parts constitute the training set. The average performance on all parts is generally used to evaluate the overall performance of the model. In this study, we adopted ten-fold cross-validation to measure model’s performance.

In classification, receiver operating characteristic (ROC) and precision-recall (PR) curves are commonly used to display model’s performance90,91,92,93,94. Different from some classic metrices (e.g., accuracy), they can show the performance of models under various thresholds. Thus, they can give a complete evaluation on model’s performance. Furthermore, the area under these two curves, denoted as AUC and AUPR, are the key quantitative metrics. They are between 0 and 1. The higher they are, the higher the performance. In this study, we adopted ROC and PR curves to display model’s performance on each localization and computed corresponding AUC and AUPR values. The average AUC and AUPR were further computed to show the overall performance of models.

In addition to average AUC and AUPR for evaluating overall performance of models, we further employed four metrics, including hamming loss, ranking loss, macro F1, and micro F1. The addition of these metrics can give a more complete exhibition on model’s performance.

Outline of CircLoc

In this study, a new computational model was designed for predicting subcellular localization of circRNAs. The entire procedures are illustrated in Fig. 2. The model first extracted circRNA features from two aspects: circRNA sequences and networks. The methods of k-mer, k-RevcKmer, and RNAErine were applied to the sequences for extracting sequence features. Node2vec was adopted to extract circRNA features from the networks and GATE was used to improve these features. All features were concatenated to represent circRNAs. After features were processed by a self-attention layer, a fully connected layer was employed to make predictions. For easy descriptions, we called this model as CircLoc. The model was implemented by TensorFlow 2.11.0 and Scikit-learn95.

Fig. 2
Fig. 2The alternative text for this image may have been generated using AI.
Full size image

Flowchart of CircLoc. The circRNA sequences and networks are employed to generate circRNA features. From sequences, k-mer, k-RevcKmer, and RNAErine features are extracted. The node2vec is applied to five networks to extract features, among which four are refined by graph attention auto-encoder. All features are concatenated and processed by a self-attention layer. Finally, a fully connected layer is used to make predictions.

Results and discussion

Parameter setting of CircLoc

Generally, tuning the main hyperparameters is necessary to build efficient computational models. For CircLoc, the parameters in node2vec, GATE, self-attention layer, and fully connected layer may play key roles in determining its efficiency. They were tuned based on the cross-validation results on dataset S(3.0).

In node2vec, several parameters were important. The hyperparameters to determine the path length (l) and number (m) were tuned in a small range due to our limited computing resources. The parameter l was set to 80 and 90, whereas m was set to 150 and 160. The cross-validation results suggested that when l and m were set to 80 and 150, respectively, the model provided the best performance. The return and in-out parameters (p and q, respectively) were set to their default values, which were all 1.

In GATE, the number of encoder and decoder layers were all set to two as suggested in the original research on GATE47 and some studies79,96. The numbers of neurons in these layers were selected from 32, 64, 128, 256, and 512. The cross-validation results shown that when the first and second encoder layers contained 256 and 128 neurons, the model yielded the best performance. The learning rate was set to 10− 2 and the parameter \(\lambda\) (Eq. 13) for computing the loss was set to 1. Finally, the binarization threshold (T) for obtaining reliable circRNA sequence similarity network was set to 0.7.

In self-attention layer, the sizes of three matrices \({W}_{Q}\), \({W}_{K}\), and \({W}_{V}\) were all set to 324×324. As for the fully connected layer, we first set the number of hidden layers as 2, 3, and 4 and found that four hidden layers yielded better performance. The numbers of neurons in the four hidden layers were set to various values among 32, 64, 128, 256, 512, and 1024. The test results shown that when the numbers of neurons in the first, second, third, and fourth layers were set to 512, 512, 256, and 128, respectively, the model provided the best performance. The fully connected layer also contained an output layer, which contained seven neurons representing the probabilities of seven subcellular localizations.

Above hyperparameters setting to multiple candidate values were optimized by grid search. The detailed hyperparameter setting of CircLoc is listed in Table 4. The epoch was set to 100 and we adopted early stopping strategy to control overfitting.

Table 4 Hyperparameter setting of CircLoc.

Performance of CircLoc

The CircLoc adopted the hyperparameter setting listed in Table 4. It was evaluated on S(3.0) by ten-fold cross-validation. The ROC and PR curves are displayed in Fig. 3(A) and 3(B), respectively. The AUC values for seven subcellular localizations were 0.8103, 0.8188, 0.7161, 0.7791, 0.8326, 0.7294, and 0.8127. These values yielded the average AUC of 0.7856. The seven AUPR values were 0.2182, 0.7656, 0.4856, 0.2980, 0.2534, 0.6538, and 0.1637. The average AUPR was 0.4055. Clearly, the AUPR values were smaller than the AUC values. When the ROC and PR curve analysis was conducted on each subcellular localization, the circRNAs in this localization were deemed as positive samples, whereas the rest were treated as negative samples. In this case, the negative samples were much more than positive samples, resulting in an imbalanced dataset. As the PR curve is more sensitive to the imbalanced problem than the ROC curve, the fact that AUPR values were smaller than the AUC values was acceptable. Besides average AUC and AUPR, we also employed four metrics to show the overall performance of CircLoc, which are listed in Table 5. The micro F1, macro F1, hamming loss, and ranking loss were 0.5779, 0.4494, 0.1720, and 0.1698., respectively.

Fig. 3
Fig. 3The alternative text for this image may have been generated using AI.
Full size image

ROC and PR curves on seven subcellular localizations to show the performance of CircLoc on these localizations. (A) ROC curves on the circRNA dataset S(3.0) retrieved from RNALocate 3.0; (B) PR curves on the circRNA dataset S(3.0) retrieved from RNALocate 3.0; (C) ROC curves on the circRNA dataset S(2.0) retrieved from RNALocate 2.0; (D) PR curves on the circRNA dataset S(2.0) retrieved from RNALocate 2.0.

Table 5 Overall performance of CircLoc on two training datasets under ten-fold cross-validation and one test dataset.

Ablation tests on features

Eight feature types were extracted for representing circRNAs, as displayed in Table 2. As some feature types were similar, we clustered eight feature types into four groups. The first group contained k-mer and k-RevcKmer features, denoted as α feature. RNAErnie features constituted the second group, indicated as β feature. The third group contained the circRNA similarity features, denoted by γ feature. The last group included the circRNA disease, drug, miRNA, and RBP features, denoted as η feature. The model using any combination of above feature groups was built and evaluated on S(3.0) by ten-fold cross-validation. The average AUC, average AUPR, micro F1, macro F1, hamming loss, and ranking loss for each model are listed in Table 6. It can be observed that all fourteen models yielded lower average AUC, average AUPR, micro F1, and macro F1 and higher ranking loss than the CircLoc, which used all feature groups. The hamming loss yielded by CircLoc was only higher than that of the model using γ and η features. Evidently, the CircLoc was better than the models using part of feature groups. The detailed performance of models using different feature groups on seven subcellular localizations is shown in TableS1. CircLoc provided the highest AUC and AUPR values on most localizations. We can conclude that all feature groups provided positive contributions to CircLoc because the model using part of four feature groups yielded lower performance. To further confirm this fact, we counted the average AUC and AUPR values of models using same number of feature groups on seven localizations, as illustrated in Fig. 4. It can be found that the models using more feature groups generally yielded higher AUC and AUPR, suggesting that using more feature groups was helpful to improve model’s performance. It was also indicated that each feature group provided positive contributions to CircLoc.

Table 6 Overall performance of the models using different feature groups.
Fig. 4
Fig. 4The alternative text for this image may have been generated using AI.
Full size image

Performance of models using different numbers of feature groups. (A) Performance measured by AUC; (B) Performance measured by AUPR. The models using more feature groups generally yield better performance.

Although all four feature groups played essential roles in CircLoc, their importances were not identical. From the performance of models using a single feature group, we can find that the models using α or β features yielded similar performance but evidently lower performance than the models using γ or η features. Thus, we can conclude that γ and η features were more important than α and β features. The γ and η features were extracted from five networks containing the relations among circRNAs and other objects (drugs, diseases, miRNAs, and RBPs), showing informative associations of circRNAs. In protein subcellular localization prediction, the fact that interacting proteins always share similar functions (subcellular localizations are highly related to protein functions) is widely accepted97,98,99. This fact may also hold for circRNA subcellular localization prediction. Accordingly, network information, showing the relationships between circRNAs, was essential for subcellular localization prediction. The α and β features were derived from circRNA sequences, indicating the isolated properties of circRNAs. As the relations between sequence information and subcellular localizations are not clear at present, they played minor roles in determining circRNA subcellular localizations. Furthermore, γ features were more important than η features as the model using γ feature was slightly superior to the model using η feature. As the γ feature contained direct relationships between circRNAs, they can provide more useful clues for correct prediction of subcellular localizations. Therefore, the importance of four feature groups for the prediction of circRNA subcellular localization from the highest to lowest was γ > η >> α ≈ β.

In addition, we tested the model using a single feature type to uncover the importance of each feature type. These models were also evaluated on S(3.0) by ten-fold cross-validation. The overall metrics of these models are listed in Table 7. It can be found that circRNA similarity feature yielded the best AUC, AUPR, and macro F1, whereas circRNA miRNA feature provided the best micro F1, hamming loss, and ranking loss. As the AUC and AUPR values yielded by circRNA similarity feature had evident advantages compared with those of circRNA miRNA feature. It was believed that circRNA similarity feature was more essential than circRNA miRNA feature. As for the rest six feature types, the circRNA drug, disease, and RBP features yielded evident higher performance than k-mer, k-RevcKmer, and RNAErnie features. These results were compatible with the above-mentioned importances of feature groups as the γ represented circRNA similarity feature, η contained circRNA disease, drug, miRNA, and RBP features, α consisted of k-mer and k-RevcKmer features, β indicated the RNAErnie features.

Table 7 Overall performance of the models using a single feature type.

Ablation tests on modules

There were several modules in CircLoc, such as GATE and self-attention layer. Here, some ablation tests were conducted to elaborate their importances. For GATE, it was used to refine the features extracted from four networks. The model was constructed by removing this module. The features extracted from four networks via node2vec were directly fed into the following procedures. This model was called CircLoc-GATE. On the other hand, the self-attention layer was applied to all features to yield more informative features. Another model was built by removing this module. All features were directly fed into the fully connected layer. This model was termed as CircLoc-SAL. Above two models were also assessed on S(3.0) by ten-fold cross-validation. The overall performance is listed in Table 8 and their performance on seven localizations is shown in Fig. 5. From Table 8, the average AUC and AUPR of CircLoc-GATE were 0.6577 and 0.2747, respectively. The micro F1, macro F1, hamming loss, and ranking loss were 0.3796, 0.2535, 0.2832, and 0.2661, respectively. The above six metrices for CircLoc-SAL were 0.7349, 0.3569, 0.5622, 0.4453, 0.1965, and 0.1697. Compared with the metrics of CircLoc (Table 5), CircLoc-GATE and CircLoc-SAL were inferior to CircLoc. As for the AUC and AUPR values on seven localizations, CircLoc yielded the highest AUC and AUPR on most localizations, as shown in Fig. 5. Thus, we can conclude that CircLoc outperformed the CircLoc-GATE and CircLoc-SAL, suggesting that GATE and self-attention layer played key roles in CircLoc for determining circRNA subcellular localizations. Furthermore, the importances of GATE and self-attention layer were not same. As the performance of CircLoc-GATE was lower than CircLoc-SAL, removal of GATE induced more decrease in performance than the removal of self-attention layer. Thus, GATE was more important than self-attention layer for CircLoc.

Table 8 Performance of the variants of CircLoc.
Fig. 5
Fig. 5The alternative text for this image may have been generated using AI.
Full size image

Performance of the variants of CircLoc on seven subcellular localizations. (A) Performance measured by AUC; (B) Performance measured by AUPR. The CircLoc yielded higher AUC and AUPR on most subcellular localizations.

With above arguments, GATE provided key contributions for CircLoc. To uncover its specific contributions, we analyzed the attention coefficients of GATE in two encoder layers. Three circRNAs (hsa_circ_0061259, hsa_circ_0077855, and hsa_circ_0013126) were selected. Their subcellular localizations are provided in Table S2. The circRNAs with top five attention coefficients to above three circRNAs under four different feature types and two encoder layers are extracted. Table S2 lists these circRNAs and their subcellular localizations. It can be found that the circRNAs with high attention coefficients had at least one common subcellular localization with the selected circRNA. For example, hsa_circ_0001235 was assigned the top attention coefficient to hsa_circ_0061259 under the circRNA miRNA feature in the first encoder layer. Its subcellular localization (Cytoplasm) is one of the localizations of hsa_circ_0061259 (Nucleolus, Membrane, Nucleus, Cytosol, and Cytoplasm). According to the principle of GATE, the circRNAs with high attention coefficients can give a high influence on the representation of each circRNA. The representation of hsa_circ_0061259 contained the features of hsa_circ_0001235, which was helpful to predict the localization of Cytoplasm. This partly uncovered the essential contributions of GATE in CircLoc.

Comparisons with other methods

To date, there are limited computational models for prediction of circRNA subcellular localizations. Existing models can only deal with circRNAs with exact one subcellular localization37,38. Thus, we employed three multi-label miRNA subcellular localization prediction models for comparing with CircLoc, including MiRLoc32, MirLocPredictor28, and PMLocMSCAM36. Furthermore, we employed two classic multi-label classification algorithms Binary Relevance (BR)100 and RAndom k-labELsets (RAkEL)101 to construct the traditional machine learning-based models, which were implemented by MEKA (http://waikato.github.io/meka/)102. The eight feature types were fed into these algorithms to train the models. For convenience, the models based on BR and RAkEL were still called BR and RAkEL. After trying different classification algorithms, including decision tree, random forest, bayes, support vector machine, and k-nearest neighbor algorithm, we selected the best classification algorithm to construct BR and RAkEL models. The performance of above models on S(3.0) under ten-fold cross-validation is listed in Tables 9 and 10. For easy comparisons, the performance of CircLoc was also listed in these two tables. It can be found that CircLoc yielded the highest average AUC and AUPR. For AUC, the advantage was at least 0.05, whereas the advantage on AUPR was at least 0.03. Furthermore, CircLoc provided the highest AUC and AUPR values on five localizations. Clearly, CircLoc was superior to above five models. This result proved the effectiveness of CircLoc in predicting circRNA subcellular localizations.

Table 9 AUC values of different models in determining circRNA subcellular localizations.
Table 10 AUPR values of different models in determining circRNA subcellular localizations.

Generalization ability of CircLoc

As mentioned in Sect. 2.1, we employed the human circRNA subcellular localization data in RNALocate 2.0 and constructed one training dataset S(2.0) and one test dataset TD1. The CircLoc was trained on S(2.0) and applied to TD1.

We first tested the CircLoc on S(2.0) by ten-fold cross-validation. The ROC and PR curves on seven subcellular localizations are displayed in Fig. 3(C) and 3(D), respectively. The overall metrics are listed in Table 5. It can be observed that the performance on S(2.0) was similar to that on S(3.0). On some metrics (e.g., micro F1), CircLoc on S(2.0) had even evident advantages. Then, CircLoc was applied to TD1. As there were only four localizations (Cytoplasm, Cytosol, Nucleolus, and Nucleus) containing circRNAs, we only counted the AUC and AUPR on these localizations and the overall metrics were not counted. The AUC and AUPR values on above four localizations are listed in Table 11. The four AUC values were 0.6887, 0.6074, 0.7541, and 0.7974, whereas the four AUPR values were 0.9672, 0.0097, 0.0867, and 0.2283. The AUC values were more stable than AUPR values. As there were only seven and two circRNAs on Cytosol and Nucleolus, the AUPR values on these two localizations were very low. The AUPR values on other two localizations (Cytoplasm and Nucleus) were much higher as there were enough circRNAs on these two localizations.

Table 11 Performance of CircLoc on two test datasets.

Further, we also constructed another test dataset TD2 using the human circRNA subcellular localization data in CSCD2. The CircLoc was trained on S(3.0) and applied to TD2. The test results are listed in Tables 5 and 11. Six overall metrics (average AUC, average AUPR, micro F1, macro F1, hamming loss, and ranking loss) were 0.7097, 0.3619, 0.3377, 0.2134, 0.2947, and 0.2805. Compared with the cross-validation results on the training dataset S(3.0), the average AUC, average AUPR, micro F1, and macro F1 reduced and hamming loss and ranking loss increased, suggesting that the performance of the model on TD2 was lower than that on the training dataset. In detail, the average AUC and average AUPR reduced by less than 0.1, other four metrics decreased or increased by more than 0.1. As the circRNAs in TD2 and their localization information were retrieved from CSCD2 rather than RNALocate, this information was strictly isolated from the circRNAs in the training dataset, which can explain the lower performance of the model on the test dataset on one hand. On the other hand, during the test procedure, we found that most associated disease and drug information of test circRNAs was not available, influencing the predicted results of test circRNAs. As for the AUC and AUPR values on seven localizations (Table 11), some values were higher than those on the training dataset and others were lower. However, the differences were not very large, suggesting the normal results on this test.

According to the results on two test datasets, CircLoc has the ability to predict subcellular localizations of new circRNAs. However, its ability was not very strong and had spaces for improvement.

Limitations of this study

In this study, a computational model was built for predicting circRNA subcellular localizations. Although the model exhibited good performance, there still existed some limitations. First, as a deep learning-based model, the interpretability of the model is poor. This model can be a tool to identify latent subcellular localizations of circRNAs. However, it cannot provide clear insights for uncovering the essential differences of different circRNA subcellular localizations. Second, only seven subcellular localizations were considered in this study. Some important localizations (e.g. Extracellular vesicle) were not included because of the extremely imbalanced problem and lack of perfect computational methods for tackling this problem, losing a biological importance. Third, although the localizations with few or numerous circRNAs were removed, the dataset was still imbalanced. The current model did not employ any computational methods to deal with this problem, reducing its performance. Fourth, various circRNA properties collected from multiple public databases were used for building the model. This operation enriched the representations of circRNAs. However, it also reduced the practical value of the model because not all circRNAs shared all used properties. The model may produce unreliable results for circRNAs with partial properties. Fifth, the performance of CircLoc had spaces for improvement, especially its generalization ability. Finally, this study did not report the latent subcellular localizations of some circRNAs and validated them using wet experiments. The model still needed solid tests to confirm its ability in predicting circRNA subcellular localizations. In future, we will continue this work to overcome above limitations.

Conclusion

This study developed a circRNA subcellular localization prediction model, which fused the circRNA sequence and network information. The node2vec, GATE and self-attention layer were integrated in the model for extracting informative circRNA features. The model has good performance and outperforms the models for predicting miRNA subcellular localizations and those using traditional multi-label classification algorithms. Each circRNA feature and some modules in CircLoc provided positive contributions in predicting circRNA subcellular localizations. As the computational models for predicting circRNA subcellular localizations are in initial stage, we hope that this study can attract investigators to design efficient models on this problem. The data and codes used in this study are available at https://github.com/CNHanFei/CircLoc.