An interactive multi-omics data mining tool for Idiopathic Pulmonary Fibrosis
About IPF
Idiopathic Pulmonary Fibrosis (IPF) is a progressive pulmonary disease of unknown etiology characterized, among other features, by extensive lung scarring and fibrosis. It mainly affects older adults, half of whom usually die of respiratory failure 3-5 years after diagnosis.
Despite many years of sustained effort, the scientific community has not yet found a cure for IPF. Two approved anti-fibrotic treatments, nintedanib and pirfenidone, can delay disease progression but do not increase overall survival, and several side effects have also been reported.
Soon after their introduction into everyday laboratory practice, omics technologies greatly boosted biomedical research, leading to several major discoveries and the inevitable accumulation of a large number of datasets. Because these datasets are heterogeneous, scattered across the web and analyzed with different approaches, a fruitful meta-analysis is a time-consuming and laborious process even for those with the required computational skills. Consequently, centralized repositories/web servers dedicated to the careful collection, selection and communication of such data are urgently needed.
About Fibromine
Fibromine is a Shiny-based web server that aims to accelerate IPF research by providing functionalities for the integration and exploration, from three points of view, of numerous IPF and IPF-related transcriptomic and proteomic datasets. In addition, Fibromine includes a custom dataset benchmarking strategy in an attempt to reveal otherwise "invisible" dataset characteristics.
All Fibromine functionalities are highly user-tunable, and both the results and the data included in Fibromine can be easily downloaded.
Site map
- Dataset explorer facilitates the review and integration of transcriptomic and proteomic datasets. Consensus differentially expressed genes and proteins can be fetched, along with exploratory heatmaps and volcano plots for the transcriptomic datasets.
- Gene explorer enables the interrogation of gene expression patterns for a single gene or for multiple genes at the same time. Apart from basic gene annotation, further data regarding the respective gene ontology, RefSeq sequences and miRNA-mRNA interactions are provided.
- Protein explorer's main functionality is to plot condition-specific protein-protein interaction networks centered on the queried protein. In addition, the protein's differential expression pattern is fetched from the proteomic datasets included in Fibromine.
- Datasets benchmarking tab displays the results of the custom strategy used to benchmark Fibromine datasets.
- Download data tab enables one-click download of normalized transcriptomics data and of differential expression analysis results for proteomics datasets.
For more detailed information about Fibromine, please refer to the application Docs or look for the green "About" buttons in each of the explorers.
Transcriptomic datasets explorer
Proteomics datasets explorer
General information
Map to single cell data
miRNA - target DEGs shortlist
miRNA expression summary
Potential targets expression summary
Datasets benchmarking
Benchmarking backstage
IPF scRNA-seq datasets
Single cell RNA-seq is a rapidly rising omics technology, with several scientific articles already published in the context of IPF. It enables the study of pathological changes in gene expression patterns at an unprecedented level of resolution. While this tab currently holds only a catalogue of the IPF-related scRNA-seq publications, the 'Map to single cell data' feature of Fibromine's Gene explorer can map any queried protein coding gene(s) to the single cell level.
Important IPF and lung-related single cell online resources can be found at Fibromine's Home tab --> 'Useful links'
DEGs at single cell level
Download normalized transcriptomics data
Download proteomics data
FAQ
Go to Dataset explorer –> Transcriptomic datasets –> Datasets tab and choose (by clicking) the dataset/comparison of interest from the displayed table as shown below. To begin integration, press the Search button (1) and you will be redirected to the results tab (3). To reset the analysis input, it is recommended to reset the parameters via the button of the same name (2). Note: Every column of the table can be dynamically filtered.
If more than one dataset has been selected, consensus differentially expressed genes are presented in the Transcriptomics summary tab (2), with the log2FcAve (6) column holding the consensus fold change values. If a single dataset is interrogated, this tab displays the same data as the Transcriptomics analytically tab (4), namely all the DEA results of the dataset. The Proteomics summary tab (3) presents the differentially expressed proteins, if any, coded by the genes of the Transcriptomics summary tab.
The analysis can be further fine-tuned using the Out of … Datasets (5) column (see Docs for more details), as well as by changing the default p-value and FC thresholds (7).
The user can also perform pathway analysis (8) using the consensus differentially expressed genes presented in (2). The gene groups used as background for the analysis can be filtered from within the results table (9).
The Dataset plots tab (1) lets the user plot an interactive heatmap and volcano plot for any of the datasets queried in step 1. To plot either of the two, select a dataset from the tab's table (2) and then press the respective Plot button (3-4).
Go to Dataset explorer –> Transcriptomic datasets –> Datasets tab and choose (by clicking) the dataset/comparison of interest from the displayed table as shown below. To begin integration, press the Search button (1) and you will be redirected to the results tab (3). To reset the analysis input, it is recommended to reset the parameters via the button of the same name (2). Note: Every column of the table can be dynamically filtered.
Across-species consensus differentially expressed genes can be found in the Transcriptomics summary tab (2). For display purposes, only the human homologues are presented in (2), with the log2FcAve (6) column holding the consensus fold change values among the human datasets. The murine counterparts have the same direction of deregulation (up/down) and can be inspected in the Transcriptomics analytically tab (4). The Proteomics summary tab (3) presents the differentially expressed proteins, if any, coded by the genes of the Transcriptomics summary tab.
The analysis can be further fine-tuned using the Out of … Datasets (5) column (see Docs for more details), as well as by changing the default p-value and FC thresholds (7).
The user can also perform pathway analysis (8) using the consensus differentially expressed genes presented in (2). The gene groups used as background for the analysis can be filtered from within the results table (9).
Go to Dataset explorer –> Proteomic datasets –> Datasets tab and choose (by clicking) the dataset/comparison of interest from the displayed table as shown below. To begin the analysis, press the Search button (1) and you will be redirected to the results tab (3). To reset the analysis input, it is recommended to reset the parameters via the button of the same name (2). Note: Every column of the table can be dynamically filtered.
If more than one dataset has been selected, consensus differentially expressed proteins are presented in the Proteomics summary tab (2). If a single dataset is interrogated, this tab displays the same data as the Proteomics analytically tab (3), namely all the DE proteins of the queried dataset. The Out of … Datasets (4) column gives finer control over the reported differentially expressed features, just as in the transcriptomic datasets, while the ExpressionDirection (5) column holds the consensus direction of deregulation. Columns (6-7) hold data regarding the genes coding the queried proteins.
Navigate to the Gene explorer and type your gene(s) of interest into the search box (1). Then use the radio buttons (2) below the search box to select which of the supported species to search, and finally press the Search button (3). The results shown in the following screenshots can be reproduced using the Example button (4). For more details, hover over any of the aforementioned buttons/boxes or press the About […] buttons of the tab.
The results of the query are organized as follows: (i) the Queried genes (4) box displays general information about the queried genes, (ii) differential expression statistics for the queried gene(s) are shown in the Expression data tab –> DEG statistics table (6), (iii) any differentially expressed proteins (in the proteomics datasets) coded by the queried genes are presented in the Expression data tab –> DEP statistics table (7) at the bottom of the page, (iv) Gene ontology terms associated with the gene(s) of interest are presented in the Gene ontology tab (8), and (v) any related RefSeq sequences or potential miRNA-mRNA interactions, as described in miRDB, are presented in the RefSeq - miRNA tab (9). Finally, to map the queried genes to the NU-Pulmonary online resource in a species-specific fashion, use the hyperlinks of the Map to single cell data (5) feature.
Navigate to Single cell data > Search data tab, select from the drop-down list (1) a gene of interest and press the “Search” button.
avgLogFC (2) is color coded according to the direction of gene expression deregulation (shades of red for positive fold changes and shades of green for negative ones). The pVal (3) and pValAdj (4) columns are colored red whenever the respective value is < 0.05. Columns pct1/pct2 (5) record the percentage of cells in each side of the comparison that express the queried gene.
For a visual inspection of feature expression, please use the Map to single cell data tool of Gene explorer. See FAQ 4.
Navigate to the Protein explorer, type the name of the gene coding your protein of interest into the search box (1) at the upper left corner of the explorer and then press the Search (2) button below. The results are organized as follows:
- the info boxes (3) at the top of the page summarise some important information regarding the queried protein,
- the Search results (4) tab displays some general information sourced from UniProt/SwissProt,
- the differential expression results, if any, are presented in the table of the same name (5).
Note: the results shown in the following example screenshots can be reproduced by pressing the Example button below the Search button of the explorer.
To construct a condition-specific protein-protein interaction (PPI) network, first search for a specific protein as described in step 1 of "3. Search for the expression pattern of a specific protein".
Then, move to the PPI network (1) tab of the Protein explorer and press the Plot (2) button. The network should be created within a few seconds.
To annotate the network, select a tissue (3) and then a comparison (4) from the respective dropdown menus at the right of the network. Finally, press the Annotate (5) button and wait for a few seconds. To annotate the network using data from different conditions, press the Reset button next to Annotate (5). To further fine-tune the network annotation, change the p-value and fold change thresholds used.
For transparency, the data used to annotate the network are presented in the Data used (6) tab.
IPF_vs_Ctrl lung (1) and BleomycinD14_vs_Ctrl (2) gene co-expression networks (GCN) can be created via the Gene coexpression tab.
Select any of the genes from the drop-down menu (3) and then press the “Plot” button. The queried gene is colored red. Note: the first network created for each species can take a few seconds longer. Please be patient.
Depicted network members belong to the same co-expression module as the queried gene and, by default, have a module membership (MM) and gene significance (GS) value above the 60th percentile of the respective distribution. These values can be tuned using the panel at the left (4).
Network edges represent TOM (Topology Overlap Measure) values above the 3rd quartile of the respective module's distribution. Every node except the queried gene must have at least one edge.
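The filtering described above can be illustrated with a minimal Python sketch. It is not Fibromine's implementation: the function names, the input layout (a dictionary of per-gene MM/GS values and a dictionary of per-edge TOM values for one module) and the linear-interpolation percentile are all assumptions made for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100] (assumed method)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def filter_network(query, nodes, edges, node_pct=60, edge_pct=75):
    """nodes: {gene: (mm, gs)} for one module; edges: {(gene1, gene2): tom}.

    Hypothetical helper: keeps genes above the MM and GS cut-offs,
    edges above the TOM cut-off, and drops edge-less nodes except the query.
    """
    mm_cut = percentile([mm for mm, _ in nodes.values()], node_pct)
    gs_cut = percentile([gs for _, gs in nodes.values()], node_pct)
    tom_cut = percentile(list(edges.values()), edge_pct)

    # Genes passing both node-level cut-offs; the queried gene is always kept.
    keep = {g for g, (mm, gs) in nodes.items() if mm > mm_cut and gs > gs_cut}
    keep.add(query)

    # Edges above the TOM cut-off whose two ends both survived.
    kept_edges = {
        pair: tom
        for pair, tom in edges.items()
        if tom > tom_cut and pair[0] in keep and pair[1] in keep
    }

    # Only the queried gene may remain without an edge.
    connected = {g for pair in kept_edges for g in pair}
    return connected | {query}, kept_edges
```

Raising `node_pct` in this sketch plays the role of the tuning panel mentioned above: stricter MM/GS cut-offs leave fewer genes in the plotted network.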
Navigate to the miRNA explorer, choose any of the miRNAs found differentially expressed in IPF compared to Ctrl lung (1) and press the “Search” button.
miRNA expression summary (2) returns expression statistics for the queried miRNA. Potential targets expression summary (3) records the same statistics for potential mRNA interactors (miRDB-sourced) that are deregulated in the opposite direction from the selected miRNA.
Fibromine documentation page
Note: for a visual guide to using Fibromine, please refer to the FAQ tab.
Dataset explorer
Dataset explorer enables the exploration and integration of all Fibromine datasets. By clicking on any combination of the datasets displayed in the Transcriptomic datasets > Datasets or Proteomic datasets > Datasets tables and then pressing the Search button, the user can interrogate the consensus deregulated genes and proteins found in these datasets, respectively. Results are presented in the DEA statistics tab for the transcriptomic datasets and in the DEA data tab for the proteomic datasets.
Currently, there are two supported integration schemes for transcriptomics datasets:
- integrating single-species datasets
- integrating datasets across species
At the moment, proteomic datasets can be integrated using only the first scheme, as no murine proteomic datasets are included yet.
1. Integrating single-species transcriptomic/proteomic datasets
In this case, a feature (gene/protein) is defined as consensus deregulated if it has been found significantly differentially expressed in at least half of the user selected datasets but not significantly differentially expressed towards the other direction in the rest. These features are summarized at Transcriptomic datasets > DEA statistics > Consensus DEGs and Proteomic datasets > DEA data > Consensus DEPs tables, respectively.
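As an illustration only, the single-species rule above can be sketched in Python. The function names and the input layout (one log2 fold change and one p-value per dataset) are assumptions made for this sketch, not Fibromine's actual code. The default thresholds follow the |FC| > 1.2 and p-value < 0.05 defaults stated in this documentation.

```python
import math


def call_direction(log2fc, pval, fc_thr=1.2, p_thr=0.05):
    """Return +1 (up), -1 (down) or 0 (not significant) for one dataset.

    Assumes fold changes are supplied on the log2 scale.
    """
    if pval < p_thr and abs(log2fc) > math.log2(fc_thr):
        return 1 if log2fc > 0 else -1
    return 0


def consensus_direction(calls):
    """calls: per-dataset values from call_direction for one feature.

    A feature is consensus deregulated if it is significant in at least
    half of the datasets and never significant in the opposite direction.
    Returns +1, -1, or 0 when there is no consensus.
    """
    n = len(calls)
    if n == 0:
        return 0
    up, down = calls.count(1), calls.count(-1)
    if 2 * up >= n and down == 0:
        return 1
    if 2 * down >= n and up == 0:
        return -1
    return 0
```

For example, a gene that is significantly upregulated in two of four selected datasets and not significant in the other two would be reported as consensus upregulated, whereas a single significant call in the opposite direction removes it from the consensus table.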
log2FcAve column of Consensus DEGs table holds the consensus direction and size of deregulation. For its calculation we perform the following per-gene computations:
- Define the most frequent direction of deregulation across the selected datasets (up or down regulation)
- Isolate those datasets presenting the most frequent direction of deregulation
- Calculate and present the average fold change value across the datasets selected at step 2.
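The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `log2fc_ave` is hypothetical, the inputs are taken to be per-dataset log2 fold changes for a single gene, and the average is assumed to be an arithmetic mean of those log2 values. How ties between directions are handled is not specified in this documentation, so the sketch simply returns `None`.

```python
from statistics import mean


def log2fc_ave(log2fcs):
    """Average log2 fold change over datasets sharing the majority direction.

    log2fcs: per-dataset log2 fold changes for one gene (assumed input).
    """
    ups = [x for x in log2fcs if x > 0]
    downs = [x for x in log2fcs if x < 0]
    if len(ups) == len(downs):
        return None  # no most frequent direction (tie handling is an assumption)
    # Steps 1-2: keep only datasets showing the most frequent direction.
    majority = ups if len(ups) > len(downs) else downs
    # Step 3: average the fold changes of those datasets.
    return mean(majority)
```

With inputs `[1.0, 2.0, -0.5]`, the majority direction is up, so only `1.0` and `2.0` are averaged.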
Proteins' consensus direction of deregulation is displayed at the Expression direction column of Consensus DEPs table. It is calculated as the most frequent among the integrated datasets.
If a feature is found differentially expressed in only two datasets, with an inconsistent direction of deregulation between the two, it is not included in the respective summary table but can still be reviewed in the Transcriptomics analytically and Proteomics analytically tables.
Note: If the user selects a single transcriptomic or proteomic dataset, no data integration is performed, so the Consensus DEGs/DEPs and […] analytically tables are identical to each other.
Note: By default, differential gene expression is defined using the thresholds of |FC| > 1.2 and p-value < 0.05. The user can change these thresholds from the control panel at the left of the Transcriptomic datasets > DEA statistics tab.
Note: Although half of the user-selected datasets is the minimum threshold for a feature to be identified as consensus deregulated, the user can tighten this threshold via the Out of n Datasets column of the Consensus DEGs/Consensus DEPs tables, thus limiting the number of reported consensus deregulated features.
Finally, Dataset explorer can plot a heatmap and a volcano plot for each dataset. For the heatmap, the top 1000 most variable differentially expressed genes are taken into consideration, while for the volcano plot, a |FC| threshold of 1.2 and a p-value threshold of 0.05 are used.
2. Integrating transcriptomic datasets across species
In this case, a gene is defined as consensus deregulated if:
- there is a 1:1 human:mouse homology,
- it has been found significantly differentially expressed in at least half of the datasets of each species and not significantly deregulated towards the other direction in any of the rest (again per species),
- it has a consistent direction of deregulation in both species.
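The three conditions above can be combined into a small Python sketch. The names and the input layout are hypothetical: each species contributes a list of per-dataset calls (+1 significantly up, -1 significantly down, 0 not significant), and the 1:1 homology check is assumed to have been done beforehand and passed in as a flag.

```python
def within_species_consensus(calls):
    """+1/-1 if significant in at least half of the datasets with no
    significant call in the opposite direction, otherwise 0."""
    n = len(calls)
    if n == 0:
        return 0
    up, down = calls.count(1), calls.count(-1)
    if 2 * up >= n and down == 0:
        return 1
    if 2 * down >= n and up == 0:
        return -1
    return 0


def cross_species_consensus(is_one_to_one, human_calls, mouse_calls):
    """Return the shared direction (+1/-1), or 0 when any condition fails."""
    if not is_one_to_one:  # condition 1: 1:1 human:mouse homology
        return 0
    human = within_species_consensus(human_calls)  # condition 2, human
    mouse = within_species_consensus(mouse_calls)  # condition 2, mouse
    # Condition 3: both species must agree on a non-zero direction.
    return human if human != 0 and human == mouse else 0
```

Note that the per-species rule is evaluated separately for the human and the murine datasets, so a gene can fail the across-species test even when it passes in one species.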
These features are presented as in the case of single species integration, with only two exceptions:
- Consensus DEGs table summarizes and presents only the human member of across-species consensus deregulated genes
- “Out of … datasets” column refers to the number of human datasets
Across-species consistently deregulated genes (both human and murine) can be inspected analytically at Transcriptomic datasets > DEA statistics > Transcriptomics analytically tab.
Gene explorer
Gene explorer supports the examination of the expression patterns for a single or multiple protein coding and non-coding genes of human and/or mouse origin. For the protein coding genes, apart from the transcriptomic data, a proteomic signature of deregulation fetched from the included proteomic datasets is also returned. Finally, Gene ontology and RefSeq - miRNA tabs report useful information regarding the queried genes.
Note: Gene search is case-insensitive and accepts either an Ensembl gene ID or the respective valid HGNC symbol.
Note: miRNA search is also supported, retrieving data from “Non-coding RNA expression arrays”. The user can search for a miRNA using the respective miRBase accession or ID (e.g. MIMAT0000446 or hsa-miR-127-3p).
Protein explorer
Note: Search in the Protein explorer must follow official HGNC nomenclature or Ensembl gene IDs and, in contrast to Gene explorer, is case sensitive.
Given the name of the respective coding gene as input, Protein explorer enables the exploration of protein expression data and some general annotation regarding the protein of interest.
The main feature of the explorer is the creation of context-specific protein-protein interaction (PPI) networks, based on STRINGdb and transcriptomics expression data of Fibromine. Note: available proteomic data are able to cover only a few proteins and thus were not used for PPI networks annotation.
Currently, the networks created consist of two shells in which only high confidence interactions are included (interaction score > 700). Briefly, the first shell includes the queried protein and a maximum of 9 interactors, while the second includes a maximum of 18 interactors (the two most confident interactors of each of the 9 first-shell proteins that interact with the queried protein). Technically, a DrL layout is applied to the network, with protein interaction scores used as edge weights. Note: PPI network visualizations can be downloaded by right-clicking on the plot and then choosing the “Save image as…” option.
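The two-shell selection can be sketched in Python as below. This is a simplified illustration, not the Fibromine back end: the function names and the input format (a dictionary mapping each protein to a list of `(partner, score)` pairs) are hypothetical, and excluding proteins already in the first shell from the second shell is an assumption. The layout step is not covered.

```python
def top_partners(interactions, protein, k, min_score=700, exclude=()):
    """The k highest-scoring partners of `protein` above `min_score`."""
    partners = [
        (p, s)
        for p, s in interactions.get(protein, [])
        if s > min_score and p not in exclude
    ]
    partners.sort(key=lambda ps: ps[1], reverse=True)
    return partners[:k]


def two_shell_network(interactions, query, first_k=9, second_k=2, min_score=700):
    """Return a list of (source, target, score) edges for both shells."""
    edges = []
    # First shell: up to 9 high-confidence interactors of the queried protein.
    first = top_partners(interactions, query, first_k, min_score)
    for partner, score in first:
        edges.append((query, partner, score))
    members = {query} | {p for p, _ in first}
    # Second shell: up to 2 most confident interactors per first-shell protein.
    for partner, _ in first:
        for p2, s2 in top_partners(interactions, partner, second_k, min_score, exclude=members):
            edges.append((partner, p2, s2))
    return edges
```

The scores kept on each edge are the values that would serve as edge weights when a layout is computed.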
To annotate a general PPI network the user should use the panel at the right of the “PPI network tab”. The nodes of the resulting network are colored based on whether the respective genes are:
- differentially expressed (red for the upregulated or blue for the downregulated ones)
- or not (grey).
Nodes maintaining the previously used color (also named by their STRINGdb ID) represent TrEMBL proteins that are currently not supported by Fibromine, and thus their expression status cannot be assayed. Data used for each context-specific annotation are displayed for inspection at the last tab of the explorer, Data used tab.
The internal process applied to annotate the network is described in Workflow 1.
Workflow 1: Back-end process for context-specific PPI networks creation.
Note: To re-annotate an already annotated PPI network, please reset the parameters using the button of the same name.
Datasets benchmarking tab
The Datasets benchmarking tab presents the results of a custom dataset benchmarking process, implemented to help the user select the most “interesting” datasets to work with.
The basic concept of the applied strategy is to interrogate all transcriptomic datasets using a set of seven metrics, after separating the datasets/comparisons by species and technology (microarrays/RNA-seq). For each of the dataset/comparison groups we build a per-metric distribution, and each dataset receives a star for every metric value lying within a pre-specified range of the respective metric. The more stars a dataset has collected, that is, the more of its metric values fall within the range, the more “trustworthy” it is considered.
The seven metrics along with the pre-defined ranges in which a dataset receives a star are:
# | Metric | Range |
---|---|---|
1 | Number of detected genes | Median to 90th percentile |
2 | Number of differentially expressed genes | Inter-quartile range |
3 | Representation of a number of already known pro-fibrotic genes among the differentially expressed ones | Above the median |
4 | The number of differentially expressed genes falling in each of three absolute FC bins: low (1.2, 2), intermediate [2, 5) and high [5, ∞) | Inter-quartile range |
5 | Up/Down differentially expressed genes ratio | Inter-quartile range |
6 | The area under the p-value curve | Inter-quartile range |
7 | The area under the adjusted p-value curve | Inter-quartile range |
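To make the star-counting idea concrete, here is a minimal Python sketch for one group of datasets (same species and technology). It is an illustration under assumptions, not the benchmarking code itself: the names, the input layout, the linear-interpolation percentile and the inclusive range boundaries are all hypothetical choices, and only single-valued metrics are handled (the three FC bins of metric 4 would need one entry per bin).

```python
def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100] (assumed method)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Range name -> (lower percentile, upper percentile, lower bound inclusive?)
RANGES = {
    "iqr": (25, 75, True),
    "median_to_p90": (50, 90, True),
    "above_median": (50, 100, False),
}


def count_stars(group, rules):
    """group: {dataset_id: {metric: value}}; rules: {metric: range name}."""
    stars = {dataset: 0 for dataset in group}
    for metric, rule in rules.items():
        values = [metrics[metric] for metrics in group.values()]
        lo_q, hi_q, lo_inclusive = RANGES[rule]
        lo, hi = percentile(values, lo_q), percentile(values, hi_q)
        for dataset, metrics in group.items():
            v = metrics[metric]
            above_lo = lo <= v if lo_inclusive else lo < v
            if above_lo and v <= hi:
                stars[dataset] += 1  # one star per metric within its range
    return stars
```

Because the percentiles are computed within each group, a dataset's star count in this sketch depends on the other datasets of the same species and technology.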
Download data tab
Normalized gene expression data and proteomics differential expression data can be downloaded from the drop-down menus of the Download data tab using the ID of each dataset. The IDs can be reviewed via the Transcriptomic datasets > Datasets and Proteomic datasets > Datasets tables of Dataset explorer, respectively.