Common Java software runtime errors: GSEA principles, running the software, and common errors with their fixes

meinan 2025-04-11 Case Showcase

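Part 1: GSEA principles

Goal: determine whether a predefined gene set S is randomly distributed across the ranked gene list.

1. Expression profiles: samples are divided into two classes, labeled 1 or 2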

GSEA considers experiments with genomewide expression profiles from samples belonging to two classes, labeled 1 or 2.

2. Genes are ranked by the correlation between their expression and the class distinction

Genes are ranked based on the correlation between their expression and the class distinction by using any suitable metric.

3. Calculate the enrichment score (ES)

Given an a priori defined set of genes S (e.g., genes encoding products in a metabolic pathway, located in the same cytogenetic band, or sharing the same GO category), the goal of GSEA is to determine whether the members of S are randomly distributed throughout L or primarily found at the top or bottom. We expect that sets related to the phenotypic distinction will tend to show the latter distribution.

Step 1: Calculation of an Enrichment Score.

We calculate an enrichment score (ES) that reflects the degree to which a set S is overrepresented at the extremes (top or bottom) of the entire ranked list L.

The score is calculated by walking down the list L, increasing a running-sum statistic when we encounter a gene in S and decreasing it when we encounter genes not in S.

The magnitude of the increment depends on the correlation of the gene with the phenotype. The enrichment score is the maximum deviation from zero encountered in the random walk; it corresponds to a weighted Kolmogorov–Smirnov-like statistic.
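The running-sum walk described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the GSEA implementation: the function name `enrichment_score`, the toy gene names and the weighting exponent `p` (with `p=1` weighting each hit by the absolute value of its ranking score) are assumptions made for the example.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set (illustrative sketch).

    ranked_genes: gene names, already sorted by their ranking metric.
    scores: ranking metric of each gene, in the same order.
    gene_set: the a priori defined gene set S.
    p: weighting exponent; p=0 gives an unweighted Kolmogorov-Smirnov-like walk.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    hit_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_total == 0:
        raise ValueError("all genes in the set have a ranking score of zero")

    running, best, curve = 0.0, 0.0, []
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / hit_total   # step up at a gene in S
        else:
            running -= 1.0 / n_miss              # step down otherwise
        curve.append(running)
        if abs(running) > abs(best):
            best = running                       # maximum deviation from zero
    return best, curve


# Toy example: six ranked genes, gene set concentrated at the top or bottom.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top, curve_top = enrichment_score(genes, metric, {"g1", "g2"})
es_bottom, _ = enrichment_score(genes, metric, {"g5", "g6"})
```

A set concentrated at the top of the list gives a positive ES, and one concentrated at the bottom gives a negative ES; `curve_top` holds the values that the enrichment plot would draw.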

4. Assess the significance of the ES (p-value)

This is done by permutation; the number of permutations can be chosen, e.g. 1000 or 500.
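As a rough illustration of how a permutation p-value is obtained, the sketch below draws random gene sets of the same size (the idea behind the gene_set permutation type) and compares the observed ES with the resulting null distribution. The helper names, the toy data and the `+1` pseudo-count in the p-value are assumptions for this example; GSEA itself may compute the estimate differently.

```python
import random


def running_es(scores, hit_idx, p=1.0):
    """Maximum deviation from zero of the running sum (hits given as index set).

    Assumes at least one hit has a non-zero score.
    """
    total = sum(abs(scores[i]) ** p for i in hit_idx)
    miss_step = 1.0 / (len(scores) - len(hit_idx))
    running = best = 0.0
    for i, s in enumerate(scores):
        running += abs(s) ** p / total if i in hit_idx else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(scores, hit_idx, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size."""
    rng = random.Random(seed)
    hit_idx = set(hit_idx)
    observed = running_es(scores, hit_idx)
    null = [
        running_es(scores, set(rng.sample(range(len(scores)), len(hit_idx))))
        for _ in range(n_perm)
    ]
    # Compare only against null scores with the same sign as the observed ES.
    same_sign = [x for x in null if (x >= 0) == (observed >= 0)]
    extreme = sum(1 for x in same_sign if abs(x) >= abs(observed))
    # Pseudo-count keeps the estimate away from exactly zero (an assumption).
    return observed, (extreme + 1) / (len(same_sign) + 1)


# Toy example: 20 ranked genes, gene set made of the three top-ranked genes.
toy_scores = [float(x) for x in range(20, 0, -1)]
observed_es, p_value = permutation_pvalue(toy_scores, [0, 1, 2], n_perm=500)
```

More permutations give a finer-grained p-value at the cost of run time, which is why 1000 or 500 are typical choices.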

5. Multiple-testing correction (FDR)

References:

Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

http://www.pnas.org/content/102/43/15545

https://blog.csdn.net/qq_29300341/article/details/52956052

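Part 2: Running the software

Download link: http://software.broadinstitute.org/gsea/downloads.jsp

Java must be installed first, because the software runs on Java.

1. Software interface

2. File preparation

2.1 Expression dataset file (res, gct, pcl, or txt): the sample expression file

Usually you save the table as a tab-delimited .txt file and then change the extension from .txt to .gct.

# The second column of the table (the description column) is mandatory. It can be filled with na, but it has to be there. I left this column out at first and spent a long time fighting errors without knowing where the problem was.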
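A minimal sketch of how such a file could be generated from a script instead of by hand. It assumes the standard GCT layout (a `#1.2` version line, a line with the number of genes and samples, then a header starting with `NAME` and `Description`); the function name `gct_text` and the sample and gene values are made up for the example.

```python
def gct_text(sample_names, rows):
    """Build the text of a GCT expression file.

    rows: iterable of (gene_name, description, values). An empty description
    is written as "na" so that the mandatory second column is never missing.
    """
    rows = list(rows)
    lines = ["#1.2", f"{len(rows)}\t{len(sample_names)}"]
    lines.append("\t".join(["NAME", "Description", *sample_names]))
    for name, description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        fields = [name, description or "na", *(str(v) for v in values)]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical toy data: two genes measured in four samples.
example_gct = gct_text(
    ["ctrl_1", "ctrl_2", "trt_1", "trt_2"],
    [
        ("TP53", "", [5.1, 4.9, 9.8, 10.2]),
        ("GAPDH", "housekeeping", [12.0, 11.8, 12.1, 11.9]),
    ],
)
```

Writing `example_gct` to a file ending in .gct gives a file with the description column always present, which avoids the error described above.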

2.2 Phenotype labels file (cls): the sample phenotype class file

Write it as a plain text file whose name ends in .cls; it is tab-delimited as well.
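For reference, a categorical .cls file has three lines: the number of samples, the number of classes and a literal 1; a line starting with `#` that names the classes; and one class label per sample in the same order as the expression columns. The sketch below builds that text. The function name `cls_text` and the labels are illustrative, and the single-space separator is an assumption (the format is understood to accept spaces or tabs; pass `sep="\t"` for tabs).

```python
def cls_text(labels, sep=" "):
    """Build the text of a categorical CLS phenotype file.

    labels: one class label per sample, in the same order as the columns of
    the expression file.
    """
    if not labels:
        raise ValueError("at least one sample label is required")
    class_names = list(dict.fromkeys(labels))  # unique, first-seen order
    header = sep.join(str(x) for x in (len(labels), len(class_names), 1))
    names = "# " + sep.join(class_names)
    return "\n".join([header, names, sep.join(labels)]) + "\n"


# Hypothetical design: three control samples followed by three treated ones.
example_cls = cls_text(["control"] * 3 + ["treated"] * 3)
```

The class names on the second line follow the order in which the labels first appear, so the sample order in the .cls file has to match the sample order in the expression file.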

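2.3 Gene sets file (gmx or gmt): predefined gene sets (optional)

You can generate this file yourself following the format shown above; for example, the local KEGG copy described in an earlier post can be turned into such a file.

You can also choose one of the databases defined in the software.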
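If you build your own gene sets, a small script can write the .gmt file. In the GMT format each line is one gene set: the set name, a description, then the member genes, all separated by tabs. The function name `gmt_text` and the gene sets below are made up for illustration.

```python
def gmt_text(gene_sets):
    """Build the text of a GMT gene set file.

    gene_sets: dict mapping set name -> (description, list of gene symbols).
    """
    lines = []
    for name, (description, genes) in gene_sets.items():
        genes = list(dict.fromkeys(genes))  # drop duplicates, keep order
        if not genes:
            raise ValueError(f"gene set {name!r} is empty")
        lines.append("\t".join([name, description or "na", *genes]))
    return "\n".join(lines) + "\n"


# Hypothetical gene sets, e.g. exported from a local pathway table.
example_gmt = gmt_text(
    {
        "MY_GLYCOLYSIS": ("custom pathway", ["HK1", "PFKM", "PKM", "HK1"]),
        "MY_P53_TARGETS": ("", ["CDKN1A", "MDM2", "BAX"]),
    }
)
```

Each gene set may have a different number of genes, so the lines do not need to have the same number of columns.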

2.4 Chip (array) annotation file (chip): the chip annotation file (optional)

It can be selected inside the software.

3. Run

3.1 Load data: load the files prepared above

3.2 Choose the parameters

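1) Collapse dataset to gene symbols
true: microarray (chip) data
false: gene expression matrix from sequencing

2) Chip platform
Can be left unselected for non-array data.
For array data, select the platform that matches the chip type.

3) Permutation type
phenotype: recommended; requires at least 7 samples in each group
gene_set: suitable when there are few samples

4) Significance threshold
With phenotype permutation, the FDR cutoff can be set to 0.25.
With gene_set permutation, the FDR should be below 0.05.

5) Metric for ranking genes
Usually log2_Ratio_of_Classes, i.e. logFC, is a reasonable choice.
Other metrics can be selected as needed.

6) Gene set database
You can select the ones provided in the software, such as KEGG and GO, including the GO subsets CC, BP and MF.
A user-defined gmt file can also be used.

7) You can also choose the folder in which the results are saved.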
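The rules of thumb in items 3) and 4) can be written down as a tiny helper. This is only a restatement of the guidance above, not part of GSEA; the function name `suggest_permutation` and its arguments are hypothetical.

```python
def suggest_permutation(n_class_a, n_class_b, min_per_class=7):
    """Return (permutation type, FDR cutoff) following the rules of thumb above."""
    if min(n_class_a, n_class_b) >= min_per_class:
        return "phenotype", 0.25   # enough samples in both groups
    return "gene_set", 0.05        # few samples: use the stricter cutoff


large_design = suggest_permutation(10, 8)
small_design = suggest_permutation(3, 3)
```

For example, a 3 vs 3 design falls back to gene_set permutation with the 0.05 cutoff.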

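4. Click the Run button at the bottom

5. Interpreting the results

Part 3: Common errors and solutions

1. The first error: Java heap space, OutOfMemoryError

So far this is the most annoying error I have run into, and it took me a long time to sort out.

It means that GSEA hit an OutOfMemoryError while running, i.e. the Java process ran out of memory.

In the lower-right corner of this screenshot you can see the memory available to the program: here it is 84M, of which 43M is in use.

So the fix is to raise the memory that Java runs with. My own clumsy way of doing this was to download Eclipse: https://www.eclipse.org/downloads/

Then I changed the setting following the tutorials below, after which GSEA ran. When you run it again, you will see that the 84M figure mentioned above has become much larger.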

https://jingyan.baidu.com/article/5d6edee2f5efff99ebdeec63.html

https://blog.csdn.net/tomorrow13210073213/article/details/53031818

The value can be set considerably larger.
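A more direct route than going through Eclipse, offered here only as a sketch, is to start the program with a larger maximum heap using the standard JVM option `-Xmx`. The helper below just builds that command line. It assumes your GSEA download is a runnable .jar file; the jar name `gsea.jar` and the function name are placeholders. If your download is an installer or launcher bundle, the memory setting lives elsewhere.

```python
def gsea_launch_command(jar_path, heap_mb=4096):
    """Build a java command that starts a runnable jar with a larger heap.

    jar_path: path to the GSEA jar you downloaded (placeholder name below).
    heap_mb: maximum Java heap size in megabytes, passed via -Xmx.
    """
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    return ["java", f"-Xmx{heap_mb}m", "-jar", jar_path]


command = gsea_launch_command("gsea.jar", heap_mb=4096)
```

The list can be passed to `subprocess.run(command)`, or typed in a terminal as `java -Xmx4096m -jar gsea.jar`. Keep the heap size below the physical memory of the machine.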

Explanation of the metrics used to rank genes

Metrics for Ranking Genes

For categorical phenotypes, GSEA determines a gene’s mean expression value for each phenotype and then uses one of the following metrics to calculate the gene’s differential expression with respect to the two phenotypes. To use median rather than mean expression values, set the Median for class metrics parameter to True, as described above.

● Signal2Noise (default) uses the difference of means scaled by the standard deviation. Note: You must have at least three samples for each phenotype to use this metric.

Signal2Noise = (μA − μB) / (σA + σB)

where μ is the mean and σ is the standard deviation; σ has a minimum value of .2 * absolute(μ), where μ=0 is adjusted to μ=1. The larger the signal-to-noise ratio, the larger the differences of the means (scaled by the standard deviations); that is, the more distinct the gene expression is in each phenotype and the more the gene acts as a “class marker.”

● tTest uses the difference of means scaled by the standard deviation and number of samples. Note: You must have at least three samples for each phenotype to use this metric.

tTest = (μA − μB) / sqrt(σA²/nA + σB²/nB)

where μ is the mean, n is the number of samples, and σ is the standard deviation; σ has a minimum value of .2 * absolute(μ), where μ=0 is adjusted to μ=1. The larger the tTest ratio, the more distinct the gene expression is in each phenotype and the more the gene acts as a “class marker.”

● Ratio_of_Classes (also referred to as fold change) uses the ratio of class means to calculate fold change for natural scale data:

Ratio_of_Classes = μA / μB

where μ is the mean. The larger the fold change, the more distinct the gene expression is in each phenotype and the more the gene acts as a “class marker.”

● Diff_of_Classes uses the difference of class means to calculate fold change for log scale data:

Diff_of_Classes = μA − μB

where μ is the mean. The larger the fold change, the more distinct the gene expression is in each phenotype and the more the gene acts as a “class marker.”

● log2_Ratio_of_Classes uses the log2 ratio of class means to calculate fold change for natural scale data:

log2_Ratio_of_Classes = log2(μA / μB)

where μ is the mean. This is the recommended statistic for calculating fold change for natural scale data.
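To make the five metrics concrete, here is a small sketch that computes them for a single gene from the expression values of two classes. The function names and the toy values are made up. The use of the sample standard deviation (`statistics.stdev`) is an assumption; the minimum-σ adjustment follows the description above.

```python
import math
import statistics


def _adjusted_sigma(mu, sigma):
    """Apply the minimum: sigma >= 0.2 * |mu|, with mu = 0 treated as mu = 1."""
    base = 1.0 if mu == 0 else abs(mu)
    return max(sigma, 0.2 * base)


def ranking_metrics(class_a, class_b):
    """Ranking metrics for one gene; class_a / class_b are expression values."""
    mu_a, mu_b = statistics.mean(class_a), statistics.mean(class_b)
    # Sample standard deviation (n - 1 denominator) is an assumption here.
    sd_a = _adjusted_sigma(mu_a, statistics.stdev(class_a))
    sd_b = _adjusted_sigma(mu_b, statistics.stdev(class_b))
    n_a, n_b = len(class_a), len(class_b)
    return {
        "Signal2Noise": (mu_a - mu_b) / (sd_a + sd_b),
        "tTest": (mu_a - mu_b) / math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b),
        "Ratio_of_Classes": mu_a / mu_b,             # natural scale data
        "Diff_of_Classes": mu_a - mu_b,              # log scale data
        "log2_Ratio_of_Classes": math.log2(mu_a / mu_b),
    }


# Toy example: one gene, three samples per phenotype (natural scale values).
metrics = ranking_metrics([8.0, 10.0, 12.0], [4.0, 5.0, 6.0])
```

The ratio-based metrics only make sense for positive, natural scale values; for data that are already log-transformed, Diff_of_Classes is the one to use, as noted above.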

Source: 丁香園 (DXY), user 夏木1220
