Analysis
analysis
This module provides functions for analyzing exclusive k-mers obtained using the maximum entropy principle.
Contents
- mutations_analysis: Perform k-mer analysis and optionally generate a report.
- variants_analysis: Perform variant analysis based on intersection selection.
Todo
- Implement tests.
message = Messages()
module-attribute
Instance of the Messages class used for logging.
mutations_analysis(seq_path, ref_path, seq_kmers_exclusive, kmers_positions, word, step, snps_max, annotation_dataframe, sequence_interval, mode='snps', create_report=False, chunk_size=100)
Perform k-mer analysis and optionally generate a report.
This function performs k-mer analysis on the provided sequence data, exclusive k-mers, and annotations. It calculates exclusive adjacencies, checks differences, and returns the results as a tuple. If 'create_report' is set to True, a report is also generated.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seq_path | str | The path to the file containing the sequences in FASTA format. | required |
| ref_path | str | The path to the reference sequence data file. | required |
| seq_kmers_exclusive | list[str] | A list of exclusive k-mers. | required |
| word | int | The length of each k-mer. | required |
| step | int | The step size for moving the sliding window. | required |
| snps_max | int | The maximum number of SNPs allowed. | required |
| annotation_dataframe | DataFrame | DataFrame containing sequence annotations. | required |
| sequence_interval | Series | Series containing sequence intervals. | required |
| create_report | bool | Whether to generate a report. Default is False. | False |
| chunk_size | int | The chunk size for loading sequences. Default is 100. | 100 |
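
To make the roles of `word` and `step` concrete, the following is a minimal sketch of sliding-window k-mer extraction. The helper `sliding_kmers` is hypothetical and written only for illustration; it is not part of this module, and the library's own windowing may differ in detail.

```python
def sliding_kmers(sequence: str, word: int, step: int) -> list[tuple[int, str]]:
    """Return (start position, k-mer) pairs for a sliding window.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    if word <= 0 or step <= 0:
        raise ValueError("word and step must be positive integers")
    # The last valid window starts at len(sequence) - word.
    return [
        (start, sequence[start:start + word])
        for start in range(0, len(sequence) - word + 1, step)
    ]


# With word=3 and step=2, windows start at positions 0, 2 and 4.
print(sliding_kmers("ACGTACGT", word=3, step=2))
# → [(0, 'ACG'), (2, 'GTA'), (4, 'ACG')]
```

A `step` of 1 yields every overlapping k-mer; larger values skip positions and produce fewer windows.
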
Returns:
| Type | Description |
|---|---|
| tuple[defaultdict[str, list[str]], ndarray] or tuple[None, None] or tuple[defaultdict[str, list[str]], None] | A tuple containing the results of the k-mer analysis and, optionally, a generated report. |
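
Because the annotated return type allows either element of the tuple to be None, callers may want to check both before using them. The sketch below shows one way to do that. The function `describe_result` and the stand-in values are hypothetical, and what each None case means in practice is an assumption not stated in this documentation.

```python
from collections import defaultdict


def describe_result(result: tuple) -> str:
    """Classify a two-element result tuple by which elements are None.

    Hypothetical helper for illustration only.
    """
    first, second = result
    # Compare with `is None`: truth-testing a multi-element NumPy array
    # raises ValueError, so `if second:` would not be safe here.
    if first is None:
        return "no results"
    if second is None:
        return "results only"
    return "results and second element"


# Stand-in values; a plain list takes the place of the ndarray.
example = defaultdict(list)
example["kmer"].append("value")

print(describe_result((None, None)))      # → no results
print(describe_result((example, None)))   # → results only
print(describe_result((example, [0, 1]))) # → results and second element
```
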
Source code in python/gramep/analysis.py
variants_analysis(save_path, intersection_seletion='ALL')
Perform variant analysis based on intersection selection.
This function performs variant analysis according to the specified intersection selection criteria. It reads variant data from the provided file and returns a defaultdict containing the analysis results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| save_path | str | The path to the file containing variant data. | required |
| intersection_seletion | str | Criteria for selecting which variants to intersect. To specify the variants for intersection, provide them separated by '-'. For example: 'variant1-variant2-variant3'. Default is 'ALL'. | 'ALL' |
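
The selection string format described above can be illustrated with a short sketch. The helper `parse_intersection_selection` is hypothetical and only demonstrates the '-' separated convention; how the library itself parses the value is not shown in this documentation.

```python
from typing import Optional


def parse_intersection_selection(selection: str = "ALL") -> Optional[list[str]]:
    """Split a '-' separated selection string into variant names.

    Hypothetical helper for illustration only. Returns None for 'ALL',
    used here to mean "no restriction".
    """
    if selection == "ALL":
        return None
    # Drop empty fragments produced by leading, trailing or doubled '-'.
    return [name for name in selection.split("-") if name]


print(parse_intersection_selection("variant1-variant2-variant3"))
# → ['variant1', 'variant2', 'variant3']
print(parse_intersection_selection())
# → None
```

Since '-' is the separator, a variant name that itself contains '-' would be split into several names under this convention.
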
Returns:
| Type | Description |
|---|---|
| defaultdict[str, list[str]] | A defaultdict containing the analysis results. |
Source code in python/gramep/analysis.py