Mutations

mutations module.

This module contains the functions used to identify mutations in sequences using the maximum entropy principle.

Contents
  • get_mutations: Perform mutation analysis on sequence data.
  • get_only_kmers: Extract only exclusive k-mers from sequences.
  • get_variants_intersection: Get the intersection of variants.
Todo
  • Implement tests.

message = Messages() module-attribute

Set the Message class for logging.

get_mutations(reference_path, sequence_path, save_path, word, step, annotation_path=None, mode='snps', snps_max=1, dictonary='DNA', create_report=False, chunk_size=100)

Perform mutation analysis on sequence data.

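This function performs mutation analysis on the provided sequence data. It loads the k-mers of the sequences and of the reference, saves the exclusive and shared k-mers, identifies variations, and writes the k-mer frequencies. It optionally writes a report, and it plots the result only when there are at most 100 variations.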

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| reference_path | str | The path to the reference sequence data file. | required |
| sequence_path | str | The path to the sequence data file. | required |
| save_path | str | The path to save the generated results. | required |
| word | int | The length of each k-mer. | required |
| step | int | The step size for moving the sliding window. | required |
| annotation_path | str or None | The path to the annotation data file. | None |
| mode | str | The analysis mode forwarded to mutations_analysis. | 'snps' |
| snps_max | int | The maximum number of allowed SNPs within an exclusive k-mer. | 1 |
| dictonary | str | The DNA dictionary for k-mer analysis. | 'DNA' |
| create_report | bool | Whether to create a report. | False |
| chunk_size | int | The chunk size for loading sequences. | 100 |

Returns:

Type Description

Message class: A message confirming the analysis was completed.
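
The `word` and `step` parameters describe a sliding window over each sequence. The following is a minimal sketch of that idea, not the GRAMEP implementation: `extract_kmers` is a hypothetical helper introduced here for illustration, and the example sequence is made up.

```python
from collections import Counter


def extract_kmers(sequence: str, word: int, step: int) -> list[tuple[int, str]]:
    """Return (start position, k-mer) pairs from a sliding window.

    Hypothetical helper for illustration only; it is not part of gramep.
    """
    if word <= 0 or step <= 0:
        raise ValueError("word and step must be positive integers")
    last_start = len(sequence) - word
    # An empty list is returned when the sequence is shorter than one window.
    return [
        (start, sequence[start:start + word])
        for start in range(0, last_start + 1, step)
    ]


# Windows of length 4, advancing 2 bases at a time.
kmers = extract_kmers("ACGTACGTAC", word=4, step=2)

# Count how often each k-mer occurs, ignoring positions.
frequencies = Counter(kmer for _, kmer in kmers)
print(kmers)
print(frequencies)
```

A larger `step` yields fewer, less overlapping windows, so it trades resolution for speed and memory.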

Source code in python/gramep/mutations.py
def get_mutations(
    reference_path: str,
    sequence_path: str,
    save_path: str,
    word: int,
    step: int,
    annotation_path: str | None = None,
    mode: str = 'snps',
    snps_max: int = 1,
    dictonary: str = 'DNA',
    create_report: bool = False,
    chunk_size: int = 100,
):
    """
    Perform mutation analysis on sequence data.

    This function performs mutation analysis on the provided sequence data.\
    It calculates variations, k-mer frequencies, and other relevant\
    information based on the input parameters.

    Args:
        reference_path (str): The path to the reference sequence data file.
        sequence_path (str): The path to the sequence data file.
        save_path (str): The path to save the generated results.
        word (int): The length of each k-mer.
        step (int): The step size for moving the sliding window.
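        annotation_path (str | None, optional): The path to the annotation \
        data file. Default is None.
        mode (str, optional): The analysis mode forwarded to \
        `mutations_analysis`. Default is 'snps'.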
        snps_max (int, optional): The maximum number of allowed SNPs within \
        an exclusive k-mer. Default is 1.
        dictonary (str, optional): The DNA dictionary for k-mer analysis. \
        Default is 'DNA'.
        create_report (bool, optional): Whether to create a report. Default is False.
        chunk_size (int, optional): The chunk size for loading sequences. \
        Default is 100.

    Returns:
        Message class: A message confirming the analysis was completed.
    """

    message.info_start_objetive('get-mutations method')

    # Check if report will be generated
    annotation_df, sequence_interval = None, None
    if create_report:
        if annotation_path is not None:
            annotation_df, sequence_interval = annotation_dataframe(
                annotation_path=annotation_path
            )
        else:
            message.warning_annotation_file()
            annotation_df, sequence_interval = None, None

    seq_kmers = load_sequences(
        file_path=sequence_path,
        word=word,
        step=step,
        dictonary=dictonary,
        reference=False,
        chunk_size=chunk_size,
    )
    ref_kmers = load_sequences(
        file_path=reference_path,
        word=word,
        step=step,
        dictonary=dictonary,
        reference=True,
        chunk_size=chunk_size,
    )

    seq_kmers_exclusive = kmers_difference(seq_kmers, ref_kmers)
    seq_kmers_intersections = kmers_intersections(seq_kmers, ref_kmers)

    save_exclusive_kmers(
        sequence_path=sequence_path,
        seq_kmers_exclusive=seq_kmers_exclusive,
        save_path=save_path,
    )
    save_intersection_kmers(
        sequence_path=sequence_path,
        seq_kmers_intersections=seq_kmers_intersections,
        save_path=save_path,
    )

    del ref_kmers
    # Analyze the exclusive k-mers
    message.info_founded_exclusive_kmers(len(seq_kmers_exclusive))
    message.info_get_kmers()
    message.info_wait()

    diffs_positions, report = mutations_analysis(
        seq_path=sequence_path,
        ref_path=reference_path,
        seq_kmers_exclusive=seq_kmers_exclusive,
        kmers_positions=seq_kmers,
        word=word,
        step=step,
        snps_max=snps_max,
        annotation_dataframe=annotation_df,
        sequence_interval=sequence_interval,
        mode=mode,
        create_report=create_report,
        chunk_size=chunk_size,
    )

    if diffs_positions is None:
        variations = []
        save_diffs_positions(
            sequence_path=sequence_path,
            ref_path=reference_path,
            variations=variations,
            save_path=save_path,
        )
        write_report(
            report=[], sequence_path=sequence_path, save_path=save_path
        )
        message.error_no_exclusive_kmers()
        exit(1)

    freq_kmers, variations = get_freq_kmers(diffs_positions)

    save_diffs_positions(
        sequence_path=sequence_path,
        ref_path=reference_path,
        variations=variations,
        save_path=save_path,
    )

    if create_report:
        write_report(
            report=report, sequence_path=sequence_path, save_path=save_path
        )

    write_frequencies(
        freq_kmers=freq_kmers, sequence_path=sequence_path, save_path=save_path
    )

    if len(variations) > 100:
        message.warning_no_plot()
        return message.info_done()

    plot_graphic(
        variations=variations,
        reference_path=reference_path,
        freq_kmers=freq_kmers,
        sequence_name=sequence_path,
        save_path=save_path,
    )
    return message.info_done()
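
The `snps_max` argument passed to `mutations_analysis` caps the number of SNPs allowed within an exclusive k-mer. How GRAMEP applies this limit internally is not shown on this page. One plausible reading, sketched below under that assumption, is to compare an exclusive k-mer with the reference window at the same position and count the mismatching bases. Both helper names are hypothetical.

```python
def find_mismatches(kmer: str, reference_window: str) -> list[tuple[int, str, str]]:
    """List (offset, reference base, observed base) for every mismatch.

    Hypothetical helper for illustration; not part of gramep.
    """
    if len(kmer) != len(reference_window):
        raise ValueError("k-mer and reference window must have the same length")
    return [
        (offset, ref_base, obs_base)
        for offset, (ref_base, obs_base) in enumerate(zip(reference_window, kmer))
        if ref_base != obs_base
    ]


def within_snp_limit(kmer: str, reference_window: str, snps_max: int = 1) -> bool:
    """Accept the k-mer only if it differs from the reference in at most snps_max bases."""
    return len(find_mismatches(kmer, reference_window)) <= snps_max


reference_window = "ACGTAC"
print(find_mismatches("ACGAAC", reference_window))   # one substitution at offset 3
print(within_snp_limit("ACGAAC", reference_window))  # one mismatch, within the limit
print(within_snp_limit("AGGAAC", reference_window))  # two mismatches, over the limit
```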

get_only_kmers(reference_path, sequence_path, word, step, save_path, dictonary='DNA', chunk_size=100)

Extract only exclusive k-mers from sequences.

This function extracts the exclusive k-mers, that is, the k-mers found in the provided sequences but not in the reference, and saves them to save_path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| reference_path | str | The path to the reference sequence data file. | required |
| sequence_path | str | The path to the sequence data file. | required |
| word | int | The length of each k-mer. | required |
| step | int | The step size for moving the sliding window. | required |
| dictonary | str | The DNA dictionary for k-mer analysis. | 'DNA' |
| save_path | str | The path to save the generated results. | required |
| chunk_size | int | The chunk size for loading sequences. | 100 |

Returns:

Type Description
list[str]

list[str]: A list of exclusive k-mers.
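
Exclusive k-mers can be thought of as a set difference between the k-mers of the sequences and those of the reference, and shared k-mers as the set intersection. The sketch below illustrates this with plain Python sets. It assumes that `kmers_difference` and `kmers_intersections` behave like these set operations, which this page does not confirm. The helper `kmer_set` and the two short sequences are invented for the example.

```python
def kmer_set(sequence: str, word: int, step: int = 1) -> set[str]:
    """Collect the distinct k-mers of a sequence (hypothetical helper)."""
    return {
        sequence[start:start + word]
        for start in range(0, len(sequence) - word + 1, step)
    }


reference = "ACGTAC"
sample = "ACGTTC"  # differs from the reference by a single base

ref_kmers = kmer_set(reference, word=4)
seq_kmers = kmer_set(sample, word=4)

# k-mers seen in the sample but never in the reference
exclusive = sorted(seq_kmers - ref_kmers)
# k-mers present in both
shared = sorted(seq_kmers & ref_kmers)

print(exclusive)
print(shared)
```

A single substitution changes every window that overlaps it, which is why exclusive k-mers are a useful starting point for locating variations.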

Source code in python/gramep/mutations.py
def get_only_kmers(
    reference_path: str,
    sequence_path: str,
    word: int,
    step: int,
    save_path: str,
    dictonary: str = 'DNA',
    chunk_size: int = 100,
) -> list[str]:
    """
    Extract only exclusive k-mers from sequences.

    This function extracts only exclusive k-mers from the provided sequence data.\
    It calculates exclusive k-mers based on the input parameters.

    Args:
        reference_path (str): The path to the reference sequence data file.
        sequence_path (str): The path to the sequence data file.
        word (int): The length of each k-mer.
        step (int): The step size for moving the sliding window.
        dictonary (str, optional): The DNA dictionary for k-mer analysis. \
        Default is 'DNA'.
        save_path (str): The path to save the generated results.
        chunk_size (int, optional): The chunk size for loading sequences. \
        Default is 100.

    Returns:
        list[str]: A list of exclusive k-mers.
    """
    message.info_start_objetive('get-only-kmers method')
    seq_kmers = load_sequences(
        file_path=sequence_path,
        word=word,
        step=step,
        dictonary=dictonary,
        reference=False,
        chunk_size=chunk_size,
    )
    ref_kmers = load_sequences(
        file_path=reference_path,
        word=word,
        step=step,
        dictonary=dictonary,
        reference=True,
        chunk_size=chunk_size,
    )

    seq_kmers_exclusive = kmers_difference(seq_kmers, ref_kmers)
    message.info_founded_exclusive_kmers(len(seq_kmers_exclusive))

    save_exclusive_kmers(
        sequence_path=sequence_path,
        seq_kmers_exclusive=seq_kmers_exclusive,
        save_path=save_path,
    )
    message.info_done()

    return seq_kmers_exclusive

get_variants_intersection(save_path, intersection_seletion='ALL')

Get variants intersection based on selection criteria.

This function computes the intersection of variants according to the specified selection criteria. It reads the data stored under the provided save path and performs the intersection for the chosen selection option.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_path | str | The path to the directory containing data to process. | required |
| intersection_seletion | str | The selection criteria for variants intersection. Options: 'ALL' (default). | 'ALL' |

Returns:

Type Description
defaultdict[str, list[str]]

defaultdict[str, list[str]]: A dictionary mapping sequence IDs to lists of variants based on the specified selection criteria.
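
The behaviour of `variants_analysis` is not shown on this page, so the following is only a rough sketch of what intersecting all variants could look like. It assumes that each variant has a set of mutation labels and that 'ALL' means the mutations common to every variant. The function name, the key format, and the mutation labels are all hypothetical.

```python
from collections import defaultdict


def intersect_all(
    mutations_by_variant: dict[str, set[str]],
) -> defaultdict[str, list[str]]:
    """Map a joined variant key to the mutations shared by every variant.

    Hypothetical illustration; not the gramep implementation.
    """
    result: defaultdict[str, list[str]] = defaultdict(list)
    if not mutations_by_variant:
        return result
    names = sorted(mutations_by_variant)
    # Intersect the mutation sets of all variants at once.
    shared = set.intersection(*(mutations_by_variant[name] for name in names))
    result["-".join(names)] = sorted(shared)
    return result


# Made-up mutation labels in "position:ref>alt" form.
example = {
    "variant_a": {"10:A>G", "25:C>T"},
    "variant_b": {"10:A>G", "40:G>A"},
    "variant_c": {"10:A>G", "25:C>T"},
}
print(dict(intersect_all(example)))
```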

Todo
  • Rewrite in Rust.
Source code in python/gramep/mutations.py
def get_variants_intersection(
    save_path: str, intersection_seletion: str = 'ALL'
) -> defaultdict[str, list[str]]:
    """
    Get variants intersection based on selection criteria.

    This function retrieves variants intersection data based on the specified \
    selection criteria.
    The function reads data from the provided save path and performs intersection \
    calculations
    according to the chosen selection option.

    Args:
        save_path (str): The path to the directory containing data to process.
        intersection_seletion (str, optional): The selection criteria for variants \
        intersection. Options: 'ALL' (default).

    Returns:
        defaultdict[str, list[str]]: A dictionary mapping sequence IDs to lists of \
        variants based on the specified selection criteria.

    Todo:
        * Rewrite in Rust.
    """

    message.info_start_objetive('get_variants_intersection method')
    variants_intersections = variants_analysis(
        save_path, intersection_seletion
    )
    message.info_done()
    return variants_intersections