Grid search

grid_search module.

This module contains the functions used to perform a grid search to suggest a value for word and step size.

Contents
  • grid_search: Perform grid search to suggest a value for word and step size.

message = Messages() module-attribute

Set the Message class for logging.

Perform grid search to suggest a value for word and step size.

Parameters:

Name            Type  Description                                   Default
reference_path  str   Path to the reference sequence.               required
sequence_path   str   Path to the sequence.                         required
min_word        int   Minimum word size to try (inclusive).         required
max_word        int   Maximum word size to try (inclusive).         required
min_step        int   Minimum step size to try (inclusive).         required
max_step        int   Maximum step size to try (inclusive).         required
dictonary       str   DNA dictionary.                               'DNA'
chunk_size      int   The chunk size for loading sequences.         100

Returns:

Type Description

Message class
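
To make the two tuned parameters concrete, the sketch below assumes that word size is the length of each k-mer and step size is the offset between consecutive windows. `extract_kmers` is a hypothetical helper written for illustration only; it is not gramep's `load_sequences` and may differ from it in detail.

```python
def extract_kmers(sequence: str, word: int, step: int) -> list[str]:
    """Hypothetical helper: slide a window of length `word` in jumps of `step`."""
    if word < 1 or step < 1:
        raise ValueError("word and step must be positive")
    # Last index at which a full window of length `word` still fits.
    last_start = len(sequence) - word
    return [sequence[i:i + word] for i in range(0, last_start + 1, step)]


print(extract_kmers("ACGTACGT", word=3, step=2))  # ['ACG', 'GTA', 'ACG']
```

Under this assumption a larger step yields fewer k-mers from the same sequence, and a sequence shorter than the word size yields none.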

Source code in python/gramep/grid_search.py
def grid_search(
    reference_path: str,
    sequence_path: str,
    min_word: int,
    max_word: int,
    min_step: int,
    max_step: int,
    dictonary: str = 'DNA',
    chunk_size: int = 100,
):
    """
    Perform grid search to suggest a value for word and step size.

    Args:
        reference_path (str): Path to reference sequence.
        sequence_path (str): Path to sequence.
        min_word (int): Min word size.
        max_word (int): Max word size.
        min_step (int): Min step size.
        max_step (int): Max step size.
        dictonary (str, optional): DNA dictionary. Defaults to 'DNA'.
        chunk_size (int, optional): The chunk size for loading sequences. \
        Default is 100.

    Returns:
        Message class
    """

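    # Best (word, step) pair found so far and its score. All three stay 0
    # if no combination produces a result greater than zero.
    selected_word = 0
    selected_step = 0
    selected_result = 0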
    for word in range(min_word, max_word + 1):
        for step in range(min_step, max_step + 1):
            message.info_grid_running(word, step)
            seq_kmers = load_sequences(
                file_path=sequence_path,
                word=word,
                step=step,
                dictonary=dictonary,
                reference=False,
                chunk_size=chunk_size,
            )
            ref_kmers = load_sequences(
                file_path=reference_path,
                word=word,
                step=step,
                dictonary=dictonary,
                reference=True,
                chunk_size=chunk_size,
            )

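            # Score this combination: count the k-mers returned by
            # kmers_difference after fuzzy de-duplication
            # (threshold 70, fuzz.ratio scorer).
            result = len(
                list(
                    dedupe(
                        kmers_difference(seq_kmers, ref_kmers),
                        threshold=70,
                        scorer=fuzz.ratio,
                    )
                )
            )

            # Strict '>' keeps the first combination that reaches the best
            # score, so ties go to the smaller word, then the smaller step.
            if result > selected_result:
                selected_word = word
                selected_step = step
                selected_result = result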

    message.info_selected_parameters(selected_word, selected_step)
    message.info_done()
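
The selection logic above can be tried without gramep installed using the self-contained sketch below. It is an approximation under stated assumptions: `kmers` is a hypothetical stand-in for `load_sequences`, a plain set difference stands in for `kmers_difference`, and the fuzzy de-duplication step is omitted entirely. Unlike the listing, `toy_grid_search` returns its result instead of logging it.

```python
from itertools import product


def kmers(sequence: str, word: int, step: int) -> set[str]:
    """Hypothetical stand-in for load_sequences: windowed k-mers as a set."""
    return {
        sequence[i:i + word]
        for i in range(0, len(sequence) - word + 1, step)
    }


def toy_grid_search(
    reference: str,
    sequence: str,
    min_word: int,
    max_word: int,
    min_step: int,
    max_step: int,
) -> tuple[int, int, int]:
    """Return (word, step, score) for the pair with the most sequence-only k-mers."""
    best = (0, 0, 0)
    # product() visits pairs in the same order as the nested loops:
    # word in the outer position, step in the inner one.
    grid = product(range(min_word, max_word + 1), range(min_step, max_step + 1))
    for word, step in grid:
        # Assumed score: k-mers present in the sequence but absent from the reference.
        score = len(kmers(sequence, word, step) - kmers(reference, word, step))
        # Strict '>' keeps the first pair that reaches the best score.
        if score > best[2]:
            best = (word, step, score)
    return best


print(toy_grid_search("AAAAAAAA", "AAAACAAA", 2, 3, 1, 2))  # (3, 1, 3)
```

When the sequence and reference produce identical k-mer sets for every combination, the sketch returns `(0, 0, 0)`, mirroring how the listing leaves `selected_word` and `selected_step` at 0 in that case.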