bwa
Methods and Classes to run and interact with BWA¶
The BwaAlnInteractive class is used to map
a list of queries to the reference genome. A single query may be mapped using
map_one(), while multiple queries may
be mapped using map_all(). The latter
is more efficient than mapping queries one-by-one.
The queries are provided to these methods using the Query
class, which contains the unique identifier for the query and the bases to query. These methods
return a BwaResult, which represents zero or more hits
(or alignments) found by BWA for the given query. Each hit is represented by a
BwaHit object. In some cases, BWA will write fewer (or no)
hits in the "XA" tag than the total number hits reported in the "HN". This occurs when BWA finds more
hits than max_hits (see bwt aln -X).
Use of this module requires installation of a custom version of BWA named bwa-aln-interactive.
See:
- https://github.com/fulcrumgenomics/bwa-aln-interactive
- https://bioconda.github.io/recipes/bwa-aln-interactive/README.html
Example¶
>>> from pathlib import Path
>>> ref_fasta = Path("./tests/offtarget/data/miniref.fa")
>>> query = Query(bases="TCTACTAAAAATACAAAAAATTAGCTGGGCATGATGGCATGCACCTGTAATCCCGCTACT", id="NA")
>>> bwa = BwaAlnInteractive(ref=ref_fasta, max_hits=1)
>>> result = bwa.map_one(query=query.bases, id=query.id)
>>> result.hit_count
1
>>> len(result.hits)
1
>>> result.hits[0]
BwaHit(refname='chr1', start=61, negative=False, cigar=Cigar(elements=(CigarElement(length=60, operator=<CigarOp.M: (0, 'M', True, True)>),)), edits=0)
>>> query = Query(bases="AAAAAA", id="NA")
>>> bwa.map_all(queries=[query])
[BwaResult(query=Query(id='NA', bases='AAAAAA'), hit_count=3968, hits=[])]
>>> bwa.close()
True
Attributes¶
BWA_AUX_EXTENSIONS
module-attribute
¶
The file extensions for BWA index files
BWA_EXECUTABLE_NAME
module-attribute
¶
The executable name for the interactive build of bwa aln.
MAX_GAP_EXTENSIONS
module-attribute
¶
The default maximum number of gap extensions allowed in the full query sequence
MAX_GAP_OPENS
module-attribute
¶
The default maximum number of gap opens allowed in the full query sequence
MAX_MISMATCHES
module-attribute
¶
The default maximum number of mismatches allowed in the full query sequence
MAX_MISMATCHES_IN_SEED
module-attribute
¶
The default maximum number of mismatches allowed in the seed region
Classes¶
BwaAlnInteractive ¶
Bases:
Wrapper around a novel mode of 'bwa aln' that allows for "interactive" use of bwa to keep the process running and be able to send it chunks of reads periodically and get alignments back without waiting for a full batch of reads to be sent.
See:
- https://github.com/fulcrumgenomics/bwa-aln-interactive
- https://bioconda.github.io/recipes/bwa-aln-interactive/README.html
Attributes:
| Name | Type | Description |
|---|---|---|
|
|
the maximum number of hits to report - if more than this number of seed hits are found, report only the count and not each hit. |
|
|
reverse complement each query sequence before alignment. |
|
|
if True include hits to references with names ending in _alt, otherwise do not include them. |
|
the SAM alignment header. |
Source code in prymer/offtarget/bwa.py
207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 | |
Functions¶
__init__ ¶
__init__(
ref: Path ,
max_hits: int ,
executable: str | Path = BWA_EXECUTABLE_NAME,
max_mismatches: int = 3,
max_mismatches_in_seed: int = 3,
max_gap_opens: int = 0,
max_gap_extensions: int = -1,
min_indel_to_end_distance: int = 3,
seed_length: int = 20,
reverse_complement: bool = False,
include_alt_hits: bool = False,
threads: Optional [int ] = None,
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ref |
|
the path to the reference FASTA, which must be indexed with bwa. |
required |
max_hits |
|
the maximum number of hits to report - if more than this number of seed hits are found, report only the count and not each hit. |
required |
executable |
|
string or Path representation of the |
|
max_mismatches |
|
the maximum number of mismatches allowed in the full query sequence |
3
|
max_mismatches_in_seed |
|
the maximum number of mismatches allowed in the seed region |
3
|
max_gap_opens |
|
the maximum number of gap opens allowed in the full query sequence |
0
|
max_gap_extensions |
|
the maximum number of gap extensions allowed in the full query sequence |
-1
|
min_indel_to_end_distance |
|
do not place an indel within this many bp of the ends of the query sequence |
3
|
seed_length |
|
the length of the seed region |
20
|
reverse_complement |
|
reverse complement each query sequence before alignment |
False
|
include_alt_hits |
|
if true include hits to references with names ending in _alt, otherwise do not include them. |
False
|
threads |
|
the number of threads to use. If |
None
|
Source code in prymer/offtarget/bwa.py
226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 | |
__signal_bwa ¶
Signals BWA to process the queries.
Source code in prymer/offtarget/bwa.py
close ¶
Gracefully terminates the underlying subprocess if it is still running.
Returns:
| Name | Type | Description |
|---|---|---|
True |
|
if the subprocess was terminated successfully |
False |
|
if the subprocess failed to terminate or was not already running |
Source code in prymer/offtarget/bwa.py
map_all ¶
Maps multiple queries and returns the results. This is more efficient than using
map_one on each query one-by-one as it batches reads to BWA.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
queries |
|
the queries to map with BWA |
required |
Returns:
| Type | Description |
|---|---|
|
one |
Source code in prymer/offtarget/bwa.py
map_one ¶
Maps a single query to the genome and returns the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query |
|
the query to map with BWA |
required |
Returns:
| Type | Description |
|---|---|
|
a |
Source code in prymer/offtarget/bwa.py
to_hits ¶
Extracts the hits from the given alignment. Beyond the current alignment additional alignments are parsed from the XA SAM tag.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rec |
|
the given alignment |
required |
Source code in prymer/offtarget/bwa.py
BwaHit
dataclass
¶
Represents a single hit or alignment of a sequence to a location in the genome.
Attributes:
| Name | Type | Description |
|---|---|---|
|
|
the reference name of the hit |
|
|
the start position of the hit (1-based inclusive) |
|
|
whether the hit is on the negative strand |
|
|
the cigar string returned by BWA |
|
|
the number of edits between the read and the reference |
Source code in prymer/offtarget/bwa.py
Attributes¶
Functions¶
__str__ ¶
Returns the string representation in bwa's XA tag format.
Source code in prymer/offtarget/bwa.py
build
staticmethod
¶
build(
refname: str ,
start: int ,
negative: bool ,
cigar: str ,
edits: int ,
revcomp: bool = False,
) -> BwaHit
Generates a hit object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
refname |
|
the reference name of the hit |
required |
start |
|
the start position of the hit (1-based inclusive) |
required |
negative |
|
whether the hit is on the negative strand |
required |
cigar |
|
the cigar string returned by BWA |
required |
edits |
|
the number of edits between the read and the reference |
required |
revcomp |
|
whether the reverse-complement of the query sequence was fed to BWA, in which case various fields should be negated/reversed in the Hit |
False
|
Returns:
| Type | Description |
|---|---|
|
A Hit that represents the mapping of the original query sequence that was supplied |
Source code in prymer/offtarget/bwa.py
BwaResult
dataclass
¶
Represents zero or more hits or alignments found by BWA for the given query.
The number of hits found may be more than the number of hits listed in the hits attribute.
Attributes:
| Name | Type | Description |
|---|---|---|
|
|
the query (as given, no RC'ing) |
|
|
the total number of hits found by bwa (this may be more than len(hits)) |
|
|
the subset of hits that were reported by bwa |
Source code in prymer/offtarget/bwa.py
Query
dataclass
¶
Represents a single query sequence for mapping.
Attributes:
| Name | Type | Description |
|---|---|---|
|
|
the identifier for the query (e.g. read name) |
|
|
the query bases |
Source code in prymer/offtarget/bwa.py
Functions¶
to_fastq ¶
Returns the multi-line FASTQ string representation of this query