Documentation

isweep is a Python package and a series of automated workflows to study natural selection with identity-by-descent (IBD) segments. The Python package simulates IBD segments around a locus and estimates selection coefficients. The automated workflows are:

Selection scan: detects selected loci with rigorous multiple testing thresholds
Modeling hard sweeps: estimates location, allele frequency, and selection coefficient of sweep
Case-control scan: detects loci where IBD rates differ between binary cases and controls
Phasing and ancestry: supports haplotype phasing, local ancestry, and kinship inference

Each automated workflow has a dedicated page under Usage. The general way to run these methods is:

Navigate to the appropriate workflow directory
Modify parameters in YAML configuration files
Send the jobs to a cluster with nohup snakemake [...] &

I made a Zenodo repository with some simulated data to test the workflows. See Testing workflows.

The source code is on GitHub at https://github.com/sdtemple/isweep

Installation

git clone https://github.com/sdtemple/isweep.git
mamba env create -f isweep-environment.yml
bash get-software.sh
pip install isweep

Note

This project is in a stable state. I am committed to providing quick support via GitHub Issues at least into 2026.

Data Requirements

The main requirements are a tab-separated genetic recombination map and enough samples to detect at least one IBD segment at every position.

Phased haplotypes (VCF files)
Samples with a similar ancestry
No close relatives
Recombining autosomes
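
As a quick sanity check before running a workflow, a recombination map can be validated with a few lines of Python. This is a minimal sketch, not part of isweep. It assumes a headerless, PLINK-style layout with four tab-separated columns (chromosome, marker ID, position in cM, position in bp) and a single chromosome per file. The function name check_genetic_map and the example rows are hypothetical, so adjust the column indices if your map is laid out differently.

```python
import csv
import io


def check_genetic_map(handle):
    """Return the number of rows; raise ValueError on a malformed map.

    Assumed layout (not confirmed by the isweep docs): four tab-separated
    columns per row -- chromosome, marker id, cM position, bp position.
    """
    n_rows = 0
    prev_cm = prev_bp = None
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"expected 4 tab-separated fields, got {len(row)}: {row}")
        cm, bp = float(row[2]), int(row[3])
        # bp must strictly increase; cM must never decrease
        if prev_bp is not None and (bp <= prev_bp or cm < prev_cm):
            raise ValueError(f"positions are not increasing at bp {bp}")
        prev_cm, prev_bp = cm, bp
        n_rows += 1
    return n_rows


# Made-up rows for illustration only
example = "22\t.\t0.0\t16050000\n22\t.\t0.5\t16500000\n22\t.\t1.2\t17000000\n"
print(check_genetic_map(io.StringIO(example)))
```

To check a real file, pass an open file handle instead of the io.StringIO object.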

If you don’t already have your data phased or cohort selected, we support Phasing and ancestry with another workflow.

In humans, more than 1000 samples is enough, but more than 3000 samples is recommended.

At some point, there are no gains in statistical power with more samples. I do not recommend analyzing more than 100k samples in a biobank. See our Temple and Browning (2025+) publication.

The tree of life is messy. Email or make a GitHub Issue for analysis advice about the nuances of your sample population.

Vignette

Outside of running the workflows, the main functions are:

isweep.coalescent.simulate_ibd_isweep: generate long IBD segments around a locus (w/ selection)
isweep.coalescent.chi2_isweep: objective function to pass to a scalar optimizer (such as scipy.optimize.minimize_scalar) to estimate the selection coefficient
isweep.utilities.read_Ne: load in recent effective population sizes

For instance:

import numpy as np
from scipy.optimize import minimize_scalar
from isweep import *

# parameter settings
s = 0.03  # true selection coefficient
p = 0.5  # allele frequency
Ne = read_Ne('constant-10k.ne')  # demographic history
model = 'm'
long_ibd = 3.0  # length threshold for long IBD segments
ab = [long_ibd, np.inf]
nsamples = 200

# calculate denominator
ploidy = 2
msamples = ploidy * nsamples
N = msamples * (msamples - 1) / 2 - nsamples

# simulate data
out = simulate_ibd_isweep(
    nsamples,
    s,
    p,
    Ne,
    long_ibd,
    long_ibd,
    one_step_model=model,
    ploidy=ploidy,
)
ibd = out[0][0]

# estimate the selection coefficient by minimizing chi2_isweep over [0, 0.5]
se = minimize_scalar(
    chi2_isweep,
    args=(p, Ne, N, (ibd,), ab, model, 0, -0.01, ploidy),
    bounds=(0, 0.5),
    method='bounded',
).x

print('true selection coefficient')
print(s)
print('estimated selection coefficient')
print(se)
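
The denominator N in the vignette is msamples*(msamples-1)/2 - nsamples. One reading, which is an interpretation and not something stated in the isweep docs, is that it counts all pairs of haplotypes and then removes the one pair formed by each diploid individual's own two haplotypes. The sketch below reproduces that arithmetic for the diploid case only. The helper name count_pairs_diploid is hypothetical.

```python
def count_pairs_diploid(nsamples):
    """Mirror the vignette's denominator for ploidy = 2.

    Interpretation (an assumption): all haplotype pairs minus the
    within-individual pair contributed by each diploid sample.
    """
    msamples = 2 * nsamples  # number of haplotypes
    all_pairs = msamples * (msamples - 1) // 2  # unordered haplotype pairs
    within_individual = nsamples  # one pair per diploid individual
    return all_pairs - within_individual


# With the vignette's nsamples = 200: 400 * 399 / 2 - 200
print(count_pairs_diploid(200))
```

With nsamples = 200 there are 400 haplotypes, 79800 unordered pairs, and 79600 pairs after the subtraction.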

There are also some IPython notebooks as examples in vignettes/.
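
The vignette above loads a demographic history from constant-10k.ne. If you do not have such a file, the sketch below writes a constant-size history. It assumes an IBDNe-style layout: one header line, then tab-separated rows with the generation first and the effective population size second. That layout, the header text, the 500-generation horizon, and both helper names are assumptions for illustration. Check what read_Ne actually expects before relying on the output.

```python
import os
import tempfile


def write_constant_ne(path, size=10_000, max_generation=500):
    """Write a header line, then one 'generation<TAB>size' row per generation."""
    with open(path, "w") as handle:
        handle.write("GEN\tNE\n")  # assumed header; column names are hypothetical
        for generation in range(max_generation + 1):
            handle.write(f"{generation}\t{size}\n")


def read_ne_file(path):
    """Hypothetical stand-in reader: returns {generation: size}."""
    sizes = {}
    with open(path) as handle:
        next(handle)  # skip the header line
        for line in handle:
            generation, size = line.rstrip("\n").split("\t")[:2]
            sizes[int(generation)] = int(float(size))
    return sizes


# Round-trip the file in a temporary directory to confirm the layout
with tempfile.TemporaryDirectory() as tmpdir:
    ne_path = os.path.join(tmpdir, "constant-10k.ne")
    write_constant_ne(ne_path)
    history = read_ne_file(ne_path)

print(len(history), history[0], history[500])
```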

Citations

This software and its methods are the basis of six publications.

The workflows also rely on Beagle, ibd-ends, hap-ibd, and flare, which should be cited as well.

workflow/scan-selection: hap-ibd and ibd-ends
workflow/scan-case-control: hap-ibd and ibd-ends
workflow/model-selection: hap-ibd
workflow/phasing-ancestry: Beagle, flare, and hap-ibd

Contact

Seth Temple (sethtem@umich.edu) or GitHub Issues