Command Line Interface
usage: metaloci [-h] {sniffer,prep,bts,layout,lm,gene_selector,figure,scan,compressor,test} ...
Sub-commands
sniffer
Takes a .gtf file or a .bed file and parses it into a region list, with a specific resolution and extension. Each gene is a point of interest and its region is centered on it. Human and mouse GTF files can be downloaded from the GENCODE website; for other species, please refer to the UCSC website. BED files can be used to create a custom region list with the following columns: chromosome, start, end, gene_symbol, gene_id. Strand information can be added to the BED file as a 6th column; if it is absent, the script assumes the gene is on the positive strand.
metaloci sniffer -w PATH -s PATH -g PATH -r INT -wi INT [-h] [-n STR] [--ucsc] [--strand]
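The custom BED layout described above can be illustrated with a small parser. This is only a sketch, not METALoci's own code: the `BedGene` and `parse_bed_line` names are hypothetical, and it assumes tab-separated columns as in standard BED files.

```python
from typing import NamedTuple


class BedGene(NamedTuple):
    """One row of a custom BED region list (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    symbol: str
    gene_id: str
    strand: str


def parse_bed_line(line: str) -> BedGene:
    """Parse a tab-separated BED line with 5 columns, or 6 if a strand is given."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, symbol, gene_id = fields[:5]
    # A missing 6th column is treated as the positive strand, as described above.
    strand = fields[5] if len(fields) > 5 else "+"
    return BedGene(chrom, int(start), int(end), symbol, gene_id, strand)
```

For example, `parse_bed_line("chr1\t100\t200\tGENE1\tID1")` yields a record whose `strand` is `"+"`.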
Input arguments
- -w, --work-dir
Path to working directory.
- -s, --chrom-sizes
Full path to a file that contains the chromosome names in the first column and the end coordinate of each chromosome in the second column. This file can be found on the UCSC Genome Browser website for your species.
- -g, --gene-file
Path to the gene annotation file (GTF or BED).
- -r, --resolution
Resolution at which to split the genome, in bp. This should be the resolution you have in your Hi-C data, but it will not be checked.
- -wi, --window
Size of the regions in bp. The region will be centered around the point of interest. Regions close to the telomeres will be smaller, but still centered around the point of interest.
Optional arguments
- -n, --name
Name of the file.
- --ucsc
Flag if the gene file you are using is in UCSC format.
Default:
False
- --strand
The file has strand information in the last column. ONLY FOR BED FILES.
Default:
False
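The -wi description says that regions near the telomeres become smaller but stay centered on the point of interest. One way to satisfy that is to shrink both sides symmetrically, as sketched here. The `centered_region` function is hypothetical and this is an assumption about the behaviour, not the actual sniffer implementation; alignment of coordinates to the resolution is left out.

```python
from typing import Tuple


def centered_region(poi: int, window: int, chrom_size: int) -> Tuple[int, int]:
    """Return (start, end) of a region of at most `window` bp centered on `poi`."""
    half = window // 2
    # Assumption: if a chromosome end is closer than half a window,
    # shrink both sides equally so the point of interest stays in the middle.
    half = min(half, poi, chrom_size - poi)
    return poi - half, poi + half
```

With a 400 kb window on a 10 Mb chromosome, a point of interest at 1 Mb gets the full window, while one at 100 kb gets a 200 kb region starting at position 0.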
prep
Processes signal .bed files or .bedGraph files. This will bin them at a given resolution, merge all signals in the same dataframe and subset it by chromosomes.
metaloci prep -w PATH -c PATH -d [PATH ...] -r INT -s PATH [-h] [-t STR]
Input arguments
- -w, --work-dir
Path to working directory.
- -c, --hic
Path to the cool/mcool/hic file.
- -d, --data
Path to the file(s) to process. Each file must have a header row in which the first three columns are named chrom, start and end; the remaining columns are named after the signals they contain. For single-signal files, if the header is omitted, the file name is used as the signal name. Chromosome names must match those in the Hi-C file and in the chromosome sizes file.
- -r, --resolution
Resolution of the bins, to bin the signal (in bp). Hi-C file must contain this resolution.
- -s, --coords
Full path to a file that contains the chromosome names in the first column and the end coordinate of each chromosome in the second column. This file can be found on the UCSC Genome Browser website for your species.
Optional arguments
- -t, --summarize_type
Possible choices: median, mean, min, max, count
Type of summarization to use when merging signal values that fall in the same bin.
Default:
'median'
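To make the binning and the -t summarization choices concrete, here is a minimal sketch using only the standard library. It is not the prep implementation: the `bin_signal` function is hypothetical, and assigning each interval to a bin by its start coordinate is an assumption made to keep the example short.

```python
import statistics
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# The five documented summarization types mapped to plain Python callables.
SUMMARIZERS: Dict[str, Callable[[List[float]], float]] = {
    "median": statistics.median,
    "mean": statistics.mean,
    "min": min,
    "max": max,
    "count": len,
}


def bin_signal(
    intervals: Iterable[Tuple[str, int, int, float]],
    resolution: int,
    summarize_type: str = "median",
) -> Dict[Tuple[str, int], float]:
    """Group (chrom, start, end, value) rows into bins and summarize each bin."""
    summarize = SUMMARIZERS[summarize_type]
    values_per_bin: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for chrom, start, _end, value in intervals:
        # Assumption: an interval belongs to the bin that contains its start.
        bin_start = (start // resolution) * resolution
        values_per_bin[(chrom, bin_start)].append(value)
    return {key: summarize(vals) for key, vals in sorted(values_per_bin.items())}
```

Two intervals with values 1.0 and 3.0 that start in the same bin would be reported as 2.0 with 'median' and as 2 with 'count'.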
bts
Finds the best combination of persistence length and cut-off for your Hi-C, given a Hi-C resolution.
First check the maximum resolution your Hi-C allows, and supply it to this script. The script then determines, by computing layouts on a sample of regions, the best combination of persistence length and cut-off for your Hi-C. You can then use these parameters to run ‘metaloci layout’ on your regions of interest.
This can take from minutes to a few hours, depending on the resolution, the size of the regions and the number of CPUs available on your machine. Once you have run it for a specific Hi-C and a specific resolution, the parameters remain valid for other runs with other signals.
metaloci bts -w PATH -c PATH -r INT [INT ...] [-h] [-g PATH] [-s INT] [-n int] [-o FLOAT [FLOAT ...]]
[-l FLOAT [FLOAT ...]] [-t--threads INT]
Input arguments
- -w, --work-dir
Path to working directory.
- -c, --hic
Complete path to the cool/mcool/hic file.
- -r, --resolution
List of Hi-C resolutions to be tested (in bp).
Optional arguments
- -g, --region
Path to the METALoci region file. This file contains the coords in chrN:start-end_midpoint format, the ‘symbol’ and the ‘id’, one region per line, tab separated. It can be generated from a GTF file using ‘metaloci sniffer’.
- -s, --seed
Random seed for region sampling. (default: 1)
Default:
1
- -n, --sample_num
Number of regions to sample from .txt file. (default: 100)
Default:
100
- -o, --cutoffs
Fraction of top interactions to use from the Hi-C. More than one value can be given, space separated.
Default:
[0.15, 0.175, 0.2, 0.225, 0.25]
- -l, --pls
Persistence length; usual values are between 9 and 12, although this can vary a lot depending on the absolute values of your Hi-C matrix.
- -t--threads
Number of threads to use in multiprocessing. (default: 0)
Default:
0
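The search space that bts explores can be pictured as a seeded sample of regions combined with every pairing of persistence length and cut-off. The sketch below shows only that bookkeeping, under stated assumptions: `sample_regions` and `parameter_grid` are hypothetical names, and how METALoci scores each combination is not covered here.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def sample_regions(regions: Sequence[str], sample_num: int = 100, seed: int = 1) -> List[str]:
    """Draw a reproducible sample of regions (defaults mirror -n and -s)."""
    rng = random.Random(seed)  # a fixed seed gives the same sample on every run
    return rng.sample(list(regions), min(sample_num, len(regions)))


def parameter_grid(cutoffs: Sequence[float], pls: Sequence[float]) -> List[Tuple[float, float]]:
    """List every (persistence length, cut-off) pair to be tested."""
    return list(itertools.product(pls, cutoffs))


# The cut-offs are the documented defaults; the persistence lengths are
# illustrative values taken from the "usual" 9 to 12 range.
grid = parameter_grid([0.15, 0.175, 0.2, 0.225, 0.25], [9, 10, 11, 12])
```

With these inputs the grid holds 20 combinations, each of which would be evaluated on the sampled regions, which is why the run time grows quickly with the number of values supplied.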
layout
Creates a Kamada-Kawai layout from a Hi-C for a given region.
metaloci layout -w PATH -c PATH -r INT [-g PATH] [-h] [-o [CUTOFF ...]] [-a] [-l FLOAT] [-i] [-p] [-rp] [-m] [-t INT]
[-f]
Input arguments
- -w, --work-dir
Path to working directory.
- -c, --hic
Path to the cool/mcool/hic file.
- -r, --resolution
Resolution of the Hi-C files to be used (in bp).
- -g, --region
Region to apply LMI in format chrN:start-end_poi or file containing the regions of interest. If a file is provided, it must contain as a header ‘coords’, ‘symbol’ and ‘id’, and one region per line. The METALoci region file can be generated with ‘metaloci sniffer’. ‘poi’ is the point of interest in the region (its bin number).
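The chrN:start-end_poi string accepted by -g can be split into its parts as sketched below. The `Region` type and `parse_region` function are hypothetical helpers written for illustration, not part of the METALoci API.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    poi: int  # bin number of the point of interest


# chrom, then ':', start, '-', end, '_', poi
_REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)_(?P<poi>\d+)$")


def parse_region(text: str) -> Region:
    """Parse a region string such as 'chr1:1000000-5000000_200'."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in chrN:start-end_poi format: {text!r}")
    return Region(
        match["chrom"], int(match["start"]), int(match["end"]), int(match["poi"])
    )
```

A string that lacks the trailing `_poi` part raises `ValueError` instead of being silently accepted.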
Optional arguments
- -o, --cutoff
Fraction of top interactions to keep as restraints for the layout. If more than one is selected, space separated, all of them will be computed and plotted but not saved to an object. (default: 0.2)
- -a, --absolute
Treat the cut-off as an absolute value instead of a fraction of top interactions to keep. If this flag is set, the cut-off value must be provided in the -o flag. The cut-off value must be a positive number.
Default:
False
- -l, --pl
Set a persistence length for the Kamada-Kawai layout. This represents the distance between two consecutive points in the layout. The lower the value, the more distance between consecutive points.
- -i, --optimise
Automatically optimise the cut-off and/or the persistence length for the Kamada-Kawai layout.
Default:
False
- -p, --plot
Plot the matrix, density plot and Kamada-Kawai plots.
Default:
False
- -rp, --remove-poi
Remove the point of interest from the Kamada-Kawai layout.
Default:
False
- -m, --mp
Flag to set use of multiprocessing.
Default:
False
- -t, --threads
Number of threads to use in multiprocessing. (default: 0)
Default:
0
- -f, --force
Force METALoci to rewrite existing data. This will remove the object and re-create it.
Default:
False
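The difference between the default fractional cut-off (-o) and the absolute mode (-a) can be sketched on a flat list of interaction values. This is an illustration under assumptions, not the layout code: `select_restraints` is a hypothetical function, and both the rounding of the kept count and the use of "greater than or equal" in absolute mode are choices made for the example.

```python
import math
from typing import List


def select_restraints(values: List[float], cutoff: float, absolute: bool = False) -> List[float]:
    """Keep the interactions that pass the cut-off."""
    if cutoff <= 0:
        raise ValueError("the cut-off must be a positive number")
    if absolute:
        # Absolute mode: the cut-off is compared directly with each value.
        return [v for v in values if v >= cutoff]
    if cutoff > 1:
        raise ValueError("a fractional cut-off must be at most 1")
    # Fraction mode: keep the top `cutoff` share of values, highest first.
    n_keep = math.ceil(len(values) * cutoff)
    return sorted(values, reverse=True)[:n_keep]
```

With ten values, a cut-off of 0.2 keeps the two highest ones, whereas an absolute cut-off of 8 keeps every value of 8 or more regardless of how many there are.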
lm
Adds signal data to a Kamada-Kawai layout and calculates Local Moran’s I for every bin in the layout. Outputs a .mlo file with the LMI data for each signal. It can also output a .csv with info for each signal and bed files with the metalocis found, depending on the flags you set.
metaloci lm -w PATH -s [FILE ...] [-g PATH] [-h] [-p INT] [-v FLOAT] [-a PATH] [-i] [-m] [-t THREADS] [-f] [-b]
[-q INT [INT ...]] [-po]
Input arguments
- -w, --work-dir
Path to working directory.
- -s, --signal
Space-separated list of signals to plot or path to the file with the list of signals to plot, one per line.
- -g, --region
Region to apply LMI in format chrN:start-end_poi or file containing the regions of interest. If a file is provided, it must contain as a header ‘coords’, ‘symbol’ and ‘id’, and one region per line. The METALoci region file can be generated with ‘metaloci sniffer’. ‘poi’ is the point of interest in the region (its bin number).
Optional arguments
- -p, --permutations
Number of permutations to calculate the Local Moran’s I p-value (default: 9999).
Default:
9999
- -v, --pval
P-value significance threshold (default: 0.05).
Default:
0.05
- -a, --aggregated
Use the file with aggregated signals. This file has 2 columns: first column the name of the original signal, second column the new name of the aggregate, separated by tabs.
- -i, --info
Flag to unpickle LMI info.
Default:
False
- -m, --mp
Flag to set use of multiprocessing.
Default:
False
- -t, --threads
Number of threads to use in multiprocessing. The recommended value is one third of your total CPU count, although increasing this number may improve performance on machines with few cores. (default: 0)
Default:
0
- -f, --force
Force METALoci to rewrite existing data.
Default:
False
- -b, --bed
Flag to save the bed file with the metalocis location.
Default:
False
- -q, --quadrants
Space-separated list with the LMI quadrants to highlight (default: [1, 3]). 1: High-high, top right (signal in bin is high, signal for neighbours is high). 2: Low-High, top left (signal in bin is low, signal for neighbours is high). 3: Low-Low, bottom left (signal in bin is low, signal for neighbours is low). 4: High-Low, bottom right (signal in bin is high, signal for neighbours is low).
Default:
[1, 3]
- -po, --poi_only
Flag to only save the point of interest row in the LMI dataframes. Useful for large datasets. You will not be able to use ‘metaloci figure’ with this option.
Default:
False
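The quadrant numbering used by -q follows the Moran scatterplot, with the bin's own signal on the x axis and the neighbours' signal on the y axis. A minimal sketch of that mapping is given here; `lmi_quadrant` is a hypothetical function, it assumes both inputs are already centered so that 0 separates "low" from "high", and treating exactly 0 as "high" is an arbitrary choice.

```python
def lmi_quadrant(bin_signal: float, neighbour_signal: float) -> int:
    """Map a bin's centered signal and its neighbours' signal to a quadrant."""
    high_bin = bin_signal >= 0
    high_neighbours = neighbour_signal >= 0
    if high_bin and high_neighbours:
        return 1  # High-High, top right
    if not high_bin and high_neighbours:
        return 2  # Low-High, top left
    if not high_bin and not high_neighbours:
        return 3  # Low-Low, bottom left
    return 4      # High-Low, bottom right
```

The default selection [1, 3] therefore picks bins whose signal agrees with that of their neighbours, either both high or both low.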
gene_selector
This script parses the LMI information files created by METALoci.
The output file contains the regions that pass the quadrant and p-value thresholds for a given signal. If a region does not pass these filters, the script outputs NA for it.
metaloci gene_selector -w PATH -o PATH -g PATH -s STR [-h] [-q INT [INT ...]] [-p FLOAT] [-r] [--name STR] [-t INT]
Input arguments
- -w, --work-dir
Path to the working directory where LMI data is stored.
- -o, --output-dir
Path to the directory where LMI data of the POI for the regions will be stored.
- -g, --gene-file
Path to the region file from where to search the POI.
- -s, --signals
Space separated list of signal names to use.
Optional arguments
- -q, --quadrants
Possible choices: 1, 2, 3, 4
List of quadrants to select. Default: 1, 3.
- -p, --pval
P-value significance threshold (default: 0.05).
Default:
0.05
- -r, --region_file
Select whether or not to store a METALoci region file with the significant regions.
Default:
False
- --name
Name of the file with the selected regions/genes (default: ‘gene_selector_table’).
Default:
'gene_selector_table'
- -t, --threads
Number of threads for the multiprocessing (default: 0).
Default:
0
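The filtering rule described for gene_selector (keep a point of interest only if it falls in a selected quadrant and is significant, otherwise report NA) can be sketched as follows. The `selector_cell` function is hypothetical, and both the value written for passing regions and the use of "less than or equal" for the p-value comparison are assumptions.

```python
from typing import Sequence, Union


def selector_cell(
    lmi_score: float,
    quadrant: int,
    pval: float,
    quadrants: Sequence[int] = (1, 3),
    pval_threshold: float = 0.05,
) -> Union[float, str]:
    """Return the value to report for one region and one signal."""
    if quadrant in quadrants and pval <= pval_threshold:
        return lmi_score  # assumption: the LMI score is what gets reported
    return "NA"
```

A region in quadrant 2 is reported as NA with the default -q selection even if its p-value is well below the threshold.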
figure
Plots METALoci results. It creates the following plots: Hi-C matrix, signal plot, Kamada-Kawai layout, Local Moran’s I scatterplot, Gaudí plot for signal, Gaudí plot for LMI quadrant, and a composite image with all the above.
metaloci figure -w PATH -s [STR ...] [-g PATH] [-h] [-e] [-C] [-q [INT ...]] [-v FLOAT] [-z] [-m] [-t INT] [-M]
[-k PATH] [-n]
Input arguments
- -w, --work-dir
Path to working directory.
- -s, --signals
Space-separated list of signals to plot or path to the file with the list of signals to plot, one per line.
- -g, --region
Region to apply LMI in format chrN:start-end_poi or file with the regions of interest. If a file is provided, it must contain as a header ‘coords’, ‘symbol’ and ‘id’, and one region per line, tab separated.
Optional arguments
- -e, --preserve
Preserve temporary .png image files that are used for making the composite figure (default: True).
Default:
True
- -C, --clean-matrix
Flag to plot the ‘clean’ Hi-C matrix (the one METALoci uses to calculate the Kamada-Kawai layout) instead of the ‘original’ Hi-C matrix (default: False).
Default:
False
- -q, --quarts
Space-separated list with the LMI quadrants to highlight (default: [1, 3]). 1: High-high, top right (signal in bin is high, signal for neighbours is high). 2: Low-High, top left (signal in bin is low, signal for neighbours is high). 3: Low-Low, bottom left (signal in bin is low, signal for neighbours is low). 4: High-Low, bottom right (signal in bin is high, signal for neighbours is low).
Default:
[1, 3]
- -v, --pval
P-value significance threshold (default: 0.05).
Default:
0.05
- -z, --zscore
Flag to use z-score transformed signal values for the scatter plot (default: False).
Default:
False
- -m, --mp
Flag to set use of multiprocessing.
Default:
False
- -t, --threads
Number of threads to use in multiprocessing. (default: 0)
Default:
0
Style arguments
- -M, --metalocis
Flag to select highlighting of the signal plots. If True, only the neighbouring bins from the point of interest will be highlighted (independently of the quadrant and significance of those bins, but only if the point of interest is significant and in a quadrant of interest). If False, all significant regions that correspond to the quadrant selected with -q will be highlighted (default: False).
Default:
False
- -k, --mark_regions
(experimental) Path to a file to mark certain regions on the Gaudí plots. The file must have the following columns (tab-separated): region_metaloci chr start end label. The label will be used to mark the region on the plot.
- -n, --neighbourhood
Flag to plot the neighbourhood extension on the Kamada-Kawai and Gaudí plots. This is the influence radius around the point of interest that will be considered for the local spatial autocorrelation of the point of interest. (default: False).
Default:
False
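The five-column file expected by -k can be read with the standard csv module, as in this sketch. It is not how METALoci reads the file: `read_mark_regions` is a hypothetical function, and because the description does not say whether a header row is required, the sketch accepts files with or without one.

```python
import csv
import io
from typing import Dict, List, TextIO, Union

COLUMNS = ["region_metaloci", "chr", "start", "end", "label"]


def read_mark_regions(handle: TextIO) -> List[Dict[str, Union[str, int]]]:
    """Read tab-separated rows of region_metaloci, chr, start, end, label."""
    rows: List[Dict[str, Union[str, int]]] = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and an optional header row (assumption).
        if not fields or fields == COLUMNS:
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row: Dict[str, Union[str, int]] = dict(zip(COLUMNS, fields))
        row["start"] = int(fields[2])
        row["end"] = int(fields[3])
        rows.append(row)
    return rows


# Made-up example content, for illustration only.
example = io.StringIO("chr1:1000000-5000000_200\tchr1\t2000000\t2100000\tenhancer\n")
marks = read_mark_regions(example)
```

Here `marks` holds a single dictionary whose `label` is `"enhancer"` and whose coordinates are integers.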
scan
Creates several METALoci models by iteratively deleting stretches of bins in the region. If requested, it generates a video with all possible deletions.
metaloci scan -w PATH -c PATH -r INT [-g PATH] -s [FILE ...] [-h] [-n INT] [-p FLOAT] [-m] [-t INT] [-gif INT]
[-fd FLOAT] [-l FLOAT] [-o FLOAT [FLOAT ...]] [-f]
Input arguments
- -w, --work-dir
Path to working directory.
- -c, --hic
Path to the cool/mcool/hic file.
- -r, --resolution
Resolution of the Hi-C files to be used (in bp).
- -g, --region
Region to apply LMI in format chrN:start-end_poi. ‘poi’ is the point of interest in the region (its bin number).
- -s, --signal
Name of the signal to process.
Optional arguments
- -n, --num-bins-to-delete
Number of bins to delete in each iteration. Default: 5
Default:
5
- -p, --signipval
Significance p-value threshold for the LMI signal. Default: 0.05.
Default:
0.05
- -m, --mp
Flag to set use of multiprocessing.
Default:
False
- -t, --threads
Number of threads to use in multiprocessing. (default: 0)
Default:
0
- -gif, --gif
Flag to create a gif of the plots. If set, the value is the point of interest to highlight (bin index).
- -fd, --frame-duration
Duration of each frame in the gif in seconds. Default: 0.5.
Default:
0.5
- -l, --persistence-length
Persistence length to use. If not set, it will be optimised.
- -o, --cutoffs
Cutoffs to use for the Kamada-Kawai algorithm. If not set, it will be optimised.
- -f, --force
Flag to force overwrite of existing files.
Default:
False
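The iterative deletion that scan performs can be pictured as a window of -n bins sliding along the region, with one model built per window position. The sketch below only enumerates those windows; `deletion_windows` is a hypothetical function, and moving the window one bin at a time is an assumption about how "all possible deletions" are generated.

```python
from typing import List, Tuple


def deletion_windows(n_bins: int, num_bins_to_delete: int = 5) -> List[Tuple[int, int]]:
    """List half-open (start, end) bin ranges, one per deletion to model."""
    if num_bins_to_delete <= 0:
        raise ValueError("the number of bins to delete must be positive")
    # Assumption: the window advances by a single bin between iterations.
    return [
        (start, start + num_bins_to_delete)
        for start in range(n_bins - num_bins_to_delete + 1)
    ]
```

For a region of 8 bins and the default of 5 bins per deletion, this gives 4 windows, from bins 0-4 to bins 3-7, so the number of models grows roughly linearly with the region size.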
compressor
Utility for compressing and uncompressing METALoci working directories, as a genome-wide run can take a lot of space.
metaloci compressor -w PATH (-c | -u) [-h]
Input arguments
- -w, --work-dir
Path to working directory.
- -c, --compress
Flag to compress the Hi-C file.
Default:
False
- -u, --uncompress
Flag to uncompress the Hi-C file.
Default:
False
test
Undocumented
metaloci test