Command Line Interface

usage: metaloci [-h] {sniffer,prep,bts,layout,lm,gene_selector,figure,scan,compressor,test} ...

Sub-commands

sniffer

Takes a .gtf file or a .bed file and parses it into a region list with a given resolution and extension. Each gene will be a point of interest, and the region will be centered around it. Human/mouse GTF files can be downloaded from the GENCODE website. For other species, please refer to the UCSC website. BED files can be used to create a custom region list, using the following format: chromosome, start, end, gene_symbol, gene_id. Strand information can be added to the BED file as a 6th column; if it is absent, the script will consider the gene to be on the positive strand.
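To make the custom BED layout concrete, here is a minimal sketch that reads such a file. It is not METALoci code: `read_region_bed` and the sample genes are hypothetical, and tab-separated columns are assumed.

```python
import csv
import io


def read_region_bed(handle):
    """Parse chromosome, start, end, gene_symbol, gene_id and an optional strand column."""
    genes = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, start, end, symbol, gene_id = row[:5]
        # A missing 6th column is treated as the positive strand.
        strand = row[5] if len(row) > 5 else "+"
        genes.append(
            {
                "chrom": chrom,
                "start": int(start),
                "end": int(end),
                "symbol": symbol,
                "id": gene_id,
                "strand": strand,
            }
        )
    return genes


# Two made-up genes: the first has 5 columns, the second adds a strand.
example = "chr1\t1000\t2000\tGENE_A\tID_A\nchr2\t500\t900\tGENE_B\tID_B\t-\n"
genes = read_region_bed(io.StringIO(example))
print(genes[0]["strand"], genes[1]["strand"])
```

The first gene falls back to the positive strand because its line has only five columns.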

metaloci sniffer -w PATH -s PATH -g PATH -r INT -wi INT [-h] [-n STR] [--ucsc] [--strand]

Input arguments

-w, --work-dir

Path to working directory.

-s, --chrom-sizes

Full path to a file that contains the name of the chromosomes in the first column and the ending coordinate of the chromosome in the second column. This can be found on the UCSC Genome Browser website for your species.
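As a rough illustration of the two-column chromosome sizes layout, the sketch below loads such a file into a dictionary. The function name `read_chrom_sizes` and the sizes shown are invented for the example; whitespace-separated columns are assumed.

```python
import io


def read_chrom_sizes(handle):
    """Map chromosome name (column 1) to its ending coordinate (column 2)."""
    sizes = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        sizes[fields[0]] = int(fields[1])
    return sizes


# Made-up chromosome lengths, for illustration only.
sizes = read_chrom_sizes(io.StringIO("chrA\t5000000\nchrB\t3200000\n"))
print(sizes["chrA"])
```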

-g, --gene-file

Path to the gene annotation file (GTF or BED).

-r, --resolution

Resolution at which to split the genome, in bp. This should be the resolution you have in your Hi-C data, but it will not be checked.

-wi, --window

Size of the regions in bp. The region will be centered around the point of interest. Regions close to the telomeres will be smaller, but still centered around the point of interest.

Optional arguments

-n, --name

Name of the file.

--ucsc

Flag if the gene file you are using is in UCSC format.

Default: False

--strand

The file has strand information in the last column. ONLY FOR BED FILES.

Default: False

prep

Processes signal .bed files or .bedGraph files. This will bin them at a given resolution, merge all signals in the same dataframe and subset it by chromosomes.

metaloci prep -w PATH -c PATH -d [PATH ...] -r INT -s PATH [-h] [-t STR]

Input arguments

-w, --work-dir

Path to working directory.

-c, --hic

Path to the cool/mcool/hic file.

-d, --data

Path to the file to process. The file must have a header: the first three columns must be named chrom, start and end, and each following column must be named after the signal it contains. For single-signal files, if the header is omitted, the signal name will be the name of the file. Chromosome names must match those in the Hi-C file and in the chromosome sizes file.

-r, --resolution

Resolution of the bins, to bin the signal (in bp). Hi-C file must contain this resolution.

-s, --coords

Full path to a file that contains the name of the chromosomes in the first column and the ending coordinate of the chromosome in the second column. This can be found on the UCSC Genome Browser website for your species.

Optional arguments

-t, --summarize_type

Possible choices: median, mean, min, max, count

Type of summarization to use when merging signal in a single bin.

Default: 'median'
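The effect of the resolution and the summarization type can be sketched as follows. This is a simplified illustration, not the implementation used by 'metaloci prep': `bin_signal` is a hypothetical helper, and each interval is assigned to the bin that contains its start coordinate, ignoring intervals that span several bins.

```python
import statistics
from collections import defaultdict

# The five summarization types listed for -t / --summarize_type.
SUMMARISERS = {
    "median": statistics.median,
    "mean": statistics.mean,
    "min": min,
    "max": max,
    "count": len,
}


def bin_signal(intervals, resolution, summarize_type="median"):
    """Summarise (chrom, start, end, value) intervals into fixed-size bins."""
    summarise = SUMMARISERS[summarize_type]
    per_bin = defaultdict(list)
    for chrom, start, _end, value in intervals:
        # Simplifying assumption: the interval belongs to the bin holding its start.
        bin_start = (start // resolution) * resolution
        per_bin[(chrom, bin_start)].append(value)
    return {
        (chrom, bin_start, bin_start + resolution): summarise(values)
        for (chrom, bin_start), values in sorted(per_bin.items())
    }


# Made-up signal: three intervals fall in the first 10 kb bin, one in the second.
signal = [
    ("chr1", 0, 100, 1.0),
    ("chr1", 4000, 4100, 3.0),
    ("chr1", 9000, 9100, 8.0),
    ("chr1", 12000, 12100, 5.0),
]
binned = bin_signal(signal, resolution=10_000)
print(binned)
```

With the default 'median', the first bin takes the middle of its three values; switching to 'count' would instead report how many intervals landed in each bin.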

bts

Find the best combination of persistence length and cut-off for your Hi-C, given a Hi-C resolution.

First check the maximum resolution your Hi-C allows, and supply it to this script. This script will then determine, by computing layouts on a sample of regions, the best combination of persistence length and cut-off for your Hi-C. You can then use these parameters to run ‘metaloci layout’ on your regions of interest.

This can take from minutes to a few hours, depending on the resolution, the size of the regions and the number of CPUs available on your machine. Once you run it for a specific Hi-C and a specific resolution, the parameters will remain the same for other runs with other signals.
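One way to carry the chosen parameters over to 'metaloci layout' is to assemble the call programmatically. The sketch below only builds the argument list from flags documented on this page; `layout_command`, the paths and the numeric values are placeholders, not recommended settings.

```python
import shlex


def layout_command(work_dir, hic, resolution, regions, pl, cutoff):
    """Build a 'metaloci layout' call that reuses a persistence length and cut-off."""
    return [
        "metaloci", "layout",
        "-w", str(work_dir),
        "-c", str(hic),
        "-r", str(resolution),
        "-g", str(regions),
        "-l", str(pl),      # persistence length chosen with 'metaloci bts'
        "-o", str(cutoff),  # cut-off chosen with 'metaloci bts'
    ]


# All paths and numbers below are placeholders.
cmd = layout_command("work", "sample.mcool", 10000, "regions.txt", pl=10.0, cutoff=0.2)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are replaced with real paths and with the values reported for your Hi-C.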

metaloci bts -w PATH -c PATH -r INT [INT ...] [-h] [-g PATH] [-s INT] [-n INT] [-o FLOAT [FLOAT ...]]
             [-l FLOAT [FLOAT ...]] [-t--threads INT]

Input arguments

-w, --work-dir

Path to working directory.

-c, --hic

Complete path to the cool/mcool/hic file.

-r, --resolution

List of Hi-C resolutions to be tested (in bp).

Optional arguments

-g, --region

Path to METALoci region file. This is a tab-separated file with one region per line, containing the coords in chrN:start-end_midpoint format, the ‘symbol’ and the ‘id’. This file can be generated from a GTF file using ‘metaloci sniffer’.

-s, --seed

Random seed for region sampling. (default: 1)

Default: 1

-n, --sample_num

Number of regions to sample from .txt file. (default: 100)

Default: 100

-o, --cutoffs

Fraction of top interactions to use from the Hi-C matrix.

Default: [0.15, 0.175, 0.2, 0.225, 0.25]

-l, --pls

Persistence length; usual values are between 9 and 12, although this can vary a lot depending on the absolute values of your Hi-C matrix.

-t--threads

Number of threads to use in multiprocessing. (default: 0)

Default: 0

layout

Creates a Kamada-Kawai layout from a Hi-C for a given region.

metaloci layout -w PATH -c PATH -r INT [-g PATH] [-h] [-o [CUTOFF ...]] [-a] [-l FLOAT] [-i] [-p] [-rp] [-m] [-t INT]
                [-f]

Input arguments

-w, --work-dir

Path to working directory.

-c, --hic

Path to the cool/mcool/hic file.

-r, --resolution

Resolution of the Hi-C files to be used (in bp).

-g, --region

Region to apply LMI in format chrN:start-end_poi, or a file containing the regions of interest. If a file is provided, it must contain the header ‘coords’, ‘symbol’ and ‘id’, and one region per line. The METALoci region file can be generated with ‘metaloci sniffer’. ‘poi’ is the point of interest in the region (its bin number).
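The chrN:start-end_poi notation can be unpacked with a small regular expression, as in the sketch below. `parse_region` and the `Region` tuple are hypothetical helpers written for this page, and the coordinates in the example are made up.

```python
import re
from typing import NamedTuple

# chrom ':' start '-' end '_' poi, e.g. chr1:1000000-3000000_100
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)_(?P<poi>\d+)$")


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    poi: int  # bin number of the point of interest


def parse_region(text):
    """Split a 'chrN:start-end_poi' string into its four components."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in chrN:start-end_poi format: {text!r}")
    return Region(
        match["chrom"], int(match["start"]), int(match["end"]), int(match["poi"])
    )


region = parse_region("chr1:1000000-3000000_100")
print(region)
```

Strings that lack the trailing `_poi` part are rejected with a `ValueError`, which is a convenient way to catch plain genomic coordinates passed by mistake.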

Optional arguments

-o, --cutoff

Fraction of top interactions to keep as restraints for the layout. If more than one is selected, space separated, all of them will be computed and plotted but not saved to an object. (default: 0.2)
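To give an intuition for what keeping a fraction of top interactions means, the sketch below zeroes every contact weaker than the top-ranked share. It is an assumption-laden illustration rather than METALoci's own procedure: `keep_top_fraction` is a hypothetical helper, only positive values are ranked, and ties at the threshold are kept.

```python
import math


def keep_top_fraction(matrix, fraction):
    """Zero every entry weaker than the top `fraction` of positive values."""
    ranked = sorted((v for row in matrix for v in row if v > 0), reverse=True)
    if not ranked:
        return [[0.0 for _ in row] for row in matrix]
    n_keep = max(1, math.ceil(len(ranked) * fraction))
    threshold = ranked[n_keep - 1]  # weakest interaction that is still kept
    return [[v if v >= threshold else 0.0 for v in row] for row in matrix]


# Made-up symmetric contact matrix for a 4-bin region.
contacts = [
    [0, 9, 2, 1],
    [9, 0, 8, 3],
    [2, 8, 0, 7],
    [1, 3, 7, 0],
]
restraints = keep_top_fraction(contacts, 0.5)
print(restraints)
```

Here half of the twelve positive entries survive, which leaves only the contacts between consecutive bins; a smaller fraction would keep fewer restraints for the layout.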

-a, --absolute

Treat the cut-off as an absolute value instead of a fraction of top interactions to keep. If this flag is set, the cut-off value must be provided in the -o flag. The cut-off value must be a positive number.

Default: False

-l, --pl

Set a persistence length for the Kamada-Kawai layout. This represents the distance between two consecutive points in the layout. The lower the value, the more distance between consecutive points.

-i, --optimise

Automatically optimise the cut-off and/or the persistence length for the Kamada-Kawai layout.

Default: False

-p, --plot

Plot the matrix, density plot and Kamada-Kawai plots.

Default: False

-rp, --remove-poi

Remove the point of interest from the Kamada-Kawai layout.

Default: False

-m, --mp

Flag to set use of multiprocessing.

Default: False

-t, --threads

Number of threads to use in multiprocessing. (default: 0)

Default: 0

-f, --force

Force METALoci to rewrite existing data. This will remove the object and re-create it.

Default: False

lm

Adds signal data to a Kamada-Kawai layout and calculates Local Moran’s I for every bin in the layout. Outputs a .mlo file with the LMI data for each signal. It can also output a .csv with info for each signal and bed files with the metalocis found, depending on the flags you set.

metaloci lm -w PATH -s [FILE ...] [-g PATH] [-h] [-p INT] [-v FLOAT] [-a PATH] [-i] [-m] [-t THREADS] [-f] [-b]
            [-q INT [INT ...]] [-po]

Input arguments

-w, --work-dir

Path to working directory.

-s, --signal

Space-separated list of signals to plot or path to the file with the list of signals to plot, one per line.

-g, --region

Region to apply LMI in format chrN:start-end_poi, or a file containing the regions of interest. If a file is provided, it must contain the header ‘coords’, ‘symbol’ and ‘id’, and one region per line. The METALoci region file can be generated with ‘metaloci sniffer’. ‘poi’ is the point of interest in the region (its bin number).

Optional arguments

-p, --permutations

Number of permutations to calculate the Local Moran’s I p-value (default: 9999).

Default: 9999
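A default such as 9999 fits a common convention for permutation tests, in which the observed statistic is counted as one extra draw. The sketch below shows that convention for a one-sided test; it is an assumption that the p-value is computed this way here, and `pseudo_p_value` is a hypothetical helper.

```python
def pseudo_p_value(observed, permuted):
    """One-sided pseudo p-value: (hits + 1) / (permutations + 1)."""
    hits = sum(1 for value in permuted if value >= observed)
    return (hits + 1) / (len(permuted) + 1)


# With 9999 permutations and no permuted value reaching the observed one,
# the p-value is 1 / 10000.
smallest = pseudo_p_value(1.0, [0.0] * 9999)
print(smallest)
```

Under this convention the number of permutations sets the resolution of the p-value, so fewer permutations make very small thresholds unreachable.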

-v, --pval

P-value significance threshold (default: 0.05).

Default: 0.05

-a, --aggregated

Use the file with aggregated signals. This file has 2 columns: first column the name of the original signal, second column the new name of the aggregate, separated by tabs.
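The two-column aggregation table can be read as a mapping from each aggregate name to the original signals it groups. The sketch below is illustrative only; `read_aggregation_table` and the signal names are invented.

```python
import io
from collections import defaultdict


def read_aggregation_table(handle):
    """Group original signal names (column 1) under their aggregate name (column 2)."""
    groups = defaultdict(list)
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        original, aggregate = line.split("\t")
        groups[aggregate].append(original)
    return dict(groups)


# Made-up signal names: two replicates aggregated under one name.
table = "mark_rep1\tmark\nmark_rep2\tmark\nother_rep1\tother\n"
groups = read_aggregation_table(io.StringIO(table))
print(groups)
```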

-i, --info

Flag to unpickle LMI info.

Default: False

-m, --mp

Flag to set use of multiprocessing.

Default: False

-t, --threads

Number of threads to use in multiprocessing. The recommended value is one third of your total CPU count, although increasing this number may improve performance on machines with few cores. (default: 0)

Default: 0

-f, --force

Force METALoci to rewrite existing data.

Default: False

-b, --bed

Flag to save the bed file with the metalocis location.

Default: False

-q, --quadrants

Space-separated list with the LMI quadrants to highlight (default: [1, 3]). 1: High-high, top right (signal in bin is high, signal for neighbours is high). 2: Low-High, top left (signal in bin is low, signal for neighbours is high). 3: Low-Low, bottom left (signal in bin is low, signal for neighbours is low). 4: High-Low, bottom right (signal in bin is high, signal for neighbours is low).

Default: [1, 3]
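The quadrant numbering above can be written down as a small lookup. In this sketch both values are assumed to be centred (for example z-scores), so that zero separates high from low, and a value of exactly zero is treated as low; `lmi_quadrant` is a hypothetical helper, not part of METALoci.

```python
def lmi_quadrant(bin_value, neighbour_value):
    """Return the quadrant (1-4) for a bin and the signal of its neighbours."""
    high_bin = bin_value > 0
    high_neighbours = neighbour_value > 0
    if high_bin and high_neighbours:
        return 1  # High-High, top right
    if not high_bin and high_neighbours:
        return 2  # Low-High, top left
    if not high_bin and not high_neighbours:
        return 3  # Low-Low, bottom left
    return 4      # High-Low, bottom right


print(lmi_quadrant(1.2, 0.8), lmi_quadrant(-0.5, -1.1))
```

With the default [1, 3], the bins that get highlighted are those whose signal agrees with that of their neighbours, whether both are high or both are low.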

-po, --poi_only

Flag to only save the point of interest row in the LMI dataframes. Useful for large datasets. You will not be able to use ‘metaloci figure’ with this option.

Default: False

gene_selector

This script parses the LMI information files created by METALoci.

The output file will contain the regions that pass the quadrant and p-value thresholds for a given signal. If a region does not pass these filters, the script will output NA.
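The filtering rule can be sketched as below. This is a hedged illustration, not the script's code: `select_poi` is hypothetical, the region names are made up, each record is reduced to a (quadrant, p-value) pair, and the threshold is assumed to be inclusive.

```python
def select_poi(records, quadrants=(1, 3), pval=0.05):
    """Keep records in a quadrant of interest with p-value <= pval; otherwise 'NA'."""
    selected = {}
    for region, (quadrant, p_value) in records.items():
        passes = quadrant in quadrants and p_value <= pval
        selected[region] = (quadrant, p_value) if passes else "NA"
    return selected


# Made-up point-of-interest results for one signal.
records = {
    "region_a": (1, 0.01),  # quadrant of interest and significant
    "region_b": (2, 0.01),  # significant, but quadrant not selected
    "region_c": (3, 0.20),  # quadrant of interest, but not significant
}
selected = select_poi(records)
print(selected)
```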

metaloci gene_selector -w PATH -o PATH -g PATH -s STR [-h] [-q INT [INT ...]] [-p FLOAT] [-r] [--name STR] [-t INT]

Input arguments

-w, --work-dir

Path to the working directory where LMI data is stored.

-o, --output-dir

Path to the directory where LMI data of the POI for the regions will be stored.

-g, --gene-file

Path to the region file from where to search the POI.

-s, --signals

Space separated list of signal names to use.

Optional arguments

-q, --quadrants

Possible choices: 1, 2, 3, 4

List of quadrants to select. Default: 1, 3. Choices: [1, 2, 3, 4].

-p, --pval

P-value significance threshold (default: 0.05).

Default: 0.05

-r, --region_file

Select whether or not to store a METALoci region file with the significant regions.

Default: False

--name

Name of the file with the selected regions/genes (default: ‘gene_selector_table’).

Default: 'gene_selector_table'

-t, --threads

Number of threads for the multiprocessing (default: 0).

Default: 0

figure

Plots METALoci output. It creates the following plots: Hi-C matrix, Signal plot, Kamada-Kawai layout, Local Moran’s I scatterplot, Gaudí plot for signal, Gaudí plot for LMI quadrant, and a composite image with all the above.

metaloci figure -w PATH -s [STR ...] [-g PATH] [-h] [-e] [-C] [-q [INT ...]] [-v FLOAT] [-z] [-m] [-t INT] [-M]
                [-k PATH] [-n]

Input arguments

-w, --work-dir

Path to working directory.

-s, --signals

Space-separated list of signals to plot or path to the file with the list of signals to plot, one per line.

-g, --region

Region to apply LMI in format chrN:start-end_poi or file with the regions of interest. If a file is provided, it must contain as a header ‘coords’, ‘symbol’ and ‘id’, and one region per line, tab separated.

Optional arguments

-e, --preserve

Preserve temporary .png image files that are used for making the composite figure (default: True).

Default: True

-C, --clean-matrix

Flag to plot the ‘clean’ Hi-C matrix (the one METALoci uses to calculate the Kamada-Kawai layout) instead of the ‘original’ Hi-C matrix (default: False).

Default: False

-q, --quarts

Space-separated list with the LMI quadrants to highlight (default: [1, 3]). 1: High-high, top right (signal in bin is high, signal for neighbours is high). 2: Low-High, top left (signal in bin is low, signal for neighbours is high). 3: Low-Low, bottom left (signal in bin is low, signal for neighbours is low). 4: High-Low, bottom right (signal in bin is high, signal for neighbours is low).

Default: [1, 3]

-v, --pval

P-value significance threshold (default: 0.05).

Default: 0.05

-z, --zscore

Flag to use z-score transformed signal values for the scatter plot (default: False).

Default: False

-m, --mp

Flag to set use of multiprocessing.

Default: False

-t, --threads

Number of threads to use in multiprocessing. (default: 0)

Default: 0

Style arguments

-M, --metalocis

Flag to select highlighting of the signal plots. If True, only the neighbouring bins from the point of interest will be highlighted (independently of the quadrant and significance of those bins, but only if the point of interest is significant and in a quadrant of interest). If False, all significant regions that correspond to the quadrant selected with -q will be highlighted (default: False).

Default: False

-k, --mark_regions

(experimental) Path to a file to mark certain regions on the Gaudí plots. The file must have the following columns (tab-separated): region_metaloci chr start end label. The label will be used to mark the region on the plot.

-n, --neighbourhood

Flag to plot the neighbourhood extension on the Kamada-Kawai and Gaudí plots. This is the influence radius around the point of interest that will be considered for the local spatial autocorrelation of the point of interest. (default: False).

Default: False

scan

Creates several METALoci models by iteratively deleting stretches of bins in the region. If asked to, it generates a video with all possible deletions.

metaloci scan -w PATH -c PATH -r INT [-g PATH] -s [FILE ...] [-h] [-n INT] [-p FLOAT] [-m] [-t INT] [-gif INT]
              [-fd FLOAT] [-l FLOAT] [-o FLOAT [FLOAT ...]] [-f]

Input arguments

-w, --work-dir

Path to working directory.

-c, --hic

Path to the cool/mcool/hic file.

-r, --resolution

Resolution of the Hi-C files to be used (in bp).

-g, --region

Region to apply LMI in format chrN:start-end_poi. ‘poi’ is the point of interest in the region (its bin number).

-s, --signal

Name of the signal to process.

Optional arguments

-n, --num-bins-to-delete

Number of bins to delete in each iteration. Default: 5

Default: 5
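The idea of deleting stretches of bins can be illustrated by enumerating the candidate stretches. How far the stretch moves between iterations is not stated here, so the sketch exposes it as a `step` parameter and defaults to one bin, which is purely an assumption; `deletion_windows` and `delete_bins` are hypothetical helpers.

```python
def deletion_windows(n_bins, num_bins_to_delete=5, step=1):
    """List (start, stop) bin index pairs, one stretch of bins per iteration."""
    last_start = n_bins - num_bins_to_delete
    return [
        (start, start + num_bins_to_delete)
        for start in range(0, last_start + 1, step)
    ]


def delete_bins(values, window):
    """Return a copy of `values` without the bins in [start, stop)."""
    start, stop = window
    return values[:start] + values[stop:]


# A toy region of 8 bins with the default stretch of 5 bins.
windows = deletion_windows(8)
print(windows)
print(delete_bins(list(range(8)), windows[1]))
```

Each window would correspond to one model, built from the region with that stretch of bins removed.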

-p, --signipval

Significance p-value threshold for the LMI signal. Default: 0.05.

Default: 0.05

-m, --mp

Flag to set use of multiprocessing.

Default: False

-t, --threads

Number of threads to use in multiprocessing. (default: 0)

Default: 0

-gif, --gif

Flag to create a gif of the plots. If set, the value is the point of interest to highlight (bin index).

-fd, --frame-duration

Duration of each frame in the gif in seconds. Default: 0.5.

Default: 0.5

-l, --persistence-length

Persistence length to use. If not set, it will be optimised.

-o, --cutoffs

Cutoffs to use for the Kamada-Kawai algorithm. If not set, it will be optimised.

-f, --force

Flag to force overwrite of existing files.

Default: False

compressor

Utility for compressing and uncompressing METALoci working directories, as a genome-wide run can take a lot of space.

metaloci compressor -w PATH (-c | -u) [-h]

Input arguments

-w, --work-dir

Path to working directory.

-c, --compress

Flag to compress the Hi-C file.

Default: False

-u, --uncompress

Flag to uncompress the Hi-C file.

Default: False

test

Undocumented

metaloci test