Running
pisces index
$ pisces index
Builds the indices defined in the default PISCES configuration file. This will build salmon index files for the human, mouse, and human-mouse sample types.
Note
If index output folders exist, they will not be overwritten, as PISCES assumes the index has already been built.
You can also pass in a custom --config file:
$ pisces --config config.json index
pisces run
Note
Most users will want to use pisces submit to process an entire experiment on either a local machine or an HPC cluster. Please see submit_example for usage.
$ pisces run -fq1 lane1_R1_001.fastq.gz lane1_R1_002.fastq.gz \
-fq2 lane1_R2_001.fastq.gz lane1_R2_002.fastq.gz
In the most basic form, you can specify only the fastq files (as a list of forward and reverse reads) and other parameters will be auto-detected or selected from default values. Either paired or unpaired libraries are allowed. If the data are unpaired, just pass fastq files using -fq1.
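As a rough illustration of the paired and unpaired forms described above, the following Python sketch assembles a pisces run argument list from lists of FASTQ files. The helper name build_run_command is hypothetical and not part of PISCES; only flags shown on this page are used, and the check that both read lists have equal length is an assumption about paired input, not documented behavior.

```python
import shlex


def build_run_command(fq1, fq2=None, name=None, out=None, threads=None, config=None):
    """Assemble a `pisces run` argument list (hypothetical helper, not part of PISCES)."""
    if not fq1:
        raise ValueError("at least one read 1 FASTQ file is required")
    if fq2 and len(fq2) != len(fq1):
        # Assumption: paired libraries supply one read 2 file per read 1 file.
        raise ValueError("fq1 and fq2 must list the same number of files")
    cmd = ["pisces"]
    if config:
        cmd += ["--config", config]  # --config is given before the subcommand
    cmd += ["run", "-fq1", *fq1]
    if fq2:
        cmd += ["-fq2", *fq2]  # omitted entirely for unpaired data
    if name:
        cmd += ["-n", name]
    if out:
        cmd += ["-o", out]
    if threads:
        cmd += ["-p", str(threads)]
    return cmd


paired = build_run_command(
    ["lane1_R1_001.fastq.gz", "lane1_R1_002.fastq.gz"],
    ["lane1_R2_001.fastq.gz", "lane1_R2_002.fastq.gz"],
    out="PISCES_output_sample1",
    threads=12,
)
unpaired = build_run_command(["lane1_R1_001.fastq.gz"])

# Print shell-quoted commands for inspection before running them.
print(shlex.join(paired))
print(shlex.join(unpaired))
```

The returned list can be handed to subprocess.run(cmd, check=True) once the printed command looks right.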
Data and program paths are defined in a default configuration file, which can be overridden at runtime using the --config argument.
$ pisces --config config.json run \
-fq1 lane1_R1_001.fastq.gz lane1_R1_002.fastq.gz \
-fq2 lane1_R2_001.fastq.gz lane1_R2_002.fastq.gz
Sample name (-n, --name), output directory (-o, --out) and total number of CPU threads to utilize (-p, --threads) may be specified explicitly, or default to automatic values.
$ pisces run -fq1 lane1_R1_001.fastq.gz lane1_R1_002.fastq.gz \
-fq2 lane1_R2_001.fastq.gz lane1_R2_002.fastq.gz \
-p 12 \
-o PISCES_output_sample1
$ pisces run --help
usage: pisces run [-h] (-fq1 [FQ1 ...] | -sra [SRA ...]) [-fq2 [FQ2 ...]]
[-n NAME] [-o OUT] [-p THREADS] [-t SAMPLE_TYPE]
[-i [SALMON_INDICES ...]] [-s [SALMON_ARGS ...]]
[-l {IU,ISF,ISR,A}] [--scratch-dir SCRATCH_DIR]
[--overwrite] [--make-bam] [--no-salmon] [--no-fastqp]
[--no-vcf] [--sra-enc-dir SRA_ENC_DIR]
optional arguments:
-h, --help show this help message and exit
required arguments:
-fq1 [FQ1 ...] space-separated list of gzipped FASTQ read 1 files
-sra [SRA ...] NCBI sequence read archive accessions in the form
SRR#######
optional arguments:
-fq2 [FQ2 ...] space-separated list of gzipped FASTQ read 2 files
-n NAME, --name NAME sample name used in output files. default=auto
-o OUT, --out OUT path to output directory. default=/path/to/$FQ1/PISCES
-p THREADS, --threads THREADS
total number of CPU threads to use default=2
-t SAMPLE_TYPE, --sample-type SAMPLE_TYPE
type of the library (defined in --config file)
default=auto, choices: dict_keys(['human', 'mouse',
'human-mouse'])
-i [SALMON_INDICES ...], --salmon-indices [SALMON_INDICES ...]
salmon indices to use (defined in --config file)
defaults={'human': 'gencode_basic', 'mouse':
'gencode_basic', 'human-mouse': 'gencode_basic'}
-s [SALMON_ARGS ...], --salmon-args [SALMON_ARGS ...]
extra arguments to pass to salmon (default=None)
-l {IU,ISF,ISR,A}, --libtype {IU,ISF,ISR,A}
library geometry for Salmon (http://salmon.readthedocs
.org/en/latest/salmon.html#what-s-this-libtype)
default=auto
--scratch-dir SCRATCH_DIR
path to scratch directory default='$(--out)'
--overwrite overwrite existing files default=False
--make-bam make a BAM file for visualization
--no-salmon do not run salmon
--no-fastqp do not generate read-level qc metrics
--no-vcf do not generate vcf file
--sra-enc-dir SRA_ENC_DIR
path to NCBI SRA project directory for encrypted dbGaP
data
pisces submit
PISCES contains a command for running multiple pisces run jobs on a DRMAA-aware compute cluster (sge, uge, slurm). Jobs are specified using the metadata.csv table by adding data locations for the FASTQ files. Extra arguments to pisces run are passed to pisces submit and appended to each job before submission to the cluster. The DRMAA library needs to be accessible in your environment: export DRMAA_LIBRARY_PATH=/path/to/libdrmaa.so.
$ pisces submit --metadata metadata.csv [pisces run args]
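A minimal sketch of generating such a metadata.csv in Python is shown here. The column names are the ones listed in the pisces submit --help text (SampleID, Fastq1, Fastq2, Directory); the sample names and paths are placeholders, the function name write_submit_metadata is hypothetical, and the exact meaning of the Directory column is not described on this page, so its value below is only a stand-in.

```python
import csv
import io

# Minimum columns named in the `pisces submit --help` text.
COLUMNS = ["SampleID", "Fastq1", "Fastq2", "Directory"]


def write_submit_metadata(samples, handle):
    """Write one CSV row per sample, checking that every required column is present."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for sample in samples:
        missing = [column for column in COLUMNS if column not in sample]
        if missing:
            raise ValueError(f"{sample.get('SampleID', '?')}: missing columns {missing}")
        writer.writerow(sample)


# Placeholder samples; replace the file names and directory with real locations.
samples = [
    {"SampleID": "Sample1", "Fastq1": "Sample1_R1.fastq.gz",
     "Fastq2": "Sample1_R2.fastq.gz", "Directory": "/path/to/data"},
    {"SampleID": "Sample2", "Fastq1": "Sample2_R1.fastq.gz",
     "Fastq2": "Sample2_R2.fastq.gz", "Directory": "/path/to/data"},
]

buffer = io.StringIO()
write_submit_metadata(samples, buffer)
print(buffer.getvalue())
```

To produce a file instead of printing, open it with open("metadata.csv", "w", newline="") and pass that handle to the function.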
After job submission, pisces submit will monitor the progress of submitted jobs. If you want to exit this command, pressing Ctrl+C will prompt whether to delete the current jobs. Job progress (running, completion, or failure) can be checked at any time by re-running pisces submit in the directory where pisces submit was originally run. If you later need to re-run pisces submit in the same directory, you must first delete the .pisces directory.
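The cleanup step can be scripted. The sketch below removes the .pisces directory from a working directory if it exists; the function name clear_pisces_state is hypothetical. Since the .pisces folder holds job scripts, logs, and run information, deleting it discards the record of earlier submissions, so treat this as something to run deliberately.

```python
import shutil
import tempfile
from pathlib import Path


def clear_pisces_state(workdir="."):
    """Delete <workdir>/.pisces if present; return True if something was removed."""
    state_dir = Path(workdir) / ".pisces"
    if state_dir.is_dir():
        shutil.rmtree(state_dir)
        return True
    return False


# Demonstration in a throwaway directory so nothing real is deleted.
with tempfile.TemporaryDirectory() as workdir:
    (Path(workdir) / ".pisces").mkdir()
    first = clear_pisces_state(workdir)   # directory existed and was removed
    second = clear_pisces_state(workdir)  # nothing left to remove
    print(first, second)
```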
$ pisces submit --help
usage: pisces submit [-h] [--metadata METADATA] [--workdir WORKDIR] [--local]
[--batch] [--runtime RUNTIME]
[--max-memory {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}]
[--dry-run]
optional arguments:
-h, --help show this help message and exit
--metadata METADATA, -m METADATA
metadata.csv file containing (at minimum) SampleID,
Fastq1, Fastq2, Directory columns
--workdir WORKDIR, -w WORKDIR
directory where PISCES will create the .pisces folder
to contain jobs scripts, logs, and run information
--local run jobs on the local machine
--batch after submitting jobs using DRMAA, exit without
monitoring job status
--runtime RUNTIME, -rt RUNTIME
runtime in seconds for each cluster job
--max-memory {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}, -mm {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}
memory in GB required per job
--dry-run, -n print job commands and then exit
pisces summarize-expression
$ pisces summarize-expression Sample1/PISCES Sample2/PISCES Sample3/PISCES ...
or
$ pisces summarize-expression -m metadata.csv
You can summarize transcript-level expression to gene-level and make TPM and counts matrices using pisces summarize-expression. Required arguments are the directories specified as --out from pisces run. Optionally you can supply a metadata matrix in CSV format similar to this example:
SampleID | Treatment | Timepoint |
---|---|---|
Sample1 | DMSO | 1h |
Sample2 | DMSO | 1h |
Sample3 | DMSO | 1h |
Sample4 | Dox | 1h |
Sample5 | Dox | 1h |
Sample6 | Dox | 1h |
Sample7 | DMSO | 4h |
Sample8 | DMSO | 4h |
Sample9 | DMSO | 4h |
Sample10 | Dox | 4h |
Sample11 | Dox | 4h |
Sample12 | Dox | 4h |
When supplying a --metadata file you can specify the --group-by option to group samples (e.g. Timepoint) before normalizing using the --norm-by variable (e.g. Treatment) with the --control-factor (e.g. DMSO) as the set of control samples to normalize to. You can also pass a formula for differential expression using DESeq2 by specifying --deseq-formula, such as --deseq-formula "~ Treatment + Treatment:Timepoint". The --spotfire-template option copies a template Spotfire file useful for visualizing the resulting data matrices.
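Before running with --group-by, --norm-by, and --control-factor, it can help to confirm that the metadata supports the intended design. The sketch below is a pre-flight check written for illustration only: the function name check_design is hypothetical, the rows are a subset of the example table, and the rule that every group should contain at least one control sample is an assumption, not documented PISCES behavior.

```python
import csv
import io
from collections import defaultdict


def check_design(rows, group_by, norm_by, control_factor):
    """Return the sorted list of groups that contain no control sample."""
    for column in (group_by, norm_by):
        if rows and column not in rows[0]:
            raise KeyError(f"metadata has no column named {column!r}")
    levels = defaultdict(set)
    for row in rows:
        # Collect the norm-by levels observed within each group.
        levels[row[group_by]].add(row[norm_by])
    return sorted(group for group, seen in levels.items() if control_factor not in seen)


# A subset of the example metadata table, as CSV text.
metadata_csv = """SampleID,Treatment,Timepoint
Sample1,DMSO,1h
Sample4,Dox,1h
Sample7,DMSO,4h
Sample10,Dox,4h
"""

rows = list(csv.DictReader(io.StringIO(metadata_csv)))
groups_without_control = check_design(rows, "Timepoint", "Treatment", "DMSO")
print(groups_without_control)
```

An empty list means every Timepoint group has at least one DMSO sample; any group names returned would be worth inspecting before normalizing.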
By default pisces summarize-expression matches metadata to input sample directories based on the order of directories passed as positional arguments (e.g. pisces summarize-expression -m metadata.csv /Sample1 /Sample2 ...). Sometimes this is cumbersome, so there are two options for encoding input locations in the metadata file:
As paths to pisces run output directories:
SampleID | Treatment | Directory |
---|---|---|
Sample1 | DMSO | /path/to/PISCES_run1 |
Sample2 | DMSO | /path/to/PISCES_run2 |
As paths to salmon “quant.sf” files:
SampleID | Treatment | QuantFilePath |
---|---|---|
Sample1 | DMSO | /path/to/PISCES_run1/quant.sf |
Sample2 | DMSO | /path/to/PISCES_run2/quant.sf |
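If a metadata file already exists without location columns, the Directory column can be added programmatically. The following sketch assumes a plain comma-separated file with a SampleID column and a mapping from sample name to pisces run output directory; the function name add_directory_column is hypothetical and the paths are the placeholders from the tables above.

```python
import csv
import io


def add_directory_column(metadata_text, run_dirs):
    """Return CSV text with a Directory column appended for each SampleID."""
    reader = csv.DictReader(io.StringIO(metadata_text))
    fieldnames = list(reader.fieldnames) + ["Directory"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # A KeyError here means a sample has no known run directory.
        row["Directory"] = run_dirs[row["SampleID"]]
        writer.writerow(row)
    return out.getvalue()


metadata_text = "SampleID,Treatment\nSample1,DMSO\nSample2,DMSO\n"
run_dirs = {
    "Sample1": "/path/to/PISCES_run1",
    "Sample2": "/path/to/PISCES_run2",
}

updated = add_directory_column(metadata_text, run_dirs)
print(updated)
```

The same approach works for a QuantFilePath column by changing the column name and supplying paths to the quant.sf files.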
$ pisces summarize-expression
Error in loadNamespace(name) : there is no package called ‘renv’
Calls: :: ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
pisces summarize-qc
QC tables are created using the pisces summarize-qc command. PISCES samples are discovered recursively for each directory passed to the tool.
$ pisces summarize-qc . \
--spotfire-template QC.dxp \
--tab QC.table.txt \
--tall QC.skinny.txt \
--fingerprint fingerprint_identities.txt
or
$ pisces summarize-qc --metadata metadata.csv \
--spotfire-template QC.dxp \
--tab QC.table.txt \
--tall QC.skinny.txt \
--fingerprint fingerprint_identities.txt
Because directories are searched recursively, it is sufficient to pass in the top-level directory when all PISCES runs beneath it are desired.
$ pisces summarize-qc
12-08 20:10 INFO Searching for directories.
0.00File [00:00, ?File/s]
12-08 20:10 INFO Found 0 samples.
12-08 20:10 INFO Finished.