Running

pisces index

$ pisces index

Builds the indices defined in the default PISCES configuration file. By default this builds salmon index files for the human, mouse, and human-mouse sample types.

Note

If index output folders exist, they will not be overwritten, as PISCES assumes the index has already been built.
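
If you do want to rebuild an index, remove the existing output folder before re-running the command (a sketch; the index output path is hypothetical and depends on your configuration):

$ rm -r /path/to/index_output/human
$ pisces index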

You can also pass in a custom --config file:

$ pisces --config config.json index

pisces run

Note

Most users will want to use pisces submit to process an entire experiment on either a local machine or an HPC cluster. Please see the pisces submit section below for usage.

$ pisces run -fq1 lane1_R1_001.fastq.gz lane1_R1_002.fastq.gz \
             -fq2 lane1_R2_001.fastq.gz lane1_R2_002.fastq.gz

In the most basic form, you can specify only the fastq files (as a list of forward and reverse reads) and other parameters will be auto-detected or selected from default values. Either paired or unpaired libraries are allowed. If the data are unpaired, just pass fastq files using -fq1.
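
For unpaired data the command is the same with -fq2 omitted, for example (reusing the file names from above):

$ pisces run -fq1 lane1_R1_001.fastq.gz lane1_R1_002.fastq.gz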

Data and program paths are defined in a default configuration file; a custom configuration file can be supplied at runtime using the --config argument.

$ pisces --config config.json run \
         -fq1 lane1_R1_001.fastq.gz lane1_R1_002.fastq.gz \
         -fq2 lane1_R2_001.fastq.gz lane1_R2_002.fastq.gz

Sample name (-n, --name), output directory (-o, --out) and total number of CPU threads to utilize (-p, --threads) may be specified explicitly, or default to automatic values.

$ pisces run -fq1 lane1_R1_001.fastq.gz lane1_R1_002.fastq.gz \
             -fq2 lane1_R2_001.fastq.gz lane1_R2_002.fastq.gz \
             -p 12 \
             -o PISCES_output_sample1
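
When auto-detection is not desired, the sample type and library geometry can also be set explicitly with -t and -l (a sketch; human and ISR are example values from the choices listed in the help output below):

$ pisces run -fq1 lane1_R1_001.fastq.gz lane1_R1_002.fastq.gz \
             -fq2 lane1_R2_001.fastq.gz lane1_R2_002.fastq.gz \
             -t human \
             -l ISR
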
$ pisces run --help
usage: pisces run [-h] (-fq1 [FQ1 ...] | -sra [SRA ...]) [-fq2 [FQ2 ...]]
                  [-n NAME] [-o OUT] [-p THREADS] [-t SAMPLE_TYPE]
                  [-i [SALMON_INDICES ...]] [-s [SALMON_ARGS ...]]
                  [-l {IU,ISF,ISR,A}] [--scratch-dir SCRATCH_DIR]
                  [--overwrite] [--make-bam] [--no-salmon] [--no-fastqp]
                  [--no-vcf] [--sra-enc-dir SRA_ENC_DIR]

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  -fq1 [FQ1 ...]        space-separated list of gzipped FASTQ read 1 files
  -sra [SRA ...]        NCBI sequence read archive accessions in the form
                        SRR#######

optional arguments:
  -fq2 [FQ2 ...]        space-separated list of gzipped FASTQ read 2 files
  -n NAME, --name NAME  sample name used in output files. default=auto
  -o OUT, --out OUT     path to output directory. default=/path/to/$FQ1/PISCES
  -p THREADS, --threads THREADS
                        total number of CPU threads to use default=2
  -t SAMPLE_TYPE, --sample-type SAMPLE_TYPE
                        type of the library (defined in --config file)
                        default=auto, choices: dict_keys(['human', 'mouse',
                        'human-mouse'])
  -i [SALMON_INDICES ...], --salmon-indices [SALMON_INDICES ...]
                        salmon indices to use (defined in --config file)
                        defaults={'human': 'gencode_basic', 'mouse':
                        'gencode_basic', 'human-mouse': 'gencode_basic'}
  -s [SALMON_ARGS ...], --salmon-args [SALMON_ARGS ...]
                        extra arguments to pass to salmon (default=None)
  -l {IU,ISF,ISR,A}, --libtype {IU,ISF,ISR,A}
                        library geometry for Salmon (http://salmon.readthedocs
                        .org/en/latest/salmon.html#what-s-this-libtype)
                        default=auto
  --scratch-dir SCRATCH_DIR
                        path to scratch directory default='$(--out)'
  --overwrite           overwrite existing files default=False
  --make-bam            make a BAM file for visualization
  --no-salmon           do not run salmon
  --no-fastqp           do not generate read-level qc metrics
  --no-vcf              do not generate vcf file
  --sra-enc-dir SRA_ENC_DIR
                        path to NCBI SRA project directory for encrypted dbGaP
                        data

pisces submit

PISCES contains a command for running multiple pisces run jobs on a DRMAA-aware compute cluster (SGE, UGE, Slurm). Jobs are specified in the metadata.csv table by adding the FASTQ file locations for each sample. Extra arguments to pisces run can be passed to pisces submit and are appended to each job before submission to the cluster. The DRMAA library needs to be accessible in your environment: export DRMAA_LIBRARY_PATH=/path/to/libdrmaa.so.

$ pisces submit --metadata metadata.csv [pisces run args]
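
Any trailing pisces run options are appended to every job. For example, to request 8 threads and BAM output for each sample (a sketch; the thread count is illustrative):

$ export DRMAA_LIBRARY_PATH=/path/to/libdrmaa.so
$ pisces submit -m metadata.csv -p 8 --make-bam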

After job submission, pisces submit will monitor the progress of submitted jobs. Pressing Ctrl+C to exit this command will prompt whether to delete the current jobs. Job progress (running, completion, or failure) can be checked at any time by re-running pisces submit in the directory where it was originally run. If you later need to submit a fresh set of jobs from the same directory, you must first delete the .pisces directory.

$ pisces submit --help
usage: pisces submit [-h] [--metadata METADATA] [--workdir WORKDIR] [--local]
                     [--batch] [--runtime RUNTIME]
                     [--max-memory {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}]
                     [--dry-run]

optional arguments:
  -h, --help            show this help message and exit
  --metadata METADATA, -m METADATA
                        metadata.csv file containing (at minimum) SampleID,
                        Fastq1, Fastq2, Directory columns
  --workdir WORKDIR, -w WORKDIR
                        directory where PISCES will create the .pisces folder
                        to contain jobs scripts, logs, and run information
  --local               run jobs on the local machine
  --batch               after submitting jobs using DRMAA, exit without
                        monitoring job status
  --runtime RUNTIME, -rt RUNTIME
                        runtime in seconds for each cluster job
  --max-memory {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}, -mm {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}
                        memory in GB required per job
  --dry-run, -n         print job commands and then exit
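
A minimal metadata.csv for pisces submit might look like the following (a sketch; sample names and paths are placeholders):

SampleID,Fastq1,Fastq2,Directory
Sample1,/path/to/Sample1_R1.fastq.gz,/path/to/Sample1_R2.fastq.gz,/path/to/PISCES_run1
Sample2,/path/to/Sample2_R1.fastq.gz,/path/to/Sample2_R2.fastq.gz,/path/to/PISCES_run2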

pisces summarize-expression

$ pisces summarize-expression Sample1/PISCES Sample2/PISCES Sample3/PISCES ...

or

$ pisces summarize-expression -m metadata.csv

You can summarize transcript-level expression to gene-level and make TPM and counts matrices using pisces summarize-expression. Required arguments are the directories specified as --out from pisces run. Optionally you can supply a metadata matrix in CSV format similar to this example:

SampleID    Treatment    Timepoint
Sample1     DMSO         1h
Sample2     DMSO         1h
Sample3     DMSO         1h
Sample4     Dox          1h
Sample5     Dox          1h
Sample6     Dox          1h
Sample7     DMSO         4h
Sample8     DMSO         4h
Sample9     DMSO         4h
Sample10    Dox          4h
Sample11    Dox          4h
Sample12    Dox          4h

When supplying a --metadata file, you can use the --group-by option to group samples (e.g. by Timepoint) and the --norm-by variable (e.g. Treatment) with --control-factor (e.g. DMSO) defining the set of control samples to normalize against. You can also pass a formula for differential expression with DESeq2 using --deseq-formula, for example --deseq-formula "~ Treatment + Treatment:Timepoint". The --spotfire-template option copies a template Spotfire file useful for visualizing the resulting data matrices.
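
Putting these options together for the example metadata above, a call might look like this (a sketch; the positional sample directories are placeholders):

$ pisces summarize-expression -m metadata.csv \
            --group-by Timepoint \
            --norm-by Treatment \
            --control-factor DMSO \
            --deseq-formula "~ Treatment + Treatment:Timepoint" \
            Sample1/PISCES Sample2/PISCES Sample3/PISCES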

By default pisces summarize-expression matches metadata rows to input sample directories by the order of the directories passed as positional arguments, e.g. pisces summarize-expression -m metadata.csv /Sample1 /Sample2 .... Because this can be cumbersome, there are two options for encoding input locations directly in the metadata file:

As paths to pisces run output directories:

SampleID    Treatment    Directory
Sample1     DMSO         /path/to/PISCES_run1
Sample2     DMSO         /path/to/PISCES_run2

As paths to salmon “quant.sf” files:

SampleID    Treatment    QuantFilePath
Sample1     DMSO         /path/to/PISCES_run1/quant.sf
Sample2     DMSO         /path/to/PISCES_run2/quant.sf

pisces summarize-qc

QC tables are created using the pisces summarize-qc command. PISCES samples are discovered recursively for each directory passed to the tool.

$ pisces summarize-qc . \
            --spotfire-template QC.dxp \
            --tab QC.table.txt \
            --tall QC.skinny.txt \
            --fingerprint fingerprint_identities.txt

or

$ pisces summarize-qc --metadata metadata.csv \
            --spotfire-template QC.dxp \
            --tab QC.table.txt \
            --tall QC.skinny.txt \
            --fingerprint fingerprint_identities.txt

Note that directories are searched recursively, so it is sufficient to pass in the top-level directory when all PISCES runs under it are desired.
