Batch Import of FASTQ Files (Tutorial Part 4)

Uploading and staging many FASTQ files simultaneously

If you’re managing Next-Generation Sequencing (NGS) or omics datasets, you know the importance of having a robust system to store, organize, and efficiently access your raw and processed data. ReadStore Basic is a straightforward solution designed to help researchers and labs manage NGS datasets.

This post shows how to use Excel or .csv templates to prepare and import many FASTQ files at once, and how to perform a Batch Check-In that creates the corresponding datasets without checking in each dataset individually. We will also import FASTQ files from a template using the ReadStore Command Line Interface (CLI).

If you are new to ReadStore, check the previous tutorials on setting up a server and getting familiar with its features.

Preparing a FASTQ Template File

  1. Log into ReadStore: Log into the ReadStore web app and navigate to the Staging page. Ensure your user account has the required Staging Permissions. If you lack these permissions, contact your ReadStore server administrator for assistance.
  2. Download a Template:
    • On the top panel, click the More button, then select Import From File.
    • In the dialog that appears, select Download Templates. You can choose either an Excel or .csv file with the required column names for preparing your local FASTQ files for import.
  3. Prepare the Template: Open the template file. It has three required columns; fill in one row per FASTQ file:
    • FASTQFileName: The name of the FASTQ file to upload. By default, this is the basename of the FASTQ file without the file extension. For example, for /home/user/fastq/tumor_rep1_r1.fastq.gz, the FASTQFileName is tumor_rep1_r1. The ReadStore server clips the read type suffix (e.g., _r1) and sets the Dataset name to tumor_rep1, so changing the FASTQFileName also changes the inferred Dataset name. The name may contain only alphanumeric characters, underscores (_), hyphens (-), periods (.), and at signs (@), with no spaces.
    • ReadType: The type of sequencing read: R1 or R2 for Read 1 or Read 2, and I1 or I2 for Index 1 or Index 2.
    • UploadPath: The full path of the file to upload to the ReadStore server. This path must be valid on, and accessible from, the ReadStore Basic server. In the example above, it is /home/user/fastq/tumor_rep1_r1.fastq.gz.
Figure: UI overview for file import
Figure: Excel template example
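Filling in the template by hand gets tedious for hundreds of files. As a minimal bash sketch (not a ReadStore tool), the script below writes a .csv template for every FASTQ file in one directory. It assumes that files end in .fastq.gz, .fq.gz, .fastq or .fq, that the read type is the last underscore-separated part of the name (as in tumor_rep1_r1.fastq.gz), and that paths on the machine running the script are the same paths the ReadStore server sees. The function name make_fastq_template and the script name are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch (not part of ReadStore): write a FASTQ import template (.csv)
# for every FASTQ file in one directory.
# Assumptions: extensions .fastq.gz/.fq.gz/.fastq/.fq, and a trailing
# _r1/_r2/_i1/_i2 (any case) before the extension that encodes the read type.
set -euo pipefail

make_fastq_template() {
  local fastq_dir="$1" out_csv="$2"
  local name_re='^[A-Za-z0-9_.@-]+$'   # allowed FASTQFileName characters
  local path base name suffix read_type abs_dir

  # The three required template columns.
  printf 'FASTQFileName,ReadType,UploadPath\n' > "$out_csv"

  for path in "$fastq_dir"/*; do
    [ -f "$path" ] || continue
    base="$(basename "$path")"

    # FASTQFileName is the basename without the file extension.
    case "$base" in
      *.fastq.gz) name="${base%.fastq.gz}" ;;
      *.fq.gz)    name="${base%.fq.gz}" ;;
      *.fastq)    name="${base%.fastq}" ;;
      *.fq)       name="${base%.fq}" ;;
      *)          continue ;;  # not a FASTQ file
    esac

    # Reject names with spaces or other disallowed characters.
    if [[ ! "$name" =~ $name_re ]]; then
      echo "skipping (invalid name): $base" >&2
      continue
    fi

    # Infer ReadType from the last underscore-separated token (assumed convention).
    suffix="${name##*_}"
    read_type="$(printf '%s' "$suffix" | tr '[:lower:]' '[:upper:]')"
    case "$read_type" in
      R1|R2|I1|I2) ;;
      *) echo "skipping (no read type suffix): $base" >&2; continue ;;
    esac

    # UploadPath must be absolute; assumed identical on the ReadStore server.
    abs_dir="$(cd "$(dirname "$path")" && pwd)"
    printf '%s,%s,%s\n' "$name" "$read_type" "$abs_dir/$base" >> "$out_csv"
  done
}

# Run directly: bash make_fastq_template.sh <fastq_dir> <template.csv>
if [ "$#" -eq 2 ]; then
  make_fastq_template "$1" "$2"
fi
```

For example, bash make_fastq_template.sh /home/user/fastq template.csv would write a row such as tumor_rep1_r1,R1,/home/user/fastq/tumor_rep1_r1.fastq.gz. Files that do not match the assumed naming convention are reported on stderr and left out, so review the result before importing it.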

Import and Stage FASTQ Files

  1. Import the Template:
    • In the Staging page of the web app, click More → Import From File.
    • Upload the prepared Excel or .csv file. If everything is correct, you should see a table displaying the files ready for staging.
  2. Start the QC Step:
    • Click the Confirm button to start the Quality Control (QC) step in the background. Once it completes, all FASTQ files appear in the staging overview.

Note: If you upload a very large number of FASTQ files at once, you may hit the QC job queue limit (250 files by default). In that case, split the import into smaller batches. For more on this limit and how to have it adjusted, see the appendix below.
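A large template can be split before import. The following bash sketch (a hypothetical helper, not part of ReadStore) uses awk to cut a .csv template into files of at most 250 rows each, repeating the header row in every file. It assumes the header is on the first line and that every following non-empty line describes one FASTQ file; an Excel template would need to be saved as .csv first. The output names batch_001.csv, batch_002.csv and so on are an arbitrary choice.

```bash
#!/usr/bin/env bash
# Minimal sketch (not part of ReadStore): split a template .csv into batches
# of at most MAX_ROWS files (default 250, the default QC queue limit),
# repeating the header row in every batch.
# Assumptions: header on line 1, one FASTQ file per non-empty line after it.
set -euo pipefail

split_template() {
  local template="$1" max_rows="${2:-250}" out_prefix="${3:-batch}"
  awk -v max="$max_rows" -v pre="$out_prefix" '
    NR == 1 { header = $0; next }        # remember the header row
    NF == 0 { next }                     # ignore blank lines
    rows % max == 0 {                    # current batch is full (or none yet)
      if (out != "") close(out)
      out = sprintf("%s_%03d.csv", pre, ++batch)
      print header > out
    }
    { print > out; rows++ }
    END { printf "%d files in %d batch file(s)\n", rows, batch > "/dev/stderr" }
  ' "$template"
}

# Run directly: bash split_template.sh <template.csv> [max_rows] [out_prefix]
if [ "$#" -ge 1 ]; then
  split_template "$@"
fi
```

For example, bash split_template.sh template.csv 250 batch would turn a 600-file template into batch_001.csv and batch_002.csv with 250 files each and batch_003.csv with 100. Import the batches one at a time, waiting for each QC step to finish. The sketch does not try to keep a dataset's R1 and R2 rows in the same batch.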

Batch Check-In of Imported FASTQs

  1. Review Staged Files:
    • Ensure all FASTQ files from the import appear in the Staging page overview and have passed the QC checks.
  2. Perform Batch Check-In:
    • Click More → Batch Check-In.
    • In the dialog, select the datasets to check in by:
      • Clicking individual checkboxes in the left table, or
      • Selecting all datasets by clicking the checkbox in the top left corner.
    • Use the arrow to move the datasets to the selected batch. Optionally, assign one or more projects to these datasets during this step.
    • Confirm the batch check-in by clicking the Confirm button. The new datasets will now appear in the Dataset page overview.

Delete All FASTQs from Staging

If you accidentally upload the wrong FASTQ files, you can remove them from staging, either all at once with Delete All or individually.

  1. Delete All Files:
    • Click More → Delete All.
    • In the dialog, confirm the delete operation. This will remove all FASTQ files from the staging overview.
  2. Delete Individual Files:
    • Alternatively, delete individual FASTQ files directly in the staging overview.

Import FASTQ Files from the ReadStore CLI

You can also import FASTQ files using the Command Line Interface (CLI):

  1. Prepare the Template File:
    • Create a .csv file as described earlier, ensuring all required columns are present.
  2. Run the Import Command:
    • Use the following command to trigger a batch import and QC step:
      readstore import fastq /path/to/template.csv
    • Replace /path/to/template.csv with the actual path to your template file.

Note: Ensure the ReadStore CLI is installed and configured before attempting this step. For more information, see Tutorial 1.
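When a large import has been split into several template files (for example batch_001.csv and batch_002.csv; hypothetical names), the documented command can be run once per file. The bash sketch below does that and stops at the first failed import. The fixed pause between batches is an assumption standing in for waiting until QC has finished: this tutorial does not describe a CLI command that reports QC queue status, so pick a generous value or check the Staging page. The function and script names are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch (not part of ReadStore): import several template batches one
# after another with the documented "readstore import fastq" command.
# Assumption: a fixed pause between batches gives the QC queue time to drain;
# choose a value that fits your server, or check the Staging page yourself.
set -euo pipefail

import_batches() {
  if [ "$#" -lt 2 ]; then
    echo "usage: import_batches PAUSE_SECONDS BATCH.csv [BATCH.csv ...]" >&2
    return 1
  fi
  local pause_seconds="$1"; shift
  local batch first=1
  for batch in "$@"; do
    # Wait between batches, but not before the first one.
    if [ "$first" -eq 0 ]; then
      sleep "$pause_seconds"
    fi
    first=0
    echo "Importing $batch" >&2
    # When run as below, set -e makes a failed import stop the script
    # before any later batch is submitted.
    readstore import fastq "$batch"
  done
}

# Run directly: bash import_batches.sh <pause_seconds> batch_*.csv
if [ "$#" -gt 0 ]; then
  import_batches "$@"
fi
```

For example, bash import_batches.sh 600 batch_*.csv would import the batches in name order with a ten-minute pause between them.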

This guide covers the available methods for batch importing FASTQ files into ReadStore. For additional assistance, contact info@evo-byte.com.

Appendix: QC Job Queue

When FASTQ files are uploaded, each file undergoes a QC check for format validation and extraction of key metrics (e.g., read count, read length, and quality metrics). To protect server performance, the number of files allowed in the QC job queue is capped (250 files by default). For large imports, split the files into smaller batches or contact your server administrator to adjust the limit.
