Tufts Galaxy
What is Galaxy?
Galaxy is a web-based portal to many applications for data-intensive biological research, such as next-generation sequencing (NGS), genomics, and much more. Galaxy is an open platform enabling researchers to retrieve data from local and remote sources, create workflows, and share analyses with other researchers. The Tufts Galaxy server runs on the Tufts High Performance Compute Cluster.
Accessing Galaxy
Tufts Galaxy is currently restricted to a few users; a release to the wider Tufts community is planned for a future date. If you would like access to the Tufts Galaxy instance and have a Tufts email address (*.tufts.edu), please contact us.
NOTE: Galaxy requires a VPN connection to the Tufts domain in order to be used remotely.
What Can I Do With Galaxy?
From the Web Interface
- Data can be uploaded into Galaxy from several sources.
- Small files (< 2 GB) can be directly uploaded through the web interface.
- Use the upload icon on the upper left corner of the Galaxy web interface and select “Choose local files”.
- Data can be uploaded from a remote site using a URL (http:// or ftp://).
- In addition, Galaxy has portals to several genomic repositories, e.g. UCSC, Ensembl, which allow searching and retrieving datasets directly into Galaxy.
- See more information on the Public Galaxy Upload page.
From the HPC cluster
- Files that are located on the HPC cluster can be directly imported into Galaxy.
- Copy the desired files to your Galaxy "upload directory", located at /cluster/tufts/galaxy/xfer/username/, replacing "username" with your Tufts username, which has the format of five letters followed by two numbers.
- Use the upload icon on the upper left corner of the Galaxy web interface and select “Choose FTP file”; your files should appear there.
- After the upload is successful, the files will be removed from the upload directory, so be sure to copy (not move) your data into the upload directory.
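The staging step above can be sketched in Python. This is a minimal sketch rather than an official Tufts tool: the function name `stage_for_galaxy` and the `xfer_root` parameter are hypothetical, and it assumes your upload directory already exists under the path given above. It copies rather than moves, so the originals survive after Galaxy clears the upload directory.

```python
import re
import shutil
from pathlib import Path

# Base path taken from the instructions above; everything else is illustrative.
GALAXY_XFER_ROOT = Path("/cluster/tufts/galaxy/xfer")

# Tufts usernames are described above as five letters followed by two numbers.
USERNAME_PATTERN = re.compile(r"[A-Za-z]{5}[0-9]{2}")


def stage_for_galaxy(files, username, xfer_root=GALAXY_XFER_ROOT):
    """Copy files into a user's Galaxy upload directory and return the new paths."""
    if not USERNAME_PATTERN.fullmatch(username):
        raise ValueError(f"unexpected username format: {username!r}")

    upload_dir = Path(xfer_root) / username
    if not upload_dir.is_dir():
        raise FileNotFoundError(f"upload directory not found: {upload_dir}")

    staged = []
    for src in map(Path, files):
        # copy2 leaves the original in place; only the staged copy is
        # removed once Galaxy has imported it.
        staged.append(Path(shutil.copy2(src, upload_dir / src.name)))
    return staged
```

A call such as `stage_for_galaxy(["sample_R1.fastq.gz"], "abcde01")` (both values are placeholders) would place a copy in the upload directory, after which the file should be selectable under “Choose FTP file”.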
Galaxy comes with a large set of applications for manipulating sequences and biological data (Filtering, Sorting, Text Manipulation, Genomic Interval Data). Following mapping of NGS data to a reference genome, much of the resulting data is columnar (BED, GFF, intervals) and Galaxy has a robust set of tools for manipulating and formatting this type of data.
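To make "columnar" concrete, here is a small Python sketch of the kind of filter-and-sort operation applied to BED intervals. It is not Galaxy's implementation; the function names, the `min_length` parameter, and the sample intervals are invented for illustration. It assumes tab-separated BED lines with the chromosome, start, and end in the first three columns.

```python
import io


def read_bed(handle):
    """Parse BED lines into (chrom, start, end, extra_columns) tuples."""
    records = []
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and the optional header lines BED files may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1]), int(fields[2]), fields[3:]))
    return records


def filter_and_sort(records, min_length=0):
    """Keep intervals of at least min_length bases, ordered by chromosome and start."""
    kept = [r for r in records if r[2] - r[1] >= min_length]
    # Chromosome names are compared as strings, so "chr10" sorts before "chr2".
    return sorted(kept, key=lambda r: (r[0], r[1]))


# Made-up intervals, purely for demonstration.
example = io.StringIO(
    "track name=example\n"
    "chr2\t500\t900\tpeakC\n"
    "chr1\t1000\t1050\tpeakB\n"
    "chr1\t100\t400\tpeakA\n"
)

for chrom, start, end, extra in filter_and_sort(read_bed(example), min_length=100):
    print(chrom, start, end, *extra, sep="\t")
```

With the sample data, the 50-base interval `peakB` is dropped and the remaining two intervals are printed in chromosome and start order.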
Galaxy has an extensive toolshed of bioinformatics tools available for processing NGS datasets (Mapping, QC, Variant Detection) as well as Motif tools, Metagenomic analysis, and others.
An analysis in Galaxy produces a series of steps, each with its own intermediate results. This record is called a history and can be saved and shared with others. An account can hold multiple histories. A workflow can be generated from a history and used to analyze new datasets in a consistent manner. Galaxy has a GUI tool for creating new workflows or uploading existing workflows that have been shared.
User Quotas
Tufts Galaxy is not designed for storing files. Each user has a quota of 250 GB. If you reach this quota, you will be unable to use Galaxy for analysis until you delete files and/or histories from within Galaxy. Freeing space this way might take a while, so it is good practice to download final data, plus any intermediate data of interest, when an analysis has finished.
Where Do I Go For Help?
- The Galaxy team has assembled an array of videos and other tutorials under Learn Galaxy.
- Research Technology Bioinformatics has several Galaxy tutorials.
- See the Galaxy support page for links to resources as well as mailing lists and other support venues.
- Watch the most current Galaxy videos here.
- If you encounter issues or need help using Tufts Galaxy, or need general help, please let us know!