Data Retrieval

Genomics Next-Gen Sequence Retrieval

The RTSF Genomics Core FTP server now requires secure (FTPS) connections.

To protect the integrity and security of the MSU network and its systems, and to permit users outside the MSU network to once again access the RTSF Genomics Core file distribution server, we now enforce secure (encrypted) connections only. This change ensures that usernames and passwords are no longer passed between the client and host in clear text.

For you, this means using an FTP client application capable of Secure FTP (FTPS) using TLS and configuring it to use an explicit TLS connection. Nearly all current FTP client software is capable of FTPS using TLS. When configuring your software for FTPS, use Explicit TLS only, never Implicit TLS. We cannot provide configuration instructions for every FTP client, but below are examples for two popular ones.

FileZilla (Windows/Mac/Linux)

Open the Site Manager and create a New Site for your RTSF FTPS account.

Enter the host name                 titan.bch.msu.edu

Select Protocol                     FTP-File Transfer Protocol

Select Encryption                   Require explicit FTP over TLS

Enter the username and password provided to you. The settings should look like:

[Screenshot: FileZilla TLS configuration]

Transmit (Mac)

Click on the FTP tab in a new connection window to get the “Connect to FTP Server” dialog.

Enter the host name              titan.bch.msu.edu

Select the radio button          FTP with TLS/SSL

Enter the username and password provided to you. The settings should look like:

[Screenshot: Transmit TLS configuration]

After clicking Connect in whatever client software you are using, you may be presented with a notice that it is unable to verify the server’s certificate. This is normal, and you may proceed by clicking OK, Connect, or the equivalent button.

 

Transferring data from the RTSF FTP server using the command line program wget

The program wget may be used to transfer data from the RTSF FTP server using the now-required encrypted FTP with TLS/SSL (FTPS). wget has specific requirements for FTPS support, and the wget provided with most standard Linux distributions does NOT support FTPS. Only wget 1.17 or later can support FTPS, and then only if that support was included when the program was compiled. If you need to install a newer version of wget on your system, download the latest version (1.19.1 as of this writing) from the GNU Source Repository (http://ftp.gnu.org/gnu/wget/). When running the configure program prior to compiling, be sure to include the option "--with-ssl".
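The build steps described above might look roughly like the following sketch. The archive name (wget-<version>.tar.gz), the install prefix $HOME/wget, and the function names wget_tarball_url and build_wget are assumptions made for illustration. A TLS development library (GnuTLS or OpenSSL headers) must already be installed for the "--with-ssl" option to succeed.

```shell
# Sketch only: download, build, and install wget with TLS support.
# Assumptions: the archive is named wget-<version>.tar.gz on the GNU server,
# and $HOME/wget is an acceptable (hypothetical) install prefix.

wget_tarball_url() {
    # $1 is the wget version to fetch, for example 1.19.1
    printf 'http://ftp.gnu.org/gnu/wget/wget-%s.tar.gz\n' "$1"
}

# The body runs in a subshell, so the caller's working directory is unchanged.
build_wget() (
    version=$1
    prefix=${2:-$HOME/wget}
    # The existing system wget can still fetch over plain HTTP.
    wget "$(wget_tarball_url "$version")" &&
    tar -xzf "wget-$version.tar.gz" &&
    cd "wget-$version" &&
    ./configure --with-ssl --prefix="$prefix" &&
    make &&
    make install
)
```

With these definitions loaded, running build_wget 1.19.1 would place the new binary at $HOME/wget/bin/wget, which can then be called by its full path or added to PATH.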

If you run wget as described below and receive the error message 'Unsupported scheme "ftps"', the version of wget you are using does not support FTPS.
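It is possible to check for likely FTPS support before attempting a transfer. The sketch below inspects the text printed by wget --version. It assumes that the first line has the form "GNU Wget <version> ..." and that a TLS-enabled build lists a "+ssl" feature (for example "+ssl/openssl"). The function name wget_supports_ftps is hypothetical, and a passing check is a strong hint rather than a guarantee.

```shell
# Sketch: guess from `wget --version` output whether FTPS is supported.
# Assumption: TLS-enabled builds list a "+ssl" feature such as "+ssl/openssl".

wget_supports_ftps() {
    # $1 is the full text printed by `wget --version`.
    info=$1
    # Pull the version number out of the first line.
    version=$(printf '%s\n' "$info" | sed -n '1s/^GNU Wget \([0-9][0-9.]*\).*/\1/p')
    [ -n "$version" ] || return 1
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    # Versions before 1.17 cannot support FTPS at all.
    if [ "$major" -eq 1 ] && [ "${minor:-0}" -lt 17 ]; then
        return 1
    fi
    # TLS support must also have been compiled in.
    printf '%s\n' "$info" | grep -qF -- '+ssl'
}
```

A typical call would be: wget_supports_ftps "$(wget --version)" && echo "FTPS looks supported"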


Special considerations for using wget on the MSU HPCC cluster:

-To load the appropriate version of wget (version 1.19) on the HPCC, use the following command:

[user@gateway-01]$ module load wget

-FTPS transfer from the RTSF FTP server to the HPCC only works if you are currently working from the gateway (login) node. Do not move from the gateway to one of the development nodes prior to transferring your data. Once you have transferred your data to your home directory on the HPCC it will of course be available anywhere on the HPCC.

 

The command syntax to transfer sequence data from the RTSF FTP server to your remote server is:

$ wget -r -np -nH --ask-password ftps://<username>@<hostname>/<directory_name>

Enter your password when prompted.

The username, password, and hostname are the ones provided to you (usually via email) when you first received data from the MSU RTSF Genomics Core. directory_name is the name of the subdirectory in your FTP account that contains the data for the run(s) you want to download. This subdirectory name is provided in the email notifying you that your data is available.

This command will create a new subdirectory named <directory_name> in your current working directory containing all of the FastQ files for that run.
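For repeated downloads, the command above can be wrapped in a small helper. This is a minimal sketch: the function names rtsf_url and rtsf_fetch are hypothetical, and it assumes the username and directory name contain no characters that need URL escaping.

```shell
# Sketch: build the FTPS URL and run the wget command shown above.
# rtsf_url and rtsf_fetch are hypothetical helper names.

rtsf_url() {
    # Usage: rtsf_url <username> <hostname> <directory_name>
    if [ "$#" -ne 3 ]; then
        echo "usage: rtsf_url <username> <hostname> <directory_name>" >&2
        return 2
    fi
    printf 'ftps://%s@%s/%s\n' "$1" "$2" "$3"
}

rtsf_fetch() {
    # Same arguments as rtsf_url; wget prompts for the password.
    url=$(rtsf_url "$@") || return
    wget -r -np -nH --ask-password "$url"
}
```

Calling rtsf_fetch with your own username, hostname, and directory name runs the same transfer and creates the subdirectory in the current working directory.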