Data Transfer

We describe some typical tools for transferring data from one host to another, from small to large volumes.

Small transfers

SMB / SFTP / FileZilla / Cyberduck

  • User-friendly graphical interface instead of command line
  • Cross-platform

The easiest solution is to mount the groupshare on one computer and copy the files manually with drag and drop. This is only recommended for transfers of up to several gigabytes.

rsync

  • Best all-round solution; works for both small and large transfers.
  • Compares source and target, and copies only files that are missing or have changed.
  • Can resume an interrupted transfer without re-copying files that were already transferred.

Typical usage:

rsync -avP /path/to/local/folder/ dphysuser@login.phys.ethz.ch:/home/groupname/subfolder/

See man rsync for full documentation of all available options.
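
Because rsync skips what is already present at the target, an interrupted transfer can be resumed by running the same command again. The following is a minimal sketch that wraps this in a retry loop. The function name rsync_retry, the default of 5 attempts and the 30 second wait are illustrative assumptions, not part of rsync.

```shell
#!/bin/sh
# Sketch: re-run rsync until it succeeds, so that an interrupted transfer
# continues where it stopped. Name, retry count and wait time are assumptions.

# rsync_retry SRC DEST [MAX_TRIES] [WAIT_SECONDS]
rsync_retry() {
    src=$1
    dest=$2
    max_tries=${3:-5}
    wait_seconds=${4:-30}
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # -a archive mode, -v verbose, -P short for --partial --progress
        if rsync -avP "$src" "$dest"; then
            echo "transfer complete after $try attempt(s)"
            return 0
        fi
        echo "attempt $try of $max_tries failed" >&2
        # Wait only if another attempt follows
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$wait_seconds"
        fi
        try=$((try + 1))
    done
    return 1
}

# Example call with the placeholder paths from the usage line:
# rsync_retry /path/to/local/folder/ dphysuser@login.phys.ethz.ch:/home/groupname/subfolder/
```

Note that the trailing slash on the source path matters: with it, rsync copies the contents of the folder; without it, rsync creates the folder itself inside the target. Adding -n (--dry-run) to the options lists what would be transferred without copying anything.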

Large transfers

Globus Online & GridFTP

  • Best solution to transfer several terabytes of data
  • Uses the high-performance data transfer protocol GridFTP
  • May not be supported by all universities.
    • CSCS supports the command line (GridFTP with SSH authentication)
    • CSCS supports Globus.org; endpoint: CSCS Globus Online Endpoint

via Globus web interface

  • Open a browser and go to Globus.org
  • You can log in with your n.ethz account credentials
    • Select ETHZ - ETH Zurich
  • or create a globus.org account
  • Select File Transfer
  • Use our endpoint
    • D-PHYS ETH Zurich

via command line

Usage of globus-url-copy to copy data from CSCS to D-PHYS:

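# Log in to the CSCS front end first:
ssh <cscsuser>@ela.cscs.ch

# -rst: retry failed transfers, -cd: create destination directories, -r: recurse into subfolders,
# -p 4: four parallel streams per file, -cc 4: four files at a time
globus-url-copy -rst -cd -r -p 4 -cc 4 \
  sshftp://<cscsuser>@gridftp.cscs.ch/path/to/folder/ \
  sshftp://<dphysuser>@login.phys.ethz.ch/home/<groupshare>/path/to/subfolder/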
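
When the same transfer is repeated for several folders, it can help to assemble the command from a few variables and check it before running. The sketch below only prints the command line. The function name gridftp_cmd and its argument order are hypothetical; the host names and flags are those of the command above.

```shell
#!/bin/sh
# Sketch: build the globus-url-copy command line from parameters and print it
# for review. Nothing is transferred. The function name is an assumption.

# gridftp_cmd CSCS_USER DPHYS_USER SRC_PATH DEST_PATH [STREAMS] [FILES]
gridftp_cmd() {
    cscs_user=$1
    dphys_user=$2
    src_path=$3      # absolute path on the CSCS side, ending in / for a folder
    dest_path=$4     # absolute path on the D-PHYS side, ending in /
    streams=${5:-4}  # value for -p (parallel streams per file)
    files=${6:-4}    # value for -cc (files transferred concurrently)
    printf 'globus-url-copy -rst -cd -r -p %s -cc %s sshftp://%s@gridftp.cscs.ch%s sshftp://%s@login.phys.ethz.ch%s\n' \
        "$streams" "$files" "$cscs_user" "$src_path" "$dphys_user" "$dest_path"
}

# Example call with hypothetical user names and paths:
# gridftp_cmd alice bob /scratch/run1/ /home/grp/run1/
```

The printed line can be inspected and then pasted into the shell on the CSCS front end. Suitable values for the two numeric parameters depend on the network and on the number and size of the files.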

Further reading: Documentation by CSCS, Parameter descriptions

multirsync

  • GitHub
  • Spawns multiple rsync processes (one for each subfolder) for faster transfers
  • Speedup depends on folder structure
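
The idea of one rsync process per subfolder can be sketched with standard tools. This is not the code of multirsync, only a minimal illustration. It assumes a find and xargs that support -maxdepth, -print0, -0 and -P (GNU and BSD versions do); the function name parallel_rsync and the default of 4 jobs are hypothetical.

```shell
#!/bin/sh
# Sketch: run one rsync per top-level subfolder, several at a time.
# Not the multirsync implementation; names and defaults are assumptions.

# parallel_rsync SRC_DIR DEST [JOBS]
parallel_rsync() {
    src=${1%/}    # strip a trailing slash so paths are built consistently
    dest=${2%/}
    jobs=${3:-4}
    # Each subfolder is handed to its own rsync; at most $jobs run at once.
    # Without a trailing slash on the source, rsync recreates the subfolder
    # inside the destination.
    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$jobs" -I{} rsync -a {} "$dest/"
    # Files that sit directly in SRC_DIR are not covered above, so copy
    # them in a final pass that skips all directories.
    rsync -a --exclude='*/' "$src/" "$dest/"
}

# Example call with placeholder paths:
# parallel_rsync /path/to/local/folder dphysuser@login.phys.ethz.ch:/home/groupname/subfolder
```

If most of the data sits in a single subfolder, one rsync process ends up doing nearly all the work, which is why the speedup depends on the folder structure.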