Data File Storage (Aurora)¶
Aurora, EMSL’s scientific data archive, is a dedicated computer system specifically designed for long-term storage of data collected by EMSL instruments. It is available at no cost to EMSL users who are part of an active EMSL user project, and follows EMSL’s Data Management policies.
Aurora is safe
Designed to safely & securely hold data on a long-term basis
Stores multiple copies of data to protect from data loss in case of media failure
Expertly maintained by skilled information systems professionals from EMSL’s Molecular Sciences Computing Group
Monitored continuously for optimum performance
Aurora is free
Petabytes of available storage
Aurora is easy to use
Research groups own the data
Researchers have full control to share files as needed to best serve their research projects
Simple, straightforward access is available through Windows file sharing, FTP, and SCP protocols
About the Aurora File System (
Aurora is based on IBM’s High Performance Storage System (HPSS) with customizations specific to EMSL. The EMSL customizations provide simple access methods for users who are not accustomed to HPSS.
HPSS uses a combination of high-performance disk storage and high-capacity tape storage. This allows a good tradeoff between performance and expense for high-capacity storage systems. New files are written first to disk then copied to tape shortly thereafter. After a period of time, the disk copy will be deleted to save space, leaving the data on tape.
Aurora is intended for storage of important data that has long-term intrinsic value or data that is too costly or difficult to reproduce. Although no restrictions are placed on the format of data archived, the archive is never to be used to store:
Backups of PCs and workstations
Data for which the storage is regulated or required for regulatory compliance
Data where human health and safety depend on the access and accuracy of the data.
The following uses are permitted only with prior approval of the Molecular Sciences Computing Operations Capability Lead:
Storing of scratch files and intermediate results (only when such data is needed for ongoing research and the cost of recreating the data is prohibitive)
Storing of data produced for or by non-EMSL work.
Users who store data in violation of these policies may have the data removed and their archive access privileges suspended. Such users will be notified before any of their data is deleted.
The data owner controls access to the data. By default, only the owner of the data has any permissions on files and directories. The owner can grant access permissions to other users so that files can be shared among individuals or within a work group. This access can include any combination of read, write, and delete permissions. Projects can also own data; in this case, project members share access to the project’s data. As with any computer system, the administrators of the archive can always access data on the system. They will do so only for error recovery or when requested by authorized staff members.
Permissions are initially set when the user account or project account is created. If an account owner wishes to change those permissions, or to add users to or remove users from a project account, that owner should contact the Aurora Support Queue.
Transfer of Ownership¶
If users leave EMSL or projects within EMSL, any data they have placed in the archive is kept and can be transferred to the ownership of another staff member (such as a project manager, principal investigator, or scientist). The new owner then has complete control of the data and access controls. Even if a new owner is not immediately assigned, the data will not automatically be removed.
Getting an Aurora Account¶
All EMSL users who are part of an active EMSL user proposal are allowed access to Aurora. Request an account by contacting your PI (or by requesting an account using IOPS if you are an internal user).
If you will need more than 500 GB of space for storing your files, include this information in your request.
How to Transfer Files to Aurora¶
For users outside of PNNL, use one of these transfer methods with your SecurID credentials:
Globus endpoint emsl#archive.
For users inside PNNL, use one of these transfer methods with your Kerberos/PNL domain credentials:
CIFS (Windows mount) at
HSI transfer client installed on EMSL compute resources.
Transferring files with HTAR¶
HTAR will move your data into Aurora and TAR it up on the fly. Simple instructions:
/msc/bin/htar -cvf /path/to/destination/file.tar /path/to/source/directory
The -cvf options create (c) an archive, verbosely (v) report the incoming files, and name (f) the TAR file you’re creating. The last argument is the directory or list of files you want to archive.
HTAR will ask you for “principal” (this is your username) and then your password. Then it will begin creating the TAR file for you.
/msc/bin/htar -xvf /path/to/file.tar somedata
This will retrieve and extract the file “somedata” from file.tar. Leave the last argument off to extract all files.
For more information, see Lawrence Livermore National Lab’s great guide to HTAR usage: https://hpc.llnl.gov/manuals/ezstorage/htar-examples
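As a complement to the create and extract commands above, the sketch below stores a directory and then lists the resulting TAR file with HTAR's -t (list) option, so you can check what was archived without extracting it. The function name htar_store_and_list and the example paths are hypothetical; the HTAR variable defaults to the /msc/bin/htar path given above and is an assumption about your environment.

```shell
# Path from the text above; set HTAR to point at a different install if needed.
HTAR=${HTAR:-/msc/bin/htar}

# Create a TAR file in Aurora, then list its contents to confirm what was stored.
#   $1: destination TAR path in the archive (hypothetical)
#   $2: source directory to archive (hypothetical)
htar_store_and_list() {
    dest_tar=$1
    src_dir=$2
    # -c create, -v verbose, -f name of the TAR file
    "$HTAR" -cvf "$dest_tar" "$src_dir" || return 1
    # -t list the members of the TAR file without extracting them
    "$HTAR" -tvf "$dest_tar"
}

# Example call (hypothetical paths):
# htar_store_and_list /path/to/destination/file.tar /path/to/source/directory
```

HTAR will still prompt for your principal and password on each invocation unless your site has configured another authentication method.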
Transferring files with HSI¶
HSI is available on EMSL compute resources at /msc/bin/hsi. HSI is significantly faster than all other transfer methods and automatically handles files greater than 2 TB. HSI operates similarly to a command-line FTP client. Example:
-bash-4.2$ /msc/bin/hsi
Principal: username
Password:
Username: username UID: 21321 Acct: 0(0) Copies: 1 COS: 0 Firewall: off [hsi.6.0.0.p11 Thu Jan 31 14:15:58 PST 2019]
? lcd /home/username/mydata
? lls
file1 file2 file3
? cd /archive/username/tmp
? put file1 file2 file3
put 'file2' : '/username/tmp/file2' ( 0 bytes, 0.0 KBS (cos=1))
put 'file1' : '/username/tmp/file1' ( 0 bytes, 0.0 KBS (cos=1))
put 'file3' : '/username/tmp/file3' ( 0 bytes, 0.0 KBS (cos=1))
?
lcd: change local directory
cd: change remote directory. Note the leading
put: store a file. You can type “put” by itself to see syntax and available options.
put -R: recursively store a directory.
help: Get a full list of available HSI commands.
Aurora’s data is mounted at /archive on the Tahoma and Cascade login nodes. You should have a directory of your own under /archive/. For example, if your username is d30000, you might see:
[d30000@cu0login3 ~]$ df -h /archive
Filesystem Size Used Avail Use% Mounted on
fuse 3.9P 1.4P 2.5P 36% /archive
[d30000@cu0login3 ~]$ ls /archive/d30000
/archive/d30000/bjobsinfo.output
/archive/d30000/random-junk.out
/archive/d30000/n0.tar
/archive/d30000/xc_v321_x86_64_0.dvdiso
You can copy files to and from your archive directory mostly as you would expect from a normal file system. However, if you copy particularly large files into your archive directory, or particularly old files out of your archive directory, you may find those operations to be slow.
Recently created or recently accessed files in the archive are likely to be stored on disk, and access to them will be fast. Files that are particularly large or have not been accessed in some time may only be stored on tape. In this case, it may take up to a minute to access the file contents because Aurora’s tape robot must automatically retrieve the correct tape and begin transferring the data.
It is important not to try to search for data in large numbers of files while they are in the archive, as tape access times will cause such a process to be slow. Instead, retrieve copies of the files to a local file system with good performance and perform the operation(s) there.
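The retrieve-then-search advice above can be sketched as a small shell function: copy the files out of the archive once, then run the search against the local copies. The function name stage_and_search and all paths in the example call are hypothetical, and the sketch assumes the local directory has enough free space for the copied files.

```shell
# Copy files out of the archive once, then search the local copies.
#   $1: directory in the archive, e.g. /archive/d30000/logs (hypothetical)
#   $2: directory on a fast local or scratch file system (hypothetical)
#   $3: text pattern to search for
stage_and_search() {
    archive_dir=$1
    local_dir=$2
    pattern=$3
    mkdir -p "$local_dir" || return 1
    # One sequential copy; any tape recalls happen here, not during the search
    cp -r "$archive_dir"/. "$local_dir"/ || return 1
    # Print the names of the local files that contain the pattern
    grep -rl -- "$pattern" "$local_dir"
}

# Example call (hypothetical paths):
# stage_and_search /archive/d30000/logs /tmp/d30000-logs "converged"
```

Repeated searches then read only from the local copy, so tape access is paid for once rather than on every pass.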
When storing data in Aurora, it is important to store fewer, larger files rather than many smaller files. Users are encouraged to use utilities such as ZIP and TAR to bundle their data before transferring it to Aurora for long-term storage.
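One way to follow this recommendation is to bundle a directory of small files into a single compressed TAR file before copying it to the archive. The sketch below uses the standard tar utility; the function name bundle_for_archive and the paths in the example are hypothetical.

```shell
# Bundle a directory of many small files into one compressed TAR file.
#   $1: directory holding the small files (hypothetical)
#   $2: TAR file to create (hypothetical)
bundle_for_archive() {
    src_dir=$1
    out_tar=$2
    # -C switches to the parent directory so the TAR file stores relative paths
    tar -czf "$out_tar" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # Read the TAR file back to confirm it is intact before relying on it
    tar -tzf "$out_tar" > /dev/null
}

# Example (hypothetical paths):
# bundle_for_archive /home/d30000/run42 /home/d30000/run42.tar.gz
# cp /home/d30000/run42.tar.gz /archive/d30000/
```

The archive then holds one large file instead of many small ones, which suits tape storage better.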