CloudForest Policies


28 May 2019

CloudForest is a tool for running computations on your data that exceed the capacity of GSB’s on-premises research servers or other resources. It is designed to be used temporarily by you and your research team for specific compute tasks. CloudForest is not appropriate for storing or archiving data, or as a long-term replacement for existing on-premises computing environments. The system has the following policies to help keep costs manageable, access fair, and your data and work secure.

Access

  • Each GSB faculty member can sponsor a single group for CloudForest access.

  • CloudForest users can be in any number of groups, so long as they are added by administrators of those groups.

  • Only researchers with a valid SUNet ID can be granted access to the CloudForest dashboard and instances.

  • Access to CloudForest instances requires SUNet ID/password as well as DUO two-factor authentication. You will have the opportunity to set your password to anything on CloudForest instances, but must use your SUNet ID and Stanford two-factor device(s).

  • Access to CloudForest instances is granted to all members of the associated groups, and all users are standard (non-sudo) users of the machines. That is, users do not have administrator access to a CloudForest instance.

  • We offer three levels of access restriction: low, where the instance can be reached from anywhere (any IP address); medium, where the instance can only be reached from Stanford IP addresses; and high, where the instance can only be reached from a single IP address. These access restriction levels are not associated with Stanford Risk Classifications.

  • GSB’s DARC team acts as administrator for all CloudForest instances. DARC staff can log into any instance at any time as a root user for investigative or maintenance tasks, and given sufficient justification (e.g., a security breach or policy violation) can stop and/or terminate any instance at any time, with or without notice. DARC respects researcher activities and will endeavor to keep system administration as unobtrusive as possible while keeping users informed of any major actions required.
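
The three access restriction levels described above can be read as a simple rule on the connecting IP address. The following is a minimal sketch of that rule, not CloudForest's actual implementation: the function name is_allowed is hypothetical, and CAMPUS_NETWORKS is a placeholder (a reserved documentation range) because this policy does not list the real Stanford address ranges.

```python
import ipaddress

# Placeholder standing in for "Stanford IP addresses". The real ranges are
# not given in this policy; 192.0.2.0/24 is a reserved documentation range.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_allowed(level, client_ip, single_ip=None):
    """Return True if client_ip may reach an instance at the given level."""
    addr = ipaddress.ip_address(client_ip)
    if level == "low":
        # Low: reachable from any IP address.
        return True
    if level == "medium":
        # Medium: reachable only from campus addresses (placeholder list).
        return any(addr in net for net in CAMPUS_NETWORKS)
    if level == "high":
        # High: reachable only from one designated IP address.
        return single_ip is not None and addr == ipaddress.ip_address(single_ip)
    raise ValueError(f"unknown access level: {level!r}")
```

For example, under these assumptions is_allowed("medium", "203.0.113.7") is False because that address is outside the placeholder campus range.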

Automation

  • Idle instances will be stopped automatically after $20 of spend or 24 hours of inactivity, whichever comes first. The more resources you require, the faster you will reach this limit.

  • For the purposes of these policies, an instance is idle over a particular time period (longer than 5 minutes) whenever the maximum 5-minute averaged CPU utilization during that period is less than 1%. DARC can provide users access to CPU utilization values and statistics, by request, to better understand their usage.

  • You will receive a warning email after an hour of instance inactivity, a warning email an hour before the instance will stop, and a notification email after your instance has been automatically stopped.

  • Stopping your instance will not delete data stored on the instance volumes, but may lose data not flushed to disk (e.g., in program workspace). DARC will not keep an idle instance alive by request; it is your responsibility to be consistently running useful work on CloudForest instances, writing results to disk, and moving data off instances to persistent storage.

  • Running software whose sole purpose is to circumvent our autostop policy (e.g., stress) will result in instance termination.
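
The idle definition and auto-stop limits above can be sketched as follows. This is an illustration under stated assumptions rather than DARC's monitoring code: the function names are hypothetical, the input is assumed to be a list of 5-minute averaged CPU utilization values in percent, and the $20 limit is assumed to accrue at a flat hourly rate while the instance is idle.

```python
IDLE_CPU_THRESHOLD = 1.0  # percent, from the policy
IDLE_SPEND_LIMIT = 20.0   # dollars, from the policy
IDLE_TIME_LIMIT = 24.0    # hours, from the policy


def is_idle(cpu_5min_averages):
    """True if the maximum 5-minute averaged CPU utilization is below 1%."""
    if not cpu_5min_averages:
        raise ValueError("need at least one 5-minute sample")
    return max(cpu_5min_averages) < IDLE_CPU_THRESHOLD


def hours_until_autostop(hourly_cost):
    """Idle hours before auto-stop: the spend or time limit, whichever comes first.

    Assumes spend accrues at a flat hourly_cost (dollars per hour).
    """
    if hourly_cost <= 0:
        return IDLE_TIME_LIMIT
    return min(IDLE_TIME_LIMIT, IDLE_SPEND_LIMIT / hourly_cost)
```

Under these assumptions, an instance costing $4 per hour would reach the spend limit after 5 idle hours, while one costing $0.50 per hour would be stopped by the 24-hour limit instead.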

Data Retention

  • CloudForest is not designed to store and archive data.

  • CloudForest is not (yet) suitable for use with high-risk data.

  • Data transferred to and generated on CloudForest instances are not backed up. Please move final and/or intermediate results to an appropriate long-term storage solution. DARC can help you find the appropriate location for long-term data storage.

  • If you have a large data set (100s of GBs) to analyze repeatedly on CloudForest, please contact us; we can help you avoid spending time copying the data to separate instances multiple times.

  • Because CloudForest is designed as a temporary resource, instances will be terminated (deleted from AWS) after being stopped for 30 days. Data stored on any volumes associated with the instance will also be deleted. If you wish to retain your data, DARC can help you find the appropriate solution for long-term data storage.

Availability

  • You can only create, stop, re-start, and delete instances from the CloudForest dashboard.

  • CloudForest instances are currently created automatically, and are typically available and ready for use within a few minutes.

  • CloudForest instances are launched in AWS, and are not 100% reliable. Problems, though rare, do occur. DARC does not currently run replicated clusters for you; if an instance becomes unreachable or fails, you may have to restart work when resources again become available.

  • Moreover, not every instance type is always available under standard caps placed by AWS on our services. We cannot guarantee the availability of any instance type at any time for you.

Computation

  • Approved users (from user groups) can launch a CloudForest instance and use it to support their research for active computing tasks over a reasonably limited amount of time.

  • CloudForest instances have the same set of software as the yen servers. Customizing software is possible within each user’s own space; e.g., using pip install --user ... with python.

  • Users are responsible for monitoring the progress of their work, moving data off of instances to persistent storage, and stopping/terminating the instance when their work is completed.

  • CloudForest instances will be closely monitored, and thus we can provide you with feedback on how well you are utilizing your instances. DARC reserves the right to use monitoring to restrict available instance types to improve computations and/or reduce costs, potentially including downsizing instances where appropriate.
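
Because users are standard (non-sudo) users, packages installed with pip install --user go into a per-user directory. The short sketch below uses only Python's standard site module to show where that directory is for the current interpreter; the actual path on a CloudForest instance is not stated in this policy and will depend on the installed Python.

```python
import site
import sys

# Directory where `pip install --user` places packages for this interpreter.
user_site = site.getusersitepackages()

# True when that directory is already on the import path for this session.
# It can be False inside a virtual environment or before the first
# --user install has created the directory.
on_path = user_site in sys.path

print(f"user site-packages: {user_site}")
print(f"on sys.path: {on_path}")
```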

Storage

  • It is your responsibility to transfer, organize, and manage data on CloudForest instances.

  • DARC will snapshot (back up) and then delete the data volumes associated with a CloudForest instance as soon as the instance is stopped, and only then, so that snapshotting does not interfere with running computations. Users can also initiate snapshots themselves with the cfsave tool installed on all instances, with two caveats: (i) running cfsave during a computation can interrupt the work, and (ii) CloudForest keeps only one backup per instance, so calling cfsave a second time overwrites the previous backup. Instances and the data stored on them can be reconstructed and restarted from these snapshots.

  • To maintain strong security on CloudForest instances, DARC will never back up instance root volumes. This ensures that we can always restart instances with the latest available OS, software, and security features. DARC encourages you to use the home and data volumes as much as possible, especially for any data or software you will need available on restart.

  • DARC will delete both snapshots and volumes (as applicable) when instances are terminated. At this point the instances are not recoverable.

  • Snapshots incur costs too. Data transferred to and generated on CloudForest instances will be stored in snapshots for 30 days after instances are stopped, after which they will be deleted.

  • An email notice will be sent to users at the end of the 30-day storage period. You will have the option of either requesting a data storage extension or transferring the data from CloudForest to another location. A limited number of extensions can be granted for another 30 days, after which snapshots will be deleted.
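
The retention timeline above amounts to simple date arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes each granted extension adds exactly 30 days to the original 30-day window, which the policy does not spell out.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # from the policy


def snapshot_deletion_date(stopped_on, extensions=0):
    """Date on which snapshots become eligible for deletion.

    Assumes each granted extension adds another 30 days.
    """
    if extensions < 0:
        raise ValueError("extensions must be non-negative")
    return stopped_on + timedelta(days=RETENTION_DAYS * (1 + extensions))


# An instance stopped on 28 May 2019, with no extension and with one.
print(snapshot_deletion_date(date(2019, 5, 28)))      # 2019-06-27
print(snapshot_deletion_date(date(2019, 5, 28), 1))   # 2019-07-27
```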

Security

  • Logging into CloudForest instances requires a valid SUNetID and Stanford DUO two-factor authentication.

  • CloudForest is not (yet) suitable for storing or computing with high-risk data.

  • Using CloudForest to (temporarily) store data and compute with it must be allowable under any terms of usage for the associated data in any applicable ToS, NDA, or DUA documents. Contact DARC if you are unsure if CloudForest is compliant with your contracts.

  • Regardless of perceived data risk level, DARC recommends users contact University IT regarding whether a Data Risk Assessment (DRA) should be completed before undertaking their research. DARC reserves the right to require faculty and/or their collaborators undertake a DRA to utilize CloudForest, depending on the content of the data and associated analysis.

  • CloudForest users may not share their access credentials with any other party, and may not modify CloudForest instances to change security settings or grant access to anyone except through DARC. If any member of a group does so, DARC will immediately revoke that user’s CloudForest access and pause access for every group they belong to while it investigates.

Budget

  • CloudForest machines have an hourly cost, regardless of whether they are being used. This cost has two components: charges for the use of the machines, and charges for the use of the data storage devices associated with the machines.

  • To facilitate effective cost control, DARC will make its best effort to provide users with cost estimates before instances are launched and an always-available site with up-to-date information about current and historical usage.

  • CloudForest does not currently have any faculty-specific spending limits. However, DARC reserves the right to impose budgets if the usage pool expands.
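
The two-component hourly cost described above can be estimated with a one-line calculation. This is a rough sketch, not the estimator DARC provides: the function name and the example rates are hypothetical, and both components are assumed to be flat dollars-per-hour rates.

```python
def estimate_cost(hours, instance_rate, storage_rate):
    """Estimated cost in dollars of keeping an instance for `hours` hours.

    instance_rate and storage_rate are assumed flat hourly rates (dollars
    per hour) for the machine and its attached storage, respectively.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * (instance_rate + storage_rate)


# Hypothetical rates: $0.50/hour for the machine, $0.25/hour for storage.
print(estimate_cost(8, 0.50, 0.25))  # 6.0
```

Because the cost accrues whether or not the machine is in use, the same calculation applies to hours an instance spends idle before it is stopped.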