Update on MyFiles work, 9 April


[09 April 2014]

Over the past three weeks we have seen significant issues affecting the Student and Staff “MyFiles” services. This has led to periods when the services were unavailable while work took place to recover them and, in the case of Staff MyFiles, to reconfigure the data structure so that it performs correctly.

Background

Between 14 and 20 March there were problems which affected both the staff and student MyFiles services, and full recovery took until 27 March. Information on these problems is in the attached document MyFiles Services Incidents, 14-26 March 2014 (UoB only).

To carry out the recovery, we put in place new filestore components, but we made clear that these are not long-term solutions. We have identified work that needs to be carried out over the next few weeks to increase performance, resilience and reliability.

Recent issues

On Sunday 30 March and again in the early morning of 3 April we saw disruption to the staff MyFiles service when hardware on some elements of the new filestore failed. The staff MyFiles service became unavailable for some users for short periods. The disk controller at fault was restarted on both occasions and access was restored. The supplier has since identified that they made an error with a factory setting.

The student MyFiles service encountered a further problem on 2 April when under heavy load, and we had to take emergency action, taking the service down at short notice to avoid damage to data. A limited service was quickly put in place, which gave students access to a subset of their files while the main disk volume was repaired. The full service was made available on the evening of 3 April.

On the morning of 8 April, many staff reported problems with access to MyFiles. The cause was a problem with the network Domain Name System (DNS) and not with the MyFiles storage.

Next steps

We have additional hardware resources on order, arriving later this week and next. Using these, we will be able to divide up the Student MyFiles data volume and move the already sub-divided chunks of Staff MyFiles data to better-provisioned volumes that perform better. This will make the data more manageable, i.e. we can run normal nightly checks and de-duplication activity and be confident these will run to completion, increasing reliability. There is no single point of failure in the system, but the amount of data we are restructuring (around 100 TB) means that the work cannot be completed immediately.

We estimate that the work will take a few weeks. Until it is complete there remains a risk of further “go-slows”, and there may need to be some planned maintenance or downtime. Once this work is completed, service risks should reduce very considerably; if problems were encountered, a much smaller cohort of staff or students would be affected at any one time.

Longer term

Work will not stop there. We have commissioned a storage consultant to assist us with the longer-term plan. They are already with us and are taking a current picture of all our file storage, departmental and personal. They will work with us to look at options for the future and make further recommendations over the coming days. These will take weeks to consider and months to implement, but will mean that we reduce the risk of future incidents of this nature as far as possible, as well as giving us improved storage for all services.

Changes to Filestore management

There are two significant strands that will be implemented for future management of the service:

  1. We will have in place advertised periods of maintenance to the Filestore service. The details have not been finalised, but these may include periods when the service is unavailable or read-only for a time for essential maintenance work.
  2. We will implement and promote clear guidelines on what data can be stored in Filestore locations; this includes use of Central Filestore, Google Drive and the Research Data Storage Facility. We will impose a quota on the amount of data that can be stored in each area, where that is not already in place. We will assist and guide staff to store data in the most appropriate locations.

Advice to staff

  • Please review your data and ensure that only appropriate data is stored in MyFiles: work-related documentation for your use (but not research data).
  • Investigate other storage services available and feel free to use them or request access - see the webpage for information: http://www.bristol.ac.uk/it-services/applications/whichfilestore.html
  • If you require support for any work necessary, or think you may have a problem with your data in MyFiles, please contact the IT Service Desk.