Data Center Operational Safety and Security
Why do so many corporate companies run their own data centers and decline to move to the cloud? Many surveys show that the top reason is security, which can be either physical or logical. Closely related to security is safety. Because this is a hot topic, we will focus on the physical security aspects, which are unavoidable when considering a data center.
So what is security in terms of a data center? Data center security is the set of policies, precautions, and practices adopted to avoid unauthorized access to, and manipulation of, a data center’s resources. Physical security typically includes a set of physical security measures (e.g. perimeter barriers, access control, fencing, etc.), electronic security systems (e.g. access control, cameras, alarms, exterior lighting, etc.), and operational security (which focuses on policies, processes, training, written and unwritten protocols, and personnel). Most security failures occur on the operational side. As such, operational security should be included in a comprehensive security assessment.
The term safety refers to the condition of being protected from things that are likely to cause harm. It can also refer to the state in which a person has control over risk-causing factors and is therefore protected against unintended harm.
In this article, we will discuss the operational security and safety procedures of a data center and its various considerations.
Difference between safety and security
One of the primary differences between the two terms is their definition. Security refers to the protection of individuals, organizations, and property against external threats that are likely to cause harm. It is generally focused on ensuring that external factors do not cause trouble or unwelcome situations for the organization, the individuals, and the property within the premises. Safety, on the other hand, is the condition of being protected from factors that cause harm; an individual who controls the risk-causing factors is in a safe position. In practice, any discussion of security also includes steps for safety.
Operational security covers the major precautions that must be followed to avoid operational risks. Operational risk summarizes the uncertainties and hazards a company faces when it carries out its day-to-day business activities within a given field or industry. Data center facility operations should implement the following policies:
- Specific data center work rules, which are thoroughly reviewed and signed by each individual before entering the facility the first time (and again annually).
- Limited access: minimize the number of people permitted unescorted access.
- Shipping/receiving only on a planned basis; unscheduled deliveries are turned away.
- Computer hardware installation planned in advance by a team of IT and facilities individuals
- Power and network cabling connections performed only by designated and trained individuals
- Team development and continuous training
The operational risk management cycle guides us to identify and eliminate security issues and to apply safety measures.
Now let us look into some of the operational security and safety measures in data centers:
Site Induction
This is one of the most important training sessions for anyone stepping into a data center for the first time. Site induction training is designed to provide specific information covering safety and working rules, and it is delivered by someone from the DC operations team to the person entering the facility. Inductions cover both permanent data center staff (once a year and whenever the rules are amended) and visiting contractors. Site inductions ensure safe practices are carried out within the data center and provide the initial working authorization. All personnel entering the data center should be signed off as having completed the induction process. Depending on the site complexity, site induction training may take from 15 minutes to a few days.
Risk Assessment
Risk assessment is an inherent part of a broader risk management strategy to “introduce control measures to eliminate or reduce” any potential risk-related consequences. Risk assessment consists of an objective evaluation of risk in which assumptions and uncertainties are clearly considered and presented.
Risk assessment is the combined effort of:
- Identifying and analyzing potential (future) events that may negatively impact individuals, assets, and/or the environment (i.e. risk analysis).
- Making judgments “on the tolerability of the risk on the basis of risk analysis” while considering influencing factors (i.e. risk evaluation).
Put in simpler terms, a risk assessment determines possible mishaps, their likelihood and consequences, and the tolerances for such events.
The first step in risk assessment is to establish the context of the given task. This restricts the range of hazards (anything that can cause harm) that need to be considered. Risk assessment is something you will be required to do many times in a data center environment. For example, when you need to drill into a data center wall for new connections, you may need to prepare a risk assessment document and get it approved by the relevant teams before starting the work.
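To make the idea of likelihood, consequence, and tolerance more concrete, here is a minimal sketch of a scoring matrix for the wall-drilling example. The 1 to 5 rating scale, the band thresholds, and the listed hazards are all assumptions made for illustration; a real site would use the scale and tolerances defined by its own safety team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    """One identified hazard with illustrative 1-5 ratings (assumed scale)."""
    description: str
    likelihood: int   # 1 = rare, 5 = almost certain
    consequence: int  # 1 = negligible, 5 = catastrophic

    def score(self) -> int:
        # Risk score as likelihood multiplied by consequence.
        return self.likelihood * self.consequence


def tolerability(score: int, low_max: int = 4, medium_max: int = 12) -> str:
    """Map a score to a band; the thresholds are assumptions, not a standard."""
    if score <= low_max:
        return "tolerable"
    if score <= medium_max:
        return "reduce with control measures"
    return "do not proceed"


# Hypothetical hazards for drilling a data center wall.
hazards = [
    Hazard("Dust reaching IT equipment", likelihood=4, consequence=3),
    Hazard("Drilling into a hidden cable", likelihood=2, consequence=5),
    Hazard("Noise for nearby staff", likelihood=2, consequence=1),
]

# Review the highest-scoring hazards first.
for h in sorted(hazards, key=Hazard.score, reverse=True):
    print(f"{h.description}: score {h.score()} -> {tolerability(h.score())}")
```

Sorting by score simply puts the hazards that most need control measures at the top of the assessment document.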
Method Statement
Method statements, sometimes called ‘safe systems of work’, are documents that detail the way in which tasks should be completed to adhere to safe working practices. Put simply, a method statement is a sequence of specific steps taken to complete a work task in a safe manner.
Method statements are frequently requested as part of many processes to gain an insight into your organization and the way you operate. It’s your opportunity to show how you’ll provide goods and services in a safe and high-quality manner. The method statement should be written by a person who is competent in the task. When a method statement is prepared, the risks in the work sequence are identified, so the necessary precautions can be prepared to avoid them. The method statement should explain in detail the work that is to be undertaken, which may also include:
- Sequence of operating tasks
- Workplace details and areas to be segregated
- Detail of emergency or operational recovery procedures
One example of when you have to submit a method statement in a data center is receiving an item that is heavy and involves risks: you must submit a method statement and get the necessary approvals from the capacity management and safety teams.
Permit to Work (PTW)
A Permit to Work (PTW) is a formal management system used to control high-risk work. It ensures that the work intended to take place is properly authorized and that specific control measures are implemented. A work permit is a written record that authorizes specific work, at a specific location, for a specific time period.
- Permits are used to control and coordinate work in order to establish and maintain safe working conditions.
- A permit to work ensures that all foreseeable hazards have been considered and that the appropriate precautions are defined and carried out in the correct sequence.
- Change management processes are also used to authorize activities.
- It ensures that all persons who have control of, or are affected by, the activity are aware of the activity procedure.
Usually, a permit to work (PTW) in a data center is reviewed and approved by the HSE (health, safety, and environment) department.
One question you may ask at this point is, “Aren’t a method statement and a permit to work the same thing?” The answer is no. Even though the two descriptions look similar, the method statement is only a description of how the task or activity will be safely carried out. A permit to work is a system used to control high-risk activities to ensure that adequate controls are in place before work is allowed to commence.
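Since a permit authorizes specific work, at a specific location, for a specific time period, it can be pictured as a small record with a validity check. The sketch below is only an illustration: the `PermitToWork` class, its field names, and the sample values are hypothetical and do not come from any particular PTW system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PermitToWork:
    """Hypothetical permit record: specific work, location, and time window."""
    work: str
    location: str
    valid_from: datetime
    valid_until: datetime
    approved_by_hse: bool = False

    def authorizes(self, work: str, location: str, when: datetime) -> bool:
        # All conditions must hold: approval, same work, same place, inside the window.
        return (
            self.approved_by_hse
            and work == self.work
            and location == self.location
            and self.valid_from <= when <= self.valid_until
        )


# Sample values are made up for illustration.
permit = PermitToWork(
    work="Drill cable entry",
    location="Data hall 2, north wall",
    valid_from=datetime(2024, 5, 6, 9, 0),
    valid_until=datetime(2024, 5, 6, 13, 0),
    approved_by_hse=True,
)

inside_window = permit.authorizes(
    "Drill cable entry", "Data hall 2, north wall", datetime(2024, 5, 6, 10, 30)
)
after_expiry = permit.authorizes(
    "Drill cable entry", "Data hall 2, north wall", datetime(2024, 5, 6, 14, 0)
)
print(inside_window, after_expiry)
```

The same work requested after the window has closed is not authorized, which is the point of tying a permit to a time period: the job needs a fresh review rather than an open-ended approval.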
Employee’s Responsibilities
As employees, we all have a responsibility to make sure our working environment is safe and secure.
- Take reasonable care for your own health and safety and that of others who may be affected by what you do or do not do.
- Cooperate with your employer on health and safety.
- Correctly use work items provided by the employer, including personal protective equipment, in accordance with training or instructions.
- Do not interfere with or misuse anything provided for your health, safety or welfare.
A data center facility will operate successfully if the facilities team is provided with management support, appropriate resources, and site-specific systems experience. Effectively deploying a facility operations strategy and the additional control processes provides much higher reliability potential over the life of the facility. With these practices in place, you may realistically achieve multiple years of continuous facilities systems availability and significant savings compared to the average operating experience in the critical data center sector.
Manual Handling
Many activities in a data center, such as lifting and shifting equipment, require manual handling. Many injuries are caused by incorrect methods when lifting heavy items. Some of the steps to avoid these risks are:
- Take necessary precautions when handling things – wear safety footwear, gloves, and glasses (if applicable).
- Bend your knees, not your back – many of the materials you handle in a data center are heavy, so keep this in mind when lifting them.
- Consider asking for assistance – seek assistance and get advice.
- Try to use mechanical lifting aids wherever practicable.
The things to be considered can be summarized as TILE:
- T – Task (consider the task required)
- I – Individual (consider the ability of the individual to perform the task)
- L – Load (consider the load: weight, dimensions, center of gravity, etc.)
- E – Environment (consider the environment: clearances, surfaces, temperature, etc.)
Confined Spaces
Have you heard the term confined space, and are you aware of these areas in your data center? Basically, a confined space is a space with limited or restricted means of entry and exit that is not designed for continuous human occupancy. An example is the interior of a storage tank, occasionally entered by maintenance workers but not intended for human occupancy. Hazards in a confined space often include harmful dust or gases, asphyxiation, submersion in liquids or free-flowing granular solids (for example, grain bins), electrocution, or entrapment. Many workplaces contain areas that are considered “confined spaces” because, while they are not necessarily designed for people, they are large enough for workers to enter and perform certain jobs.
A number of people are killed or seriously injured each year in confined spaces. These incidents occur across a wide range of industries, from complex plant to simple storage vessels. Those killed include not only people working in confined spaces but also those who try to rescue them without proper training or equipment.
Think about these areas in your data center. Are you able to identify any of them near you? Confined spaces in a data center environment include, but are not limited to:
- Underfloor
- False ceilings
- Cable pits
- Low oxygen areas
- Hazardous atmosphere
- Between cabinets in aisles
- Enclosed hot or cold aisles
- Tanks, vessels, storage bins
- Vaults, manholes, tunnels
- Equipment housings, ductwork, pipelines, etc.
Even though it is the task of the HSE department to ensure the necessary precautions are in place, it is our duty and responsibility to take adequate precautions when working in these areas.
Electrical Safety
Many of you may already be aware of the steps needed for electrical safety. Let us look at some of the points to consider in day-to-day work:
- Awareness – Whether you are an electrical, mechanical, or IT engineer, your first step is to gain an overall awareness of the data center and its related components. One example in the case of electrical safety is to be aware that most equipment has at least one backup power supply, so every action must take this into account.
- Make sure it is completely switched off! – Whenever you are working with a device that uses electricity, make sure the device is completely switched off. This protects the device as well as human lives.
- Utilize lock-out/tag-out procedures – Make sure a Lockout/Tagout (LOTO, or lock and tag) safety procedure is employed. LOTO is used in the data center sector and in industrial settings to ensure dangerous machines are properly shut off and not started up again before maintenance or servicing work is complete. It requires hazardous power sources to be “isolated and rendered inoperative” before any maintenance or repair procedure is started. “Lock and tag” works by applying a lock, usually to the device or the power source, so that no hazardous power source can be turned on. The procedure also requires a tag to be affixed to the locked device, indicating that it should not be turned on.
- Do not stretch cables – Stretching cables too much can damage them and interrupt the flow of current. Make sure that you do not bend or stretch cables excessively.
- Always disconnect from power supply before any maintenance/adjustments are carried out.
- When drilling, always check the other side of the wall.
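The lock-out/tag-out idea from the list above can be pictured as a simple rule: a power source cannot be turned back on while any lock and tag is still attached. The sketch below models that rule only. The `LockoutPoint` class, the one-lock-per-worker convention, and the device and worker names are assumptions for illustration, not a description of any real LOTO product or site procedure.

```python
class LockoutPoint:
    """Illustrative model of one isolation point under lock-out/tag-out."""

    def __init__(self, name):
        self.name = name
        self.energized = True
        self.tags = {}  # worker name -> tag text

    def lock_out(self, worker, reason):
        # Isolate the source, then record this worker's lock and tag.
        self.energized = False
        self.tags[worker] = f"DO NOT OPERATE: {reason}"

    def remove_lock(self, worker):
        # Assumed rule: a worker removes only their own lock.
        self.tags.pop(worker, None)

    def energize(self):
        # Refuse to restore power while any lock is still in place.
        if self.tags:
            holders = ", ".join(sorted(self.tags))
            raise RuntimeError(f"{self.name} is locked out by: {holders}")
        self.energized = True


pdu = LockoutPoint("PDU-A feed")  # hypothetical device name
pdu.lock_out("alice", "replacing breaker")
pdu.lock_out("bob", "cable inspection")
pdu.remove_lock("alice")

try:
    pdu.energize()  # bob's lock is still attached, so this is refused
except RuntimeError as err:
    print(err)

pdu.remove_lock("bob")
pdu.energize()
print(pdu.energized)
```

Because most equipment has at least one backup power supply, a real procedure would track one such isolation point per feed; the sketch keeps a single point for brevity.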
Fiber & Laser Safety
Due to higher speed and bandwidth requirements, we are all moving to optical fiber networking. We know that data is transmitted through fiber optic cables in the form of laser light, but this laser light can be dangerous to the human body in many ways. A sudden flash from a laser can cause a spot or halo to remain at the center of the visual field for a few seconds or even a minute, rendering a person virtually blind to all other visual input. In addition, when working with fiber optic cables you may often need to handle various chemicals and cleaning agents. You have to take extra precautions when working in these environments.
Some of the precautions are:
- Never look directly into connectors
- Never look directly into an unknown light source
- Never look at a live laser source
- Always ensure you know what you are looking into
Lone Workers
A lone worker is an employee who performs an activity in isolation from other workers, without close or direct supervision. Such staff may be exposed to risk because there is no one to assist them, so a risk assessment may be required. Very often operators (and particularly third-party engineers) are working on their own when an emergency occurs, whether it is medical or caused by any of the other issues discussed here. How do we deal with this situation? One answer is to monitor the activity through the CCTV security cameras present in data centers. But what happens if there is a blind spot? Not every square inch of the data center is necessarily visible through CCTV. One of the best solutions is the use of “lone worker down” or “man down” safety devices, which are generally lone worker tracking systems that help ensure safety and security.
Noise Level
A data center environment contains a lot of heavy equipment, such as generators, power train devices, cooling infrastructure equipment (CRAC units), legacy servers, legacy networking devices, high-performance computing (HPC) racks, etc. If you have been around these devices, you will have noticed that many of them produce a high level of noise in their working environment. Isn’t a high level of noise dangerous, and doesn’t it cause safety and health issues? In fact, exposure to prolonged or excessive noise has been shown to cause a range of health problems, from stress, poor concentration, productivity losses in the workplace, communication difficulties, and fatigue from lack of sleep, to more serious issues such as cardiovascular disease, cognitive impairment, tinnitus, and hearing loss. So it’s clear that noise is not something we can simply ignore.
Data center operators need to consider these noise levels and take the necessary precautions. Noise levels now need to be tackled at the design stage, with the selection of quieter CRAC units to minimize noise emissions. Noise regulations differ by jurisdiction; the action values below correspond to those in the UK Control of Noise at Work Regulations 2005, which implement EU Directive 2003/10/EC.
Lower exposure action values:
- Daily or weekly exposure of 80 dB
- Peak sound pressure of 135 dB
Upper exposure action values:
- Daily or weekly exposure of 85 dB
- Peak sound pressure of 137 dB
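As a rough illustration of how the action values listed above might be applied to a measurement, the sketch below classifies a reading into a band. The function name, the "at or above" comparison, and the sample readings are assumptions for illustration; the duties that follow from each band depend on the regulations that apply at your site.

```python
# Action values taken from the lists above (dB).
LOWER_DAILY_DB, LOWER_PEAK_DB = 80, 135
UPPER_DAILY_DB, UPPER_PEAK_DB = 85, 137


def exposure_band(daily_db: float, peak_db: float) -> str:
    """Classify a reading against the lower and upper exposure action values."""
    # Either the daily/weekly exposure or the peak can trigger a band.
    if daily_db >= UPPER_DAILY_DB or peak_db >= UPPER_PEAK_DB:
        return "upper action value reached"
    if daily_db >= LOWER_DAILY_DB or peak_db >= LOWER_PEAK_DB:
        return "lower action value reached"
    return "below action values"


# Hypothetical readings: (area, daily exposure in dB, peak in dB).
readings = [
    ("Office corridor", 62.0, 95.0),
    ("Data hall", 82.0, 110.0),
    ("Generator room", 96.0, 138.0),
]

for area, daily, peak in readings:
    print(f"{area}: {exposure_band(daily, peak)}")
```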
Some of the precautions that you will see in a data center are:
- Sound-proof your space.
- Keep noisy machines away from people, or keep them in a closed room that does not need to be accessible to the public.
- Install noise-reducing insulation and glass.
- Use earplugs and noise-canceling headphones.
Personal Protective Equipment (PPE)
Personal protective equipment is protective clothing, helmets, goggles, or other garments or equipment designed to protect the wearer’s body from injury or infection. The hazards addressed by protective equipment include physical, electrical, heat, chemicals, biohazards, and airborne particulate matter.
So it’s necessary for everyone to make sure they wear PPE whenever and wherever applicable, for their own safety and security.
Have a comment or points to be reviewed? Let us grow together. Feel free to comment.