New Path Cloud Path Blogspot.com: Functions and Processes in Service Operation

INCIDENT MANAGEMENT

An incident is an unplanned interruption to an IT service or a reduction in the quality of an IT service. The failure of a configuration item (CI) that has not yet affected service is also an incident.

Introduction

When there is a problem with a service or product, the customer may call in to report it, or an employee of the provider company may detect it
Once a problem is detected, it has to be handled in an organized way
Incident management is the process used to do this

Responsibilities
To restore normal service operation as quickly as possible and minimize the adverse impact on business operations

Incident definition

Any event that is not part of the standard/normal operation of a service (e.g. a user requesting relocation of an internet line or PC)
and/or that causes a reduction in the quality of service (e.g. an internet link going down, or a mail server failure that prevents mail from being sent)

Types of incidents

Software incident – software-related errors that make a service unavailable (e.g. run-time error, corrupted files)
Hardware incident – hardware-related errors that make a service unavailable (e.g. hard disk crash, internet link down, processor too slow)
Service request – a request from a user for support or advice that is not caused by a problem or failure (e.g. password reset, product information)

Processes and activities

Incident acceptance and recording
Classification
Matching
Investigate and diagnose
Resolution
Closure
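
The six activities above can be read as an ordered sequence of stages. The following is a minimal sketch of that ordering, not part of any particular tool. The names `Stage` and `next_stage` are made up for illustration, and skipping investigation when matching finds a known solution is an assumption based on the matching and investigation steps described later.

```python
from enum import Enum


class Stage(Enum):
    """Incident stages, in the order listed above."""
    RECORDING = 1
    CLASSIFICATION = 2
    MATCHING = 3
    INVESTIGATION = 4
    RESOLUTION = 5
    CLOSURE = 6


def next_stage(stage: Stage, known_solution: bool = False) -> Stage:
    """Return the stage that follows `stage`.

    Assumption: if matching finds a known solution, investigation is skipped.
    """
    if stage is Stage.CLOSURE:
        raise ValueError("a closed incident has no next stage")
    if stage is Stage.MATCHING and known_solution:
        return Stage.RESOLUTION
    return Stage(stage.value + 1)


print(next_stage(Stage.RECORDING).name)  # CLASSIFICATION
```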

Incident Management Lifecycle

Incident recording

When an incident is reported, information about it is recorded
Activities

Recording name of user reporting incident
Recording Date/time
Assigning an incident or ticket number to the user, so that the user can call back and check progress by quoting the number
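
As a rough illustration of the recording activities above, an incident record might hold the reporter's name, the date/time and a ticket number. The class name `IncidentRecord`, its field names and the simple 1, 2, 3 numbering scheme are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime
from itertools import count

# Hypothetical numbering scheme: tickets are numbered 1, 2, 3, ...
_ticket_numbers = count(1)


@dataclass
class IncidentRecord:
    reported_by: str   # name of the user reporting the incident
    description: str   # what the user reported
    # Date/time is stamped automatically when the record is created.
    reported_at: datetime = field(default_factory=datetime.now)
    # Ticket number the user can quote when calling back.
    ticket_number: int = field(default_factory=lambda: next(_ticket_numbers))


record = IncidentRecord("Alice", "Unable to print")
print(f"Ticket #{record.ticket_number} logged for {record.reported_by}")
```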

Classification

Incident classification aims to determine the incident category to facilitate monitoring
Classification in terms of

Category
Priority

Category

Incidents are assigned a category based on the nature of the problem
Knowing the nature of the problem helps service desk personnel route it to the correct internal party for resolution
For example:

Systems – operating system, applications
Network – router, hub, IP address
Workstation – network card, disk drive, keyboard
Service request – request by users for some information such as user guides
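
Routing by category can be sketched as a simple lookup table. The categories are the ones listed above; the support group names and the fallback to the service desk are hypothetical and would differ in a real organization.

```python
# Categories come from the list above; the support group names are hypothetical.
ROUTING = {
    "systems": "application support",
    "network": "network team",
    "workstation": "desktop support",
    "service request": "service desk",
}


def route(category: str) -> str:
    """Return the support group for a category.

    Assumption: anything uncategorized stays with the service desk.
    """
    return ROUTING.get(category.strip().lower(), "service desk")


print(route("Network"))  # network team
```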

Example

Suppose a user reports that he is unable to print. There can be various reasons for this, so the service desk needs to ask the right questions to narrow down the nature of the problem and categorize it
Some questions to ask might be

What is the model of the printer he is connected to?
Is the Windows printer driver installed on the PC?
Is the printer low on toner?
Is the printer out of paper?
Is there a paper jam in the printer?

Prioritization

When more than one incident comes in, it is necessary to rank the incidents by priority
Prioritizing incidents

helps service desk personnel notice when an incident is taking too long to resolve, so that it can be escalated
identifies the more important incidents that need to be handled first

Prioritization method
Prioritize based on

Urgency (how soon does it have to be done?)
Impact (what happens if it is not done?)
Expected effort (how much work has to be done?)

Impact

The impact of an incident refers to the extent of the deviation from normal service
The deviation is measured in terms of the number of users or business processes affected
For example, 50 users affected by an email server breakdown, or a web server breakdown that affects the business in terms of sales

Urgency

Urgency refers to the delay in solving the problem that is acceptable to the user or business
For example, a mail server problem may affect a lot of people, but if it happens on a Friday night, the urgency may be lower because it can be rectified over the weekend
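
One common way to combine impact and urgency is a small matrix. The sketch below assumes three levels for each and the formula impact + urgency - 1, giving priorities from 1 (highest) to 5 (lowest). The levels and the formula are assumptions for illustration, not something prescribed here, and expected effort is left out to keep the example short.

```python
from enum import IntEnum


class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority(impact: Level, urgency: Level) -> int:
    """Combine impact and urgency into a priority from 1 (highest) to 5 (lowest)."""
    return impact + urgency - 1


# Mail server down for many users on a Friday night: high impact, low urgency.
print(priority(Level.HIGH, Level.LOW))  # 3
```

With this scheme, the same mail server failure on a Monday morning (high impact, high urgency) would come out as priority 1.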

Software

The screen capture below is a sample of how the incident management software that you use in the lab prioritizes an incident

Incident matching

Some incidents might have happened before
During the incident resolution process, incident information is entered into the problem management database and matched against the history database to see whether a similar problem has been reported before
Types of problems

New problem (no similar problem reported before)
Known problem (similar problem reported before but no solution)
Known error (similar problem reported before and solution is found)

When a problem is first reported and there is no past history of such a problem, it is recorded in problem management as a 'new problem'
Should a second incident come in reporting the same problem, the new problem is reclassified as a 'known problem'
Once problem management receives input from the incident management process that there is a new problem, it will try to come up with a solution
Once problem management is able to come up with a solution to a new problem or known problem, the problem is classified as a 'known error'
Since the 'known error' has a solution, incident management will use that solution to solve subsequent related problems that users report
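
The status changes described above can be sketched as a small state tracker. This is only an in-memory stand-in for a problem management database. The names `ProblemDatabase`, `report`, `add_solution` and `solution_for` are hypothetical, and matching is reduced to an exact key lookup, whereas a real tool would match on symptoms.

```python
from enum import Enum
from typing import Optional


class ProblemStatus(Enum):
    NEW_PROBLEM = "new problem"
    KNOWN_PROBLEM = "known problem"
    KNOWN_ERROR = "known error"


class ProblemDatabase:
    """In-memory stand-in for the problem management database."""

    def __init__(self) -> None:
        self._status = {}    # problem key -> ProblemStatus
        self._solution = {}  # problem key -> solution text

    def report(self, problem_key: str) -> ProblemStatus:
        """Match an incident against past problems and return the status."""
        current = self._status.get(problem_key)
        if current is None:
            # No past history: record as a new problem.
            self._status[problem_key] = ProblemStatus.NEW_PROBLEM
        elif current is ProblemStatus.NEW_PROBLEM:
            # Second report of the same problem, still no solution.
            self._status[problem_key] = ProblemStatus.KNOWN_PROBLEM
        return self._status[problem_key]

    def add_solution(self, problem_key: str, solution: str) -> None:
        """Record a solution; the problem becomes a known error."""
        self._status[problem_key] = ProblemStatus.KNOWN_ERROR
        self._solution[problem_key] = solution

    def solution_for(self, problem_key: str) -> Optional[str]:
        """Return the stored solution, or None if there is none yet."""
        return self._solution.get(problem_key)


db = ProblemDatabase()
print(db.report("mail server down").value)  # new problem
print(db.report("mail server down").value)  # known problem
db.add_solution("mail server down", "restart the mail service")
print(db.report("mail server down").value)  # known error
```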

Incident matching diagram

Benefit of different stages of problem classification

When a new problem or known problem is entered by incident management, problem management can act on it immediately
By the same token, when a solution becomes available for a problem, the problem status becomes 'known error' and incident management immediately knows that a solution is available, which reduces the time needed to solve the problem

Investigation and diagnosis

If a solution cannot be found in the database, the incident has to be escalated for investigation
When a problem cannot be resolved by the service desk personnel, it has to be escalated to the next level according to the incident management flow - ESCALATION

Escalation

If an incident cannot be resolved by the service desk personnel based on the database, the incident will be escalated to a support group with more expertise and technical competence
An issue can be escalated in two ways

Functional Escalation: Escalate to someone at a higher technical level (e.g. from service desk personnel to a programmer) to solve it from a technical perspective
Hierarchical Escalation: Escalate to someone who holds a higher position and has more authority

Functional Escalation

When an incident is first recorded, it is assigned to technical staff if it is a problem that cannot be solved by the service desk personnel
If an incident cannot be resolved by first-line support within the time agreed for its priority, more expertise will be required

Hierarchical Escalation

Hierarchical escalation means involving a higher level of organizational authority
Escalation to a higher authority may be needed when it appears that the incident may not be resolved in time or in a satisfactory manner
For example, an urgent power outage that is supposed to be solved in 3 hrs is still not solved after 4 hrs and no solution has been found. It might be necessary to escalate to management, as there might not be a technical solution available
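
The two escalation paths can be sketched as a small decision function. The function name, its parameters and the order in which the rules are checked are assumptions chosen to match the descriptions above, not a rule set taken from any standard.

```python
from datetime import timedelta
from typing import Optional


def escalation(
    elapsed: timedelta,
    target: timedelta,
    solvable_by_service_desk: bool,
    solution_found: bool,
) -> Optional[str]:
    """Suggest an escalation type, or None if no escalation is needed yet.

    Assumed rule order:
    1. Target time missed and no solution found -> hierarchical.
    2. Beyond the service desk, or target time missed -> functional.
    """
    if elapsed > target and not solution_found:
        return "hierarchical"
    if not solvable_by_service_desk or elapsed > target:
        return "functional"
    return None


# The power outage example: 3-hour target, 4 hours elapsed, no solution found.
print(escalation(timedelta(hours=4), timedelta(hours=3), False, False))  # hierarchical
```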

Resolution
Based on the earlier investigation, the following activities follow in the resolution stage

Solving the incident
Recording the information in the database for future reference
What happens if the problem cannot be solved immediately? - RFC (Request for Change)

Closure

Even when the incident has been resolved, the work does not end
The service desk contacts the person who reported the incident to verify that it has indeed been resolved

Advantages

Resolution in time to reduce impact
Management reports can assist in the proactive identification of problems (e.g. if a lot of users or customers are complaining about hard disks of a particular brand crashing, it may be worth considering another brand of hard disk)
Provides input for fine-tuning the service level agreement (SLA) to meet the customer's business needs
More organized and efficient staff

Challenges

Insufficient management commitment of resources for implementation
Lack of knowledge for resolving incidents
Lack of integration with other processes (e.g. problem management)
The level of service to provide to the customer is unclear when there is no SLA

Source: OGC

Monday, January 24, 2011
