Monday, January 24, 2011

Functions and Processes in Service Operation - Incident Management

INCIDENT MANAGEMENT

An Incident is an unplanned interruption to an IT service or a reduction in the quality of an IT service. Failure of a configuration item (CI) that has not yet affected service is also an incident.

Introduction
  • When there is a problem or issue with a service or product, the customer may call in to report it, or it may be detected by an employee of the provider company
  • Once a problem or issue is detected, it has to be handled in an organized way
  • Incident management is the process used to do this

Responsibilities
To restore normal service operation as quickly as possible and minimize the adverse impact on business operations

Incident definition
Definition of Incident
  • Any event which is not part of the standard/normal operation of a service (e.g. a user requesting relocation of an internet line or PC)
  • and/or which causes a reduction in the quality of service (e.g. an internet link going down, or a mail server failure that prevents mail from being sent)

Types of incidents
  • Software incident – software-related errors that make a service unavailable (e.g. run-time error, corrupted files)
  • Hardware incident – hardware-related errors that make a service unavailable (e.g. hard disk crash, internet link down, processor too slow)
  • Service request – not due to any problem or failure; a request from a user for support or advice that is unrelated to a failure (e.g. password reset, product information)

Processes and activities
  • Incident acceptance and recording
  • Classification
  • Matching
  • Investigate and diagnose
  • Resolution
  • Closure
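
The activities listed above can be read as an ordered sequence of stages. The snippet below is only a minimal sketch of that ordering; the `Stage` enum and the `next_stage` helper are hypothetical names introduced for illustration, and a real incident may loop back (for example after escalation) rather than move strictly forward.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Incident management activities, in the order listed above."""
    RECORDING = 1
    CLASSIFICATION = 2
    MATCHING = 3
    INVESTIGATION_AND_DIAGNOSIS = 4
    RESOLUTION = 5
    CLOSURE = 6


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None once the incident is closed."""
    if stage is Stage.CLOSURE:
        return None
    return Stage(stage.value + 1)


# Walk one incident through every stage in order.
stage = Stage.RECORDING
while stage is not None:
    print(stage.name)
    stage = next_stage(stage)
```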

Incident Management Lifecycle



Incident recording
  • When an incident is reported, information about it is recorded
  • Activities
    • Recording the name of the user reporting the incident
    • Recording the date and time
    • Assigning an incident or ticket number to the user, so that the user can call back and check progress by quoting the number
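
As an illustration of the recording activities, here is a minimal sketch of an incident record. The `IncidentRecord` class, the `record_incident` helper and the `INC-00001` ticket format are assumptions made for this example; they are not taken from any particular service desk tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from itertools import count

# Hypothetical in-memory counter; a real tool would persist ticket numbers.
_ticket_numbers = count(1)


@dataclass
class IncidentRecord:
    reported_by: str   # name of the user reporting the incident
    description: str   # what the user reported
    reported_at: datetime = field(default_factory=datetime.now)
    ticket_number: str = field(
        default_factory=lambda: f"INC-{next(_ticket_numbers):05d}"
    )


def record_incident(reported_by: str, description: str) -> IncidentRecord:
    """Create a record and hand back the ticket number the user can quote."""
    return IncidentRecord(reported_by=reported_by, description=description)


first = record_incident("Alice", "Unable to print")
second = record_incident("Bob", "Mail server not responding")
print(first.ticket_number, second.ticket_number)
```

The user only needs the ticket number; the name, date and time stay with the service desk for later follow-up.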

Classification
  • Incident classification aims to determine the incident category to facilitate monitoring
  • Classification in terms of
    • Category
    • Priority

Category
  • Incidents are assigned a category based on the nature of the problem
  • Knowing the nature of the problem helps the service desk personnel route it to the correct internal party to solve
  • For example:
    • Systems – operating system, applications
    • Network – router, hub, IP address
    • Workstation – network card, disk drive, keyboard
    • Service request – request by users for some information such as user guides
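
The category examples above could be turned into a very simple first-pass categorizer. This is only a sketch: the keyword lists are copied from the examples, while the `categorize` function, the plain substring matching and the `"Uncategorized"` fallback are assumptions made for illustration. In practice the service desk would confirm the category by asking the user questions.

```python
# Keywords taken from the category examples above.
CATEGORY_KEYWORDS = {
    "Systems": ("operating system", "application"),
    "Network": ("router", "hub", "ip address"),
    "Workstation": ("network card", "disk drive", "keyboard"),
    "Service request": ("user guide",),
}


def categorize(description: str) -> str:
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Uncategorized"  # hypothetical fallback: needs manual triage


print(categorize("The router in room 3 keeps rebooting"))
print(categorize("Please send me the user guide"))
```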

Example
  • Suppose a user sends a request saying that he is unable to print. There can be various reasons for this, so the service desk needs to ask the right questions to narrow down the nature of the problem and categorize it
  • Some questions to ask might be
    • What model of printer is he connected to?
    • Is the Windows printer driver installed on the PC?
    • Is the printer low on toner?
    • Is the printer out of paper?
    • Is there a paper jam in the printer?

Prioritization
  • When more than one incident comes in, it is necessary to rank the incidents by priority
  • Assigning priorities to incidents
    • helps the service desk personnel notice when an incident is taking too long to resolve, so that they can proceed to escalation
    • identifies the more important incidents that need to be handled first

Prioritization method
Prioritize based on
  • Urgency (how soon it has to be done?)
  • Impact (What happens if it is not done?)
  • Expected effort (how much work to be done?)

Impact
  • Impact of the incident refers to the extent of the deviation from normal service
  • Deviation is measured in terms of the number of users or business processes affected
  • For example, 50 users affected by an email server breakdown, or a web server breakdown affecting the business in terms of sales

Urgency
  • Urgency refers to the delay in solving the problem that is acceptable to the user or business
  • For example, a mail server problem may impact a lot of people, but if it happens on a Friday night, the urgency may be lower as it can be rectified over the weekend
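
One common way to combine impact and urgency is a small priority matrix. The sketch below assumes three levels for each and adds them so that priority 1 is handled first and priority 5 last. The level names, the formula and the example ratings are assumptions made for illustration, and expected effort is left out to keep the example small.

```python
# Assumed scale: a lower number means a more severe impact or urgency.
LEVELS = {"high": 1, "medium": 2, "low": 3}


def priority(impact: str, urgency: str) -> int:
    """Combine impact and urgency into a priority from 1 (handle first) to 5."""
    return LEVELS[impact] + LEVELS[urgency] - 1


# (description, impact, urgency); the ratings here are illustrative only.
incidents = [
    ("Mail server down on Friday night", "high", "low"),
    ("Web server down, sales affected", "high", "high"),
    ("One user cannot print", "low", "medium"),
]

# Handle the most important incidents first.
queue = sorted(incidents, key=lambda item: priority(item[1], item[2]))
for description, impact, urgency in queue:
    print(priority(impact, urgency), description)
```

With these ratings the web server outage comes first, while the Friday-night mail server problem drops to the middle of the queue despite its high impact.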

Software
  • The screen capture below is a sample of how the incident management software that you use in the lab prioritizes incidents



Incident matching
  • Some incidents might have happened before
  • During the incident resolution process, incident information is entered into the problem management database and matched against the history of past problems to see whether a similar problem has occurred
  • Types of problems
    • New problem (no similar problem reported before)
    • Known problem (similar problem reported before but no solution)
    • Known error (similar problem reported before and solution is found)
  • When a problem is first reported and there is no history of such a problem, it is recorded in problem management as 'new problem'
  • Should a second incident come in reporting the same problem, the new problem is reclassified as 'known problem'
  • Once problem management receives input from the incident management process that there is a new problem, it will try to come up with a solution
  • Once problem management comes up with a solution to a new problem or known problem, the problem is classified as 'known error'
  • Since a 'known error' has a solution, incident management will use that solution to resolve subsequent related incidents that users report
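
The status changes described above can be sketched as a small lookup table keyed by symptom. The `ProblemDatabase` class and its method names are hypothetical, and matching on an exact symptom string is a simplifying assumption; a real tool would use fuzzier matching.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProblemRecord:
    status: str = "new problem"
    incident_count: int = 1
    solution: Optional[str] = None


class ProblemDatabase:
    def __init__(self) -> None:
        self._records: Dict[str, ProblemRecord] = {}

    def match_incident(self, symptom: str) -> ProblemRecord:
        """Match an incident against past problems, updating the status."""
        record = self._records.get(symptom)
        if record is None:
            # No past history: record it as a new problem.
            record = ProblemRecord()
            self._records[symptom] = record
            return record
        record.incident_count += 1
        if record.status == "new problem":
            # Seen before but still unsolved: now a known problem.
            record.status = "known problem"
        return record

    def record_solution(self, symptom: str, solution: str) -> None:
        """Problem management found a solution: the problem becomes a known error."""
        record = self._records[symptom]
        record.solution = solution
        record.status = "known error"


db = ProblemDatabase()
print(db.match_incident("mail cannot be sent").status)   # first report
print(db.match_incident("mail cannot be sent").status)   # second report
db.record_solution("mail cannot be sent", "restart the mail service")
print(db.match_incident("mail cannot be sent").solution)  # later reports reuse the fix
```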

Incident matching diagram



Benefit of different stages of problem classification
  • When a new problem or known problem is entered by incident management, problem management can immediately act upon it
  • By the same token, when a solution becomes available, the problem status becomes 'known error' and incident management immediately knows that there is a solution, thus reducing the time needed to solve the problem

Investigation and diagnosis
  • If a solution cannot be found in the database, the incident has to be escalated for investigation
  • When a problem cannot be resolved by the service desk personnel, it has to be escalated to the next level according to the incident management flow - ESCALATION

Escalation
  • If an incident cannot be resolved by the service desk personnel based on the database, the incident will be escalated to a support group with more expertise and technical competence
  • Issue can be escalated in two ways
    • Functional Escalation: Escalate to someone with more technical expertise (e.g. from service desk personnel to a programmer) to solve it from a technical perspective
    • Hierarchical Escalation: Escalate to someone with higher authority and a more senior post

Functional Escalation
  • When an incident is first recorded, it is assigned to technical staff if it is a problem that cannot be solved by the service desk personnel
  • If an incident cannot be resolved by first-line support within the agreed time for its priority, more expertise will be required

Hierarchical Escalation
  • Hierarchical escalation means involving a higher level of organizational authority
  • Higher authority may need to be involved when it appears that the incident may not be resolved in time or in a satisfactory manner
  • For example, an urgent power outage that is supposed to be solved in 3 hrs is still not solved after 4 hrs and no solution has been found. It might be necessary to escalate to management, as there might not be a technical solution available
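
The two kinds of escalation can be sketched as a time check against agreed limits. The `escalation_needed` function and the one-hour first-line limit are assumptions made for illustration; the 3-hour target and 4-hour elapsed time come from the power outage example above.

```python
from datetime import timedelta


def escalation_needed(
    elapsed: timedelta,
    first_line_limit: timedelta,
    resolution_target: timedelta,
) -> str:
    """Return which kind of escalation, if any, the elapsed time calls for."""
    if elapsed >= resolution_target:
        # The agreed resolution time has passed: involve management.
        return "hierarchical"
    if elapsed >= first_line_limit:
        # First-line support ran out of time: bring in more expertise.
        return "functional"
    return "none"


# Power outage: target 3 hrs, still unresolved after 4 hrs.
print(escalation_needed(
    elapsed=timedelta(hours=4),
    first_line_limit=timedelta(hours=1),   # hypothetical first-line limit
    resolution_target=timedelta(hours=3),
))
```

In practice the limits would depend on the priority assigned to the incident, and hierarchical escalation may also be triggered earlier by judgment rather than by the clock alone.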

Resolution
Based on the earlier investigation, the following activities take place in the resolution stage
  • Solving the incident
  • Recording the information in the database for future reference
  • If the problem cannot be solved immediately, a Request for Change (RFC) is raised

Closure
  • Even when the incident has been resolved, the work does not end
  • The service desk contacts the person who reported the incident to verify that it has indeed been resolved

Advantages
  • Resolution in time to reduce impact
  • Management reports can assist in proactive identification of problems (e.g. if a lot of users or customers are complaining about hard disks of a particular brand crashing, it may be worth considering another brand)
  • Provides input to fine-tune the service level agreement (SLA) to meet the customer's business needs
  • More organized and efficient staff

Challenges
  • Not enough management commitment to provide the resources for implementation
  • Lack of knowledge for resolving incidents
  • Lack of integration with other processes (e.g. Problem management)
  • Not knowing what level of service to provide to the customer, due to the lack of an SLA

Source: OGC
