Monday, January 24, 2011

Functions and Processes in Service Operation - Problem Management

PROBLEM MANAGEMENT

A problem is the cause of one or more incidents

Introduction
  • The solution to a problem can be permanent, or only temporary when a permanent solution is not yet available
  • To prevent a problem from recurring, a permanent solution must be identified

Responsibilities
  • Provide support to incident management
  • Minimize the adverse impact of incidents and problems, and prevent recurrence of incidents related to these errors
  • Problem Management seeks to get to the root cause or the real reason of a problem

Example
  • Referring back to the printer problem:
  • After incident management passes the information to problem management, the technical staff will try to determine its cause and a solution
  • E.g. Paper jam – provide instructions to remove paper
  • E.g. Printer not added to Windows OS – Go to 'Control Panel' and add the printer

Workarounds
  • For some incidents, the solution implemented might be temporary because a permanent solution is not available and the user needs the problem to go away fast
  • Using the printer incident as an example, a workaround would be to ask the affected user to use another printer in another room
  • Workarounds solve the problem temporarily but do not prevent it from happening again

Root cause
  • Root cause refers to the real cause of a problem
  • Determining a root cause and its solution usually takes more time than finding a workaround
  • If the real cause of the incident is not determined and resolved, the problem may recur
    • For example, suppose a user's computer is infected by a virus and a particular file is corrupted. The immediate workaround is to replace the affected system file. However, to prevent it from happening again, problem management might suggest a long-term solution, such as installing anti-virus software, that addresses the root of the problem

Difference between IM and PM
  • Incident Management
    • Aim is to restore the service to the customer as quickly as possible
    • A workaround might be used
    • For example, rebooting a computer when it hangs is just a temporary solution, as the real problem is not solved
  • Problem Management
    • Aim is to detect the underlying or root cause of a reported incident and prevent it from happening again
    • For example, if a computer keeps hanging, it might be due to some registry or software problem and problem management needs to find out the real cause

Terminology
  • Problem – A “Problem” is an unknown underlying cause of one or more incidents
  • Known Error – A “Known Error” is a problem that has been successfully diagnosed: its root cause has been determined and a workaround has been identified

Relationship between processes
  • As soon as the item that caused the problem is found, the problem becomes a known error
  • A known error is removed by means of a change, requested through an RFC (request for change) submitted to Change Management
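
The hand-off described above can be pictured as a small state machine. The following is only a minimal sketch, not an ITIL-defined data model: the names ProblemRecord, Status, record_root_cause and raise_rfc are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROBLEM = "problem"          # underlying cause still unknown
    KNOWN_ERROR = "known error"  # cause has been found


@dataclass
class ProblemRecord:
    """Hypothetical record used only to illustrate the lifecycle."""
    summary: str
    status: Status = Status.PROBLEM
    root_cause: Optional[str] = None
    rfc: Optional[str] = None

    def record_root_cause(self, cause: str) -> None:
        # Finding the cause turns the problem into a known error.
        self.root_cause = cause
        self.status = Status.KNOWN_ERROR

    def raise_rfc(self, change: str) -> str:
        # A change can only be requested once the cause is known.
        if self.status is not Status.KNOWN_ERROR:
            raise ValueError("diagnose the problem before raising an RFC")
        self.rfc = f"RFC: {change}"
        return self.rfc


record = ProblemRecord("Printer does not print")
record.record_root_cause("Printer not added to Windows OS")
print(record.status.value)
print(record.raise_rfc("Add the printer on all office PCs"))
```

The check in raise_rfc mirrors the ordering in the notes: a problem first becomes a known error, and only then is a change requested from Change Management.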

Activities
  • Problem resolution (Problem control + error control)
  • Proactive problem management
  • Management information – in terms of cause of problem and resolution

Problem resolution
  • Incidents are unavoidable in an IT infrastructure
  • They can be due to telecommunications failures, software bugs, etc.
  • Problem resolution can be broken down into two phases
    • Problem control
    • Error control


Problem control
  • Problem control is the first phase of problem resolution
  • The purpose of problem control is to identify the cause of a problem. It has three sub-phases
    • Problem identification
    • Problem classification
    • Diagnosis

Problem identification
  • Several incidents might be related to the same problem (e.g. mail-related incidents and internet surfing incidents might both stem from an internet link problem)
  • When an incident is received by problem management, it should be checked against the problems that have already been identified to see whether it is related to one of them
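
As a rough illustration of that matching step, the sketch below looks up an incoming incident against a list of open problems. It assumes, purely for the example, that each open problem is stored with the set of services it affects; the function name find_related_problem and the sample data are hypothetical.

```python
from typing import Dict, Optional, Set


def find_related_problem(
    service: str, open_problems: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the first open problem that affects the given service, if any."""
    for problem, affected_services in open_problems.items():
        if service in affected_services:
            return problem
    return None


# Assumed sample data: one link problem explains two kinds of incident.
open_problems = {"internet link down": {"mail", "web browsing"}}

print(find_related_problem("mail", open_problems))
print(find_related_problem("printing", open_problems))
```

An incident that matches nothing (the second call returns None) would be a candidate for a new problem record.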

Problem classification
  • Classification groups problems according to their characteristics
  • Classification based on
    • Category – hardware, software etc
    • Impact – effect on the business if the problem is not resolved
    • Urgency – how soon the problem must be solved
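
The three classification fields can be captured in a small record. The sketch below also derives a priority from impact and urgency; that priority rule is an assumption added for illustration (a common convention, not something stated in these notes), and the names Classification, LEVELS and priority are hypothetical.

```python
from dataclasses import dataclass

# Assumed three-level scale; 1 is the most severe.
LEVELS = {"high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Classification:
    category: str  # e.g. "hardware", "software"
    impact: str    # effect on the business: "high", "medium" or "low"
    urgency: str   # how soon it must be solved: "high", "medium" or "low"

    def priority(self) -> int:
        # Illustrative rule: 1 (handle first) to 5 (handle last).
        return LEVELS[self.impact] + LEVELS[self.urgency] - 1


link_failure = Classification("hardware", impact="high", urgency="high")
slow_report = Classification("software", impact="low", urgency="medium")
print(link_failure.priority())  # 1
print(slow_report.priority())   # 4
```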

Investigation and diagnosis
  • Diagnose the problem to determine the cause and the Configuration Item causing it, and produce a solution
  • Expertise is needed to investigate and find out the cause of a problem
  • Problems can be more than just hardware or software
  • Problems can be due to human error (e.g. wrong password, wrong version of software used)

Error control
  • Error control takes over from problem control once a solution is identified
  • Error control manages a problem by raising an RFC (request for change) to change management, seeking permission to implement the solution found in problem control
  • An RFC is a document stating the details of a change needed to solve the root cause of a problem

Closure
  • Once changes are implemented, the results have to be reviewed to make sure that the problem is resolved
  • To know if the problem is resolved, check with the customer via the service desk

Types of problem management
There are two approaches towards managing a problem
  • Reactive
  • Proactive

Reactive problem management
  • When a problem has already occurred and a solution needs to be found, it is considered reactive problem management
  • Problem management tries to solve the problem in two ways
    • Finding a root cause and its solution for a problem that has occurred

Proactive problem management
  • So far we have been dealing with reactive problem management
  • It is also possible to prevent a problem from happening in the first place; this is the proactive approach
  • Proactive problem management helps to manage a problem proactively (opposite of reactive) by trying to anticipate problems and prevent them from happening
  • The benefit of proactive problem management is that it identifies problems that may occur and tries to prevent them from occurring. This reduces downtime for the customer

Proactive problem management - methods
There are a few ways to achieve proactive management
  • Trend analysis to identify areas where problems might grow out of control (e.g. more and more users affected by viruses might indicate that the anti-virus software is not adequate or that users are not installing the software)
  • Targeting preventive action at the more critical problems
  • Major problem reviews for further improvement
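
Trend analysis, the first method above, can be sketched as counting incidents per category over time and flagging the categories that keep growing. This is only one simple way to do it, under assumed inputs: incidents are given as (week number, category) pairs, and the function name rising_categories and the min_weeks threshold are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def rising_categories(
    incidents: List[Tuple[int, str]], min_weeks: int = 3
) -> List[str]:
    """Return categories whose weekly incident count rises every week."""
    counts = Counter(incidents)  # (week, category) -> number of incidents
    weeks = sorted({week for week, _ in incidents})
    categories = sorted({category for _, category in incidents})
    rising = []
    for category in categories:
        # Weeks with no incident for this category count as zero.
        series = [counts[(week, category)] for week in weeks]
        strictly_up = all(a < b for a, b in zip(series, series[1:]))
        if len(series) >= min_weeks and strictly_up:
            rising.append(category)
    return rising


# Assumed sample data: virus incidents grow, printer incidents stay flat.
log = (
    [(1, "virus")] + [(2, "virus")] * 2 + [(3, "virus")] * 3
    + [(1, "printer"), (2, "printer"), (3, "printer")]
)
print(rising_categories(log))  # ['virus']
```

A category flagged this way is a candidate for preventive action, such as reviewing whether the anti-virus software is adequate.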

Putting it all together



Advantages
  • More effective and efficient incident handling
  • Increased service quality
  • Reduction in the number of incidents and problems
  • Permanent solution to problems by finding root cause

Challenges
  • Staff technical skills may be inadequate to determine the root cause
  • Lack of tracking and recording of problems and solutions from past incidents
Source: OGC
