A problem is the cause of one or more incidents
Introduction
- The solution to a problem can be permanent, or only temporary when a permanent solution is not yet available
- To prevent a problem from recurring, a permanent solution must be identified
Responsibilities
- Provide support to incident management
- Minimize the adverse impact of incidents and problems, and prevent the recurrence of incidents related to these errors
- Problem Management seeks the root cause, i.e. the real reason behind a problem
Example
- Referring back to the printer problem:
- After incident management passes the information to problem management, the technical staff will try to determine the cause and a solution
- E.g. Paper jam – provide instructions to remove paper
- E.g. Printer not added to Windows OS – Go to 'Control Panel' and add the printer
Workarounds
- For some incidents, the solution implemented might be temporary because a permanent solution is not available and the user needs the problem to go away fast
- Using the printer incident as an example, a workaround would be to ask the affected user to use another printer in another room
- Workarounds solve the problem temporarily but do not prevent it from happening again
Root cause
- Root cause refers to the real cause of a problem
- Determining a root cause and its solution usually takes more time than applying a workaround
- If the real cause of an incident is not determined and resolved, the problem may recur
- For example, suppose a user's computer is infected by a virus and a particular system file is corrupted. The immediate workaround is to replace the affected system file. However, to prevent it from happening again, problem management might suggest a long-term solution such as installing anti-virus software to address the root of the problem
Difference between IM and PM
- Incident Management
- Aim is to restore the service to the customer as quickly as possible
- A workaround might be used
- For example, rebooting a computer when it hangs is just a temporary solution, as the real problem is not solved
- Problem Management
- Aim is to detect the underlying or root cause of a reported incident and prevent it from happening again
- For example, if a computer keeps hanging, it might be due to a registry or software problem, and problem management needs to find the real cause
Terminology
- Problem – A “Problem” is an unknown underlying cause of one or more incidents
- Known Error – A “Known Error” is a problem that has been successfully diagnosed: the root cause has been determined and a workaround has been identified
Relationship between processes
- As soon as the item that caused the problem is found, the problem changes into a known error
- A known error is removed by means of a change, requested through an RFC (Request for Change) submitted to Change Management
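The transitions described above (problem, then known error, then RFC) can be sketched as a small state machine. This is only an illustrative sketch: the names `ProblemRecord`, `Status`, `diagnose` and `raise_rfc` are hypothetical and do not come from any ITIL tool or from the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROBLEM = "problem"          # underlying cause still unknown
    KNOWN_ERROR = "known error"  # cause diagnosed, workaround identified
    RFC_RAISED = "rfc raised"    # change requested from Change Management


@dataclass
class ProblemRecord:
    """Hypothetical record that follows one problem through its states."""
    summary: str
    status: Status = Status.PROBLEM
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    rfc: Optional[str] = None

    def diagnose(self, root_cause: str, workaround: str) -> None:
        """Record the cause and workaround; the problem becomes a known error."""
        if self.status is not Status.PROBLEM:
            raise ValueError("only an undiagnosed problem can be diagnosed")
        self.root_cause = root_cause
        self.workaround = workaround
        self.status = Status.KNOWN_ERROR

    def raise_rfc(self, change: str) -> str:
        """Describe the change needed to remove the known error."""
        if self.status is not Status.KNOWN_ERROR:
            raise ValueError("an RFC is raised only for a known error")
        self.rfc = f"RFC: {change} (root cause: {self.root_cause})"
        self.status = Status.RFC_RAISED
        return self.rfc


record = ProblemRecord("System file corrupted")
record.diagnose("virus infection", "replace the affected system file")
print(record.raise_rfc("install anti-virus software"))
```

Rejecting an RFC for an undiagnosed problem mirrors the rule that a change is requested only after the cause has been found.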
Activities
- Problem resolution (Problem control + error control)
- Proactive problem management
- Management information – in terms of cause of problem and resolution
Problem resolution
- Incidents are unavoidable in an IT infrastructure
- They can be due to telecommunications failures, software bugs, etc.
- Problem resolution can be broken down into two phases
- Problem control
- Error control
Problem control
- Problem control is the first phase of problem resolution
- The purpose of problem control is to identify the cause of a problem. It has three sub-phases
- Problem identification
- Problem classification
- Diagnosis
Problem identification
- Several incidents might be related to the same problem (e.g. mail-related incidents and internet surfing incidents might both be caused by an internet link problem)
- When problem management receives an incident, it needs to check whether the incident is related to a problem that has already been identified
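As a minimal sketch of that check, assume each open problem is registered together with the kinds of incident it is known to cause. The register `OPEN_PROBLEMS` and the function `find_related_problem` are hypothetical names used only for illustration; a real service desk tool would match on much richer data.

```python
from typing import Dict, Optional, Set

# Hypothetical register of open problems: problem name -> incident types it explains.
OPEN_PROBLEMS: Dict[str, Set[str]] = {
    "internet link failure": {"mail", "internet surfing"},
    "printer not added to Windows OS": {"printing"},
}


def find_related_problem(
    incident_type: str,
    problems: Dict[str, Set[str]] = OPEN_PROBLEMS,
) -> Optional[str]:
    """Return the first open problem that explains this incident type, else None."""
    for name, incident_types in problems.items():
        if incident_type in incident_types:
            return name
    return None  # no match: the incident may point to a new problem


print(find_related_problem("mail"))
print(find_related_problem("internet surfing"))
```

Both calls resolve to the same internet link problem, which is the situation described in the example above.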
Problem classification
- Classification groups problems based on different characteristics
- Classification is based on
- Category – hardware, software etc
- Impact – effect on the business if the problem is not resolved
- Urgency – when the problem must be solved
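The source does not say how impact and urgency are combined. One common convention is to derive a priority from the two, and the sketch below assumes a simple three-level scale and a multiplied score. The `LEVELS` scale, the `Classification` class and the scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed three-level scale shared by impact and urgency.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Classification:
    """Hypothetical classification of one problem."""
    category: str  # e.g. "hardware" or "software"
    impact: str    # effect on the business if not resolved
    urgency: str   # how soon the problem must be solved

    def priority_score(self) -> int:
        """Assumed rule: higher impact and higher urgency give a higher score."""
        return LEVELS[self.impact] * LEVELS[self.urgency]


def sort_by_priority(items: Iterable[Classification]) -> List[Classification]:
    """Order problems so the highest-scoring ones are handled first."""
    return sorted(items, key=lambda c: c.priority_score(), reverse=True)


queue = sort_by_priority([
    Classification("software", "low", "medium"),
    Classification("hardware", "high", "high"),
])
print([c.category for c in queue])
```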
Investigation and diagnosis
- Diagnose the problem to determine its cause and the Configuration Item causing it, and produce a solution
- Expertise is needed to investigate and find the cause of a problem
- Problems can be more than just hardware or software
- Problems can be due to human error (e.g. wrong password, wrong version of software used)
Error control
- Error control takes over from problem control once a solution is identified
- Error control manages a problem by raising an RFC (Request for Change) to Change Management to seek permission to implement the solution found in problem control
- An RFC is a document stating the details of a change needed to resolve the root cause of a problem
Closure
- Once changes are implemented, the results have to be reviewed to make sure that the problem is resolved
- To know whether the problem is resolved, check with the customer via the service desk
Types of problem management
There are two approaches towards managing a problem
- Reactive
- Proactive
Reactive problem management
- When a problem has occurred and a solution needs to be found, it is considered reactive problem management
- Problem management tries to solve a problem in two ways
- Finding the root cause and its solution for a problem that has occurred
Proactive problem management
- So far we have been dealing with reactive problem management
- It is also possible to prevent a problem from happening in the first place: this is proactive problem management
- Proactive problem management (the opposite of reactive) tries to anticipate problems and prevent them from happening
- The benefit of proactive problem management is that it identifies problems that may occur and tries to prevent them from occurring. This reduces downtime for the customer
Proactive problem management - methods
There are a few ways to achieve proactive management
- Trend analysis to identify areas where problems might grow out of control (e.g. more and more users affected by viruses might indicate that the anti-virus software is not adequate or that users are not installing it)
- Targeting preventive action at the more critical problems
- Major problem reviews for further improvement
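Trend analysis can be illustrated with a small sketch that flags incident categories whose counts rise in every consecutive reporting period. The function `rising_categories`, the `min_periods` threshold and the sample figures are hypothetical; real trend analysis would use the service desk's own records and criteria.

```python
from typing import Dict, List


def rising_categories(
    counts_by_period: Dict[str, List[int]],
    min_periods: int = 3,
) -> List[str]:
    """Return categories whose incident counts rose in every consecutive period."""
    rising = []
    for category, counts in counts_by_period.items():
        if len(counts) < min_periods:
            continue  # too little history to call it a trend
        if all(later > earlier for earlier, later in zip(counts, counts[1:])):
            rising.append(category)
    return sorted(rising)


# Made-up incident counts per month, oldest first.
history = {
    "virus": [2, 5, 9],
    "printer": [4, 3, 4],
    "mail": [1, 2],
}
print(rising_categories(history))
```

Here only the virus category rises steadily, which would be a candidate for preventive action such as reviewing the anti-virus software.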
Putting it all together
Advantages
- More effective and efficient incident handling
- Increased service quality
- Reduction in the number of incidents and problems
- Permanent solution to problems by finding root cause
Challenges
- Staff technical skills inadequate to determine the root cause
- Lack of tracking and recording of problems and solutions from past incidents
Source: OGC