In ITIL, a problem is defined as the unknown cause of one or more incidents.
Problem Management ensures that problems are identified and that Root Cause Analysis is performed on them. It also ensures that recurring incidents are minimized and that problems are prevented.
The Problem Manager is the owner of this process.
Problem Management comprises the activities required to diagnose the root cause of incidents and to determine the resolution of the underlying problems.
When root cause analysis has identified and documented the root cause of a problem (and, where possible, a workaround), the problem becomes a known error.
Problem Management also records information about known errors and their workarounds in a repository called the Known Error Database (KEDB).
Problem Management consists of the following two processes −
Reactive Problem Management is executed as part of Service Operation.
Proactive Problem Management is initiated in Service Operation but is generally driven as part of Continual Service Improvement.
The following diagram describes the activities involved in Problem Management −
Problems can be detected in the following ways −
Analysis of an incident by a technical support group.
Automated detection of an infrastructure or application fault, where alert tools automatically raise an incident that may reveal the need for Problem Management.
A notification from a supplier that a problem exists and has to be resolved.
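Analysis of incidents can be supported by counting incidents per category and flagging categories that keep recurring. The sketch below is an illustration under stated assumptions, not an ITIL-prescribed method. The `Incident` record, the `detect_candidate_problems` function and the default threshold of 3 are all hypothetical. A flagged category is only a candidate: a support analyst still decides whether to raise a problem record.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Incident:
    # Hypothetical minimal incident record; real ITSM tools store far more fields.
    incident_id: str
    category: str  # illustrative values such as "network/vpn"


def detect_candidate_problems(incidents: Iterable[Incident], threshold: int = 3) -> List[str]:
    """Return categories (sorted) whose incident count reaches `threshold`.

    The threshold is an assumed tuning knob, not an ITIL value.
    """
    counts = Counter(incident.category for incident in incidents)
    return sorted(category for category, n in counts.items() if n >= threshold)


incidents = [
    Incident("INC1", "network/vpn"),
    Incident("INC2", "network/vpn"),
    Incident("INC3", "email/outlook"),
    Incident("INC4", "network/vpn"),
]
print(detect_candidate_problems(incidents))  # ['network/vpn']
```

Lowering the threshold makes detection more sensitive but produces more candidates for analysts to review.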
Each problem should be fully logged, and its record should contain details such as the following −
Priority and categorization details
Date/time initially logged
To trace the true nature of a problem, problems must be categorized in the same way as incidents.
Problems must be prioritized in the same way as incidents, also taking into account the frequency and impact of related incidents, to identify how serious the problem is from an infrastructure perspective.
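Incident priority is often derived from an impact × urgency matrix. Reusing that idea for problems can be sketched as follows. All of the following are assumptions for illustration, not values taken from ITIL: the three-level `Level` scale, the 1 to 5 priority range, the `problem_priority` function, and the rule that bumps priority by one level once the number of related incidents reaches a threshold.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical three-level scale; 1 is the most severe, matching "P1 = highest".
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def problem_priority(impact: Level, urgency: Level,
                     related_incidents: int = 0,
                     frequency_threshold: int = 5) -> int:
    """Return a priority from 1 (highest) to 5 (lowest).

    impact + urgency - 1 maps HIGH/HIGH to 1 and LOW/LOW to 5. The frequency
    bump is an assumed rule reflecting that recurring incidents make a
    problem more serious.
    """
    priority = int(impact) + int(urgency) - 1
    if related_incidents >= frequency_threshold:
        priority = max(1, priority - 1)  # never go above the top priority
    return priority


print(problem_priority(Level.MEDIUM, Level.HIGH))                        # 2
print(problem_priority(Level.MEDIUM, Level.HIGH, related_incidents=8))   # 1
```

A lookup table would work just as well as the additive formula. The formula is used here only to keep the sketch short.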
A workaround is a temporary way to overcome the difficulties caused by a problem. Details of the workaround should always be documented within the problem record.
A known error record must be raised and placed in the Known Error Database (KEDB) for future reference.
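The following sketch models a known error record and a Known Error Database as plain Python objects, so support staff can look up a documented workaround by symptom. It is an in-memory illustration only. The `KnownError` fields, the `KnownErrorDatabase` class and its `raise_known_error` / `find_by_symptom` methods are hypothetical names, not the API of any real ITSM tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KnownError:
    # Hypothetical known error record: a problem with a documented root cause
    # and a workaround.
    problem_id: str
    symptom: str
    root_cause: str
    workaround: str


class KnownErrorDatabase:
    """Minimal in-memory stand-in for a KEDB (illustrative only)."""

    def __init__(self) -> None:
        self._records: Dict[str, KnownError] = {}

    def raise_known_error(self, record: KnownError) -> None:
        # Keyed by problem ID, so re-raising a record updates it.
        self._records[record.problem_id] = record

    def find_by_symptom(self, text: str) -> List[KnownError]:
        # Case-insensitive substring match against the recorded symptom.
        needle = text.lower()
        return [r for r in self._records.values() if needle in r.symptom.lower()]


kedb = KnownErrorDatabase()
kedb.raise_known_error(KnownError(
    problem_id="PRB42",
    symptom="VPN drops after 30 minutes",
    root_cause="Idle timeout set too low on gateway",
    workaround="Reconnect the VPN client",
))
matches = kedb.find_by_symptom("vpn drops")
print(matches[0].workaround)  # Reconnect the VPN client
```

Substring matching is deliberately simple. A real KEDB would typically offer richer search.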
Once a resolution is found, it must be applied and documented in the problem record.
At the time of closure, a check should be performed to ensure that the record contains a full historical description of all events.
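The closure check can be sketched as a guard that refuses to close a problem record until its history documents every expected stage. The stage names in `REQUIRED_STAGES`, the `ProblemRecord` class and its methods are assumptions chosen to mirror the steps described in this section. They are not a defined ITIL data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stages a record is expected to have documented before closure.
REQUIRED_STAGES = ("logged", "categorized", "prioritized", "diagnosed", "resolved")


@dataclass
class ProblemRecord:
    problem_id: str
    history: List[Tuple[str, str]] = field(default_factory=list)  # (stage, description)

    def add_event(self, stage: str, description: str) -> None:
        self.history.append((stage, description))

    def missing_history(self) -> List[str]:
        """Return required stages with no event or only blank descriptions."""
        documented = {stage for stage, desc in self.history if desc.strip()}
        return [stage for stage in REQUIRED_STAGES if stage not in documented]

    def close(self) -> None:
        # Closure check: refuse to close while any required stage is undocumented.
        missing = self.missing_history()
        if missing:
            raise ValueError(f"cannot close {self.problem_id}: no history for {missing}")
        self.add_event("closed", "Closure check passed")


record = ProblemRecord("PRB42")
for stage in REQUIRED_STAGES[:-1]:
    record.add_event(stage, f"{stage} by problem analyst")
print(record.missing_history())  # ['resolved']
record.add_event("resolved", "Gateway idle timeout increased")
record.close()
print(record.history[-1][0])  # closed
```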
After a major problem has been resolved, a review of the following should be made −
Those things that were done correctly
Those things that were done wrong
What could be done better in future
How to prevent recurrence