What is sequential pattern mining?

Sequential pattern mining is the mining of frequently occurring ordered events or subsequences as patterns. An example of a sequential pattern is that customers who purchase a Canon digital camera are likely to purchase an HP color printer within a month.

For retail data, sequential patterns are useful for shelf placement and promotions. This industry, as well as telecommunications and other businesses, can also use sequential patterns for targeted marketing, customer retention, and many other tasks.

Other areas in which sequential patterns can be applied include Web access pattern analysis, weather prediction, production processes, and network intrusion detection.

Given a set of sequences, where each sequence consists of a list of events (or elements) and each event consists of a set of items, and given a user-specified minimum support threshold min_sup, sequential pattern mining finds all frequent subsequences, i.e., the subsequences whose occurrence frequency in the set of sequences is no less than min_sup.

Let I = {I1, I2,..., Ip} be the set of all items. An itemset is a nonempty set of items. A sequence is an ordered list of events. A sequence s is denoted ⟨e1 e2 e3 … el⟩, where event e1 occurs before e2, which occurs before e3, and so on. Event ej is also called an element of s.

In the case of customer purchase data, an event corresponds to a shopping trip in which a customer purchases items at a specific store. The event is an itemset, i.e., an unordered list of items that the customer purchased during the trip. The itemset (or event) is denoted (x1x2···xq), where xk is an item.

An item can occur at most once in an event of a sequence, but can occur multiple times in different events of a sequence. The number of instances of items in a sequence is called the length of the sequence. A sequence with length l is called an l-sequence.
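The definitions above can be made concrete with a small sketch. It assumes, purely for illustration, that items are single-character strings, that an event is a Python frozenset, and that a sequence is a list of events. The names Event, Seq, seq and sequence_length are hypothetical helpers, not part of any library.

```python
from typing import FrozenSet, List

Event = FrozenSet[str]  # an itemset: unordered, each item at most once
Seq = List[Event]       # a sequence: an ordered list of events


def seq(*events: str) -> Seq:
    """Build a sequence from strings, one string per event.

    Assumption for this sketch: every character is one item,
    so "abc" stands for the event (abc).
    """
    return [frozenset(e) for e in events]


def sequence_length(s: Seq) -> int:
    """Length of a sequence: the number of item instances over all events."""
    return sum(len(event) for event in s)


# The sequence <a (abc) (ac) d (cf)>: item a occurs in three different events.
s = seq("a", "abc", "ac", "d", "cf")

print(len(s))              # 5 events
print(sequence_length(s))  # 9 item instances, so s is a 9-sequence
```

Note that the number of events (5) and the length (9) differ, because the length counts item instances rather than events.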

A sequence database, S, is a set of tuples, (SID, s), where SID is a sequence_ID and s is a sequence. For example, S contains sequences for all customers of the store. A tuple (SID, s) is said to contain a sequence α if α is a subsequence of s.
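As a minimal sketch of how containment and support could be checked, the code below uses the usual definition of a subsequence: α is a subsequence of s if the events of α can be matched, in order, to distinct events of s such that each event of α is a subset of the event it is matched to. The support of α is then taken to be the number of tuples in S that contain α. The toy database, the single-character items, and the names contains, support and seq are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List

Event = FrozenSet[str]
Seq = List[Event]


def seq(*events: str) -> Seq:
    """One string per event; each character is one item (sketch assumption)."""
    return [frozenset(e) for e in events]


def contains(s: Seq, alpha: Seq) -> bool:
    """True if alpha is a subsequence of s."""
    pos = 0
    for a in alpha:
        # Greedily take the earliest remaining event of s that includes a.
        while pos < len(s) and not a <= s[pos]:
            pos += 1
        if pos == len(s):
            return False
        pos += 1  # the next event of alpha must match a later event of s
    return True


def support(db: Dict[int, Seq], alpha: Seq) -> int:
    """Number of tuples (SID, s) in the database that contain alpha."""
    return sum(1 for s in db.values() if contains(s, alpha))


# Toy sequence database keyed by SID (made-up data).
db: Dict[int, Seq] = {
    1: seq("a", "abc", "ac", "d", "cf"),
    2: seq("ad", "c", "bc", "ae"),
    3: seq("ef", "ab", "df", "c", "b"),
    4: seq("e", "g", "af", "c", "b", "c"),
}

min_sup = 2
alpha = seq("ab", "c")  # the sequence <(ab) c>
print(support(db, alpha))             # 2 (contained in SID 1 and SID 3)
print(support(db, alpha) >= min_sup)  # True: frequent for min_sup = 2
```

This brute-force count scans the whole database for each candidate, so it only illustrates the definitions and is not a scalable mining method.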

This formulation of sequential pattern mining is an abstraction of customer-shopping sequence analysis. Scalable techniques have been developed for mining sequential patterns from such data.

Many sequential pattern mining applications cannot be covered by this formulation. For example, when analyzing Web clickstream sequences, gaps between clicks become important if one wants to predict what the next click might be.
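One possible way to take gaps into account is a maximum-gap constraint: a pattern only counts as matched if consecutive matched clicks are at most max_gap seconds apart. The sketch below is an illustrative assumption, not a method prescribed above. The (timestamp, page) record layout, the sample session, and the name matches_with_max_gap are all hypothetical, and clicks are assumed to be sorted by timestamp.

```python
from typing import List, Tuple

Click = Tuple[float, str]  # (timestamp in seconds, page): hypothetical layout


def matches_with_max_gap(clicks: List[Click], pattern: List[str],
                         max_gap: float) -> bool:
    """True if the pages in pattern occur in order in clicks, with at most
    max_gap seconds between each pair of consecutive matched clicks."""
    if not pattern:
        return True
    # Indices of clicks at which a valid match of the pattern prefix can end.
    ends = [i for i, (_, page) in enumerate(clicks) if page == pattern[0]]
    for wanted in pattern[1:]:
        ends = [
            j for j, (t_j, page) in enumerate(clicks)
            if page == wanted
            and any(i < j and t_j - clicks[i][0] <= max_gap for i in ends)
        ]
        if not ends:
            return False
    return bool(ends)


# Made-up session: a short visit, a long pause, then a purchase.
session: List[Click] = [
    (0, "home"), (5, "search"), (400, "home"), (410, "cart"), (420, "checkout"),
]

print(matches_with_max_gap(session, ["home", "cart", "checkout"], 30))  # True
print(matches_with_max_gap(session, ["search", "cart"], 30))            # False
print(matches_with_max_gap(session, ["search", "cart"], float("inf")))  # True
```

All candidate end positions are tracked rather than only the earliest one, because under a gap constraint the earliest match of a prefix (here, the first "home" click) may be too far from the next click while a later match is not.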

In DNA sequence analysis, approximate patterns become useful because DNA sequences may contain (symbol) insertions, deletions, and mutations. Such diverse requirements can be viewed as constraint relaxation or enforcement.