1. For each of the following questions, provide an example of an association rule from the market basket domain that satisfies the following conditions. Also, describe whether such rules are subjectively interesting.
(a) A rule that has high support and high confidence.
(b) A rule that has reasonably high support but low confidence.
(c) A rule that has low support and low confidence.
(d) A rule that has low support and high confidence.
2. Consider the data set shown in Table 5.20.
Table 5.20. Example of market basket transactions.
Customer ID   Transaction ID   Items Bought
1             0001             {a, d, e}
1             0024             {a, b, c, e}
2             0012             {a, b, d, e}
2             0031             {a, c, d, e}
3             0015             {b, c, e}
3             0022             {b, d, e}
4             0029             {c, d}
4             0040             {a, b, c}
5             0033             {a, d, e}
5             0038             {a, b, e}
(a) Compute the support for itemsets {e}, {b, d}, and {b, d, e} by treating each transaction ID as a market basket.
(b) Use the results in part (a) to compute the confidence for the association rules {b, d} → {e} and {e} → {b, d}. Is confidence a symmetric measure?
(c) Repeat part (a) by treating each customer ID as a market basket. Each item should be treated as a binary variable (1 if an item appears in at least one transaction bought by the customer, and 0 otherwise).
(d) Use the results in part (c) to compute the confidence for the association rules {b, d} → {e} and {e} → {b, d}.
(e) Suppose s1 and c1 are the support and confidence values of an association rule r when treating each transaction ID as a market basket. Also, let s2 and c2 be the support and confidence values of r when treating each customer ID as a market basket. Discuss whether there are any relationships between s1 and s2 or c1 and c2.
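As a sanity check, the supports and confidences in parts (a) and (b) can be computed directly from the definitions. The short Python sketch below (illustrative only, not part of the exercise) treats each transaction ID in Table 5.20 as one basket; the `support` and `confidence` helpers are just the standard formulas.

```python
# Transactions from Table 5.20, one set per transaction ID
transactions = [
    {"a", "d", "e"}, {"a", "b", "c", "e"}, {"a", "b", "d", "e"},
    {"a", "c", "d", "e"}, {"b", "c", "e"}, {"b", "d", "e"},
    {"c", "d"}, {"a", "b", "c"}, {"a", "d", "e"}, {"a", "b", "e"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """s(antecedent ∪ consequent) / s(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

print(support({"e"}, transactions))                 # 0.8
print(support({"b", "d"}, transactions))            # 0.2
print(support({"b", "d", "e"}, transactions))       # 0.2
print(confidence({"b", "d"}, {"e"}, transactions))  # 1.0
print(confidence({"e"}, {"b", "d"}, transactions))  # 0.25
```

The asymmetric results for the two rules (1.0 versus 0.25) already answer the symmetry question in part (b).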
6. Consider the market basket transactions shown in Table 5.21.
(a) What is the maximum number of association rules that can be extracted from this data (including rules that have zero support)?
(b) What is the maximum size of frequent itemsets that can be extracted (assuming minsup > 0)?
Table 5.21. Market basket transactions.

Transaction ID   Items Bought
1                {Milk, Beer, Diapers}
2                {Bread, Butter, Milk}
3                {Milk, Diapers, Cookies}
4                {Bread, Butter, Cookies}
5                {Beer, Cookies, Diapers}
6                {Milk, Diapers, Bread, Butter}
7                {Bread, Butter, Diapers}
8                {Beer, Diapers}
9                {Milk, Diapers, Bread, Butter}
10               {Beer, Cookies}
(c) Write an expression for the maximum number of size-3 itemsets that can be derived from this data set.
(d) Find an itemset (of size 2 or larger) that has the largest support.
(e) Find a pair of items, a and b, such that the rules {a} → {b} and {b} → {a} have the same confidence.
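Part (d) can be checked by brute force: because support is anti-monotone, no itemset of size 3 or more can have higher support than its best size-2 subset, so scanning the pairs is enough. A minimal sketch (not part of the exercise) over the Table 5.21 baskets:

```python
from itertools import combinations

# Transactions from Table 5.21
baskets = [
    {"Milk", "Beer", "Diapers"}, {"Bread", "Butter", "Milk"},
    {"Milk", "Diapers", "Cookies"}, {"Bread", "Butter", "Cookies"},
    {"Beer", "Cookies", "Diapers"}, {"Milk", "Diapers", "Bread", "Butter"},
    {"Bread", "Butter", "Diapers"}, {"Beer", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"}, {"Beer", "Cookies"},
]

items = sorted(set().union(*baskets))

def count(itemset):
    """Support count: number of baskets containing the whole itemset."""
    return sum(set(itemset) <= b for b in baskets)

# Support count for every size-2 itemset
pair_counts = {pair: count(pair) for pair in combinations(items, 2)}
best = max(pair_counts, key=pair_counts.get)
print(best, pair_counts[best])  # → ('Bread', 'Butter') 5
```

The same `count` helper can be reused to test candidate pairs for part (e): the two rules have equal confidence exactly when a and b have equal support counts.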
7. Show that if a candidate k-itemset X has a subset of size less than k − 1 that is infrequent, then at least one of the (k − 1)-size subsets of X is necessarily infrequent.
8. Consider the following set of frequent 3-itemsets:
{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5}, {3, 4, 5}.
Assume that there are only five items in the data set.
(a) List all candidate 4-itemsets obtained by a candidate generation procedure using the Fk−1 × F1 merging strategy.
(b) List all candidate 4-itemsets obtained by the candidate generation procedure in Apriori.
(c) List all candidate 4-itemsets that survive the candidate pruning step of the Apriori algorithm.
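The Apriori-style generation and pruning asked for in parts (b) and (c) can be mechanized. The sketch below (a minimal illustration, not the book's code) joins two frequent 3-itemsets that share their first two items — the Fk−1 × Fk−1 strategy — and then discards any candidate with an infrequent 3-subset.

```python
from itertools import combinations

# Frequent 3-itemsets from exercise 8, each a sorted tuple
F3 = [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4),
      (1, 3, 5), (2, 3, 4), (2, 3, 5), (3, 4, 5)]

def apriori_gen(Fk):
    """Join two frequent m-itemsets sharing their first m-1 items
    to form an (m+1)-itemset candidate (Fk-1 x Fk-1 merging)."""
    Fk = sorted(Fk)
    m = len(Fk[0])
    candidates = []
    for i in range(len(Fk)):
        for j in range(i + 1, len(Fk)):
            if Fk[i][:m - 1] == Fk[j][:m - 1]:
                candidates.append(Fk[i] + (Fk[j][-1],))
    return candidates

def prune(candidates, Fk):
    """Keep a candidate only if every one of its m-subsets is frequent."""
    frequent = set(Fk)
    return [c for c in candidates
            if all(s in frequent for s in combinations(c, len(c) - 1))]

cands = apriori_gen(F3)
print(cands)            # [(1,2,3,4), (1,2,3,5), (1,2,4,5), (1,3,4,5), (2,3,4,5)]
print(prune(cands, F3)) # [(1,2,3,4), (1,2,3,5)]
```

Only the two candidates whose 3-subsets all appear in the frequent list survive pruning.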
9. The Apriori algorithm uses a generate-and-count strategy for deriving frequent itemsets. Candidate itemsets of size k + 1 are created by joining a pair of frequent itemsets of size k (this is known as the candidate generation step). A candidate is discarded if any one of its subsets is found to be infrequent during the candidate pruning step. Suppose the Apriori algorithm is applied to the data set shown in Table 5.22 with minsup = 30%, i.e., any itemset occurring in less than 3 transactions is considered to be infrequent.