## Association Rule Learning Pt 1

Association rule learning is a common technique for rule discovery. It predicts the occurrence of an event or item based on the occurrences of other items or events. The most common example is market basket analysis, literally at the market. If we look inside people’s shopping carts to determine which items are likely to be bought together, association rule learning finds those pairings by counting how often items occur together. Each rule has the form A -> B, where A and B can each contain one or more items.

In this case, I’ve decided to try it out to look for appliances and end-use devices that are generally linked. Two parameters need to be specified. Their values are somewhat arbitrary to set, but the parameters themselves make intuitive sense. The support answers the question, “How often do A & B occur together?” The higher the support, the more opportunities there are to act on that itemset. The confidence indicates how exclusive the relationship between A & B is, which tells us how well we can predict the consumption of one end use given another’s usage.

## Definitions

- **Itemset**: a collection of one or more items.
- **Support**: the fraction of all transactions that contain both A and B.

  $$S = \frac{\text{count}(A, B)}{\text{number of transactions}}$$

- **Confidence**: the fraction of the transactions containing A that also contain B.

  $$C = \frac{\text{count}(A, B)}{\text{count}(A)}$$

- **Frequent itemset**: an itemset whose support is greater than or equal to a threshold.
- **Candidate set**: the set of itemsets that still require testing.

Here count(X) is the number of transactions that contain every item in X.
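To make the two measures concrete, here is a minimal Python sketch that computes them on a toy list of transactions. The function names and the toy data are made up for illustration; they are not taken from the Smart* dataset.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also
    contain `consequent` (the rule antecedent -> consequent)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_both = sum(1 for t in transactions if both <= t)
    return count_both / count_a if count_a else 0.0


# Hypothetical toy data: each transaction is the set of devices seen together.
transactions = [
    frozenset({"tv", "roku", "subwoofer"}),
    frozenset({"tv", "roku"}),
    frozenset({"tv"}),
    frozenset({"lamp"}),
]

print(support(transactions, {"tv", "roku"}))       # 2 of 4 transactions
print(confidence(transactions, {"tv"}, {"roku"}))  # 2 of the 3 with tv
print(confidence(transactions, {"roku"}, {"tv"}))  # 2 of the 2 with roku
```

Note that confidence is not symmetric: tv -> roku and roku -> tv share the same support but have different confidence, which is why both directions show up as separate rules.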

## Application

The dataset I am using for this is the Smart* dataset produced by UMass Amherst. It contains power traces from 3 smart homes, but only Home A also has individual meter data, labeled on/off events, and voltage/frequency data. There are 47 separately measured devices and circuits in the house, which we can treat as our available items. First, we notice that the data is very sparse and skewed towards a few end-uses that dominate the entire set. The difference in frequency between baseload and non-baseload devices is pretty apparent (except for the TV, whose count may not be accurate since most of its data occurred on one day of the months examined).

Expressed as a fraction of the highest occurring item, for May the top 5 are:

1. living room tv: 1.0
2. living room lamp: .157
3. living room lamp2: .155
4. living room dvd: .154

This is different for June, where the top 5 are:

1. living room tv: 1.0
2. basement dehumidifier: .230
3. master bedroom mac mini: .099
4. master bedroom ac: .032
5. basement freezer: .032

Given the large difference, for now I decided to limit the algorithm to devices that made up no more than 1.5% of the entire set, effectively cutting out devices that are more likely to be baseload than not. However, I can try other ways of preprocessing the power readings to address what appear to be phantom loads or disparities between sensing intervals for each device.
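A frequency cutoff like this can be sketched as below. How the share is measured is an assumption here: this version uses each item's share of all item occurrences, and the function name `drop_dominant_items`, the `max_share` parameter, and the toy data are all hypothetical.

```python
from collections import Counter


def drop_dominant_items(transactions, max_share):
    """Remove items whose share of all item occurrences exceeds `max_share`.

    Assumption: "share" is an item's count divided by the total count of all
    items across all transactions. Other definitions are equally plausible.
    """
    counts = Counter(item for t in transactions for item in t)
    total = sum(counts.values())
    keep = {item for item, n in counts.items() if n / total <= max_share}
    # Strip the dominant items, then discard transactions left empty.
    filtered = [t & keep for t in transactions]
    return [t for t in filtered if t]


# Hypothetical toy data where "tv" dominates (4 of 6 occurrences).
toy = [
    frozenset({"tv", "lamp"}),
    frozenset({"tv"}),
    frozenset({"tv", "wii"}),
    frozenset({"tv"}),
]
reduced = drop_dominant_items(toy, max_share=0.5)
print(reduced)
```

On real data the cutoff would be the 1.5% mentioned above rather than the 0.5 used for this tiny example.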

Anyway, to try out an algorithm and see what we can find, I ran an Apriori search over the candidate sets, which revealed some interesting rules that confirm what I would suspect. For each month, rules were extracted and ranked in descending order of support and of confidence. (Min Support = .02, Min Confidence = .3)
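For readers who have not seen Apriori before, the level-wise search can be sketched in a few lines of Python. This is a minimal, unoptimized illustration of the general idea, not the code that produced the rules listed here; the function names and the toy transactions are hypothetical, and only the two thresholds are taken from the text.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Test each candidate and keep those that meet the support threshold.
        level = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, pruning any
        # candidate that has an infrequent k-item subset.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples,
    sorted by descending confidence."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    out.append((antecedent, itemset - antecedent, s, conf))
    return sorted(out, key=lambda rule: rule[3], reverse=True)


# Hypothetical toy transactions: sets of devices that were on together.
transactions = [
    frozenset({"roku", "wii", "subwoofer"}),
    frozenset({"roku", "wii", "subwoofer"}),
    frozenset({"subwoofer"}),
    frozenset({"nitelight", "noise"}),
    frozenset({"lamp"}),
]
frequent = apriori(transactions, min_support=0.02)
found = rules(frequent, min_confidence=0.3)
for antecedent, consequent, s, conf in found:
    print(sorted(antecedent), "->", sorted(consequent), f"({s:.1%}, {conf:.1%})")
```

The pruning step is what keeps the candidate set manageable: an itemset can only be frequent if all of its subsets are, so most combinations of the 47 devices never need to be counted.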

For May

First 10 rules by support:

| Rule | Support | Confidence |
| --- | --- | --- |
| livingroom:subwoofer -> livingroom:wii | 7.8085% | 84.8601% |
| livingroom:wii -> livingroom:subwoofer | 7.8085% | 98.9614% |
| livingroom:roku -> livingroom:subwoofer | 7.7734% | 99.2526% |
| livingroom:subwoofer -> livingroom:roku | 7.7734% | 84.4784% |
| livingroom:roku -> livingroom:wii | 7.7382% | 98.8042% |
| livingroom:wii -> livingroom:roku | 7.7382% | 98.0712% |
| livingroom:roku -> livingroom:subwoofer,livingroom:wii | 7.6914% | 98.2063% |
| livingroom:subwoofer -> livingroom:roku,livingroom:wii | 7.6914% | 83.5878% |
| livingroom:wii -> livingroom:roku,livingroom:subwoofer | 7.6914% | 97.4777% |
| livingroom:roku,livingroom:subwoofer -> livingroom:wii | 7.6914% | 98.9458% |

First 10 rules by confidence:

| Rule | Support | Confidence |
| --- | --- | --- |
| bedroom:nitelight -> bedroom:noise | 4.2028% | 99.7222% |
| livingroom:roku,livingroom:wii -> livingroom:subwoofer | 7.6914% | 99.3949% |
| livingroom:roku -> livingroom:subwoofer | 7.7734% | 99.2526% |
| livingroom:wii -> livingroom:subwoofer | 7.8085% | 98.9614% |
| livingroom:roku,livingroom:subwoofer -> livingroom:wii | 7.6914% | 98.9458% |
| livingroom:roku -> livingroom:wii | 7.7382% | 98.8042% |
| bedroom:noise -> bedroom:nitelight | 4.2028% | 98.6264% |
| livingroom:subwoofer,livingroom:wii -> livingroom:roku | 7.6914% | 98.5007% |
| livingroom:roku -> livingroom:subwoofer,livingroom:wii | 7.6914% | 98.2063% |
| livingroom:wii -> livingroom:roku | 7.7382% | 98.0712% |

For June

First 10 rules by support:

| Rule | Support | Confidence |
| --- | --- | --- |
| master:desklamp -> master:nightstand2 | 36.3531% | 97.5625% |
| master:nightstand2 -> master:desklamp | 36.3531% | 96.2096% |
| bedroom:nitelight -> bedroom:noise | 10.1188% | 97.4215% |
| bedroom:noise -> bedroom:nitelight | 10.1188% | 97.8604% |
| bedroom:lamp1 -> bedroom:noise | 9.1407% | 93.0095% |
| bedroom:noise -> bedroom:lamp1 | 9.1407% | 88.4009% |
| bedroom:lamp1 -> bedroom:nitelight | 8.9544% | 91.1137% |
| bedroom:nitelight -> bedroom:lamp1 | 8.9544% | 86.2108% |
| livingroom:subwoofer -> master:nightstand2 | 5.7406% | 40.7775% |
| livingroom:subwoofer -> master:desklamp | 5.6707% | 40.2812% |

First 10 rules by confidence:

| Rule | Support | Confidence |
| --- | --- | --- |
| livingroom:wii -> livingroom:subwoofer | 3.2371% | 99.2857% |
| basement:hrv,master:desklamp -> master:nightstand2 | 4.285% | 98.9247% |
| bedroom:noise -> bedroom:nitelight | 10.1188% | 97.8604% |
| master:desklamp -> master:nightstand2 | 36.3531% | 97.5625% |
| bedroom:nitelight -> bedroom:noise | 10.1188% | 97.4215% |
| master:nightstand2 -> master:desklamp | 36.3531% | 96.2096% |
| basement:hrv,master:nightstand2 -> master:desklamp | 4.285% | 95.8333% |
| bedroom:ac,bedroom:lamp1 -> bedroom:nitelight | 2.8412% | 94.5736% |
| bedroom:lamp1 -> bedroom:noise | 9.1407% | 93.0095% |
| bedroom:lamp1 -> bedroom:nitelight | 8.9544% | 91.1137% |

## Observations

Some devices are seasonal (e.g., AC, dehumidifier) and thus can become nearly baseload-like devices that probably have little sensitivity to the other activities in the home. Given the sparsity of the data, a low support threshold is necessary for most of the devices to be considered. Even so, confidence is fairly high for many of the rules found, although examining how the synchronization of the data affects the results would help confirm this.

This method only captures the co-occurrence of items, so many very obvious connections are easily found, such as the subwoofer being connected to the other entertainment devices in the living room. At least for the first dozen or so rules, nothing appears out of the ordinary: sensors in the same room are linked together more so than sensors across different living spaces. The next step would be to adapt the algorithm, or explore others, to look for sequential patterns as well.