Using rule discovery

Rule discovery analyses data from a data source, looks for patterns, and uses statistical techniques to deduce a set of rules. Combined with published specifications and knowledge from individuals, rule discovery can find rules that may otherwise be missed.

1Integrate considers the following types of relationship:

Rule discovery workspace

The Rule Discovery workspace has two tabs: General for metadata and Specification for defining the parameters and algorithms which control rule discovery.

In the Specification tab:

·      Algorithm - currently, only Spatial Boosting Algorithm is supported.

·      Within-distance tolerance is used to include objects that are nearby but do not spatially intersect. Any objects within this distance are analysed. If this parameter is not checked, the algorithm only looks for rules involving objects that spatially intersect each other.

·      Minimum support cutoff threshold (range 0 to 1) is the proportion of sample data for which spatially interacting objects were found, and which might therefore lead to a rule being discovered. Setting the cutoff threshold causes rule discovery to ignore any rules with a lower support than the specified value.

·      Lower cutoff threshold (range 0 to 1) is the probability threshold above which correlations between attributes on different objects are considered to be statistically significant. For example, with a threshold of 0.5, attributes on spatially related objects that are found to match more than half the time are included in the rule.

·      Minimum rule probability (range 0 to 1) is the minimum probability that the spatial component of a rule must have, as a proportion of the spatially interacting objects in the data sampled. For example, with a value of 0.8, more than 80% of the objects that spatially interact must have a particular spatial relationship for a rule to be inferred about those objects.

·      Minimum probability improvement ratio (range greater than 1) relates to attribute clauses in a discovered rule. An attribute clause must be at least this much more likely to occur, for objects with the spatial relationship found in the rule, than it would by chance.

·      Maximum number of interacting objects - the discovery algorithm performs a spatial search to find nearby objects and analyses how they interact with the main object it is considering. This parameter puts an upper limit on the number of nearby objects. Increasing it may slightly improve the results, but consumes more memory. If it is increased significantly, the application server may run out of memory. If this happens, restart the application server.
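To make the four threshold parameters above more concrete, the following Python sketch shows one way they could be applied to counts gathered for a single candidate rule. It is an illustration only and is not the Spatial Boosting Algorithm itself: the `CandidateStats` class, its field names, the `passes_thresholds` function and the default values are all hypothetical, and the exact comparisons (for example `>=` rather than `>`) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CandidateStats:
    """Hypothetical counts for one candidate rule (names are illustrative)."""
    sampled: int                  # objects sampled from the data source
    interacting: int              # sampled objects with spatially interacting objects
    with_relationship: int        # interacting objects showing the candidate spatial relationship
    clause_and_relationship: int  # of those, objects that also satisfy the attribute clause
    clause_baseline: float        # assumed probability of the attribute clause occurring by chance


def passes_thresholds(stats: CandidateStats,
                      min_support: float = 0.1,
                      lower_cutoff: float = 0.5,
                      min_rule_probability: float = 0.8,
                      min_improvement_ratio: float = 1.5) -> bool:
    """Return True if the candidate survives all four (assumed) checks."""
    # Guard against empty counts so the divisions below are always defined.
    if min(stats.sampled, stats.interacting, stats.with_relationship) <= 0:
        return False
    if stats.clause_baseline <= 0:
        return False

    # Minimum support: proportion of sampled data with interacting objects.
    support = stats.interacting / stats.sampled
    if support < min_support:
        return False

    # Minimum rule probability: share of interacting objects that have
    # the candidate spatial relationship.
    rule_probability = stats.with_relationship / stats.interacting
    if rule_probability < min_rule_probability:
        return False

    # Lower cutoff: the attribute correlation must exceed this probability.
    clause_probability = stats.clause_and_relationship / stats.with_relationship
    if clause_probability <= lower_cutoff:
        return False

    # Improvement ratio: how much more likely the clause is than by chance.
    improvement = clause_probability / stats.clause_baseline
    return improvement >= min_improvement_ratio


example = CandidateStats(sampled=1000, interacting=400, with_relationship=360,
                         clause_and_relationship=270, clause_baseline=0.3)
print(passes_thresholds(example))
```

In this made-up example the support is 0.4, the rule probability is 0.9, the attribute clause holds for 0.75 of the related objects, and that is about 2.5 times the assumed chance rate, so the candidate passes with the default values. Raising any one threshold above the corresponding figure would cause the candidate to be ignored.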

Using rule discovery results

In the Sessions interface, the results from Rule Discovery tasks are presented as a table of candidate rules, each with a confidence level that the rule is correct. Any of these rules can be promoted so that they can be used in conformance checks and rule-based transformations.