
HR Background

The foundation of this system is HR, an automated theorem discovery program written by Simon Colton.

This document describes HR and how it operates. This content has been revived from its former documentation, still accessible in limited form via the Internet Archive's Wayback Machine:

https://web.archive.org/web/20071009225148/http://www.dai.ed.ac.uk/homes/simonco/research/hr/

If the information here does not suffice, an additional paper can be accessed at:

http://ccg.doc.gold.ac.uk/wp-content/uploads/2016/07/colton_cade02.pdf

How HR Builds Concepts

HR deals with three aspects of a mathematical concept:

  • Models, ie. examples of a concept. For instance, the models of prime numbers are 2, 3, 5, 7, 11, etc.
  • Definitions. A concept may have more than one definition. For example, primes are often (incorrectly) defined as those integers divisible by only 1 and themselves - which would make 1 a prime number. A better definition is: those integers with exactly 2 divisors.
  • Properties of concepts. These are measurable qualities of a concept which give more information than the definition and models alone. For example, a property of prime numbers is that there are 25 primes less than 100.

Representation

In HR, two things are used to represent the concept, the models and the construction history (a series of instructions detailing how the concept was built using previous concepts). HR is not tied to a single logical formalism for the definitions of the concepts it produces. Definitions are generated as and when they are needed, using the construction history, and are given in whichever format is required for the task at hand. For example, when the definition is required in a proof by the Otter theorem prover (see later), it is generated in a first order logic style acceptable to Otter. Similarly, when the user wants to read the definition, it is output in LaTeX and marked up to make it more readable. This approach makes it easier for HR to interact with theorem provers, model generators, computer algebra packages and so on.

The models in HR are stored as complete data-tables. By complete, we mean that there is no missing data. An initial set of concepts (which can be given by the user, or generated by HR and MACE if the user supplies the axioms of the finite algebra of interest) is given to HR in the form of data-tables. HR needs to know how to decompose each entity into elements, and it needs some relations between the elements. For example, amongst others, integers can be broken into finite sets of:

  1. Divisors
  2. Digits
  3. Positive integers less than them

Then the multiplication relation can be given for pairs of divisors of an integer, and the addition relation can be given for pairs of integers less than the integer.

Note that presenting addition and multiplication in this way ensures that the tables are complete.
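As an illustrative sketch (not HR's actual code - the function names here are assumptions made for this document), the divisor decomposition and the complete multiplication relation for the integers 1 to 4 could be built like this:

```python
# Illustrative sketch of HR-style "complete" data-tables for the
# integers 1..4. All names are assumptions, not HR's implementation.

def divisors(n):
    """Decompose an integer into its finite set of divisors."""
    return {d for d in range(1, n + 1) if n % d == 0}

def digits(n):
    """Decompose an integer into its set of digits."""
    return {int(c) for c in str(n)}

def multiplication_table(entities):
    """Complete relation: every row [a, b, c] with a = b*c, where b and
    c range over the divisors of a. No data is missing, so the table is
    complete in HR's sense."""
    return [(a, b, c)
            for a in entities
            for b in sorted(divisors(a))
            for c in sorted(divisors(a))
            if b * c == a]

table0 = multiplication_table(range(1, 5))
```

For the integers 1 to 4 this yields the eight rows (1,1,1), (2,1,2), (2,2,1), (3,1,3), (3,3,1), (4,1,4), (4,2,2), (4,4,1) used in the examples below.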

When working in a finite algebra, both the decompositions and relations are determined by the axioms. In the following table, I give some algebras and their decompositions and relations.

| Algebra | Decompositions | Relations |
| --- | --- | --- |
| Group | Elements, Identities | Multiplication (*), Inverses |
| Ring | Elements, Additive Identities | Addition, Multiplication, Additive Inverse |
| IP-loop | Elements, Identities | Multiplication, IP |

When working in graph theory, the decomposition is into nodes, and the relation is given by the edges. We are currently implementing ways by which HR can use more complicated decompositions, for example graphs into paths of nodes.

Example Constructions

HR builds new concepts using one of a set of production rules which take data-tables of old concepts and output new data-tables. For example, below we are given table 0, with the divisions of integers up to 4, namely rows [a,b,c] where a=b*c. The "match" production rule then takes table 0 and produces table 1 by finding all rows in table 0 where the 2nd and 3rd columns are the same.

Each production rule is supplied with parameters to describe exactly what to do. In this case, the parameters [1,2,2] should be read: "find rows where column 1 is the same as column 1, column 2 is the same as column 2 and column 3 is the same as column 2". As columns 2 and 3 are the same, the new data-table has only one column here. After the match production rule has produced table 1, the new table is input to the "exists" production rule. The exists production rule simply removes columns from the input table, and the parameters tell it which column(s) to keep. In this case, the parameters [1] tell it to keep just the first column. The outcome of these two production rules is to take the given table of divisions of the integers and to re-invent the square numbers.

[Missing image: tables 0, 1 and 2 for the match/exists example]
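The two steps above can be sketched in Python (an illustrative toy, not HR's actual implementation; the rule signatures are assumptions):

```python
# table 0: rows [a, b, c] with a = b*c, for the integers 1..4
table0 = [(1, 1, 1), (2, 1, 2), (2, 2, 1), (3, 1, 3), (3, 3, 1),
          (4, 1, 4), (4, 2, 2), (4, 4, 1)]

def match(table, params):
    """Match rule with parameters like [1,2,2]: keep rows where each
    (1-based) column equals the column named by its parameter, then
    merge the duplicated columns."""
    keep = [r for r in table
            if all(r[i] == r[p - 1] for i, p in enumerate(params))]
    cols = sorted(set(params))
    return [tuple(r[c - 1] for c in cols) for r in keep]

def exists(table, params):
    """Exists rule: keep only the listed (1-based) columns, dropping
    duplicate rows while preserving order."""
    out = []
    for r in table:
        row = tuple(r[c - 1] for c in params)
        if row not in out:
            out.append(row)
    return out

table1 = match(table0, [1, 2, 2])   # rows [a, b] with a = b*b
table2 = exists(table1, [1])        # the square numbers up to 4
```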

In the second example, below, we see the size and split production rules in action. The size production rule counts the number of times an entry is seen in a column. For example, if we look down column one of table 0 below, we see that the number 1 appears once, 2 and 3 appear twice and the number 4 appears three times. The parameters [1] for the size production rule tell it to count the number of occurrences of entries in column 1. The split production rule, with parameters 2=2, goes through table 3 and finds rows where column two equals the numerical value 2. The outcome of these two production rules is to take the given table of divisions of the integers and to re-invent the prime numbers.

[Missing image: tables 0, 3 and 4 for the size/split example]
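As with the first example, these two steps can be sketched as toy Python functions (illustrative only; the rule signatures are assumptions):

```python
from collections import Counter

# table 0: rows [a, b, c] with a = b*c, for the integers 1..4
table0 = [(1, 1, 1), (2, 1, 2), (2, 2, 1), (3, 1, 3), (3, 3, 1),
          (4, 1, 4), (4, 2, 2), (4, 4, 1)]

def size(table, column):
    """Size rule: count how often each entry appears in the given
    (1-based) column, giving rows [entry, count]."""
    counts = Counter(row[column - 1] for row in table)
    return sorted(counts.items())

def split(table, column, value):
    """Split rule with parameter column=value: keep only the rows
    whose (1-based) column holds the given value."""
    return [row for row in table if row[column - 1] == value]

table3 = size(table0, 1)                         # [a, n]: a has n rows in table 0
table4 = [(a,) for a, n in split(table3, 2, 2)]  # primes: exactly 2 such rows
```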

Note that the construction history of the tables is recorded as a set of triples (old_concept, production_rule, parameters), which have to be performed to reconstruct the data-table. For example, the construction history for table 4 above is:

(0,size,[1]) (3,split,[2=2])

HR is also given a definition for table 0, which we write in the form:

  • [a,b,c] : a=b*c.

From this definition, and because it knows that the match production rule was used to build table 1, it can generate a definition for the pairs of integers in table 1 as this:

  • [a,b] : a=b*b.

From this definition, and because it knows that the exists production rule was used to build table 2, it can generate a definition for the integers in table 2 as this:

  • [a] : exists b in N s.t. b*b=a.

Similarly, HR can generate a definition for the pairs in table 3:

  • [a,n] : n=|{(b,c) in N2 : b*c = a}|

and from definition 3, a (slightly non-standard) definition for prime numbers can be generated:

  • [a] : 2=|{(b,c) in N2 : b*c = a}|

The Production Rules

There are at present 10 production rules, though more are added from time to time. Each production rule manipulates data-tables in a fairly simple, non-domain-specific way. An informal description of each one is given here (in order of increasing difficulty).

  1. Exists This removes columns.
  2. Match This finds rows where certain columns are identical.
  3. Size This counts the number of occurrences of each entry in a column.
  4. Split This finds rows where certain columns have certain (usually numerical) values.
  5. Negate This produces the complement of a data-table, ie. those rows which could possibly be in the table, but which aren't there.
  6. Conjunct This takes two old data-tables and finds rows in the first which match with rows in the second.
  7. Forall This finds sets of rows where all possible entries are present (eg. all the divisors of an integer).
  8. Common This finds entities (eg. integers) in a data-table which have rows in common (eg. integers which share a prime divisor).
  9. Compose This takes two old data-tables which can be used to define functions, and computes the data-table representing their composition.
  10. Fold This constructs the orbit of entities under a function (ie. it repeatedly takes the output of a function and puts it back in as input).
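To show how simply these rules manipulate tables, here is a hedged sketch of the negate rule (the function name and the explicit column domains are assumptions made for illustration):

```python
from itertools import product

def negate(table, domains):
    """Negate rule: the complement of a data-table -- every row that
    could possibly appear, given the domain of each column, but does
    not."""
    present = set(table)
    return [row for row in product(*domains) if row not in present]

# primes up to 4, as a one-column data-table
table4 = [(2,), (3,)]
non_primes = negate(table4, [range(1, 5)])
```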

How HR Assesses New Concepts

Even with just ten production rules, the number of possible concepts that HR can find is huge, and a best first search must be used if HR is to make any serious headway into the search space. The heuristic used is very simple:

  • Identify the best concept, in terms of what the user is looking for.
  • Identify the production rule which is responsible for outputting, on average, the most interesting concepts.
  • If possible, input the most interesting concept into the most interesting production rule and produce a new concept.

Of course, it is necessary to determine what makes a concept interesting, and how to measure this at the time the concept is introduced (or perhaps, later on). HR has three general ways to assess concepts, based on how they classify objects, how comprehensible they are and the quality of the provable statements which involve them. Each of the following measures is normalised to give a value between 0 and 1, where 0 is the worst possible, and 1 is the best possible score.

Classification Measures

[Missing image: tables 0, 3 and 4 describing the integers 1 to 4]

The three tables above can be used to describe the integers 1 to 4. For example, table 0 describes 1 as: an integer which divides as {(1,1)}. Similarly, table 0 describes 2 as: an integer which divides as {(1,2),(2,1)}, it describes 3 as: an integer which divides as {(1,3),(3,1)} and integer 4 as: an integer which divides as {(1,4),(2,2),(4,1)}. Table 3 gives more succinct descriptions: 1 is an integer with 1 divisor, 2 and 3 are integers with 2 divisors and 4 is an integer with 3 divisors. Finally, table 4 gives the most succinct description: 1 and 4 are not primes, 2 and 3 are primes.

The first measure which a user might be interested in is the PARSIMONY of a concept, which is inversely proportional to the size of the data-table of the concept. There are 3*8=24 entries in table 0, so this scores 1/24 for parsimony. There are 8 entries in table 3, so this scores 1/8 for parsimony, and obviously, table 4 scores 1/2 for parsimony. These scores give a rough indication of how succinct the descriptions given by the concepts will be.
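The parsimony calculation is straightforward; a hedged sketch (the function name is an assumption made for illustration):

```python
def parsimony(table):
    """Parsimony: inversely proportional to the number of entries
    (rows x columns) in the concept's data-table."""
    return 1.0 / (len(table) * len(table[0])) if table else 0.0

table0 = [(1, 1, 1), (2, 1, 2), (2, 2, 1), (3, 1, 3), (3, 3, 1),
          (4, 1, 4), (4, 2, 2), (4, 4, 1)]   # 8 rows x 3 columns
table3 = [(1, 1), (2, 2), (3, 2), (4, 3)]    # 4 rows x 2 columns
table4 = [(2,), (3,)]                        # 2 rows x 1 column
```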

Note that table 3 above describes integers 2 and 3 in the same way (they both have 2 divisors). The user may prefer concepts which describe all the integers differently, such as table 0. However, in other domains, such as group theory, we usually want descriptions of isomorphic objects to be the same. Therefore, the user can give HR a Gold Standard categorisation that they are interested in, and HR can measure how close the categorisation given by a particular concept is to the gold standard, and use this to measure the concept itself. Table 0 above categorises all the integers differently, so we write [1][2][3][4] for the categorisation it gives. Table 3 gives this categorisation: [1][2,3][4], and table 4 gives this categorisation: [1,4][2,3]. Suppose that the user is interested in the following gold standard categorisation: [1][2,3,4].

We can first measure the INVARIANCE of a concept, by measuring the proportion of pairs of integers which should be categorised together which are categorised correctly by the concept. Looking at the gold standard categorisation, we see that pairs (2,3),(2,4) and (3,4) should be categorised as the same.

  • Table 0 categorises (2,3), (2,4) and (3,4) as different, so scores 0/3=0 for invariance.
  • Table 3 categorises (2,3) as the same, but (2,4) and (3,4) differently, so scores 1/3 for invariance.
  • Table 4 categorises (2,3) as the same, but (2,4) and (3,4) differently, so this also scores 1/3 for invariance.

We can next measure the DISCRIMINATION of a concept with a similar calculation, using the pairs of integers which should be classed as different. Looking at the gold standard categorisation, we see that pairs (1,2), (1,3) and (1,4) should be categorised as different.

  • Table 0 categorises (1,2), (1,3) and (1,4) as different, so scores 3/3=1 for discrimination.
  • Table 3 categorises (1,2), (1,3) and (1,4) as different, so scores 3/3=1 for discrimination.
  • Table 4 categorises (1,4) as the same, but (1,2) and (1,3) differently, so this scores 2/3 for discrimination.
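Both measures can be sketched against the gold standard above (illustrative only; the helper names are assumptions, and an entity's "category" is taken to be the set of rows describing it):

```python
def describes(table, entity):
    """The description a data-table gives an entity: its rows, minus
    the entity column."""
    return frozenset(row[1:] for row in table if row[0] == entity)

def invariance(table, same_pairs):
    """Proportion of gold-standard 'same' pairs the concept also
    categorises together."""
    hits = sum(describes(table, x) == describes(table, y)
               for x, y in same_pairs)
    return hits / len(same_pairs)

def discrimination(table, diff_pairs):
    """Proportion of gold-standard 'different' pairs the concept also
    separates."""
    hits = sum(describes(table, x) != describes(table, y)
               for x, y in diff_pairs)
    return hits / len(diff_pairs)

# gold standard [1][2,3,4]: which pairs should agree, which differ
same_pairs = [(2, 3), (2, 4), (3, 4)]
diff_pairs = [(1, 2), (1, 3), (1, 4)]

table3 = [(1, 1), (2, 2), (3, 2), (4, 3)]
table4 = [(2,), (3,)]
```

Running this reproduces the scores listed above: table 3 and table 4 both score 1/3 for invariance, while table 3 scores 1 and table 4 scores 2/3 for discrimination.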

If the user is not interested in a particular categorisation of the integers, they may ask HR to find as many different categorisations as possible, which means a wider coverage of the search space. To do this, HR should choose to build on the concepts with the least common categorisation, and HR uses the NOVELTY measure to assess how often a categorisation has been seen. The following table shows how many times the categorisations given by the concepts have been seen:

| Concept | Categorisation | Occurrences |
| --- | --- | --- |
| 0 | [1][2][3][4] | 1 |
| 1 | [1][2,3][4] | 2 |
| 2 | [1,4][2,3] | 2 |
| 3 | [1][2,3][4] | 2 |
| 4 | [1,4][2,3] | 2 |

Therefore, we see that the most novel concept is number 0, because the categorisation given by it occurs nowhere else in the theory. Usually, there is a greater spread in the numbers of occurrences, but in this case, concept 0 scores 1 for novelty, and the other concepts score 0.
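HR's exact normalisation is not given here, but one plausible sketch, applied to the occurrence table above, rescales 1/occurrences so that a categorisation seen only once scores 1 and the most common scores 0:

```python
from collections import Counter

def novelty_scores(categorisations):
    """One plausible normalisation of the novelty measure (an
    assumption, not HR's documented formula): 1/occurrences, rescaled
    so the rarest categorisation scores 1 and the commonest 0."""
    counts = Counter(categorisations)
    raw = [1.0 / counts[c] for c in categorisations]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * len(raw)
    return [(r - lo) / (hi - lo) for r in raw]

cats = ["[1][2][3][4]", "[1][2,3][4]", "[1,4][2,3]",
        "[1][2,3][4]", "[1,4][2,3]"]   # concepts 0..4 from the table
scores = novelty_scores(cats)
```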

Comprehensibility Measures

In the same way that it is desirable to produce parsimonious descriptions, we also want to produce simple, easy to understand concept definitions. In general, every time a production rule is used, the definition becomes a little more complicated, and the more old concepts the user has to understand in order to understand the definition of the present concept, the less comprehensible the definition will be. Therefore, HR measures the COMPREHENSIBILITY of a concept as inversely proportional to the number of concepts in its construction path (including the concept itself). The construction history of concept 4 above was this:

(0,size,[1]) (3,split,[2=2])

This means that to understand concept 4, it is necessary to understand 3 concepts, namely concepts 0 and 3 and concept 4 itself. Therefore concept 4 scores 1/3 for comprehensibility.
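A hedged sketch of this measure, taking the construction history as a list of (old_concept, rule, parameters) triples as described earlier (names are assumptions):

```python
def comprehensibility(construction_history):
    """Comprehensibility: inversely proportional to the number of
    concepts that must be understood -- the distinct old concepts in
    the construction history, plus the new concept itself."""
    ancestors = {old for old, rule, params in construction_history}
    return 1.0 / (len(ancestors) + 1)

# construction history of concept 4: (0,size,[1]) (3,split,[2=2])
history4 = [(0, "size", [1]), (3, "split", ["2=2"])]
```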

If we look at two definitions that HR produces with 5 or fewer concepts in the construction path, we see that they are fairly understandable:

(a) [I,N] s.t. N = |{(a) : a | I}|

(b) [I,a] s.t. a | I & a = |{(b) : b | I}|

However, if we look at two definitions with 10 concepts in the construction path, we see that they are fairly difficult to grasp:

(c) [I,a] s.t. -(a | I & a = |{(b) : b | I}| & (exists c d (-(I=c*d & I | d & c = |{(e) : e | I}|))))

(d) [I,N] s.t. N = |{(M) : M = |{(a b) : -(I=a*b & I | b & a = |{(c) : c | I}|)}| & (exists d e (-(M=d*e & M|e & d = |{(f) : f | M}|)))}|

HR can provide graphs to show the construction history of a concept, from which the user can see how complicated the concept is. For example, here is the graph of the construction history of concept (c) above:

[Missing image: construction history graph for concept (c)]

Note that there are 10 boxes: 9 old concepts and the new concept. In practice, we depth-limit the search that HR performs based on this comprehensibility measure. We usually restrict the number of concepts in a new concept's construction path to be at most 7 or 8.

Provable Statements

Prime numbers are interesting because there are many interesting statements you can make about them. For example,

  • There are an infinite number of primes.
  • Every integer > 1 has a unique factorisation into primes.
  • The number of primes less than x tends to x/ln(x).

There are also some unanswered questions about primes:

  • Are there an infinite number of prime pairs?
  • Given prime p what is the next prime? (without calculating and checking).

As discussed below, HR can make and sometimes prove or disprove conjectures about the concepts it has invented. Also, as detailed below, HR can estimate the quality of the conjectures. Given a set of conjectures involving a concept, each with an associated measure of quality, a measure of the QUALITY OF CONJECTURES of a concept can be calculated by simply taking an average of the scores for the conjectures.

How HR Makes Conjectures

In order to be able to better assess the concepts HR has invented, it can make some conjectures about them, based on the empirical evidence supplied by the models. As the data-tables are complete, the conjectures are true for all the entities in the theory.

Subsumption Conjectures

If HR produces a new data-table, it is asked to find previous concepts where either the old data-table contains the new data-table, or the old data-table is contained in the new data-table. For example, suppose HR produced this data-table:

[Missing image: data-table of even square numbers]

with definition [N] : 2|N & exists a (a*a=N), ie. even square numbers. Also suppose that HR had previously produced this data-table:

[Missing image: data-table of multiples of 4]

with definition: [N] : 4|N, ie. multiples of 4. Then HR would spot that all the rows of its new data-table were rows of the previous data table, and could make the conjecture:

2|N & exists a (a*a=N) -> 4|N.

ie. HR can make the conjecture that all even square numbers are multiples of 4. Other subsumption conjectures include, in group theory, that the centre of a group contains the identity element. Because of the nature of the concept formation, lots of concepts are specialisations of previous ones, so there are many subsumption conjectures that HR comes across. To cut down on the number of trivially true subsumption conjectures, HR first estimates the surprisingness of the conjecture (see below), and discards any where the conjecture is likely to be unsurprising.
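The containment check itself can be sketched as follows (illustrative only; the function name and the definition strings passed in are assumptions):

```python
def subsumption_conjecture(new_table, old_table, new_def, old_def):
    """Spot containment between two one-column data-tables and phrase
    the corresponding subsumption conjecture, if there is one."""
    new_rows, old_rows = set(new_table), set(old_table)
    if new_rows <= old_rows:
        return f"{new_def} -> {old_def}"
    if old_rows <= new_rows:
        return f"{old_def} -> {new_def}"
    return None

# even squares vs multiples of 4, for the integers up to 36
even_squares   = [(4,), (16,), (36,)]
multiples_of_4 = [(n,) for n in range(4, 37, 4)]

conj = subsumption_conjecture(even_squares, multiples_of_4,
                              "2|N & exists a (a*a=N)", "4|N")
```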

Equivalence Conjectures

In the same way that HR can spot when one data-table is contained in another, it can also spot that two data-tables are exactly the same. In this case, it can make the conjecture that the definitions for the two concepts are equivalent. For example, suppose HR produced the following data table:

[Missing image: data-table of powers of two]

and did so using the fold production rule, giving it a definition "powers of two" (or something similar). Then also suppose that HR later produces another data-table which is exactly the same, but this time with the definition "integers with no odd divisors except 1". HR could then make the conjecture that an integer has no odd divisors except 1 if and only if it is a power of two. This example is contrived, but some better examples occur in group theory. For example, HR makes conjectures like this in group theory:

for all a,b,c, a*b=c & a*c=b & b=id <-> a*b=c & c*a=b & a*a=b

which is neither trivial nor obvious.

Applicability Conjectures

Imagine stating Goldbach's conjecture as an equivalence conjecture. One way to do this would be to say that

n is an even number greater than 4 <-> n is the sum of two odd primes

To do so automatically, HR would have to invent the concept of even numbers greater than 4, which is unlikely, as it is so specialised. If we gave HR the concept of numbers greater than 4, we would stand a better chance of HR spotting this conjecture, but that would be cheating. A much more natural way for HR to spot this conjecture is to:

  1. invent the concept of even numbers which are the sum of two odd primes.
  2. from this, build the concept of even numbers which are not the sum of two odd primes.
  3. spot that the data-table for this concept has very few rows (in fact, only 2).
  4. make the conjecture that the only even integers which are not the sum of two odd primes are 2 and 4.

This is an example of an applicability conjecture, where HR spots that the number of entities (integers, groups, graphs, etc.) satisfying a concept's conditions is small, and makes the conjecture that the only entities whatsoever which satisfy the conditions are those it has in its table.

Two nice examples of applicability conjectures that HR makes are:

  • The only integers for which tau(n)=sqrt(n) are 1 and 9
  • The only integers for which tau(n)=n are 1 and 2

[where tau(n)=number of divisors of n].

If a production rule step produces a data-table with no rows whatsoever, HR can make a non-existence conjecture which states that there are no models whatsoever which satisfy the definition of the concept. For example, if HR attempted to use the conjunct production rule with tables 2 and 4 above, it would produce an empty table. In this case, it would make the conjecture that there are no integers which are both prime and square.
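The four Goldbach steps above, and the applicability/non-existence check, can be sketched like this (illustrative only; the helper names and the threshold for "very few rows" are assumptions):

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def sum_of_two_odd_primes(n):
    """Can n be written as p + q with p, q odd primes?"""
    return any(is_prime(p) and is_prime(n - p)
               for p in range(3, n - 2, 2))

# steps 1-2: even numbers up to 100 that are NOT such a sum
exceptions = [(n,) for n in range(2, 101, 2)
              if not sum_of_two_odd_primes(n)]

def applicability_conjecture(definition, table, threshold=3):
    """Steps 3-4: if few (or no) entities satisfy the concept,
    conjecture that the listed entities are the only ones."""
    entities = sorted(row[0] for row in table)
    if not entities:
        return f"no entities satisfy: {definition}"
    if len(entities) <= threshold:
        return (f"the only entities satisfying '{definition}' "
                f"are {entities}")
    return None

conj = applicability_conjecture("even & not a sum of two odd primes",
                                exceptions)
```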

How HR Settles Conjectures

Where possible, HR attempts to prove or disprove the conjectures it makes. It relies on the Otter theorem prover to prove theorems, and the MACE model finder to disprove conjectures by finding a counterexample. Otter is a state-of-the-art first order resolution theorem prover written by William McCune, and MACE is its sister program which can find small models for first order statements. See further below for web pages for Otter and MACE. Otter and MACE do not deal well with inductively defined concepts, so certain conjectures involving numerical concepts cannot be settled by them. However, there are still many conjectures that can be settled. To begin with, HR asks Otter to prove all the conjectures it makes. Later on, when Otter has proved some core results (prime implicants), HR can take over and use these to prove more complicated conjectures. Sometimes, if both Otter and HR fail to prove a conjecture, it may turn out to be false, in which case it is passed to MACE, which will look for a counterexample. Alternatively, HR could attempt to generate a counterexample itself, and test it against the conditions of the conjecture.

Proofs Supplied by Otter

When the concepts are non-numerical, HR is usually able to generate Otter-style definitions for the concepts discussed in a conjecture and pass the negation of the conjecture to Otter along with the axioms. Note that the negation of the conjecture is given because Otter is a resolution theorem prover, which derives a contradiction. A typical equivalence conjecture in group theory, as input to Otter by HR, looks like this:

set(auto).
assign(max_seconds,36000).
assign(max_mem, 1000000).
formula_list(usable).
all ax1 (ax1=ax1).
all ax1 ax2 ax3 (ax1 * (ax2 * ax3) = (ax1 * ax2) * ax3).
all ax1 (ax1*id=ax1 & id*ax1=ax1).
all ax1 (inv(ax1)*ax1=id & ax1*inv(ax1)=id).
-(all a b c ( ( a*b=c & a*c=b & b*a=c ) <-> ( a*b=c & a*c=b & c*a=b ) ) ).
end_of_list.

However, to give Otter the best chance of proving this, HR splits the conjecture into non-trivial implies conjectures. There are just two things to be proved for this conjecture:

-(all a b c ( a*b=c & a*c=b & b*a=c -> c*a=b )).

and

-(all a b c ( a*b=c & a*c=b & c*a=b -> b*a=c )).

More than this, HR attempts first to extract any prime implicants which are more general than the implied conjectures. It tries these more general conjectures:

a*b=c -> c*a=b
a*c=b -> c*a=b
b*a=c -> c*a=b
a*b=c & a*c=b -> c*a=b
a*b=c & b*a=c -> c*a=b
a*c=b & b*a=c -> c*a=b
------
a*b=c -> b*a=c
a*c=b -> b*a=c
c*a=b -> b*a=c
a*b=c & a*c=b -> b*a=c
a*b=c & c*a=b -> b*a=c
a*c=b & c*a=b -> b*a=c

None of these turn out to be true. Had any been true, HR would have stored them as "prime implicants", ie. those conjectures where no proper subset of the premises implies the goal. Of course, if any prime implicant is true, then the whole implies conjecture is true. Extracting and proving prime implicants wastes time to begin with, but HR never attempts to prove a conjecture it has tried before, and the payoff comes later when HR itself can use the prime implicants to prove conjectures. If Otter does prove any implies conjectures or prime implicants, then HR reads Otter's output and stores the results.
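Generating the candidate prime implicants amounts to enumerating proper subsets of the premises, most general first. A hedged sketch (the function name is an assumption):

```python
from itertools import combinations

def candidate_implicants(premises, goal):
    """The more general conjectures HR tries before an implies
    conjecture: every proper, non-empty subset of the premises
    implying the same goal, most general (smallest) first."""
    return [(list(sub), goal)
            for size in range(1, len(premises))
            for sub in combinations(premises, size)]

candidates = candidate_implicants(["a*b=c", "a*c=b", "b*a=c"], "c*a=b")
```

For three premises this yields the three single-premise and three two-premise candidates, matching the first six conjectures listed above.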

Proofs Supplied by HR

If we take this conjecture as an example:

all a b c (( a*b=c & a*c=b & b*a=c ) <-> ( a*b=c & a*c=b & c*a=b )).

and suppose that, in the course of proving some other conjecture, HR has extracted and Otter has proved these prime implicants:

(i) a*b=c & a*c=b -> a=inv(a)
(ii) b*a=c & a=inv(a) -> c*a=b
(iii) c*a=b & a=inv(a) -> b*a=c

In this case, HR doesn't need to call Otter, because it has enough information to prove the conjecture itself. When trying to prove the first implies conjecture:

all a b c ( a*b=c & a*c=b & b*a=c -> c*a=b ).

HR knows that two of the premises, namely a*b=c and a*c=b imply that a=inv(a) [using prime implicant (i) above]. This new fact can be added to the premises, so HR now needs to prove this conjecture:

all a b c ( a*b=c & a*c=b & b*a=c & a=inv(a) -> c*a=b ).

Now, using prime implicant (ii) above, HR spots that two of the premises, namely b*a=c and a=inv(a) imply the goal, c*a=b. Hence the implies conjecture is proved. Using prime implicants (i) and (iii) above, HR can also prove the second implies conjecture, and the whole equivalence conjecture is settled. HR's output for this conjecture looks like this:

all a b c ((a*b=c & a*c=b & b*a=c) <-> (a*b=c & a*c=b & c*a=b)).

all a b c ( a*b=c & a*c=b & b*a=c -> c*a=b ).
---------------------------------------------
all a b c ( a*b=c & a*c=b -> a=inv(a) ).
all a b c ( b*a=c & a=inv(a) -> c*a=b ).

all a b c ( a*b=c & a*c=b & c*a=b -> b*a=c ).
---------------------------------------------
all a b c ( a*b=c & a*c=b -> a=inv(a) ).
all a b c ( c*a=b & a=inv(a) -> b*a=c ).

It is clearly an advantage to have two ways for HR to prove conjectures. The particular advantages in allowing HR to prove conjectures itself are two-fold:

  • The proofs output are human readable.
  • There is a chance later, when more prime implicants have been found, to return to the conjecture and prove it.

HR can often produce very complicated proofs. For example, while working in ring theory recently, HR proved this conjecture:

all a b c ((a+b=c & c=idp & a*c=b) <-> (a+b=c & a+c=b & b+b=a & b+a=b & c+b=b)).

This took a long proof, with 14 subgoals proved along the way.

all a b c ( a+c=b & b+b=a & b+a=b & c+b=b -> c=idp ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).

all a b c ( a+a=b & a+b=a & c+a=a -> c=idp ).
---------------------------------------------
all a b c ( c+a=a -> a+c=a ).
all a b c ( a+c=a -> c=idp ).


all a b c ( a+b=a & c+a=a -> c=idp ).
---------------------------------------------
all a b c ( c+a=a -> a+c=a ).
all a b c ( a+c=a -> c=idp ).


all a b c ( a+c=b & b+a=b & c+b=b -> c=idp ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).


all a b c ( a+b=c & b+b=a & b+a=b & c+b=b -> c=idp ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).


all a b c ( a+b=c & b+a=b & c+b=b -> c=idp ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).


all a b c ( a+b=c & a+c=b & b+a=b & c+b=b -> c=idp ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).


all a b c ( a+b=c & a+c=b & b+b=a & b+a=b & c+b=b -> c=idp ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).


all a b c ( a+c=b & b+b=a & b+a=b & c+b=b -> a*c=b ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).
all a b c ( c=idp -> a+c=a ).
all a b c ( a+c=b & b+b=a & a+c=a -> a*c=b ).


all a b c ( a+c=b & b+a=b & c+b=b -> a*c=b ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).
all a b c ( c=idp -> a+c=a ).
all a b c ( a+c=a -> c+a=a ).
all a b c ( b+a=b -> a=idp ).
all a b c ( a=idp -> c+a=c ).
all a b c ( c+a=c & c+a=a -> a*b=c ).
all a b c ( a+c=b & a=idp -> a+b=c ).
all a b c ( a+b=c & c=idp -> b=invp(a) ).
all a b c ( a+b=c & a+c=b & b=invp(a) -> a+a=c ).
all a b c ( a+c=b -> c+a=b ).
all a b c ( c+a=b & c=idp -> c+b=a ).
all a b c ( c+b=a & a+a=c & c+a=c -> b=idp ).
all a b c ( b=idp & a*b=c -> a+b=a ).
all a b c ( c+b=a & a=idp -> b=invp(c) ).
all a b c ( c+b=a & c+a=b & b=invp(c) -> c+c=a ).
all a b c ( a+b=c & a+c=b & c+c=a -> b+b=a ).
all a b c ( a+c=b & b+b=a & a+b=a -> a*c=b ).


all a b c ( a+b=c & b+b=a & b+a=b & c+b=b -> a*c=b ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).
all a b c ( c=idp -> a+c=a ).
all a b c ( b+a=b -> a=idp ).
all a b c ( a+b=c & a=idp -> a+c=b ).
all a b c ( a+c=b & b+b=a & a+c=a -> a*c=b ).


all a b c ( a+b=c & b+a=b & c+b=b -> a*c=b ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).
all a b c ( c=idp -> a+c=a ).
all a b c ( a+c=a -> c+a=a ).
all a b c ( b+a=b -> a=idp ).
all a b c ( a=idp -> c+a=c ).
all a b c ( c+a=c & c+a=a -> a*b=c ).
all a b c ( a+b=c & c=idp -> b=invp(a) ).
all a b c ( a+b=c & a=idp -> a+c=b ).
all a b c ( a+b=c & a+c=b & b=invp(a) -> a+a=c ).
all a b c ( a+c=b -> c+a=b ).
all a b c ( c+a=b & c=idp -> c+b=a ).
all a b c ( c+b=a & a+a=c & c+a=c -> b=idp ).
all a b c ( b=idp & a*b=c -> a+b=a ).
all a b c ( c+b=a & a=idp -> b=invp(c) ).
all a b c ( c+b=a & c+a=b & b=invp(c) -> c+c=a ).
all a b c ( a+b=c & a+c=b & c+c=a -> b+b=a ).
all a b c ( a+c=b & b+b=a & a+b=a -> a*c=b ).


all a b c ( a+b=c & a+c=b & b+a=b & c+b=b -> a*c=b ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).
all a b c ( c=idp -> a+c=a ).
all a b c ( a+c=a -> c+a=a ).
all a b c ( b+a=b -> a=idp ).
all a b c ( a=idp -> c+a=c ).
all a b c ( c+a=c & c+a=a -> a*b=c ).
all a b c ( a+b=c & c=idp -> b=invp(a) ).
all a b c ( a+b=c & a+c=b & b=invp(a) -> a+a=c ).
all a b c ( a+c=b -> c+a=b ).
all a b c ( c+a=b & c=idp -> c+b=a ).
all a b c ( c+b=a & a+a=c & c+a=c -> b=idp ).
all a b c ( b=idp & a*b=c -> a+b=a ).
all a b c ( c+b=a & a=idp -> b=invp(c) ).
all a b c ( c+b=a & c+a=b & b=invp(c) -> c+c=a ).
all a b c ( a+b=c & a+c=b & c+c=a -> b+b=a ).
all a b c ( a+c=b & b+b=a & a+b=a -> a*c=b ).


all a b c ( a+b=c & a+c=b & b+b=a & b+a=b & c+b=b -> a*c=b ).
---------------------------------------------
all a b c ( c+b=b -> b+c=b ).
all a b c ( b+c=b -> c=idp ).
all a b c ( c=idp -> a+c=a ).
all a b c ( a+c=b & b+b=a & a+c=a -> a*c=b ).

Counterexamples Supplied by MACE

HR can form theories using just the axioms of a finite algebra. From the axioms, HR asks MACE to produce just one model of the algebra. HR first asks for a model of order 1, but if one is not found HR asks for a model of order 2, etc. Usually, the first model found is the trivial one, for example the trivial group, with only one element. When only the trivial group supplies the empirical evidence, the conjectures made are often wrong. When Otter fails to prove a conjecture, it is passed to MACE, which attempts to find a counterexample. Again, a counterexample of size 1 is sought first, then one of size 2, and so on.

If a counterexample is found, then:

  1. The conjecture is discarded, as it is not a theorem.
  2. HR checks whether the new model is a counterexample to any other previously unsettled conjectures.
  3. For every equivalence conjecture that is disproved, a new concept is entered into the theory (as it turns out that it was not equivalent to a previous concept).
  4. The new model is entered into the database, and all previous data-tables are recalculated to include the extra data.
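Steps 1 to 3 can be sketched as follows (illustrative only; `falsifies` is an assumed predicate testing a conjecture against the new model, and step 4, recalculating the data-tables, is elided):

```python
def handle_counterexample(model, open_conjectures, falsifies):
    """Process a counterexample found by MACE, following steps 1-3."""
    still_open, new_concepts = [], []
    for conj in open_conjectures:
        if falsifies(model, conj):
            # step 1: discard; step 3: a disproved equivalence means
            # the left-hand concept was genuinely new after all
            if conj["kind"] == "equivalence":
                new_concepts.append(conj["lhs"])
        else:
            # step 2: the conjecture survives the new model
            still_open.append(conj)
    return still_open, new_concepts

# toy data mirroring the session extracts below
conjs = [{"id": 2, "kind": "equivalence", "lhs": "a*b=c & a*c=b"},
         {"id": 13, "kind": "implies", "lhs": "a*b=c"}]
falsify = lambda model, conj: conj["id"] == 2   # assumed predicate
remaining, new_concepts = handle_counterexample("C3", conjs, falsify)
```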

I've extracted below some false group theory conjectures that HR made in a recent session, together with the counterexamples that MACE found for them. The extracts are presented in the order in which HR found them:

2. all a b c ((a*b=c) <-> (a*b=c & a*c=b)).       %%% ORIGINAL CONJECTURE %%%
all a b c ( a*b=c -> a*c=b ). sos     %%% FALSE SUBGOAL %%%

Mace generating counterexample of order: 1 2 3 e02   %%% CALLING MACE %%%
Checking whether conjectures are disproved: 2:yes(New Concept = 21), 
(21) [G,a,b,c] s.t. a*b=c & a*c=b  %%% NEW CONCEPT ADDED %%%

* | 0 | 1 | 2 | 
--+---+---+---+
0 | 0 | 1 | 2 |      %%% MODEL FOUND BY MACE %%%
--+---+---+---+
1 | 1 | 2 | 0 | 
--+---+---+---+
2 | 2 | 0 | 1 | 
--+---+---+---+
.
.
.

4. all a ((a=id) <-> ((exists b c (a*b=c & a*c=b)))).
all a ( a=id -> (exists b c (a*b=c & a*c=b)) ). max_proofs 3
all a ( (exists b c (a*b=c & a*c=b)) -> a=id ). sos

Mace generating counterexample of order: 1 2 e03 
Checking whether conjectures are disproved: 4:yes(New Concept = 22),
(22) [G,a] s.t. (exists b c (a*b=c & a*c=b)) " ahlfors element 1 "

* | 0 | 1 | 
--+---+---+
0 | 0 | 1 | 
--+---+---+
1 | 1 | 0 | 
--+---+---+
.
.
.

15. all a b c ((a*b=c) <-> (a*b=c & b*a=c)).
all a b c ( a*b=c -> b*a=c ). sos

Mace generating counterexample of order: 1 2 3 4 5 6 e04 
Checking whether conjectures are disproved: 13:no,14:no,15:yes(New Concept = 34),
(34) [G,a,b,c] s.t. a*b=c & b*a=c

* | 0 | 1 | 2 | 3 | 4 | 5 | 
--+---+---+---+---+---+---+
0 | 0 | 1 | 2 | 3 | 4 | 5 | 
--+---+---+---+---+---+---+
1 | 1 | 0 | 3 | 2 | 5 | 4 | 
--+---+---+---+---+---+---+
2 | 2 | 4 | 0 | 5 | 1 | 3 | 
--+---+---+---+---+---+---+
3 | 3 | 5 | 1 | 4 | 0 | 2 | 
--+---+---+---+---+---+---+
4 | 4 | 2 | 5 | 0 | 3 | 1 | 
--+---+---+---+---+---+---+
5 | 5 | 3 | 4 | 1 | 2 | 0 | 
--+---+---+---+---+---+---+
.
.
.

27. all a b c ((a*b=c & c=inv(b) & -(a=id)) <-> (a*b=c & c=inv(b) & -(a*c=b))).
all a b c ( a*b=c & -(a*c=b) -> -(a=id) ). max_proofs 4
all a b c ( a*b=c & c=inv(b) & -(a*c=b) -> -(a=id) ).
---------------------------------------------
all a b c ( a*b=c & -(a*c=b) -> -(a=id) ).
all a b c ( a*b=c & c=inv(b) & -(a=id) -> -(a*c=b) ). sos

Mace generating counterexample of order: 1 2 3 4 e05 
Checking whether conjectures are disproved: 13:no,14:no,27:yes(New Concept = 57),
(57) [G,a,b,c] s.t. a*b=c & c=inv(b) & -(a*c=b)

* | 0 | 1 | 2 | 3 | 
--+---+---+---+---+
0 | 0 | 1 | 2 | 3 | 
--+---+---+---+---+
1 | 1 | 0 | 3 | 2 | 
--+---+---+---+---+
2 | 2 | 3 | 1 | 0 | 
--+---+---+---+---+
3 | 3 | 2 | 0 | 1 | 
--+---+---+---+---+

Counterexamples Supplied by HR

As well as being able to write definitions for concepts in Otter-style first order logic and in LaTeX scripts, HR can also generate a Prolog definition for each concept. For example, in another recent run, HR re-invented prime numbers as concept 21. I then asked for the Prolog program which tests for primality, and tried it out a few times.

| ?- concept(21)::prolog_program.

predicate(1,[I,A,B]) :-
      I is A * B, !.

predicate(3,[I,A]) :-
      length(L,I), nth(A,L,_), 0 is I mod A.

predicate(18,[I,N]) :-
      findall([A,B],(predicate(3,[I,A]),
                     predicate(3,[I,B]),
		     predicate(1,[I,A,B])),
	      TuplesA),
      remove_duplicates(TuplesA,Tuples),
      length(Tuples,N), N > 0, !.

predicate(21,[I]) :-
      predicate(18,[I,2]), !.

yes
| ?- predicate(21,[17]).

yes
| ?- predicate(21,[10]).

no
| ?- predicate(21,[1193]).

yes

Note here that predicate 1 checks whether two integers (A and B) multiply to give the entity (I). Predicate 3 can actually generate all the divisors of an input integer, I. Predicate 18 is the tau function, which counts the divisors of an integer, and predicate 21 succeeds only if the tau function returns 2. Predicates 1 and 3 were provided by the user, and are essential if HR is to build further Prolog definitions.
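The generated Prolog can be mirrored in Python to see exactly what HR has built; the function names below are my own, not HR's:

```python
def divisors(i):
    """Mirror of predicate 3: the divisors of i."""
    return [a for a in range(1, i + 1) if i % a == 0]

def tau(i):
    """Mirror of predicate 18: the number of divisors of i."""
    return len(divisors(i))

def is_prime(i):
    """Mirror of predicate 21: primes are the integers with exactly 2 divisors."""
    return tau(i) == 2

# the same three queries as in the Prolog session above
print(is_prime(17), is_prime(10), is_prime(1193))  # True False True
```

Note that, as in HR's definition, 1 is correctly rejected: tau(1) = 1, not 2.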

The original motivation for allowing HR to generate and read its own Prolog code was to let us compare HR to machine learning programs such as Progol, which uses inductive logic programming to learn given concepts. We can claim that, as HR has written its own Prolog code for a primality test, it has learned the concept of prime numbers. Another application of this functionality is that HR can generate models itself and, using the Prolog predicates it has generated, check whether the models are counterexamples to a conjecture. The user has to supply Prolog code which generates entities. For certain domains, such as number theory and quasigroup theory, this is a simple matter, and there are plenty of examples which are easy to generate. However, for more complicated domains, such as group theory, it is more difficult to generate examples, and there are fewer of them.

The advantage of allowing HR to generate its own counterexamples is that it can do so when MACE cannot. Otter and MACE have very limited abilities in number theory, so HR can be used to generate counterexamples to numerical conjectures. For example, when given only the integers up to 10, HR makes the conjecture that there are no integers n for which tau(n)=4 and 4 divides n. The first counterexample that HR found was the number 12. Another example comes when HR is given the integers up to 30. HR makes the conjecture that all integers n for which tau(n) has 3 divisors are such that n has 4 divisors. This is true for the integers up to 30, as none of them have 9 divisors. However, HR found the first counterexample, namely the number 36. Whenever HR finds a counterexample itself, it goes through the same routine as when MACE finds one, ie. it checks previous conjectures, adds any new concepts, discards any non-theorems and adds the new model to the database, updating each concept as it does so.
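The second conjecture can be checked directly; this snippet (mine, not HR output) confirms that it holds for the integers up to 30 and that 36 refutes it:

```python
def tau(n):
    """Number of divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def conjecture_holds(n):
    """If tau(n) has 3 divisors, then n has 4 divisors."""
    return tau(tau(n)) != 3 or tau(n) == 4

# true for all integers up to 30, since none of them have 9 divisors
assert all(conjecture_holds(n) for n in range(1, 31))

# 36 refutes it: tau(36) = 9, and 9 has 3 divisors, but 36 has 9 divisors, not 4
assert not conjecture_holds(36)
print("36 is a counterexample")
```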

Theory Formation (Putting It All Together)

Given just the axioms of a finite algebra, HR can construct a theory containing

  • Concepts with models and definitions
  • Theorems and proofs about the concepts
  • Models for the theory (eg. groups)
  • Open conjectures

To be able to do this, certain restrictions have to be imposed. Firstly, as Otter and MACE are not suited for theories with numerical content, theory formation is restricted mainly to finite algebras, and we do not use the size and split production rules, which introduce numerical concepts. Secondly, we cannot yet use the fold production rule as the concepts are not easily represented in first order logic, and we often do not use the compose production rule, as this adds a layer of complexity that Otter finds difficult to deal with in a limited time scale. Finally, we usually restrict the depth of the search by limiting the number of old concepts in a new concept's construction path. Usually, we set the limit at around 6 or 7, so that the concepts are fairly comprehensible for Otter and MACE.

Given the abilities discussed above for HR to (i) form new concepts, (ii) spot conjectures, (iii) prove theorems and (iv) disprove theorems, we have chosen the following cycle of mathematical activity to form theories in finite algebras.

[Image missing from the archived source: HR's cycle of mathematical activity.]
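In the absence of the figure, the cycle can be caricatured in code. The sketch below is a hypothetical, heavily simplified rendering of the loop in number theory; every name here is invented for illustration, and HR's real cycle drives Otter and MACE rather than brute-force checks:

```python
# A toy HR-style cycle: form a concept, spot a conjecture, "prove" it
# (check it on the known entities), then try to disprove it by searching
# a larger range for a counterexample.
ENTITIES = list(range(1, 11))           # the models supplied to the theory

def tau(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)

concepts = {                            # name -> predicate on an entity
    "even": lambda n: n % 2 == 0,
    "tau_is_2": lambda n: tau(n) == 2,
}

def conjunct(p, q):                     # a single production rule: conjunction
    return lambda n: p(n) and q(n)

# 1. form a new concept from old ones
concepts["even_prime"] = conjunct(concepts["even"], concepts["tau_is_2"])

# 2. spot a conjecture: the new concept looks equivalent to "n = 2" on the data
conjecture = lambda n: concepts["even_prime"](n) == (n == 2)

# 3. "prove": the conjecture survives all known entities
assert all(conjecture(n) for n in ENTITIES)

# 4. disprove attempt: search a larger range for a counterexample
counterexample = next((n for n in range(1, 1000) if not conjecture(n)), None)
print(counterexample)                   # None: no counterexample found, keep as a theorem
```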

Finite Algebras Looked at by HR

An interesting way to use HR is to define new algebras, sometimes with unusual (or made up) axioms, and to see what the theories it produces are like. So far, we have tried HR with these well known finite algebras:

| Algebra | Axioms for * | Axioms for + | Other Axioms |
| --- | --- | --- | --- |
| Abelian Group | Associativity, Commutativity, Identity, Inverse | | |
| Group | Associativity, Identity, Inverse | | |
| Quasigroup | Left-quasigroup, Right-quasigroup | | |
| Ring | Associativity | Associativity, Commutativity, Identity, Inverse | Distributivity(+ over *) |
| Robbins Algebra | | Commutativity, Associativity, Robbins | |
| Semigroup | Associativity | | |

Noting these axioms,

  • IP(*): given a unary operator ip, for all a,b, (ip(a)*(a*b) = b & (b*a)*ip(a) = b)
  • Medial: for all a,b,c,d, ((a*b)*(c*d) = (a*c)*(b*d))
  • Nilpotent: for all a, (a*a=a)
  • TS: for all a,b, (a*(a*b)=b)
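To make these axioms concrete, the check below builds the small table x*y = (-x-y) mod 3 (my example, not one of HR's) and verifies that it is a quasigroup satisfying TS, Medial, Commutativity and the idempotency law labelled Nilpotent above:

```python
n = 3

def op(x, y):
    # a small TS-quasigroup: x*y = (-x - y) mod 3
    return (-x - y) % n

elems = range(n)

# quasigroup: every row and every column of the table is a permutation
assert all(len({op(a, b) for b in elems}) == n for a in elems)   # left-quasigroup
assert all(len({op(a, b) for a in elems}) == n for b in elems)   # right-quasigroup

# TS: a*(a*b) = b, together with commutativity
assert all(op(a, op(a, b)) == b for a in elems for b in elems)
assert all(op(a, b) == op(b, a) for a in elems for b in elems)

# Medial: (a*b)*(c*d) = (a*c)*(b*d)
assert all(op(op(a, b), op(c, d)) == op(op(a, c), op(b, d))
           for a in elems for b in elems for c in elems for d in elems)

# the law labelled 'Nilpotent' above: a*a = a
assert all(op(a, a) == a for a in elems)
print("all axioms hold")
```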

we've also tried HR with these lesser-known finite algebras:

| Algebra | Axioms for * | Axioms for + | Other Axioms |
| --- | --- | --- | --- |
| Galois Field | Associativity, Commutativity, Identity, Inverse | Associativity, Commutativity, Identity, Inverse | Distributivity(+ over *) |
| IP-loop | Left-quasigroup, Right-quasigroup, Identity | | IP(*) |
| Loop | Left-quasigroup, Right-quasigroup, Identity | | |
| Medial-quasigroup | Left-quasigroup, Right-quasigroup | | Medial |
| Nilpotent Quasigroup | Left-quasigroup, Right-quasigroup | | Nilpotency |
| TS-quasigroup | Left-quasigroup, Right-quasigroup, Commutativity | | TS |

Finally, using these axioms that I've made up:

  • Co-Commutativity: for all a,b, (a*b=b+a)
  • Non-Associativity: for all a,b,c, (a*(b*c) =/= (a*b)*c)

HR has explored the following theories:

| Algebra | Axioms for * | Axioms for + | Other Axioms |
| --- | --- | --- | --- |
| Co-Commutative Quasigroups | Left-Quasigroup, Right-Quasigroup | Left-Quasigroup, Right-Quasigroup | Distributivity(+ over *), Co-Commutativity |
| Non-Associative Quasigroups | Left-Quasigroup, Right-Quasigroup, Non-Associativity | | |

Summaries of Example Theories

Here are some statistics taken after some 3-hour theory formation sessions in various finite algebras. The following choices were made:

  1. To only use the exists, forall, conjunct, negate and match production rules.
  2. To employ a complexity cap of 8 (ie. only six old concepts allowed in a new concept's construction history).
  3. To try HR first when proving implies theorems (ie. using the prime implicants), then try Otter if HR fails, and to give both only 10 seconds each.
  4. To only use MACE to generate counterexamples, and to give it 10 seconds to try to find a counterexample of size 1,2,...,8.
  5. To sort the concepts and conjectures after every 5th new concept has been introduced.

For each of the theories, we performed three searches for concepts, namely (i) best-first, (ii) exhaustive and (iii) random; the three figures in each cell of the tables below are reported in that order. In the best-first search, HR chose to build on the concepts which were involved in the most interesting conjectures and which produced the most novel categorisations (with a 4:1 weighting for conjectures over novelty). Conjectures were deemed interesting with a 4:1 ratio of difficulty to surprisingness. Before starting the best-first search, HR ran in exhaustive mode for 20 minutes, then in random mode for 20 minutes, then in novelty mode (looking for novel categorisations) for 20 minutes; only the last 2 hours were run in best-first mode. This was an attempt to ensure good coverage before the search strategy homed in on particular concepts. In the exhaustive search, HR chose each (concept, production rule) pair and ran through the full set of parameters for each. In the random search, HR simply chose the next concept and production rule at random.

(In each cell, the three values correspond to the best-first, exhaustive and random searches respectively.)

| Algebra | Concepts | Models | Largest Model Size | Catns | Theorems | Open Conjs | Prime Implicants |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Group | 234 / 183 / 240 | 7 / 6 / 7 | 6 / 8 / 6 | 25 / 14 / 31 | 632 / 715 / 598 | 8 / 0 / 2 | 450 / 219 / 451 |
| Medial Quasigroup | 310 / 259 / 207 | 15 / 13 / 10 | 4 / 4 / 4 | 194 / 126 / 108 | 217 / 77 / 312 | 44 / 58 / 51 | 213 / 42 / 249 |
| Ring | 279 / 344 / 124 | 13 / 8 / 6 | 5 / 4 / 4 | 109 / 46 / 15 | 439 / 408 / 251 | 13 / 3 / 0 | 401 / 178 / 210 |
| TS-Quasigroup | 245 / 200 / 247 | 8 / 8 / 7 | 6 / 6 / 6 | 57 / 39 / 39 | 959 / 1033 / 651 | 3 / 1 / 11 | 457 / 384 / 433 |
| Non-Assoc | 272 / 318 / 295 | 17 / 17 / 20 | 4 / 4 / 4 | 192 / 213 / 207 | 314 / 366 / 347 | 80 / 114 / 100 | 262 / 310 / 300 |
| IP-Quasigroup | 383 / 447 / 488 | 9 / 13 / 13 | 4 / 4 / 4 | 213 / 281 / 300 | 481 / 504 / 576 | 24 / 30 / 13 | 674 / 708 / 848 |

| Algebra | Av. Proof Len | Av. Surprise | Av. P.I. Proof Length | Otter Proofs | HR Proofs |
| --- | --- | --- | --- | --- | --- |
| Group | 4.8 / 3.6 / 4.8 | 2.7 / 2.3 / 2.9 | 6.0 / 4.3 / 5.8 | 450 / 219 / 451 | 899 / 1344 / 965 |
| Medial Quasigroup | 3.5 / 1.9 / 1.7 | 2.5 / 1.8 / 2.8 | 2.5 / 1.8 / 1.0 | 213 / 42 / 249 | 127 / 54 / 300 |
| Ring | 7.9 / 5.7 / 6.6 | 2.9 / 1.9 / 3.1 | 7.1 / 6.2 / 5.9 | 401 / 178 / 210 | 538 / 659 / 921 |
| TS-Quasigroup | 2.9 / 2.1 / 6.0 | 2.7 / 2.3 / 2.8 | 3.1 / 2.9 / 7.3 | 457 / 384 / 433 | 1368 / 1491 / 1429 |
| Non-Assoc | 2.8 / 2.8 / 2.5 | 2.8 / 3.0 / 3.0 | 0.7 / 0.4 / 0.5 | 262 / 310 / 300 | 119 / 184 / 132 |
| IP-Quasigroup | 5.7 / 5.5 / 8.1 | 3.5 / 3.6 / 4.1 | 3.6 / 3.3 / 4.5 | 674 / 708 / 848 | 183 / 212 / 264 |

Prime Implicates for Theories

One aspect of a theory that we can extract is the set of prime implicates found by HR. These can be given as re-write rules and used separately in a theorem proving environment; indeed, if they can be turned into constraints, they can be used to help solve constraint satisfaction problems (CSPs).
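As a toy illustration of turning an implicate into a constraint, the snippet below takes the (true, standard) group-theory implicate "a*b = b implies a = id" and uses it as a pruning test on candidate multiplication tables; the encoding is my own, not HR's:

```python
def satisfies_implicate(table, id_elem):
    """Constraint form of the implicate: for all a,b, a*b = b -> a = id."""
    n = len(table)
    return all(table[a][b] != b or a == id_elem
               for a in range(n) for b in range(n))

# the cyclic group Z3 (identity 0) satisfies the implicate...
z3 = [[(a + b) % 3 for b in range(3)] for a in range(3)]
assert satisfies_implicate(z3, 0)

# ...while a table with a repeated row cannot be a group, and the
# constraint rejects it without any equational reasoning
bad = [[0, 1, 2], [0, 1, 2], [2, 0, 1]]
assert not satisfies_implicate(bad, 0)
print("constraint prunes the non-group table")
```

A CSP solver given such constraints can discard large parts of the search space before ever consulting the original axioms.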