Extracting a knowledge graph from Wikipedia articles on Geometry using LLMs.
Have a look at the notebooks:
manually_annotate_polygon_wikipedia_article.ipynb
extract_kg.ipynb
- calculate metrics better, accounting for equivalences in the graph, e.g.
square -> 4-gon -> polygon
is almost as good as square <- polygon -> 4-gon
- add annotations for same-as relationship, e.g. 3-gon is same as triangle
- use pydantic and the tools API to format ChatGPT output as JSON
- take a long-context model like Claude instead of atomic chunks, compare the results
- try extracting all terms with NER first, then make $N^2$ calls to check whether a connection exists between each pair. Compare the connection calls with and without the context chunk.
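
One possible way to approach the first two items in the list above (equivalence-aware metrics and same-as annotations) is to canonicalize aliases and compare the transitive closures of the two graphs instead of their raw edges. The sketch below is only an illustration and is not taken from the notebooks: it assumes edges are `(child, parent)` pairs of a single transitive relation, that same-as annotations are available as an alias-to-canonical-name dict, and the function names `canonicalize`, `transitive_closure` and `precision_recall` are hypothetical.

```python
def canonicalize(edges, same_as):
    """Replace every alias by its canonical name, e.g. '3-gon' -> 'triangle'."""
    return {(same_as.get(a, a), same_as.get(b, b)) for a, b in edges}


def transitive_closure(edges):
    """Add every pair implied by chaining edges: (a, b) and (b, d) give (a, d)."""
    closure = set(edges)
    while True:
        new_pairs = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new_pairs <= closure:  # nothing new was derived, so we are done
            return closure
        closure |= new_pairs


def precision_recall(predicted, gold, same_as=None, use_closure=True):
    """Edge-level precision and recall of predicted edges against gold edges."""
    same_as = same_as or {}
    pred = canonicalize(predicted, same_as)
    ref = canonicalize(gold, same_as)
    if use_closure:
        pred, ref = transitive_closure(pred), transitive_closure(ref)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Assumed reading of the example above: edges point from child to parent.
chain = {("square", "4-gon"), ("4-gon", "polygon")}
flat = {("square", "polygon"), ("4-gon", "polygon")}

print(precision_recall(chain, flat, use_closure=False))  # strict: (0.5, 0.5)
print(precision_recall(chain, flat))  # relaxed: precision 2/3, recall 1.0

# Same-as annotation: '3-gon' and 'triangle' count as the same node.
print(precision_recall({("3-gon", "polygon")}, {("triangle", "polygon")},
                       same_as={"3-gon": "triangle"}))  # (1.0, 1.0)
```

With strict edge matching the two graphs share only one edge, while the closure-based comparison gives full recall and penalizes only the extra `square -> 4-gon` edge, which matches the "almost as good" intuition. The closure loop is quadratic per pass and is meant for small annotated graphs only.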