Self Emergence of Knowledge Trees:

Extraction of the Wikipedia hidden hierarchies

Lev Muchnik 1, Royi Itzhak 2, Sorin Solomon 3,4 and Yoram Louzoun 2*

  1. Physics department, Bar Ilan University, Ramat Gan, Israel, 52900

  2. Math department, Bar Ilan University, Ramat Gan, Israel, 52900

  3. Racah Institute of Physics, Hebrew University, Jerusalem, Israel.

  4. ISI Torino I-10133, Italy.


* To whom correspondence and requests should be addressed: Yoram Louzoun, Math department, Bar Ilan University, Ramat Gan, Israel, 52900, phone: 972-3-5317610



The rapid accumulation of knowledge and the recent emergence of new, dynamic and practically unmoderated information repositories have made the classical concept of a hierarchal knowledge structure irrelevant and impossible to impose manually. This has led to modern methods of locating data, such as browsing or searching, which conceal the underlying information structure. Here we propose new methods designed to automatically construct a hierarchy from a network of related terms. We apply these methods to Wikipedia, compare the hierarchy obtained from the network of articles to the complementary acyclic category layer of Wikipedia, and show an excellent fit. We verify our methods on two networks with no a priori hierarchy, the E. coli genetic regulatory network and the C. elegans neural network, and reproduce a known functional order.

Data & Results


Genetic Network

File Name            Details & Explanation
E. coli Hierarchy    Attraction Basin Hierarchy applied to the E. coli genetic network


Neural Network

File Name               Details & Explanation
Neural Net Hierarchy    Hierarchal Intermediacy (betweenness-based) applied to the C. elegans neural network
Neural Net Hierarchy    Local Hierarchy applied to the C. elegans neural network
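
The betweenness-based measure named above builds on betweenness centrality: for each node, the fraction of shortest paths between other node pairs that pass through it. This page does not say how the hierarchy itself is derived from those scores, so the sketch below covers only the betweenness computation, using Brandes' algorithm for unweighted directed graphs. The function name `betweenness`, the adjacency-dictionary input format and the toy graph are assumptions made for illustration, not the authors' code.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality via Brandes' algorithm (unweighted, directed).

    adj: dict mapping node -> list of successor nodes, with no duplicate
    edges. This input format is an assumption made for the sketch.
    """
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    score = {v: 0.0 for v in nodes}

    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:              # w reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Toy example (hypothetical data): two equally short routes from a to d.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
scores = betweenness(toy)
```

In the toy graph, `b` and `c` each carry half of the shortest paths from `a` to `d`. For an undirected network stored with both edge directions, every pair is counted twice, so the scores would need to be halved.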


Wikipedia Measurements (MATLAB MAT-files)

Wikipedia Articles Categories  
cawiki_Redirect.mat eswiki_Redirect.mat itwiki_Redirect.mat ptwiki_Redirect.mat
cswiki_Redirect.mat fiwiki_Redirect.mat jawiki_Redirect.mat slwiki_Redirect.mat
dawiki_Redirect.mat frwiki_Redirect.mat nlwiki_Redirect.mat svwiki_Redirect.mat
dewiki_Redirect.mat hewiki_Redirect.mat nowiki_Redirect.mat ukwiki_Redirect.mat
enwiki_Redirect.mat huwiki_Redirect.mat plwiki_Redirect.mat zhwiki_Redirect.mat
Wikipedia Categories
bgwiki_betweenness1.mat jawiki_betweenness1.mat bgwiki_betweenness2.mat itwiki_betweenness2.mat
cawiki_betweenness1.mat nlwiki_betweenness1.mat cawiki_betweenness2.mat jawiki_betweenness2.mat
cswiki_betweenness1.mat nowiki_betweenness1.mat cswiki_betweenness2.mat nlwiki_betweenness2.mat
dawiki_betweenness1.mat plwiki_betweenness1.mat dawiki_betweenness2.mat nowiki_betweenness2.mat
eowiki_betweenness1.mat ptwiki_betweenness1.mat dewiki_betweenness2.mat plwiki_betweenness2.mat
eswiki_betweenness1.mat ruwiki_betweenness1.mat eowiki_betweenness2.mat ptwiki_betweenness2.mat
fiwiki_betweenness1.mat slwiki_betweenness1.mat eswiki_betweenness2.mat ruwiki_betweenness2.mat
frwiki_betweenness1.mat svwiki_betweenness1.mat fiwiki_betweenness2.mat slwiki_betweenness2.mat
hewiki_betweenness1.mat ukwiki_betweenness1.mat frwiki_betweenness2.mat svwiki_betweenness2.mat
huwiki_betweenness1.mat zhwiki_betweenness1.mat hewiki_betweenness2.mat ukwiki_betweenness2.mat
itwiki_betweenness1.mat   huwiki_betweenness2.mat zhwiki_betweenness2.mat
Network Properties
bgwiki_properties.mat jawiki_properties.mat cawiki_properties.mat nlwiki_properties.mat
cswiki_properties.mat plwiki_properties.mat dawiki_properties.mat ptwiki_properties.mat
eswiki_properties.mat ruwiki_properties.mat fiwiki_properties.mat slwiki_properties.mat
frwiki_properties.mat svwiki_properties.mat hewiki_properties.mat ukwiki_properties.mat
huwiki_properties.mat zhwiki_properties.mat itwiki_properties.mat  
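
The file names above appear to follow the pattern `<language code>wiki_<measurement>.mat`, and not every language edition has every measurement. A small sketch like the following could index a directory listing by measurement to see which editions are covered. The pattern is inferred from the names listed on this page, and the helper name `index_files` is hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the listed names: <code>wiki_<measurement>.mat
NAME = re.compile(r"^([a-z]+)wiki_([A-Za-z]+\d*)\.mat$")

def index_files(names):
    """Map each measurement to the sorted language codes that provide it."""
    index = defaultdict(list)
    for name in names:
        match = NAME.match(name)
        if match is None:
            continue  # ignore names that do not follow the assumed pattern
        lang, measurement = match.groups()
        index[measurement].append(lang)
    return {key: sorted(langs) for key, langs in index.items()}

# A few names taken from the listing above.
sample = [
    "dewiki_Redirect.mat",
    "bgwiki_betweenness1.mat",
    "dewiki_betweenness2.mat",
    "bgwiki_properties.mat",
]
coverage = index_files(sample)
```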



The work of LM and YL was covered by the Yeshaya Horowitz. The work of YL and RI is also supported by the co3 NEST PATHFINDER of the EU 6th framework.


