{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":110439950,"defaultBranch":"main","name":"cc-webgraph","ownerLogin":"commoncrawl","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2017-11-12T14:38:16.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1194841?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1719592602.0","currentOid":""},"activityList":{"items":[{"before":"12bedb26c304fc6fb25397a5a3ea1df0ce61e148","after":"8cc0f4c88072c258776df2cb7b86a6fe7cee12fe","ref":"refs/heads/explore-graphs","pushedAt":"2024-07-04T12:14:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: tool and scripts to interactively explore webgraph\n\nAdd script to extract top-10k vertices by indegree and outdegree.","shortMessageHtmlLink":"feat: tool and scripts to interactively explore webgraph"}},{"before":"81424f2ece562495a5702dfd8ba60b2435249002","after":"12bedb26c304fc6fb25397a5a3ea1df0ce61e148","ref":"refs/heads/explore-graphs","pushedAt":"2024-07-04T12:02:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: tool and scripts to interactively explore webgraph\n\nAdd more utility methods to save data in files, to map host names\nto registered domains and to translate from/to reverse domain name\nnotation.","shortMessageHtmlLink":"feat: tool and scripts to interactively explore webgraph"}},{"before":"1bfda67ba6c6a753ffbdfb2010233b1b504231cb","after":"81424f2ece562495a5702dfd8ba60b2435249002","ref":"refs/heads/explore-graphs","pushedAt":"2024-07-03T17:56:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: tool and scripts to interactively explore webgraph\n\nUpdate scripts to download webgraphs and build the vertex map:\nsupport host-level webgraphs which are shipped with a list of\nvertex files.","shortMessageHtmlLink":"feat: tool and scripts to interactively explore webgraph"}},{"before":"8fffff223f42b65780afb369b1520273ea0b0480","after":"1bfda67ba6c6a753ffbdfb2010233b1b504231cb","ref":"refs/heads/explore-graphs","pushedAt":"2024-07-01T12:00:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: tool and scripts to interactively explore webgraph\n\nUse curl in script to download webgraphs, keep wget as optional\n(to be commented in). Use timestamping for downloads.","shortMessageHtmlLink":"feat: tool and scripts to interactively explore webgraph"}},{"before":"153db365dc4a29224073e5046da23d191597f33e","after":"8fffff223f42b65780afb369b1520273ea0b0480","ref":"refs/heads/explore-graphs","pushedAt":"2024-06-29T12:48:18.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: tool and scripts to interactively explore webgraph\n\nAdd examples how to use the graph exploration Java classes.\nAdd information how to build the Javadocs.","shortMessageHtmlLink":"feat: tool and scripts to interactively explore webgraph"}},{"before":null,"after":"e98d1e9ac1f898a3fe90d8b708cea2915a2384bc","ref":"refs/heads/cc-main-2024-apr-may-jun","pushedAt":"2024-06-28T16:36:42.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"thunderpoot","name":"underwood","path":"/thunderpoot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/54200401?s=80&v=4"},"commit":{"message":"chore: update hostgraph configuration for cc-main-2024-apr-may-jun","shortMessageHtmlLink":"chore: update hostgraph configuration for cc-main-2024-apr-may-jun"}},{"before":"25d59a482c060b3e9a86d61466a40714a801822a","after":"15917a1afb37d7b4b5140cd021a4327f85de90e2","ref":"refs/heads/main","pushedAt":"2024-06-28T13:09:21.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: text export of outdegrees and indegrees\n\nJoin outdegrees and indegrees with vertex names and write it to\na text file. Export the top-10k vertices by outdegree resp. indegree.","shortMessageHtmlLink":"feat: text export of outdegrees and indegrees"}},{"before":"e684eb5da5d3bc469dce8420f4df978f97495839","after":"153db365dc4a29224073e5046da23d191597f33e","ref":"refs/heads/explore-graphs","pushedAt":"2024-06-27T15:48:46.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: tool and scripts to interactively explore webgraph\n\nAdd methods to access and view successors/predecessors and\ncount top-level domains in lists of vertices.","shortMessageHtmlLink":"feat: tool and scripts to interactively explore webgraph"}},{"before":"4149d3239b1d5442a290f23083721a3d4e667ef9","after":"e684eb5da5d3bc469dce8420f4df978f97495839","ref":"refs/heads/explore-graphs","pushedAt":"2024-06-23T12:23:26.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: tool and scripts to interactively explore webgraph\n\n- JShell script to load a graph\n- tutorial / quick start graph exploration","shortMessageHtmlLink":"feat: tool and scripts to interactively explore webgraph"}},{"before":"450e6b3381fd964f83f4cb831a62d880b82a6130","after":"4149d3239b1d5442a290f23083721a3d4e667ef9","ref":"refs/heads/explore-graphs","pushedAt":"2024-06-23T12:20:19.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: tool and scripts to interactively explore webgraph\n\n- JShell script to load a graph\n- tutorial / quick start graph exploration","shortMessageHtmlLink":"feat: tool and scripts to interactively explore webgraph"}},{"before":"ed43b76e43910aaf84a596d7221ac94c7e824cd2","after":"25d59a482c060b3e9a86d61466a40714a801822a","ref":"refs/heads/main","pushedAt":"2024-06-03T21:00:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"thunderpoot","name":"underwood","path":"/thunderpoot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/54200401?s=80&v=4"},"commit":{"message":"chore: update hostgraph configuration for cc-main-2024-feb-apr-may","shortMessageHtmlLink":"chore: update hostgraph configuration for cc-main-2024-feb-apr-may"}},{"before":"c895c23bdfe0db5399e7064f0fa57a25f31f117e","after":"ed43b76e43910aaf84a596d7221ac94c7e824cd2","ref":"refs/heads/main","pushedAt":"2024-05-02T21:33:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"thunderpoot","name":"underwood","path":"/thunderpoot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/54200401?s=80&v=4"},"commit":{"message":"update hostgraph configuration for cc-main-2024-nov-feb-apr","shortMessageHtmlLink":"update hostgraph configuration for cc-main-2024-nov-feb-apr"}},{"before":"67416fb499c703d1c9bc383deee5123689c8eb92","after":"54c0e8eeef82e1d16a101f22306e58f527e5efb1","ref":"refs/heads/crawler-commons-dev","pushedAt":"2024-04-27T18:16:47.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"Use crawer-commons development version","shortMessageHtmlLink":"Use crawer-commons development version"}},{"before":"1cbbf46ca9a1f55e28d25015c147e08ca0ab05aa","after":null,"ref":"refs/heads/dependabot/maven/org.apache.commons-commons-configuration2-2.10.1","pushedAt":"2024-04-27T18:14:48.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"}},{"before":"ce985e26e9e75ffa0900eda82a4ad19d9c96d58b","after":"c895c23bdfe0db5399e7064f0fa57a25f31f117e","ref":"refs/heads/main","pushedAt":"2024-04-27T18:14:40.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"Bump org.apache.commons:commons-configuration2 from 2.9.0 to 2.10.1\n\nBumps org.apache.commons:commons-configuration2 from 2.9.0 to 2.10.1.\n\n---\nupdated-dependencies:\n- dependency-name: org.apache.commons:commons-configuration2\n dependency-type: direct:production\n...\n\nSigned-off-by: dependabot[bot] ","shortMessageHtmlLink":"Bump org.apache.commons:commons-configuration2 from 2.9.0 to 2.10.1"}},{"before":"48682aa2b55335737132f011d53b22d92c52e7af","after":"ce985e26e9e75ffa0900eda82a4ad19d9c96d58b","ref":"refs/heads/main","pushedAt":"2024-04-27T18:11:02.000Z","pushType":"pr_merge","commitsCount":8,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"Merge pull request #15 from commoncrawl/upgrade-build-and-dependencies\n\nUpgrade build and dependencies","shortMessageHtmlLink":"Merge pull request #15 from commoncrawl/upgrade-build-and-dependencies"}},{"before":null,"after":"450e6b3381fd964f83f4cb831a62d880b82a6130","ref":"refs/heads/explore-graphs","pushedAt":"2024-04-15T14:26:36.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"feat: tool and scripts to interactively explore webgraphs\n\nThe class GraphExplorer allows to explore webgraphs using the JShell.\nThe script graph_explore_build_vertex_map.sh builds a map of vertex\nlabel to vertex ID and verifies that all graph files required for\ngraph exploration are downloaded.","shortMessageHtmlLink":"feat: tool and scripts to interactively explore webgraphs"}},{"before":"90c06f228de5674c9f0fe12349c5425651a259eb","after":"daec4d33562f9ea9e3c3e28b26a402648b9c2f16","ref":"refs/heads/upgrade-build-and-dependencies","pushedAt":"2024-04-15T12:16:54.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"docs: fix Javadoc errors/warnings\n\nReplace entities not supported by Java 11; add missing docs for\nparams and return value.","shortMessageHtmlLink":"docs: fix Javadoc errors/warnings"}},{"before":"c2703a74f0713a7403323079204386e191c13238","after":"90c06f228de5674c9f0fe12349c5425651a259eb","ref":"refs/heads/upgrade-build-and-dependencies","pushedAt":"2024-04-15T12:09:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"docs: fix Javadoc errors/warnings","shortMessageHtmlLink":"docs: fix Javadoc errors/warnings"}},{"before":null,"after":"c2703a74f0713a7403323079204386e191c13238","ref":"refs/heads/upgrade-build-and-dependencies","pushedAt":"2024-04-15T11:10:20.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"build(dependencies): upgrades and improvements\n\nUpgrade fastutil, slf4j, junit and Maven plugins.\nComplete exclusions of transitive dependencies.","shortMessageHtmlLink":"build(dependencies): upgrades and improvements"}},{"before":"2e84615b638bb1148388370e920c0c71aff1fa09","after":"48682aa2b55335737132f011d53b22d92c52e7af","ref":"refs/heads/main","pushedAt":"2024-04-09T12:46:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"build(junit): disable unit test potentially failing due to limited memory\n\nDisable the unit test which tests for properly saving and reloading of\na large array using the fastutil library. The unit test allocates 2 GiB\nRAM which may fail if the system does not provide sufficient memory.\nFixes #9 \"Maven build failure\".","shortMessageHtmlLink":"build(junit): disable unit test potentially failing due to limited me…"}},{"before":null,"after":"1cbbf46ca9a1f55e28d25015c147e08ca0ab05aa","ref":"refs/heads/dependabot/maven/org.apache.commons-commons-configuration2-2.10.1","pushedAt":"2024-03-21T19:17:59.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"},"commit":{"message":"Bump org.apache.commons:commons-configuration2 from 2.9.0 to 2.10.1\n\nBumps org.apache.commons:commons-configuration2 from 2.9.0 to 2.10.1.\n\n---\nupdated-dependencies:\n- dependency-name: org.apache.commons:commons-configuration2\n dependency-type: direct:production\n...\n\nSigned-off-by: dependabot[bot] ","shortMessageHtmlLink":"Bump org.apache.commons:commons-configuration2 from 2.9.0 to 2.10.1"}},{"before":"2856b04f14a340a6a61de2678eaa656ac38df320","after":"2e84615b638bb1148388370e920c0c71aff1fa09","ref":"refs/heads/main","pushedAt":"2024-03-14T11:04:00.000Z","pushType":"pr_merge","commitsCount":4,"pusher":{"login":"thunderpoot","name":"underwood","path":"/thunderpoot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/54200401?s=80&v=4"},"commit":{"message":"Merge pull request #13 from commoncrawl/hostlink-extraction-reuse-data-archived-on-s3\n\nHostgraph construction: reuse extracted hostlinks checkpointed on S3","shortMessageHtmlLink":"Merge pull request #13 from commoncrawl/hostlink-extraction-reuse-dat…"}},{"before":"0c0fc9c032aa674dde00db7b35f6aaaf5978d2b5","after":"c04c1b60f00422ffc76dadb6ffa3006220d30056","ref":"refs/heads/hostlink-extraction-reuse-data-archived-on-s3","pushedAt":"2024-03-12T22:58:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"Hostgraph construction: reuse extracted hostlinks checkpointed on S3\n- skip over success markers, do not add them as input of the merge graph\n job, it will fail on non-Parquet input","shortMessageHtmlLink":"Hostgraph construction: reuse extracted hostlinks checkpointed on S3"}},{"before":"3a6af9ec066593621c483117115aebc1c1930338","after":"0c0fc9c032aa674dde00db7b35f6aaaf5978d2b5","ref":"refs/heads/hostlink-extraction-reuse-data-archived-on-s3","pushedAt":"2024-03-12T13:57:17.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"thunderpoot","name":"underwood","path":"/thunderpoot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/54200401?s=80&v=4"},"commit":{"message":"Update hostgraph configuration for next web graph release","shortMessageHtmlLink":"Update hostgraph configuration for next web graph release"}},{"before":"07aab0e9a724926fc41e875a6df190ef0d03fd1f","after":"3a6af9ec066593621c483117115aebc1c1930338","ref":"refs/heads/hostlink-extraction-reuse-data-archived-on-s3","pushedAt":"2024-03-12T12:41:57.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"thunderpoot","name":"underwood","path":"/thunderpoot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/54200401?s=80&v=4"},"commit":{"message":"chore: update CRAWLS and MERGE_NAME vars","shortMessageHtmlLink":"chore: update CRAWLS and MERGE_NAME vars"}},{"before":null,"after":"07aab0e9a724926fc41e875a6df190ef0d03fd1f","ref":"refs/heads/hostlink-extraction-reuse-data-archived-on-s3","pushedAt":"2024-03-09T12:09:05.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"Hostgraph construction: reuse extracted hostlinks checkpointed on S3\n- look on the S3 checkpoint/archive prefix for extracted hostlinks\n of every crawl to be processed\n- if found use the checkpointed/archived data and do not extract\n the hostlinks again\n- add success marker for processed crawl(s) to ensure that all\n input splits have been successfully processed","shortMessageHtmlLink":"Hostgraph construction: reuse extracted hostlinks checkpointed on S3"}},{"before":"ae360cc166a6630ea305515c44320223c4116fbc","after":"67416fb499c703d1c9bc383deee5123689c8eb92","ref":"refs/heads/crawler-commons-dev","pushedAt":"2024-02-20T15:28:16.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"Use crawer-commons development version","shortMessageHtmlLink":"Use crawer-commons development version"}},{"before":"9c33ca5d5c5bc00880080b1120dc2928506c400a","after":"2856b04f14a340a6a61de2678eaa656ac38df320","ref":"refs/heads/main","pushedAt":"2023-12-19T15:10:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"thunderpoot","name":"underwood","path":"/thunderpoot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/54200401?s=80&v=4"},"commit":{"message":"Correct MERGE_NAME variable, add explanatory comment","shortMessageHtmlLink":"Correct MERGE_NAME variable, add explanatory comment"}},{"before":"360489872411ae97edc6199c150ab428b9babb88","after":"9c33ca5d5c5bc00880080b1120dc2928506c400a","ref":"refs/heads/main","pushedAt":"2023-12-18T13:38:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"thunderpoot","name":"underwood","path":"/thunderpoot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/54200401?s=80&v=4"},"commit":{"message":"Update post 2023-50","shortMessageHtmlLink":"Update post 2023-50"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEduE90wA","startCursor":null,"endCursor":null}},"title":"Activity · commoncrawl/cc-webgraph"}