
Toxic comment prediction & collection service based on DistilKoBERT and Korean Hate Speech Dataset. 🏅 Won 2nd place in Korean Language Information Processing System Competition 2020


osori/korean-hate-comments


korean-hate-comments

๊ตญ๋ฆฝ๊ตญ์–ด์› 2020 ๊ตญ์–ด์ •๋ณด์ฒ˜๋ฆฌ์‹œ์Šคํ…œ ๊ฒฝ์ง„๋Œ€ํšŒ ๊ธˆ์ƒ ์ˆ˜์ƒ (๋ณด๊ณ ์„œ) (๋ฐœํ‘œ ์ž๋ฃŒ) (๋™์˜์ƒ)

Try it now! (link) The website is temporarily closed because of the cost of the server that runs the prediction model. It will reopen as soon as the archiving work is finished.

A toxic comment prediction model built with the Korean Hate Speech Dataset, and a service for downloading toxic comments from internet communities

Directory structure and key files

+-- good_comments_guardian_ipynb: Jupyter Notebook documenting how the model was built
+-- backend
|	+-- crawlers: collect comments from Ilbe, Inven, and other sites
|	+-- model: creates the prediction model
|	+-- build_db.py: builds the DB using the crawlers
|	+-- comments.db: the DB built by build_db.py
|	+-- main.py
+-- frontend: front end implemented with Ionic + React
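The flow described above (crawlers feed build_db.py, which produces comments.db) can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes comments.db is a SQLite file, and the `Comment` record, the `fake_crawler` stand-in, the `build_db` function, and the `comments` table layout are all hypothetical names, not taken from the repository.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Comment:
    """Hypothetical record produced by a crawler."""
    site: str      # assumed site label, e.g. "ilbe" or "inven"
    post_id: str   # assumed identifier of the post the comment belongs to
    text: str      # comment body


def fake_crawler() -> Iterator[Comment]:
    """Stand-in for a real crawler; yields hard-coded sample comments."""
    yield Comment("ilbe", "p1", "first sample comment")
    yield Comment("ilbe", "p1", "second sample comment")
    yield Comment("inven", "p2", "third sample comment")


def build_db(conn: sqlite3.Connection, comments: Iterable[Comment]) -> int:
    """Create the assumed comments table, insert every comment, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        "id INTEGER PRIMARY KEY, "
        "site TEXT NOT NULL, "
        "post_id TEXT NOT NULL, "
        "text TEXT NOT NULL)"
    )
    rows = [(c.site, c.post_id, c.text) for c in comments]
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO comments (site, post_id, text) VALUES (?, ?, ?)", rows
        )
    return len(rows)


if __name__ == "__main__":
    # An in-memory database keeps the sketch from touching a real comments.db.
    connection = sqlite3.connect(":memory:")
    print(build_db(connection, fake_crawler()))
```

Keeping the crawler behind a plain iterable means a real crawler for another community could be swapped in without changing the storage code.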

Screenshots

Comment DB build process

Running build_db.py

build_db.py

์ƒ์„ฑ๋œ DB

  • ์ด ๊ฒŒ์‹œ๋ฌผ ์ˆ˜: 22,582๊ฑด (์ผ๋ฒ :12,304๊ฑด, ์ธ๋ฒค: 10,278๊ฑด)

    • ์ผ๋ฒ : 2020-08-28 ~ 2020-09-15 ์‚ฌ์ด์˜ ์ผ๊ฐ„ ๋ฒ ์ŠคํŠธ ๊ฒŒ์‹œ๊ธ€
    • ์ธ๋ฒค: 2020-06-07 ~ 2020-09-15 ์‚ฌ์ด์˜ ์˜คํ”ˆ ์ด์Šˆ ๊ฐค๋Ÿฌ๋ฆฌ 3์ถ”๊ธ€
  • ์ด ๋Œ“๊ธ€ ์ˆ˜: 494,331๊ฑด (์ผ๋ฒ :325,890๊ฑด, ์ธ๋ฒค: 198,636๊ฑด)

DB structure
