Sep 27, 2009

Checking vocabulary against JLPT level 4

JLPT level 4 covers only about 100 kanji. I wanted to check a vocabulary database to see which entries someone at JLPT4 level can read as written, and which ones I need to convert to a phonetic reading.

I first thought I’d do my standard regex check for hiragana and katakana, then compare the rest of the characters against an array of JLPT4 kanji.
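
As a rough sketch of that first approach (not the code actually used for this post), the kana can be stripped with a small regex and whatever is left compared against a set of JLPT4 kanji. The names `JLPT4_KANJI`, `KANA` and `unknown_chars` are made up for illustration, Python is an assumption, and the kanji string is simply copied from the regex further down.

```python
import re

# JLPT4 kanji, copied from the character class in this post.
JLPT4_KANJI = set(
    "一七万三上下中九二五人今休会何先入八六円出分前北十千午半南友口古右名"
    "四国土外多大天女子学安小少山川左年店後手新日時書月木本来東校母毎気"
    "水火父生男白百目社空立耳聞花行西見言話語読買足車週道金長間雨電食飲駅高魚"
)

# Hiragana, katakana, the long vowel mark, and the "~" affix placeholder.
KANA = re.compile(r"[ぁ-ゖァ-ヶー~]")


def unknown_chars(word):
    """Return the characters in word that are neither kana nor JLPT4 kanji."""
    rest = KANA.sub("", word)  # step 1: drop everything the regex covers
    return [ch for ch in rest if ch not in JLPT4_KANJI]  # step 2: check the rest


for word in ["食べる", "勉強する", "第~"]:
    print(word, unknown_chars(word))
```

A side effect of doing it in two steps is that you get back the offending characters, not just a yes/no answer.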

I then realised I might as well just use a regex for the whole thing, and this monster was born:

^[ぁ-ゖ~ァ-ヶー一七万三上下中九二五人今休会何先入八六円出分前北十千午半南友口古右名四国土外多大天女子学安小少山川左年店後手新日時書月木本来東校母毎気水火父生男白百目社空立耳聞花行西見言話語読買足車週道金長間雨電食飲駅高魚]+$

Breaking it down:

  • ぁ-ゖ: Hiragana
  • ~: Handle prefixes/suffixes in the dictionary (such as “~枚” or “第~”)
  • ァ-ヶー: Katakana, plus the long vowel mark ー
  • The rest is the entire JLPT4 kanji list, in no particular order.
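
To show how the pattern behaves, here is a minimal sketch that runs it over a few sample words. Python, the name `is_readable`, and the sample entries are all assumptions for illustration; the pattern itself is copied unchanged from above.

```python
import re

# The pattern from this post, unchanged.
JLPT4_READABLE = re.compile(
    r"^[ぁ-ゖ~ァ-ヶー一七万三上下中九二五人今休会何先入八六円出分前北十千午半南友口古右名"
    r"四国土外多大天女子学安小少山川左年店後手新日時書月木本来東校母毎気"
    r"水火父生男白百目社空立耳聞花行西見言話語読買足車週道金長間雨電食飲駅高魚]+$"
)


def is_readable(entry):
    """True if entry uses only kana, "~", and JLPT4 kanji."""
    return JLPT4_READABLE.match(entry) is not None


# Hypothetical sample entries, not taken from any real database.
for entry in ["学校", "食べる", "コーヒー", "~人", "勉強", "~枚"]:
    status = "ok" if is_readable(entry) else "needs a phonetic reading"
    print(entry, status)
```

Entries such as 勉強 or ~枚 fail because 勉, 強 and 枚 are not in the list, so those are the ones that would need a reading. One caveat in Python: `$` also matches just before a trailing newline, so strip entries first or use `fullmatch` if that matters.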

Written by ダニエル氏