My First Steps in Japanese Learning at INALCO


I’m a new student of Japanese at INALCO (Institut National des Langues et Civilisations Orientales). Like many beginners, I’m searching for effective ways to memorize vocabulary and kanji. I have used Anki before for other topics, and I’ll use it for my Japanese journey too. It’s a popular flashcard app that helps you memorize information through spaced repetition.

Experimenting with Anki

While the Anki community offers many shared decks for Japanese, I wanted to try creating cards that matched exactly what we’re learning in class. So, I decided to experiment with a small Python script to help me generate Anki decks from simple CSV files.

Python Script that Generates Anki Decks

The script does the following:

  1. Reads Japanese-French word pairs from a CSV file.
  2. Attempts to generate audio for the Japanese words.
  3. Creates basic Anki cards with the word pairs and audio.
  4. Packages everything into an Anki deck file.

It’s only a starting point, but it already helps me create study materials that align with my coursework.
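
For reference, here is a minimal sketch of the input the script expects, assuming the column order it reads (Japanese first, French second). The header names and the two vocabulary rows are made-up examples, not actual course material, and the helper names `write_sample_csv` and `read_word_pairs` are hypothetical.

```python
import csv
import io

# Hypothetical sample rows: Japanese in column 1, French in column 2.
SAMPLE_ROWS = [
    ("japanese", "french"),  # optional header row (pass --skip-header)
    ("ねこ", "chat"),
    ("いぬ", "chien"),
]


def write_sample_csv(rows):
    """Return the rows serialized as CSV text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def read_word_pairs(csv_text, skip_header=True):
    """Parse CSV text back into (japanese, french) tuples."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    if skip_header:
        next(reader, None)  # Drop the header row if there is one
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


if __name__ == "__main__":
    text = write_sample_csv(SAMPLE_ROWS)
    print(text)
    print(read_word_pairs(text))
```

Saving the text to a UTF-8 file gives something the script can consume with `--skip-header`.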

import argparse
import csv
import logging
import os
import tempfile

import anki
from anki.collection import Collection
from anki.exporting import AnkiPackageExporter
from anki.media import MediaManager
from gtts import gTTS


def generate_audio_for(text, temp_dir):
    logging.info("Generating audio for text: %s", text)
    tts = gTTS(text, lang="ja")
    audio_file = os.path.join(temp_dir, f"{text}.mp3")
    tts.save(audio_file)
    return audio_file


def import_to_collection(args):
    temp_dir = tempfile.mkdtemp()
    logging.info("Using temp dir: %s", temp_dir)
    collection = Collection(os.path.join(temp_dir, "collection.anki2"))
    deck_id = collection.decks.id(args.deck_name)
    model = collection.models.by_name("Basic (and reversed card)")
    media_manager = MediaManager(collection, False)

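    # Update the styling for larger font
    model["css"] = """
    .card {
        font-size: 24px;
        text-align: center;
    }
    """
    # Persist the change; editing the dict alone does not update the collection
    collection.models.save(model)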

    # For each line in the CSV, add a note to the collection and generate audio
    with open(args.input_csv, newline="", encoding="utf-8") as csvfile:
        csvreader = csv.reader(csvfile)
        if args.skip_header:
            next(csvreader)  # Skip the header row
        for row in csvreader:
            if len(row) < 2:
                continue  # Skip blank or incomplete rows
            japanese_text, french_text = row[0], row[1]
            audio_file = generate_audio_for(japanese_text, temp_dir)
            note = collection.new_note(model)
            if os.path.exists(audio_file):
                # add_file returns the name actually stored in the media folder
                media_name = media_manager.add_file(audio_file)
                japanese_text += f" [sound:{media_name}]"
                note.add_tag("has_audio")
            for tag in args.tags:
                note.add_tag(tag)
            note.fields = [japanese_text, french_text]
            logging.info("Adding row: %s", note.fields)
            collection.add_note(note, deck_id)

    # Export to apkg
    exporter = AnkiPackageExporter(collection)
    exporter.exportInto(args.output_apkg)


def command_line():
    parser = argparse.ArgumentParser(description="Convert CSV to Anki APKG")
    parser.add_argument("input_csv", help="Path to the input CSV file")
    parser.add_argument("output_apkg", help="Path to the output APKG file")
    parser.add_argument("--deck-name", default="Default Deck", help="Deck name")
    parser.add_argument(
        "--tags", nargs="*", default=[], help="List of tags to add to each card"
    )
    parser.add_argument(
        "--skip-header", action="store_true", help="Skip the header row in the CSV file"
    )
    return parser.parse_args()


def main():
    anki.lang.set_lang("en")
    logging.basicConfig(level=logging.INFO)
    import_to_collection(command_line())


if __name__ == "__main__":
    main()
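
One fragile spot in the script is that the audio file name is built directly from the Japanese text, which could fail if an entry contains a path separator or is very long. A possible workaround, sketched here as an assumption rather than as part of the script above, is to derive the name from a hash of the text. The helper name `audio_filename_for` and the `tts_` prefix are hypothetical.

```python
import hashlib
import os


def audio_filename_for(text, directory):
    """Build a filesystem-safe .mp3 path for the given text.

    The name is a SHA-1 digest of the UTF-8 text, so the same word always
    maps to the same file and no special characters reach the filesystem.
    """
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return os.path.join(directory, f"tts_{digest}.mp3")
```

In `generate_audio_for`, the `os.path.join(temp_dir, f"{text}.mp3")` line could then be swapped for `audio_filename_for(text, temp_dir)`. The trade-off is that file names in the media folder are no longer human-readable.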

Generating the CSV with LLMs

I’m also experimenting with Large Language Models (LLMs) to generate the CSV files. I tried multimodal models such as Mistral’s Pixtral and was able to generate CSV files directly from course material (screenshots, PDF files, low-quality images, etc.). It works really well and makes the whole process quick:

  1. Upload the course material to Pixtral with a prompt like “Generate a CSV file: Japanese hiragana in one column and French in the other.”
  2. Download the CSV file.
  3. Run the script above to generate the Anki deck.
  4. Import the Anki deck.

It takes less than 2 minutes to generate the Anki deck for a whole lesson.

This should work with Claude.ai and ChatGPT as well.
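
Because model output can be irregular, it may be worth checking the CSV before building the deck. The following is a minimal sketch, assuming the two-column layout requested in the prompt above. The function name `find_csv_problems` and the specific checks are illustrative assumptions, not part of the script.

```python
import csv
import io
import re

# Hiragana, katakana, and the common CJK ideograph range.
JAPANESE_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def find_csv_problems(csv_text, skip_header=False):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    if skip_header:
        next(reader, None)
    for row in reader:
        line = reader.line_num
        if not row:
            continue  # Ignore blank lines
        if len(row) != 2:
            problems.append(f"line {line}: expected 2 columns, got {len(row)}")
            continue
        japanese, french = (cell.strip() for cell in row)
        if not japanese or not french:
            problems.append(f"line {line}: empty cell")
        elif not JAPANESE_CHARS.search(japanese):
            problems.append(f"line {line}: no Japanese characters in column 1")
    return problems
```

An empty list means the file matches the expected shape. It does not mean the translations are correct, so a quick read-through against the course material is still useful.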

Looking Ahead

As I continue my Japanese journey at INALCO, I hope to gradually improve both my language skills and my study techniques. There’s still so much to learn, not just about Japanese, but also about effective learning strategies.

がんばります (Ganbarimasu) - I’ll do my best!