Every parent who experiments with AI storybooks has seen it: page one features a kid in a red hoodie, page two swaps the hoodie for armor, and by the finale the same child is suddenly a corgi. Diffusion models love to improvise, which is why building an AI picture book with consistent characters once felt impossible. Here’s how the Silly Scribe team turned that chaos into a predictable system, and why StoryMaker became our nickname for the mix of classes and services that ship in the app.
Why traditional stacks drift
Four reasons AI storybooks shapeshift
Prompt amnesia
Many image APIs truncate or quietly down-weight instructions once a prompt grows long; in our experience, trouble starts around 200 tokens. Specific cues like “keep Ava’s blue glasses” vanish mid-request, so the very next render forgets who your hero is supposed to be.
Stochastic sampling
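Diffusion models start every render from random noise, and many samplers add fresh noise at each denoising step. A locked seed makes one exact prompt repeatable, but every new page changes the prompt. The shorter the character description, the more freedom the model has, and the faster tiny attributes mutate until the star of page one becomes a stranger by the finale.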
No shared canon
Plenty of apps describe a character once in prose and never store it. Without a canonical descriptor, every render call starts from scratch and prays the model remembers tone, palette, or accessories.
Panel isolation
Each panel is usually rendered in isolation. Without piping prior panels back into the model, there is nothing forcing the illustrator to lock poses, props, or style, so visual drift is inevitable.
Meet StoryMaker
The consistency engine powering Silly Scribe
Internally, the combination of Character, CharacterStore, StoryDirector, ArtDirectedImagePromptBuilder, and GoogleGeminiFlashImageProvider is nicknamed StoryMaker. Each piece removes a different source of randomness so AI-generated casts stay recognizable.
Character DNA
Models/Character.swift
The Character model treats avatars as data, not vibes, so StoryMaker always knows exactly who belongs in every scene.
- Character.getConsistentImageDescription() prioritizes avatar prompts, sketch descriptions, and structured traits, always appending “KEEP IDENTICAL across all scenes.”
- Character.toCanonicalDescriptor() turns those details into a deterministic token such as <char:GRANDMA_HAZEL>, giving every hero a reusable handle.
- CharacterEditViewController asks Gemini to summarize uploaded photos (“Please analyze this cartoon avatar…”) so the descriptor mirrors the hairstyle, outfit, and palette parents approved.
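To make the two Character methods concrete, here is a minimal Swift sketch. The stored properties (avatarPrompt, sketchDescription, traits) and the exact output format are assumptions for illustration; only the method names and the “KEEP IDENTICAL across all scenes.” suffix come from the description above. The sketch joins sources in priority order and sorts traits so repeated calls return identical text.

```swift
// Hypothetical sketch: property names and formatting are assumptions,
// not the actual Models/Character.swift implementation.
struct Character {
    let name: String
    var avatarPrompt: String?       // summary of an approved avatar (assumed field)
    var sketchDescription: String?  // description of a parent's sketch (assumed field)
    var traits: [String: String]    // structured traits, e.g. "hair": "silver bun"

    /// Joins the available sources in priority order and appends the lock phrase.
    func getConsistentImageDescription() -> String {
        var parts: [String] = []
        if let avatar = avatarPrompt, !avatar.isEmpty { parts.append(avatar) }
        if let sketch = sketchDescription, !sketch.isEmpty { parts.append(sketch) }
        // Sort traits by key so repeated calls produce identical text.
        let traitText = traits
            .sorted { $0.key < $1.key }
            .map { "\($0.key): \($0.value)" }
            .joined(separator: ", ")
        if !traitText.isEmpty { parts.append(traitText) }
        parts.append("KEEP IDENTICAL across all scenes.")
        return parts.joined(separator: ". ")
    }

    /// Derives a stable handle such as <char:GRANDMA_HAZEL> from the name.
    func toCanonicalDescriptor() -> String {
        // Split on anything that is not a letter or digit, then join with "_".
        let words = name.uppercased().split { !$0.isLetter && !$0.isNumber }
        return "<char:" + words.joined(separator: "_") + ">"
    }
}
```

Deriving the handle from the normalized name, rather than from a random ID, means the same cast yields the same tokens in every session; whether the shipping code derives it exactly this way is an assumption.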
Shared context for text + art
Services/Storage/CharacterStore.swift
CharacterStore keeps narrative and visuals in sync so nobody forgets who is supposed to show up in each beat.
- getCharacterContextForLLM injects every canonical descriptor into the story prompt so the text LLM never drops a family member.
- getCharacterVisualsForImageGen mirrors that list for the art layer, ending with “Ensure these characters appear consistently across all images,” so text and art stay synchronized.
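A minimal sketch of how one cast list can feed both prompts. CastMember and its fields are hypothetical stand-ins for the Character model; the two method names and the closing sentence are taken from the description above. Because both strings are built from the same array, a character cannot appear in the story context and be missing from the image context.

```swift
// Hypothetical stand-in for the Character model; a real cast carries more fields.
struct CastMember {
    let descriptor: String  // e.g. "<char:GRANDMA_HAZEL>"
    let visual: String      // e.g. "silver bun, yellow raincoat"
    let role: String        // e.g. "wise grandmother"
}

// Hypothetical sketch of CharacterStore: one list feeds both the text and image prompts.
final class CharacterStore {
    private(set) var cast: [CastMember] = []

    func add(_ member: CastMember) { cast.append(member) }

    /// Story-prompt context: every descriptor plus its narrative role.
    func getCharacterContextForLLM() -> String {
        let lines = cast.map { "- \($0.descriptor): \($0.role)" }
        return "Characters (include every one of them):\n" + lines.joined(separator: "\n")
    }

    /// Image-prompt context: the same descriptors, paired with their visuals.
    func getCharacterVisualsForImageGen() -> String {
        let lines = cast.map { "- \($0.descriptor): \($0.visual)" }
        return lines.joined(separator: "\n")
            + "\nEnsure these characters appear consistently across all images."
    }
}
```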
Genre-aware storyboard
Services/LLM/StoryDirector.swift
StoryDirector generates beat-by-beat plans that adapt each character to the selected genre without losing their core identity.
- generateStoryBoard() and adaptCharactersToGenre upgrade wardrobes automatically—Grandma’s raincoat becomes a fantasy cape or desaturates for noir without changing her face.
- Every storyboard JSON ships with a style bible, character bible, prop map, and call sheet. Even if Gemini answers in SCREAMING_SNAKE_CASE, we normalize names and fall back to deterministic storyboards when needed.
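The sketch below illustrates two StoryDirector ideas under stated assumptions: a wardrobe swap that never touches identity fields, and name normalization with a deterministic fallback when the model's JSON does not parse. Genre, CharacterLook, adaptLookToGenre, normalizeName, Storyboard, parseStoryboard, and the fallback beats are all hypothetical; the real storyboard JSON also carries the style bible, prop map, and call sheet.

```swift
import Foundation

// Hypothetical types; the real StoryDirector storyboard carries much more.
enum Genre { case fantasy, noir, everyday }

struct CharacterLook {
    let face: String      // identity: never changed by genre
    let wardrobe: String  // costume: allowed to adapt
}

/// Swaps only the wardrobe; the face field is copied through unchanged.
func adaptLookToGenre(_ look: CharacterLook, genre: Genre) -> CharacterLook {
    switch genre {
    case .fantasy:
        return CharacterLook(face: look.face, wardrobe: "fantasy cape echoing the \(look.wardrobe)")
    case .noir:
        return CharacterLook(face: look.face, wardrobe: "desaturated \(look.wardrobe)")
    case .everyday:
        return look
    }
}

/// Maps "GRANDMA_HAZEL", "grandma-hazel", and "Grandma Hazel" to one key.
func normalizeName(_ raw: String) -> String {
    raw.lowercased()
        .split { !$0.isLetter && !$0.isNumber }
        .joined(separator: " ")
}

struct Storyboard: Decodable {
    let beats: [String]
    let cast: [String]
}

/// Decodes model output; falls back to a fixed storyboard if the JSON is unusable.
func parseStoryboard(_ json: String, expectedCast: [String]) -> Storyboard {
    let fallback = Storyboard(
        beats: ["Meet the cast", "A problem appears", "The team solves it", "Happy ending"],
        cast: expectedCast)
    guard let data = json.data(using: .utf8),
          let board = try? JSONDecoder().decode(Storyboard.self, from: data) else {
        return fallback
    }
    // Rewrite each returned name to its canonical spelling; drop unknown names.
    var canonical: [String: String] = [:]
    for name in expectedCast { canonical[normalizeName(name)] = name }
    let cast = board.cast.compactMap { canonical[normalizeName($0)] }
    return Storyboard(beats: board.beats, cast: cast)
}
```

Keeping the fallback fixed (same beats, full cast) trades creativity for predictability: a parse failure yields a plain book rather than a broken one.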
Prompt compiler + style locks
Services/ArtDirectedImagePromptBuilder.swift
Each illustration prompt is carefully compiled so the model cannot improvise new looks mid-story.
- buildPrompt() blends SceneBeat, ShotPlan, global art direction (“paper-cut collage, stop-motion shadows”), cast constraints, prop requirements, and a style locks list: “Use canonical <char:GRANDMA_HAZEL>—do not change hairstyle, costume colors, or accessories.”
- forbidTokens explicitly lists who and what should stay off-page, preventing stray cameos or props from previous chapters.
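Here is a hedged sketch of a prompt compiler in the spirit of buildPrompt(). SceneBeat and ShotPlan are reduced to a few hypothetical fields, and the section labels are assumptions. The point it illustrates is that art direction, style locks, and forbidden tokens are appended mechanically to every prompt rather than left to the story writer's memory.

```swift
// Hypothetical inputs; the real SceneBeat and ShotPlan types carry more detail.
struct SceneBeat {
    let summary: String
    let cast: [String]   // canonical descriptors, e.g. "<char:GRANDMA_HAZEL>"
    let props: [String]
}

struct ShotPlan { let framing: String }

struct ArtDirectedImagePromptBuilder {
    let artDirection: String  // global look, e.g. "paper-cut collage, stop-motion shadows"

    /// Compiles one panel prompt; section order and labels are illustrative.
    func buildPrompt(beat: SceneBeat, shot: ShotPlan, forbidTokens: [String]) -> String {
        var sections: [String] = [
            "Scene: \(beat.summary)",
            "Shot: \(shot.framing)",
            "Art direction: \(artDirection)",
            "Cast: " + beat.cast.joined(separator: ", "),
        ]
        if !beat.props.isEmpty {
            sections.append("Required props: " + beat.props.joined(separator: ", "))
        }
        // One style lock per cast member, repeated verbatim in every panel prompt.
        let locks = beat.cast.map {
            "Use canonical \($0); do not change hairstyle, costume colors, or accessories."
        }
        sections.append("Style locks:\n" + locks.joined(separator: "\n"))
        if !forbidTokens.isEmpty {
            sections.append("Do not include: " + forbidTokens.joined(separator: ", "))
        }
        return sections.joined(separator: "\n")
    }
}
```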
Sequential Gemini rendering
Services/LLM/Google/GoogleGeminiFlashImageProvider.swift
Sequential rendering threads every panel together so the model references its own prior work, just like a human illustrator.
- generateStoryBookImages() builds the render plan, applies ArtDirectedImagePromptBuilder, and appends a shared negative prompt: “remove duplicate characters, remove extra limbs, remove text…”
- ImageSequenceGenerator feeds each rendered panel back into subsequent calls with instructions like “Reference panel 1… Maintain consistency but vary composition,” which finally keeps shoes, skin tone, and props identical throughout.
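The loop below sketches sequential rendering against a hypothetical ImageProvider protocol; it is not the real GoogleGeminiFlashImageProvider or Gemini API. Each call receives every earlier panel as a reference plus the shared negative prompt. RecordingProvider is an offline stand-in for Gemini that records what each call received, so the sketch runs without network access.

```swift
// Hypothetical types; the real GoogleGeminiFlashImageProvider API is not shown here.
struct RenderedPanel: Equatable { let id: String }

protocol ImageProvider {
    func render(prompt: String, references: [RenderedPanel]) throws -> RenderedPanel
}

/// Renders panels in order, passing every earlier panel back as a reference.
func generateSequence(
    prompts: [String],
    provider: any ImageProvider,
    negativePrompt: String = "remove duplicate characters, remove extra limbs, remove text"
) throws -> [RenderedPanel] {
    var panels: [RenderedPanel] = []
    for prompt in prompts {
        var fullPrompt = prompt + "\nNegative: " + negativePrompt
        if !panels.isEmpty {
            let refs = (1...panels.count).map { "panel \($0)" }.joined(separator: ", ")
            fullPrompt += "\nReference \(refs). Maintain consistency but vary composition."
        }
        let panel = try provider.render(prompt: fullPrompt, references: panels)
        panels.append(panel)
    }
    return panels
}

// Offline stand-in for Gemini that records what each call received.
final class RecordingProvider: ImageProvider {
    private(set) var referenceCounts: [Int] = []
    private(set) var prompts: [String] = []

    func render(prompt: String, references: [RenderedPanel]) throws -> RenderedPanel {
        prompts.append(prompt)
        referenceCounts.append(references.count)
        return RenderedPanel(id: "panel-\(prompts.count)")
    }
}
```

The trade-off is latency: panels cannot be rendered in parallel because each one depends on all the panels before it.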
Fallbacks and QA
StoryBookStore + helpers
If anything slips, StoryMaker still refuses to ship chaos.
- When Gemini hiccups, createPlaceholder() drops in branded placeholder art so parents never stare at half-finished spreads.
- StoryBookStore stores text, art, and canonical metadata together, so replays and exports always reuse the exact same assets—no surprise re-rolls.
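A minimal sketch of the fallback-plus-cache behavior described above. StoryBookStore's real storage format is not described here, so the in-memory dictionary, PageArt, RenderError, artwork(forPage:render:), and the placeholder ID scheme are all assumptions. Successful renders are saved so replays and exports reuse them; placeholders are deliberately not saved in this sketch, so a later attempt can still fetch real art.

```swift
// Hypothetical types; the real StoryBookStore persists text, art, and metadata together.
enum RenderError: Error { case timeout }

struct PageArt: Equatable {
    let imageID: String
    let isPlaceholder: Bool
}

/// Stand-in for createPlaceholder(): branded art instead of a blank spread.
func createPlaceholder(page: Int) -> PageArt {
    PageArt(imageID: "silly-scribe-placeholder-\(page)", isPlaceholder: true)
}

final class StoryBookStore {
    private var savedArt: [Int: PageArt] = [:]

    /// Returns saved art if present; otherwise renders once and saves the result.
    func artwork(forPage page: Int, render: () throws -> String) -> PageArt {
        if let saved = savedArt[page] { return saved }  // replays and exports reuse this
        do {
            let id = try render()
            let art = PageArt(imageID: id, isPlaceholder: false)
            savedArt[page] = art
            return art
        } catch {
            // Placeholders are not saved, so a later call can retry the real render.
            return createPlaceholder(page: page)
        }
    }
}
```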
Results parents can see
What StoryMaker delivers
Zero surprise shapeshifts
Style locks plus sequential rendering eliminate almost all visual drift, and the remaining edge cases are caught by the same QA hooks we use while building.
Faster storytime
Because StoryMaker rarely needs to re-render a page, families go from Mad Libs inputs to a finished illustrated book in one pass.
Publisher-grade output
Teachers and partners can export anthologies confident that panel five still features the hero who opened the book.
Why it matters
Kids spot inconsistency instantly. When you promise an AI storybook with consistent characters, you’re promising trust: that the hero they built with mom shows up the same way on every spread. StoryMaker proves that treating characters as first-class data (canonical descriptors, storyboard guardrails, and sequential rendering) lets you deliver consistent characters with generative AI, no guesswork required.
Ready to see it in action?
Open Silly Scribe, build a cast, and watch StoryMaker keep them perfectly in sync from the first prompt through the final illustration.