How Mascots Work
Understanding the data model behind Masko's API.
The Hierarchy
Every mascot in Masko follows a four-level hierarchy. Projects group your work, collections define characters, items represent poses or actions, and assets are the actual files.
```
Project
└── Collection (= one mascot character)
    ├── Item: "Wave"
    │   ├── Asset: image (pose.png)
    │   ├── Asset: transparent_image (pose_nobg.png)
    │   ├── Asset: video (wave.mp4)
    │   ├── Asset: webm (wave.webm)
    │   └── Asset: hevc (wave.mov)
    ├── Item: "Idle"
    │   └── ...
    └── Item: "Thumbs Up"
        └── ...
```
Collections
A collection represents a single mascot character. It holds the character's prompt (the text description used for generation), reference images (up to 6 examples of what the character looks like), and a style card (an auto-extracted summary of the character's visual traits).
Collections also store settings like animation sizes, CDN configuration, and the caution list used to maintain consistency across generations.
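The sketch below ties the hierarchy and the collection fields together by modeling the four levels as Python dataclasses. It is a minimal illustration only. The class and field names (Project, Collection, Item, Asset, reference_images, style_card, caution_list, animation_sizes) and the helper iter_assets are assumptions that mirror the prose on this page. They are not Masko's actual API schema or SDK.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional

# Hypothetical client-side mirror of the Project > Collection > Item > Asset
# hierarchy. Field names follow this page's prose; the real schema may differ.

@dataclass
class Asset:
    type: str      # e.g. "image", "transparent_image", "video", "webm", "hevc"
    filename: str

@dataclass
class Item:
    name: str                 # e.g. "Wave"
    prompt: str               # the action prompt
    type: str = "image"       # "image", "animation", or "logo"
    assets: list[Asset] = field(default_factory=list)

@dataclass
class Collection:
    name: str                 # one mascot character
    prompt: str               # the character description
    reference_images: list[str] = field(default_factory=list)  # up to 6
    style_card: Optional[str] = None                            # auto-extracted
    caution_list: list[str] = field(default_factory=list)
    animation_sizes: list[str] = field(default_factory=list)    # e.g. "512x512"
    items: list[Item] = field(default_factory=list)

@dataclass
class Project:
    name: str
    collections: list[Collection] = field(default_factory=list)

def iter_assets(project: Project) -> Iterator[tuple[Collection, Item, Asset]]:
    """Walk the hierarchy top-down, yielding each asset with its parents."""
    for collection in project.collections:
        for item in collection.items:
            for asset in item.assets:
                yield collection, item, asset

# Usage: the "Wave" item from the tree diagram above.
wave = Item("Wave", "waving hello", "animation", [
    Asset("image", "pose.png"),
    Asset("transparent_image", "pose_nobg.png"),
    Asset("video", "wave.mp4"),
    Asset("webm", "wave.webm"),
    Asset("hevc", "wave.mov"),
])
mascot = Collection("Fox", "A round orange fox mascot",
                    items=[wave, Item("Idle", "standing still")])
project = Project("Website", [mascot])

for collection, item, asset in iter_assets(project):
    print(collection.name, item.name, asset.type, asset.filename)
```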
Items
An item is a single pose or action for the mascot, such as "Wave", "Idle", or "Thumbs Up". Each item has a name, a prompt describing the action, and a type (image, animation, or logo).
When you generate an image for an item, the API combines the collection's character prompt with the item's action prompt to produce a consistent result.
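The idea of combining a fixed character prompt with a varying action prompt can be sketched as follows. This page does not document Masko's actual prompt template, so the function name combine_prompts and the "Pose/action:" wording are hypothetical. The sketch only shows that the character description stays constant while the action changes per item.

```python
def combine_prompts(character_prompt: str, action_prompt: str) -> str:
    """Hypothetical merge of a collection prompt with an item prompt.

    The character description is reused verbatim for every item, which is
    what keeps the mascot consistent; only the action part varies.
    """
    character = character_prompt.strip().rstrip(".")
    action = action_prompt.strip().rstrip(".")
    return f"{character}. Pose/action: {action}."

# Usage: one character prompt, three item prompts.
CHARACTER = "A round orange fox mascot with a teal scarf"
for action in ["waving hello", "standing idle", "giving a thumbs up"]:
    print(combine_prompts(CHARACTER, action))
```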
Assets
An asset is a single generated file. Each item can have multiple assets of different types:
| Type | Format | Description |
|---|---|---|
| image | .png | Original generated image with background |
| transparent_image | .png | Background removed, transparent PNG |
| video | .mp4 | Animated version (H.264) |
| webm | .webm | Web-optimized format with alpha channel |
| hevc | .mov | Apple-compatible format with alpha channel |
| stacked_video | .mp4 | Stacked layout for custom alpha compositing |
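On the client side, the table above can be mirrored as a lookup, for example to decide which animation file to serve. The sketch assumes a common web convention, not a Masko rule: Safari gets the HEVC-with-alpha .mov, other browsers get the WebM with alpha, and the opaque H.264 video is the fallback. The names ASSET_TYPES, native_alpha, filename_for, and pick_animation_asset are hypothetical.

```python
# Mirrors the asset table. "native_alpha" marks formats that carry a real
# alpha channel; stacked_video packs alpha into the frame and needs custom
# compositing, so it is marked False here (an interpretive assumption).
ASSET_TYPES = {
    "image":             {"ext": ".png",  "native_alpha": False},
    "transparent_image": {"ext": ".png",  "native_alpha": True},
    "video":             {"ext": ".mp4",  "native_alpha": False},
    "webm":              {"ext": ".webm", "native_alpha": True},
    "hevc":              {"ext": ".mov",  "native_alpha": True},
    "stacked_video":     {"ext": ".mp4",  "native_alpha": False},
}

def filename_for(stem: str, asset_type: str) -> str:
    """Build a file name from a stem and an asset type, e.g. wave + hevc -> wave.mov."""
    return stem + ASSET_TYPES[asset_type]["ext"]

def pick_animation_asset(available: set[str], is_safari: bool) -> str:
    """Choose an animation format for a transparent overlay.

    Assumed convention: HEVC with alpha for Safari, WebM with alpha elsewhere,
    falling back to the opaque H.264 video if no alpha format exists.
    """
    preferred = ["hevc", "video"] if is_safari else ["webm", "video"]
    for asset_type in preferred:
        if asset_type in available:
            return asset_type
    raise ValueError("item has no playable animation asset")

# Usage
print(pick_animation_asset({"webm", "hevc", "video"}, is_safari=True))
print(filename_for("wave", "webm"))
```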
Generation Graph
Assets are connected through a generation graph. When you generate an animation, the API first creates an image, then removes its background, then animates the transparent image, and finally converts the video to the WebM and HEVC formats. Each derived asset links back to its source via the generation_links table.
```
image (.png)
└── transparent_image (.png)  [role: source]
    └── video (.mp4)  [role: source, end_frame]
        ├── webm (.webm)  [role: source]
        └── hevc (.mov)  [role: source]
```

The role field on each link indicates the relationship. source means "this asset was derived from that asset". end_frame is used for animations where a final pose image guides the motion.
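To illustrate how the links form a traversable graph, the sketch below stores link rows as (derived, source, role) tuples and walks source links back to the root image. The tuple layout, the helper names, and the use of asset type names as IDs are simplifying assumptions, not the real generation_links schema. "end_pose" is an invented ID standing in for the final pose image referenced by end_frame.

```python
# Hypothetical stand-in for generation_links rows: (derived, source, role).
LINKS = [
    ("transparent_image", "image", "source"),
    ("video", "transparent_image", "source"),
    ("video", "end_pose", "end_frame"),   # invented ID for the end-frame image
    ("webm", "video", "source"),
    ("hevc", "video", "source"),
]

def linked_from(links, asset, role="source"):
    """Return the assets that `asset` links back to with the given role."""
    return [src for derived, src, r in links if derived == asset and r == role]

def derived_from(links, asset):
    """Return the assets generated directly from `asset` (role: source)."""
    return [derived for derived, src, r in links if src == asset and r == "source"]

def lineage(links, asset):
    """Follow source links back to the root asset, e.g. webm -> ... -> image."""
    chain = [asset]
    while True:
        sources = linked_from(links, chain[-1])
        if not sources:
            return chain
        if sources[0] in chain:
            raise ValueError("cycle in generation graph")
        chain.append(sources[0])  # assumes one source per asset, as in the diagram

# Usage
print(lineage(LINKS, "webm"))
print(linked_from(LINKS, "video", role="end_frame"))
```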
Reference Images & Style Cards
Each collection can have up to 6 reference images. These are examples of what the mascot looks like, and they guide every generation to maintain visual consistency.
When you first generate an image, Masko automatically extracts a style card from the references. The style card is a structured summary of the character's visual traits (colors, proportions, line style, shading) that gets injected into every prompt. If you change the references, the style card is cleared and re-extracted on the next generation.
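The reference and style-card lifecycle described above behaves like a lazily rebuilt cache. Changing the references invalidates the card, and the next generation re-extracts it. The sketch below models that behavior. The class MascotStyle, its methods, and the injected extract callable (a stand-in for Masko's server-side extraction) are hypothetical. Only the 6-image limit and the clear-then-re-extract rule come from this page.

```python
from typing import Callable, Optional

MAX_REFERENCES = 6  # limit stated on this page

class MascotStyle:
    """Hypothetical model of the reference-image / style-card lifecycle."""

    def __init__(self, extract: Callable[[list[str]], str]) -> None:
        self._extract = extract                # stand-in for style extraction
        self.references: list[str] = []
        self.style_card: Optional[str] = None

    def set_references(self, refs: list[str]) -> None:
        if len(refs) > MAX_REFERENCES:
            raise ValueError(f"at most {MAX_REFERENCES} reference images allowed")
        self.references = list(refs)
        self.style_card = None                 # changing references clears the card

    def prepare_generation(self) -> str:
        """Return the style card, extracting it only if it is missing."""
        if self.style_card is None:
            self.style_card = self._extract(self.references)
        return self.style_card

# Usage with a fake extractor that just counts its inputs.
def fake_extract(refs: list[str]) -> str:
    return f"style card from {len(refs)} references"

style = MascotStyle(fake_extract)
style.set_references(["front.png", "side.png"])
print(style.prepare_generation())
```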
The caution list accumulates notes from post-generation validation. If a generated image drifts from the style (wrong color, missing detail), the issue is logged and injected into future prompts to prevent recurrence.
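A caution list can be pictured as a deduplicated log whose entries are appended to every later prompt. The sketch below shows that pattern. The class name CautionList, its log and inject methods, and the "Avoid these previously observed issues" wording are hypothetical. Masko's validation step and its real prompt format are not described on this page.

```python
class CautionList:
    """Hypothetical sketch: accumulate validation issues, inject them into prompts."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def log(self, issue: str) -> None:
        """Record an issue once; blank or repeated issues are ignored."""
        issue = issue.strip()
        if issue and issue not in self.notes:
            self.notes.append(issue)

    def inject(self, prompt: str) -> str:
        """Append logged issues to a prompt so future generations avoid them."""
        if not self.notes:
            return prompt
        bullets = "\n".join(f"- {note}" for note in self.notes)
        return f"{prompt}\n\nAvoid these previously observed issues:\n{bullets}"

# Usage
cautions = CautionList()
cautions.log("scarf rendered red instead of teal")
cautions.log("missing white tail tip")
print(cautions.inject("A round orange fox mascot, waving hello"))
```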
Size Variants
Animations can be generated at multiple sizes simultaneously. Set the animation_sizes field on a collection to define which resolutions you need (e.g. 512x512, 256x256, 128x128). Resizing is free: you only pay credits for the base animation generation.
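The pricing rule above means the credit cost does not depend on how many sizes you request. The sketch below parses size strings in the "WIDTHxHEIGHT" form shown in the example and computes the cost accordingly. The function names and the base_animation_cost parameter are hypothetical, and no actual credit price is implied.

```python
import re

SIZE_PATTERN = re.compile(r"^(\d+)x(\d+)$")

def parse_sizes(animation_sizes: list[str]) -> list[tuple[int, int]]:
    """Parse strings like "512x512" into (width, height), deduplicated, largest first."""
    parsed = set()
    for size in animation_sizes:
        match = SIZE_PATTERN.match(size.strip())
        if not match:
            raise ValueError(f"invalid size: {size!r}")
        parsed.add((int(match.group(1)), int(match.group(2))))
    return sorted(parsed, key=lambda wh: wh[0] * wh[1], reverse=True)

def credit_cost(base_animation_cost: int, animation_sizes: list[str]) -> int:
    """Only the base animation is billed; resized variants add nothing."""
    parse_sizes(animation_sizes)  # validate the sizes even though they don't change the cost
    return base_animation_cost

# Usage with a hypothetical base cost of 10 credits.
sizes = ["512x512", "256x256", "128x128"]
print(parse_sizes(sizes), credit_cost(10, sizes))
```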