How Mascots Work

Understanding the data model behind Masko's API.

The Hierarchy

Every mascot in Masko follows a four-level hierarchy. Projects group your work, collections define characters, items represent poses or actions, and assets are the actual files.

Project
  └── Collection (= one mascot character)
        ├── Item: "Wave"
        │     ├── Asset: image (pose.png)
        │     ├── Asset: transparent_image (pose_nobg.png)
        │     ├── Asset: video (wave.mp4)
        │     ├── Asset: webm (wave.webm)
        │     └── Asset: hevc (wave.mov)
        ├── Item: "Idle"
        │     └── ...
        └── Item: "Thumbs Up"
              └── ...
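
The tree above can be mirrored in code as nested records. The following is a minimal sketch, not Masko's actual schema: the class and field names (Project, Collection, Item, Asset, filename) are assumptions chosen to match the terms used on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    type: str      # e.g. "image", "transparent_image", "video", "webm", "hevc"
    filename: str  # hypothetical field; the real API may expose a URL instead


@dataclass
class Item:
    name: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Collection:
    name: str  # one collection = one mascot character
    items: List[Item] = field(default_factory=list)


@dataclass
class Project:
    name: str
    collections: List[Collection] = field(default_factory=list)


# Rebuild the "Wave" branch of the tree shown above.
wave = Item("Wave", [
    Asset("image", "pose.png"),
    Asset("transparent_image", "pose_nobg.png"),
    Asset("video", "wave.mp4"),
    Asset("webm", "wave.webm"),
    Asset("hevc", "wave.mov"),
])
project = Project("Demo project", [Collection("Demo mascot", [wave, Item("Idle")])])
```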

Collections

A collection represents a single mascot character. It holds the character's prompt (the text description used for generation), reference images (up to 6 examples of what the character looks like), and a style card (an auto-extracted summary of the character's visual traits).

Collections also store settings like animation sizes, CDN configuration, and the caution list used to maintain consistency across generations.

Items

An item is a single pose or action for the mascot, such as "Wave", "Idle", or "Thumbs Up". Each item has a name, a prompt describing the action, and a type (image, animation, or logo).

When you generate an image for an item, the API combines the collection's character prompt with the item's action prompt to produce a consistent result.
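
How the API merges the two prompts is not specified here. As a rough illustration only, the sketch below assumes a plain concatenation with the character prompt first; the function name build_generation_prompt is hypothetical.

```python
def build_generation_prompt(character_prompt: str, action_prompt: str) -> str:
    """Join the collection's character prompt with the item's action prompt.

    Assumption: the two prompts are simply concatenated, character first.
    The real API may weight or template them differently.
    """
    parts = [p.strip() for p in (character_prompt, action_prompt)]
    # Skip empty prompts so the result never has stray blank sections.
    return "\n\n".join(p for p in parts if p)


prompt = build_generation_prompt(
    "A round blue owl with large glasses.",  # collection (character) prompt
    "Waving hello with the right wing.",     # item (action) prompt
)
```

Keeping the character prompt identical across items is what lets every pose describe the same character.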

Assets

An asset is a single generated file. Each item can have multiple assets of different types:

Type               Format  Description
image              .png    Original generated image with background
transparent_image  .png    Background removed, transparent PNG
video              .mp4    Animated version (H.264)
webm               .webm   Web-optimized format with alpha channel
hevc               .mov    Apple-compatible format with alpha channel
stacked_video      .mp4    Stacked layout for custom alpha compositing

Generation Graph

Assets are connected through a generation graph. When you generate an animation, the API first creates an image, then removes the background, then animates it, then converts to web formats. Each step links back to its source via the generation_links table.

image (.png)
  └── transparent_image (.png)    [role: source]
        └── video (.mp4)          [role: source, end_frame]
              ├── webm (.webm)    [role: source]
              └── hevc (.mov)     [role: source]

The role field on each link indicates the relationship. A role of "source" means "this asset was derived from that asset". A role of "end_frame" is used for animations where a final pose image guides the motion.
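
To make the graph concrete, the sketch below walks "source" links from a derived asset back to the original image. Only the generation_links table name and the role values come from this page; the GenerationLink record, its field names, and the lineage helper are assumptions for illustration.

```python
from typing import List, NamedTuple


class GenerationLink(NamedTuple):
    asset: str   # the derived asset (hypothetical column name)
    source: str  # the asset it links back to (hypothetical column name)
    role: str    # "source" or "end_frame"


# The "source" edges from the diagram above.
LINKS = [
    GenerationLink("transparent_image", "image", "source"),
    GenerationLink("video", "transparent_image", "source"),
    GenerationLink("webm", "video", "source"),
    GenerationLink("hevc", "video", "source"),
]


def lineage(asset: str, links: List[GenerationLink]) -> List[str]:
    """Return the chain from `asset` back to its root, following "source" links."""
    chain = [asset]
    while True:
        parent = next(
            (l.source for l in links if l.asset == chain[-1] and l.role == "source"),
            None,
        )
        # Stop at the root, or if a malformed graph would loop back on itself.
        if parent is None or parent in chain:
            return chain
        chain.append(parent)


webm_chain = lineage("webm", LINKS)
```

Links with the "end_frame" role are skipped by the role filter, so the walk only follows derivation steps.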

Reference Images & Style Cards

Each collection can have up to 6 reference images. These are examples of what the mascot looks like - they guide every generation to maintain visual consistency.

When you first generate an image, Masko automatically extracts a style card from the references. The style card is a structured summary of the character's visual traits (colors, proportions, line style, shading) that gets injected into every prompt. If you change the references, the style card is cleared and re-extracted on the next generation.

The caution list accumulates notes from post-generation validation. If a generated image drifts from the style (wrong color, missing detail), the issue is logged and injected into future prompts to prevent recurrence.
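
The lifecycle described above (extract once, clear when references change, accumulate cautions) can be sketched as a small state object. This is an illustrative model under stated assumptions, not Masko's implementation: CollectionStyle, prompt_context, log_issue, the "Avoid:" prefix, and the extract callback are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_REFERENCES = 6  # each collection can have up to 6 reference images


@dataclass
class CollectionStyle:
    references: List[str] = field(default_factory=list)
    style_card: Optional[str] = None
    caution_list: List[str] = field(default_factory=list)

    def set_references(self, references: List[str]) -> None:
        if len(references) > MAX_REFERENCES:
            raise ValueError(f"at most {MAX_REFERENCES} reference images allowed")
        self.references = list(references)
        # Changing the references clears the style card; it is
        # re-extracted lazily on the next generation.
        self.style_card = None

    def log_issue(self, note: str) -> None:
        # Post-generation validation notes accumulate over time.
        self.caution_list.append(note)

    def prompt_context(self, extract: Callable[[List[str]], str]) -> str:
        """Text injected into a prompt: the style card plus any cautions."""
        if self.style_card is None:
            self.style_card = extract(self.references)
        cautions = [f"Avoid: {note}" for note in self.caution_list]
        return "\n".join([self.style_card] + cautions)
```

Passing the extractor in as a callback keeps the sketch independent of however the style card is actually produced.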

Size Variants

Animations can be generated at multiple sizes simultaneously. Set the animation_sizes field on a collection to define which resolutions you need (e.g. 512x512, 256x256, 128x128). Resizing is free - you only pay credits for the base animation generation.
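
As a small illustration of the pricing rule above, the sketch below parses size strings in the WIDTHxHEIGHT form used in the example and shows that the cost does not grow with the number of sizes. The helper names and the credit figure are hypothetical; actual credit prices are not stated on this page.

```python
from typing import List, Tuple


def parse_size(size: str) -> Tuple[int, int]:
    """Parse a size string such as "512x512" into (width, height)."""
    width, height = size.lower().split("x")
    return int(width), int(height)


def animation_credit_cost(base_cost: int, sizes: List[str]) -> int:
    """Credits for one animation: resizing is free, so only the base cost counts."""
    for size in sizes:
        parse_size(size)  # validate each entry; raises ValueError if malformed
    return base_cost


animation_sizes = ["512x512", "256x256", "128x128"]
cost = animation_credit_cost(base_cost=10, sizes=animation_sizes)  # 10 is a made-up figure
```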