Reference images and style consistency

A mascot is only useful when it looks like the same character in every pose. The Masko API keeps your character on-model by passing reference images as visual context on every generation and by extracting a style card that captures the defining traits of the design.

Images enter the API in three distinct roles, and the fields that carry them are not interchangeable. This page explains what each role does and when to use it.

The three reference concepts

Where | Persistence | Use case
POST /v1/collections/:id/references (body: asset_id or url) | Persistent | Style anchors for every generation in this collection. Max 6.
CreateCollectionBody.reference_image_urls / reference_asset_ids | Persistent (seeded at creation) | Same as above, set when creating the collection.
PreviewBody.reference_image_urls on /v1/generate/preview | Ephemeral (one-off) | Style hints for a single preview, not stored.
GenerateBody.source_image_asset_id on /v1/collections/:id/generate | N/A | The image to animate or edit. NOT a style reference.

The first two rows are the same concept (persistent collection references) set via different endpoints. The third row is a per-request hint. The fourth row is something completely different: the source image you are transforming.
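
As a rough illustration of the first row, the sketch below builds the two request bodies for adding a persistent reference, one by asset_id and one by url. It assumes the endpoint takes a JSON body with a single field, as the table suggests. MASKO_API_KEY is a hypothetical environment variable name, the IDs and the example.com URL are placeholders, and the response format is not shown because this page does not document it.

```shell
# Placeholders: substitute real values before use.
# MASKO_API_KEY is a hypothetical variable name chosen for this sketch.
COLLECTION_ID="COLLECTION_ID"
ASSET_ID="ASSET_ID"
IMAGE_URL="https://example.com/mascot-front.png"

# One body per variant: reference an existing asset, or an image URL.
BODY_BY_ASSET=$(printf '{"asset_id":"%s"}' "$ASSET_ID")
BODY_BY_URL=$(printf '{"url":"%s"}' "$IMAGE_URL")

# Send the requests only when a key is configured; otherwise just
# print the bodies so the script can be dry-run safely.
if [ -n "${MASKO_API_KEY:-}" ]; then
  for BODY in "$BODY_BY_ASSET" "$BODY_BY_URL"; do
    curl -X POST "https://api.masko.ai/v1/collections/$COLLECTION_ID/references" \
      -H "Authorization: Bearer $MASKO_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$BODY"
  done
else
  printf '%s\n' "$BODY_BY_ASSET" "$BODY_BY_URL"
fi
```

Each call adds one reference, and a collection holds at most 6, so a script that seeds several anchors should stop at that limit.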

Removing a reference

Remove a persistent reference by passing its asset ID in the path:

curl -X DELETE https://api.masko.ai/v1/collections/COLLECTION_ID/references/ASSET_ID \
  -H "Authorization: Bearer masko_YOUR_API_KEY"

Removing a reference clears the cached style card; it will be re-extracted on the next generation.

How the style card works

On the first generation in a collection, Masko analyses your reference images and extracts a style card - a short description that captures the character's color palette, proportions, outline style, and any other defining traits. The style card is cached on the collection config and injected into every subsequent prompt alongside the raw reference images.

The cached style card is cleared whenever you add or remove a reference. The next generation will re-extract it, so your character stays in sync with your current reference set. You never call the style card extraction yourself - it runs lazily on generation.

Common mistakes

  • Using a preview URL as source_image_asset_id. Previews returned by /v1/generate/preview are not persisted assets. They have no asset ID and cannot be used as a source. If you want to animate a preview, generate it as a real item first with POST /v1/collections/:id/generate and then use the returned asset_ids.image.
  • Passing a transparent_image asset ID as source_image_asset_id. The source must be the full image (type image), not the background-removed variant. Use the asset_ids.image field from the generate response, not asset_ids.transparent_image.
  • Confusing source_image_asset_id with collection references. source_image_asset_id is the specific pose you want to animate or edit. Collection references are the style anchors that define what the character looks like. They are independent - the source tells the model "animate this exact pose" and the references tell the model "this is what the character looks like".
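
To make the first two points concrete, here is a minimal sketch of picking the right ID out of a generate response. The response JSON is a hypothetical, trimmed example: only the asset_ids.image and asset_ids.transparent_image names come from this page, while the exact nesting and the ID values are assumptions. The resulting body contains only source_image_asset_id; any other fields GenerateBody requires are not covered here.

```shell
# Hypothetical generate response, trimmed to the asset_ids object.
# The ID values are made up for illustration.
RESPONSE='{"asset_ids":{"image":"asset_img_123","transparent_image":"asset_tr_456"}}'

# Extract asset_ids.image. The leading quote in the pattern prevents a
# match on "transparent_image". A JSON tool such as jq is more robust;
# sed is used only to keep the sketch dependency-free.
SOURCE_ID=$(printf '%s\n' "$RESPONSE" | sed -n 's/.*"image":"\([^"]*\)".*/\1/p')

# Use the full image, not the background-removed variant, as the source.
BODY=$(printf '{"source_image_asset_id":"%s"}' "$SOURCE_ID")
printf '%s\n' "$BODY"
```

The collection's references are untouched by this request: the source ID selects the pose to transform, while the stored references continue to define the character's look.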

See also