Prompt Guide | Prompt Engineering tips for generating high-quality AI product images

Background to this guide

This guide was compiled by the Ecomtent AI Research team with the help of 20 computer science students on paid internships, who labelled and analysed tens of thousands of customer-generated images and prompts. Following this data labelling, we ran our own experiments with the same group of students to put together this definitive guide. It may be the most complete insight into image prompt engineering available for free on the internet.

Product size and positioning

Before you start to prompt, quickly visualise the output image in your mind, in particular the size and position of your product within it. Then adjust the advanced settings accordingly.

For example, if you want the output image to feature your product as the main focus, increase the product resize from its default to 100%. If you want your product to be smaller, giving the AI more space to generate a lifestyle setting around it, shrink the product resize to a smaller percentage (e.g. 20-40%).

Second, consider repositioning the product up or down, left or right in the output image. For example, if you are generating a product image of a shelf on a wall or glasses on a human's face, move the product upwards in the output image by changing the vertical product placement to +20 rather than the default -20. Alternatively, if you want the output image to include a focal element such as a human or a dog, shift the horizontal product placement to the left or right to give the AI space to generate these elements in a natural way.
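The settings described above can be pictured as a small configuration record. The sketch below is illustrative only: the class name, field names, the 1-100 resize range, the horizontal default of 0 and the sign convention for horizontal offsets are all assumptions, not the actual Ecomtent settings. Only the vertical default of -20 and the example values (100%, 20-40%, +20) come from this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementSettings:
    """Hypothetical container for the advanced settings discussed above."""

    resize_percent: int          # the guide does not state the default, so it is required here
    vertical_offset: int = -20   # default value given in the guide
    horizontal_offset: int = 0   # assumed default; negative = left is also an assumption

    def __post_init__(self):
        # Assumed valid range for the resize percentage.
        if not 1 <= self.resize_percent <= 100:
            raise ValueError("resize_percent must be between 1 and 100")


# Product as the main focus of the image.
hero_shot = PlacementSettings(resize_percent=100)

# Smaller product, leaving room for a generated lifestyle setting.
lifestyle_shot = PlacementSettings(resize_percent=30)

# Shelf on a wall: move the product upwards.
wall_shelf = PlacementSettings(resize_percent=40, vertical_offset=20)

# Leave space on one side for a focal element such as a dog.
beside_dog = PlacementSettings(resize_percent=50, horizontal_offset=-20)
```

Writing the intended size and position down like this before prompting is one way to make the "visualise first" step concrete.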

The anatomy of a prompt

Prompts should consist of three key elements: the subject (e.g. your product), how the product should interact with its environment, and descriptor words that explain the setting or the style/aesthetic of the image.

[Subject] in/on [scenario/location], [descriptors / camera lens]

Prompts should consist of a grammatically complete sentence, with a list of descriptors at the end separated by commas. Ideally they should be between 10 and 15 words in length - too many words can lead to worse results as the AI tries to complete too many concepts.
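As a minimal sketch of the template above, the helper below joins a subject, a scenario and a list of descriptors into one prompt, and counts the words so the result can be compared with the suggested 10 to 15 word range. The function names are hypothetical; the example values are taken from the chair example in this guide.

```python
def build_prompt(subject, scenario, descriptors):
    """Assemble '[subject] [scenario], [descriptor], [descriptor], ...'."""
    sentence = f"{subject.strip()} {scenario.strip()}"
    cleaned = [d.strip() for d in descriptors if d.strip()]
    return ", ".join([sentence] + cleaned)


def word_count(prompt):
    """Count words, treating commas as separators."""
    return len(prompt.replace(",", " ").split())


prompt = build_prompt(
    "chair",
    "in office setting",
    ["desk", "high quality", "hdr", "8k", "studio lighting"],
)
print(prompt)
# chair in office setting, desk, high quality, hdr, 8k, studio lighting
print(word_count(prompt))
# 11
```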

Important Tip: Do not be polite to the AI! Do not start your prompt with "Please place my product here" or "my product is X". Additional words and context will reduce the quality of the output.
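One way to catch this habit before submitting a prompt is a quick check for filler openers. The sketch below is an assumption-laden illustration: the pattern list is hypothetical and covers only the two openers quoted above plus a couple of similar phrasings.

```python
import re

# Hypothetical list of openers that add words without describing the image.
FILLER_OPENERS = [
    r"^\s*please\b",
    r"^\s*(can|could|would) you\b",
    r"^\s*my product is\b",
]


def has_filler_opener(prompt):
    """Return True if the prompt starts with a polite or explanatory opener."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in FILLER_OPENERS)


print(has_filler_opener("Please place my product here"))   # True
print(has_filler_opener("chair in office setting, desk"))  # False
```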

Subject

Here you should name your product in generic terms (e.g. “chair” in the first example below). Do not add your specific brand name, as the AI is unlikely to know what it is. Rather, indicate in a generic word or two what you are creating a lifestyle image for. Try to be as generic as possible, e.g. "small appliance" or "container".

Scenario / Location

- The scenario should explain the setting of where the object is, for example in an office environment or outdoors
- You should describe how the subject interacts with the scenario (e.g. on a table; or “beautiful woman wearing necklace”)
- It can also include any additional descriptions of the setting, such as “fireworks in the background” or “window in the background”

Descriptors

These should be a list of properties of the style of image you want to generate, such as angles, lighting, descriptors of the background (e.g. “studio lighting, high quality, film grain”). This is why it is important to visualise the image you want to create at the outset, as different images will require different descriptors. Please see a non-exhaustive list of example descriptors that perform well:

Realism: realistic, realism, highly detailed, hyper realistic, super detailed, trending, award winning photo, award winning photography, studio photograph, insanely detailed, intricate, hypermaximalist, elegant, photo, canon, nikon, detailed, photorealistic, photography, full color, shallow depth of field, Sony camera, Nikon camera

Lighting: natural lighting, diffused lighting, hard lighting, volumetric lighting, neon lighting, sunny, cloudy, partly cloudy, raining, dark, night, day, evening, overcast, shadow, haze, foggy, sunrise, afternoon, sunset, dusk, dawn, overhead lighting, underneath lighting, left side lighting, right side lighting, reflective, bright, godray, beams of light, sunbeam, ray of light, shining, glow

Environment: detailed background, bright and airy room, furnished room, atmospheric, furniture, detailed, hardwood flooring, granite flooring, rug carpet, windows in the background, simple door, entrance, white background, plain background
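Because some descriptors (such as "detailed") appear in more than one category, picking terms from several categories can produce repeats that use up the word budget. The following is a small sketch of merging picks while dropping duplicates; the function name is hypothetical, and the descriptor picks are a subset of the lists above.

```python
def merge_descriptors(*groups):
    """Combine descriptor lists in order, skipping case-insensitive repeats."""
    seen = set()
    merged = []
    for group in groups:
        for term in group:
            key = term.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(term.strip())
    return merged


realism = ["photorealistic", "detailed", "shallow depth of field"]
lighting = ["natural lighting", "sunrise"]
environment = ["bright and airy room", "detailed"]

print(", ".join(merge_descriptors(realism, lighting, environment)))
# photorealistic, detailed, shallow depth of field, natural lighting, sunrise, bright and airy room
```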

Camera Lens

Camera lenses affect the sharpness, depth of field and overall visual composition of a photograph. They are worth including in prompts not only to help the AI determine what type of image to produce, but also because our AI models have been trained on a large number of labelled images, and images labelled with a camera lens are typically of high quality. The type of camera lens to include depends on the type of image you are trying to create:

Landscapes: A wide-angle, 24mm lens is useful when generating outdoor product imagery in landscapes, where you want to capture the breadth of the scene. The output image may have a greater sense of depth and dimension. [Prompts: wide-angle lens, 24mm, etc]

Portraits: An 85mm telephoto lens is typically used for portraits. Including this prompt can help generate a flattering perspective, isolating the subject in sharp focus and creating a blurred background. [Prompts: 85mm photograph, portrait photography, telephoto lens, etc]

Close-ups: A 100mm macro lens is ideal for capturing close-ups of small objects with high magnification and sharpness. It can be used to generate product images that are highly detailed or textured, and can help showcase quality and craftsmanship. [Prompts: 100mm, 185mm, macro lens, close up, etc]
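The lens guidance above amounts to a lookup from shot type to a few lens terms. The sketch below illustrates that idea; the dictionary keys, the function name and the sample prompt are hypothetical, and the lens terms are a subset of the prompts listed above.

```python
# Lens terms per shot type, taken from the prompts suggested in this guide.
LENS_DESCRIPTORS = {
    "landscape": ["wide-angle lens", "24mm"],
    "portrait": ["85mm photograph", "portrait photography", "telephoto lens"],
    "close-up": ["100mm", "macro lens", "close up"],
}


def add_lens(prompt, shot_type, limit=2):
    """Append up to `limit` lens terms for the given shot type to a prompt."""
    terms = LENS_DESCRIPTORS[shot_type][:limit]
    return ", ".join([prompt] + terms)


print(add_lens("vitamin container on a granite worktop", "close-up"))
# vitamin container on a granite worktop, 100mm, macro lens
```

Capping the number of lens terms keeps the prompt short, in line with the advice that too many words can lead to worse results.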

Generating Humans

When generating images of people, close-up shots currently work better than full-body shots. This is partly because AI is much better at generating realistic faces than hands or limbs.

Creative vs Precise Mode

Finding the right balance between accuracy and creativity is one of the most challenging aspects of training Generative AI neural networks. You may have experienced this yourself, with higher ChatGPT temperature settings that result in AI "hallucinations", where ChatGPT will invent sources or make factually incorrect statements. At Ecomtent, we've addressed this by allowing customers to seamlessly toggle between two distinct AI modes: Precise and Creative.

Precise Mode: In Precise mode, customers can expect their products to be represented entirely accurately in generated images, without the 'hallucinations' common in AI-generated images. Customers would typically opt for Precise mode for uncommon or unique products. Otherwise, Generative AI may add elements that are present in the majority of the data it was trained on (e.g. adding a handle to a handless mug).

Creative Mode: In Creative mode, customers can expect a touch of artistic interpretation and variation while still maintaining a realistic visual presentation of their products. This mode allows for the generation of images with imaginative lighting, compositions or backgrounds, enhancing the overall aesthetic appeal of the output. It is the preferred choice for customers seeking artistic flair in their product images, and it works best for more recognisable / common product shapes.
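The choice between the two modes can be summarised as a simple rule of thumb. The sketch below is only an illustration of the guidance above, not how the product selects a mode; the function and parameter names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    PRECISE = "precise"
    CREATIVE = "creative"


def suggest_mode(has_common_shape, must_match_exactly=False):
    """Suggest a mode from the rule of thumb described above.

    Uncommon or unique products (e.g. a handless mug), or products that
    must not gain extra elements, point to Precise; recognisable, common
    shapes point to Creative.
    """
    if must_match_exactly or not has_common_shape:
        return Mode.PRECISE
    return Mode.CREATIVE


print(suggest_mode(has_common_shape=False))  # Mode.PRECISE
print(suggest_mode(has_common_shape=True))   # Mode.CREATIVE
```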

Experiment!

The last, and perhaps most important, tip is to experiment. This is a new technology, which even the most expert AI researchers do not fully understand. If your prompt is not working, try alternative phrases.

Examples

Columns: input image, output image, prompt (with mode and positioning settings)

chair in office setting, desk, high quality, hdr, 8k, studio lighting

Precise mode, central positioning

portrait of beautiful woman wearing necklace, studio lighting, model, high quality, hdr, film grain, 8k

Creative mode, upper middle positioning

85mm photograph of dachshund sitting next to packet of dog food

Precise mode, lower-left positioning, resize to 50%

product on a white worktop, with dentist in labcoat and blue gloves in dentists office in the background

Creative mode, lower-left positioning, resize to 25%

vitamin container in a gym. people working out in the background

Precise mode, lower-left positioning, resize to 50%

85mm photograph handsome African American man wearing sunglasses and smiling. white teeth.

Creative mode, upper positioning, resize to 100%
