MidJourney App (Draft)

2024

MidJourney

Web App version of MidJourney with added functionality

Product Design

Web App

Introduction

Midjourney is an AI-powered tool that generates images based on text prompts through a user interface primarily accessed via Discord.
This project was to create a web-app version of the MidJourney experience, adding some extra functionality along the way. The additional functionality included a tool that helps with prompting, a prompt library and a built-in file explorer.

Design Stage time frame

2 months & on-going

Project Type:

Conceptual (Unsolicited) Product Design

This is a side-project that I decided to complete on my own to work on my design skills outside my daily job. I chose MidJourney in particular as it is a product I see huge potential in and use quite regularly.

Many designers will know the feeling of analysing the tools they use and constantly seeing ways they could add or change things, and this is a product of that mindset.

For me in particular, the mix of a somewhat advanced tool and artificial intelligence is the sweet spot, but designing for the consumer market rather than enterprise was something a little different.

(Solo Project) Product Design, UI Design

This was a solo side project

Note

This case study is extremely in-depth, so if you want the TLDR, check out this overview Dribbble post.
Also on Dribbble, I have some short intros and walkthroughs for some of the features.

Both the case study and the project are still a work in progress. I aim to keep updating the study as I go, but if you want to see my progress, check out the Figma File here.

Context

Feedback (From the web)

As someone interested in Generative AI, I have dabbled quite a bit in a variety of image generators, and each has its own strengths and weaknesses. However, I wanted to understand what other users thought of MidJourney, so I conducted some quick research to find out what people were talking about online.

The Good

Image output quality.

MidJourney’s image output quality is commonly touted as the best among its competitors. While this is difficult to measure, articles ranking the best AI image generators consistently place MidJourney in the top 2 or 3 for image quality.

A unique community experience.

While the Discord server is a frustration for some, for many it is a valuable resource and quite a unique experience among AI image generators. The community channels which spew out a somewhat chaotic feed of prompts and images are accessible to all, and so can be collaborative.

See a generation from another user that you like and want to riff on? You can easily upscale it and start creating variations. This is not possible in DALL-E or Stable Diffusion.

MidJourney is fundamentally built around a community of users.

The Bad

Creative Control and Specificity - the Closed Model

One of the key challenges is achieving the desired level of creative control over the final images. As MidJourney uses a closed model, users need to understand how to use parameters effectively and be very specific in their prompts.

Contrast this with Stable Diffusion, which allows users to change the base model used to generate the images, as well as use tools like LoRAs, Embeddings and In-Painting to gain a deeper level of control over their output.

Why Discord?

One common complaint and point of frustration for users of Midjourney has been the requirement to use the service through a Discord server. This setup is seen as less accessible and more complex for some users, especially those not familiar with Discord.

According to a comparison by Zapier, the ease of use for Midjourney is rated lower than some of its counterparts, primarily because access is exclusively through Discord.

Personally, I enjoy using the application via Discord, as I am already familiar with the app and use it regularly for other things. However, it definitely has its limitations and, as seen in some of the questions on the MidJourney subreddit, many users struggle with the app.

Note

In Q4 2023 there was good news for those who prefer not to use Discord for generating images with Midjourney. The company has started testing an alpha version of its web-based platform, which allows image generation directly from the web, bypassing the need to use Discord.
I took this into consideration while creating this case study.

The Opportunities

Giving the user more control

As discussed, MidJourney’s image output is among the best. However, users can struggle to get what they want from the tool, often needing in-depth knowledge of parameters and specific words and phrasing to get the exact output they want.

Without access to external tools, and limited by the closed model, users can only adjust their output by improving the prompts they enter. While MidJourney has great documentation on prompting, there is no real guidance inside the app itself.

This gives rise to an opportunity to provide users with tools to help them create more powerful, effective prompts.

A unified experience

As Midjourney is still working on the Alpha version of its web app, it is unfair to judge it too harshly, but currently the web app completely removes the community aspect of generating images that is present in Discord. Instead, users create their images in their own workspace, outside the public channels seen on Discord.

While this is great for those who find these public channels chaotic and difficult to use, I believe it removes the user from the true essence of what makes the MidJourney experience unique: the community.

I believe that there could be a way to blend both approaches so that the feeling of community, that is so important to MidJourney, is not lost but it is also easier for users to create their own private workspaces.

Structure

Current Information Architecture

The MidJourney experience is split across three different apps and websites: Discord, the MidJourney Alpha website and the documentation site. Before jumping into any design or ideation, it was vital to understand the full set of features and functionality across these three tools.

To do this, I mapped the entire information architecture.

For certain functions you are forced to switch between apps.

  • For login, both apps are linked to the Discord Login.

  • When trying to edit subscription settings in Discord, users are redirected to the MidJourney Alpha website.

  • When users click on the “Community” nav link on the MidJourney Alpha website, this automatically opens Discord.

  • When users click on any links for the documentation in either app, they are directed to a 3rd site, docs.midjourney

No image generation on MidJourney Alpha (currently)

There are plans to add image generation to the MidJourney Alpha site, but currently this functionality is not available.

MidJourney Alpha has no real community aspect

For community features, the MidJourney Alpha site is severely lacking. Currently the only real community feature is the “Explore” page, where users can see other users' images, but there is no way to interact with any users via the website.

As previously mentioned, when a user clicks on “Community” they are redirected to the Discord server.

There is also no way to view a user's profile. The only way to see a specific user's images is to filter the Explore page by that user.

Discord search is limited compared to the MidJourney Alpha website.

Users cannot search by image on Discord, but on the website it is possible to search either their own images or the entire community's images based on an existing image.

Considerations

Current Information Architecture

Whether people like using it or not, it's a fact that MidJourney's experience is not the same without Discord. Doing away with the MidJourney Discord completely would be impossible. Even branching away from it, as the team is doing with the Alpha of the new website, will bring friction and pushback as the community splits.

Unfortunately, this is what needs to happen for the product to improve and continue to grow. However, my view is that Discord should be a click away, should the user want to quickly navigate to any of the community channels.

Product Framing

It’s important to consider Product Framing with this project because this seems to be a point of confusion and division in the community.

Using Midjourney through Discord requires a certain amount of understanding and willingness to dive in, read some documentation and gain an understanding of the various parameters, tools and techniques to create a good image generation.

On the other hand, the web app in its current state is extremely accessible and intuitive and clearly targets a more “casual” user; someone who doesn't want to have to set up their own Discord server or navigate the various public channels in Discord.

There is some dissonance between these two approaches, as they are framed quite differently for two different kinds of users. For this project, it is important that I understand how my solution fits into this space and which framing approach I take.

Layout

Analysis of existing screen layouts

When updating or redesigning a product, the “status quo bias” can cause huge friction for existing users. Essentially, they have become comfortable with the existing product design and functionality and are resistant to change.

It is vital, therefore, to ensure that new updates are similar enough to the existing design to help overcome this bias and allow users to quickly learn and become comfortable with the new design. To aid with this change, I decided to base the layout on the existing designs that users are accustomed to.

To do this, I analysed the existing screens to draw out the most important aspects to maintain, while looking for opportunities to improve the layout in minor but meaningful ways.

Basic “Workspace” layouts & Updates

Analysis & changes

As both of the existing user interfaces follow a standard web app it was easy for me to continue with this approach, however I did have to make some decisions and slight modifications.

  1. Prompt Bar Positioning

  • Firstly, the prompt bar on Discord is at the bottom. This follows many messaging application layouts, where the input region is at the bottom of the screen and, within the “conversation feed”, the bottommost message/image is the most recent.

  • The Midjourney web app however follows the opposite approach with the prompt bar at the top, and the most recent message/image is at the top.

  • I decided to maintain the Discord approach (Prompt bar at the bottom) as I believe it maintains the user experience users are most familiar with.

  2. Updated Navigation options

  • For Navigation, I have maintained the collapsible left side nav menu but added 2 extra navigation elements:

    • a minimal top bar that will hold high level navigation links and page titles.

    • a right side menu (Toolbar) that will house the Tools. Clicking these tools will toggle the right side menu.

Image Generation (image grids, variation & upscale screens)

Analysis & changes

Here is a breakdown of these details and actions:

Prompt Input Bar

Analysis & changes

Another important component, the prompt bar, is quite a challenge. Currently Midjourney uses the Discord Chat Bar, which is jam-packed with functionality and comes in a wide variety of compositions.

Some of the most important features that I would need to include and take inspiration from were:

  • Chat Bar actions & menus (GIFs, stickers, emojis)

  • Add attachments

  • Commands (using "/")

    • Certain commands change the inputs that are required (for example, /blend requires a minimum of two images)

Some possible areas for improvement were:

  • Use of space

  • Progressive disclosure (some commands take multiple steps, this could be made clearer and easier to do)
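To make the point about command-specific inputs and progressive disclosure more concrete, here is a minimal sketch of how a prompt bar might work out what a command still needs before it can be submitted. Everything here is hypothetical (the `CommandSpec` type, the `COMMANDS` registry and the `missing_inputs` helper are my own illustration, not Midjourney's or Discord's implementation), and only the /blend minimum of two images comes from the analysis above; the other requirements are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandSpec:
    """Input requirements for one slash command (illustrative values)."""
    name: str
    needs_text: bool = False
    min_images: int = 0


# Hypothetical registry. Only the /blend minimum is taken from the case study;
# the /imagine and /describe requirements are assumptions for this sketch.
COMMANDS = {
    "/imagine": CommandSpec("/imagine", needs_text=True),
    "/blend": CommandSpec("/blend", min_images=2),
    "/describe": CommandSpec("/describe", min_images=1),
}


def missing_inputs(command: str, text: str = "", images: tuple = ()) -> list[str]:
    """Return short hints for whatever the command still needs."""
    spec = COMMANDS.get(command)
    if spec is None:
        return [f"Unknown command: {command}"]
    hints = []
    if spec.needs_text and not text.strip():
        hints.append("Add a text prompt")
    if len(images) < spec.min_images:
        remaining = spec.min_images - len(images)
        hints.append(f"Add {remaining} more image(s)")
    return hints
```

The UI could show these hints one step at a time, so a user who picks /blend with a single image attached is told only that one more image is needed.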

Showcase/Explore

Analysis & changes

This UI component is extremely different in both iterations of the application, both having pros and cons.

In the Discord app, this is experienced as a normal text channel, with the main feed containing images, shared posts, simple text messages and threads.

The advantage of this approach is that the “browse” channels are geared towards community engagement and interaction, with the ability to comment, emote, like and share within the channels.

Where this approach struggles is that it is quite cluttered, and it is not easy to browse through the channels focusing purely on images.

It is nearly the opposite in the web app, where the presentation is purely visual, using a grid that fills the page with images. However, there are no community interactions other than the ability to like an image: no comments, no threads, nowhere to share other posts. Also, from what I can see, all the explore pages are automatically populated rather than created by the community.

While some of these points are more feature analysis than layout related, these different approaches influenced my approach to the layout (more on the features in later sections).

Search Results

Analysis & changes

When using Midjourney via the Discord app, I rarely found myself using the search. Firstly, it's somewhat hidden, so I never realise I can use it. Secondly, when I do use it, the results are shown in a very unfriendly way, crammed into the side bar.

Compare this with the web app and I found I was encouraged to “Search by image” on each image overlay, leading to me using the search bar much more. The results page is also great, modelled after the Explore page’s layout.

For my approach, I leaned more towards the web app’s approach.

  • Firstly, results fill the page, with a bottomless scroll.

  • Unlike the web app however, I made the search results a full page overlay. This is to facilitate the user doing a quick search for inspiration without losing their current progress.

  • I also liked how both options included sorting options and thought it would be useful to include filters too, not as a persistent element on the main page but as a toggleable section.

  • I also wanted to include quick access to image generation actions via the results page, allowing users to quickly generate new images directly from any image shown.

UI Design Analysis

Analysis of existing visual design language

As previously mentioned, making huge changes to products that users are comfortable with comes with high risks that users will be unsatisfied with the new approach.

To overcome this, I did an analysis of the existing color, copy and UI elements to give my design approach a solid foundation upon which to create new, updated designs, with the aim of maintaining some familiarity for existing users.

Color

General UI/Interaction Elements

Basic design system & components

Basic style guide

To begin designing the new UI, I started with a simple Style Guide focused on providing a strong basis for dark/light mode colors, typeface and font variations, icon sizes and types, and a few blurred fills.

Basic component library

Following an analysis of the existing UI elements in both Discord and the Midjourney website, I started to build a component library.

While I didn't want to get deep into building a full design system, I know from experience that having some basic components can help greatly with the UI design process and so I started off with the ground level components.

I added to this library slowly throughout the design and made many page- and UI-element-specific components. If you want to see more, check out the Figma file here.

Product Design & Features

The following is the current MVP product design that I have created based on the analysed Layout and Visual Design. I have decided to go through the design page by page, starting with the basics and ending with the more complex user flows.

General Layout

General Application layout and Navigation updates

The Application layout is very minimal, with an emphasis on maintaining as much space as possible for the central workspace area.

The Layout is designed to allow the user to hide the side panels as well as minimise the prompt bar at the bottom to allow for the best browsing experience.

That said, should the user want to expand all the tools and navigation panels, this creates a more “pro” user environment with all the tools close at hand.

The main navigation links follow those on the existing web app, with a second layer of navigation links behind the main three:

Channels: These are workspaces similar to Discord Channels where users can generate images

File: Here the user can manage their creations in folders. Uploaded images are available here, as well as saved or liked images.

Explore: This section gives the user access to curated pages of all Midjourney’s user creations, allowing the user to explore the vast image database Midjourney has.

The Tool Menu is a new addition and holds some of the proposed new functionalities. This space could also be used in the future to add more tools or even allow users to add plugins.

When a user selects a tool, the Tool Panel will expand from the right, allowing the user to interact with the tool while still using their Main Workspace.

The tools themselves are explained in more detail later in the case study.

Channels and Showcase/Explore Pages

One of the major changes I decided to make with the user experience for my approach is related to image generation channels and Showcase/Explore pages.

While using Midjourney via Discord, both generation and showcase use the default Discord Text Channel layout and interactions. This is great for community engagement and interaction, with the ability to start comment threads and share links and other posts. However, it is not a great experience for users wanting to explore other users' images.

As mentioned previously, the web app goes in the opposite direction, completely removing the social community interactions in favour of better, curated explore pages.

For my approach, I decided to try to make space for both approaches and, as such, separated the two use cases into two different page types.

Image Generation Channels

Use case: Image generation and community interactions

These pages aim to maintain the full experience users get from Discord channels.

Possible Actions:

  • Image generation basics

    • /imagine, /blend, /describe, /show

  • Upscaled images + actions

  • Save images

  • Download images

  • Copy/save prompts

  • Comment on images

  • (Optional) Add comments directly into channel

  • Create channels

  • Invite others to create in your channels

  • Favourite channels

  • Change channel view options

Explore Pages

Use case: Image curation and exploration.

These pages follow the web app explore pages. Best used for inspiration and browsing Midjourney’s vast library of user generated content.

Possible Actions:

  • Save images

  • Download images

  • Search images

  • Copy/save prompts

  • Comment on images

  • Favourite explore pages

  • Change page view options

Channels and Showcase/Explore Pages

Channel workspace - Image Generation output cards

Possibly the most important aspect of the design of Channels is the image output cards. These are fundamental to the user experience, as they are the artifact that the generative process outputs.

With this in mind, I put a huge amount of work into the design, focusing on the following:

  1. View options - allow users to manipulate the view of the output.

  2. Considering space - in Discord, one of the main opportunities I noted was to improve the use of space, particularly by pushing the images to the forefront.

  3. Functionality density - these cards need to contain a large amount of functionality (see analysis here), presented in a way that doesn't overwhelm the user.

Explore Pages

Image Overlay

For both Image Generation Channels and Explore Pages, users can click on an image to see more details. For this, I decided to use the same overlay.

Generating Images

Basic Prompt Bar use

The prompt bar is a fundamental element on any AI generation tool, however, with the huge amount of other features and functionality included in this web app, I tried to maintain its prominence but not make it overbearing in its use of space.

I really love the Discord experience of using commands, but wanted to make the experience more user friendly and refined. I did this by trying to separate the different steps of creating a prompt.

Open prompt -> choose prompt type -> write prompt
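As a rough illustration of how the three steps above could be enforced in the interface, here is a minimal sketch of the flow as a small state machine. The `Step` and `PromptBar` names are hypothetical and exist only for this example; the design itself lives in Figma and has no implementation behind it yet.

```python
from enum import Enum


class Step(Enum):
    CLOSED = "closed"
    CHOOSE_TYPE = "choose_type"
    WRITE_PROMPT = "write_prompt"


class PromptBar:
    """Tracks which step of the prompt flow the user is on (illustrative only)."""

    def __init__(self) -> None:
        self.step = Step.CLOSED
        self.prompt_type = None
        self.text = ""

    def open(self) -> None:
        # Step 1: opening the bar always starts at the type picker.
        self.step = Step.CHOOSE_TYPE

    def choose_type(self, prompt_type: str) -> None:
        # Step 2: only valid once the bar has been opened.
        if self.step is not Step.CHOOSE_TYPE:
            raise ValueError("Open the prompt bar before choosing a type")
        self.prompt_type = prompt_type
        self.step = Step.WRITE_PROMPT

    def write(self, text: str) -> str:
        # Step 3: only valid once a prompt type has been chosen.
        if self.step is not Step.WRITE_PROMPT:
            raise ValueError("Choose a prompt type before writing")
        self.text = text
        return f"{self.prompt_type} {text}".strip()
```

For example, calling `open()`, then `choose_type("/imagine")`, then `write("a red fox")` returns `"/imagine a red fox"`, while calling `write` first raises an error. Keeping each step explicit is what lets the UI show only the controls relevant to the current step.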

Basic In-Channel Image Generation

Another way users generate images is by interacting with existing images. This includes actions like upscale, vary and zoom.

I found this experience to be lacking in the Discord app and wanted to make it more streamlined.

Advanced Prompt - Tool Panel

Advanced prompt (Parameters) can also be opened in the Tool Panel on the right side of the screen.

Prompt Assistant

The prompt assistant tool is an expansion of the Advanced Prompt bar. It's aimed at beginners who are unaware of parameters and other prompting tools.

It can also be customised to provide a powerful tool for experienced users who want to quickly add, adjust and track their parameters easily.
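To show what adding and tracking parameters could mean in practice, here is a minimal sketch that turns a set of tracked parameters into the final prompt string. The `--name value` suffix style follows Midjourney's prompt syntax (for example `--ar` for aspect ratio and `--stylize`), but the `build_prompt` function and the dictionary it takes are hypothetical, written only for this example.

```python
def build_prompt(text: str, params: dict) -> str:
    """Append tracked parameters to a prompt as '--name value' suffixes."""
    parts = [text.strip()]
    for name, value in params.items():
        if value is None or value is False:
            continue  # parameter is not set, so leave it out
        if value is True:
            parts.append(f"--{name}")  # flag-style parameter with no value
        else:
            parts.append(f"--{name} {value}")
    return " ".join(parts)


prompt = build_prompt(
    "a lighthouse at dusk",
    {"ar": "16:9", "stylize": 250, "tile": True, "seed": None},
)
print(prompt)  # a lighthouse at dusk --ar 16:9 --stylize 250 --tile
```

Storing the parameters separately from the text is a design choice: a beginner can toggle options in the assistant without ever typing the syntax, while an experienced user can still see exactly what will be sent.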

Prompt Assistant - Tool Panel

Just like advanced prompt, the Prompt Assistant can also be opened in the Tool Panel on the right side of the screen, allowing users to see more of the many options stored in its huge library.

Feature Backlog

Feature Backlog

As this is a side project with quite a lot of components and features, I am still working through the design of many pages. Currently, my backlog consists of pages where the concept and wireframing are done, but the UI design hasn't begun yet.

Vary Region editor

I plan to create an updated vary region selector with a focus on giving users more control.

  • More accurate selection controls

  • Smart object detection

My Profile

Planned sections:

  • Profile details

  • Public channels

  • Creations

  • Social controls

  • Application Preferences

  • Account settings

  • Subscription management

Notifications

Planned sections:

  • Channel notifications/alerts

  • Community notifications

  • Image notifications

  • Edit notification details

Community pages

Planned sections:

  • Changelog/updates feed

  • Events

  • Announcements

  • Feature feedback

Help and Support pages

Planned sections:

  • Midjourney user manual

  • Subscription/payment support

Character Management/Contextual prompting

A major use case within the image generation community is reusing the same character over multiple prompts. This can be useful for those who are using Midjourney to create comics, graphic novels or consistent marketing material.

Midjourney recently released an update with this functionality built into the model, and I hope to create a UI element that will help users manage this via the application.

Thanks for reading, check out more of my work
