MidJourney App (Draft)
2024
MidJourney
Web App version of MidJourney with added functionality
Midjourney is an AI-powered tool that generates images based on text prompts through a user interface primarily accessed via Discord.
The goal of this project was to create a web-app version of the MidJourney experience, adding some extra functionality along the way. The additional functionality included a tool that helps with prompting, a prompt library and a built-in file explorer.
Design Stage time frame
2 months & on-going
Project Type:
Conceptual (Unsolicited) Product Design
This is a side-project that I decided to complete on my own to work on my design skills outside my daily job. I chose MidJourney in particular as it is a product I see huge potential in and use quite regularly.
Many designers will know the feeling of analysing the tools they use and constantly seeing ways they could add or change things, and this is a product of that mindset.
For me in particular, the mix of a somewhat advanced tool and artificial intelligence is the sweet spot, but designing for the consumer market as opposed to enterprise was something a little different.
(Solo Project) Product Design, UI Design
This was a solo side project
Note
This case study is extremely in-depth, so if you want the TLDR, check out this overview Dribbble post.
Also on Dribbble, I have some short intros and walkthroughs of some of the features.
Both the case study and the project are still a work in progress. I aim to keep updating the study as I go, but if you want to check out my progress, check out the Figma File here.
Feedback (From the web)
As someone interested in Generative AI, I have dabbled quite a bit in a variety of image generators, and each has its own strengths and weaknesses. However, I wanted to understand what other users thought of MidJourney, so I conducted some quick research to find out what people were talking about online.
The Good
Image output quality.
MidJourney’s image output quality is commonly touted as the best among its competitors. While this is difficult to measure, articles ranking the best AI image generators consistently place MidJourney in the top 2 or 3 for image quality.
A unique community experience.
While the Discord server is a frustration for some, for many it is a valuable resource and quite a unique experience among AI image generators. The community channels which spew out a somewhat chaotic feed of prompts and images are accessible to all, and so can be collaborative.
See a generation from another user that you like and want to riff on? You can easily upscale it and start creating variations. This is not possible in DALL-E or Stable Diffusion.
MidJourney is fundamentally built around a community of users.
The Bad
Creative Control and Specificity - the Closed Model
One of the key challenges is achieving the desired level of creative control over the final images. As MidJourney uses a closed model, users need to understand how to use parameters effectively and be very specific in their prompts.
Contrast this with Stable Diffusion, which allows users to change the base model used to generate the images as well as use tools like LoRAs, Embeddings and In-Painting to gain a deeper level of control over their output.
Why Discord?
One common complaint and point of frustration for users of Midjourney has been the requirement to use the service through a Discord server. This setup is seen as less accessible and more complex for some users, especially those not familiar with Discord.
According to a comparison by Zapier, the ease of use for Midjourney is rated lower than some of its counterparts, primarily because access is exclusively through Discord.
Personally, I enjoy using the application via Discord, as I am already familiar with the app and use it regularly for other things. However, it definitely has its limitations and, as seen in some of the questions on the MidJourney subreddit, many users struggle with the app.
Note
In Q4 2023 there was good news for those who prefer not to use Discord for generating images with Midjourney. The company has started testing an alpha version of its web-based platform, which allows image generation directly from the web, bypassing the need to use Discord.
I took this into consideration while creating this case study.
The Opportunities
Giving the user more control
As discussed, MidJourney’s image output is among the best. However, users can struggle to get what they want from the tool, often needing in-depth knowledge of parameters and of specific words and phrasing to get the exact output they want.
Without access to external tools, and limited by the closed model, the only way users can adjust their output is by improving the prompts they enter. While MidJourney has great documentation on prompting, there is no real guidance inside the app itself.
This gives rise to an opportunity to provide users with tools to help them create more powerful, effective prompts.
A unified experience
As Midjourney is still working on the Alpha version of its web app, it is unfair to judge it too harshly, but currently the web app completely removes the community aspect of generating images that is present in Discord. Instead, users create their images in their own workspace, outside the public channels seen on Discord.
While this is great for those who find these public channels chaotic and difficult to use, I believe it removes the user from the true essence of what makes the MidJourney experience unique: the community.
I believe that there could be a way to blend both approaches so that the feeling of community, that is so important to MidJourney, is not lost but it is also easier for users to create their own private workspaces.
Current Information Architecture
As mentioned previously, the MidJourney experience is split across 3 different apps and websites. Before jumping into any design or ideation, it was vital to get an understanding of the full set of features and functionality across these three different tools. To do this, I mapped the entire information architecture.
For certain functions you are forced to switch between apps.
For login, both apps are linked to the Discord Login.
When trying to edit subscription settings in Discord, users are redirected to the MidJourney Alpha website.
When users click on the “Community” nav link on the MidJourney Alpha website, this automatically opens Discord.
When users click on any links for the documentation in either app, they are directed to a 3rd site, docs.midjourney.
No image generation on MidJourney Alpha (currently)
There are plans to add this to the MidJourney Alpha site, but this functionality is not currently available.
MidJourney Alpha has no real community aspect
When it comes to community features, the MidJourney Alpha site is severely lacking. Currently the only real community feature is the “Explore” page, where users can see other users' images, but there is no way to interact with any users via the website.
As previously mentioned, when a user clicks on “Community” they are redirected to the Discord server.
There is also no way to view a user's profile. The only way to see a specific user's created images is to filter the explore page by that user.
Discord search is limited compared to the MidJourney Alpha website.
On top of this, users cannot search by image on Discord, but on the website it's possible to search either their own images or the entire community's images based on an existing image.
There are plans to add this to the MidJourney Alpha site, but this functionality is not currently available.
Considerations
Current Information Architecture
Whether people like using it or not, it's a fact that MidJourney's experience is not the same without Discord. To completely do away with the MidJourney Discord would be impossible; even branching away, as the team is doing with the Alpha for the new website, will bring friction and pushback as the community splits.
Unfortunately, this is what needs to happen in order for the product to improve and continue to grow. However, my view is that Discord should be a click away should the user want to quickly navigate to any of the community channels.
Product Framing
It’s important to consider Product Framing with this project because this seems to be a point of confusion and division in the community.
Using Midjourney through Discord requires a certain amount of understanding and willingness to dive in, read some documentation and gain an understanding of the various parameters, tools and techniques to create a good image generation.
On the other hand, the web app in its current state is extremely accessible and intuitive and clearly targets a more “casual” user; someone who doesn't want to have to set up their own Discord server or navigate the various public channels in Discord.
There is a little dissonance between these two approaches, as they are framed quite differently for two different kinds of users. For this project, it is important that I understand how my solution fits into this space and which framing approach I take.
Analysis of existing screen layouts
When updating or redesigning a product, the “status quo bias” can cause huge friction for existing users. Essentially, they have become comfortable with the existing product design and functionality and are resistant to change.
It is therefore vital to ensure that the new updates are similar enough to help overcome this bias and allow users to quickly learn and become comfortable with the new design. To aid with this change, I decided to base the basic layout on the existing designs that users are accustomed to.
To do this, I analysed the existing screens to draw out the most important aspects to maintain, while looking for opportunities to improve the layout in minor but meaningful ways.
Basic “Workspace” layouts & Updates
Analysis & changes
As both of the existing user interfaces follow a standard web app layout, it was easy for me to continue with this approach. However, I did have to make some decisions and slight modifications.
Prompt Bar Positioning
Firstly, the prompt bar on Discord is at the bottom. This follows many messaging application layouts, where the input region is at the bottom of the screen and, within the “conversation feed”, the bottommost message/image is the most recent.
The Midjourney web app however follows the opposite approach with the prompt bar at the top, and the most recent message/image is at the top.
I decided to maintain the Discord approach (Prompt bar at the bottom) as I believe it maintains the user experience users are most familiar with.
Updated Navigation options
For Navigation, I have maintained the collapsible left side nav menu but added 2 extra navigation elements:
a minimal top bar that will hold high level navigation links and page titles.
a right side menu (Toolbar) that will house the Tools. Clicking these tools will toggle the right side menu.
Image Generation (image grids, variation & upscale screens)
Analysis & changes
Here is a breakdown of these details and actions:
Prompt Input Bar
Analysis & changes
Another important component, the prompt bar, is quite a challenge. Currently Midjourney uses the Discord Chat Bar, which is jam-packed with functionality and comes in a wide variety of compositions.
Some of the most important features that I would need to include and take inspiration from were:
Chat Bar actions & menus (GIFs, stickers, emojis)
Add attachments
Commands (using "/")
Certain commands change the possible inputs (/blend requires a minimum of two images)
Some possible areas for improvement were:
Use of space
Progressive disclosure (some commands take multiple steps, this could be made clearer and easier to do)
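To make the point about commands changing the possible inputs more concrete, here is a minimal TypeScript sketch of how a prompt bar could describe each command's input requirements and check them before submitting. The command names come from this case study, but every type, field and function name is hypothetical, and only the /blend minimum of two images is taken from the text above; the other values are placeholder assumptions.

```typescript
// Hypothetical model of prompt-bar commands and the inputs each one expects.
// Only the "/blend needs at least two images" rule comes from the case study;
// the remaining values are illustrative assumptions.
type CommandName = "/imagine" | "/blend" | "/describe" | "/show";

interface CommandSpec {
  needsText: boolean; // whether the prompt bar shows a text field
  minImages: number;  // minimum number of image attachments
}

const COMMANDS: Record<CommandName, CommandSpec> = {
  "/imagine": { needsText: true, minImages: 0 },
  "/blend": { needsText: false, minImages: 2 },
  "/describe": { needsText: false, minImages: 1 },
  "/show": { needsText: true, minImages: 0 },
};

interface PromptInput {
  text: string;
  imageCount: number;
}

// Returns a list of human-readable problems; an empty list means the input is valid.
function validateInput(command: CommandName, input: PromptInput): string[] {
  const spec = COMMANDS[command];
  const problems: string[] = [];
  if (spec.needsText && input.text.trim() === "") {
    problems.push(`${command} needs some text`);
  }
  if (input.imageCount < spec.minImages) {
    problems.push(`${command} needs at least ${spec.minImages} image(s)`);
  }
  return problems;
}

// Example: a /blend with a single image reports one problem.
const blendProblems = validateInput("/blend", { text: "", imageCount: 1 });
```

Keeping the requirements in one table like this would let the prompt bar reveal only the fields a command needs, which is one possible way to approach the progressive disclosure mentioned above.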
Showcase/Explore
Analysis & changes
This UI component is extremely different in both iterations of the application, both having pros and cons.
In the Discord app, this is experienced as a normal text channel, with the main feed containing images, shared posts, simple text messages and threads.
The advantage of this approach is that the “browse” channels are geared towards community engagement and interaction, with the ability to comment, emote, like and share within the channels.
Where this approach struggles is that it is quite cluttered, and it is not easy to browse through the channels focusing purely on images.
It is nearly the opposite in the web app, where the presentation is purely visual, using a grid that fills the page with images. However, there are no community interactions other than the ability to like an image: no comments, no threads, and nowhere to share other posts. Also, from what I can see, all the explore pages are automatically populated rather than created by the community.
While some of these points relate more to feature analysis than to layout, these different approaches influenced my approach to the layout (more on the features in later sections).
Search Results
Analysis & changes
When using Midjourney via the Discord app, I rarely found myself using the search. Firstly, it's somewhat hidden and I never realise I can use it; secondly, when I do use it, the results are shown in a very unfriendly way, crammed into the side bar.
Compare this with the web app and I found I was encouraged to “Search by image” on each image overlay, leading to me using the search bar much more. The results page is also great, modelled after the Explore page’s layout.
For my approach, I leaned more towards the web app’s approach.
Firstly, results fill the page, with a bottomless scroll.
Unlike the web app however, I made the search results a full page overlay. This is to facilitate the user doing a quick search for inspiration without losing their current progress.
I also liked how both options included sorting options and thought it would be useful to include filters too, not as a persistent element on the main page but as a toggleable section.
I also wanted to include quick access to image generation actions via the results page, allowing users to quickly generate new images directly from any image shown.
UI Design Analysis
Analysis of existing visual design language
As previously mentioned, making huge changes to products that users are comfortable with comes with high risks that users will be unsatisfied with the new approach.
To overcome this, I did an analysis of the existing color, copy and UI elements to give my design approach a solid foundation upon which to create new, updated designs, with the aim of maintaining some familiarity for existing users.
Color
General UI/Interaction Elements
Basic design system & components
Basic style guide
To begin designing the new UI, I started off with a simple Style Guide focused on providing a strong basis for dark/light mode colors, typeface and font variations, icon sizes and types, and a few blurred fills.
Basic component library
Following an analysis of the existing UI elements in both Discord and the Midjourney website, I started to build a component library.
While I didn't want to get deep into building a full design system, I know from experience that having some basic components can help greatly with the UI design process and so I started off with the ground level components.
I added to this library slowly throughout the design and made many page/UI element specific components. If you want to see more, check out the Figma file here.
Product Design & Features
The following is the current MVP product design that I have created based on the analysed Layout and Visual Design. I have decided to go through the design page by page, starting with the basics and ending with the more complex user flows.
General Layout
General Application layout and Navigation updates
The Application layout is very minimal, with an emphasis on maintaining as much space as possible for the central workspace area.
The Layout is designed to allow the user to hide the side panels as well as minimise the prompt bar at the bottom to allow for the best browsing experience.
That said, should the user want to expand all the tools and navigation panels, this creates a more “pro” user environment with all the tools close at hand.
The main navigation links follow those on the existing web app, with a second layer of navigation links behind the main three:
Channels: These are workspaces similar to Discord Channels where users can generate images
File: Here the user can manage their creations in folders. Uploaded images are available here, as well as saved or liked images.
Explore: This section gives the user access to curated pages of all Midjourney’s user creations, allowing the user to explore the vast image database Midjourney has.
The Tool Menu is a new addition and holds some of the proposed new functionalities. This space could also be used in the future to add more tools or even allow users to add plugins.
When a user selects a tool, the Tool Panel will expand from the right, allowing the user to interact with the tool while still using their Main Workspace.
The tools themselves are explained in more detail later in the case study.
Channels and Showcase/Explore Pages
One of the major changes I decided to make with the user experience for my approach is related to image generation channels and Showcase/Explore pages.
While using Midjourney via Discord, both generation and showcase use the default Discord Text Channel layout and interactions. This is great for community engagement and interaction, with the ability to start comment threads and share links and other posts. However, it is not a great experience for users wanting to explore other users' images.
As mentioned previously, the web app goes in the opposite direction, completely removing the social community interactions in favour of better, curated explore pages.
For my approach, I decided to try to make space for both approaches and, as such, separated the two use cases into two different page types.
Image Generation Channels
Use case: Image generation and community interactions
These pages aim to maintain the full experience users get from Discord channels.
Possible Actions:
Image generation basics
/imagine, /blend, /describe, /show
Upscaled images + actions
Save images
Download images
Copy/save prompts
Comment on images
(Optional) Add comments directly into channel
Create channels
Invite others to create in your channels
Favourite channels
Change channel view options
Explore Pages
Use case: Image curation and exploration.
These pages follow the web app explore pages. Best used for inspiration and browsing Midjourney’s vast library of user generated content.
Possible Actions:
Save images
Download images
Search images
Copy/save prompts
Comment on images
Favourite explore pages
Change page view options
Channels and Showcase/Explore Pages
Channel workspace - Image Generation output cards
Possibly the most important aspect of the design of Channels is the image output cards. These are fundamental to the user experience as they are the artifact that the Generative process outputs.
With this in mind, I put a huge amount of work into the design focusing on the following:
Allow users to manipulate the view of the output (View Options)
Considering space - in Discord, one of the main opportunities I noted was to improve the use of space, particularly focusing on pushing the images to the forefront.
Functionality density - these cards need to contain a large amount of functionality (see analysis here) and in a way that doesn't overwhelm the user.
Explore Pages
Image Overlay
For both Image Generation Channels and Explore Pages, users can click on an image to see more details. For this, I decided to use the same overlay.
Generating Images
Basic Prompt Bar use
The prompt bar is a fundamental element on any AI generation tool, however, with the huge amount of other features and functionality included in this web app, I tried to maintain its prominence but not make it overbearing in its use of space.
I really love the Discord experience of using commands, but wanted to make the experience more user friendly and refined. I did this by trying to separate the different steps of creating a prompt.
Open prompt -> choose prompt type -> write prompt
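As a rough illustration of separating those steps, the flow can be thought of as a small state machine in which each step is only reachable from the one before it. This is a sketch under my own assumptions rather than a description of how the design is built; the state, action and function names are all hypothetical.

```typescript
// Hypothetical state machine for the three-step prompt flow:
// open prompt -> choose prompt type -> write prompt.
type PromptStep = "closed" | "chooseType" | "writePrompt";

interface PromptBarState {
  step: PromptStep;
  promptType: string | null; // e.g. "/imagine"; null until a type is chosen
  text: string;
}

type PromptBarAction =
  | { kind: "open" }
  | { kind: "chooseType"; promptType: string }
  | { kind: "write"; text: string }
  | { kind: "close" };

const initialState: PromptBarState = { step: "closed", promptType: null, text: "" };

// Each action only takes effect in the step where it makes sense;
// anything else leaves the state unchanged.
function reduce(state: PromptBarState, action: PromptBarAction): PromptBarState {
  if (action.kind === "open") {
    return state.step === "closed" ? { ...state, step: "chooseType" } : state;
  }
  if (action.kind === "chooseType") {
    return state.step === "chooseType"
      ? { step: "writePrompt", promptType: action.promptType, text: "" }
      : state;
  }
  if (action.kind === "write") {
    return state.step === "writePrompt" ? { ...state, text: action.text } : state;
  }
  // Remaining case: "close" resets the prompt bar.
  return initialState;
}

// Example walk through the flow.
let promptState = reduce(initialState, { kind: "open" });
promptState = reduce(promptState, { kind: "chooseType", promptType: "/imagine" });
promptState = reduce(promptState, { kind: "write", text: "a red fox" });
```

Modelling the flow this way means the user cannot end up writing a prompt before a prompt type has been chosen, which is the behaviour the separated steps are meant to encourage.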
Basic In-Channel Image Generation
Another way users generate images is by interacting with existing images. This includes actions like upscale, vary and zoom.
I found this experience to be lacking in the Discord app and wanted to make it more streamlined.
Advanced Prompt - Tool Panel
Advanced prompt (Parameters) can also be opened in the Tool Panel on the right side of the screen.
Prompt Assistant
The prompt assistant tool is an expansion of the Advanced Prompt bar. It's aimed at beginners who are unaware of these tools.
It can also be customised to provide a powerful tool for experienced users who want to quickly add, adjust and track their parameters easily.
Prompt assistant can also be opened in the Tool Panel on the right side of the screen, allowing users to see more of the options stored in its huge library.
Prompt Assistant - Tool Panel
Just like advanced prompt, the Prompt Assistant can also be opened in the Tool Panel on the right side of the screen, allowing users to see more of the many options stored in its huge library.
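To give a sense of what adding, adjusting and tracking parameters could mean in practice, here is a minimal TypeScript sketch that keeps parameters in a simple map and assembles the final prompt string from it. The function and type names are hypothetical and this is not how the Prompt Assistant is specified; it assumes Midjourney's convention of appending parameters as `--name value` after the prompt text.

```typescript
// Hypothetical helper: tracks parameters as a map and appends them to the
// prompt text using the "--name value" convention.
type ParamValue = string | number | boolean;

function buildPrompt(text: string, params: Record<string, ParamValue>): string {
  const parts: string[] = [text.trim()];
  for (const name of Object.keys(params)) {
    const value = params[name];
    // Skip parameters that are switched off or missing.
    if (value === undefined || value === false) continue;
    // Boolean flags are written without a value, everything else as "--name value".
    parts.push(value === true ? `--${name}` : `--${name} ${value}`);
  }
  return parts.join(" ");
}

// Example: three tracked parameters, one of them a simple on/off flag.
const examplePrompt = buildPrompt("a red fox", { ar: "16:9", stylize: 250, tile: true });
// examplePrompt is "a red fox --ar 16:9 --stylize 250 --tile"
```

Because the parameters live in a map rather than in the text itself, a beginner could toggle them from a panel while an experienced user could still see and copy the resulting prompt.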
As this is a side project with quite a lot of components and features, I am still working through the design of many pages. Currently, my backlog consists of pages where the concept and wireframing are done, but the UI design hasn't begun yet.
Vary Region editor
I plan to create an updated vary region selector with a focus on giving users more control.
More accurate selection controls
Smart object detection
My Profile
Planned sections:
Profile details
Public channels
Creations
Social controls
Application Preferences
Account settings
Subscription management
Notifications
Planned sections:
Channel notifications/alerts
Community notifications
Image notifications
Edit notification details
Community pages
Planned sections:
Changelog/updates feed
Events
Announcements
Feature feedback
Help and Support pages
Planned sections:
Midjourney user manual
Subscription/payment support
Character Management/Contextual prompting
A major use case within the image generation community is the ability to reuse the same character over multiple prompts. This can be useful for those who are using Midjourney to create comics, graphic novels or consistent marketing material.
Midjourney recently released an update with this functionality built into the model, and I hope to create a UI element that will help users manage this via the application.