How to clean up Confluence in six steps
- Outdated pages with unreliable content
- Unnecessary content using up precious storage
- Cluttered and confusing navigation
- Use a Confluence labels system to keep content under control
- Using Confluence templates and blueprints to better structure your instance
There Is No Silver Bullet
While there are common symptoms, ultimately every messy Confluence instance is messy in its own way. For some, the solution will be deleting old content; others will need a complete re-arrangement of their instance; some might need to go after add-ons and macros as part of a cleanup, while others will focus entirely on pages.
‘What’s the best way to clean up my Confluence instance?’ is a question without a one-size-fits-all answer. However, based on our consulting experience and the feedback we get from ScriptRunner for Confluence users in organisations of all shapes and sizes, we’ve created a step-by-step plan you can adapt to your unique situation.
Step 1: categorise all your spaces
1. In the header bar, click the Spaces menu and select "Space directory" at the bottom of the menu. This will give you a complete list of all spaces in your instance.
2. Using a simple spreadsheet, identify:
As you classify spaces using the method above, you will start noticing patterns and see the same questions emerge for different groups of people. Some teams might have a lot of spaces that seem to be essentially the same thing - maybe they can be combined? Other spaces appear to be really old - have they been completely abandoned? Remember, at this stage you are just taking stock and understanding what belongs to whom, so don’t take any action just yet.
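If you would rather start the spreadsheet from data than from a blank sheet, a minimal sketch along these lines may help. It assumes you already have your space list as records with `key`, `name` and `type` fields (the shape returned by Confluence's REST space listing, as far as we know). The `owner` and `notes` columns are hypothetical placeholders for you to fill in by hand, not fields Confluence provides.

```python
import csv
import io

# Hypothetical columns: "owner" and "notes" are blank placeholders to fill in manually.
COLUMNS = ["key", "name", "type", "owner", "notes"]


def spaces_to_csv(spaces):
    """Turn a list of space records into CSV text for a review spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    # Sort by space key so related spaces are easier to spot side by side.
    for space in sorted(spaces, key=lambda s: s["key"]):
        writer.writerow({
            "key": space["key"],
            "name": space["name"],
            "type": space.get("type", ""),
            "owner": "",
            "notes": "",
        })
    return buffer.getvalue()


# Sample records, made up for illustration.
sample = [
    {"key": "MKT", "name": "Marketing", "type": "global"},
    {"key": "HR", "name": "Human Resources", "type": "global"},
]
print(spaces_to_csv(sample))
```

Save the output as a `.csv` file and open it in any spreadsheet tool to start classifying.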
It’s good practice for a general cleanup operation to also take stock of the apps used on your instance. Are you really using all the apps you’re paying for? Can you save money by replacing multiple smaller apps with one app that has multiple features?
Step 2: investigate content quality
Once you have categorised your spaces, you will have a better understanding of the main things you need to tackle. The next step is a deeper dive into your content:
Find pages and spaces that are not getting views and updates.
Finding pages and spaces that get few or no views is a good way to identify potentially unnecessary content. You can use the Confluence Usage Stats plugin from Atlassian (note that it is known to cause performance problems on large installations) or other apps available on the Marketplace. If you are already tracking your Confluence instance with Google Analytics, you can see that information in the analytics console.
If pages (and spaces) haven’t been updated in a long time, you may no longer need them. An easy way to find these pages is by using the ‘Old Content notifier’ job in ScriptRunner for Confluence. Use a CQL query to focus on the content you're interested in and find pages that haven't been modified in a long time. Then either run the job now or schedule it so you're regularly notified of pages that are getting old and less relevant to your users. This is a great way to ensure you can easily stay on top of your content in Confluence without needing to do the heavy lifting yourself.
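As a rough illustration of the kind of CQL query you might use for this, here is a minimal sketch that builds one for a single space. The helper names, the space key and the two-year threshold are our own assumptions, and the `now("-2y")` syntax should be checked against the CQL function reference for your Confluence version. The second helper only builds a content search URL; it does not call your instance.

```python
from urllib.parse import urlencode


def stale_pages_cql(space_key, years):
    """Build a CQL query for pages in one space not modified in `years` years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return f'type = page AND space = "{space_key}" AND lastmodified < now("-{years}y")'


def search_url(base_url, cql, limit=50):
    """Build (but do not call) a REST content search URL for the given CQL."""
    query = urlencode({"cql": cql, "limit": limit})
    return f"{base_url.rstrip('/')}/rest/api/content/search?{query}"


# "HR" and the base URL below are placeholders, not real values.
cql = stale_pages_cql("HR", 2)
print(cql)
print(search_url("https://confluence.example.com", cql))
```

The same query string can be pasted into Confluence's advanced search to preview the results before you schedule anything.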
The Space Statistics built-in script in ScriptRunner for Confluence will let you further compare and contrast your spaces based on number of pages, number of comments, volume of attachments, number of labels used, number of distinct users who commented, and creation date. This extra information will give you a better understanding of the problems you’re facing, and which spaces are most affected.
Now is the time to ask the space owners to get involved in the process. For example, they can tell you for sure what can be safely cleaned up and what needs to be kept, if the four spaces which seem to contain more or less the same thing can be reorganised into a single one, or if the huge quantity of attachments taking up storage space can safely be deleted.
If you’re using the Display Pages with Labels macro, you can create a Confluence page displaying the content you identified by label, and point your admins in that direction.
If you have ScriptRunner for Confluence, take a look at the Custom CQL Macro. It will allow you to use a CQL search to display to each space admin all the content they need to check organised by label, in a neat table on a single Confluence page.
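To give a feel for the searches involved, the sketch below generates one CQL query per space for content carrying a review label. The label `needs-review` and the space keys are hypothetical; substitute whatever you used when tagging content during your investigation.

```python
def review_cql(space_key, label):
    """CQL for all content in one space that carries the given label."""
    return f'space = "{space_key}" AND label = "{label}"'


def review_queries(space_keys, label="needs-review"):
    """Map each space key to the query its space admin should look at."""
    return {key: review_cql(key, label) for key in space_keys}


# Placeholder space keys for illustration.
for key, cql in review_queries(["HR", "MKT"]).items():
    print(key, "->", cql)
```

Each query can then be used in a macro or in advanced search on the page you share with that space's admin.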
Step 4: Removing unnecessary content
Once your space admins have given their feedback, you’re finally ready to clean up. This is the time where you decide what to delete, and what to archive.
Archiving unnecessary content
When it comes to Confluence, the term ‘archiving’ can be a bit confusing because it can mean different things to different people. Depending on what article you are reading or what expert you are talking to, the same term is used to mean either ‘moving content into a reserved space on the same instance’, or ‘moving old content to a different server in order to save space’.
If the main purpose of your cleanup is to make it easier for people to use and find things in Confluence, it may be enough to archive old content in a separate space on the same instance so it doesn’t confuse users.
As an admin or space admin, you can even ‘hide’ old pages in the same space by using Space tools > Reorder pages, then moving them outside the page tree and restricting them. This is a fast, easy way to ‘clean’ the space by hiding content just outside the page tree, while keeping it accessible.
If the main purpose of your cleanup is to make more space on your instance, you will have to either completely delete unnecessary content or take it off the instance.
While you’re moving content around, you can restrict access to pages - either for moving them to a separate server later or for deleting them. If you’re using ScriptRunner, use the Add/Remove Restrictions to Parent & Child Pages built-in script to add viewing and editing restrictions to a given page and all of its descendants at the same time.
With out-of-the-box Confluence, deleting content will be very time-consuming: you will have to delete page by page, comment by comment, attachment by attachment, etc. ScriptRunner for Confluence offers a series of easy-to-use built-in scripts that will help you save a lot of time by allowing you to perform deletions in bulk:
- Bulk Delete Attachments - deletes all attachments (or attachments of a specified age) for a page or multiple pages
- Bulk Delete Comments - deletes all comments (or comments of a specified age) for a page or multiple pages
- Bulk Purge Trash - permanently deletes all trash for one or more specific spaces or all spaces
ScriptRunner allows Confluence admins to give access to these built-in scripts to space admins. So, once you have agreed on what needs to be deleted, you can just instruct space admins to move on to the deletion process in their own spaces.
Once enabled, space admins have access to built-in scripts from Space tools > Advanced Space Functionality.
Step 5: Reorganise the remaining content
Once you get rid of the content you no longer need, it’s time to reorganise what’s left so it becomes easily accessible and readable.
This can take one week, six months, or more, depending on complexity. Maybe you will only have to organise a few spaces in new templates, but you might have to create a whole new structure and move thousands of pages to it. Whatever you need to do, here are a few things that will make your work easier moving forward.
If you’re looking at reorganising your cleaned-up content in a completely new structure, it’s easier to use the same Confluence instance rather than moving everything to a brand-new instance. Creating new spaces in the same instance and copying content into them is easier than fiddling with export and import across two instances. Once you’ve copied everything, just delete your old structure.
If you’re a ScriptRunner user, take a look at the Copy Space built-in script. You can use it to copy a new template space you created in its entirety - so that once you’ve created a space with a new, better structure, you can copy it again and again.
Add metadata to spaces and organise them using Space Categories
Your reorganised spaces should have a short description that makes it clear to unfamiliar users what the space is about. Also, grouping your spaces with Space Categories will let you locate them much more easily in the future.
Add descriptions and categorise all your Confluence spaces
Use templates and blueprints for your spaces and pages
Organising content based on a pre-defined template will ensure you have no more pages and spaces lacking context, and that all the information needed for easy navigation is accessible. What’s more, having a consistent style will make it simple to find and digest content. For a deep dive into how to use templates and blueprints, have a look at this article on using Confluence templates and blueprints to better structure your instance.
Basic template for a project plan
Use a standardised label system
Create and document a labelling convention, then clearly communicate it to all users. All new pages added after the cleanup will be easy to find by filtering search results based on the label. You can read more about how to use a Confluence labels system here.
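Once a convention is written down, it can also be checked mechanically. The sketch below assumes a purely hypothetical convention (an agreed prefix such as `team`, `doc` or `status`, followed by lowercase words separated by hyphens) and lists the labels that break it. Your own convention will differ, so treat the prefixes and the pattern as placeholders.

```python
import re

# Hypothetical convention: <prefix>-<lowercase-words-separated-by-hyphens>
ALLOWED_PREFIXES = ("team", "doc", "status")
LABEL_PATTERN = re.compile(
    r"(?:%s)-[a-z0-9]+(?:-[a-z0-9]+)*" % "|".join(ALLOWED_PREFIXES)
)


def nonconforming_labels(labels):
    """Return the labels that do not follow the agreed convention, in input order."""
    return [label for label in labels if not LABEL_PATTERN.fullmatch(label)]


# Made-up labels for illustration.
labels = ["team-marketing", "Meeting Notes", "status-needs-review", "misc"]
print(nonconforming_labels(labels))  # ['Meeting Notes', 'misc']
```

A list like this gives space admins a concrete set of labels to rename rather than a general instruction to "tidy up labels".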
Lack of consistency in Confluence page labelling creates chaos
Step 6: Set up processes and educate your users
- Decide which standard templates each department needs. Can you replicate templates across departments to achieve more standardisation? For example, every home page can have the same layout and required pieces of info.
- Besides a standardised labelling system, you might also need standardised naming conventions for spaces and pages.
- To prevent clutter in the future, you might want your space admins to regularly review content for archiving and deleting.
- Clearly document all the agreed-upon processes.
Standardisation doesn’t need to stop at Confluence. Does each project team need its project space to correspond to a Jira project? With ScriptRunner, you can automatically generate a Jira project when a new Confluence space is created - or the other way around.
Automate your processes
- Use the Script Jobs function to create a check that runs regularly, flags old pages and tags the space owner or admin to take action
- Create a scheduled job that regularly runs a script to archive or delete flagged pages that haven’t been actioned within a period you set
- Use Prune old page versions to remove old page versions on a regular schedule
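The flag-then-archive flow described in the first two bullets boils down to a small decision rule per page. Here is a minimal sketch of that rule in Python. It is not ScriptRunner's implementation, and the function name and both thresholds (365 days before a page counts as old, 30 days for the owner to respond) are assumptions to adapt to your own policy.

```python
from datetime import date, timedelta


def next_action(last_modified, flagged_on, today, stale_after_days=365, grace_days=30):
    """Decide what a scheduled check should do with one page.

    Returns "keep", "flag", "wait" or "archive".
    """
    if today - last_modified < timedelta(days=stale_after_days):
        return "keep"      # recently edited, including edits made after a flag
    if flagged_on is None:
        return "flag"      # old and not yet flagged: notify the space owner
    if today - flagged_on < timedelta(days=grace_days):
        return "wait"      # flagged, owner still has time to act
    return "archive"       # flagged and ignored past the grace period


today = date(2024, 6, 1)
print(next_action(date(2024, 3, 1), None, today))              # keep
print(next_action(date(2022, 1, 1), None, today))              # flag
print(next_action(date(2022, 1, 1), date(2024, 5, 20), today)) # wait
print(next_action(date(2022, 1, 1), date(2024, 4, 1), today))  # archive
```

Keeping the rule separate from the code that actually moves or deletes pages makes it easy to do a dry run and review the proposed actions first.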