Description/Features

Confluence has lacked a cluster-ready, enterprise-scalable, remotely accessible statistics gathering and analysis plugin ... not any more!
Ready for the Enterprise
The primary objective for this plugin was to build something that can handle 1000 sessions an hour (a moderate load for any seriously sized organisation). That sounds like quite a reasonable number, and you would think it would be easy to deal with, until you crunch a few numbers.

  • 1000 sessions an hour, at an average of around 30 events per session
  • 30,000 events an hour
  • 720,000 events a day
  • 5,040,000 events a week
  • 262,080,000 events a year (52 weeks)!
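
The arithmetic behind those figures can be checked with a few lines of Java. This is only a back-of-envelope sketch: the class and method names are made up for illustration, and the yearly figure assumes a 52-week year.

```java
/** Back-of-envelope event volume, using the figures quoted in the list above. */
final class EventVolumeEstimate {
    static final long SESSIONS_PER_HOUR = 1_000L;
    static final long EVENTS_PER_SESSION = 30L; // rough average taken from the text

    static long perHour() { return SESSIONS_PER_HOUR * EVENTS_PER_SESSION; }
    static long perDay()  { return perHour() * 24; }
    static long perWeek() { return perDay() * 7; }
    static long perYear() { return perWeek() * 52; } // assumption: 52 weeks, not 365 days

    public static void main(String[] args) {
        System.out.printf("hour=%,d day=%,d week=%,d year=%,d%n",
                perHour(), perDay(), perWeek(), perYear());
    }
}
```

Counting 365 days instead of 52 weeks gives a slightly larger yearly total (720,000 × 365 = 262,800,000); either way it is a lot of rows.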

Considering that we also have to be cluster-safe, I opted to store the data in the database – and that's a lot of data to store!

Intelligent Caching
After wiping the sweat from my brow after crunching those numbers (there's something nicely recursive about getting stats on the stats plugin) it was clear: we're going to need some funky caching to make this usable.

The concept of Data Widgets will be explained later, but what matters here is that they expose a hash of their configuration, so that each interval they expose can be cached either in final or partial forms. Partial forms can be cleaned up / updated and final datasets will sit there forever.

  • On a 400,000 event dataset on my MacBook Pro, an uncached widget processed the entire data set in 120 unique queries and took: 36,450ms
  • Repeating the same widget but allowing it to use the cache (including updating any partial intervals) reduced the queries to just 3 and took: 783ms
  • It would be even faster if I didn't let it update partial intervals!
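
To make the final/partial distinction concrete, here is a minimal sketch of an interval cache keyed by a widget's configuration hash. It is not the plugin's actual code: the class, the key format and the in-memory map are all assumptions (the real cache lives in the database), and the "query" is just a stand-in function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongBinaryOperator;

/** Hypothetical sketch: final intervals are cached for good, partial ones are refreshed. */
final class IntervalCacheSketch {
    /** A cached value plus whether its interval had fully elapsed when it was computed. */
    static final class Entry {
        final long value;
        final boolean isFinal;
        Entry(long value, boolean isFinal) { this.value = value; this.isFinal = isFinal; }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    int queriesRun = 0; // counts how often the expensive query actually ran

    /** Returns the result for [start, end), running the query only when the cache cannot answer. */
    long result(String configHash, long start, long end, long now, LongBinaryOperator query) {
        String key = configHash + ':' + start + ':' + end;
        Entry hit = cache.get(key);
        if (hit != null && hit.isFinal) {
            return hit.value; // final dataset: never recomputed
        }
        long value = query.applyAsLong(start, end); // cache miss, or a partial entry to update
        queriesRun++;
        cache.put(key, new Entry(value, end <= now)); // final only once the interval is over
        return value;
    }

    public static void main(String[] args) {
        IntervalCacheSketch cache = new IntervalCacheSketch();
        LongBinaryOperator countEvents = (start, end) -> end - start; // stand-in for real SQL
        long now = 150;
        cache.result("abc123", 0, 100, now, countEvents);   // elapsed interval: stored as final
        cache.result("abc123", 0, 100, now, countEvents);   // answered from the cache
        cache.result("abc123", 100, 200, now, countEvents); // still open: stored as partial
        cache.result("abc123", 100, 200, now, countEvents); // partial entry is refreshed
        System.out.println("queries run: " + cache.queriesRun);
    }
}
```

Skipping the refresh of partial entries would save the remaining queries, at the cost of slightly stale numbers for the interval that is still in progress.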

Store now, Report later
This plugin listens to events that are fired in Confluence and stores them in an automatically configured, self-updating database. Data is gathered quickly, grabbing as much as it can without impacting performance. It would rather capture and store too much than too little, so the reports of tomorrow can utilise the data mined today. It defaults to using Confluence's internal database, but larger installations can choose their own database schema.

Cross Database
The plugin works with many databases and has an architecture that allows individual database types to be tuned and optimised. Upgrades to the database schema are dealt with by a funky library called LiquiBase. The plugin is architected so that if you decide to use LiquiBase in your own plugins, it will play nice.

Event Queue
Events happen synchronously, so they are queued and processed in the background every 5 minutes (or sooner when the queue grows too rapidly). There is a full event queue manager which allows administrators to kick items off the queue if there are issues, or just to be nosy!
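
The queue-then-flush idea can be sketched roughly as follows. None of these names come from the plugin: the class, the size threshold and the `store` callback are assumptions for illustration, and a real implementation would hand an early flush to the background thread rather than run it on the caller's thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch of an event queue flushed on a timer or when it grows too large. */
final class EventQueueSketch {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger pending = new AtomicInteger();
    private final int flushThreshold;           // assumed "growing too rapidly" limit
    private final Consumer<List<String>> store; // stand-in for the database writer

    EventQueueSketch(int flushThreshold, Consumer<List<String>> store) {
        this.flushThreshold = flushThreshold;
        this.store = store;
    }

    /** Called when an event fires: enqueue only, and flush early if the queue is too big. */
    void onEvent(String event) {
        queue.add(event);
        if (pending.incrementAndGet() >= flushThreshold) {
            flush();
        }
    }

    /** Drains everything queued so far and stores it as one batch. */
    synchronized void flush() {
        List<String> batch = new ArrayList<>();
        String event;
        while ((event = queue.poll()) != null) {
            batch.add(event);
            pending.decrementAndGet();
        }
        if (!batch.isEmpty()) {
            store.accept(batch);
        }
    }

    public static void main(String[] args) {
        EventQueueSketch events = new EventQueueSketch(1000,
                batch -> System.out.println("storing " + batch.size() + " events"));
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(events::flush, 5, 5, TimeUnit.MINUTES); // the 5 minute cycle
        events.onEvent("page viewed");
        events.onEvent("page edited");
        events.flush();      // force a flush instead of waiting five minutes
        timer.shutdownNow(); // stop the timer so the demo exits
    }
}
```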

Exception Queue
Should there be exceptions during processing (such as a SQL exception, or a runtime exception) then there is a similar exception manager, which captures the exceptions thrown, allowing you to determine the association between the exception and the data being processed at the time. This should significantly reduce error diagnosis time.

Externally Accessible
Wherever possible, data will be provided both over an HTML UI and over a REST API. The aim of this plugin is to ensure that the information is easily exposed, allowing third party applications to harness the power of the stats.

For example: exposing popular statistical information to JIRA Studio's dashboard.

Concept: Data Widgets
A data widget is the plugin's terminology for the business logic that goes into processing the data. It's wrapped by generic functionality, such as the ability to filter each widget by start and end date, optionally repeat a given interval between those dates, filter further by space (inclusive or exclusive) and finally select from a number of caching options.

For example: you might want Widget A to give you the data from the month of March, broken down into daily/weekly intervals.

There is a default DataWidgetRunner accessible through the UI which gives raw access to run the registered widgets, with the default options selectable and the result outputted as a dynamic table. Pass in output=xml as well and it'll output it as XML instead of HTML, voila - an inst-o-matic RESTful API!

I've also exposed the runner through a {statsDataWidget} macro, which takes all the customisation options you expect with sensible defaults.

Reports & Report Widgets (not yet implemented)
Report Widgets will take the output of a Data Widget and render the information, possibly processing it further before turning it into something else - e.g. a chart of some form.

Reports are simply a collection of Report Widgets backed by a series of Data Widgets, with the Report configuration setting global options that filter down into the Data Widgets.

Other

  • It is fully internationalised, allowing porting to any language.
  • There are database tools for managing the raw database.
  • Detailed debugging lets you target certain packages / functionality to debug and diagnose issues. This should also allow better problem diagnosis with a lower overhead, and thus faster bug fixing.
  • Readily accessible information (such as JVM Memory usage, the logged in user, the HTTP session, remote IP address, user agent, referrer etc) is all captured with each event, allowing non-event specific information to be reported on and used later.
  • Fully documented macro in the Notation Guide under Advanced macros.
    This plugin is designed to be a solid foundation, and will likely need modifications to fit on a platform we've not tested - such is the joy of so-called database agnostic SQL statements and structures.

Tested Environments

I have added a Generic database profile which uses standard unoptimised SQL statements. If this profile is used, a warning will be logged with the information needed to identify your database type.

Please note: data widgets can do complex things and sometimes implement their own SQL, thus the support level may vary from widget to widget. Problems? Report them.

It is not recommended that you use HSQL for production systems!

Confluence: v2.7.1, v2.8.0, v2.10.3
Databases: HSQL (lightly tested), MySQL (heavily tested), MSSQL (known issues)

Macro Parameters:

{statsDataWidget}

Param | Value(s) | Default | Description
widget | Widget Class / FQCN | ReadWriteRatioWidget | The widget class to use (a name given without a package has the standard package prepended). Pick from: ConcurrentSessionsWidget, MostEditedContentWidget, MostViewedContentWidget, ReadWriteRatioWidget, MemoryUsageWidget
startDate | Date matching "M/d/yyyy" | first date on record | The first date to include
endDate | Date matching "M/d/yyyy" | last date on record | The last date to include
intervalType | int | Calendar.MONTH (2) | Integer matching Calendar.TYPE
intervalCount | int | 0 | Intervals to count up each iteration (0 means just a single iteration from startDate to endDate)
intervalTitles | title 1,title 2,... | none | Comma separated titles - when it runs out of titles, it'll revert to the interval start date
spaces | spaceKey,spaceKey,... | none | Comma separated space keys (empty means all included / none excluded)
excludeSpaces | boolean | false | Exclude instead of including the space keys (ignored if spaces is empty)
cacheRead | boolean | true | Read from the cache
cacheWrite | boolean | true | Write to the cache
cacheUpdatePartial | boolean | true | Update partial intervals in the cache
showHeader | boolean | false | Show widget information
theme | basic / horizontal | basic | The theme you wish to use for the output template
intervalDateFormatter | see SimpleDateFormat | d-MMM-yyyy | The date formatter you want used for the intervals
hideIntervals | boolean | false | Hide the interval dates

Widgets
See separate section which gives more details on the data widget.

Common Interval Types

When setting the intervalType field, here is a list of common interval types.

Calendar Field | Integer Value
YEAR | 1
MONTH | 2
WEEK_OF_YEAR | 3
DAY_OF_YEAR | 6
HOUR_OF_DAY | 11
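
These integers are simply the field constants from java.util.Calendar, so they can be looked up in code rather than memorised. The helper below is only an illustration (the class and method names are made up) of how a field name maps to the value the intervalType parameter expects.

```java
import java.util.Calendar;

/** Illustrative lookup from a Calendar field name to its integer constant. */
final class IntervalTypeValues {
    static int intervalType(String fieldName) {
        switch (fieldName) {
            case "YEAR":         return Calendar.YEAR;
            case "MONTH":        return Calendar.MONTH;
            case "WEEK_OF_YEAR": return Calendar.WEEK_OF_YEAR;
            case "DAY_OF_YEAR":  return Calendar.DAY_OF_YEAR;
            case "HOUR_OF_DAY":  return Calendar.HOUR_OF_DAY;
            default: throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
    }

    public static void main(String[] args) {
        String[] names = {"YEAR", "MONTH", "WEEK_OF_YEAR", "DAY_OF_YEAR", "HOUR_OF_DAY"};
        for (String name : names) {
            System.out.println(name + " = " + intervalType(name));
        }
    }
}
```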

Common SimpleDateFormat Patterns

When setting the intervalDateFormatter field, here is a list of patterns. Examples show 9th April 2009 at 10:28pm.

Pattern | Result | Description
d-MMM-yyyy | 9-Apr-2009 | Simple date
MM/d/yy HH:mm | 04/9/09 22:28 | Numeric date with 24hr time
MMMM ''yy | April '09 | Month only with abbreviated year
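
If you want to try a pattern before putting it in the macro, a few lines of plain Java will do. The class name is made up for illustration, and the English locale is fixed so the month names come out the same on any machine. Note that two single quotes in a pattern produce one literal apostrophe.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

/** Tries out SimpleDateFormat patterns against 9th April 2009 at 10:28pm. */
final class IntervalDateFormatDemo {
    static Date sample() {
        return new GregorianCalendar(2009, Calendar.APRIL, 9, 22, 28).getTime();
    }

    static String format(String pattern, Date date) {
        return new SimpleDateFormat(pattern, Locale.ENGLISH).format(date);
    }

    public static void main(String[] args) {
        Date date = sample();
        System.out.println(format("d-MMM-yyyy", date));    // 9-Apr-2009
        System.out.println(format("MM/d/yy HH:mm", date)); // 04/9/09 22:28
        System.out.println(format("MMMM ''yy", date));     // April '09
    }
}
```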

Data Widgets:

Data Widgets are the number crunchers of the plugin - they are what take the raw data and interpret it into something useful. The AbstractDataWidget API has been made deliberately extensible so that more widgets can be added over time; hopefully we'll end up with a nice collection enabling rich reports to be built on top of them.

When executed, a widget processes the startDate and endDate, dividing the work up into chunks called intervals. The size and quantity of intervals can be specified in the general widget parameters.
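
As a rough illustration of that chunking step, the sketch below walks a Calendar from startDate to endDate in steps of intervalCount units of intervalType. It is an assumption about how such a split could be done, not the plugin's actual code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;

/** Hypothetical sketch of dividing a date range into intervals. */
final class IntervalSplitter {
    /** Returns {from, to} pairs; an intervalCount of 0 yields one interval for the whole range. */
    static List<Date[]> split(Date startDate, Date endDate, int intervalType, int intervalCount) {
        List<Date[]> intervals = new ArrayList<>();
        if (intervalCount <= 0) {
            intervals.add(new Date[] {startDate, endDate});
            return intervals;
        }
        Calendar cursor = Calendar.getInstance();
        cursor.setTime(startDate);
        while (cursor.getTime().before(endDate)) {
            Date from = cursor.getTime();
            cursor.add(intervalType, intervalCount); // e.g. Calendar.MONTH, 1
            Date to = cursor.getTime().before(endDate) ? cursor.getTime() : endDate;
            intervals.add(new Date[] {from, to});    // the last interval is clamped to endDate
        }
        return intervals;
    }

    public static void main(String[] args) {
        Date start = new GregorianCalendar(2009, Calendar.MARCH, 1).getTime();
        Date end = new GregorianCalendar(2009, Calendar.APRIL, 1).getTime();
        List<Date[]> days = split(start, end, Calendar.DAY_OF_YEAR, 1);
        System.out.println("March split into " + days.size() + " daily intervals");
    }
}
```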

Once the list of intervals has been created, the cache is optionally consulted and the remaining intervals are passed through to the widget's bespoke logic for execution. This bespoke logic is described below.

During the bespoke logic, a widget implementation is expected to process the data for the given interval within the constraints provided, and generate result data. This data can be simple (like a number or some text), or more complex, such as a collection of richer objects.

Read:Write Ratio
This widget takes all the Page/BlogPost/Attachment events for creation, editing and viewing and combines them into the three columns. This should give you a good impression of your read:write ratio, as well as the overall usage of areas of your site.

Class: ReadWriteRatioWidget
Written By: Dan Hardiker (Adaptavist)
Custom Parameters: none yet
Fields Outputted: Creates (int), Edits (int), Views (int)

Concurrent Sessions
This widget finds the number of unique Session IDs (regardless of event type) and totals them up. This will tell you how many unique visitors (not hits) you've had during that period.

Class: ConcurrentSessionsWidget
Written By: Dan Hardiker (Adaptavist)
Custom Parameters: none
Fields Outputted: Sessions (int)
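
The counting itself boils down to de-duplicating session IDs. The snippet below shows the idea in memory with a Set; it is only a sketch with made-up names, and the real widget presumably does the equivalent in SQL against the event table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: counts unique session IDs across a list of events. */
final class UniqueSessionCount {
    /** Each element is the session ID recorded with one event, whatever its type. */
    static int countSessions(List<String> sessionIdPerEvent) {
        Set<String> unique = new HashSet<>(sessionIdPerEvent);
        return unique.size();
    }

    public static void main(String[] args) {
        // Five events (hits) from three sessions (visitors).
        List<String> events = Arrays.asList("s1", "s1", "s2", "s3", "s2");
        System.out.println("sessions: " + countSessions(events));
    }
}
```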

Most Viewed & Edited Pages
This widget looks at all the viewed/edited pages and compiles a list of the top 10 CEOs (content entity objects, i.e. the pages themselves) for that time period. This widget is typically run without intervals (i.e. intervalCount=0). It doesn't render too well with the current themes, but the information is there.

Class: MostViewedPagesWidget
Written By: Dan Hardiker (Adaptavist)
Custom Parameters: none
Fields Outputted: List<CEOResult> (a list of the top 10 Pages)

Class: MostEditedPagesWidget
Written By: Dan Hardiker (Adaptavist)
Custom Parameters: none
Fields Outputted: List<CEOResult> (a list of the top 10 Pages)

Memory Usage Ratio
This widget was written in the last 30 minutes of Codegeist 2008 and demonstrates how you can quickly add a new processing widget (the atomic commit revision should be useful for coders wanting to explore). The information outputted is the average free, max and total memory across the interval.

Class: MemoryUsageWidget
Written By: Dan Hardiker (Adaptavist)
Custom Parameters: none yet
Fields Outputted: Free (int), Max (int), Total (int)
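
The three figures correspond to what java.lang.Runtime reports for the JVM. As a hedged sketch of the idea (the class and method names are invented, and the plugin's own sampling and SQL will differ), this takes memory samples and averages them across an interval:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: sample JVM memory and average the samples for an interval. */
final class MemoryUsageSketch {
    /** Returns {free, max, total} in bytes, as reported by the running JVM. */
    static long[] sample() {
        Runtime rt = Runtime.getRuntime();
        return new long[] {rt.freeMemory(), rt.maxMemory(), rt.totalMemory()};
    }

    /** Averages each of the three columns over all samples (zeros if there are none). */
    static long[] average(List<long[]> samples) {
        long[] avg = new long[3];
        if (samples.isEmpty()) {
            return avg;
        }
        for (long[] s : samples) {
            for (int i = 0; i < 3; i++) {
                avg[i] += s[i];
            }
        }
        for (int i = 0; i < 3; i++) {
            avg[i] /= samples.size();
        }
        return avg;
    }

    public static void main(String[] args) {
        List<long[]> samples = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            samples.add(sample()); // in the plugin, one sample would arrive with each event
        }
        long[] avg = average(samples);
        System.out.println("free=" + avg[0] + " max=" + avg[1] + " total=" + avg[2]);
    }
}
```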

Example Usages:

Want to put the read:write ratio output covering all the content in your system on a page? Easy!

{statsDataWidget}

Would you prefer to count the number of sessions you've ever had?

{statsDataWidget:widget=ConcurrentSessionsWidget}

Prefer to break down the sessions per day?

{statsDataWidget:widget=ConcurrentSessionsWidget|intervalType=6|intervalCount=1}

Prefer to break down the read:write ratio per month, formatting the interval time (if there is more than one) in "April '08" style, shown horizontally?

{statsDataWidget:widget=ReadWriteRatioWidget|intervalType=2|intervalCount=1|intervalDateFormatter=MMMM ''yy|theme=horizontal}

Want to put the above into a line chart?

{chart:type=line}
{statsDataWidget:widget=ReadWriteRatioWidget|intervalType=2|intervalCount=1|intervalDateFormatter=MMMM ''yy|theme=horizontal}
{chart}

or maybe tweaked to fit my test data set and be a bit prettier:

{chart:type=line|width=650}
{statsDataWidget:intervalType=6|intervalCount=1|theme=horizontal|intervalDateFormatter=dd/MM}
{chart}

or you can now go one better and specify the interval titles yourself, and include a nice chart:

{statsDataWidget:hideIntervals=true}

{chart:type=bar}
{statsDataWidget:theme=horizontal|intervalTitles=Read:Write Ratios}
{chart}

You can even get funky memory graphs - which will eventually evolve into a proper Confluence health monitoring toolkit! Here is an example which breaks down the statistical data into daily chunks and produces a nice graph:

{chart:type=line|width=650}
{statsDataWidget:widget=MemoryUsageWidget|intervalType=6|intervalCount=1|theme=horizontal|intervalDateFormatter=dd/MM}
{chart}

You may want to consult the [documentation for the chart plugin] too.

Enjoy!

Database Tables:

The following database tables are created and used by the Statistical Analysis Plugin:

Table Name | Purpose
plugin_stats_cache | Caches the results of running the widgets. Contains no useful information outside the stats plugin.
plugin_stats_data | Every event in Confluence results in a row in this table. Events are only placed into this table when the stats plugin sees them. No retrospective data is added, so the table holds nothing from before the plugin was installed (or from periods when the plugin was disabled).
plugin_stats_rpt_pagestats | The space report is generated from this table. It contains retrospective data from both imports. Rows are added at the same time as inserts to plugin_stats_data. Because of the Confluence Data Retrospective import, the data in this table may not be consistent with plugin_stats_data, as it will contain data pulled from Confluence from before the plugin was installed.
plugin_stats_report_pageoverview | Superseded by plugin_stats_rpt_pagestats and no longer used. May be removed.

Future Plans:

This plugin is growing on a weekly basis. Here is what we're looking to implement (in no particular order):

  • Background report creation (pre-caching of data)
  • Space tabs with the information you find in the Activity plugin
  • Report Widgets and Macros for them (using {chart} works well enough atm)
  • An option not to generate data and to only use the cache?
  • More system or non-event state information - such as the number of threads used, and a popular one: page generation time.
  • Get more events into Confluence (e.g. on RSS read) and improve the data available through the events (e.g. set the request & response on ServletActionContext in events originating outside of xwork - like attachment downloads)
  • Add some analytics which can identify wiki patterns (such as wiki gnomes, heavy users etc)
