Here's a little servlet I wrote that pulls the Purelytics useraction data from Elasticsearch, parses it into a tree structure, and spits out some JSON. The JSON is then converted into a Sankey diagram, similar to Google Analytics' user flow report.
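To give an idea of the tree-building step, here is a minimal sketch, not the attached servlet's actual code. It assumes each visit has already been reduced to an ordered list of action names; the class and method names (VisitFlowSketch, Node, addVisit, toJson) and the JSON field names are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VisitFlowSketch {

    /** One step in the flow tree: an action name plus the number of visits that reached it. */
    static final class Node {
        final String action;
        int visits;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String action) {
            this.action = action;
        }
    }

    /** Adds one visit (its actions in order) to the tree, bumping the count along the path. */
    static void addVisit(Node root, List<String> actions) {
        Node current = root;
        current.visits++;
        for (String action : actions) {
            current = current.children.computeIfAbsent(action, Node::new);
            current.visits++;
        }
    }

    /** Serializes the tree as nested JSON (hypothetical field names: name, visits, children). */
    static String toJson(Node node) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"").append(escape(node.action))
          .append("\",\"visits\":").append(node.visits)
          .append(",\"children\":[");
        boolean first = true;
        for (Node child : node.children.values()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(toJson(child));
            first = false;
        }
        return sb.append("]}").toString();
    }

    /** Minimal escaping for backslashes and double quotes in action names. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Node root = new Node("start");
        addVisit(root, Arrays.asList("home", "search", "checkout"));
        addVisit(root, Arrays.asList("home", "search"));
        addVisit(root, Arrays.asList("home", "about"));
        System.out.println(toJson(root));
    }
}
```

Each child's visit count can then serve as the width of the link from its parent when the page draws the Sankey diagram.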
The servlet is attached if you want to try this for yourself.
To run this, unpack the zip file into the webapps directory of a web container such as Tomcat or Jetty. You'll also need to copy the .jar files from the elasticsearch/lib directory into visitflow/WEB-INF/lib.
If your Elasticsearch node is on another host, edit visitflow/WEB-INF/web.xml and set the elasticserver parameter to point to your Elasticsearch node.
Edit: Added the ability to focus on a particular user action.
Edit 2: Added the ability to generate goal flows. Goal flows are accessed at goalflow.html.
Quite a nice way to visualize this information!
However, the current query seems to iterate over all results one by one (in batches), so it will likely become quite slow as more data is looked at, and it generates a lot of network traffic to transfer all that data. For me, for example, it could not handle a few tens of thousands of entries in a reasonable time.
You would probably need to store the data in a slightly different way so that Elasticsearch aggregations can compute the results for larger amounts of data. Storing "next"/"previous" user action information with each entry could probably make this work to some degree already.
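To illustrate the "next"/"previous" idea, here is a rough in-memory sketch. It assumes the actions of a visit are available in order at indexing time; the ActionDoc class and its field names are hypothetical, and the counting loop only stands in for what an aggregation on the action and next fields would do on the server side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TransitionSketch {

    /** A user action enriched with its neighbours within the same visit (hypothetical shape). */
    static final class ActionDoc {
        final String visitId;
        final String action;
        final String previous; // null for the first action of a visit
        final String next;     // null for the last action of a visit

        ActionDoc(String visitId, String action, String previous, String next) {
            this.visitId = visitId;
            this.action = action;
            this.previous = previous;
            this.next = next;
        }
    }

    /** Turns an ordered list of actions for one visit into documents carrying previous/next. */
    static List<ActionDoc> enrich(String visitId, List<String> actions) {
        List<ActionDoc> docs = new ArrayList<>();
        for (int i = 0; i < actions.size(); i++) {
            String previous = i > 0 ? actions.get(i - 1) : null;
            String next = i + 1 < actions.size() ? actions.get(i + 1) : null;
            docs.add(new ActionDoc(visitId, actions.get(i), previous, next));
        }
        return docs;
    }

    /** Counts "action -> next" pairs; an in-memory stand-in for a server-side aggregation. */
    static Map<String, Integer> countTransitions(List<ActionDoc> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ActionDoc doc : docs) {
            if (doc.next != null) {
                counts.merge(doc.action + " -> " + doc.next, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<ActionDoc> docs = new ArrayList<>();
        docs.addAll(enrich("v1", Arrays.asList("home", "search", "checkout")));
        docs.addAll(enrich("v2", Arrays.asList("home", "search")));
        System.out.println(countTransitions(docs));
    }
}
```

With the neighbours stored on each entry, the transition counts no longer require fetching every hit to the client; only the aggregated pairs would need to travel over the network.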
Thanks for your comment, Dominik.
I totally agree. It becomes too slow when the query set becomes larger. I guess a solution to this would be to build the next/previous data set using entity-centric indexing: