In this article we explore how AI is shaping CiviCRM. We look at a number of approaches - from SearchKit assistants to local models. At the heart of this is the big question: how can we reap the potential benefits of AI without putting sensitive data at risk?
At the recent CiviCon in San Francisco, there were a few discussions about how we might increase the use of AI in CiviCRM. San Francisco seemed like the right place to be talking about this! For example, on my first day there, I came across a disturbing and provocative billboard that read "Stop Hiring Humans"! AI already seemed to be part of the city, with Waymo driverless taxis cruising around and robots making your coffee. For the record: Waymo was actually a really capable driver, but the robot was nowhere near as good as a human barista on just so many human levels! It really brought home to me that AI could be better used to make us more productive, without replacing actual humans.
So what might AI look like in CiviCRM? And is it even likely to happen any time soon? Well, we already have DocBot helping you find out how to do things based on the documentation. At the Sprint we started improving the documentation with the help of the popular ChatGPT, which is really good at summarising docs, ordering things nicely and giving writing a consistent tone of voice that matches your desired style (we did produce the style guide for it). As the documentation improves, this should make DocBot even better.
Insights and agents
The next steps then seem to be in two main areas: 1) using some kind of AI to produce insights from the data within the system and 2) giving DocBot (or similar) some level of agency so it doesn’t just tell you how to build a SearchKit but can actually build it for you (referred to as agentic AI).
In this article I’m going to focus on the insights we can obtain from the data held, as this is something we're working on at Circle and it feels like we can make some pretty quick progress. We recently did a project that involved using an external tool to build data visualisation dashboards with CiviCRM data, and our key concerns were data security and our trust in that system. Considering the scope and potential of the uses of data by AI, these concerns have to be at the forefront of our minds when we consider letting an AI near a CRM database. Do we really trust that other system?
B(I) before A(I)
In the case of the BI tools used for building the visualisation dashboards, we had to go through a process known as ETL (Extract, Transform and Load) and in the process we removed names, phone numbers, addresses and any obvious PII. We think it's important that something similar happens before any AI is allowed access to the data. This also has the advantage of producing a simpler data structure, which could make it easier for the AI to extract information.
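To make the transform step concrete, here is a minimal sketch in Python. It is not the code from our project: the field names are hypothetical, and it uses an allow-list (keep only fields known to be safe) rather than a list of fields to remove, on the assumption that it is safer for an unexpected new field to be dropped than to leak through.

```python
from typing import Any

# Hypothetical allow-list: only these fields survive the transform step.
# Everything else (names, phone numbers, addresses, emails) is dropped.
SAFE_FIELDS = ("contact_id", "membership_type", "join_year", "total_donated")


def transform(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one extracted contact row with only the safe fields."""
    return {field: row.get(field) for field in SAFE_FIELDS}


def etl(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Transform every extracted row; loading into the reporting store is left out."""
    return [transform(row) for row in rows]
```

The contact ID is kept deliberately so that results can be traced back inside CiviCRM, which means the output is better described as partially anonymised than as anonymous.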
Alain Benbassat from Business and Code has done some really interesting work on an ETL process based on using CiviCRM as a potential visualisation platform. Taking advantage of entities and SearchKit, his code creates a star schema in CiviCRM with the advantage that the interface is familiar and roles can be easily transferred. Having CiviCRM cid references means it’s also easy to construct links back to actual contacts in order to view further details or mark them in some way based on the patterns from the anonymised database.
We also think that people will probably want to use different AIs depending on their purposes. There are many options available now and this number is very likely to increase. Therefore, we think the best option is to enable one connection to a safe set of partially anonymised data on the database. Following strict authentication, Claude, GPT et al could use this data without any risk to sensitive client information. The end user can make a query using a prompt - for example, “Hey GPT, can you tell me what my membership renewal looks like next year if we increase our fees by X%?”, or “Give me a list of members most at risk of not renewing”.
I have used terms like safe and anonymised but if we pass through contact IDs, those can be taken back into CiviCRM to reveal the actual individuals. Actions could be defined, such as putting this set of contacts into a group for further processing or comms.
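As a rough illustration of that round trip, the sketch below takes risk scores keyed by contact ID (as an AI might return them from the partially anonymised data) and prepares one request per contact to add them to a group. The function names, the threshold and the scores are hypothetical. The payload shape is modelled on what we would expect an APIv4 GroupContact create call to take, so treat it as an assumption and check it in your own site's API Explorer before relying on it. Nothing is sent anywhere; the code only builds the payloads.

```python
def contacts_at_risk(scores: dict[int, float], threshold: float = 0.7) -> list[int]:
    """Return the contact IDs whose risk score is at or above the threshold."""
    return sorted(cid for cid, score in scores.items() if score >= threshold)


def group_contact_requests(contact_ids: list[int], group_id: int) -> list[dict]:
    """Build one create-style payload per contact (assumed shape, not sent)."""
    return [
        {"values": {"group_id": group_id, "contact_id": cid, "status": "Added"}}
        for cid in contact_ids
    ]


# Example: scores returned by the AI, keyed by contact ID only.
at_risk = contacts_at_risk({101: 0.9, 102: 0.2, 103: 0.7})
payloads = group_contact_requests(at_risk, group_id=5)
```

Only someone with the right permissions inside CiviCRM can turn those IDs back into people, which is the point of keeping the names out of the data the AI sees.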
Keeping it in-house
Another approach that we’re exploring involves not letting external AIs anywhere near the data and getting any input from a self-hosted LLM that we’ve trained on CiviCRM’s data structures. At Circle we already operate something like this which helps us with support tickets. It reads the incoming ticket and suggests similar tickets and documentation that may help our team deal with the issue more efficiently. It already works pretty well and we can see that it is capable of much more.
So currently we are looking at how we connect this tool to a CiviCRM database and get it to run in the background with some pre-determined prompts so that it produces useful insights. In this case the prompts might be along the lines of, “Highlight unusual trends in membership, donations and event registrations and summarise in no more than 3 sentences for each notable pattern”. The prompts would evolve over time based on user feedback but would be more curated than the first example where we imagine giving the user full access to ask whatever questions they come up with.
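A background job of this kind could be sketched roughly as follows. This is an illustration under assumptions rather than our actual tool: `generate` stands in for whatever call the self-hosted model exposes, and the summary is assumed to be a set of pre-aggregated monthly totals rather than raw records.

```python
from typing import Callable

# Curated prompts maintained by us, not typed in by the end user.
CURATED_PROMPTS = [
    "Highlight unusual trends in membership, donations and event registrations "
    "and summarise in no more than 3 sentences for each notable pattern.",
]


def build_prompt(instruction: str, summary: dict[str, list[int]]) -> str:
    """Combine a curated instruction with aggregated monthly totals."""
    lines = [
        f"{name}: {', '.join(str(value) for value in values)}"
        for name, values in sorted(summary.items())
    ]
    return instruction + "\n\nMonthly totals:\n" + "\n".join(lines)


def run_insights(
    summary: dict[str, list[int]],
    generate: Callable[[str], str],
    prompts: list[str] = CURATED_PROMPTS,
) -> list[str]:
    """Run every curated prompt through the model and collect the answers."""
    return [generate(build_prompt(prompt, summary)) for prompt in prompts]
```

Passing the model in as a function keeps the job independent of any one LLM, and evolving the prompts based on user feedback then only means editing the list.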
In this case, the AI is operated by us and as we are already processing that data, no additional agreements should be needed. The trust issue should dissipate as we are already the trusted guardians of the source. Also, because we don’t need to train our model on the entirety of human knowledge, it doesn’t need so much processing power. In fact, our current ticketing model runs on a modest virtual machine.
Next steps
We hope to have some progress on both these approaches for the October CiviCamp in the Netherlands. We’re also hoping that the documentation push will continue there.
If you have any questions or thoughts on any of this or if you’d like to be involved in any way, we’d love to discuss it. There is no such thing as a stupid question and it’s mainly through discussions around new ideas that we make breakthroughs. As we make any concrete progress we’ll make further announcements. We very much hope to show something in October.