Briefcase — Blog
Posts
💼 How will multimodal AI transform the accounting industry?

💼 How will multimodal AI transform the accounting industry?

PLUS: Xero announces an AI assistant, Shadow AI infiltrates the workplace and Embedded Accounting raises millions.

May 28, 2024

Hey there!

Welcome to Briefed, your go-to source for the latest Accounting x AI news. In today’s post, we cover Xero’s new AI assistant, the rise of shadow AI, embedded accounting, and the potential for multimodal AI to transform the industry.

Let’s dive right in…

[Read time: 5 minutes]

📰 Accountech news

Xero joins the party with AI copilot JAX

After Quickbooks announced Intuit Assist in September last year and Sage announced Sage Copilot in February (both still in early access), Xero has finally announced its own contender, “Just Ask Xero” or JAX.

The release video was very marketing-y, so we aren’t reading into it too much. One thing that did stand out was an emphasis on JAX's ability to complete tasks in addition to collating and reporting on data. Examples included editing and sending an invoice or running payroll. The beta version of JAX is coming later in 2024.

Another thing caught my eye — is this a new Xero dashboard UI in the works?

Just Ask Xero

Are you employees entering sensitive data to ChatGPT?

Employees are jumping the gun with AI adoption.

A recent report by Cyberhaven highlights that the use of AI in workplaces has surged, with employees increasingly inputting sensitive data into AI tools. From March 2023 to March 2024, the amount of corporate data entered into AI tools increased by 485%. Notably, 27.4% of this data is sensitive, up from 10.7% a year earlier. Despite the availability of secure corporate AI accounts, 73.8% of ChatGPT usage in workplaces involves personal accounts, which lack proper security measures.

This rapid, unsanctioned adoption, termed “shadow AI,” poses significant risks to data security.

🚀 Startup corner

The hottest topic in accounting tech: embedded ledger software

VC money has been funnelled into embedded accounting in the past weeks with two notable rounds:

Teal raised an $8m seed round to build the “Stripe for Accounting”
Layer raised a $2.3m pre-seed round to “disrupt QuickBooks’ moat.”

Teal’s website

Layer’s website

Teal and Layer both sell to developers who want to integrate accounting software into their existing products. For example, many new e-commerce companies start by using Shopify. Imagine if the Shopify platform also offered accounting so that customers could manage their books in the same place they manage their sales. I like the idea in theory, but I’m curious about their strategy for getting accountants on board. There's no point in creating new ledger software if there are no trained professionals willing to use it, especially in the SMB market where most accounting is outsourced rather than done in-house.

🤖 AI debrief

Multimodal is the new norm

Recent AI advancements have made headlines with the introduction of Gemini 1.5 and GPT-4o. These updates highlight a significant trend: multimodal capabilities are becoming the new standard in AI.

Gemini 1.5, Google’s flagship AI foundation model, was released in February this year with two standout features:

Multimodality across text and images
A context window of over 1 million tokens

This allows the AI to complete some pretty impressive tasks, such as identifying a specific scene in a movie from a hand-drawn image. In the past two weeks, Google announced an update to Gemini 1.5 pro which will introduce audio as a new modality. Currently, audio needs to be transcribed into text (speech-to-text) for the AI to process it. However, when the new update goes public, Gemini will be able to process audio natively and concurrently alongside text and images.

GPT-4o was released a couple of weeks ago by OpenAI with significant improvements in multimodal capability. First, GPT-4o will soon be able to process audio in addition to images and text (much like Gemini) and will have more efficient image-processing capabilities to better understand and generate responses based on visual inputs. More importantly, the AI also has better contextual understanding when dealing with multimodal inputs. For example, it can integrate and correlate information from text, images and audio to provide more coherent and contextually accurate responses. Here’s one of my favourite examples — an AI maths tutor:

Oh, and the GPT-4o API is 50% cheaper than its predecessor, GPT-4. A cheaper API means cheaper AI products for everyone, which is welcome news.

What does this mean for accountants?

“Multimodal” refers to AI’s ability to handle multiple types of data simultaneously. Traditional AI models focus mainly on text, but real-world applications often need a mix of text and images (and sometimes even audio).

This is especially true in the accounting industry. Think about the variety of data formats that get inputted into ledger software: Images, PDFs, and CSVs, just to name a few. More often than not, accountants are faced with a combination of the above, such as an email with an image attachment and text body saying, “This invoice has been paid”. AI can now both analyse the image and read the accompanying text simultaneously in order to process the invoice appropriately.

We believe the capability of multimodal AI will facilitate a step-change in efficiency for accountants utilising AI-native products. For example, rather than using OCR technology to scan a receipt, leaving a bookkeeper to review and post it manually, AI will read, understand, and process the receipt from start to finish.

That’s all for today! Subscribers will receive new posts directly in their inbox every other Tuesday. In the meantime, feedback is always welcome—hit reply and let me know what you think.

Until next time 🫡

Reuben