How Generative AI Changes the Role of the Analyst - a conversation with Tom Davenport

By Jim Sterne on May 14, 2023 in Articles by Jim Sterne

Recently (Wednesday, May 3, 2023), Tom Davenport and I had a super interesting discussion about how Generative AI is changing what the analyst does for a living. This field is changing so fast that this conversation has a limited lifespan... so, I'm publishing it here. You'll be able to listen to the excitement in our voices when it gets posted to Data Driven Leaders Studio Public Conversations.

Jim and Tom

Note: Sign up (for free for now) to get notified and to see previous conversations - and to join us every other Wednesday as we tackle new topics.

We talked about the potential of AI in text, image, and video creation, as well as its impact on productivity and competitive advantage - and considered data quality and organization as a competitive AI moat. We talked about the potential for stakeholders to ask better questions and make better decisions based on data, and the switch from Prompt Engineering to systems prompting users in a conversational way; a new, *new* user interface.

Note: I used Otter.ai to create the transcript and then (heavily) edited it by hand for language processing issues, length, and readability. I spent far too long trying to find an AI to do it for me, but gave up: so many tools, so little time, and the quality of the output was not there. So, for now, all typos are my own.

Jim Sterne

Welcome to Data Driven Leaders Studio Public Conversation number seven on Wednesday, May 3, 2023. I'm Jim Sterne, founder of the Marketing Analytics Summit and the Digital Analytics Association, and author of Artificial Intelligence for Marketing, among others. My partner in crime is Tom Davenport, author of 23 books on data and analytics, Distinguished Professor of Information Technology and Management at Babson College, and senior adviser to Deloitte's AI Practice. Our intent here is to discuss all things data analytics and AI strategy with you, our audience. If there's a topic you'd like us to cover, just email me at [email protected]. Today, we're talking about Generative AI and data democratization.

<banter>

Jim

What is the impact on citizen data scientists and citizen analysts? Tom, this is the topic of your upcoming book, so set it up for us. What is your definition of data democratization? What do you mean when you say citizen analysts?

Tom

The citizen idea is that you can do data related tasks, data science related tasks, development related tasks, and automation related tasks, without being an IT professional.

Jim

I'm thinking of the people who are responsible for making decisions, whom the data people have been supporting. Whom do you democratize first? What criteria do you use for prioritizing training and tools? Are there criteria other than just looking for data enthusiasts?

Tom

It's been a volunteer related activity - it's whoever raises their hands. You have to be curious and technology oriented and a logical thinker. Motivation is a pretty important criterion.

Jim

I think we're going to see a giant shift because, with chat-based, interactive user interfaces, the person who is logical and business oriented and asks good questions no longer has to be interested in the tech. They don't have to understand the data. I'm assuming that the first impact that it's going to have on analysts is a giant diminishment in ad hoc questions; the, "Hey, I'm just curious about this thing." Because now businesspeople will be able to simply ask the system, "What are our top five highest margin products sold in the Northwest in the last four months? What's the projection for the next three months? And how does that compare year-over-year? And email me the results once a month?" And that's programming, right?

Tom

Yes, it is. You can generate whatever code you want, the points and clicks necessary to make this happen, pivot tables, etc. But - back to your issue about what will differentiate organizations in this space - that only works if you have clarity about who your customer is. What do we mean by the Northeast, etc.? If you have multiple definitions of the truth, ChatGPT is not going to be able to sort through them.
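
Note: To make Tom's point concrete, here is a minimal Python sketch of the kind of code such a question might turn into. Everything in it is an assumption made for illustration: the `REGIONS` data dictionary, the sample `SALES` rows, and the `top_margin_products` function are hypothetical, and the list of states is only one possible definition of "Northeast". The point is that the question can only be answered once a term like "Northeast" has a single agreed definition.

```python
from collections import defaultdict

# Hypothetical data dictionary: one agreed definition per business term.
# This list of states is an assumption; an organization may define it differently.
REGIONS = {
    "Northeast": {"ME", "NH", "VT", "MA", "RI", "CT", "NY", "NJ", "PA"},
}

# Hypothetical sales rows: (product, state, revenue, cost)
SALES = [
    ("Widget", "NY", 100.0, 60.0),
    ("Widget", "MA", 50.0, 30.0),
    ("Gadget", "NJ", 200.0, 180.0),
    ("Gizmo", "CA", 500.0, 100.0),  # outside the region, so it is ignored
    ("Doohickey", "PA", 80.0, 20.0),
]


def top_margin_products(sales, region, n=5):
    """Return up to n (product, margin) pairs for a region, highest margin first."""
    # Raises KeyError when the term has no agreed definition in the dictionary.
    states = REGIONS[region]
    revenue = defaultdict(float)
    cost = defaultdict(float)
    for product, state, rev, cst in sales:
        if state in states:
            revenue[product] += rev
            cost[product] += cst
    # Margin is (revenue - cost) / revenue, computed per product.
    margins = {
        p: (revenue[p] - cost[p]) / revenue[p]
        for p in revenue
        if revenue[p] > 0
    }
    return sorted(margins.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Calling `top_margin_products(SALES, "Northeast")` ranks only the products sold in the listed states. Asking for a region that is not in the dictionary fails with a `KeyError` instead of producing a guess.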

Jim

Well, when you add the messiness of data to Davenport's Law of Common Definitions, which I'm going to ask you to repeat right now.

Tom

Davenport's Law of Common Information suggests that the more an organization knows or cares about a particular business entity, the less likely it is to agree on a common term and meaning for it. It's my favorite cynical law.

Jim

My favorite example of yours is the airline that has 24 definitions for an airport, because it means different things to different departments.

Tom

Only eleven! But still scary.

Jim

We've always known that data quality matters. Now we need data dictionaries for the AI system, so that when we say Northeast, it knows what we mean. I'm going to quote Scott Brinker's recent newsletter:

The range of questions anybody in business will immediately be able to self-serve will be stunning, because access to all the content on the open web will be commoditized by foundational models such as GPT-4. Competitive advantage will go to those with unique and proprietary data. Everybody will have the same AI engines. Not everybody will have the same data to feed them. Their growth will be more nurture than nature.

Your own internal data is crucial but I contend that there's another advantage that Scott didn't cover and that's the ability to ask a good question. I don't just mean prompt engineering. I mean, how do you get somebody to think analytically? How do you get them to think data issue thoughts, so that they ask valuable, impactful questions?

Tom

It takes some rudimentary education. If you don't know what a regression equation is, the difference between correlation and causation, or simple statistical terms, I think you're going to have a hard time even describing what you want out of your data. You need to know that there are dependent and independent variables. I suppose we could build a warning statement into ChatGPT that says, "Just because they're correlated doesn't mean there's a causal relationship" every time it expresses a statistical association. But I do think that some rudimentary training in how statistics work will be helpful.

On the other hand, I was reviewing the new Khan Academy GPT-4 add-on that's fine-tuned for educating people. There'll be some interaction between Khanmigo and your system for accessing Tableau or Qlik or DataRobot or whatever. It will probably ask you, "Are you interested in the drivers of your company's performance?" And you'll say, "What do you think might be the factors?" and it will lead you through the process. I don't think that you're going to have to know that much to be a good prompter, because I think the system is going to prompt you to create better prompts.

Jim

Let's talk about data being the new moat. Data is crucial. Your own data is the most important of all. I will make a guess - not a prediction - that data quality and observability tools can be more autonomous. Can't we just ask them to look for trends and tell them to set up alerts? Just say, "Hey, computer, keep an eye on my data and make sure it's clean"?

Tom

I think we're a ways away from that now. There are systems that can tell you that your data is dirty and that is the state of the art at the moment. I think many of the problems occur when you're entering data into a transactional system. The system could come back to you and say, "That address doesn't make sense", or "That looks remarkably similar to this address. Is it the same one?"

Jim

I'm curious if you feel that we're going to be able to do a prompt, such as, "Design and implement a scalable and maintainable data pipeline to ingest, process, and store data from our multiple sources, including real-time streaming and batch processing, ensuring data quality and integrity and making it easily accessible for our various analytical and machine learning applications."

Tom

Wow, did you make that up? Or are you reading it from somewhere?

Jim

I just asked ChatGPT to give me a good prompt for creating a data pipeline.

Tom

Wow. Let's just say if that works then there are an awful lot of people who are going to be on the street looking for a job.

Jim

This is why my head keeps spinning. I just came across a newsletter, Ben's Bites, which talked about ChatGPT Code Interpreter:

It is able to read files, let you download files, and execute its own code. It really comes into its own when you ask it to analyze some data. Give it a big old dataset, and it will come up with hypotheses, run statistical tests, do regressions, write about its results, and even provide you with plots and visualizations, all of which you can simply download. The results are actually neat work with style and accuracy, potentially hours of data analyst work in minutes.

Tom

I knew that all of the major analytics and even machine learning tool providers were adding kind of chat-like front ends. Some people have talked about back end related things, as you suggested, like writing a report about it or summarizing it. I think that is in the fairly near future. I don't think it's necessarily the present yet for a lot of these tools. But it's pretty close.

Jim

How long until, as you suggested, the system pushes back a little bit, asking for clarifications and saying - like an analyst would - "Hey, that's an interesting question. What problem are you trying to solve? Why is that interesting to you? How else can I help you?"

Tom

It'll just be dialogue, not taking your prompt as a given.

Jim

And "Tell me more about this situation in your organization." And now all of us consultants are out of a job.

Tom

In the past, it's always taken a while before these things have massive effects on labor markets. This time, I'm not sure.

Jim

I sold Apple IIe's out of a retail store and had to explain computers to people. Then I had to explain business computers to companies that had never owned one before. Adoption was slow. There's a great diagram of the curve of exponential growth in technology, and how long it takes people to figure it out, how long it takes society, and how long it takes government.

Image: Pat Scannell

Jim

Let's roll it back to the democratization process. If I'm an individual stakeholder, I have to learn a new way to communicate with the computer. I got started on the command line, then there was the WYSIWYG mouse point-and-click, and then I had to train my brain to think, oh, you know what? I could Google that. So now it's, "How do I talk to the computer?" We need to teach all of our business stakeholders how to ask a good question. I'm not sure that they need to understand the difference between Frequentist and Bayesian or what a regression analysis is. Causality versus correlation is legit... but how the math works? I don't know how important that is anymore, as long as they understand that they need to ask a business question that they can use to make a decision.

Tom

Maybe I'm just trying to ensure continued employment for teaching professionals like me, but I think there is this idea of statistical association based on probability that's really quite important. If you give people a forecasted range, they say, "I don't want a range. Tell me the answer!" If these systems are drawing lines and curves among dots on multivariable axes and you don't have any sense of how that works and what probability is, I think you're doomed even with these tools.

Jim

I could ask, "Does that chart mean that I'm better off sending my email in the afternoon than in the morning?" And it can tell me.

Tom

Yeah, if you're doing something relatively trivial. But if you're a doctor you might want to question whether that result means my patient is going to die. The patient may want an explanation and you have to be prepared to give them one.

Jim

Yes, use case matters. Absolutely. My biggest problem is all of the articles that my friends and family keep sending me - but I've already read - talking about how awful all of this AI is. I push back with how they're using it wrong. It's not a search engine. It's an idea engine. Once I explain that it's just guessing what the next word should be and they finally wrap their heads around it, they go, "Well, in that case, it's really stupid." So, you have to have the domain knowledge to know whether the output passes the smell test.

Tom

Yeah, exactly. I used to say that AI was like analytics on steroids. And now I think Generative AI is like analytics on LSD. It's really quite astounding. But you really need to know when it will get you in trouble.

"Generative AI is like analytics on LSD" https://lexica.art/

Jim

Bard and Bing are starting to include references and footnotes for each answer. That's useful. But it's a matter of training people to ask the right question in the right way. It turns out that ChatGPT is really good at multiplying two 20-digit numbers. But if you ask it to multiply a 20-digit number and a 12-digit number, it just goes right off the rails because it doesn't have concepts. We have this interesting task of fine-tuning. Success will be when we take this general GPT-4 Large Language Model and imbue it with our internal first-party data and guardrails. Samsung just said nobody can use ChatGPT. No! That's wrong! Yes, your humans made a mistake, so you train them, you have a policy. You don't just say No.

Tom

Coming back to what we were saying before about competitive advantages, the only organization that's really given me much access to what they're doing with fine-tuning is Morgan Stanley. And they say a) it takes highly curated content - in their case, the documents that they trained it on - and b) it's technically complex. You need a data scientist. Your proprietary knowledge is going to be absolutely critical to the value of your systems. You can dramatically reduce the frequency of hallucinations when you fine-tune on your own content.

Jim

Here's a question from the audience: How are companies starting to use generative AI to create a competitive advantage? Is it just text? Or are they starting to use images, sound, et cetera?

Tom

It's very industry specific. Any industry that uses text or images or video or whatever better be looking at this stuff for competitive advantage or at least productivity anyway.

Jim

Productivity is going to be number one. Here's a use case. On a podcast six or eight months ago, Paul Daugherty, Chief Technology & Innovation Officer at Accenture, said,

We're doing interesting stuff in the HR area within Accenture. We have 400,000 employees and we're using AI in a creative way - still experimental - but we're using Machine Learning and AI based on a person's profile to understand their current job experience, their resume, and their assignments. Based on the changes in technology, it learns and recommends how soon that individual might need to change their profession, because what they're doing is becoming obsolete, and to see how long their skills will be relevant, and what they should start learning based on what they already know and where they want to go from there.

So, there's a use case of keeping track of who my people are and what they know, and how they want to grow, and making recommendations of what to learn. I think that's a competitive advantage right there.

What are the most important things an analyst needs to be aware of when democratizing this sort of user interface? The relationship used to be: I've got this data, you got this problem; let's get together. Now that stakeholders can ask their own questions, the analyst becomes a little bit more of a strategic adviser. How do you think the analysts' job changes?

Tom

I think they'll become certifiers of systems. We're going to create tons and tons of analyses, models, programs of various types, and some of them will have enterprise value. It will be incumbent upon those former analysts - in addition to making sure the data that goes into them is good - to certify this really is a good model and it works. Here are the conditions under which it might no longer work in the future. Here's who created it. Here's what prompt I used to create it; all the metadata for the model.
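
Note: As a minimal sketch of the model metadata Tom lists (who created it, what prompt was used, when it might stop working, who certified it), here is one way such a record could look in Python. The `ModelRecord` class and all of its field names are hypothetical, invented for illustration; they are not taken from any particular model management product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ModelRecord:
    """Hypothetical catalog entry describing one generated model or analysis."""

    name: str
    created_by: str   # who created it
    prompt: str       # the prompt used to create it
    # Conditions under which the model might no longer work in the future.
    known_limits: List[str] = field(default_factory=list)
    certified_by: Optional[str] = None
    certified_on: Optional[date] = None

    def certify(self, analyst: str, on: date) -> None:
        """Record that an analyst has reviewed and signed off on the model."""
        self.certified_by = analyst
        self.certified_on = on

    @property
    def is_certified(self) -> bool:
        return self.certified_by is not None


# Example usage with made-up values.
record = ModelRecord(
    name="quarterly-sales-forecast",
    created_by="a business stakeholder",
    prompt="Forecast sales for the next three months by region.",
    known_limits=["Region definitions change", "Less than two years of history"],
)
record.certify(analyst="a former analyst", on=date(2023, 5, 3))
```

A catalog of such records would let anyone check whether a model has been certified, and under what conditions, before relying on it.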

Jim

We start with the data dictionary, we have a data catalog, we're tracking data lineage. Now we need model management so that we've got a model catalog, and then we need a prompt dictionary or a prompt catalog. I say, "I'm kind of interested in a question like this," and the machine can say, "Well, here's an interesting prompt that somebody asked last week, would that be useful?"

Tom

That would be a useful thing. "People who asked this often asked this", or, "Did people who asked this really mean this?" There's also the idea of a feature store, which would have commonly used features or variables in machine learning models that are clean and reliable and ready to go.

Jim

My pre-ChatGPT brain says, oh, a drop-down menu, but my post-ChatGPT brain says that is now part of the response that the chat presents. It says, "Which kind of airport do you mean?"

Tom

Yeah, because it would know that there are 11 different meanings of the term airport.

Jim

And then say, "Did you know we also have these attributes or features or variables? Do you think this might be a more informative question to ask? It might impact these other business outcomes. Would you like for me to pursue that?"

Tom

It could say, "You've asked for a report on the past for sales in this particular quarter in this particular region. Have you thought about a prediction of what might happen in the future?" and "What factors might drive that prediction as far as you're concerned?"

Jim

As people create value on top of LLMs, how do software-as-a-service companies survive? If I can say, "Do an analysis and give me a graphic output", what happens to Tableau and Looker Studio? What happens to SAS and SPSS?

Tom

Vendors that prosper will be the ones that have the transaction data embedded in their systems. You know: the SAPs, Salesforce for CRM, Workday for HR. The statistical programming itself is quite trivial. It may be that they disappear from the scene. Everybody thinks transaction processing is less relevant, but that's where the data is stored.

Jim

And that's the AI moat; that's the gold, and the rest of this is just the sieve and the shovel and the pan. The value is now the data itself. It used to be the systems. Software as a product might not be viable anymore. I can just say, "Run payroll."

Tom

It will put a premium on ensuring that the data are entered correctly to start with, ensuring data quality. We have to keep reminding ourselves that it's not just traditional data in terms of, you know, structured rows and columns of numbers, but documents and knowledge as well; content of various types.

Jim

If your content of various types is highly curated and vetted, then it is the data itself - rather than the system - that's the most important because the system has become sophisticated enough. It can do whatever you want. But if your data is not right, you're lost.

Tom

Yeah, you're screwed.

Jim

If data itself is the moat, then how does any smaller company create a competitive advantage versus larger players like Microsoft or Google?

Tom

Jasper, for example, did not develop GPT-3 or GPT-4, but they did gather a lot of marketing-oriented content to fine-tune and train on. There will be a number of niches where you know more about - I don't know; septic tank servicing - than anybody else. You may be a relatively small organization, but you can maintain that moat by just ensuring that your content on that particular topic is really high quality.

Note: Next time (May 17, 2023) Tom and I will talk about Generative AI and the Chief Artificial Intelligence Transformation Officer - a new role that Tom and I spoke with Vipin Gopal about in Episode 3 of Data Driven Leaders Studio Public Conversations.