Large Language Models (LLMs) like ChatGPT and Claude, and Artificial Intelligence in general, are rightfully getting a serious amount of attention. Companies are investing billions of dollars in server time and engineering to compete in the “foundation model” space.

While a lot of media attention is on big investments, the fear of jobs being replaced, and how cool it is to generate art, what about business owners and enterprise leaders who want to make an impact today?

I decided to make a demo to show off some things that LLMs can do, TODAY. The point of the demo is to show some practical applications for how they can be used on your private business data, in a way that can be seen and touched and used by you. It’s easy to see results, since it is a web application; you don’t have to install any complicated software or run any code.

Private data, not generic models

All of the demos use private data. They are not searching the internet or relying on facts already embedded in the model. They show the power of reasoning, question answering, search, and summarization on your own data. I want you to see that you can take private, “messy” data (files, emails, PowerPoints) and reach sharp insights and decisions much faster than you could without LLMs.

I loaded the kinds of files you have around the office: PDFs, Excel spreadsheets, and Word docs.

https://demo.bigcloudcountry.com

The demo shows a few capabilities I want to outline.

Reasoning over private data

Reasoning is the ability to think critically and step-by-step about a problem. In the demo, we show how ChatGPT compares two basketball teams and comes up with a prediction about who would win if they played. The reasoning is laid out and transparent. It uses statistics from the NCAA website, which have been downloaded and saved in a database. So, picture your own databases being used as the source of data, and asking an LLM to reason about that data.

You give it two college basketball teams. It then pulls up a bunch of team statistics like points per game, rebound margin, bench points, and a dozen others. It also pulls individual statistics for 3 or 4 key players from those teams. The individual statistics are things like rebounds, triple doubles, 3-point percentage, free throws, and so on.

The LLM is then presented with the data and a custom prompt, shown below (prompts are very important when using LLMs, and there is a whole field of activity around “prompt engineering”).

Team stats:\nTeam,GM,AST,TO,Ratio\nUConn,34,630,335,1.88\n\n\n***\n\nTeam,BLKS,BKPG\nUConn,183,5.4\n\n\n***\n\nTeam,FGM,FGA,FG% 
\nUConn,989,1994,49.6\n\n\n***\n\nTeam,FT,FTA,FT%\nStetson,459,599,76.63\n\n\n***
\n\nTeam,REB,RPG,OPP REB,OPP RPG,REB MAR\nUConn,1308,38.5,1022,30.1,8\n\n\n***\n\nTeam,PTS,PPG,OPP PTS,OPP PPG,SCR MAR\nUConn,2770,81.5,2190,64.4,17\n\n\n***\n\nTeam,W,L,Pct\nUConn,31,3,91.2\n\n\n***\n\n\n\nPlayer Stats:\n\nName,Team,Cl,Height,Position,G,FGM,3FG,FT,PTS,PPG\nJalen Blackmon,Stetson,Jr,6-3,G,34,240,109,141,730,21.5\n\n\n***\n\nName,Team,Cl,Height,Position,G,FGM,FGA,Class,Field Goals Made,Field Goals ... [truncated]

Above is a collection of college basketball statistics for UConn and Stetson.
You are an expert basketball analyst.
Your specialty is making deeply-informed predictions about who will win games based on available recent statistics.
Available statistics may include team and player stats.
Using these statistics, predict who will win a contest between the two teams, and explain your reasoning. If player statistics are available, mention how radical the players are and whether they will impact the game.
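To make the prompt assembly concrete, here is a minimal sketch of how the stats and the instructions above could be stitched together. It is an illustration, not the demo's actual code: the function names, the SEPARATOR constant, and the way tables are passed in are all assumptions. The sample rows are the UConn figures shown in the prompt above.

```python
import csv
import io

# Assumption: tables are separated by "***", as in the prompt shown above.
SEPARATOR = "\n\n***\n\n"


def table_to_csv(header, rows):
    """Render one stats table (a header plus data rows) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().strip()


def build_prompt(team_a, team_b, team_tables, player_tables=()):
    """Put the raw stats first, then the analyst instructions."""
    sections = [
        "Team stats:\n"
        + SEPARATOR.join(table_to_csv(h, r) for h, r in team_tables)
    ]
    if player_tables:
        sections.append(
            "Player Stats:\n"
            + SEPARATOR.join(table_to_csv(h, r) for h, r in player_tables)
        )
    instructions = (
        f"Above is a collection of college basketball statistics for {team_a} and {team_b}.\n"
        "You are an expert basketball analyst.\n"
        "Your specialty is making deeply-informed predictions about who will win "
        "games based on available recent statistics.\n"
        "Available statistics may include team and player stats.\n"
        "Using these statistics, predict who will win a contest between the two "
        "teams, and explain your reasoning."
    )
    return SEPARATOR.join(sections) + "\n\n" + instructions


# Two of the team tables from the prompt above, as (header, rows) pairs.
team_tables = [
    (["Team", "GM", "AST", "TO", "Ratio"], [["UConn", 34, 630, 335, 1.88]]),
    (["Team", "W", "L", "Pct"], [["UConn", 31, 3, 91.2]]),
]
prompt = build_prompt("UConn", "Stetson", team_tables)
print(prompt)
```

Putting the data first and the instructions last mirrors the prompt above, where the instructions begin with “Above is a collection of…”.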

The stats and this prompt are sent to GPT-4, which has the most advanced reasoning. They are also sent with a high “temperature”, a setting that controls how much randomness (and so how much apparent creativity) the LLM puts into its answer.
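If you are curious what temperature does under the hood, here is a small sketch of the standard idea: the model's raw scores for candidate next words are divided by the temperature before being turned into probabilities. The scores below are made up for illustration, and a hosted model's internals may differ in the details.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words.
scores = [2.0, 1.0, 0.5]

low = softmax_with_temperature(scores, 0.2)   # sharp: top choice dominates
high = softmax_with_temperature(scores, 1.5)  # flat: other choices get a real chance

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At a low temperature nearly all of the probability lands on the top-scoring word, so answers are more repeatable. At a high temperature the probabilities even out, so the model picks less likely words more often.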

The results are something like this:

AI-generated reasoning over records and statistics that are private to you

First-round upsets happen from time to time, but GPT-4 thinks UConn will prevail based on the provided stats. The point is, if you need reasoning and analysis done on a bunch of data in Excel spreadsheets or databases, the cost and wait time drop sharply when you use an LLM.

Question answering

This demo allows you to ask a free-form question about March Madness. The context data is Wikipedia articles summarizing the tournaments going back about 20 years. Picture a big pile of Microsoft Word documents or PDFs that normally you’d have to search one by one. In fact, the March Madness articles were saved as PDFs, then read by our systems, and finally stored in a special type of database called a vector database. A vector database stores text in chunks (anything from single sentences to full paragraphs or whole documents), and each chunk is represented as a vector: a big list of numbers like [1.23476, -0.43987, 5.98734, …] that captures the meaning of the text in a form that software can compare mathematically.
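Here is a rough sketch of the “chunk it, then turn each chunk into a vector” step. One loud caveat: toy_embed below is a stand-in I made up so the example runs without any outside service. It just hashes words into a short list of numbers and does not capture meaning. A real system would call a trained embedding model at that point. The function names and the 16-number vector size are also assumptions.

```python
import hashlib
import math
import re


def split_into_chunks(text):
    """Split on blank lines so each paragraph becomes one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def toy_embed(text, dimensions=16):
    """Placeholder embedding: hashed word counts, scaled to unit length.

    This only shows the shape of the data (text in, list of numbers out).
    """
    vector = [0.0] * dimensions
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dimensions] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


document = (
    "John Saunders and Bob Ley served as studio hosts.\n"
    "\n"
    "Dick Vitale served as studio analyst."
)

# Each stored record keeps the original text next to its vector.
store = [
    {"text": chunk, "vector": toy_embed(chunk)}
    for chunk in split_into_chunks(document)
]

for record in store:
    print(record["text"], "->", [round(v, 2) for v in record["vector"][:4]], "...")
```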

Once you have your PDFs, Word documents, or Excel spreadsheets represented in a vector database, you can do a very effective kind of search called semantic search. Semantic search, as opposed to keyword search, lets you look up source documents based on meaning, not keywords. That means you can just say what you want and not worry about keyword matching (we’ve all done awkward Google searches trying to guess what will match).

After you submit a question, your question is converted into a vector. Then the vector database is searched for the stored chunks whose vectors are closest to your question’s vector. The text stored in the vector database has all sorts of metadata attached to improve its chances of an accurate match. For example, this chunk of March Madness history …

ESPN and NCAA Productions
John Saunders (NCAA Tournament Today) and Bob Ley (NCAA Tournament Tonight) served as studio hosts and Dick Vitale served as studio analyst.
Mike Gorman and Ron Perry – first round (Temple–Lehigh, Georgia Tech–Iowa State) at Hartford, Connecticut
Bob Carpenter and Dan Belluomini – first round (Indiana–Richmond, Georgetown–LSU) at Hartford, Connecticut
Ralph Hacker and Bucky Waters – first round (Duke–Boston University, Missouri–Rhode Island) at Chapel Hill, North Carolina
Bob Rathbun and Dan Bonner – first round (Syracuse–North Carolina A&T, SMU–Notre Dame) at Chapel Hill, North Carolina
Fred White and Larry Conley – first round (Oklahoma–Chattanooga, Louisville–Oregon State) at Atlanta, Georgia
Mike Patrick and Bob Ortegel – first round (Brigham Young–Charlotte, Auburn–Bradley) at Atlanta, Georgia
Tom Hammond and Mike Pratt – first round (Kentucky–Southern, Illinois–UTSA) at Cincinnati, Ohio
Mick Hubert and Jack Givens – first round (Villanova–Arkansas, Maryland–UC Santa Barbara) at Cincinnati, Ohio

… has this metadata attached to it:

{
  "metadata": {
    "datePublished": "2006-03-13T20:21:25Z",
    "dateModified": "2024-02-03T19:00:46Z",
    "image": "https://upload.wikimedia.org/wikipedia/en/7/74/1988_Final_Four_logo.png",
    "headline": "United States top collegiate-level basketball tournament for 1988; 50th Anniversary of the NCAA Tournament",
    "url": "https://en.wikipedia.org/wiki/1988_NCAA_Division_I_men%27s_basketball_tournament"
  },
  "questions_this_excerpt_can_answer": "1. Who were the broadcasters for the 1988 NCAA Division I men's basketball tournament, and where did they cover the various rounds of the tournament?\n2. What were the locations for the different rounds of the 1988 NCAA Division I men's basketball tournament, including the Final Four?\n3. What other basketball tournaments took place in 1988 alongside the NCAA Division I men's tournament, and what were their respective divisions and genders?",
  "section_summary": "The key topics and entities of the section include the 1988 NCAA Division I men's basketball tournament, the locations for the different rounds of the tournament, the broadcasters covering the various rounds, the Final Four in Kansas City, Missouri, and other basketball tournaments that took place in 1988 alongside the NCAA Division I men's tournament, such as the Division II and Division III tournaments, as well as women's basketball tournaments and other invitation tournaments. The section also mentions the studio hosts and analysts for the tournament broadcasts.",
  "excerpt_keywords": "1988, NCAA Division I, men's basketball tournament, Final Four, broadcasters"
}
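The lookup described above (turn the question into a vector, then find the closest stored chunks) can be sketched in a few lines. This is an illustration under assumptions, not the demo's code: the three-number vectors are invented by hand, the second and third records are hypothetical, and cosine similarity is one common way to measure closeness that a given vector database may or may not use.

```python
import math


def cosine_similarity(a, b):
    """Score from -1 to 1; higher means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, records, top_k=2):
    """Return the top_k (score, record) pairs, best match first."""
    scored = [(cosine_similarity(query_vector, r["vector"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hand-made vectors for illustration; real ones come from an embedding model.
records = [
    {
        "vector": [0.9, 0.1, 0.0],
        "metadata": {
            "url": "https://en.wikipedia.org/wiki/1988_NCAA_Division_I_men%27s_basketball_tournament",
            "excerpt_keywords": "1988, NCAA Division I, men's basketball tournament, Final Four, broadcasters",
        },
    },
    {"vector": [0.1, 0.9, 0.2], "metadata": {"excerpt_keywords": "hypothetical chunk B"}},
    {"vector": [0.0, 0.2, 0.9], "metadata": {"excerpt_keywords": "hypothetical chunk C"}},
]

# Pretend this is the vector for a question about the 1988 broadcasters.
query_vector = [0.8, 0.2, 0.1]

results = search(query_vector, records, top_k=2)
for score, record in results:
    print(round(score, 3), record["metadata"]["excerpt_keywords"])
```

Because each result carries its metadata along, the application can show the source URL next to the answer.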

A really important point here is that I don’t assume people know exactly what they want to find. So, the demo includes some recommendations about what kinds of things to ask. These recommendations were actually generated by ChatGPT while the articles were being processed.

Recommended questions get you going from scratch, and they were generated by ChatGPT as a way to make it easier to find content
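One plausible way to turn stored questions into a recommendation list is sketched below. It assumes the recommendations come from a numbered-list field like the questions_this_excerpt_can_answer metadata shown earlier. That link is my assumption for the sketch, and the function names are hypothetical.

```python
import re


def extract_questions(numbered_text):
    """Pull the questions out of text like '1. First?\\n2. Second?'."""
    questions = []
    for line in numbered_text.splitlines():
        match = re.match(r"\s*\d+\.\s+(.*\S)", line)
        if match:
            questions.append(match.group(1))
    return questions


def recommended_questions(chunks, limit=5):
    """Collect unique questions across chunks, keeping first-seen order."""
    seen = set()
    picks = []
    for chunk in chunks:
        text = chunk.get("questions_this_excerpt_can_answer", "")
        for question in extract_questions(text):
            if question not in seen:
                seen.add(question)
                picks.append(question)
    return picks[:limit]


# Field value shortened from the metadata example shown earlier.
chunks = [
    {
        "questions_this_excerpt_can_answer": (
            "1. Who were the broadcasters for the 1988 NCAA Division I men's basketball tournament, "
            "and where did they cover the various rounds of the tournament?\n"
            "2. What were the locations for the different rounds of the 1988 NCAA Division I men's "
            "basketball tournament, including the Final Four?"
        )
    }
]

suggestions = recommended_questions(chunks)
for question in suggestions:
    print("-", question)
```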

Next Steps

Do you want to explore how to use powerful AI to make an impact at your business? Big Cloud Country has starter kits that get you up and running quickly so you see impact fast. Contact me and let’s start building.
