GenAI and Data Analytics
Introduction: Large language models (LLMs) are the heavyweights of generative AI (GenAI). They can handle a variety of tasks, and with the advent of multimodal AI, they can both generate and interpret images. Important aspects include:
Speed: GenAI has made most administrative tasks and coding much faster.
Education: GenAI can serve as a tutor for any topic, including data analysis and coding. Khan Academy's Khanmigo individualizes tutoring for K-12 students. There is no reason this could not be expanded to colleges and to graduate and professional schools.
Code generation and debugging
Explainable insights from text, images, audio and video
Generate synthetic data
Create dashboards and a variety of reports
Create data from images and videos
Content creation
Brainstorming
Graphic designs
Web scraping
Perform supervised and unsupervised learning
Anomaly detection
Perform descriptive, diagnostic, predictive and prescriptive analytics
Examples of AI analytics companies: Polymer, Tableau, Altair RapidMiner, Datalab, DataRobot, Sprout Social, Microsoft Power BI, Salesforce AI, Qlik, H2O.ai, Clarifai, and Dataiku
Future of AI Analytics:
Advanced simulations: AI can test thousands of simulations and is central to digital twins
Real-time problem detection: By leveraging the Internet of Things (IoT), edge computing, and live streaming, AI can detect problems before humans realize they exist. ICU medicine is already moving in this direction.
Embedded Analytics: Models embedded seamlessly in services and products monitor them continuously and autonomously, without user intervention.
Prescriptive Analytics: This approach is advanced and rarely taken. With AI-recommended solutions, multiple options can be analyzed for improved outcomes.
AI Analytics with Tabular Data
Essentially, every modern (frontier) LLM is capable of analyzing data. Just upload a CSV or Excel file and give it a standard prompt, such as "summarize this dataset," and most will do a good job. The most common programming language for analyzing data with LLMs is Python. Most LLMs display the code for each step and often give you the option to copy it. Why? Because most cannot render data visualizations such as a box plot or scatter plot themselves. The user has to copy the code and paste it into a programming notebook such as Jupyter Notebook or Google Colab.
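To make this concrete, here is a minimal sketch of the kind of Python code an LLM might return for a "summarize this dataset" request, ready to paste into a notebook. It assumes pandas and matplotlib are installed. The sample data, the column names (age, systolic_bp, group), the function names, and the output file name are all hypothetical and stand in for your own uploaded file.

```python
import io
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # write plots to files; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """age,systolic_bp,group
34,118,control
51,131,treated
46,127,control
29,112,treated
62,140,control
60,134,treated
"""


def summarize(df):
    """Return descriptive statistics (count, mean, std, quartiles) for numeric columns."""
    return df.describe()


def plot_overview(df, value_col, x_col, y_col, out_path):
    """Save a box plot of value_col and a scatter plot of y_col against x_col."""
    out_path = Path(out_path)
    fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

    # Box plot: distribution of a single numeric column.
    ax_box.boxplot(df[value_col].dropna())
    ax_box.set_title(f"Box plot of {value_col}")
    ax_box.set_ylabel(value_col)

    # Scatter plot: relationship between two numeric columns.
    ax_scatter.scatter(df[x_col], df[y_col])
    ax_scatter.set_title(f"{y_col} vs. {x_col}")
    ax_scatter.set_xlabel(x_col)
    ax_scatter.set_ylabel(y_col)

    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path


# For a real file, replace the next line with: df = pd.read_csv("your_file.csv")
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(df)
print(summary)

plot_path = plot_overview(df, "systolic_bp", "age", "systolic_bp", "overview.png")
print(f"Plots saved to {plot_path}")
```

In a notebook, the figure displays inline if you remove the matplotlib.use("Agg") line and the savefig call. The text column (group) is skipped by describe() because only numeric columns are summarized by default.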
If you don't need data visualizations and are happy with descriptive statistics and simple models, standard LLMs such as GPT-4o, Claude 3.5 Sonnet, Mistral, etc., are adequate. The programs that will generate data visualizations without the need to paste code into a notebook (as of February 2025) are Vizly, Julius AI, and GPT-4o with the advanced data analysis feature that comes with an OpenAI subscription. There are a few newer programs that I consider beta at this point.
In another section, I will discuss use cases based on Gemini embedded in Google Sheets and Vizly.

