GenAI and Data Analytics

Introduction: Large language models (LLMs) are the heavyweights of generative AI (GenAI). They can handle a wide variety of tasks, and with the advent of multimodal AI, they can both generate and interpret images. Important aspects include:

  • Speed: GenAI has made most administrative tasks and coding much faster

  • Education: GenAI can serve as a tutor for any topic, including data analysis and coding. Khan Academy's Khanmigo individualizes tutoring for K-12 students. There is no reason this could not be expanded to colleges and to graduate and professional schools.

  • Code generation and debugging

  • Explainable insights from text, images, audio and video

  • Generate synthetic data

  • Create dashboards and a variety of reports

  • Create data from images and videos

  • Content creation

  • Brainstorming

  • Graphic designs

  • Web scraping

  • Perform supervised and unsupervised learning

  • Anomaly detection

  • Perform descriptive, diagnostic, predictive and prescriptive analytics

Examples of AI analytics companies and products: Polymer, Tableau, Altair RapidMiner, Datalab, DataRobot, Sprout Social, Microsoft Power BI, Salesforce AI, Qlik, H2O.ai, Clarifai, and Dataiku

Future of AI Analytics:

  • Advanced simulations: AI can test thousands of simulations and is central to digital twins

  • Real-time problem detection: By leveraging the Internet of Things (IoT), edge computing, and live streaming, AI can detect problems before humans realize they exist. ICU medicine is already moving in this direction.

  • Embedded Analytics: Models embedded seamlessly in services and products monitor them continuously and automatically, without human intervention

  • Prescriptive Analytics: This advanced approach is still rarely used. AI can recommend solutions and analyze multiple options to find the one most likely to improve outcomes.

AI Analytics with Tabular Data

Essentially, every modern (frontier) LLM is capable of analyzing data. Just upload a CSV or Excel file and give it a standard prompt, such as "summarize this dataset," and most will do a good job. The most common programming language for analyzing data with LLMs is Python. Most LLMs display the code for each step, and they often give you the option to copy it. Why? Because most cannot render data visualizations such as a box plot or scatter plot themselves. The user has to copy the code and paste it into a programming notebook such as Jupyter Notebook or Google Colab.
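
To give a feel for what sits behind a prompt like "summarize this dataset," here is a minimal sketch of the kind of Python an LLM might show you. In practice most LLMs reach for the pandas library; this version uses only the Python standard library so it runs anywhere. The sample data, column names, and the summarize function are hypothetical and chosen purely for illustration.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """age,systolic_bp
34,118
51,131
47,126
62,142
29,112
58,139
"""


def summarize(csv_text):
    """Return descriptive statistics for every numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except ValueError:
            continue  # skip columns that are not numeric
        # Quartiles: the same cut points a box plot is usually drawn from.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[column] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
        }
    return summary


summary = summarize(SAMPLE_CSV)
for column, stats in summary.items():
    print(column, stats)
```

Pasting this into a Jupyter Notebook or Google Colab cell prints one line of statistics per numeric column. To analyze your own file, read its contents into a string and pass that to summarize in place of SAMPLE_CSV.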

If you don't need data visualizations and are happy with descriptive statistics and simple models, standard LLMs such as GPT-4o, Claude 3.5 Sonnet, Mistral, etc., are adequate. The programs that will generate data visualizations without the need to paste code into a notebook (as of February 2025) are GPT-4o with the Advanced Data Analysis feature available with an OpenAI subscription, Vizly, and Julius AI. There are a few newer programs that I consider beta at this point.
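
As an illustration of what a "simple model" can mean here, the sketch below fits a straight line to two columns by ordinary least squares, which is one of the simplest models an LLM might propose. The data values and the fit_line helper are hypothetical, and a real LLM session would more likely use a library such as scikit-learn or statsmodels.

```python
# Hypothetical paired observations (not taken from any real dataset).
ages = [29, 34, 47, 51, 58, 62]
systolic_bp = [112, 118, 126, 131, 139, 142]


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, systolic_bp)
print(f"systolic_bp ~ {slope:.2f} * age + {intercept:.2f}")

# Use the fitted line to estimate the value for a new input.
predicted = slope * 40 + intercept
print(f"Estimated value at age 40: {predicted:.1f}")
```

The fitted slope and intercept are the whole model: a prediction for a new input is just the slope times the input plus the intercept.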

In another section, I will discuss use cases based on Gemini embedded in Google Sheets and Vizly.