Business Analytics (BUSI4720)
Background
BUSI4720 is a course in the undergraduate Bachelor of Commerce (BComm) degree at Memorial University of Newfoundland. I developed the course initially for in-class delivery and subsequently for online delivery. It was first taught in the winter semester of 2024 (January to April, in-class) and then in the fall semester of 2024 (September to December, online). The course is a practical introduction to the very broad area of business analytics, data science, big data analytics, machine learning, and related disciplines. Consistent with my stated teaching philosophy, it is based on experiential learning, and students are required to use a number of software packages for course assignments.
Course Goals and Learning Objectives
The following goals and objectives are defined for this course. By the end of the course, students will be able to:
- List and explain core statistical, computational, and mathematical concepts in business analytics and demonstrate corresponding techniques.
- Specify data requirements and evaluate the quality and suitability of various, possibly heterogeneous, data sources for different analytics techniques and analytics problems.
- Evaluate the applicability of different descriptive, predictive, and prescriptive business analytics techniques to a variety of business problems.
- Apply descriptive, predictive, and prescriptive business analytics methods to identify and solve business problems.
- Present and interpret analysis results for use in decision making.
- Identify and explain potential ethical and legal problems with different applications of business analytics.
- Specify appropriate limitations and governance/oversight mechanisms for business analytics to address ethical and legal concerns.
Course Textbook
I chose to develop my own textbook for this course because I could not find a single textbook that covers the required material. This course is a required core course in the Bachelor of Commerce program at Memorial University of Newfoundland, Canada. As students take only a single course in business analytics, and this course is in the fourth and final year of the program, the coverage is intentionally broad and includes aspects that may fall outside some narrower conceptions of analytics. Additionally, students taking the course generally have little to no exposure to computer applications or statistical software. This necessitates a comprehensive approach that not only introduces computer and programming basics, such as the data and data types that students may encounter in business analytics, but also introduces R and Python and briefly covers relational and graph databases, which are typically not considered part of business analytics. On the other hand, the course also contains advanced topics, such as interpretable machine learning, analytics at industrial scale, reinforcement learning, and MLOps, that are not usually found in a business analytics course. These topics are gaining importance, and it is essential that students have at least some exposure to them. In summary, the textbook was written because no other single book offers the necessary broad perspective.
The textbook is written from an applied perspective. I believe that students, even business students, should not only be able to talk about analytics but must also be able to do analytics. This means that, together with the concepts, every chapter also contains R or Python code showing how the concepts can be applied. Looking at this from the other side, I believe that students must not rely solely on software tools and statistical libraries; it is crucial that they understand, at least in principle and at an intuitive level, how these tools work. This understanding is necessary to use tools in an informed way, to select the appropriate tool for a given situation, and to be aware of the shortcomings, drawbacks, boundary conditions, and other limitations of tools. In short, the formulas in this book explain what happens "behind the scenes" of the code, and the code shows how the formulas can be applied; both are necessary.
How to Use the Textbook
For instructors, the book is designed for a semester of 24 classes of 75 minutes each, with two classes dedicated to mid-term exams; the chapter on visualization should be covered in two classes. If time is short, some of the later, more advanced chapters could be omitted, for example, the two chapters on reinforcement learning and/or the chapter on MLOps. A slide set for 22 classes is available, as is a bank of multiple-choice questions for each chapter, for example for quizzes. Each chapter also contains a set of short hands-on exercises that can be used during class to keep students engaged or can form the basis for a computer lab. A set of example exam questions is also available. Given the extensive online material on programming in general, and on data science and data analytics in particular, ranging from the traditional Stack Overflow site (https://stackoverflow.com/), to Google and YouTube, to ChatGPT and other large language models (LLMs), it is easy for students to complete any technical homework assignment or course project using such tools. Instructors who wish to set such assessment or evaluation exercises at all should therefore focus on the interpretation of data and results and use new or unpublished data sets. Consequently, the example exam questions are long-answer questions that focus on conceptual understanding of the material rather than on technical programming skills.
Using R and Python
The book focuses on R and Python rather than on commercially available tools for several reasons. First, the use of open-source software makes the material more easily accessible to students, independent of the availability of campus-wide licenses or the use of limited "evaluation" licenses for some commercial tools. A second reason is the cross-platform nature of these software tools. Computing hardware in practice, and in the classroom, is a heterogeneous mix of different chip sets (Intel, Apple/ARM) and different operating systems (Windows, macOS, Linux, etc.), so it is essential to work with software tools that are available and interoperable across these hardware and operating system platforms. A third reason is that R and Python are widely used in production environments. They tend to be more flexible than commercial offerings and are also at the forefront of new developments in business analytics. New methods and techniques are typically implemented directly by their inventors in open-source libraries and packages for R or Python before they mature and are included in commercial offerings. The focus on command-line tools avoids the complexities of graphical user interfaces, which tend to change more rapidly than application programming interfaces (APIs), and keeps attention on the essentials rather than on graphical environments. Scripting with command-line tools also generally leads to better replicability of analyses and easier integration into production environments. For example, while it is all well and good to explore customer purchasing predictions on a small data set using the desktop edition of SPSS (a commercial, graphical statistics software application), implementing real-time dynamic pricing in a global web-based ordering system will require the model to be implemented and integrated with very different tools.