Lab: AI Systems in the Wild
Unit 1: Foundations of Artificial Intelligence — Lab Assignment
You have spent the week reading and thinking about AI. Now it is time to observe AI systems directly and analyze what you find. This assignment asks you to select three AI-powered systems, test them systematically, and produce a professional analysis report.
Lab Objectives
By completing this lab, you will be able to:
- Identify which of the four definitional approaches (acting humanly, thinking humanly, thinking rationally, acting rationally) a real AI system most closely follows
- Document AI system capabilities and limitations through structured testing
- Conduct basic bias and fairness testing on AI systems
- Analyze the privacy and data practices of AI-powered products
- Synthesize findings into a professional comparative analysis
This lab draws directly on concepts from all five content sections of Unit 1:
- The four approaches to defining AI (Section 1.1)
- The distinction between narrow AI and AGI (Section 1.3)
- Real-world application domains (Section 1.4)
- Algorithmic bias, privacy, and ethics frameworks (Section 1.5)
Review those sections if you need a refresher before beginning.
Assignment Overview
Total Points: 100
Estimated Time: 4-6 hours
Format: Written report (8-12 pages, not counting screenshots)
Submission: Upload to Brightspace by the deadline posted in the course schedule
What to Submit
A single PDF document organized into four parts:
1. System selection and rationale (brief; establish which three systems you chose and why)
2. Testing documentation (the bulk of the report; 45 points)
3. Ethical assessment (30 points)
4. Professional synthesis (25 points)
Part 1: Selecting Your Three AI Systems
Choose three publicly accessible AI-powered systems. Suggested options, grouped by category:
- Conversational AI: ChatGPT, Claude, Gemini, Microsoft Copilot
- Image generation: DALL-E, Midjourney, Stable Diffusion (via a web interface)
- Image recognition: Google Lens, Apple Visual Look Up, Clarifai demo
- Search AI: Perplexity, Google AI Overviews, Bing AI
- Recommendation systems: a streaming service (Netflix, Spotify, YouTube); document and analyze its recommendations
- Translation: DeepL, Google Translate, Microsoft Translator
- Writing assistance: Grammarly, Hemingway Editor, any LLM-based tool
Diversity requirement: Your three systems should come from at least two different application categories (e.g., not three chatbots).
Part 2: Testing Documentation (45 Points)
For each of your three systems, complete the following testing protocol. Each system is worth 15 points.
System Classification (5 points per system)
Identify which of the four AI approaches the system most closely follows: acting humanly, thinking humanly, thinking rationally, or acting rationally. Provide specific evidence — screenshots and concrete examples — that support your classification. A system may blend approaches; explain which dominates and why.
Capability Testing (4 points per system)
Design and run three tests that probe what the system does well. For each test, document:
- What you asked or submitted (screenshot or verbatim input)
- The system’s response (screenshot)
- Your assessment of performance quality and why
Limitation Testing (4 points per system)
Design and run three tests that expose genuine limitations. Effective limitation tests try to push the system outside its training distribution:
- Ask a chatbot about a very recent event it may not have data on
- Submit an unusual or ambiguous image to a vision system
- Ask a translation system to handle regional slang or code-switching
- Ask a system a question that requires common-sense reasoning it likely lacks
For each test, document the input, output, and your analysis of why the system failed — not just that it did.
Narrow AI Evidence (2 points per system)
Explicitly demonstrate that the system is narrow AI. Find at least one task that appears related but that the system handles poorly or refuses — evidence of the specialization limits that distinguish narrow AI from AGI.
Effective testing strategy:
- Start with tests the system was designed for: this establishes a baseline of capability.
- Then vary the inputs systematically: change the domain, the language, the complexity, the ambiguity.
- For bias testing (Part 3), create parallel inputs that differ only in demographic markers.
- Document everything with screenshots. Your reader should be able to reproduce every test.
- Analyze patterns, not just individual results.
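If it helps you stay organized, the variation strategy above can be sketched as a short script that generates one test prompt per variation axis. The base task and the variation lists here are illustrative placeholders, not required wording; substitute whatever fits the system you are testing:

```python
# Sketch: generate systematically varied test inputs from one base task.
# BASE_TASK and VARIATIONS are illustrative placeholders -- substitute
# the task and the axes appropriate to the system you are testing.

BASE_TASK = "Summarize the following passage in two sentences."

VARIATIONS = {
    "domain": ["a news article", "a legal contract", "a physics abstract"],
    "language": ["standard English", "regional slang", "code-switched text"],
    "complexity": ["one short paragraph", "three dense paragraphs"],
    "ambiguity": ["clearly worded text", "text with contradictory claims"],
}

def build_test_inputs(base_task, variations):
    """Return (dimension, variant, prompt) triples, one per variant."""
    tests = []
    for dimension, variants in variations.items():
        for variant in variants:
            tests.append((dimension, variant, f"{base_task} Source: {variant}."))
    return tests

for dimension, variant, prompt in build_test_inputs(BASE_TASK, VARIATIONS):
    print(f"[{dimension}] {prompt}")
```

Keeping the base task fixed while you vary one dimension at a time makes it easy to attribute a failure to a specific kind of input, which is exactly the pattern-level analysis your report should demonstrate.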
Part 3: Ethical Assessment (30 Points)
Bias and Fairness Testing (10 points)
Design tests that look for differential treatment based on demographic characteristics.
For conversational AI: ask the same question using names associated with different demographic groups. For image recognition: test with images of people from different racial backgrounds, genders, and age groups. For recommendation systems: consider whether the recommendations you receive seem to reflect assumptions about your identity.
Document your methodology, results, and analysis of real-world implications for any bias you detect.
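One way to keep parallel inputs truly parallel is to generate them from a single template, substituting only the demographic marker. The template and the names below are illustrative placeholders; choose markers relevant to the demographic dimension you actually want to test:

```python
# Sketch: build parallel bias-test prompts that differ only in a name.
# TEMPLATE and NAMES are illustrative placeholders -- pick markers for
# the demographic dimension you actually want to test.

TEMPLATE = ("{name} applied for a small-business loan. "
            "Write a one-paragraph assessment of the application.")
NAMES = ["Emily", "Lakisha", "Wei", "José", "Fatima"]

def parallel_prompts(template, names):
    """One prompt per name; everything except the name is held constant."""
    return [(name, template.format(name=name)) for name in names]

for name, prompt in parallel_prompts(TEMPLATE, NAMES):
    print(prompt)
```

Because every prompt is identical except for the name, any systematic difference in tone, length, or risk language across the responses is a finding you can document and analyze rather than an artifact of inconsistent inputs.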
Privacy and Data Use Analysis (10 points)
Read the privacy policy and terms of service for at least one of your three systems. Document:
- What data the system collects (inputs, outputs, usage patterns, account information)
- How data is used (training, advertising, third-party sharing)
- How long data is retained and whether it can be deleted
- Whether the data practices align with what users likely expect
Provide specific quotes from the policy to support your analysis. Identify at least two concrete privacy risks associated with the system.
Broader Societal Impact (10 points)
For one of your three systems, analyze its broader societal implications:
- Benefits: Identify at least three concrete benefits, with specific stakeholder groups who gain.
- Risks/concerns: Identify at least three concrete concerns, including at least one from Section 1.5 (bias, privacy, job displacement, misinformation, power concentration).
- Balanced assessment: Does the evidence suggest benefits outweigh risks, or vice versa? Acknowledge the complexity and avoid oversimplification.
- Recommendations: Suggest two or three practical steps the system’s developers could take to reduce risks while preserving benefits.
Part 4: Professional Synthesis (25 Points)
Executive Summary (5 points)
Write a 200-300 word summary of your key findings across all three systems. Identify your most important takeaway about current AI capabilities and limitations. Name the one ethical concern from your testing that you consider most significant and explain why.
Comparative Analysis (10 points)
Create a table comparing all three systems across:
- Which AI approach it uses
- Key capabilities (2-3 per system)
- Key limitations (2-3 per system)
- Primary ethical concern
Following the table, write 2-3 paragraphs identifying patterns: What do your three systems have in common? Where do they differ? What does this tell you about the current state of AI?
Connect your observations to the four AI approaches from Section 1.1 — which approaches seem most common in deployed systems, and why do you think that is?
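If you draft your report in Markdown before converting it to PDF, a few lines of code can render your comparison data as a table ready to paste in. The rows below are placeholders, not findings:

```python
# Sketch: render comparison data as a Markdown table for the report.
# The row contents are placeholders -- fill in your own findings.

HEADERS = ["System", "AI approach", "Key capabilities",
           "Key limitations", "Primary ethical concern"]
ROWS = [
    ["System A", "acting rationally", "…", "…", "…"],
    ["System B", "acting humanly", "…", "…", "…"],
    ["System C", "acting rationally", "…", "…", "…"],
]

def to_markdown(headers, rows):
    """Build a pipe-delimited Markdown table from headers and rows."""
    lines = ["| " + " | ".join(headers) + " |",
             "|" + "---|" * len(headers)]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)

print(to_markdown(HEADERS, ROWS))
```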
User Recommendations (5 points)
Write practical guidance for a non-technical user considering using one of your three systems. Be specific and evidence-based:
- How can a user recognize when the system is performing well vs. poorly?
- What types of tasks should users be cautious about delegating to this system?
- How can a user identify potential bias in its outputs?
- What steps can a user take to protect their privacy?
Personal Reflection (5 points)
Write a 200-300 word reflection connecting your hands-on testing experience to concepts from the course. Questions to address:
- Did your testing confirm or challenge anything you read this week?
- What surprised you most?
- Did testing the systems change how you feel about using them yourself?
Use specific examples from your testing to illustrate your points.
Scoring Guide
| Section | Points | Quick Reference |
|---|---|---|
| Testing Documentation (3 systems × 15 pts) | 45 | System classification + capability testing + limitation testing + narrow AI evidence |
| Ethical Assessment | 30 | Bias testing (10) + privacy analysis (10) + societal impact (10) |
| Professional Synthesis | 25 | Executive summary (5) + comparative analysis (10) + user recommendations (5) + reflection (5) |
| Total | 100 | |
| Score | Grade | Description |
|---|---|---|
| 90-100 | A | Exceptional work. Thorough testing, insightful analysis, strong ethical assessment, excellent synthesis. Portfolio-quality. |
| 80-89 | B | Strong work. Good testing and analysis. Solid ethical assessment. Good synthesis with minor gaps. |
| 70-79 | C | Adequate work meeting basic requirements. Some areas need more depth. |
| 60-69 | D | Significant gaps. Superficial testing or weak analysis. |
| Below 60 | F | Incomplete or serious deficiencies. Does not meet requirements. |
Potential Deductions
- Poor formatting or unprofessional presentation: -3 points
- Missing or poorly labeled screenshots: -3 to -5 points
- Significantly under length (under 6 pages): -5 points
- Missing citations (privacy policies, external sources): -2 points
- Poor writing quality (numerous errors): -3 points
Bonus Extensions (Up to 10 Points)
Optional extensions for students who want to go deeper:
Cross-system challenge (5-10 pts): Design a single task and test all three systems on it. Compare their approaches, outputs, and failures. Analyze what the differences reveal about their architectures.
Historical context (5-10 pts): Research the history of one of your chosen technologies. Connect its development to the AI history timeline from Section 1.2.
Industry analysis (5-10 pts): Investigate the company behind one of your systems. What AI ethics framework, if any, do they claim to follow? Compare their stated values against the behavior you observed in testing.
Bias deep dive (5-10 pts): Conduct a comprehensive bias audit using an established framework (Google PAIR Guidebook, Microsoft Responsible AI toolkit, or the NIST AI RMF’s MAP function). Produce portfolio-quality documentation.
Bonus points are awarded only for truly exceptional additional work.
Academic Integrity
This assignment is individual work. Violations result in a deduction of 50 to 100 points plus an academic integrity referral.
Submission Instructions
1. Compile your report as a single PDF document.
2. Use clear headings matching the four-part structure above.
3. Label all screenshots with the system name, date, and what you were testing.
4. Submit via the Brightspace assignment link for "Week 1 Lab: AI Systems in the Wild."

Questions? Post in the course Q&A forum or bring them to office hours.
Original content for CSC 114: Artificial Intelligence I, Central Piedmont Community College.
This work is licensed under CC BY-SA 4.0.