Lab: AI Systems in the Wild
Unit 1: Foundations of Artificial Intelligence — Lab Assignment
You have spent the week reading and thinking about AI. Now it is time to observe AI systems directly and analyze what you find. This assignment asks you to select three AI-powered systems, test them systematically, and produce a professional analysis report.
Lab Objectives
By completing this lab, you will be able to:
- Identify which of the four definitional approaches (acting humanly, thinking humanly, thinking rationally, acting rationally) a real AI system most closely follows
- Document AI system capabilities and limitations through structured testing
- Conduct basic bias and fairness testing on AI systems
- Analyze the privacy and data practices of AI-powered products
- Synthesize findings into a professional comparative analysis
This lab draws directly on concepts from all five content sections of Unit 1:
- The four approaches to defining AI (Section 1.1)
- The distinction between narrow AI and AGI (Section 1.3)
- Real-world application domains (Section 1.4)
- Algorithmic bias, privacy, and ethics frameworks (Section 1.5)
Review those sections if you need a refresher before beginning.
Assignment Overview
Total Points: 100
Estimated Time: 4-6 hours
Format: Written report (8-12 pages, not counting screenshots)
Submission: Upload to Brightspace by the deadline posted in the course schedule
What to Submit
A single PDF document organized into four parts:
1. System selection and rationale (brief; establish which three systems you chose and why)
2. Testing documentation (the bulk of the report; 45 points)
3. Ethical assessment (30 points)
4. Professional synthesis (25 points)
Part 1: Selecting Your Three AI Systems
Choose three publicly accessible AI-powered systems. Suggested options, grouped by category:
- Conversational AI: ChatGPT, Claude, Gemini, Microsoft Copilot
- Image generation: DALL-E, Midjourney, Stable Diffusion (via a web interface)
- Image recognition: Google Lens, Apple Visual Look Up, Clarifai demo
- Search AI: Perplexity, Google AI Overviews, Bing AI
- Recommendation systems: a streaming service (Netflix, Spotify, YouTube); document and analyze its recommendations
- Translation: DeepL, Google Translate, Microsoft Translator
- Writing assistance: Grammarly, Hemingway Editor, any LLM-based tool
Diversity requirement: Your three systems should come from at least two different application categories (e.g., not three chatbots).
Part 2: Testing Documentation (45 Points)
For each of your three systems, complete the following testing protocol. Each system is worth 15 points.
System Classification (5 points per system)
Identify which of the four AI approaches the system most closely follows: acting humanly, thinking humanly, thinking rationally, or acting rationally. Provide specific evidence — screenshots and concrete examples — that support your classification. A system may blend approaches; explain which dominates and why.
Capability Testing (4 points per system)
Design and run three tests that probe what the system does well. For each test, document:
- What you asked or submitted (screenshot or verbatim input)
- The system’s response (screenshot)
- Your assessment of performance quality and why
Limitation Testing (4 points per system)
Design and run three tests that expose genuine limitations. Effective limitation tests try to push the system outside its training distribution:
- Ask a chatbot about a very recent event it may not have data on
- Submit an unusual or ambiguous image to a vision system
- Ask a translation system to handle regional slang or code-switching
- Ask a system a question that requires common-sense reasoning it likely lacks
For each test, document the input, output, and your analysis of why the system failed — not just that it did.
Narrow AI Evidence (2 points per system)
Explicitly demonstrate that the system is narrow AI. Find at least one task that appears related but that the system handles poorly or refuses — evidence of the specialization limits that distinguish narrow AI from AGI.
Effective testing strategy:
- Start with tests the system was designed for: this establishes a baseline of capability.
- Then vary the inputs systematically: change the domain, the language, the complexity, the ambiguity.
- For bias testing (Part 3), create parallel inputs that differ only in demographic markers.
- Document everything with screenshots. Your reader should be able to reproduce every test.
- Analyze patterns, not just individual results.
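If it helps you stay organized, the variation strategy above can be sketched as a short script that generates one test prompt per variation axis. The base task and the variation lists here are illustrative placeholders, not required wording; substitute whatever fits the system you are testing:

```python
# Sketch: generate systematically varied test inputs from one base task.
# BASE_TASK and VARIATIONS are illustrative placeholders -- substitute
# the task and the axes appropriate to the system you are testing.

BASE_TASK = "Summarize the following passage in two sentences."

VARIATIONS = {
    "domain": ["a news article", "a legal contract", "a physics abstract"],
    "language": ["standard English", "regional slang", "code-switched text"],
    "complexity": ["one short paragraph", "three dense paragraphs"],
    "ambiguity": ["clearly worded text", "text with contradictory claims"],
}

def build_test_inputs(base_task, variations):
    """Return (dimension, variant, prompt) triples, one per variant."""
    tests = []
    for dimension, variants in variations.items():
        for variant in variants:
            tests.append((dimension, variant, f"{base_task} Source: {variant}."))
    return tests

for dimension, variant, prompt in build_test_inputs(BASE_TASK, VARIATIONS):
    print(f"[{dimension}] {prompt}")
```

Keeping the base task fixed while you vary one dimension at a time makes it easy to attribute a failure to a specific kind of input, which is exactly the pattern-level analysis your report should demonstrate.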
Part 3: Ethical Assessment (30 Points)
Bias and Fairness Testing (10 points)
Design tests that look for differential treatment based on demographic characteristics.
For conversational AI: ask the same question using names associated with different demographic groups. For image recognition: test with images of people from different racial backgrounds, genders, and age groups. For recommendation systems: consider whether the recommendations you receive seem to reflect assumptions about your identity.
Document your methodology, results, and analysis of real-world implications for any bias you detect.
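One way to keep parallel inputs truly parallel is to generate them from a single template, substituting only the demographic marker. The template and the names below are illustrative placeholders; choose markers relevant to the demographic dimension you actually want to test:

```python
# Sketch: build parallel bias-test prompts that differ only in a name.
# TEMPLATE and NAMES are illustrative placeholders -- pick markers for
# the demographic dimension you actually want to test.

TEMPLATE = ("{name} applied for a small-business loan. "
            "Write a one-paragraph assessment of the application.")
NAMES = ["Emily", "Lakisha", "Wei", "José", "Fatima"]

def parallel_prompts(template, names):
    """One prompt per name; everything except the name is held constant."""
    return [(name, template.format(name=name)) for name in names]

for name, prompt in parallel_prompts(TEMPLATE, NAMES):
    print(prompt)
```

Because every prompt is identical except for the name, any systematic difference in tone, length, or risk language across the responses is a finding you can document and analyze rather than an artifact of inconsistent inputs.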
Privacy and Data Use Analysis (10 points)
Read the privacy policy and terms of service for at least one of your three systems. Document:
- What data the system collects (inputs, outputs, usage patterns, account information)
- How data is used (training, advertising, third-party sharing)
- How long data is retained and whether it can be deleted
- Whether the data practices align with what users likely expect
Provide specific quotes from the policy to support your analysis. Identify at least two concrete privacy risks associated with the system.
Broader Societal Impact (10 points)
For one of your three systems, analyze its broader societal implications:
- Benefits: Identify at least three concrete benefits, with specific stakeholder groups who gain.
- Risks/concerns: Identify at least three concrete concerns, including at least one from Section 1.5 (bias, privacy, job displacement, misinformation, power concentration).
- Balanced assessment: Does the evidence suggest benefits outweigh risks, or vice versa? Acknowledge the complexity and avoid oversimplification.
- Recommendations: Suggest two or three practical steps the system’s developers could take to reduce risks while preserving benefits.
Part 4: Professional Synthesis (25 Points)
Executive Summary (5 points)
Write a 200-300 word summary of your key findings across all three systems. Identify your most important takeaway about current AI capabilities and limitations. Name the one ethical concern from your testing that you consider most significant and explain why.
Comparative Analysis (10 points)
Create a table comparing all three systems across:
- Which AI approach it uses
- Key capabilities (2-3 per system)
- Key limitations (2-3 per system)
- Primary ethical concern
Following the table, write 2-3 paragraphs identifying patterns: What do your three systems have in common? Where do they differ? What does this tell you about the current state of AI?
Connect your observations to the four AI approaches from Section 1.1 — which approaches seem most common in deployed systems, and why do you think that is?
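If you draft your report in Markdown before converting it to PDF, a few lines of code can render your comparison data as a table ready to paste in. The rows below are placeholders, not findings:

```python
# Sketch: render comparison data as a Markdown table for the report.
# The row contents are placeholders -- fill in your own findings.

HEADERS = ["System", "AI approach", "Key capabilities",
           "Key limitations", "Primary ethical concern"]
ROWS = [
    ["System A", "acting rationally", "…", "…", "…"],
    ["System B", "acting humanly", "…", "…", "…"],
    ["System C", "acting rationally", "…", "…", "…"],
]

def to_markdown(headers, rows):
    """Build a pipe-delimited Markdown table from headers and rows."""
    lines = ["| " + " | ".join(headers) + " |",
             "|" + "---|" * len(headers)]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)

print(to_markdown(HEADERS, ROWS))
```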
User Recommendations (5 points)
Write practical guidance for a non-technical user considering using one of your three systems. Be specific and evidence-based:
- How can a user recognize when the system is performing well vs. poorly?
- What types of tasks should users be cautious about delegating to this system?
- How can a user identify potential bias in its outputs?
- What steps can a user take to protect their privacy?
Personal Reflection (5 points)
Write a 200-300 word reflection connecting your hands-on testing experience to concepts from the course. Questions to address:
- Did your testing confirm or challenge anything you read this week?
- What surprised you most?
- Did testing the systems change how you feel about using them yourself?
Use specific examples from your testing to illustrate your points.
Scoring Guide
| Section | Points | Quick Reference |
|---|---|---|
| Testing Documentation (3 systems × 15 pts) | 45 | System classification + capability testing + limitation testing + narrow AI evidence |
| Ethical Assessment | 30 | Bias testing (10) + privacy analysis (10) + societal impact (10) |
| Professional Synthesis | 25 | Executive summary (5) + comparative analysis (10) + user recommendations (5) + reflection (5) |
| Total | 100 | |
| Score | Grade | Description |
|---|---|---|
| 90-100 | A | Exceptional work. Thorough testing, insightful analysis, strong ethical assessment, excellent synthesis. Portfolio-quality. |
| 80-89 | B | Strong work. Good testing and analysis. Solid ethical assessment. Good synthesis with minor gaps. |
| 70-79 | C | Adequate work meeting basic requirements. Some areas need more depth. |
| 60-69 | D | Significant gaps. Superficial testing or weak analysis. |
| Below 60 | F | Incomplete or serious deficiencies. Does not meet requirements. |
Potential Deductions
- Poor formatting or unprofessional presentation: -3 points
- Missing or poorly labeled screenshots: -3 to -5 points
- Significantly under length (under 6 pages): -5 points
- Missing citations (privacy policies, external sources): -2 points
- Poor writing quality (numerous errors): -3 points
Bonus Extensions (Up to 10 Points)
Optional extensions for students who want to go deeper:
Cross-system challenge (5-10 pts): Design a single task and test all three systems on it. Compare their approaches, outputs, and failures. Analyze what the differences reveal about their architectures.
Historical context (5-10 pts): Research the history of one of your chosen technologies. Connect its development to the AI history timeline from Section 1.2.
Industry analysis (5-10 pts): Investigate the company behind one of your systems. What AI ethics framework, if any, do they claim to follow? Compare their stated values against the behavior you observed in testing.
Bias deep dive (5-10 pts): Conduct a comprehensive bias audit using an established framework (Google PAIR Guidebook, Microsoft Responsible AI toolkit, or the NIST AI RMF’s MAP function). Produce portfolio-quality documentation.
Bonus points are awarded only for truly exceptional additional work.
Academic Integrity
This assignment is individual work. Violations result in a deduction of 50 to 100 points plus an academic integrity referral.
Submission Instructions
1. Compile your report as a single PDF document.
2. Use clear headings matching the four-part structure above.
3. Label all screenshots with the system name, date, and what you were testing.
4. Submit via the Brightspace assignment link for "Week 1 Lab: AI Systems in the Wild."

Questions? Post in the course Q&A forum or bring them to office hours.
Original content for CSC 114: Artificial Intelligence I, Central Piedmont Community College.
This work is licensed under CC BY-SA 4.0.