Commit 6504744f authored by Allen Bose(UST

Add notebook

{
"cells": [
{
"metadata": {},
"cell_type": "raw",
"source": "",
"id": "924893c1049cf0c7"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-10-08T13:32:13.291888Z",
"start_time": "2025-10-08T13:32:13.282373Z"
}
},
"cell_type": "code",
"source": [
"reviews = [\n",
" # Review 1: Positive\n",
" \"I absolutely love the new QuantumX Pro camera! The picture quality is stellar and the battery life is amazing. Shipped super fast too. A++!\",\n",
"\n",
" # Review 2: Negative with specific issue\n",
" \"The SonicWave earbuds have a serious design flaw. The left earbud stopped charging after just one week. I expected better for the price. Very disappointed.\",\n",
"\n",
" # Review 3: Mixed with a question\n",
" \"The Titan smartwatch is decent. The screen is bright and the features are good, but the step counter seems inaccurate. It's off by at least 20%. Is there a way to calibrate it?\",\n",
"\n",
" # Review 4: Negative with multiple issues\n",
" \"My order for the AeroDrone was a disaster. It arrived with a broken propeller and the battery was completely dead on arrival. Customer service has been unresponsive for 3 days.\",\n",
"\n",
" # Review 5: Positive but mentions a minor issue\n",
" \"Overall, I'm happy with the PureGlow Air Purifier. It's quiet and effective. My only complaint is that the replacement filters are a bit expensive.\"\n",
"]"
],
"id": "fd312978b3783683",
"outputs": [],
"execution_count": 1
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-10-08T13:33:27.500868100Z",
"start_time": "2025-10-08T13:32:38.700941Z"
}
},
"cell_type": "code",
"source": [
"# Simple Tokenization Example\n",
"from transformers import BertTokenizer, GPT2Tokenizer\n",
"\n",
"# Load tokenizers\n",
"\n",
"gpt2_tokenizer = GPT2Tokenizer.from_pretrained(r\"C:\\Users\\281879\\Documents\\AI\\week_1_assaingmnets\\Gpt-2\")\n",
"bert_tokenizer = BertTokenizer.from_pretrained(r\"C:\\Users\\281879\\Documents\\AI\\week_1_assaingmnets\\BERT-Certi\")\n",
"\n",
"\n",
"\n",
"print(\"Review:\", reviews[2])\n",
"print(\"\\n\" + \"=\"*50)\n",
"\n",
"# Tokenize with GPT-2\n",
"gpt2_tokens = gpt2_tokenizer.tokenize(reviews[2])\n",
"print(f\"\\nGPT-2 Tokens ({len(gpt2_tokens)} tokens):\")\n",
"print(gpt2_tokens)\n",
"\n",
"# Tokenize with BERT\n",
"bert_tokens = bert_tokenizer.tokenize(reviews[2])\n",
"print(f\"\\nBERT Tokens ({len(bert_tokens)} tokens):\")\n",
"print(bert_tokens)"
],
"id": "c9812515ebc6678f",
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\281879\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Review: The Titan smartwatch is decent. The screen is bright and the features are good, but the step counter seems inaccurate. It's off by at least 20%. Is there a way to calibrate it?\n",
"\n",
"==================================================\n",
"\n",
"GPT-2 Tokens (41 tokens):\n",
"['The', 'ĠTitan', 'Ġsmart', 'watch', 'Ġis', 'Ġdecent', '.', 'ĠThe', 'Ġscreen', 'Ġis', 'Ġbright', 'Ġand', 'Ġthe', 'Ġfeatures', 'Ġare', 'Ġgood', ',', 'Ġbut', 'Ġthe', 'Ġstep', 'Ġcounter', 'Ġseems', 'Ġinaccurate', '.', 'ĠIt', \"'s\", 'Ġoff', 'Ġby', 'Ġat', 'Ġleast', 'Ġ20', '%.', 'ĠIs', 'Ġthere', 'Ġa', 'Ġway', 'Ġto', 'Ġcalibr', 'ate', 'Ġit', '?']\n",
"\n",
"BERT Tokens (44 tokens):\n",
"['the', 'titan', 'smart', '##watch', 'is', 'decent', '.', 'the', 'screen', 'is', 'bright', 'and', 'the', 'features', 'are', 'good', ',', 'but', 'the', 'step', 'counter', 'seems', 'inaccurate', '.', 'it', \"'\", 's', 'off', 'by', 'at', 'least', '20', '%', '.', 'is', 'there', 'a', 'way', 'to', 'cal', '##ib', '##rate', 'it', '?']\n"
]
}
],
"execution_count": 2
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"##### The two token lists are not identical: GPT-2 produces 41 tokens and BERT produces 44, and the splitting patterns differ as well.\n",
"##### GPT-2 prefixes a token with Ġ to show that a space precedes it in the original text, while BERT prefixes a token with ## to show that it continues the previous token, i.e. one word has been split into several pieces. The BERT tokenizer here also lowercases the text and splits punctuation into separate tokens (`%.` in GPT-2 becomes `%` and `.` in BERT).\n",
"##### The splitting patterns differ because the tokenizers use different algorithms (byte-level BPE for GPT-2, WordPiece for BERT) and were trained on different data, and each vocabulary is a trade-off between how many words it covers directly and how fast the model can run."
],
"id": "b547dc499b4309f9"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-10-08T13:35:36.200411Z",
"start_time": "2025-10-08T13:35:36.163158Z"
}
},
"cell_type": "code",
"source": [
"from dotenv import load_dotenv\n",
"import os\n",
"\n",
"# Load environment variables from the .env file\n",
"load_dotenv()\n",
"api_key = os.getenv(\"GEMINI_API_KEY\")\n"
],
"id": "10dbb1049b5d1966",
"outputs": [],
"execution_count": 7
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-10-08T13:36:03.447262Z",
"start_time": "2025-10-08T13:36:01.686080Z"
}
},
"cell_type": "code",
"source": [
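"from google import genai\n",
"from google.generativeai import configure, GenerativeModel\n",
"\n",
"# List the models available to this API key and the actions each one supports\n",
"client = genai.Client(api_key=api_key)\n",
"for m in client.models.list():\n",
"    print(m.name, m.supported_actions)\n",
"\n",
"# Placeholder model; the next cell calls configure() and replaces it before any request is made\n",
"model = GenerativeModel(model_name=\"gemini-2.5-flash-lite\")\n"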
],
"id": "cee160daa377d0d2",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"models/embedding-gecko-001 ['embedText', 'countTextTokens']\n",
"models/gemini-2.5-pro-preview-03-25 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-flash-preview-05-20 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-flash ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-flash-lite-preview-06-17 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-pro-preview-05-06 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-pro-preview-06-05 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-pro ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-exp ['generateContent', 'countTokens', 'bidiGenerateContent']\n",
"models/gemini-2.0-flash ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-001 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-exp-image-generation ['generateContent', 'countTokens', 'bidiGenerateContent']\n",
"models/gemini-2.0-flash-lite-001 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-lite ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-preview-image-generation ['generateContent', 'countTokens', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-lite-preview-02-05 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-lite-preview ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-pro-exp ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-pro-exp-02-05 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-exp-1206 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-thinking-exp-01-21 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-thinking-exp ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.0-flash-thinking-exp-1219 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-flash-preview-tts ['countTokens', 'generateContent']\n",
"models/gemini-2.5-pro-preview-tts ['countTokens', 'generateContent']\n",
"models/learnlm-2.0-flash-experimental ['generateContent', 'countTokens']\n",
"models/gemma-3-1b-it ['generateContent', 'countTokens']\n",
"models/gemma-3-4b-it ['generateContent', 'countTokens']\n",
"models/gemma-3-12b-it ['generateContent', 'countTokens']\n",
"models/gemma-3-27b-it ['generateContent', 'countTokens']\n",
"models/gemma-3n-e4b-it ['generateContent', 'countTokens']\n",
"models/gemma-3n-e2b-it ['generateContent', 'countTokens']\n",
"models/gemini-flash-latest ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-flash-lite-latest ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-pro-latest ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-flash-lite ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-flash-image-preview ['generateContent', 'countTokens']\n",
"models/gemini-2.5-flash-image ['generateContent', 'countTokens']\n",
"models/gemini-2.5-flash-preview-09-2025 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-2.5-flash-lite-preview-09-2025 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']\n",
"models/gemini-robotics-er-1.5-preview ['generateContent', 'countTokens']\n",
"models/gemini-2.5-computer-use-preview-10-2025 ['generateContent', 'countTokens']\n",
"models/embedding-001 ['embedContent']\n",
"models/text-embedding-004 ['embedContent']\n",
"models/gemini-embedding-exp-03-07 ['embedContent', 'countTextTokens', 'countTokens']\n",
"models/gemini-embedding-exp ['embedContent', 'countTextTokens', 'countTokens']\n",
"models/gemini-embedding-001 ['embedContent', 'countTextTokens', 'countTokens', 'asyncBatchEmbedContent']\n",
"models/aqa ['generateAnswer']\n",
"models/imagen-3.0-generate-002 ['predict']\n",
"models/imagen-4.0-generate-preview-06-06 ['predict']\n",
"models/imagen-4.0-ultra-generate-preview-06-06 ['predict']\n",
"models/imagen-4.0-generate-001 ['predict']\n",
"models/imagen-4.0-ultra-generate-001 ['predict']\n",
"models/imagen-4.0-fast-generate-001 ['predict']\n",
"models/veo-2.0-generate-001 ['predictLongRunning']\n",
"models/veo-3.0-generate-preview ['predictLongRunning']\n",
"models/veo-3.0-fast-generate-preview ['predictLongRunning']\n",
"models/veo-3.0-generate-001 ['predictLongRunning']\n",
"models/veo-3.0-fast-generate-001 ['predictLongRunning']\n",
"models/gemini-2.5-flash-preview-native-audio-dialog ['countTokens', 'bidiGenerateContent']\n",
"models/gemini-2.5-flash-exp-native-audio-thinking-dialog ['countTokens', 'bidiGenerateContent']\n",
"models/gemini-2.0-flash-live-001 ['bidiGenerateContent', 'countTokens']\n",
"models/gemini-live-2.5-flash-preview ['bidiGenerateContent', 'countTokens']\n",
"models/gemini-2.5-flash-live-preview ['bidiGenerateContent', 'countTokens']\n",
"models/gemini-2.5-flash-native-audio-latest ['countTokens', 'bidiGenerateContent']\n",
"models/gemini-2.5-flash-native-audio-preview-09-2025 ['countTokens', 'bidiGenerateContent']\n"
]
}
],
"execution_count": 9
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-10-08T13:36:12.466937Z",
"start_time": "2025-10-08T13:36:10.142939Z"
}
},
"cell_type": "code",
"source": [
"from google.generativeai import configure, GenerativeModel\n",
"\n",
"# Step 2: Configure with your API key\n",
"configure(api_key=api_key)\n",
"\n",
"# Step 3: Create the model\n",
"model = GenerativeModel(model_name=\"gemini-2.0-flash-exp\")\n",
"\n",
"# Step 4: Generate content with a few-shot classification prompt\n",
"response = model.generate_content(f\"\"\"Classify each review in this input as Positive, Negative or Mixed: {reviews}\n",
"Example 1:\n",
"Input: the movie is amazing\n",
"Output: Positive\n",
"Example 2:\n",
"Input: service was terrible and slow\n",
"Output: Negative\n",
"Example 3:\n",
"Input: product is okay, nothing special\n",
"Output: Mixed\"\"\")\n",
"\n",
"\n",
"# Step 5: Print the result\n",
"print(response.text)\n"
],
"id": "59e929211ee96fa6",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Here's the classification of each review from your input, based on your provided examples:\n",
"\n",
"1. **Input:** 'I absolutely love the new QuantumX Pro camera! The picture quality is stellar and the battery life is amazing. Shipped super fast too. A++!'\n",
" **Output:** Positive\n",
"\n",
"2. **Input:** 'The SonicWave earbuds have a serious design flaw. The left earbud stopped charging after just one week. I expected better for the price. Very disappointed.'\n",
" **Output:** Negative\n",
"\n",
"3. **Input:** \"The Titan smartwatch is decent. The screen is bright and the features are good, but the step counter seems inaccurate. It's off by at least 20%. Is there a way to calibrate it?\"\n",
" **Output:** Mixed\n",
"\n",
"4. **Input:** 'My order for the AeroDrone was a disaster. It arrived with a broken propeller and the battery was completely dead on arrival. Customer service has been unresponsive for 3 days.'\n",
" **Output:** Negative\n",
"\n",
"5. **Input:** \"Overall, I'm happy with the PureGlow Air Purifier. It's quiet and effective. My only complaint is that the replacement filters are a bit expensive.\"\n",
" **Output:** Mixed\n",
"\n"
]
}
],
"execution_count": 10
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-10-08T13:36:25.269513Z",
"start_time": "2025-10-08T13:36:22.731210Z"
}
},
"cell_type": "code",
"source": [
"\n",
"response = model.generate_content(f\"\"\"For each review in the input, return the product_name, issue_summary and sentiment (Positive, Negative or Mixed) as JSON.\n",
"\n",
"Examples:\n",
"Input: \"The Sony Speaker is so good, but battery drains so quickly\"\n",
"Output:\n",
"{{\n",
"\"product_name\": \"Sony Speaker\",\n",
"\"issue_summary\": \"battery drains so quickly\",\n",
"\"sentiment\": \"Mixed\"\n",
"}}\n",
"\n",
"Input: \"Movie is so good\"\n",
"Output:\n",
"{{\n",
"\"product_name\": \"N/A\",\n",
"\"issue_summary\": \"N/A\",\n",
"\"sentiment\": \"Positive\"\n",
"}}\n",
"\n",
"Now analyze these reviews:\n",
"Input: \"{reviews}\"\n",
"Output:\"\"\")\n",
"\n",
"\n",
"# Step 5: Print the result\n",
"print(response.text)\n"
],
"id": "eeccac983f0b013f",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"```json\n",
"[\n",
" {\n",
" \"product_name\": \"QuantumX Pro camera\",\n",
" \"issue_summary\": \"N/A\",\n",
" \"sentiment\": \"Positive\"\n",
" },\n",
" {\n",
" \"product_name\": \"SonicWave earbuds\",\n",
" \"issue_summary\": \"left earbud stopped charging after just one week\",\n",
" \"sentiment\": \"Negative\"\n",
" },\n",
" {\n",
" \"product_name\": \"Titan smartwatch\",\n",
" \"issue_summary\": \"step counter seems inaccurate\",\n",
" \"sentiment\": \"Mixed\"\n",
" },\n",
" {\n",
" \"product_name\": \"AeroDrone\",\n",
" \"issue_summary\": \"arrived with a broken propeller and the battery was completely dead on arrival. Customer service has been unresponsive\",\n",
" \"sentiment\": \"Negative\"\n",
" },\n",
" {\n",
" \"product_name\": \"PureGlow Air Purifier\",\n",
" \"issue_summary\": \"replacement filters are a bit expensive\",\n",
" \"sentiment\": \"Mixed\"\n",
" }\n",
"]\n",
"```\n"
]
}
],
"execution_count": 11
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-10-08T13:48:51.613267Z",
"start_time": "2025-10-08T13:48:49.018569Z"
}
},
"cell_type": "code",
"source": [
"\n",
"response = model.generate_content(f\"\"\"Analyze the following customer reviews to identify the root cause of each issue. First, state the main problem. Second, explain your reasoning in a single sentence. Let's think step by step.\n",
"For each review in \"{reviews}\", give the product name first, then the main problem, then the reasoning.\n",
"\n",
"Analysis:\n",
"Step 1: find the main problem with the product\n",
"Step 2: explain the reasoning\"\"\")\n",
"\n",
"\n",
"# Step 5: Print the result\n",
"print(response.text)\n"
],
"id": "bab82e5965d4830a",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Here's the breakdown of each review, identifying the product, main problem, and reasoning:\n",
"\n",
"* **Product:** SonicWave earbuds\n",
" * **Main Problem:** Left earbud stopped charging after one week.\n",
" * **Reasoning:** The customer states the earbud failed after only a week, expressing disappointment given the price, indicating a quality control or design flaw.\n",
"\n",
"* **Product:** Titan smartwatch\n",
" * **Main Problem:** Inaccurate step counter.\n",
" * **Reasoning:** The customer states the step counter is off by at least 20%, suggesting a calibration or sensor issue.\n",
"\n",
"* **Product:** AeroDrone\n",
" * **Main Problem:** Arrived damaged and non-functional; unresponsive customer service.\n",
" * **Reasoning:** The drone arrived with a broken propeller and a dead battery, implying damage during shipping or a faulty product, compounded by the lack of customer service, indicating a problem with both product quality and support.\n",
"\n",
"* **Product:** PureGlow Air Purifier\n",
" * **Main Problem:** Replacement filters are expensive.\n",
" * **Reasoning:** The customer expresses a complaint about the high cost of replacement filters, implying that the ongoing cost of maintaining the product is a concern.\n",
"\n"
]
}
],
"execution_count": 17
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": "",
"id": "23855b3226facfaf"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
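The notebook's markdown cell describes how GPT-2 marks a preceding space with Ġ and how BERT marks a continuation piece with ##. A minimal sketch of that idea, using token excerpts copied from the recorded output above so that no tokenizer needs to be loaded. The helper names `gpt2_words` and `bert_words` are hypothetical and not part of `transformers`; they only illustrate how the two markers can be read back into words.

```python
# Excerpts of the token lists recorded in the notebook output for reviews[2].
gpt2_tokens = ['The', 'ĠTitan', 'Ġsmart', 'watch', 'Ġis', 'Ġdecent', '.',
               'Ġto', 'Ġcalibr', 'ate', 'Ġit', '?']
bert_tokens = ['the', 'titan', 'smart', '##watch', 'is', 'decent', '.',
               'to', 'cal', '##ib', '##rate', 'it', '?']

SPACE_MARKER = '\u0120'  # the "Ġ" character GPT-2 uses for "a space came before this token"
CONTINUATION = '##'      # the prefix BERT uses for "this piece continues the previous token"


def gpt2_words(tokens):
    """Hypothetical helper: merge GPT-2 tokens into space-separated chunks."""
    words = []
    for tok in tokens:
        if tok.startswith(SPACE_MARKER):
            # Marker present: a new chunk starts here, drop the marker itself.
            words.append(tok[len(SPACE_MARKER):])
        elif words:
            # No marker: the token is glued to the previous chunk.
            words[-1] += tok
        else:
            # The very first token of the text has no preceding space.
            words.append(tok)
    return words


def bert_words(tokens):
    """Hypothetical helper: merge BERT '##' pieces back into whole tokens."""
    words = []
    for tok in tokens:
        if tok.startswith(CONTINUATION) and words:
            # Continuation piece: append it to the previous token without the prefix.
            words[-1] += tok[len(CONTINUATION):]
        else:
            words.append(tok)
    return words


print(gpt2_words(gpt2_tokens))
print(bert_words(bert_tokens))
```

In this sketch the GPT-2 chunks keep punctuation attached (for example `decent.`), because the `.` token carries no space marker, while the BERT result keeps punctuation as separate tokens and stays lowercased. Both recover `smartwatch` and `calibrate` from their pieces.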