Computer Vision in PyTorch (Part 2): Preparing Data, Training, and Evaluating Your CNN for Pneumonia Detection

In Part 1 of this tutorial series, we explored the fundamentals of Convolutional Neural Networks (CNNs) and built a complete CNN architecture using PyTorch for pneumonia detection in chest X-rays. We learned why CNNs excel at image tasks, examined each component in detail, and implemented a custom PneumoniaCNN class by taking an OOP approach and subclassing PyTorch's nn.Module class. Now it's time to bring our model to life!

In this tutorial, we'll complete our pneumonia detection system by:

- Preparing and preprocessing the chest X-ray dataset
- Training our CNN model with a complete training loop
- Evaluating model performance using metrics like precision, recall, and F1
- Interpreting evaluation results with a focus on visualizing predictions
- Addressing common CNN training issues like overfitting, underfitting, and class imbalance

By the end of this tutorial, you'll have transformed your CNN architecture into a working medical diagnostic tool and gained practical skills for implementing and evaluating deep learning models. Let's get started!

Prerequisites

Before proceeding, make sure you've:

- Read Part 1 of this tutorial series
- Installed PyTorch (follow PyTorch's official installation instructions)
- Reviewed fundamental deep learning concepts such as layers, activation functions, loss, and optimization

Below are the modules, functions, and classes we'll need for this tutorial. Be sure to install any libraries you're missing after running this code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
import matplotlib.pyplot as plt
import numpy as np
import tarfile
import os
import collections
import random
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from PIL import Image
import seaborn as sns
```

1. Preparing and Preprocessing the X-ray Image Dataset

We'll start by preparing our chest X-ray dataset. The dataset contains X-ray images of lungs classified into two categories: NORMAL and PNEUMONIA.

Downloading and Extracting the Dataset

The data for this tutorial is available for download here as a compressed tar.gz file. After downloading, you'll need to extract it to access the images:

```python
# Path to the downloaded tar.gz file
dataset_path = "xray_dataset.tar.gz"  # If saved to your current directory

# Extract the dataset
with tarfile.open(dataset_path, "r:gz") as tar:
    tar.extractall()

print("Dataset extracted successfully")
```

After extraction, you should have this directory structure:

```
chest_xray/
├── test/
│   ├── NORMAL/
│   └── PNEUMONIA/
└── train/
    ├── NORMAL/
    └── PNEUMONIA/
```

Verifying Dataset Structure and File Counts

After extracting the dataset, it's good practice to verify the contents and get a count of the image files. This ensures we're working with the correct data and helps identify potential issues early on. We'll create a small helper function to scan the train and test directories. This function will gather the file paths for all valid JPEG images and their corresponding class labels (0 for NORMAL, 1 for PNEUMONIA). Collecting these paths and labels now will also prepare us for the next step in preparing our data for training.
# Define base directories relative to your notebook/script location data_dir = "chest_xray" train_dir = os.path.join(data_dir, "train") test_dir = os.path.join(data_dir, "test") # Define the classes based on the subfolder names class_names = ['NORMAL', 'PNEUMONIA'] class_to_idx = {cls_name i for i, cls_name in enumerate(class_names)} # Helper function to scan directories, filter JPEG images, and collect paths/labels def get_image_paths_and_labels(data_dir) image_paths = [] labels = [] print(f"Scanning directory {data_dir}") for label_name in class_names class_dir = os.path.join(data_dir, label_name) count = 0 # List files in the class directory for filename in os.listdir(class_dir) # Keep only files ending with .jpeg (case-insensitive) if filename.lower().endswith('.jpeg') image_paths.append(os.path.join(class_dir, filename)) labels.append(class_to_idx[label_name]) count += 1 print(f" Found {count} '.jpeg' images for class '{label_name}'") return image_paths, labels # Get paths and labels for the training set all_train_paths, all_train_labels = get_image_paths_and_labels(train_dir) train_counts = collections.Counter(all_train_labels) total_train_images = len(all_train_paths) print(f"Training Set Counts") print(f" NORMAL (Class 0) {train_counts[class_to_idx['NORMAL']]}") print(f" PNEUMONIA (Class 1) {train_counts[class_to_idx['PNEUMONIA']]}") print(f" Total Training Samples {total_train_images}") # Get paths and labels for the test set all_test_paths, all_test_labels = get_image_paths_and_labels(test_dir) test_counts = collections.Counter(all_test_labels) total_test_images = len(all_test_paths) print(f"Test Set Counts") print(f" NORMAL (Class 0) {test_counts[class_to_idx['NORMAL']]}") print(f" PNEUMONIA (Class 1) {test_counts[class_to_idx['PNEUMONIA']]}") print(f" Total Test Samples {total_test_images}") Running this code will scan the directories and produce the following counts Scanning directory chest_xray/train Found 1349 '.jpeg' images for class 'NORMAL' Found 3883 '.jpeg' images for class 'PNEUMONIA' Training Set Counts NORMAL (Class 0) 1349 PNEUMONIA (Class 1) 3883 Total Training Samples 5232 Scanning directory chest_xray/test Found 234 '.jpeg' images for class 'NORMAL' Found 390 '.jpeg' images for class 'PNEUMONIA' Test Set Counts NORMAL (Class 0) 234 PNEUMONIA (Class 1) 390 Total Test Samples 624 Excellent! Our helper function scanned the directories and gave us clean lists of all the usable JPEG images and their corresponding labels for both training and testing (stored in variables like all_train_paths, all_train_labels, etc.). Now, looking at the training counts (1349 NORMAL to 3883 PNEUMONIA), something immediately stands out there are almost three times as many pneumonia examples in our training data! This situation, where one class significantly outnumbers another, is called class imbalance. While techniques exist to directly address class imbalance during training (we’ll talk about those later), our plan for now is to first train the model using the data as-is. That said, this imbalance means we'll need to be especially careful when we get to evaluating the model's performance. We can't just rely on overall accuracy; we'll need to use specific metrics that tell us how well the model identifies both classes fairly. But we’ll get to that too. Having prepared the lists of training image paths and labels, we're now ready for the next important step in preparing our data splitting off a portion of the training images to create a validation set. 
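Before moving on to the validation split, here is a small aside on the imbalance we just measured. One of the training-time remedies mentioned later (Section 5) is a weighted loss, and the counts above are all we need to compute such weights. The sketch below is our own addition, not part of the original pipeline: it uses a common inverse-frequency weighting scheme and reuses the train_counts and class_to_idx variables from the verification code above.

```python
import torch

# Sketch: inverse-frequency class weights from the training counts above.
# These could later be passed to nn.CrossEntropyLoss(weight=class_weights)
# if we decide to counteract the NORMAL/PNEUMONIA imbalance during training.
counts = [train_counts[class_to_idx['NORMAL']], train_counts[class_to_idx['PNEUMONIA']]]
total = sum(counts)
class_weights = torch.tensor([total / (len(counts) * c) for c in counts], dtype=torch.float32)
print(f"Class weights (NORMAL, PNEUMONIA): {class_weights}")
```

With 1349 NORMAL and 3883 PNEUMONIA images, the minority class gets a weight close to 1.9 and the majority class roughly 0.67, so mistakes on NORMAL images would cost the model more during training. For now, though, we proceed with the unweighted data as planned.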
Why You Need a Validation Set Now that we have lists of our training images and associated labels, you might be thinking, “Why did we need those specific lists?” Well, before we train our model, we need to set aside a portion of that training data to create a validation set. It might seem strange to not use all available data for training, but this validation split is vital for trustworthy model development. Here's why Tuning & Monitoring While training, we need to monitor how well the model is learning and potentially tune things like the learning rate or decide when to stop training. We need a dataset for this that the model isn't directly training on, but which isn't our final, untouched test set. That's the validation set's job. Avoiding Data Leakage If we used the test set to make these tuning decisions, we'd essentially be "leaking" information about the test set into our model development process. The model might end up looking good on that specific test set simply because we optimized for it, but fail to generalize to new, truly unseen data. Unbiased Final Test The test set should only be used once, at the very end, after all training and tuning are complete, to get an unbiased estimate of the final model's performance. So, we reserve the test_paths/test_labels for the final evaluation and split our all_train_paths/all_train_labels into two new subsets one for actual training, and one for validation during development. We'll use the train_test_split function from scikit-learn for this. Because we identified a class imbalance earlier, we'll use the stratify option to ensure both the new training subset and the validation set maintain the original proportion of NORMAL and PNEUMONIA images. Using random_state ensures the split is the same every time the code runs. # Define proportion for validation set val_split_ratio = 0.2 SEED = 42 # Perform stratified split train_paths, val_paths, train_labels, val_labels = train_test_split( all_train_paths, all_train_labels, test_size=val_split_ratio, stratify=all_train_labels, random_state=SEED ) # Print the number of samples in each resulting set print(f"Original training image count {len(all_train_paths)}") print(f"--> Split into {len(train_paths)} training samples") print(f"--> Split into {len(val_paths)} validation samples") Running this will perform the split and show the resulting counts Original training image count 5232 --> Split into 4185 training samples --> Split into 1047 validation samples We now have distinct lists of file paths and corresponding labels for our training data (train_paths, train_labels) and our validation data (val_paths, val_labels). These lists tell us which images belong in each set. But simply having file paths isn't enough to feed data into a PyTorch model. Each image needs to be loaded and undergo several processing steps first. These include standard operations like resizing all images to a consistent 256×256 dimension and converting them into the correct format (single-channel grayscale tensors). Additionally, to help our model learn more robust features from a smaller dataset and generalize better from our specific training images, we'll apply a technique called data augmentation, but only on the training set. Understanding Data Augmentation We have our training images identified, but deep learning models often benefit from seeing a large variety and quantity of data. 
What if our training set, particularly after splitting, isn't large or diverse enough to teach the model to generalize well to all possible variations it might encounter in new X-rays? This is where data augmentation comes in. What is Data Augmentation? Data augmentation is a technique used to artificially increase the diversity of your training dataset without actually collecting new images. It involves applying random, yet realistic, transformations to the images during the training process. Each time the model sees an image from the training set, it might see a slightly altered version (e.g., flipped horizontally or slightly rotated). Why Use Data Augmentation? Improved Generalization & Robustness By exposing the model to these variations (like different orientations or flips), it learns to focus on the underlying patterns relevant to the task (e.g., signs of pneumonia) rather than potentially irrelevant characteristics like the exact positioning of the patient. This helps the model generalize better to new, unseen images that might have similar slight variations. Reduced Overfitting It effectively increases the perceived size of the training set, making it harder for the model to simply memorize the training examples. This is particularly valuable when working with specialized datasets (like medical images) that might be smaller than general-purpose image datasets. Our Chosen Augmentations For this tutorial, we'll apply two simple and common augmentation techniques using torchvision.transforms transforms.RandomHorizontalFlip(0.5) This randomly flips the image horizontally (left-to-right) with a default probability of 50%. transforms.RandomRotation(10) This randomly rotates the image by a small angle, in this case, up to 10 degrees in either direction. These simple variations help the model learn features that aren't dependent on perfect orientation or specific left-right positioning. Many other augmentation techniques exist, like adjusting brightness/contrast, zooming, or shearing, but we'll stick to these two for now. Important Training Only! Crucially, data augmentation is applied only to the training set. We do not apply random augmentations to the validation or test sets. Why? Because we need a consistent and unbiased measure of the model's performance on unmodified data during validation (for tuning) and testing (for final evaluation). Augmenting validation/test data would introduce randomness that makes performance measurement unreliable. Now that we understand the concept and benefits of data augmentation, let's define the complete image transformation pipelines for our training, validation, and test sets. Defining Image Transforms Now that we have lists specifying which images belong to our training and validation sets, we need to define how to process each image file into a standardized tensor format suitable for our PyTorch model. This involves creating processing pipelines using torchvision.transforms. We'll need slightly different pipelines for training data (which includes random augmentation) and for validation/test data (which does not). These pipelines need to perform several key operations consistently for every image Resize to Fixed Dimensions To meet our model's required 256×256 input size, the first step is transforms.Resize((256, 256)). Be aware that because the original X-rays vary in size, this forces a square aspect ratio and will distort non-square images by stretching or squashing them. 
While this could potentially obscure subtle diagnostic cues related to shape or proportion, CNNs can often adapt and learn effectively from such consistently distorted data. We'll use this standard resizing approach for our fixed-input model, but keep in mind that if evaluation reveals performance issues potentially linked to shape distortion, exploring aspect-preserving alternatives (like padding before resizing) would be a logical next step to investigate. Ensure Grayscale The model architecture also expects single-channel grayscale images (in_channels=1). To guarantee this format for all images processed, we include transforms.Grayscale(num_output_channels=1). Data Augmentation (Training Only) For the training pipeline (train_transforms), we'll insert the transforms.RandomHorizontalFlip() and transforms.RandomRotation(10) steps discussed in the previous section to help the model generalize better. These are not included in the validation/test pipeline. Convert to Tensor & Scale The final step is transforms.ToTensor(). This performs two critical functions it converts the processed PIL Image object into a PyTorch tensor, and it scales the pixel values from the original integer range [0, 255] down to a floating-point range of [0.0, 1.0]. This [0, 1] scaling acts as our input normalization for this tutorial. We are opting for this simpler approach instead of standardizing with a separate transforms.Normalize(mean, std) step, relying partly on the BatchNorm2d layers within our model to help adapt to the input distribution during training. Creating the Pipelines With these steps decided, we define two distinct data process pipelines using transforms.Compose # Transformations for the training set (including augmentation) train_transforms = transforms.Compose([ transforms.Resize((256, 256)), transforms.Grayscale(num_output_channels=1), transforms.RandomHorizontalFlip(), transforms.RandomRotation(10), transforms.ToTensor() # Converts to tensor AND scales to [0, 1] ]) # Transformations for the validation and test sets (NO augmentation) val_test_transforms = transforms.Compose([ transforms.Resize((256, 256)), transforms.Grayscale(num_output_channels=1), transforms.ToTensor() # Converts to tensor AND scales to [0, 1] ]) print("Transformation pipelines defined.") Now that we've defined how to process the images with train_transforms and val_test_transforms, we need an efficient way to connect these pipelines to our lists of image paths (train_paths, val_paths, etc.). Specifically, we need a structure that can take an index, find the corresponding image path and label, load the image file, apply the correct transformations, and provide the resulting tensor and label to PyTorch for training or evaluation. This requires creating a custom PyTorch Dataset. Let's build that next. Creating a Custom PyTorch Dataset We have our lists of image paths, and we have our processing pipelines, so now let’s bring them together so PyTorch can load and transform images during training and evaluation. While PyTorch offers built-in datasets like ImageFolder, they assume a specific directory structure and aren't ideal for using pre-split lists of file paths with different transforms assigned to each split. Thankfully, PyTorch makes it straightforward to create our own custom dataset handling logic by inheriting from the base torch.utils.data.Dataset class. A custom Dataset needs to implement three essential methods __init__(self, ...) 
Initializes the dataset, typically by storing file paths, labels, and any necessary transformations. __len__(self) Returns the total number of samples in the dataset. __getitem__(self, idx) Loads and returns a single sample (usually an image tensor and its label) from the dataset, given an index idx. This method is where the image loading and transformations are actually applied, often "just-in-time" when the sample is requested. Let's define our XRayDataset class class XRayDataset(Dataset) """Custom Dataset for loading X-ray images from file paths.""" def __init__(self, image_paths, labels, transform=None) """ Args image_paths (list) List of paths to images. labels (list) List of corresponding labels (0 or 1). transform (callable, optional) Optional transform to be applied on a sample. """ self.image_paths = image_paths self.labels = labels self.transform = transform def __len__(self) """Returns the total number of samples in the dataset.""" return len(self.image_paths) def __getitem__(self, idx) """ Fetches the sample at the given index, loads the image, applies transformations, and handles potential errors. Args idx (int) The index of the sample to fetch. Returns tuple (image_tensor, label) if successful. None If an error occurs (e.g., file not found, processing error), signalling to skip this sample. """ # Get the path and label for the requested index img_path = self.image_paths[idx] label = self.labels[idx] try # Load the image using PIL within a context manager with Image.open(img_path) as img # Apply transforms ONLY if they exist if self.transform # Apply the entire transform pipeline image_tensor = self.transform(img) # Return the processed tensor and label return image_tensor, label else # This branch indicates a setup error, as the transform # pipeline should at least contain ToTensor(). raise ValueError(f"Dataset initialized without transforms for {img_path}. " "Transforms (including ToTensor) are required.") except FileNotFoundError # Handle cases where the image file doesn't exist print(f"Warning Image file not found at {img_path}. Skipping sample {idx}.") return None # Returning None signals to skip except ValueError as e # Catch the specific error we raised for missing transforms print(f"Error for sample {idx} at {img_path} {e}") raise e # Re-raise critical setup errors except Exception as e # Catch any other PIL loading or transform errors print(f"Warning Error processing image {img_path} (sample {idx}) {e}. Skipping sample.") return None # Returning None signals to skip Explanation __init__ The constructor (__init__) is straightforward. It simply stores the essential information passed when we create an instance of XRayDataset the list of image paths, the corresponding list of labels, and the specific torchvision.transforms pipeline that should be applied to images from this dataset. __len__ This method allows PyTorch code to easily get the total size of the dataset by simply returning the number of image paths provided during initialization. __getitem__ This is the core method where the actual data loading and processing happens for a single sample. When requested by its index (idx), it performs the following steps Retrieves the image file path and label using the index. Opens the image file using the PIL library. Applies the entire transformation pipeline (like train_transforms or val_test_transforms) stored in self.transform. Returns the processed image tensor and its integer label if successful. Crucially, this loading and transforming happens "on demand" or "lazily." 
The implementation also includes basic error handling if an image file is missing or fails during processing, it prints a warning and returns None, signaling that this sample should be skipped. This XRayDataset class gives us a blueprint for handling our image data. With this class defined, we can now create the specific Dataset instances we need one for our training data using train_paths and train_transforms, one for validation using val_paths and val_test_transforms, and one for our test set. Let's instantiate these datasets next. Creating Final Datasets and DataLoader Objects With our XRayDataset class ready, we can now instantiate it for each of our data splits. We'll pair the appropriate lists of image paths and labels with the corresponding transformation pipelines we defined earlier. # Instantiate the custom Dataset for each split train_dataset = XRayDataset( image_paths=train_paths, labels=train_labels, transform=train_transforms # Apply training transforms (incl. augmentation) ) val_dataset = XRayDataset( image_paths=val_paths, labels=val_labels, transform=val_test_transforms # Apply validation transforms (no augmentation) ) test_dataset = XRayDataset( image_paths=all_test_paths, # Using all_test_paths from verification step labels=all_test_labels, # Using all_test_labels from verification step transform=val_test_transforms # Apply validation/test transforms ) # Print dataset sizes to confirm print("Final Dataset objects created") print(f" Training dataset size {len(train_dataset)}") print(f" Validation dataset size {len(val_dataset)}") print(f" Test dataset size {len(test_dataset)}") This gives us three Dataset objects, each knowing how to access and transform its specific set of images. Final Dataset objects created Training dataset size 4185 Validation dataset size 1047 Test dataset size 624 Introducing DataLoader While Dataset objects allow us to access individual processed samples via dataset[index], we typically train neural networks on mini-batches of data, not one sample at a time. Processing batches is more computationally efficient and helps stabilize the learning process. PyTorch's torch.utils.data.DataLoader class is designed precisely for this. It takes a Dataset object and provides an iterable that yields batches of data. Key features include Batching Automatically groups individual samples from the Dataset into batches of a specified size (batch_size). Shuffling Can automatically shuffle the training data at the beginning of each epoch (shuffle=True) to ensure the model doesn't learn based on the order of examples. Shuffling is typically disabled for validation and testing for consistent evaluation. Parallel Loading Can use multiple background worker processes (num_workers) to load data concurrently, preventing data loading from becoming a bottleneck during training, especially when using a GPU. The num_workers argument specifies how many subprocesses to use for data loading. While values > 0 can speed things up by loading data in parallel, they can sometimes cause issues in certain environments (like Colab notebooks). If you encounter errors during training related to workers, try setting num_workers=0, which loads data in the main process. Memory Pinning Can use pin_memory=True to speed up data transfer from CPU to GPU memory when training on CUDA-enabled devices. 
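One practical detail worth flagging before we build the loaders: our XRayDataset.__getitem__ can return None when an image fails to load, but PyTorch's default batch collation does not know how to skip None entries. The dataset here loads cleanly, so the tutorial doesn't strictly need this, but as a defensive option (our own addition, not part of the original code) a small custom collate function could be passed to each DataLoader via its collate_fn argument. A minimal sketch:

```python
import torch

def skip_none_collate(batch):
    """Drop samples that came back as None before stacking the batch."""
    batch = [sample for sample in batch if sample is not None]
    if len(batch) == 0:
        # Every sample in this batch failed to load; return an empty batch.
        return torch.empty(0), torch.empty(0, dtype=torch.long)
    images = torch.stack([img for img, _ in batch])
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    return images, labels

# Hypothetical usage:
# DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=skip_none_collate)
```

A stricter variant could raise an error instead of returning an empty batch; silently dropping samples is simply the more forgiving choice for a tutorial setting.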
Creating the DataLoader Instances Let's create DataLoader instances for each of our datasets # Define batch size (can be tuned depending on GPU memory) batch_size = 32 # Create DataLoader for the training set train_loader = DataLoader( dataset=train_dataset, batch_size=batch_size, shuffle=True, # Shuffle data each epoch for training num_workers=2, # Number of subprocesses to use for data loading (adjust based on system) pin_memory=True # Speeds up CPU-GPU transfer if using CUDA ) # Create DataLoader for the validation set val_loader = DataLoader( dataset=val_dataset, batch_size=batch_size, shuffle=False, # No need to shuffle validation data num_workers=2, pin_memory=True ) # Create DataLoader for the test set test_loader = DataLoader( dataset=test_dataset, batch_size=batch_size, shuffle=False, # No need to shuffle test data num_workers=2, pin_memory=True ) print(f"DataLoaders created with batch size {batch_size}.") With train_loader, val_loader, and test_loader created, our data preparation pipeline is complete! These loaders are now ready to efficiently supply batches of preprocessed image tensors and labels to our model during the training, validation, and testing phases. A good next step is often to visualize a few images from the train_loader to visually inspect the results of the transformations and augmentations before proceeding to model training. Visualizing Sample Images Before we start training, it's crucial to visually inspect the output of our DataLoader objects. This acts as a sanity check to ensure our data loading, preprocessing, and augmentation steps are working correctly – essentially, we get to "see what the model will see." Let's create a helper function to display a batch of images def show_batch(dataloader, class_names, title="Sample Batch", n_samples=8) """Displays a batch of transformed images from a DataLoader.""" try images, labels = next(iter(dataloader)) # Get one batch except StopIteration print("DataLoader is empty or exhausted.") return # Limit number of samples to display if batch is smaller than n_samples actual_samples = min(n_samples, images.size(0)) if actual_samples <= 0 print("No samples found in the batch to display.") return images = images[actual_samples] labels = labels[actual_samples] # Tensors are likely on GPU if device='cuda', move to CPU for numpy/plotting images = images.cpu() labels = labels.cpu() # Determine subplot layout if actual_samples <= 4 ncols = actual_samples; nrows = 1; figsize = (3 * ncols, 4) else ncols = 4; nrows = 2; figsize = (12, 6) fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize) if nrows == 1 and ncols == 1 axes = np.array([axes]) # Handle single plot case axes = axes.flatten() # Flatten axes array for easy iteration fig.suptitle(title, fontsize=16) for i in range(actual_samples) ax = axes[i] img_tensor = images[i] # Shape is [C=1, H, W], scaled [0.0, 1.0] # Reminder ToTensor scaled pixels to [0, 1] # Matplotlib can directly display tensors in this range with cmap='gray' # Permute dimensions from [C, H, W] to [H, W, C] for matplotlib img_display = img_tensor.permute(1, 2, 0).numpy() # Display the image, removing the channel dimension using squeeze() for grayscale # Specify vmin/vmax ensures correct display range for float data ax.imshow(img_display.squeeze(), cmap='gray', vmin=0.0, vmax=1.0) ax.set_title(f"Class {class_names[labels[i]]}") # Use passed class_names ax.axis('off') # Hide any unused subplots if the grid is larger than needed for j in range(actual_samples, len(axes)) axes[j].axis('off') 
plt.tight_layout() plt.subplots_adjust(top=0.88 if title else 0.95, hspace=0.3) # Adjust for suptitle plt.show() # Visualize training samples (should show augmentations) print("Visualizing a batch from train_loader...") show_batch(train_loader, class_names, title="Sample Processed Training Images") # Visualize validation samples (should NOT show augmentations) print("Visualizing a batch from val_loader...") show_batch(val_loader, class_names, title="Sample Processed Validation Images") Visualizing a batch from train_loader... Visualizing a batch from val_loader... Interpreting the Visualizations The images displayed above are samples drawn directly from our train_loader and val_loader. They reflect the full preprocessing pipeline Resized to 256×256 pixels. Converted to single-channel grayscale. If from train_loader Randomly flipped horizontally and/or slightly rotated due to data augmentation. Converted to PyTorch tensors with pixel values scaled to the [0.0, 1.0] range via transforms.ToTensor. What to Look For Format You should see grayscale chest X-rays, all uniformly sized. Labels Each image should have the correct class title ('NORMAL' or 'PNEUMONIA'). Augmentation Images from train_loader might show random variations (flips, rotations) each time you run this visualization. Images from val_loader should appear consistent without these random effects. Intensity Range The images are displayed directly from the [0, 1] scaled tensors. Ensure they look reasonable (not all black or all white, details visible). Orientation Marker & Augmentation You'll likely notice a letter marker, commonly an 'R', often placed in an upper corner of the X-rays. This marker indicates the patient's right side. Since standard chest X-rays are taken with the patient facing the detector, their right side appears on the left side of the image. Now, look closely at the samples from the train_loader if RandomHorizontalFlip was applied to an image, you’ll see this 'R' marker appearing reversed and on the right side of the image! This is a perfect visual confirmation that your training data augmentation is active. Images from the val_loader should consistently show the marker in its standard position (patient's right on the image's left). This visualization step confirms that our data loaders are correctly yielding processed image tensors in the format and range our model expects, with augmentations applied appropriately. With this confirmation, our data is ready for the main event training the CNN model. 2. Training Our CNN Model With our data prepared and loaded efficiently using DataLoader objects, we're ready to move on to the main event training the model to distinguish between NORMAL and PNEUMONIA chest X-rays. For this, we'll use the PneumoniaCNN architecture we carefully designed together previously. Instantiating the Model and Setting the Device The first step is to use the PneumoniaCNN class definition we built in Part 1 of this tutorial series. You'll need to make sure that Python class definition is available in your current environment, typically by copying the class PneumoniaCNN(nn.Module) ... block into a code cell and running it here if you haven't already. Once the PneumoniaCNN class is defined, we can create an instance of it. Don’t forget that we must then immediately move this model instance to the appropriate computing device (cpu or cuda) that we set up earlier. Performing operations between the model and data requires them both to reside on the same device. 
```python
# Instantiate the model
model = PneumoniaCNN()

# Check if CUDA (GPU support) is available, otherwise use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model to the chosen device (GPU or CPU)
model.to(device)

print(f"Model '{type(model).__name__}' instantiated and moved to '{device}'.")
```

Now our model object is created and resides on the correct device. Before we can start the training loop itself, we need two more key components:

- A Loss Function: To measure how inaccurate the model's predictions are compared to the true labels.
- An Optimizer: To define the algorithm used to update the model's weights based on the calculated loss.

Let's define these next.

Defining the Loss Function and Optimizer

With our model instantiated and placed on the correct device, we need two final components before building the training loop:

- Loss Function: This measures how far the model's predictions (logits) are from the actual target labels. The computed loss value is what the model tries to minimize during training.
- Optimizer: This implements an algorithm (like Stochastic Gradient Descent or variations thereof) that updates the model's weights based on the gradients computed during the backward pass, aiming to reduce the loss.

Let's define these for our task:

```python
# Define loss function
criterion = nn.CrossEntropyLoss()

# Define optimizer
optimizer = optim.Adam(model.parameters(), lr=0.0001)

print("Loss function and optimizer defined.")
```

Explanation:

- Loss Function (criterion): We instantiate nn.CrossEntropyLoss. This is the standard choice for multi-class classification problems like ours (Normal vs. Pneumonia). It's particularly convenient because it expects the raw, unnormalized scores (logits) directly from the model's final layer and internally applies the necessary calculations (like LogSoftmax and Negative Log-Likelihood loss) to determine the error.
- Optimizer (optimizer): We select optim.Adam, a very popular and often effective optimization algorithm. It's known for its adaptive learning rate capabilities, meaning it can adjust the learning rate for each parameter during training, which frequently leads to faster convergence compared to simpler optimizers like basic SGD.
- model.parameters(): We pass this to the optimizer to tell it exactly which tensors within our model are the learnable weights and biases that it should be updating.
- lr=0.0001: This argument sets the initial learning rate. It's a crucial hyperparameter controlling how large the updates to the weights are on each step. A value between 0.001 and 0.0001 is often a good starting point for the Adam optimizer, but it might need tuning later.

Alright, all the preparatory pieces are in place! We have our instantiated model ready on the correct device, our DataLoaders (train_loader, val_loader, test_loader) prepared to serve batches of processed data, our criterion defined to measure loss, and our optimizer configured to update the model's parameters. We're finally ready to orchestrate the actual learning process by implementing the training loop.
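Before launching a full training run, a quick sanity check can save time: pass a single batch through the untrained model and confirm the output shape is what we expect (one logit per class for each image in the batch). This optional check is our own addition and simply reuses the model, train_loader, and device defined above.

```python
# Optional sanity check: run one batch through the untrained model.
images, labels = next(iter(train_loader))
print(f"Input batch shape:   {images.shape}")   # expected: [32, 1, 256, 256]

model.eval()
with torch.no_grad():
    logits = model(images.to(device))
print(f"Output logits shape: {logits.shape}")   # expected: [32, 2] — one score per class
```

If the shapes line up, the data pipeline and the model's input/output dimensions agree, and we can move on to the training loop with confidence.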
Implementing the Training Loop Now let's implement a complete training loop def train_model(model, train_loader, val_loader, criterion, optimizer, device, num_epochs=20) """Trains and validates the model.""" # Initialize lists to track metrics train_losses = [] val_losses = [] train_accuracies = [] val_accuracies = [] print("Starting Training...") # Training loop for epoch in range(num_epochs) # Training Phase model.train() # Set model to training mode (enables dropout, batch norm updates) running_loss = 0.0 correct_train = 0 total_train = 0 # Iterate over training data for i, (images, labels) in enumerate(train_loader) # Move data to the specified device images, labels = images.to(device), labels.to(device) # Zero the parameter gradients optimizer.zero_grad() # Forward pass outputs = model(images) loss = criterion(outputs, labels) # Backward pass and optimize loss.backward() optimizer.step() # Track training loss and accuracy running_loss += loss.item() * images.size(0) # loss.item() is the avg loss per batch _, predicted = torch.max(outputs.data, 1) total_train += labels.size(0) correct_train += (predicted == labels).sum().item() # Calculate training statistics for the epoch epoch_train_loss = running_loss / len(train_loader.dataset) epoch_train_acc = correct_train / total_train train_losses.append(epoch_train_loss) train_accuracies.append(epoch_train_acc) # Validation Phase model.eval() # Set model to evaluation mode (disables dropout, uses running stats for batch norm) val_loss = 0.0 correct_val = 0 total_val = 0 # Disable gradient calculations for validation with torch.no_grad() for images, labels in val_loader images, labels = images.to(device), labels.to(device) outputs = model(images) loss = criterion(outputs, labels) val_loss += loss.item() * images.size(0) _, predicted = torch.max(outputs.data, 1) total_val += labels.size(0) correct_val += (predicted == labels).sum().item() # Calculate validation statistics for the epoch epoch_val_loss = val_loss / len(val_loader.dataset) epoch_val_acc = correct_val / total_val val_losses.append(epoch_val_loss) val_accuracies.append(epoch_val_acc) # Print statistics for the epoch print(f"Epoch {epoch+1}/{num_epochs}") print(f" Train Loss {epoch_train_loss.4f}, Train Acc {epoch_train_acc.4f}") print(f" Val Loss {epoch_val_loss.4f}, Val Acc {epoch_val_acc.4f}") print("-" * 30) print("Finished Training.") # Return performance history return { 'train_losses' train_losses, 'train_accuracies' train_accuracies, 'val_losses' val_losses, 'val_accuracies' val_accuracies } This training function Tracks performance metrics (training and validation) over time. Switches between training (model.train()) and evaluation (model.eval()) modes correctly. Handles device placement for tensors (.to(device)). Implements the full train-validate cycle for each epoch. Returns a dictionary of training and validation history for later analysis. Now, let's start training our model # Train the model num_epochs = 20 history = train_model( model=model, train_loader=train_loader, val_loader=val_loader, criterion=criterion, optimizer=optimizer, device=device, num_epochs=num_epochs ) During training, you'll see output showing the model's progress Starting Training... 
Epoch 1/20 Train Loss 0.5181, Train Acc 0.8282 Val Loss 0.1428, Val Acc 0.9484 ------------------------------ Epoch 2/20 Train Loss 0.2066, Train Acc 0.9221 Val Loss 0.0897, Val Acc 0.9685 ------------------------------ Epoch 3/20 Train Loss 0.1632, Train Acc 0.9379 Val Loss 0.0708, Val Acc 0.9780 ------------------------------ ... (output for subsequent epochs) ... ------------------------------ Epoch 20/20 Train Loss 0.0832, Train Acc 0.9699 Val Loss 0.0468, Val Acc 0.9819 Finished Training. Training is complete! The output above gives us a snapshot of the loss and accuracy progress for each epoch on both the training and validation sets. We can see the model is learning, but to get a full picture of the trends over all 20 epochs, like how quickly the model converged, whether overfitting occurred, and how the validation performance truly compared to training, we should visualize these metrics instead. So let's plot the history next. Visualizing the Training Process Visualizing the training and validation metrics is the best way to understand how our model is learning. Plotting their loss/accuracy curves over epochs provides valuable insights into the learning dynamics. def plot_training_history(history) """Plots the training and validation loss and accuracy.""" fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5)) # Plot losses ax1.plot(history['train_losses'], label='Train Loss') ax1.plot(history['val_losses'], label='Validation Loss') ax1.set_xlabel('Epoch') ax1.set_ylabel('Loss') ax1.set_title('Training and Validation Loss') ax1.legend() ax1.grid(True) # Plot accuracies ax2.plot(history['train_accuracies'], label='Train Accuracy') ax2.plot(history['val_accuracies'], label='Validation Accuracy') ax2.set_xlabel('Epoch') ax2.set_ylabel('Accuracy') ax2.set_title('Training and Validation Accuracy') ax2.legend() ax2.grid(True) plt.tight_layout() plt.show() # Plot the training and validation history plot_training_history(history) These plots provide a clear visual summary of the entire training process over the 20 epochs. Overall Learning We can clearly see the learning trend both the blue (training) and orange (validation) loss curves decrease significantly from the start and then begin to level off, particularly towards the end. Correspondingly, both accuracy curves rise quickly and plateau at a high level. This confirms the model successfully learned from the data. Validation vs. Training Performance Notice how the orange validation loss curve consistently stays below the blue training loss curve, and the orange validation accuracy curve stays above the blue training accuracy curve. This pattern, where validation metrics appear better than training metrics, is often expected when using regularization techniques like Data Augmentation and Dropout. These techniques are applied only during the training phase (model.train()), making that phase slightly harder, but are turned off during validation (model.eval()), allowing the model's full capacity to be assessed on the consistent validation data. Overfitting Check We visually inspect the gap between the training and validation curves. Signs of significant overfitting would include the validation loss (orange) clearly starting to rise while the training loss (blue) continues to fall, or the validation accuracy stalling/dropping while training accuracy keeps climbing. Based on these plots, while there are minor fluctuations, the validation loss remains low and generally trends downwards or flat near the end. 
The gap between the curves doesn't appear to be dramatically widening, suggesting significant overfitting hasn't set in within these 20 epochs. Optimal Epoch & Training Duration Looking closely at the orange validation loss curve, it appears to reach its minimum value very late in training, around epoch 19 or 20. Similarly, validation accuracy plateaus at its peak in the last few epochs. This suggests that training for the full 20 epochs was beneficial for this specific run and learning rate, and stopping much earlier might have resulted in slightly suboptimal validation performance. TL;DR The plots show stable training with good convergence over 20 epochs. They visually confirm the expected impact of our training-only regularization (Val > Train metrics) and indicate that the model reached its best validation performance near the end of this training run without showing strong signs of overfitting yet. 3. Evaluating Our Pneumonia Detection CNN After training, we need to rigorously evaluate our model to understand its strengths and weaknesses, especially for a medical application like pneumonia detection. Calculating Key Metrics For medical diagnosis tasks, accuracy alone is insufficient. We need to consider Precision Of all cases predicted as pneumonia, how many actually have pneumonia? Recall Of all actual pneumonia cases, how many did we correctly identify? F1-score The harmonic mean of precision and recall Confusion Matrix A table showing true positives, false positives, true negatives, and false negatives Let's implement a detailed evaluation def evaluate_model(model, test_loader, device, class_names) """ Evaluates the model on a given dataloader (e.g., test set). Computes confusion matrix and classification report. """ model.eval() # Set model to evaluation mode all_preds = [] all_labels = [] with torch.no_grad() # Disable gradient calculation for images, labels in test_loader images, labels = images.to(device), labels.to(device) outputs = model(images) _, predictions = torch.max(outputs, 1) all_preds.extend(predictions.cpu().numpy()) all_labels.extend(labels.cpu().numpy()) all_preds = np.array(all_preds) all_labels = np.array(all_labels) # Calculate confusion matrix cm = confusion_matrix(all_labels, all_preds) # Calculate classification report class_report = classification_report( all_labels, all_preds, target_names=class_names, digits=4, zero_division=0 ) # Calculate overall accuracy from the report accuracy = np.trace(cm) / np.sum(cm) # Simple accuracy from confusion matrix return { 'confusion_matrix' cm, 'classification_report' class_report, 'accuracy' accuracy, 'predictions' all_preds, 'true_labels' all_labels } # Evaluate the model eval_results = evaluate_model(model, test_loader, device, class_names) # Print results print("Classification Report") print(eval_results['classification_report']) print(f"Overall Accuracy {eval_results['accuracy'].4f}") You should see output similar to Classification Report precision recall f1-score support NORMAL 0.9780 0.3803 0.5477 234 PNEUMONIA 0.7280 0.9949 0.8407 390 accuracy 0.7644 624 macro avg 0.8530 0.6876 0.6942 624 weighted avg 0.8217 0.7644 0.7308 624 Overall Accuracy 0.7644 Visualizing the Confusion Matrix A confusion matrix provides a clear visual representation of our model's performance def plot_confusion_matrix(confusion_matrix, class_names) plt.figure(figsize=(8, 6)) sns.heatmap( confusion_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names ) plt.xlabel('Predicted Label') plt.ylabel('True 
Label') plt.title('Confusion Matrix') plt.tight_layout() plt.show() # Plot confusion matrix plot_confusion_matrix(eval_results['confusion_matrix'], class_names) The confusion matrix shows True Negatives (top-left) Normal X-rays correctly identified as normal False Positives (top-right) Normal X-rays incorrectly identified as pneumonia False Negatives (bottom-left) Pneumonia X-rays incorrectly identified as normal True Positives (bottom-right) Pneumonia X-rays correctly identified as pneumonia Interpreting Results in a Medical Context In medical diagnosis, different types of errors have different consequences False Negatives (missing pneumonia) These are particularly dangerous as a patient with pneumonia might not receive necessary treatment, potentially leading to serious complications. Minimizing these is often a high priority (i.e., maximizing Recall/Sensitivity for the PNEUMONIA class). False Positives (diagnosing pneumonia when it's absent) These may lead to unnecessary treatment, causing stress and potential side effects, but are generally less immediately harmful than false negatives. Minimizing these relates to maximizing Recall/Specificity for the NORMAL class. Examining our actual results from the Classification Report above, we see Pneumonia Detection (Class 1) The model achieves extremely high Recall (Sensitivity) of ~0.9949. This is excellent, meaning it correctly identifies nearly 99.5% of the actual pneumonia cases in the test set, effectively minimizing dangerous False Negatives. However, its Precision for pneumonia is ~0.7280, meaning that when it predicts pneumonia, it's correct only about 73% of the time – the other 27% are False Positives (NORMAL cases misclassified as PNEUMONIA). Normal Case Detection (Class 0) The model still has very low Recall (Specificity) of ~0.3803. This indicates it only correctly identifies about 38% of the actual normal cases; the remaining 62% are misclassified as pneumonia (contributing to the lower precision for the PNEUMONIA class). The Precision for normal cases remains high (~0.9780), meaning if it predicts normal, it's very likely correct, but this model rarely makes that prediction for normal cases. Interpretation These results indicate the model is significantly biased towards predicting PNEUMONIA. It's highly sensitive but lacks specificity. In a real medical scenario The high sensitivity (~99.5%) is valuable for ensuring potential cases aren't missed. The low specificity (~38%) remains highly problematic, likely leading to a large number of unnecessary follow-ups for healthy individuals. While prioritizing sensitivity is common for screening, this level of specificity would likely be impractical. These results strongly suggest that the class imbalance in our training data is heavily influencing the model's predictions. To create a more balanced and clinically useful model, addressing this imbalance directly (using techniques like weighted loss or resampling, as discussed in Section 5) would be the most logical next step. 4. 
Visualizing Model Predictions Let's visualize some of our model's predictions to better understand its behavior def visualize_predictions(model, dataloader, device, class_names, num_samples=8) """Displays a batch of test images with their true labels and model predictions.""" model.eval() try images, labels = next(iter(dataloader)) except StopIteration print("DataLoader is empty.") return # Ensure we don't request more samples than available in the batch actual_samples = min(num_samples, images.size(0)) if actual_samples <= 0 print("No samples in batch to display.") return images, labels = images[actual_samples], labels[actual_samples] images_device = images.to(device) # Move input data to the correct device # Get model predictions with torch.no_grad() outputs = model(images_device) _, preds = torch.max(outputs, 1) probs = F.softmax(outputs, dim=1) # Move data back to CPU for plotting preds = preds.cpu().numpy() probs = probs.cpu().numpy() images = images.cpu() # Determine subplot layout if actual_samples <= 4 ncols = actual_samples; nrows = 1; figsize = (4 * ncols, 5) else ncols = 4; nrows = 2; figsize = (16, 10) fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize) # Ensure axes is iterable if nrows == 1 and ncols == 1 axes = np.array([axes]) axes = axes.flatten() fig.suptitle("Sample Test Set Predictions", fontsize=16) for i, ax in enumerate(axes) if i < actual_samples img_tensor = images[i] true_label = class_names[labels[i]] pred_label = class_names[preds[i]] confidence = probs[i][preds[i]] # Prepare image for display (C, H, W) -> (H, W, C) img_display = img_tensor.permute(1, 2, 0).numpy() # Display image ax.imshow(img_display.squeeze(), cmap='gray', vmin=0.0, vmax=1.0) # Set title with prediction info and color coding title_color = 'green' if pred_label == true_label else 'red' title = f"True {true_label}Pred {pred_label}Conf {confidence.2f}" ax.set_title(title, color=title_color) ax.axis('off') else ax.axis('off') plt.tight_layout() plt.subplots_adjust(top=0.92) # Adjust layout for suptitle plt.show() # Visualize model predictions on the test set print("Visualizing sample predictions from the test set...") # Create a TEMPORARY DataLoader with shuffling enabled JUST for visualization # This helps ensure we see a mix of classes in the first batch we grab. # The 'test_loader' used for actual evaluation remains unshuffled. temp_vis_loader = DataLoader( dataset=test_dataset, # Use the same test_dataset batch_size=batch_size, # Use the same batch size shuffle=True # Shuffle ON for this temporary loader ) visualize_predictions(model, temp_vis_loader, device, class_names) This visualization provides concrete examples of the model's behavior on the test set We can clearly see examples of both correct (green titles) and incorrect (red titles) predictions made by the model. It allows us to observe the model's confidence for each prediction. Notice in this batch that the confidence scores are generally quite high (often >0.80), even for some of the incorrect classifications. Most importantly, we can identify potential patterns in the errors. In this specific sample batch, the errors primarily consist of True NORMAL images being incorrectly classified as PNEUMONIA, sometimes with high confidence. This visually reinforces the low Specificity (low Recall for the NORMAL class) identified in our quantitative evaluation metrics and highlights the model's tendency to misclassify normal cases. 
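Since sensitivity and specificity keep coming up in this discussion, it can also help to pull them directly out of the confusion matrix rather than reading them off the classification report. The short sketch below is our own addition; it works on the eval_results dictionary returned by evaluate_model and treats PNEUMONIA as the positive class.

```python
# Sketch: derive sensitivity and specificity from the confusion matrix
# produced by evaluate_model (rows = true labels, columns = predictions).
cm = eval_results['confusion_matrix']
tn, fp, fn, tp = cm.ravel()   # 2x2 layout with NORMAL=0, PNEUMONIA=1

sensitivity = tp / (tp + fn)  # recall for PNEUMONIA (how many true cases we catch)
specificity = tn / (tn + fp)  # recall for NORMAL (how many healthy cases we clear)
print(f"Sensitivity (PNEUMONIA recall): {sensitivity:.4f}")
print(f"Specificity (NORMAL recall):    {specificity:.4f}")
```

These two numbers summarize the trade-off we have been describing: very high sensitivity alongside low specificity.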
Making Predictions on Random Images
Evaluating metrics like precision and recall gives us an overall sense of performance, but looking at individual predictions can provide more intuition. Let's see how our trained model performs on a randomly selected individual X-ray image from the NORMAL class in the test set.

First, here's the helper function we'll use to load, preprocess, and get a prediction for a single image path:

def predict_image(model, image_path, transform, device, class_names):
    """Loads a single image, preprocesses it, and returns model prediction details."""
    try:
        # Load the image using PIL
        image = Image.open(image_path)
    except FileNotFoundError:
        print(f"Error: Image file not found at {image_path}")
        return None
    except Exception as e:
        print(f"Error opening image {image_path}: {e}")
        return None

    # Preprocess: apply validation/test transforms, add batch dimension, move to device
    image_tensor = transform(image).unsqueeze(0).to(device)

    # Make prediction
    model.eval()  # Ensure model is in evaluation mode
    with torch.no_grad():  # Disable gradient calculations
        output = model(image_tensor)               # Output raw logits
        probabilities = F.softmax(output, dim=1)   # Probabilities

        # Get the highest probability score and the corresponding class index
        confidence, predicted_class_idx = torch.max(probabilities, 1)

    # Extract results
    class_idx = predicted_class_idx.item()
    class_name = class_names[class_idx]  # Map index to class name
    confidence_score = confidence.item()

    # Return results as a dictionary
    return {
        'class_id': class_idx,
        'class_name': class_name,
        'confidence': confidence_score,
        'probabilities': probabilities[0].cpu().numpy()  # All class probabilities
    }

Now, let's use this function on a random image from the test/NORMAL directory:

try:
    normal_dir = os.path.join(test_dir, "NORMAL")  # Target the NORMAL directory

    # Get only .jpeg files from the directory
    normal_test_files = [f for f in os.listdir(normal_dir) if f.lower().endswith('.jpeg')]

    if not normal_test_files:
        print(f"No NORMAL test images found in {normal_dir}.")
    else:
        # Select a random image file
        random_filename = random.choice(normal_test_files)
        test_image_path = os.path.join(normal_dir, random_filename)
        print(f"Predicting on random NORMAL image: {random_filename}")

        # Get prediction using the function
        result = predict_image(model, test_image_path, val_test_transforms, device, class_names)

        if result:
            # Display the prediction details
            print(f" Actual class: NORMAL")  # State the true class
            print(f" Predicted class: {result['class_name']}")
            print(f" Confidence: {result['confidence']:.4f}")
            print(f" Class probabilities: Normal={result['probabilities'][0]:.4f}, Pneumonia={result['probabilities'][1]:.4f}")

            # Visualize the image with prediction
            try:
                img = Image.open(test_image_path)
                plt.figure(figsize=(6, 6))
                plt.imshow(img, cmap='gray')
                # Include TRUE label in title for clarity
                plt.title(f"True: NORMAL | Prediction: {result['class_name']} ({result['confidence']:.4f})")
                plt.axis('off')
                plt.show()
            except Exception as e:
                print(f"Error displaying image {test_image_path}: {e}")
except FileNotFoundError:
    print(f"Error: Directory {normal_dir} not found.")
except Exception as e:
    print(f"An error occurred during prediction example: {e}")

Predicting on random NORMAL image: IM-0011-0001-0001.jpeg
 Actual class: NORMAL
 Predicted class: PNEUMONIA
 Confidence: 0.7053
 Class probabilities: Normal=0.2947, Pneumonia=0.7053

Here, we took a random test X-ray image (IM-0011-0001-0001.jpeg) known to be NORMAL. Our model, however, incorrectly predicted it as PNEUMONIA with moderate confidence (approx. 70.5%).
This specific misclassification provides a clear example of the main weakness identified in our evaluation metrics: the model's difficulty in correctly recognizing NORMAL cases (achieving only ~38.0% recall/specificity according to the Classification Report). Errors like this, where NORMAL images are falsely predicted as PNEUMONIA (False Positives), are why the overall Precision for the PNEUMONIA class was limited to ~72.8%. When the model predicts PNEUMONIA, roughly 27% of those predictions are actually NORMAL cases being misclassified.

While the model remains excellent at catching actual PNEUMONIA (with ~99.5% recall/sensitivity), this tendency to misclassify NORMAL images highlights the impact of the class imbalance. Looking at the specific image, we can see prominent normal structures (bronchial/vascular markings); it's plausible that the model, biased by the imbalance, struggles to differentiate these complex normal patterns from potential abnormalities. Addressing this bias to improve specificity would clearly improve the model's clinical utility. This leads us nicely into exploring common training issues and techniques to mitigate them.

5. Addressing Common CNN Training Issues
Now that we've trained and evaluated our model, we've seen some promising results but also potential areas for improvement (like the low specificity driven by class imbalance). Let's explore common issues encountered during CNN development and strategies to address them, considering our specific pneumonia detection task.

Diagnosing and Addressing Overfitting
Overfitting occurs when a model learns the training data too well, including its noise and specific quirks, rather than the underlying general patterns. This leads to poor performance on new, unseen data. Signs of overfitting in our training plots would include:
Training accuracy becoming much higher than validation accuracy.
Training loss continuing to decrease significantly while validation loss plateaus or starts increasing.

Strategy: Early Stopping
If you observe validation loss starting to increase (a clear sign of overfitting), one effective strategy is early stopping.
Concept: Monitor the validation loss after each epoch. Save the model's state whenever the validation loss reaches a new minimum. If the validation loss fails to improve for a predefined number of epochs (e.g., 5 or 10, known as "patience"), stop the training process. Finally, load the saved model state that achieved the best validation loss.
Example Application: For our project, this would involve modifying the training loop to keep track of the best epoch_val_loss seen so far, saving model.state_dict() at that point, and halting if the loss doesn't improve for the specified patience period.
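To make the pattern concrete, here is a minimal early-stopping sketch. The train_one_epoch and validate helpers are hypothetical placeholder names standing in for the tutorial's own training and validation logic (the model, loaders, criterion, optimizer, and device are assumed to already exist), and validate is assumed to return the epoch's validation loss:

import copy

patience = 5                    # epochs to wait for an improvement before stopping
best_val_loss = float('inf')
best_state = None
epochs_without_improvement = 0

for epoch in range(num_epochs):
    train_one_epoch(model, train_loader, criterion, optimizer, device)   # placeholder helper
    epoch_val_loss = validate(model, val_loader, criterion, device)      # placeholder helper

    if epoch_val_loss < best_val_loss:
        best_val_loss = epoch_val_loss
        best_state = copy.deepcopy(model.state_dict())  # remember the best weights so far
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch + 1}")
            break

# Restore the weights that achieved the lowest validation loss
if best_state is not None:
    model.load_state_dict(best_state)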
Handling Underfitting
Underfitting is the opposite problem: the model fails to learn the training data well enough, resulting in poor performance on both the training and validation/test sets. This often suggests the model is too simple or hasn't trained sufficiently. Potential Strategies:
Increase Model Complexity: Make the model more powerful so it can capture more complex patterns. Example Application: We could add a fourth convolutional block to our PneumoniaCNN definition or increase the number of output channels in the existing nn.Conv2d layers (e.g., going from 32 -> 64 -> 128 to perhaps 64 -> 128 -> 256).
Train Longer: Give the model more time to learn by increasing the number of training epochs. Example Application: Simply call our train_model function with a larger value, like num_epochs=30 or num_epochs=50, while carefully monitoring for signs of overfitting using the validation metrics.
Reduce Regularization: Techniques like dropout prevent overfitting but can hinder learning if applied too aggressively when the model is underfitting. Example Application: We could try lowering the dropout probability in our fully connected layers, for instance, changing nn.Dropout(p=0.5) to nn.Dropout(p=0.3). If using weight decay in the optimizer, we might reduce its strength.
Learning Rate Adjustment: The learning rate might be too low (slow learning) or too high (preventing convergence). Experimenting or using a scheduler can help. Example Application: We could try initializing the Adam optimizer with a slightly different learning rate, like lr=0.001 or lr=0.005. Alternatively, we could implement a learning rate scheduler (e.g., torch.optim.lr_scheduler.ReduceLROnPlateau) that automatically reduces the learning rate if the validation loss stagnates.

Addressing Class Imbalance
As our verification step showed, the training set has a roughly 3:1 ratio of PNEUMONIA to NORMAL samples. This imbalance likely contributed to our model's bias towards predicting PNEUMONIA (high sensitivity, low specificity). Common strategies include:
Weighted Loss Function: Modify the loss calculation to penalize errors on the minority class (NORMAL) more heavily. Example Application: Calculate weights inversely proportional to class frequency (e.g., assign a weight of ~3 to the NORMAL class and ~1 to the PNEUMONIA class) and pass these weights to the weight parameter of nn.CrossEntropyLoss when defining our criterion.
Resampling: Adjust the sampling process during training to create more balanced batches. Example Application: Oversampling the minority class involves drawing more samples (with replacement) from the NORMAL images during each epoch, perhaps using PyTorch's WeightedRandomSampler with the DataLoader. Undersampling the majority class involves randomly discarding some PNEUMONIA samples to match the number of NORMAL samples, though this risks losing potentially useful information.
Generate Synthetic Data: Create artificial examples of the minority class. Example Application: This often involves more advanced techniques like SMOTE (Synthetic Minority Over-sampling Technique) or using Generative Adversarial Networks (GANs) to create new, realistic-looking NORMAL X-ray images, though implementing these is beyond the scope of this tutorial.

Choosing the right strategy often involves experimentation. For class imbalance, using a weighted loss or resampling via WeightedRandomSampler are often effective starting points.
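To illustrate the first two strategies, here is a minimal sketch of how class weights and a WeightedRandomSampler could be set up. The class counts are placeholders (roughly reflecting the 3:1 imbalance), it reuses the device and batch_size from our earlier setup, and it assumes train_dataset exposes its integer labels through a targets attribute, as torchvision's ImageFolder does:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, WeightedRandomSampler

# Placeholder class counts: index 0 = NORMAL, index 1 = PNEUMONIA.
# In practice, count the labels in your actual training set.
class_counts = [1000.0, 3000.0]

# Option 1: weighted loss, weighting each class inversely to its frequency
class_weights = torch.tensor([sum(class_counts) / (len(class_counts) * c) for c in class_counts])
criterion = nn.CrossEntropyLoss(weight=class_weights.to(device))

# Option 2: oversample the minority class with a WeightedRandomSampler
sample_weights = [1.0 / class_counts[label] for label in train_dataset.targets]
sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
train_loader = DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)  # shuffle stays off when a sampler is used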
Review and Next Steps
Congratulations! You've successfully navigated this two-part tutorial series, journeying from the fundamentals of Convolutional Neural Networks all the way to building, training, and evaluating a practical pneumonia detection model using PyTorch. You've seen the entire workflow, from defining the architecture in Part 1 to preparing data, implementing training loops, interpreting results, and considering common challenges in Part 2.

What You've Learned
As we wrap up, let's distill the most important concepts and practices covered across both tutorials:
End-to-End Workflow: Building an effective computer vision solution involves a complete pipeline: careful data preparation (verification, splitting, augmentation, transformation), thoughtful model architecture definition (often using PyTorch's object-oriented nn.Module structure), implementing robust training loops (managing device placement and model modes), and performing rigorous evaluation tailored to the problem.
Data is Foundational: The quality and handling of your data are paramount. Accurate verification, appropriate splitting (train/validation/test), deliberate preprocessing choices (like resizing or grayscale conversion), and techniques like data augmentation significantly impact model performance and reliability.
Evaluate Beyond Accuracy: Especially for real-world applications like medical diagnosis, relying solely on accuracy can be misleading, particularly with imbalanced datasets. Metrics like precision, recall (sensitivity/specificity), F1-score, and confusion matrices provide a much deeper understanding of model strengths and weaknesses for each class.
Practical Training Details Matter: Correctly switching between model.train() and model.eval() is essential for layers like Dropout and BatchNorm to function properly. Being aware of potential issues like overfitting or class imbalance and knowing strategies to address them (e.g., early stopping, learning rate scheduling, weighted loss, resampling) are key practical skills for refining models.

What You Can Try Next
Your journey into computer vision with PyTorch doesn't have to end here! To deepen your skills, consider exploring these areas:
Transfer Learning: Instead of training from scratch, leverage powerful models (like ResNet, VGG, DenseNet) pre-trained on large datasets (like ImageNet) and fine-tune them for your specific task. This often leads to better performance with less data and faster training. (A brief sketch of this idea appears after the resources below.)
Cross-Validation: Implement k-fold cross-validation for a more robust evaluation of your model's performance, reducing the dependency on a single train-validation split.
Hyperparameter Tuning: Systematically experiment with different learning rates, batch sizes, optimizer choices, network architectures, or augmentation strategies.
Explainability: Use techniques like Grad-CAM, SHAP, or LIME to understand why your model makes certain predictions. Visualize the image regions that most influence its decision. This is important for building trust, especially in medical AI.

Remember that deep learning is as much an art as it is a science—experimentation, careful analysis, and domain knowledge all play important roles in creating effective solutions. Keep practicing these skills, and you'll be well-equipped to solve real-world problems with computer vision in PyTorch.

Additional Resources
To continue your learning journey:
PyTorch Documentation - Comprehensive reference for all PyTorch functions
Sequence Models in PyTorch - If you're interested in extending your skills to sequential data
Natural Language Processing with PyTorch - For processing text data with PyTorch
Medical Image Analysis Papers - Recent research in medical image classification
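As a postscript to the Transfer Learning suggestion above, here is a brief, illustrative sketch (not part of this tutorial's code) of how a pre-trained ResNet-18 from torchvision might be adapted to our two-class task:

import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pre-trained on ImageNet
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Optionally freeze the pre-trained feature extractor
for param in resnet.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new 2-class head
# (the new layer's parameters are trainable by default)
resnet.fc = nn.Linear(resnet.fc.in_features, 2)

# Note: ImageNet models expect 3-channel input, so grayscale X-rays would need to be
# repeated across three channels (or the first conv layer adapted) before fine-tuning.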


Computer Vision in PyTorch (Part 1): Building Your First CNN for Pneumonia Detection

Have you ever wondered how computers can recognize faces in photos or detect obstacles for self-driving cars? This capability stems from computer vision, the field of deep learning focused on enabling machines to interpret and understand visual information from the world around them. But how can this technology tackle more complex challenges, like analyzing medical images to aid diagnoses? In this two-part tutorial, you'll explore exactly that by learning how to use Convolutional Neural Networks (CNNs), a powerful type of neural network designed specifically for image analysis. You'll build your first CNN in PyTorch to analyze real chest X-ray images and identify signs of pneumonia. Whether you're new to computer vision or looking to apply your deep learning skills to a real-world problem, this tutorial series will guide you step-by-step through building, training, and evaluating your own image classification model. By the time you complete this tutorial, you will not only build your initial model but also be able to Explain how CNNs automatically extract important features from images. Understand the purpose of core CNN components like convolutional and pooling layers. Recognize why object-oriented programming is frequently used by professional deep learning practitioners. Define and build your own custom CNN architecture in PyTorch. Understanding the Pneumonia Detection Dataset Before we start designing our CNN architecture, let's first understand the dataset we'll be working with. This understanding will inform our design choices as we build out our model. We'll be working with a dataset of chest X-ray images labeled as either "NORMAL" or "PNEUMONIA." These medical images have specific characteristics we should keep in mind They're grayscale images (single-channel) rather than color (three-channel RGB) They contain subtle patterns that distinguish healthy lungs from those with pneumonia They show similar anatomical structures (lungs, heart, ribs) across patients, but with individual variations They have high resolution to capture fine details necessary for accurate diagnosis Here's what a NORMAL X-ray looks like (left) compared to a typical PNEUMONIA one (right) Notice how pneumonia appears as cloudy white areas in the lungs (which normally should be dark). These patterns are precisely what our CNN will learn to identify. Why CNNs Excel at Image Tasks If you've worked with traditional neural networks before, you might wonder why we need a specialized architecture for images. Why not just use a standard fully-connected network? If you were to try to train a traditional neural network on these X-ray images, you'd immediately face two major challenges Overwhelming parameter count A modest 256×256 grayscale X-ray contains 65,536 pixels. If we connected each pixel to just 1,000 neurons in the first hidden layer, we'd need over 65 million parameters for that layer alone! This would make the model Extremely slow to train Prone to severe overfitting Impractical for deployment in medical settings For perspective, the first convolutional layer in the CNN we will build in this tutorial achieves its initial feature extraction using only 320 parameters. Loss of critical spatial relationships When diagnosing pneumonia, the pattern and location of opacities in the lung matter tremendously. Traditional networks would immediately flatten images into 1D arrays, destroying the spatial information that doctors rely on. 
CNNs elegantly solve these problems through two ingenious design principles Local connectivity Rather than connecting to every pixel, each CNN neuron connects only to a small patch of the previous layer, much like how different parts of the visual cortex in our brains respond to specific regions of our visual field. This dramatically reduces parameters while preserving the ability to detect local patterns like the edges of lung structures. Parameter sharing The same set of filters (weights) is applied across the entire image. This makes intuitive sense since the feature that identifies pneumonia-related opacity should work regardless of whether it appears in the upper or lower lung. These design choices make CNNs particularly effective for analyzing medical images where accurately identifying spatial patterns can literally be a matter of life and death. Understanding CNN Components Now that we understand why CNNs are well-suited for image analysis, let's learn about the building blocks that make them work. These components will form the foundation of our pneumonia detection model. Convolutional Layers Are The Feature Extractors The heart of any CNN is the convolutional layer. Unlike standard fully-connected layers that look at all input values globally, convolutional layers work more like a magnifying glass scanning across the image. They use a small sliding window to examine sections of the input image one patch at a time. This approach allows them to effectively detect specific local patterns, like edges, corners, or simple textures, regardless of where those patterns appear in the overall image. This ability to recognize patterns independent of their location is fundamental to how CNNs process visual information. Now, let's look at how this sliding window operates. In the animation above, you can see the core process the small sliding window, technically called a kernel (the grid of weights, shown in white), moves (or convolves) across the input (green grid). At each position, it performs an element-wise multiplication between the kernel's weights and the underlying input values, and then sums the results to produce a single output value. This value becomes part of the output feature map (blue grid), which highlights where the pattern detected by the kernel was found. Interestingly, the kernel's weights aren't fixed; they are learnable parameters, automatically adjusted during training via backpropagation to become effective at detecting relevant patterns. For our pneumonia detection task, the filters in early convolutional layers might learn to detect simple features like edges (e.g., rib and organ boundaries) or basic textures. Filters in deeper layers can then combine these simpler features to recognize more complex patterns relevant to pneumonia, such as the characteristic cloudy opacities within the lungs. When defining a convolutional layer, you'll typically configure these key hyperparameters Kernel Size This defines the dimensions (height and width) of the kernel?the sliding window of weights. Common sizes are 3×3 or 5×5. Smaller kernels generally capture more localized, finer details, while larger kernels can identify broader, more spread-out patterns. Number of Filters This specifies how many different pattern detectors the layer will have. Each filter acts as a unique feature detector and consists of its own learnable kernel (weights) plus a single learnable bias term. So, conceptually filter = kernel + bias. 
The bias is a value added to the result of the convolution calculation (the sum of element-wise products) at each position. This learnable parameter allows the filter to adjust its output threshold independently of the weighted sum of inputs, increasing the model's flexibility to learn patterns. Applying one filter across the input produces one 2D feature map in the output. Therefore, the number of filters you specify directly determines the number of output channels (the depth) of the layer's output volume. More filters allow the network to learn a richer set of features simultaneously, but also increase the number of parameters and computational load. Stride This controls how many pixels the kernel slides across the input at each step. A stride of 1 (as in the animation above) means it moves one pixel at a time. A larger stride (like 2, as shown in the animation below) causes the kernel to skip pixels, resulting in a smaller output feature map (dimensionally) and potentially faster computation, but with less spatial detail captured. Padding This parameter controls whether pixels are added around the border of the input before the convolution operation. The two main strategies are No Padding (sometimes called 'valid' padding) In this mode, the kernel only slides over positions where it fully overlaps the input data. This causes the output feature map's height and width to shrink relative to the input dimensions (unless the kernel size is 1×1). The convolution is only computed for 'valid' positions where the kernel fits entirely. Zero Padding Pixels with a value of zero are added symmetrically around the input's border. This technique gives you control over the output dimensions. A common goal is to calculate the right amount of padding (based on kernel size) to achieve 'same' padding, where the output feature map has the same height and width as the input map (this is typically used when the stride is 1). Using 'same' padding helps preserve information throughout the network, especially features located near the edges of the input, which can be valuable when analyzing medical images where abnormalities might appear anywhere. Input and Output Shapes (Channels) Convolutional layers operate on input data arranged as 3D volumes with dimensions (height × width × input channels). They also produce output feature maps arranged as a 3D volume (output height × output width × output channels). The number of output channels is set by the Number of Filters hyperparameter you choose for the layer, as we discussed; each filter produces one channel (feature map) in the output. The number of input channels for a layer isn't typically a hyperparameter you tune; instead, it must match the number of channels in the data coming into that layer. For the very first convolutional layer that processes the raw image, this depends on the image type Grayscale images (like our X-rays) These have only one channel (input_channels=1). Why? Because each pixel's value represents only a single piece of information its intensity or brightness (from black to white). Color images These typically have three channels (input_channels=3). Why? Because they represent the intensity of three primary colors Red, Green, and Blue (RGB), which are needed to create the full color spectrum at each pixel position. For any subsequent convolutional layer deeper in the network, its input_channels must be equal to the output_channels (the number of filters) of the layer immediately preceding it, ensuring the dimensions match up correctly. 
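To see these channel and shape rules in action, here is a small illustrative snippet (separate from the model we build later in this tutorial) that stacks two convolutional layers and prints the resulting shapes for a dummy grayscale input:

import torch
import torch.nn as nn

# Dummy batch: one grayscale image (1 channel) of size 256x256
x = torch.randn(1, 1, 256, 256)

# First layer: 1 input channel -> 16 filters; padding=1 with a 3x3 kernel keeps height/width the same
conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)

# Second layer: its in_channels must equal conv1's out_channels (16)
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)

out1 = conv1(x)
out2 = conv2(out1)
print(out1.shape)  # torch.Size([1, 16, 256, 256])
print(out2.shape)  # torch.Size([1, 32, 256, 256])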
The output feature map's height and width will depend on the input dimensions combined with the layer's kernel size, stride, and padding settings. Pooling Layers Focusing on What Matters After applying convolutions and detecting features, pooling layers help the network Reduce the spatial dimensions of feature maps Focus on the most important information Gain some resistance to small translations or shifts in the image The animation demonstrates max pooling, which divides the input into regions and takes only the maximum value from each. For pneumonia detection, this helps the network focus on the strongest indicators of disease while ignoring less relevant details. Max pooling creates a form of translation invariance because the network cares more about whether a feature is present than its exact location. This is useful for our task since pneumonia patterns can appear in slightly different locations across patients. Batch Normalization Stabilizing Training Medical image datasets like our pneumonia X-rays can have high variability in pixel intensity and contrast. Batch normalization helps stabilize the learning process by standardizing the inputs to each layer. By normalizing each batch of data during training, batch normalization Enables faster and more stable training Makes the model less sensitive to poor weight initialization Adds a mild regularization effect Allows for higher learning rates without divergence When building deep CNNs for medical imaging, batch normalization can be particularly valuable for handling the variability across different X-ray machines and imaging protocols. These components are often grouped together in repeating blocks within modern CNNs. A frequently used and effective structure for such a block is Convolutional Layer Batch Normalization Layer Activation Function (e.g., ReLU) Pooling Layer (optional, depending on the specific architecture) Dropout Layers Preventing Overfitting Medical imaging datasets like chest X-rays often contain far fewer examples than large-scale datasets like ImageNet. That makes it easier for a model to memorize the training data instead of learning patterns that generalize to new patients. To combat this, we’ll use dropout—a regularization technique that reduces overfitting by randomly disabling neurons during training. In the animated example below, you can see how a dropout layer with a 0.5 probability temporarily disables two out of four nodes on each forward pass. Notice how it’s not always the same two—it changes every time, forcing the network to build redundant pathways. In our pneumonia classifier, we’ll apply dropout usually within the fully connected layers near the end of the network. This helps ensure that the final classification doesn’t rely too heavily on any single feature learned earlier, helping the model generalize better to new chest X-rays. From Components to Architecture Now that we understand the individual CNN components, let's consider how to assemble them into a complete model architecture for our pneumonia detection task. Before designing the specific architecture (what we'll build), it's helpful to discuss the standard programming approach used to define such models in PyTorch (how we'll build it). Why Object-Oriented Models Are the Standard PyTorch offers multiple ways to define neural networks, but the object-oriented programming (OOP) approach using the nn.Module class is widely recognized as the standard for professional development. 
Let's explore why this approach is so beneficial, both for our current project and for your future computer vision work. When you look at how complex deep learning models are built in practice, whether for image recognition, autonomous navigation, natural language processing, or scientific discovery, you'll find they’re typically defined using object-oriented principles. This approach offers several key advantages Modularity OOP allows us to define reusable building blocks (like custom convolutional blocks or specific layer sequences) that can be easily stacked, swapped, and reconfigured. This is valuable when experimenting with different architectural ideas for any computer vision task, including optimizing models for medical image analysis. Maintainability Real-world models often need to evolve as new research emerges or project requirements change. The clear structure provided by OOP makes models easier to understand, debug, update, and collaborate on, whether you're incorporating a new state-of-the-art technique or adapting your model for a different dataset. Flexibility Many computer vision tasks benefit from custom operations or network structures that go beyond simple sequential layer stacking. OOP readily supports building complex, non-sequential architectures or integrating custom components, which can be cumbersome with simpler definition methods. Scalability As projects grow in complexity (e.g., tackling more intricate tasks, using larger datasets, or integrating different types of data), the organized nature of OOP makes managing this increased scale much more feasible than flatter script-based approaches. Industry alignment Across diverse fields applying deep learning, from tech companies and research institutions to finance and healthcare, this object-oriented approach using classes like nn.Module is the common standard for professional development. Simply put, learning to define your models using an object-oriented approach (by subclassing nn.Module) is ideal for building powerful, adaptable, and reusable computer vision systems. Of course, for very simple sequential models or quick proof-of-concept tests, more direct methods like using nn.Sequential can be perfectly effective and faster to write. However, the OOP structure truly shines when it comes to managing complexity, promoting code maintainability, and enabling the flexibility needed for larger or evolving real-world applications, making it the standard professional approach. Understanding this method prepares you to take on challenging and worthwhile projects, from analyzing medical images like we are here, to developing advanced systems in countless other fields. Defining Your CNN in PyTorch Now let's implement our pneumonia detection CNN using PyTorch's object-oriented style. We'll build a model that can effectively analyze chest X-rays and distinguish between normal and pneumonia cases. 
First, let's make sure we have all the required dependencies to build the model:

import torch
import torch.nn as nn
import torch.nn.functional as F

Next, we'll define our CNN by subclassing nn.Module, PyTorch's base class for all neural networks:

class PneumoniaCNN(nn.Module):
    def __init__(self):
        super().__init__()

        # First convolutional block
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
            nn.BatchNorm2d(num_features=32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # Reduce spatial dimensions by half; see explanation below
        )

        # Second convolutional block
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.BatchNorm2d(num_features=64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # Further reduce spatial dimensions; see explanation below
        )

        # Third convolutional block
        self.conv_block3 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.BatchNorm2d(num_features=128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # Further reduce spatial dimensions; see explanation below
        )

        # Flatten layer to convert 3D feature maps to 1D vector
        self.flatten = nn.Flatten()

        # Fully connected layers for classification
        self.fc1 = nn.Linear(in_features=128 * 32 * 32, out_features=512)  # Adjust size based on input dimensions
        self.dropout1 = nn.Dropout(0.5)  # Add 50% dropout for regularization
        self.fc2 = nn.Linear(in_features=512, out_features=128)
        self.dropout2 = nn.Dropout(0.5)
        self.fc3 = nn.Linear(in_features=128, out_features=2)  # 2 output classes: Normal and Pneumonia

    def forward(self, x):
        # Pass input through convolutional blocks
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = self.conv_block3(x)

        # Flatten the features
        x = self.flatten(x)

        # Pass through fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = F.relu(self.fc2(x))
        x = self.dropout2(x)
        logits = self.fc3(x)  # Raw, unnormalized predictions

        return logits

Let's break down what's happening in this model:
We start by creating three convolutional blocks using the nn.Sequential class. Each block contains:
nn.Conv2d(): A convolutional layer that extracts features from images
nn.BatchNorm2d(): Batch normalization to stabilize training
nn.ReLU(): ReLU activation to introduce non-linearity
nn.MaxPool2d(): Max pooling to reduce spatial dimensions and focus on the most important features within local regions
Notice we pass in_channels=1 to the first convolutional layer (conv_block1). This explicitly tells the layer to expect input data with a single channel, which is correct for our grayscale X-ray images where each pixel has only one intensity value. (Color images would typically use in_channels=3 for RGB.)
We gradually increase the number of filters (output channels) in subsequent blocks (32 → 64 → 128). This is a common CNN design pattern. Early layers with fewer filters tend to capture simpler, more general features (like edges or basic textures), while deeper layers with more filters can combine these simple features to learn more complex and abstract patterns specific to the task (like the visual characteristics of pneumonia).
After the convolutional blocks, we flatten the final 3D feature map (height × width × channels) into a 1D vector. This vector becomes the input to the first fully connected layer (self.fc1). To determine the required in_features for self.fc1, we need to know the shape of the feature map after the last pooling layer.
We'll be resizing our input images to 256×256 pixels during data preparation (covered in the next tutorial). Given this 256×256 input size, let's trace how the dimensions change through the three max pooling layers, as each one halves the height and width:
Start: 256×256
After 1st Pool layer: 128×128
After 2nd Pool layer: 64×64
After 3rd Pool layer: 32×32
So, the feature map entering the flatten layer has spatial dimensions 32×32. Since the last convolutional block (conv_block3) outputs 128 channels (or feature maps), the total number of features in the flattened vector is 128×32×32 = 131,072. This is the value we need for in_features in self.fc1.
The fully connected layers (nn.Linear), sometimes called dense layers, perform the final classification based on the extracted features.
We intersperse nn.Dropout(0.5) layers between the fully connected layers. Dropout is a regularization technique that helps prevent overfitting, which is especially important when working with limited datasets. It randomly sets a fraction (here, 50%) of neuron outputs to zero during training, forcing the network to learn more robust representations.
The final layer (self.fc3) outputs two values, corresponding to the scores for our two classes: Normal and Pneumonia. Note that these outputs are raw scores, often called logits. We don't apply a final activation function like Softmax here because the standard PyTorch loss function for multi-class classification, nn.CrossEntropyLoss, conveniently expects raw logits as input (it applies the necessary transformations internally during training).
The __init__ method defines all the network's layers and assigns them to instance attributes (like self.conv_block1, self.fc1, etc.). The forward method then defines the order in which input data x flows through these predefined layers to produce the final output.
You might also notice we used the module nn.ReLU() inside the nn.Sequential blocks defined in __init__, but called the functional version F.relu() directly in the forward method after the first two fully connected layers. Both apply the exact same ReLU activation. nn.ReLU() is required within nn.Sequential because nn.Sequential expects nn.Module instances. Using F.relu() directly in forward is common and often slightly more concise for stateless operations like activation functions, as you don't need to define it in __init__ first. Both approaches are valid within the forward method itself.
The .forward() method in our model defines how data flows through our network: it's the execution path that input takes as it's transformed into output predictions. When we later use our model with syntax like outputs = model(images), PyTorch automatically calls this .forward() method behind the scenes. This clean separation between model structure (defined in __init__()) and computation flow (defined in forward()) is one of the key benefits of PyTorch's object-oriented approach.

Verifying Tensor Shapes
When building CNNs, one of the most common sources of errors is mismatched tensor shapes between layers. For example, if the flattened output of your convolutional blocks doesn't produce the exact number of features expected by your first fully connected layer, PyTorch will raise a RuntimeError when you try to pass data through. Carefully tracking shapes is vital.
A simple yet effective debugging technique is to perform a "dry run": passing a correctly shaped dummy input through the model and printing the tensor shape after each major step.
This can help you catch dimension mismatches early and save hours of troubleshooting. First, let's create an instance of our model and a dummy input tensor representing one grayscale image of the expected size (256×256 pixels):

# Create model instance
model = PneumoniaCNN()

# Create a random dummy grayscale image (batch_size, channels, height, width)
dummy_input = torch.randn(1, 1, 256, 256)

Now, we can define a helper function that mimics the model's forward pass but includes print statements to show the shape transformations:

# Forward pass function with shape printing
def forward_with_shape_printing(model, x):
    print(f"Input shape \t\t{x.shape}")  # Using tabs for alignment

    # Pass through convolutional blocks
    x = model.conv_block1(x)
    print(f"After conv_block1 \t{x.shape}")
    x = model.conv_block2(x)
    print(f"After conv_block2 \t{x.shape}")
    x = model.conv_block3(x)
    print(f"After conv_block3 \t{x.shape}")

    # Flatten the features
    x = model.flatten(x)
    print(f"After flatten \t\t{x.shape}")

    # Pass through fully connected layers (only showing final output shape)
    x = F.relu(model.fc1(x))
    x = model.dropout1(x)
    x = F.relu(model.fc2(x))
    x = model.dropout2(x)
    logits = model.fc3(x)
    print(f"Output shape (logits) \t{logits.shape}")

    return logits

# Run the forward pass (output is ignored with _)
print("Running shape verification pass")
_ = forward_with_shape_printing(model, dummy_input)

Running this code should produce output similar to this:

Running shape verification pass
Input shape             torch.Size([1, 1, 256, 256])
After conv_block1       torch.Size([1, 32, 128, 128])
After conv_block2       torch.Size([1, 64, 64, 64])
After conv_block3       torch.Size([1, 128, 32, 32])
After flatten           torch.Size([1, 131072])
Output shape (logits)   torch.Size([1, 2])

Interpreting the Shape Transformations
These printouts confirm several key aspects of our architecture:
Spatial Dimensions Decrease, Channel Depth Increases: Notice how the height and width are halved after each convolutional block (due to the MaxPool2d layer): 256 → 128 → 64 → 32. Simultaneously, the number of channels (features) increases: 1 → 32 → 64 → 128. This is the common CNN pattern we discussed earlier, visualized here: the network trades spatial resolution for richer feature representation depth, allowing it to capture increasingly complex patterns as data flows deeper.
Flattening Connects Blocks: The output from the last convolutional block (1×128×32×32) is correctly flattened into a 1D vector of size 1×131072, matching the in_features expected by self.fc1. This confirms our calculation from the previous section and shows the bridge between the convolutional feature extractor and the fully connected classifier head.

Interpreting the Final Output Shape ([1, 2])
Finally, let's take a closer look at the output shape torch.Size([1, 2]).
The first dimension (1) corresponds to the batch size. We passed in a single dummy image, so the batch size is 1.
The second dimension (2) corresponds to the number of classes our model predicts. As established, these are the raw, unnormalized scores (logits) for 'Normal' (index 0) and 'Pneumonia' (index 1).
These logits are the direct output suitable for the nn.CrossEntropyLoss function during training. However, to turn them into human-interpretable predictions, two more steps are typically needed (which we'll implement fully in the next tutorial):
Convert to Probabilities: Apply the softmax function along the class dimension (dim=1) to convert the raw logits into probabilities that sum to 1.0 for each image in the batch.
Python
# Example: Convert logits to probabilities
probabilities = F.softmax(logits, dim=1)
# probabilities might look like tensor([[0.312, 0.688]])

Get Predicted Class: Find the index (0 or 1) corresponding to the highest probability. This index represents the model's final prediction.

Python
# Example: Get the predicted class index
_, predicted_class = torch.max(probabilities, dim=1)
# predicted_class might look like tensor([1]) (meaning Pneumonia)

This shape verification process confirms our model's internal dimensions align correctly and helps clarify how the final output relates to the classification task.

Practical Tips for CNN Development
Let's explore some important practices to keep in mind when developing CNNs in PyTorch.

GPU Usage and Device Management
Training CNNs involves a huge number of calculations. While CPUs can perform these operations, Graphics Processing Units (GPUs) are specialized for massive parallel computation, which can make training deep learning models drastically faster, often by an order of magnitude or more! This speed-up is especially noticeable with complex models or large datasets found in many computer vision applications, from analyzing high-resolution photographs to processing video streams or large medical scans.
If you have access to a GPU (like NVIDIA GPUs compatible with CUDA), you'll want to leverage its processing power. The key steps are to determine the appropriate device (cuda for NVIDIA GPU or cpu) and then explicitly move both your model and your data tensors to that device before performing operations:

# 1. Determine the target device (usually done early in your script)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 2. Move your model to the device (usually done once after creating the model)
model = model.to(device)

# 3. Move your data tensors to the device (done for EACH batch)
images = images.to(device)
labels = labels.to(device)

In a typical workflow, you'd set the device variable early on. You'd move the model to the device right after creating it (step 2). Importantly, inside your training or evaluation loops, you must also move each batch of images and labels to the same device (step 3) before feeding them into the model.
Consistently placing both your model and your input data on the same device is required. Performing operations between tensors residing on different devices (e.g., CPU tensor vs. GPU tensor) is a common source of RuntimeError messages in PyTorch, so diligent device management can save you many headaches.

Switching Between Training and Evaluation Modes
While we'll cover training our model in the next tutorial, it's good to be reminded that PyTorch models have two operational modes:

model.train()  # Set the model to training mode
model.eval()   # Set the model to evaluation mode

The difference is significant because:
In training mode, dropout layers randomly disable neurons
In evaluation mode, dropout is disabled so all neurons are active
Batch normalization behaves differently in each mode
This will be especially important when we implement the training loop in the next tutorial, but it's good to be aware of these modes now.
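As a quick illustration of where these two calls typically sit, here is a schematic sketch only; the complete loop (loss computation, optimizer steps, metrics) is built in the next tutorial, and num_epochs, train_loader, and val_loader are placeholder names:

# Schematic only: the full training loop comes in Part 2
for epoch in range(num_epochs):
    model.train()              # enable dropout; BatchNorm uses batch statistics
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        # ... forward pass, loss, backward pass, optimizer step ...

    model.eval()               # disable dropout; BatchNorm uses running statistics
    with torch.no_grad():      # no gradients needed during validation
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            # ... forward pass and validation metrics ...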
Review and Next Steps
You've now completed the first step toward building a pneumonia detection system: designing an effective CNN architecture in PyTorch. Let's recap what you've learned in this tutorial:
You understand why CNNs are well-suited for image analysis tasks, such as detecting patterns in X-rays, leveraging their ability to learn spatial hierarchies.
You've learned about key CNN components like convolutional layers, pooling layers, batch normalization, and dropout layers.
You've implemented a complete CNN model using PyTorch's object-oriented approach.
You've explored techniques for debugging potential shape issues in your model.
This is an important foundation, but a model architecture alone can't detect pneumonia. Next, we'll build on this foundation to create a complete working system by:
Loading and preprocessing real chest X-ray images
Implementing training and validation loops
Evaluating the model's diagnostic performance
Interpreting results and improving the model
In the next tutorial, we'll transform this architectural framework into a working pneumonia detection system by adding data processing, training, and evaluation. See you there!

Key Takeaways
CNNs reduce parameters through local connectivity and weight sharing, making them ideal for image analysis
Core CNN components work together to extract increasingly complex features from images
PyTorch's object-oriented approach provides a flexible, maintainable framework for implementing CNNs
Debugging techniques like shape verification are essential for successful model development
Medical applications like pneumonia detection showcase the real-world impact of computer vision


Unhandled exception. System.DllNotFoundException: ...
Category: Network

Problem Unhandled exception. System.DllNotFoundException Unable to load shared ...


Views: 1207 Likes: 99
Why you should choose HomeAssistant as your Home A ...
Category: Research

Home automation is becoming increasingly popular as people look for ways to make their homes mor ...


Views: 0 Likes: 37
Is AI going to take Software Development Jobs?
Category: Research

Artificial Intelligence (AI) is becoming increasingly prevalent in the software development indu ...


Views: 0 Likes: 32
How to copy data from one table to a new table wit ...
Category: Research

Copying data from one table to another is a common task in database management. It can be done m ...


Views: 0 Likes: 30
Learning AI Roadmap
Category: Other

1. Start by learning what types of Machine Learning there are. 2. Learn about Embedding ...


Views: 0 Likes: 20
How to read an Email Header
Category: Research

An email header is a crucial part of an email message that contains important information about ...


Views: 0 Likes: 25
List of Math Functions and why they are used
Category: Research

Math functions are an essential part of programming, allowing developers to perform various calc ...


Views: 0 Likes: 24
Top 10 Things to Look Forward to in 2024
Category: Other

Anticipating the Future 10 Things to Look Forward to in 2024 As we bid ...


Views: 0 Likes: 5
Foundation models for reasoning on charts

Posted by Julian Eisenschlos, Research Software Engineer, Google Research Visual language is the form of communication that relies on pictorial symbols outside of text to convey information. It is ubiquitous in our digital life in the form of iconography, infographics, tables, plots, and charts, extending to the real world in street signs, comic books, food labels, etc. For that reason, having computers better understand this type of media can help with scientific communication and discovery, accessibility, and data transparency. While computer vision models have made tremendous progress using learning-based solutions since the advent of ImageNet, the focus has been on natural images, where all sorts of tasks, such as classification, visual question answering (VQA), captioning, detection and segmentation, have been defined, studied and in some cases advanced to reach human performance. However, visual language has not garnered a similar level of attention, possibly because of the lack of large-scale training sets in this space. But over the last few years, new academic datasets have been created with the goal of evaluating question answering systems on visual language images, like PlotQA, InfographicsVQA, and ChartQA. Example from ChartQA. Answering the question requires reading the information and computing the sum and the difference. Existing models built for these tasks relied on integrating optical character recognition (OCR) information and their coordinates into larger pipelines but the process is error prone, slow, and generalizes poorly. The prevalence of these methods was because existing end-to-end computer vision models based on convolutional neural networks (CNNs) or transformers pre-trained on natural images could not be easily adapted to visual language. But existing models are ill-prepared for the challenges in answering questions on charts, including reading the relative height of bars or the angle of slices in pie charts, understanding axis scales, correctly mapping pictograms with their legend values with colors, sizes and textures, and finally performing numerical operations with the extracted numbers. In light of these challenges, we propose “MatCha Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering”. MatCha, which stands for math and charts, is a pixels-to-text foundation model (a pre-trained model with built-in inductive biases that can be fine-tuned for multiple applications) trained on two complementary tasks (a) chart de-rendering and (b) math reasoning. In chart de-rendering, given a plot or chart, the image-to-text model is required to generate its underlying data table or the code used to render it. For math reasoning pre-training, we pick textual numerical reasoning datasets and render the input into images, which the image-to-text model needs to decode for answers. We also propose “DePlot One-shot visual language reasoning by plot-to-table translation”, a model built on top of MatCha for one-shot reasoning on charts via translation to tables. With these methods we surpass the previous state of the art in ChartQA by more than 20% and match the best summarization systems that have 1000 times more parameters. Both papers will be presented at ACL2023. Chart de-rendering Plots and charts are usually generated by an underlying data table and a piece of code. The code defines the overall layout of the figure (e.g., type, direction, color/shape scheme) and the underlying data table establishes the actual numbers and their groupings. 
Both the data and code are sent to a compiler/rendering engine to create the final image. To understand a chart, one needs to discover the visual patterns in the image and effectively parse and group them to extract the key information. Reversing the plot rendering process demands all such capabilities and can thus serve as an ideal pre-training task. A chart created from a table in the Airbus A380 Wikipedia page using random plotting options. The pre-training task for MatCha consists of recovering the source table or the source code from the image. In practice, it is challenging to simultaneously obtain charts, their underlying data tables, and their rendering code. To collect sufficient pre-training data, we independently accumulate [chart, code] and [chart, table] pairs. For [chart, code], we crawl all GitHub IPython notebooks with appropriate licenses and extract blocks with figures. A figure and the code block right before it are saved as a [chart, code] pair. For [chart, table] pairs, we explored two sources. For the first source, synthetic data, we manually write code to convert web-crawled Wikipedia tables from the TaPas codebase to charts. We sampled from and combined several plotting options depending on the column types. In addition, we also add [chart, table] pairs generated in PlotQA to diversify the pre-training corpus. The second source is web-crawled [chart, table] pairs. We directly use the [chart, table] pairs crawled in the ChartQA training set, containing around 20k pairs in total from four websites Statista, Pew, Our World in Data, and OECD. Math reasoning We incorporate numerical reasoning knowledge into MatCha by learning math reasoning skills from textual math datasets. We use two existing textual math reasoning datasets, MATH and DROP for pre-training. MATH is synthetically created, containing two million training examples per module (type) of questions. DROP is a reading-comprehension–style QA dataset where the input is a paragraph context and a question. To solve questions in DROP, the model needs to read the paragraph, extract relevant numbers and perform numerical computation. We found both datasets to be complementary. MATH contains a large number of questions across different categories, which helps us identify math operations needed to explicitly inject into the model. DROP’s reading-comprehension format resembles the typical QA format wherein models simultaneously perform information extraction and reasoning. In practice, we render inputs of both datasets into images. The model is trained to decode the answer. To improve the math reasoning skills of MatCha we incorporate examples from MATH and DROP into the pre-training objective, by rendering the input text as images. End-to-end results We use a Pix2Struct model backbone, which is an image-to-text transformer tailored for website understanding, and pre-train it with the two tasks described above. We demonstrate the strengths of MatCha by fine-tuning it on several visual language tasks — tasks involving charts and plots for question answering and summarization where no access to the underlying table is possible. MatCha surpasses previous models’ performance by a large margin and also outperforms the previous state of the art, which assumes access to underlying tables. In the figure below, we first evaluate two baseline models that incorporate information from an OCR pipeline, which until recently was the standard approach for working with charts. The first is based on T5, the second on VisionTaPas. 
We also compare against PaLI-17B, which is a large (~1000 times larger than the other models) image plus text-to-text transformer trained on a diverse set of tasks but with limited capabilities for reading text and other forms of visual language. Finally, we report the Pix2Struct and MatCha model results. Experimental results on two chart QA benchmarks ChartQA & PlotQA (using relaxed accuracy) and a chart summarization benchmark chart-to-text (using BLEU4). Matcha surpasses the state of the art by a large margin on QA, compared to larger models, and matches these larger models on summarization. For QA datasets, we use the official relaxed accuracy metric that allows for small relative errors in numerical outputs. For chart-to-text summarization, we report BLEU scores. MatCha achieves noticeably improved results compared to baselines for question answering, and comparable results to PaLI in summarization, where large size and extensive long text/captioning generation pre-training are advantageous for this kind of long-form text generation. Derendering plus large language model chains While extremely performant for their number of parameters, particularly on extractive tasks, we observed that fine-tuned MatCha models could still struggle with end-to-end complex reasoning (e.g., mathematical operations involving large numbers or multiple steps). Thus, we also propose a two-step method to tackle this 1) a model reads a chart, then outputs the underlying table, 2) a large language model (LLM) reads this output and then tries to answer the question solely based on the textual input. For the first model, we fine-tuned MatCha solely on the chart-to-table task, increasing the output sequence length to guarantee it could recover all or most of the information in the chart. DePlot is the resulting model. In the second stage, any LLM (such as FlanPaLM or Codex) can be used for the task, and we can rely on the standard methods to increase performance on LLMs, for example chain-of-thought and self-consistency. We also experimented with program-of-thoughts where the model produces executable Python code to offload complex computations. An illustration of the DePlot+LLM method. This is a real example using FlanPaLM and Codex. The blue boxes are input to the LLM and the red boxes contain the answer generated by the LLMs. We highlight some of the key reasoning steps in each answer. As shown in the example above, the DePlot model in combination with LLMs outperforms fine-tuned models by a significant margin, especially so in the human-sourced portion of ChartQA, where the questions are more natural but demand more difficult reasoning. Furthermore, DePlot+LLM can do so without access to any training data. We have released the new models and code at our GitHub repo, where you can try it out yourself in colab. Checkout the papers for MatCha and DePlot for more details on the experimental results. We hope that our results can benefit the research community and make the information in charts and plots more accessible to everyone. Acknowledgements This work was carried out by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen and Yasemin Altun from our Language Team as part of Fangyu's internship project. Nigel Collier from Cambridge also was a collaborator. We would like to thank Joshua Howland, Alex Polozov, Shrestha Basu Mallick, Massimo Nicosia and William Cohen for their valuable comments and suggestions.


Senior Software Engineer - Product
Category: Jobs

Senior Software Engineer – Product. Do you thrive on ...


Views: 0 Likes: 34
Understanding Light Gradient Boosting Machine

In this article, I will walk you through Light GBM. This is another variation of gradient boosting, and "light" stands for a lighter version, which is intended to make the model faster, more efficient, and a bit more accurate. I hope this article will give you clarity about using LGBM in machine learning.

In life we want everything to be fast and accurate. Similarly, software developers want their models to run faster and show results in no time; LGBM came into existence to make that possible. In this article, we are going to learn what the LGBM framework is, what its advantages are in machine learning and deep learning, and how it has changed technology positively. In the previous article we discussed XGBoost, which stands for Extreme Gradient Boosting; that article, Understanding XGBoost: a basic overview, will help you learn the fundamentals of XGBoost.

Light GBM
Light Gradient Boosting Machine, in short LGBM, is a framework and a variant of gradient boosting. Like other gradient boosting frameworks, Light GBM is based on decision tree algorithms. With the help of Light GBM, we can reduce memory usage and increase efficiency. The main difference between Light GBM and other gradient boosting frameworks is that Light GBM expands vertically, meaning it grows leaf-wise, while the other algorithms expand horizontally, in a level-wise direction. Light GBM selects the leaf that produces the least error and maximum efficiency, which is very helpful in reducing the error percentage. In short, it grows leaf-wise while others expand level-wise.

Nowadays Light GBM has become more popular because, with regular algorithms, accuracy is often not up to the mark and it is difficult to produce results quickly. Since data is increasing on a daily basis, we need a model that is faster and more efficient; that is how Light GBM came into existence. We call it "Light" because of its high-speed training. Light GBM deals with large amounts of data while consuming relatively little memory. Developers often use Light GBM in hackathons because it provides good efficiency and much faster results, and it supports GPU training. It comes in handy for data scientists. Give it a try.

When to use Light Gradient Boosting Machine?
We cannot apply Light GBM to small datasets because of its overfitting problem: Light GBM is vulnerable to overfitting and can easily overfit a small amount of data. The Light gradient boosting machine shows good results if the dataset has more than roughly 10,000 rows. We use LGBM when training with a large amount of data and high accuracy is required.

Implementation
Light GBM is very easy to understand and implement. The most complicated, and most important, part of implementing Light GBM is parameter tuning: it involves nearly a hundred parameters. But don't worry, you don't need to remember all of them; I am here to help you by explaining a few important ones. With the help of these parameters, Light GBM is one of the most powerful frameworks. Let's see a few parameters.

Parameters
It is necessary to know about the parameters we are using in our algorithm. There are different types of parameters in Light GBM. Let's see them.

Control parameters
- Max-depth: indicates the maximum depth of a tree. We use the max-depth parameter to handle overfitting; if the model is about to overfit, or is already overfitted, reduce the value of max-depth.
- Min-data-in-leaf: indicates the minimum number of data points required in a leaf. The default value is 20. Again, we use this to deal with overfitting.
- Feature-fraction: we use feature fraction when our boosting is random forest. If the feature fraction value is 0.7, Light GBM will randomly select 70 percent of the features for each tree.
- Bagging-fraction: indicates the fraction of data we use in each iteration; we use it to reduce overfitting and to improve the speed of the model.
- Early-stopping-round: helps increase the speed of training. The model won't train any further if there is no improvement in the validation metric within the given number of rounds, so we can skip unnecessary iterations.
- Lambda: indicates regularization; typical values are between 0 and 1.
- Min-gain-to-split: indicates the minimum gain needed to make a split. Using this, we can control the number of splits in a tree.
- Max-cat-group: when there is a large number of categories, splitting on each of them can lead to overfitting. To avoid this, Light GBM merges categories into max-cat-groups, which makes it easier to find split points.

Core parameters
- Task: indicates the task we are going to perform on the data, either training or prediction.
- Application: the most important parameter; it specifies whether we perform regression or classification. The default is regression.
  - binary: binary classification
  - multiclass: multiclass classification
  - regression: regression
- Boosting: specifies the algorithm we should use.
  - rf: random forest
  - goss: gradient-based one-side sampling
  - gbdt: traditional gradient boosting decision tree
- Num-boost-round: the number of boosting iterations.
- Learning rate: Light GBM starts with an initial estimate that gets updated based on the output of each tree; the learning rate controls the size of these updates.
- Num-leaves: the total number of leaves allowed in a tree. The default value is 31.

Metric parameter
- Metric: specifies the loss reported while building the model. A few common losses for classification and regression are:
  - mse: mean squared error
  - mae: mean absolute error
  - binary-logloss: binary classification loss
  - multi-logloss: multiclass classification loss

Finally, we do parameter tuning, which is performed by data scientists; a minimal training sketch that ties several of these parameters together is shown after this article.

Conclusion
Light GBM is a very useful and really fast algorithm in data science. In this article, I gave an overview of LGBM and the basic idea of this algorithm. Thanks for reading!

The post Understanding Light Gradient Boosting Machine appeared first on datamahadev.com.
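As referenced above, here is a minimal training sketch that wires several of the parameters discussed into LightGBM's native Python API. The dataset, split, and parameter values are placeholders chosen for illustration, not recommendations from the original article.

# Minimal LightGBM sketch using several of the parameters discussed above.
# The dataset and parameter values are placeholders for illustration.
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)

params = {
    "objective": "binary",        # application: binary classification
    "boosting": "gbdt",           # traditional gradient boosting decision trees
    "metric": "binary_logloss",   # loss reported during training
    "num_leaves": 31,             # leaf-wise growth: total leaves per tree
    "max_depth": 6,               # cap depth to limit overfitting
    "min_data_in_leaf": 20,       # minimum data points per leaf
    "feature_fraction": 0.7,      # sample 70% of features per tree
    "bagging_fraction": 0.8,      # sample 80% of rows per iteration
    "bagging_freq": 1,            # perform bagging at every iteration
    "learning_rate": 0.05,
    "lambda_l2": 0.5,             # L2 regularization ("lambda")
    "min_gain_to_split": 0.0,
}

model = lgb.train(
    params,
    train_set,
    num_boost_round=500,                                   # num-boost-round (upper bound)
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],    # early-stopping-round
)

preds = model.predict(X_valid)   # probabilities for the positive class

Early stopping here monitors binary_logloss on the validation set and halts training once it stops improving, which is exactly the speed-up described in the parameter list above.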


How to install pyenv on Linux 22.04
Category: SERVERS



Views: 0 Likes: 47
What is domain driven design in Asp.Net 8?
Category: Research

Domain-Driven Design (DDD) is an approach to software development that emphasizes the importance ...


Views: 0 Likes: 29
Android Studio Error: Cause:com.android.build.grad ...
Category: Android

Android Studio is a popular integrated development environment (IDE) for building Android applicati ...


Views: 277 Likes: 82
How to use Whisper AI using ONNX in C#
Category: Research

Introduction: Whisper AI is an open-source speech recognition model developed by OpenAI th ...


Views: 0 Likes: 22
The multi-part identifier "inserted.Id" could not ...
Category: SQL

This error message typically occurs in SQL Server when you're trying to use a column value from a ...


Views: 0 Likes: 40
Amazon Launches Computer Vision Services

Amazon launches computer vision services to detect defects in manufactured products using Artificial Intelligence. It uses Amazon Web Services (AWS) cloud services to analyze images: computer vision is applied to the images to find defects in the products. You might also like Understanding Machine Learning Ops – MLOps, Intro to AutoML – Automated Machine Learning, and Introduction to NLP.

The baseline for this achievement by Amazon is Artificial Intelligence. The AWS console provides various features for using the cloud services. The model is trained to detect various defects like dents, cracks, disordered placement of goods, and much more, all carried out through computer vision. The model requires thousands of images for training to detect a specific issue. One of the major challenges is that the camera can capture images at various angles, dimensions, lighting conditions, environments, etc. But Amazon claims that the Vision model they have developed is powerful enough to draw the relevant patterns from the input. For this, they use a real-time API (a minimal sketch of calling this API from code appears at the end of this article). They have marked a baseline, which can be referred to as the benchmark, against which they compare the images. If the Vision service finds that an image matches that level, it can report whether defects are present or not. It uses two dominant technologies: one is Computer Vision and the other is Artificial Intelligence.

Customers can use the services through the Amazon Web Services (AWS) platform, and they only pay for the hours of usage. Users can give valuable feedback for the improvement of the service. This technique has been an asset, as various faults or anomalies in manufactured products can be detected quite efficiently. It can process hundreds of images within an hour! Isn't that awesome? Amazon claims that the service is effective, efficient, and has a minimal cost that depends on the hours of usage. The output, i.e., the Vision report generated after analysis of the images, is reliable. It has added to the digitalization of industry, where goods can be inspected using web services with a pinch of artificial intelligence.

This project aims to bridge the gap between the two technologies – Computer Vision and Artificial Intelligence. The main goal of this project is to satisfy customers with high-quality online shopping where they can check the quality of their products. This assures customers that they will get what they have ordered. Many times, customers skip online shopping because of the trust issue between the customer and the service provider; they fear that they would get a defective product, as they don't have any way of cross-checking the product they have ordered. This has somewhat hindered the growth of online shopping. To increase the number of online customers and to boost the pace of online shopping, Amazon has introduced Computer Vision technology to aid customers. There is no doubt that once customer satisfaction improves, the number of online buyers will increase, which will lead to the progress of the IT sector. Each step brings us closer to the digitalization of the industry. The use of Computer Vision to detect defects in manufactured products using artificial intelligence is one of the biggest steps on this path.

Suppose you want to buy a washing machine.
But what you care about is that the machine shouldn't have any defects: the color should be what you selected, the size should be as per your selection, it shouldn't have any cracks or dents, and much more. But you can't verify this until it reaches your doorway! With the Amazon service, it is possible to detect the defects in the product that you are buying. Hence, it enables users to verify their product before it reaches them, and they can replace or cancel their order as per their choice and the policy of the particular service provider. In this way it has achieved one of the biggest criteria, i.e., customer satisfaction. Now, customers are no longer disappointed when the product reaches their home!

Amazon Lookout for Vision is able to detect defects with high quality using machine learning and artificial intelligence. It uses various techniques, and one of the major ones is the few-shot learning technique of machine learning used by this model.

There are various techniques to detect defects in manufactured products. One of the most widely used is human inspection, but this requires a lot of effort, a large amount of time and cost, and is prone to errors, so it is not feasible. Because of this, various industries have shut down; the main reason for their failure is their inability to satisfy customers, as their products were prone to errors and their error detection process wasn't up to the mark. Then comes the traditional procedure, i.e., using various machine learning algorithms to build a system that can detect errors. Machine learning models and artificial intelligence alone can also provide a solution to this problem, but the process of building such a system is quite cumbersome. It requires significant effort from a team of data scientists, who must gather, sort, pre-process, and run various operations on the data. Data mining and preprocessing are quite hefty tasks, and training the model with proper and sufficient datasets is yet another task. All of these require a large amount of time as well as labor, so this approach isn't suitable every time, and such a large error detection system, with various features, would be quite difficult to implement.

Hence, with the combination of Computer Vision and Artificial Intelligence, Amazon has achieved great success. It has made the task of error detection in manufactured products quite easy. In this way, it has helped various sectors like healthcare by providing them the facilities to detect defects in manufactured products, enhancing the quality and delivery of products through fast, reliable, and accurate defect detection. The world is looking forward to more such digital innovations. To connect the real world with the machine, the addition of artificial intelligence to computer vision is quite necessary, though computer vision can be called a part of artificial intelligence. It has revolutionized the technological industry. Here we will see the basics of Computer Vision and Artificial Intelligence that have made the Amazon web service such an impactful resource.

Computer Vision
If you want a machine to understand the real world from real-time input in the form of images or videos, then you are talking about Computer Vision. In an era when scientists are dealing with ways to train computers, computer vision plays a vital role.
It teaches machines to interpret the world using deep learning, machine learning, artificial intelligence, and much more, and it has applications in many fields like agriculture, healthcare, defense, and education. To deal with textual data, algorithms can rely on basic training datasets, but to deal with image input, algorithms need to be trained on a large amount of high-quality data: the program must understand what the image contains. When we take a picture with our camera, the device understands the frames and the pixel orientation, but to compare images, various dimensions, color coding, positions, angles, etc. need to be considered. This guides the machine or the system on how to look at an image, i.e., what its values and qualities are. Hence, in the service provided by Amazon, all the image processing, comparison, and other processes are handled by Computer Vision, which makes the process effective and efficient. So we have had a look at the importance and aim of Computer Vision in the emerging technological field. Now, let's have a look at Artificial Intelligence.

Artificial Intelligence
The phrase artificial intelligence itself states its meaning: it is building intelligence into machines! Isn't it great how artificial intelligence constantly trains machines to think and act like humans? The main goal of artificial intelligence is to make machines independent. Once trained, they should be able to make the specific changes the situation requires; the algorithm should be such that the machines don't require human intervention when the input changes, and the machine should be able to handle it on its own. Though this concept wasn't widely accepted at first, it has now changed the whole scenario. There are various implementations in this field, ranging from Sophia to Alexa, Siri to Tesla, and much more, and the list is growing with each passing day! The main aim of artificial intelligence is to increase the efficiency of machines and boost their computational power. This helps machines analyze their surrounding data with the help of various other technologies like machine learning, deep learning, big data, neural networks, and much more. Though the task might seem quite hefty, which it is, it has helped improve various sectors like education, healthcare, and automobiles, and it has brought automation and digitalization to industry.

In this way, Amazon Computer Vision Services helps to detect defects in manufactured products.

The post Amazon Launches Computer Vision Services appeared first on datamahadev.com.
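The article stays at a high level, but as a rough, hedged illustration of how a customer might call the Lookout for Vision real-time defect-detection API from Python, here is a minimal boto3 sketch; the project name, model version, region, and image path are placeholder assumptions.

# Hypothetical sketch: query an already-trained Amazon Lookout for Vision
# model for defects in a single product image using boto3.
# Project name, model version, region, and image path are placeholders.
import boto3

client = boto3.client("lookoutvision", region_name="us-east-1")

with open("washing_machine_panel.jpg", "rb") as image_file:
    image_bytes = image_file.read()

response = client.detect_anomalies(
    ProjectName="appliance-defect-detection",  # placeholder project
    ModelVersion="1",                          # placeholder model version
    Body=image_bytes,
    ContentType="image/jpeg",
)

result = response["DetectAnomalyResult"]
print("Anomalous:", result["IsAnomalous"])     # whether a defect was detected
print("Confidence:", result["Confidence"])     # model confidence in the verdict

The DetectAnomalyResult in the response indicates whether the image is considered anomalous and with what confidence, which is the kind of per-image verdict described above.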


The code execution cannot proceed because msodbcsq ...
Category: Other

Question The code execution cannot proceed because msodbcsql17.dll was not found. Reinstalling t ...


Views: 0 Likes: 14
InvalidOperationException: The current thread is n ...
Category: Questions

InvalidOperationException The current thread is not associated with the Dispatcher. Use InvokeAs ...


Views: 29 Likes: 41
How to Prompt ChatGPT and Google Bard for Best Res ...
Category: Machine Learning

Here are some tips for prompting Google Bard: Be specific. The more specific you a ...


Views: 0 Likes: 34
How to Create Symbolic links between two folders o ...
Category: Windows

Use CMD, not PowerShell: mklink /D "E\FolderNameOnAnotherDisk\subfolder\sub ...


Views: 178 Likes: 85




For peering opportunities, Autonomous System Number: AS401345. Custom Software Development at ErnesTech. Email Address: [email protected]