Skin Cancer Detection using TensorFlow in Python

Early detection of any disease, especially cancer, is crucial for successful treatment. One effort in this direction is the use of machine learning to detect and diagnose skin cancer, built with a framework such as TensorFlow.

The traditional method of cancer detection is time-consuming and requires professional dermatologists. With a TensorFlow model, a preliminary assessment can be made not only faster but potentially more accurate and efficient. Moreover, people who do not have timely access to doctors and dermatologists can use it in the meantime.

Algorithm Overview

The skin cancer detection process follows these key steps:

Step 1: Import libraries such as numpy, pandas, matplotlib, and seaborn, then load the image dataset and store the file paths in a list.

Step 2: Load this list of image paths into a pandas DataFrame and extract the label (one of two classes) for each image.

Step 3: Convert the labels to the values 0 and 1 for simplicity, and compare the number of images under each label with a pie chart.

Step 4: Display a few sample images for each label once the class balance has been checked.

Step 5: Split the dataset into a training set and a validation set.

Step 6: Create input pipelines for the images.

Step 7: Use the EfficientNet architecture to create and compile the model.

Step 8: Train the model for at least 5 epochs.

Step 9: Visualize the difference between training loss and validation loss.

Implementation Example

In this example, we will develop a skin cancer detection model using TensorFlow with the EfficientNet architecture. The dataset contains benign and malignant skin lesion images, stored in one subfolder per class under train/.

Data Loading and Preprocessing

# Import the required libraries 
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt

from glob import glob
from PIL import Image
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow.keras import layers

# Let tf.data pick the parallelism and prefetch buffer size
AUTO = tf.data.AUTOTUNE

import warnings
warnings.filterwarnings('ignore')

# Load the dataset 
images = glob('train/*/*.jpg')
print(f"Total images found: {len(images)}")

# Normalize Windows path separators, then use the class folder name as the label
images = [path.replace('\\', '/') for path in images]
df = pd.DataFrame({'filepath': images})
df['label'] = df['filepath'].str.split('/', expand=True)[1]
print(df.head())

# Convert labels to binary format
df['label_bin'] = np.where(df['label'].values == 'malignant', 1, 0)
print(df.head())
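
The label extraction above relies on the class folder being the second component of each path. As a minimal sketch, assuming the same train/<label>/<file>.jpg layout, the parent folder name can also be read with pathlib, which does not depend on how deep the dataset folder sits. The helper names label_from_path and to_binary are hypothetical and used only for illustration.

```python
from pathlib import PureWindowsPath


def label_from_path(filepath):
    """Return the name of the folder that directly contains the image."""
    # PureWindowsPath treats both '\\' and '/' as separators, so it handles
    # paths produced by glob on either operating system.
    return PureWindowsPath(filepath).parent.name


def to_binary(label):
    """Map 'malignant' to 1 and every other label to 0."""
    return 1 if label == 'malignant' else 0


paths = ['train/benign/100.jpg', 'train\\malignant\\7.jpg']  # made-up paths
labels = [label_from_path(p) for p in paths]
binary = [to_binary(name) for name in labels]
print(labels)
print(binary)
```

This mirrors the np.where call above, which assigns 1 to 'malignant' and 0 to everything else.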

Data Visualization

# Check whether the two classes are roughly balanced
x = df['label'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(x.values,
        labels=x.index,
        autopct='%1.1f%%')
plt.title('Distribution of Skin Cancer Types')
plt.show()

# Display sample images from each category
for cat in df['label'].unique():
    temp = df[df['label'] == cat]
    index_list = temp.index
    
    fig, ax = plt.subplots(1, 4, figsize=(15, 5))
    fig.suptitle(f'Sample Images for {cat} category', fontsize=16)
    
    for i in range(4):
        index = np.random.choice(index_list)
        data = df.loc[index]  # index_list holds index labels, so use .loc
        image_path = data['filepath']
        
        img = np.array(Image.open(image_path))
        ax[i].imshow(img)
        ax[i].axis('off')
        
    plt.tight_layout()
    plt.show()
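
If the pie chart shows one class clearly outnumbering the other, one common option is to weight the loss per class. The sketch below computes weights with the n_samples / (n_classes * class_count) heuristic. The label counts are made up for illustration and are not taken from this dataset, and the function name balanced_class_weights is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Give each class a weight inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy example: six benign (0) and two malignant (1) labels
toy_labels = [0] * 6 + [1] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

A dictionary of this shape could then be passed to model.fit through its class_weight argument, so that errors on the rarer class count more in the loss.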

Model Building and Training

# Split the dataset into training and validation sets
features = df['filepath']
target = df['label_bin']

X_train, X_val, Y_train, Y_val = train_test_split(
    features, target, test_size=0.15, random_state=10
)

print(f"Training samples: {X_train.shape[0]}")
print(f"Validation samples: {X_val.shape[0]}")

# Image preprocessing function
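def decode_image(filepath, label=None):
    img = tf.io.read_file(filepath)
    img = tf.image.decode_jpeg(img, channels=3)  # force 3-channel RGB
    img = tf.image.resize(img, [224, 224])
    # Keras EfficientNet rescales inputs internally, so keep pixels in [0, 255]
    img = tf.cast(img, tf.float32)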
    
    if label is None:
        return img
    return img, label

# Create data pipelines
train_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_train, Y_train))
    .map(decode_image, num_parallel_calls=AUTO)
    .batch(32)
    .prefetch(AUTO)
)

val_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_val, Y_val))
    .map(decode_image, num_parallel_calls=AUTO)
    .batch(32)
    .prefetch(AUTO)
)

# Build the model using EfficientNet
from tensorflow.keras.applications.efficientnet import EfficientNetB7

pre_trained_model = EfficientNetB7(
    input_shape=(224, 224, 3),
    weights='imagenet',
    include_top=False
)

# Freeze pre-trained layers
for layer in pre_trained_model.layers:
    layer.trainable = False

# Create the complete model
inputs = layers.Input(shape=(224, 224, 3))
x = pre_trained_model(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.Model(inputs, outputs)

# Compile the model
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
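    # Name the AUC metric explicitly so the history keys are 'auc' and 'val_auc'
    metrics=['accuracy', tf.keras.metrics.AUC(name='auc')]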
)

# Train the model
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=5,
    verbose=1
)
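
The loss and accuracy values reported during training can be illustrated by hand. The sketch below computes binary cross-entropy and accuracy at a 0.5 threshold for a toy batch in plain Python. The probabilities are made up, and Keras additionally clips predictions by a small epsilon, so its numbers can differ slightly in the last digits.

```python
import math


def binary_crossentropy(y_true, y_prob):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(y_true, y_prob)
    ]
    return sum(losses) / len(losses)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = [int((p > threshold) == bool(y)) for y, p in zip(y_true, y_prob)]
    return sum(hits) / len(hits)


# Toy batch: 1 = malignant, 0 = benign; y_prob plays the role of the sigmoid output
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print(f"loss={loss:.4f} accuracy={acc:.2f}")
```

The third sample is a malignant lesion predicted at 0.4, so it counts as a miss for accuracy and contributes the largest term to the loss.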

Results Visualization

# Plot training history
hist_df = pd.DataFrame(history.history)

# Plot loss curves
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(hist_df['loss'], label='Training Loss')
plt.plot(hist_df['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(hist_df['auc'], label='Training AUC')
plt.plot(hist_df['val_auc'], label='Validation AUC')
plt.title('Model AUC')
plt.xlabel('Epoch')
plt.ylabel('AUC')
plt.legend()

plt.tight_layout()
plt.show()

Output

Total images found: 2357

               filepath   label
0   train/benign/100.jpg  benign
1  train/benign/1000.jpg  benign
2  train/benign/1001.jpg  benign
3  train/benign/1002.jpg  benign
4  train/benign/1004.jpg  benign

Training samples: 2003
Validation samples: 354

Epoch 1/5
63/63 [==============================] - 45s 695ms/step - loss: 0.6421 - accuracy: 0.6470 - auc: 0.7103 - val_loss: 0.5876 - val_accuracy: 0.6949 - val_auc: 0.7598

Epoch 2/5
63/63 [==============================] - 42s 671ms/step - loss: 0.5234 - accuracy: 0.7464 - auc: 0.8194 - val_loss: 0.4892 - val_accuracy: 0.7655 - val_auc: 0.8456

Epoch 3/5
63/63 [==============================] - 41s 656ms/step - loss: 0.4567 - accuracy: 0.7888 - auc: 0.8632 - val_loss: 0.4321 - val_accuracy: 0.8051 - val_auc: 0.8789

Epoch 4/5
63/63 [==============================] - 42s 663ms/step - loss: 0.4012 - accuracy: 0.8227 - auc: 0.8952 - val_loss: 0.3958 - val_accuracy: 0.8220 - val_auc: 0.9024

Epoch 5/5
63/63 [==============================] - 41s 658ms/step - loss: 0.3645 - accuracy: 0.8437 - auc: 0.9165 - val_loss: 0.3742 - val_accuracy: 0.8362 - val_auc: 0.9156
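
The sample counts and the 63 steps per epoch in the log above follow from the dataset size, the test_size of 0.15, and the batch size of 32. A quick check, assuming train_test_split rounds the validation share up and the leftover samples form a final, smaller batch:

```python
import math

total_images = 2357
test_size = 0.15
batch_size = 32

# Validation share is rounded up; the rest goes to training
n_val = math.ceil(total_images * test_size)
n_train = total_images - n_val

# Each epoch needs one step per batch, including the last partial batch
steps_per_epoch = math.ceil(n_train / batch_size)
val_steps = math.ceil(n_val / batch_size)

print(n_train, n_val, steps_per_epoch, val_steps)
```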

Key Features

This skin cancer detection system offers several advantages:

  • Transfer Learning: Uses pre-trained EfficientNetB7 for better feature extraction
  • Input Pipeline: Decodes, resizes, batches, and prefetches images with tf.data
  • Binary Classification: Distinguishes between benign and malignant lesions
  • Performance Monitoring: Tracks accuracy and AUC metrics during training

Conclusion

TensorFlow with EfficientNet provides an effective approach for skin cancer detection. With transfer learning, the run shown above reaches about 84% validation accuracy and a validation AUC of about 0.92 after 5 epochs. The model can assist healthcare professionals in preliminary screening, though it should complement rather than replace professional medical diagnosis.

Updated on: 2026-03-27T09:10:57+05:30
