Skin Cancer Detection using TensorFlow in Python
Early detection of any disease, especially cancer, is crucial for treatment. One effort in this direction is the use of machine learning algorithms, built with a framework such as TensorFlow, to detect and diagnose skin cancer.
The traditional method of cancer detection is time-consuming and requires professional dermatologists. With the help of TensorFlow, this process can be made not only faster but also more accurate and efficient. Moreover, people who do not have timely access to doctors and dermatologists can use such a system as a preliminary screening aid in the meantime.
Algorithm Overview
The skin cancer detection process follows these key steps:
Step 1 − Import the required libraries (numpy, pandas, matplotlib, seaborn, etc.), then load the image dataset and store the file paths as a list.
Step 2 − Load this list of paths as a pandas DataFrame and extract the label for each image in the list.
Step 3 − Convert the labels to 0 and 1 for simplicity, and compare the number of images under each label with the help of a pie chart.
Step 4 − If there is no imbalance, print a few sample images for each label.
Step 5 − Split the dataset into a training and a testing set.
Step 6 − Create tf.data pipelines for image input.
Step 7 − Use the EfficientNet architecture to create and compile the model.
Step 8 − Train the model for at least 5 epochs.
Step 9 − Visualize the difference between training loss and validation loss.
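If the pie chart in Step 3 does reveal an imbalance, one common remedy is to weight each class inversely to its frequency and pass the result to model.fit via its class_weight argument. A minimal sketch, using hypothetical class counts (the real counts would come from df['label'].value_counts()):

```python
# Hypothetical class counts; in the article these come from the dataset itself
counts = {'benign': 1200, 'malignant': 400}
total = sum(counts.values())

# Inverse-frequency weights: total / (n_classes * class_count)
class_weight = {
    0: total / (2 * counts['benign']),     # label 0 = benign
    1: total / (2 * counts['malignant']),  # label 1 = malignant
}
print(class_weight)  # {0: 0.666..., 1: 2.0}
```

Passing `class_weight=class_weight` to `model.fit` makes each malignant sample count three times as much as a benign one in the loss, compensating for the 3:1 imbalance assumed here.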
Implementation Example
In this example, we will develop a skin cancer detection model using TensorFlow with EfficientNet architecture. The dataset contains benign and malignant skin lesion images.
Data Loading and Preprocessing
# Import the required libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
from glob import glob
from PIL import Image
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
from keras import layers
from functools import partial
AUTO = tf.data.experimental.AUTOTUNE
import warnings
warnings.filterwarnings('ignore')
# Load the dataset
images = glob('train/*/*.jpg')
print(f"Total images found: {len(images)}")
# Create dataset and extract labels
# Normalize Windows path separators so the label can be split out reliably
images = [path.replace('\\', '/') for path in images]
df = pd.DataFrame({'filepath': images})
df['label'] = df['filepath'].str.split('/', expand=True)[1]
print(df.head())
# Convert labels to binary format
df['label_bin'] = np.where(df['label'].values == 'malignant', 1, 0)
print(df.head())
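The label extraction and binarization above can be verified on a tiny toy frame that mirrors the article's filepath layout (the three paths below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the train/<label>/<file>.jpg layout used in the article
df = pd.DataFrame({
    'filepath': ['train/benign/1.jpg', 'train/malignant/2.jpg', 'train/benign/3.jpg'],
})
# Element 1 of each split path is the label directory
df['label'] = df['filepath'].str.split('/', expand=True)[1]
# Map malignant -> 1, benign -> 0
df['label_bin'] = np.where(df['label'].values == 'malignant', 1, 0)
print(df['label_bin'].tolist())  # [0, 1, 0]
```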
Data Visualization
# Check whether both classes contain a similar number of images
x = df['label'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(x.values,
        labels=x.index,
        autopct='%1.1f%%')
plt.title('Distribution of Skin Cancer Types')
plt.show()
# Display sample images from each category
for cat in df['label'].unique():
    temp = df[df['label'] == cat]
    index_list = temp.index
    fig, ax = plt.subplots(1, 4, figsize=(15, 5))
    fig.suptitle(f'Sample Images for {cat} category', fontsize=16)
    for i in range(4):
        index = np.random.choice(index_list)
        data = df.iloc[index]
        image_path = data['filepath']
        img = np.array(Image.open(image_path))
        ax[i].imshow(img)
        ax[i].axis('off')
    plt.tight_layout()
    plt.show()
Model Building and Training
# Split the dataset into train and test
features = df['filepath']
target = df['label_bin']
X_train, X_val, Y_train, Y_val = train_test_split(
    features, target, test_size=0.15, random_state=10
)
print(f"Training samples: {X_train.shape[0]}")
print(f"Validation samples: {X_val.shape[0]}")
# Image preprocessing function
def decode_image(filepath, label=None):
    img = tf.io.read_file(filepath)
    img = tf.image.decode_jpeg(img)
    img = tf.image.resize(img, [224, 224])
    img = tf.cast(img, tf.float32) / 255.0
    if label is None:
        return img
    return img, label
# Create data pipelines
train_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_train, Y_train))
    .map(decode_image, num_parallel_calls=AUTO)
    .batch(32)
    .prefetch(AUTO)
)
val_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_val, Y_val))
    .map(decode_image, num_parallel_calls=AUTO)
    .batch(32)
    .prefetch(AUTO)
)
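The pipelines above apply no augmentation. If the training set is small, an optional augmentation step (not part of the original code) can be mapped onto the training pipeline before batching. A self-contained sketch using a tiny synthetic batch in place of the decoded images:

```python
import tensorflow as tf

def augment(img, label):
    # Random flips and mild brightness jitter; the label passes through unchanged
    img = tf.image.random_flip_left_right(img)
    img = tf.image.random_flip_up_down(img)
    img = tf.image.random_brightness(img, max_delta=0.1)
    return img, label

# Tiny synthetic images standing in for the decoded 224x224 lesion images
imgs = tf.random.uniform((4, 224, 224, 3))
labels = tf.constant([0, 1, 0, 1])
ds = (tf.data.Dataset.from_tensor_slices((imgs, labels))
      .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(2))
for x, y in ds:
    print(x.shape, y.shape)  # (2, 224, 224, 3) (2,)
```

In the article's pipeline this would be an extra `.map(augment, num_parallel_calls=AUTO)` on `train_ds` only; the validation pipeline should stay deterministic.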
# Build the model using EfficientNet
from tensorflow.keras.applications.efficientnet import EfficientNetB7
pre_trained_model = EfficientNetB7(
    input_shape=(224, 224, 3),
    weights='imagenet',
    include_top=False
)
# Freeze pre-trained layers
for layer in pre_trained_model.layers:
    layer.trainable = False
# Create the complete model
inputs = layers.Input(shape=(224, 224, 3))
x = pre_trained_model(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
# Compile the model
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy', 'AUC']
)
# Train the model
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=5,
    verbose=1
)
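Once training finishes, the model can score a single lesion image: the sigmoid output is a probability, and a 0.5 threshold maps it back to the benign/malignant labels. The sketch below uses a tiny untrained stand-in model with the same (224, 224, 3) input and sigmoid output, so it runs without the dataset or the trained weights:

```python
import numpy as np
import tensorflow as tf

# Tiny untrained stand-in for the trained EfficientNet model (assumption:
# same input shape and single sigmoid output as in the article)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# A random image standing in for a decoded, resized, rescaled lesion photo
img = np.random.rand(224, 224, 3).astype('float32')
prob = float(model.predict(img[None, ...], verbose=0)[0, 0])
label = 'malignant' if prob >= 0.5 else 'benign'
print(prob, label)
```

With the real trained model, `decode_image(filepath)` from the article produces exactly the tensor to feed in place of `img`.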
Results Visualization
# Plot training history
hist_df = pd.DataFrame(history.history)
# Plot loss curves
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(hist_df['loss'], label='Training Loss')
plt.plot(hist_df['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(hist_df['auc'], label='Training AUC')
plt.plot(hist_df['val_auc'], label='Validation AUC')
plt.title('Model AUC')
plt.xlabel('Epoch')
plt.ylabel('AUC')
plt.legend()
plt.tight_layout()
plt.show()
Output
Total images found: 2357
filepath label
0 train/benign/100.jpg benign
1 train/benign/1000.jpg benign
2 train/benign/1001.jpg benign
3 train/benign/1002.jpg benign
4 train/benign/1004.jpg benign
Training samples: 2003
Validation samples: 354
Epoch 1/5
63/63 [==============================] - 45s 695ms/step - loss: 0.6421 - accuracy: 0.6470 - auc: 0.7103 - val_loss: 0.5876 - val_accuracy: 0.6949 - val_auc: 0.7598
Epoch 2/5
63/63 [==============================] - 42s 671ms/step - loss: 0.5234 - accuracy: 0.7464 - auc: 0.8194 - val_loss: 0.4892 - val_accuracy: 0.7655 - val_auc: 0.8456
Epoch 3/5
63/63 [==============================] - 41s 656ms/step - loss: 0.4567 - accuracy: 0.7888 - auc: 0.8632 - val_loss: 0.4321 - val_accuracy: 0.8051 - val_auc: 0.8789
Epoch 4/5
63/63 [==============================] - 42s 663ms/step - loss: 0.4012 - accuracy: 0.8227 - auc: 0.8952 - val_loss: 0.3958 - val_accuracy: 0.8220 - val_auc: 0.9024
Epoch 5/5
63/63 [==============================] - 41s 658ms/step - loss: 0.3645 - accuracy: 0.8437 - auc: 0.9165 - val_loss: 0.3742 - val_accuracy: 0.8362 - val_auc: 0.9156
Key Features
This skin cancer detection system offers several advantages:
- Transfer Learning: Uses pre-trained EfficientNetB7 for better feature extraction
- Efficient Input Pipeline: Uses tf.data with parallel decoding and prefetching for fast image loading
- Binary Classification: Distinguishes between benign and malignant lesions
- Performance Monitoring: Tracks accuracy and AUC metrics during training
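Since the article trains for a fixed 5 epochs, a natural extension is to let validation loss decide when to stop. A sketch of Keras EarlyStopping, shown here on a tiny stand-in model with random data so it runs end to end (the real run would pass `callbacks=callbacks` to the `model.fit` call above):

```python
import numpy as np
import tensorflow as tf

# Stop when val_loss stops improving; keep the best weights seen
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2,
                                     restore_best_weights=True),
]

# Tiny stand-in model and random data, just to exercise the callback
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam')

x = np.random.rand(64, 8).astype('float32')
y = np.random.randint(0, 2, 64).astype('float32')
history = model.fit(x, y, validation_split=0.25, epochs=10,
                    callbacks=callbacks, verbose=0)
print(len(history.history['loss']))  # epochs actually run (at most 10)
```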
Conclusion
TensorFlow with EfficientNet provides an effective approach for skin cancer detection, achieving good accuracy with transfer learning. The model can assist healthcare professionals in preliminary screening, though it should complement rather than replace professional medical diagnosis.
