Realistic Digital Image Processing-Driven Data Augmentation for Robust Wheat Leaf Disease Classification and Severity Scoring in Field Conditions

1. Project Context and Objective

The freelancer's role is to implement and train the models, evaluate them, and document the experimental results.

1.1 Task Summary

Item	Phase 1: Classification	Phase 2: Severity Scoring
Short title	DIPAug-Net	DIPAug-SeverNet
Core task	Classify wheat leaf disease from augmented images under field conditions	Segment lesion regions and predict quantitative severity score (0–100%)
Key novel contribution	Physics-aware DIP augmentation framework + adaptive curriculum + dual-branch CNN-Transformer	DIP-guided saliency module + multi-task lesion segmentation + ordinal severity regression
Primary output	Disease class label (11 classes)	Disease class + lesion mask + severity grade (0–5)

2. Complete Scope of Work

Everything in this section is the freelancer's responsibility. The task is broken into 6 work packages (WPs). Each WP has a defined set of code deliverables and acceptance criteria.

WP1 — Data Preparation and Augmentation Pipeline

Applies to: Both Phase 1 and Phase 2

WP1.1 Dataset Setup

Download the Wheat Plant Disease dataset from Kaggle / Mendeley (public domain). Verify class distribution, resolution, and label quality. Report any anomalies.
Implement a stratified 70/15/15 train-validation-test split, locked by a fixed random seed (seed = 42) so all experiments use identical splits.
Resize all images to 384×384. Apply ImageNet normalisation (mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]).
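The WP1.1 split can be reproduced without any ML library. The sketch below is an illustration under stated assumptions (the helper name `stratified_split` and the example class names are hypothetical): it shuffles each class independently with a local `random.Random(42)` and cuts 70/15/15 within every class, so each class keeps its proportion in all three subsets. Applying scikit-learn's `train_test_split(..., stratify=labels, random_state=42)` twice is a common library alternative.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Split sample indices into train/val/test while preserving class proportions.

    Hypothetical helper for WP1.1; `labels` holds one class label per image.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)  # local RNG: the split does not depend on global seeding
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):  # fixed iteration order keeps the split deterministic
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return sorted(train), sorted(val), sorted(test)


labels = ["leaf_rust"] * 100 + ["healthy"] * 40
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 98 21 21
```

Because the split is a pure function of the label list and the seed, saving the three index lists alongside the configs is a cheap way to prove every experiment used identical splits.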

WP1.2 DIPAug Module — 8 Physics-Aware Augmentations

Implement each augmentation as a callable Python class inheriting from Albumentations DualTransform (image + mask compatible). Each class must accept a probability p and an intensity parameter scaled to [0, 1].
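The spec fixes the intensity interface but not how intensity maps onto each parameter range in the table that follows. One plausible convention, shown here as an assumption rather than the project's definition, is to let intensity shrink the sampling interval toward the mildest value: intensity 0 pins the parameter at its mild end, intensity 1 opens the full range. Parameters where a smaller value is harsher (such as the JPEG quality factor) are passed with their ends reversed. `scaled_range` and `sample_param` are hypothetical names.

```python
import random


def scaled_range(mild, harsh, intensity):
    """Return the sampling interval [mild, mild + intensity * (harsh - mild)].

    `mild`/`harsh` are the two ends of a WP1.2 parameter range; passing them in
    reversed numeric order handles parameters where smaller is harsher.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return mild, mild + intensity * (harsh - mild)


def sample_param(mild, harsh, intensity, rng, integer=False):
    """Draw one parameter value uniformly from the intensity-scaled interval."""
    low, high = scaled_range(mild, harsh, intensity)
    value = rng.uniform(low, high)  # random.uniform also accepts low > high
    return int(round(value)) if integer else value


rng = random.Random(42)
print(scaled_range(5, 25, 0.5))   # MotionBlur kernel size: (5, 15.0)
print(scaled_range(90, 40, 0.5))  # SensorNoise JPEG QF, lower is harsher: (90, 65.0)
kernel_size = sample_param(5, 25, 0.5, rng, integer=True)
```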

#	Augmentation Class	Implementation Specification	Parameters	Field Phenomenon
1	IlluminationGradient	Bidirectional gradient on LAB L-channel. Random sun angle `[0–360°]`, strength `[0.3–0.8]`	angle, strength	Sun angle variation
2	CastShadow	Geometric polygon (3–6 vertices) with Gaussian penumbra blur sigma `[5–20px]`. Area `10–40%` of image.	area, blur_sigma	Canopy shadow
3	MotionBlur	Linear PSF kernel with random angle `[0–180°]`. Kernel size `[5–25px]`. Use `cv2.filter2D`.	kernel_size, angle	Camera / wind shake
4	DefocusBlur	Disk-shaped pillbox PSF via circular averaging. Radius `[3–15px]`. May be applied zonally.	radius	Depth-of-field error
5	ColourTempShift	Planckian locus CCT transform via `sRGB→XYZ→D65` chromatic adaptation. CCT `[3200–8000K]`.	cct_kelvin	Time-of-day lighting
6	ColourFade	HSV saturation reduction `[-30% to -70%]` + gamma correction `[0.6–1.4]` in LAB L-channel.	sat_factor, gamma	Dust / sensor degradation
7	DustOverlay	Poisson-distributed semi-transparent ellipses `[N: 50–300]`, opacity `[0.2–0.6]`, with Gaussian blur.	n_particles, opacity	Field dust / debris
8	SensorNoise	Mixed Gaussian (`sigma [5–30]`) + Poisson shot noise + JPEG compression artefact (`QF [40–90]`).	sigma, jpeg_qf	Camera noise
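Two rows of the table reduce to small numeric helpers that can be checked independently of OpenCV. The sketch below (hypothetical function names) builds a binary pillbox PSF for DefocusBlur, which could then be converted to a float32 NumPy array and passed to `cv2.filter2D`, and converts a CCT to CIE 1931 xy chromaticity using the cubic-spline approximation of the Planckian locus by Kim et al. (2002), valid from 1667 K to 25000 K. Both are simplifications: the binary disk ignores partial-pixel coverage at the rim, and the chromatic adaptation step that would follow (for example a Bradford transform between D65 and the target white point) is not shown.

```python
def pillbox_kernel(radius):
    """Uniform disk PSF of integer radius, normalised to sum to 1 (binary rim, no anti-aliasing)."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    size = 2 * radius + 1
    kernel = [[1.0 if (x - radius) ** 2 + (y - radius) ** 2 <= radius ** 2 else 0.0
               for x in range(size)]
              for y in range(size)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def cct_to_xy(cct_kelvin):
    """Approximate CIE 1931 (x, y) of a Planckian radiator (Kim et al. 2002 cubic splines)."""
    t = float(cct_kelvin)
    if not 1667.0 <= t <= 25000.0:
        raise ValueError("approximation is only valid for 1667-25000 K")
    if t <= 4000.0:
        x = -0.2661239e9 / t**3 - 0.2343589e6 / t**2 + 0.8776956e3 / t + 0.179910
    else:
        x = -3.0258469e9 / t**3 + 2.1070379e6 / t**2 + 0.2226347e3 / t + 0.240390
    if t <= 2222.0:
        y = -1.1063814 * x**3 - 1.34811020 * x**2 + 2.18555832 * x - 0.20219683
    elif t <= 4000.0:
        y = -0.9549476 * x**3 - 1.37418593 * x**2 + 2.09137015 * x - 0.16748867
    else:
        y = 3.0817580 * x**3 - 5.87338670 * x**2 + 3.75112997 * x - 0.37001483
    return x, y


def xy_to_white_xyz(x, y):
    """XYZ white point with luminance Y normalised to 1."""
    return x / y, 1.0, (1.0 - x - y) / y


k = pillbox_kernel(3)
print(len(k), round(sum(map(sum, k)), 6))  # 7 1.0
x, y = cct_to_xy(6500)
print(round(x, 4), round(y, 4))
```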

WP1.3 Adaptive Augmentation Scheduler (AAS)

Implement as a PyTorch training callback. Augmentation intensity D(epoch) follows a sigmoid curriculum:

D(epoch) = D_min + (D_max − D_min) × sigmoid((epoch − T_half) / tau)

where D_min = 0.1, D_max = 1.0, T_half = 50, tau = 10. Log D(epoch) to the training dashboard at every epoch.
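The schedule is a closed-form function of the epoch, so the analytical check required in WP1.5 can be run against a reference implementation such as the one below (the name `aas_intensity` is hypothetical; the defaults are the WP1.3 constants). A training callback could call it once per epoch, push the value into each DIPAug transform's intensity setting, and log it.

```python
import math


def aas_intensity(epoch, d_min=0.1, d_max=1.0, t_half=50, tau=10):
    """WP1.3 curriculum: D(epoch) = D_min + (D_max - D_min) * sigmoid((epoch - T_half) / tau)."""
    z = (epoch - t_half) / tau
    return d_min + (d_max - d_min) / (1.0 + math.exp(-z))


for epoch in (0, 25, 50, 75, 100):
    print(epoch, round(aas_intensity(epoch), 3))
```

With these constants D(0) ≈ 0.106 and D(100) ≈ 0.994, so the curve does not hit D_min or D_max exactly within 100 epochs; comparing the logged values against the formula (rather than against the bounds) avoids false failures in the WP1.5 check.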

WP1.4 Class-Imbalance–Aware Augmentation (CIAA)

Compute per-class sample weights as:

w_c = N_total / (N_classes × N_c)

Use PyTorch WeightedRandomSampler for batch construction. Additionally, minority classes (< 15% of total) receive augmentation intensity multiplied by 1.3× compared to majority classes.
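A minimal sketch of the WP1.4 bookkeeping, with hypothetical helper names: `class_weights` implements w_c = N_total / (N_classes × N_c), `sample_weights` expands it to one weight per training image (the form that `torch.utils.data.WeightedRandomSampler` takes as its `weights` argument), and `ciaa_intensity` applies the 1.3× boost to classes below 15% of the data. Clipping the boosted intensity at 1.0 is an assumption; the spec does not say what happens when a boosted value would leave the [0, 1] range.

```python
from collections import Counter


def class_weights(labels):
    """w_c = N_total / (N_classes * N_c) for every class present in `labels`."""
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    return {c: n_total / (n_classes * n_c) for c, n_c in counts.items()}


def sample_weights(labels):
    """Per-sample weights for a weighted random sampler (one entry per image)."""
    w = class_weights(labels)
    return [w[y] for y in labels]


def ciaa_intensity(base_intensity, label, counts, minority_frac=0.15, boost=1.3):
    """Boost augmentation intensity for minority classes (< 15 % of all samples)."""
    share = counts[label] / sum(counts.values())
    scaled = base_intensity * boost if share < minority_frac else base_intensity
    return min(scaled, 1.0)  # assumption: keep intensity inside [0, 1]


labels = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
counts = Counter(labels)
print({c: round(w, 3) for c, w in class_weights(labels).items()})
# {'a': 0.556, 'b': 1.111, 'c': 3.333}
print(ciaa_intensity(0.5, "c", counts), ciaa_intensity(0.5, "a", counts))
```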

WP1.5 Acceptance Criteria — WP1

All 8 augmentation classes are implemented, unit-tested, and pass a visual QC grid (original vs augmented at intensity levels 0.2 / 0.5 / 0.8 / 1.0 for 3 sample images per class).
Augmentation pipeline runs at ≥ 120 images/second on a single GPU (batch = 32) without becoming a training bottleneck.
Spatially consistent transforms apply identically to image and mask (verified on 20 image-mask pairs for WP4 use).
AAS logs are visible in the training dashboard and match the analytical sigmoid curve.

WP2 — Phase 1 Model: DIPAug-Net (Disease Classification)

Applies to: Phase 1 only

WP2.1 Architecture

Implement DIPAug-Net with the following components:

CNN Branch: EfficientNet-B3 (ImageNet pre-trained via timm). Extract feature maps at stride-8 and stride-16 spatial resolutions.
Transformer Branch: Swin-Transformer-Tiny (ImageNet pre-trained via timm). Extract features at matching resolutions using the patch-merging hierarchy.
Cross-Attention Fusion Gate: nn.MultiheadAttention (8 heads, embed_dim = 512). CNN features = Keys/Values; Transformer features = Queries. Output is projected to 512-d.
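Classification Head: GlobalAvgPool → FC(512, GELU) → Dropout(0.4) → FC(N_class) → Softmax. Apply the Softmax only at inference; the WP2.2 losses should receive the raw logits, since PyTorch's cross-entropy applies log-softmax internally.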

WP2.2 Loss Function

L_total = 0.6 × L_focal + 0.4 × L_weighted_CE

Use gamma = 2.0 for focal loss. Class weights computed from training set distribution. Both losses use the same class weight vector.
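The WP2.2 objective can be written out per sample to make the shared weighting explicit. The sketch below is plain Python for readability (names are hypothetical); a PyTorch version would do the same on logit tensors. Two choices are assumptions: each term is normalised by the summed class weights of the batch, which matches the weighted-mean reduction of PyTorch's `CrossEntropyLoss(weight=...)`, and the focal term reuses that normalisation. Setting gamma = 0 collapses the focal term to weighted cross-entropy, which makes a convenient unit test.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def dipaug_loss(batch_logits, targets, class_w, gamma=2.0, focal_share=0.6):
    """L_total = 0.6 * L_focal + 0.4 * L_weighted_CE, both using the same class weights."""
    focal_sum = ce_sum = weight_sum = 0.0
    for logits, t in zip(batch_logits, targets):
        log_p_t = log_softmax(logits)[t]  # log-probability of the true class
        p_t = math.exp(log_p_t)
        w = class_w[t]
        ce_sum += -w * log_p_t
        focal_sum += -w * (1.0 - p_t) ** gamma * log_p_t  # down-weights easy, confident samples
        weight_sum += w
    return (focal_share * focal_sum + (1.0 - focal_share) * ce_sum) / weight_sum


logits = [[2.0, 0.0, -1.0], [0.5, 0.3, 0.1]]
targets = [0, 2]
weights = {0: 0.5, 1: 1.0, 2: 3.0}
print(dipaug_loss(logits, targets, weights))
```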

WP2.3 Training Protocol

Hyperparameter	Value
Optimiser	AdamW, weight decay = `1e-4`
LR Schedule	Cosine annealing, initial LR = `1e-4`, min LR = `1e-6`
Batch size	32
Epochs	100, with early stopping patience = 15 epochs on val F1
Gradient clipping	`max_norm = 1.0`
Random seed	42 (all experiments)
Mixed precision	`torch.cuda.amp` (FP16)

WP2.4 Ablation Study — 6 Configurations

Train all 6 ablation configurations under identical protocols. Only the components listed differ across runs:

Exp	Description	DIPAug	AAS	Dual-Branch	CIAA Loss
E1	EfficientNet-B3, no augmentation	—	—	—	—
E2	EfficientNet-B3, geometric aug only (flip/rotate/crop)	Geometric	—	—	—
E3	EfficientNet-B3 + full DIPAug, fixed intensity	Yes	—	—	—
E4	EfficientNet-B3 + DIPAug + AAS	Yes	Yes	—	—
E5	Dual-Branch + DIPAug, no AAS, no CIAA	Yes	—	Yes	—
E6	DIPAug-Net FULL — all components	Yes	Yes	Yes	Yes

WP2.5 CEDB Robustness Evaluation

Implement the Controlled Extreme Distortion Benchmark (CEDB): apply each of the 8 distortion types at 5 severity levels (0.2 / 0.4 / 0.6 / 0.8 / 1.0) to the held-out test set (distortions applied post-training, images not seen during training). Report per-class F1 and overall accuracy under every (distortion, severity) combination. Generate a radar chart and degradation curve per distortion type.
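The CEDB protocol is a full grid of 8 distortion types × 5 severities evaluated on the held-out test images. A minimal driver, assuming a hypothetical `evaluate(distortion, severity)` callable that distorts the whole test set and returns per-class (accuracy, F1), could write the structured CSV that WP2.7 asks for as follows; the stub evaluator and class names exist only for a dry run.

```python
import csv
import io

DISTORTIONS = ["IlluminationGradient", "CastShadow", "MotionBlur", "DefocusBlur",
               "ColourTempShift", "ColourFade", "DustOverlay", "SensorNoise"]
SEVERITIES = [0.2, 0.4, 0.6, 0.8, 1.0]
FIELDS = ["distortion", "severity", "class", "accuracy", "F1"]


def run_cedb(evaluate, class_names, out):
    """Write one CSV row per (distortion, severity, class) combination.

    `evaluate(distortion, severity)` is a hypothetical callable returning
    {class_name: (accuracy, f1)} for the distorted test set.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for distortion in DISTORTIONS:
        for severity in SEVERITIES:
            per_class = evaluate(distortion, severity)
            for name in class_names:
                acc, f1 = per_class[name]
                writer.writerow({"distortion": distortion, "severity": severity,
                                 "class": name, "accuracy": acc, "F1": f1})


def fake_evaluate(distortion, severity):
    """Stub for a dry run: scores drop linearly with severity."""
    return {c: (1.0 - 0.3 * severity, 0.95 - 0.3 * severity) for c in ("healthy", "stripe_rust")}


buffer = io.StringIO()
run_cedb(fake_evaluate, ["healthy", "stripe_rust"], buffer)
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(len(rows))  # 8 distortions x 5 severities x 2 classes = 80
```

Keeping the grid definition in one place makes the radar charts and degradation curves a pure function of the CSV, so they can be regenerated without re-running inference.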

WP2.6 Explainability

Generate Grad-CAM++ saliency maps (pytorch-grad-cam library) for a stratified sample of 50 test images (10 per class), including 5 correctly and 5 incorrectly classified examples per class. Save overlay images at publication quality (300 dpi). Verify qualitatively that activations concentrate on lesion regions.

WP2.7 Acceptance Criteria — WP2

All 6 ablation experiments complete and metrics logged. E6 (full model) must achieve overall accuracy ≥ 95% on the clean test set.
CEDB benchmark runs without errors; outputs a structured CSV (distortion, severity, class, accuracy, F1).
Grad-CAM++ overlays are generated for all sampled images and saved as publication-quality PNGs.
Training logs (loss, accuracy, val-F1) exported in CSV and visualised as plots.
McNemar statistical test comparing E6 vs each baseline; p-values reported.
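Because E6 and each baseline are scored on the identical test split, the comparison is paired, and McNemar's test only looks at the discordant images (one model right, the other wrong). The sketch below, with a hypothetical `mcnemar_p` helper, implements both the exact binomial form and the continuity-corrected chi-square form; `statsmodels.stats.contingency_tables.mcnemar` is an established library implementation that takes the 2×2 contingency table instead of raw predictions.

```python
import math


def mcnemar_p(y_true, pred_a, pred_b, exact=True):
    """Two-sided McNemar test on paired predictions of models A and B over the same test set."""
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)  # only A correct
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)  # only B correct
    n = b + c
    if n == 0:
        return 1.0  # the models are never discordant
    if exact:
        # Under H0 the n discordant pairs follow Binomial(n, 0.5); double the smaller tail.
        tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
        return min(1.0, 2.0 * tail)
    # Asymptotic version: chi-square with continuity correction, 1 degree of freedom.
    stat = (abs(b - c) - 1) ** 2 / n
    return math.erfc(math.sqrt(stat / 2.0))  # survival function of chi2 with 1 dof


y_true = [0] * 10 + [1] * 10
e6 = y_true[:]                 # E6 correct on all 20 images
baseline = [1] * 10 + [1] * 10  # baseline wrong on the first 10
print(mcnemar_p(y_true, e6, baseline))  # 0.001953125 (= 2 / 2**10)
```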

WP3 — Phase 2 Extensions: Contrastive Pre-Training and DGSM

Applies to: Phase 2 only (built on top of WP1 and WP2 codebase)

WP3.1 DIP-Guided Saliency Module (DGSM)

Implement as a non-trainable OpenCV-based preprocessing module:

Convert input image to LAB and HSV colour spaces.
Apply adaptive thresholding on the A-channel (LAB) to isolate yellow/brown lesion regions; on the S-channel (HSV) to capture desaturated necrotic zones.
Apply morphological closing (7×7 kernel) + connected component filtering (remove components < 50px).
Gaussian blur the binary mask (sigma = 15px) to produce a soft saliency map M_saliency in [0, 1].
Concatenate M_saliency as a 4th channel to the RGB input before the encoder.

Validate DGSM on 50 image-mask pairs from the segmentation dataset: compute IoU between DGSM output and ground truth mask. Target: mean IoU ≥ 0.45 as a prior (not the final prediction).
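Because M_saliency is soft while the ground-truth masks are binary, the IoU check above needs a binarisation threshold that the spec leaves open. The sketch below assumes 0.5 (a value worth reporting alongside the IoU) and treats two empty masks as perfect agreement; `saliency_iou` is a hypothetical name, and a NumPy version would replace the loops with boolean array operations.

```python
from statistics import mean


def saliency_iou(saliency, gt_mask, threshold=0.5):
    """IoU between a thresholded soft saliency map and a binary mask (equal-shape 2-D lists)."""
    inter = union = 0
    for s_row, g_row in zip(saliency, gt_mask):
        for s, g in zip(s_row, g_row):
            pred, truth = s >= threshold, bool(g)
            inter += pred and truth
            union += pred or truth
    return inter / union if union else 1.0  # both masks empty: perfect agreement


pairs = [
    ([[0.9, 0.1], [0.6, 0.2]], [[1, 0], [0, 0]]),  # IoU = 1/2
    ([[0.8, 0.7], [0.1, 0.0]], [[1, 1], [0, 0]]),  # IoU = 1
]
mean_iou = mean(saliency_iou(s, g) for s, g in pairs)
print(mean_iou, mean_iou >= 0.45)  # 0.75 True
```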

WP3.2 Augmentation-Consistent Contrastive Pre-Training

Implement SimCLR pre-training using the lightly-train library:

Backbone: EfficientNet-B4 (timm, ImageNet init). Projection head: MLP (1792 → 512 → 256), taking the 1792-d pooled EfficientNet-B4 feature as input. Temperature tau = 0.07.
Positive pairs: two independently sampled DIPAug transforms applied to the same image (no class labels used).
Training: 50 epochs, batch = 256 (use all available training images), AdamW LR = 3e-4, cosine decay.
After pre-training, evaluate representation quality via a linear probe: freeze backbone, train a single linear classifier, report accuracy.
Save pre-trained backbone weights as a checkpoint for use in WP4.
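lightly-train is expected to compute the contrastive objective internally, so the block below only makes the WP3.2 loss concrete: the NT-Xent (normalised temperature-scaled cross-entropy) loss used by SimCLR, with tau = 0.07. For each of the 2N views in a batch, the other DIPAug view of the same image is the positive and the remaining 2N − 2 views are negatives. The function name and list-based representation are illustrative assumptions.

```python
import math


def nt_xent(z1, z2, tau=0.07):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); embeddings are lists of floats."""
    def normalise(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    z = [normalise(v) for v in z1 + z2]  # 2N unit-length embeddings
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # the other augmented view of the same image
        # cosine similarity / temperature against every other view (self excluded)
        sims = {k: sum(a * b for a, b in zip(z[i], z[k])) / tau for k in range(2 * n) if k != i}
        m = max(sims.values())  # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denom - sims[pos]  # -log softmax probability of the positive
    return total / (2 * n)


aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True: matching views give a lower loss
```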

WP3.3 Acceptance Criteria — WP3

DGSM module produces saliency maps for all training images without errors; mean IoU ≥ 0.45 on validation sample.
SimCLR pre-training completes; linear probe accuracy ≥ 60% (confirming informative representation).
Pre-trained checkpoint saved and loadable for WP4.

WP4 — Phase 2 Model: DIPAug-SeverNet (Severity Scoring)

Applies to: Phase 2 only

WP4.1 Dataset Preparation for Segmentation

Primary: Obtain the PDSNets wheat stripe rust field dataset (Shi et al., Agriculture 2025), which includes pixel-level masks.
Secondary (weak labels): Apply DGSM + SAM (Meta Segment Anything Model, vit_b checkpoint) to generate pseudo-masks for the classification-only Wheat Plant Disease dataset. Use DGSM bounding boxes as SAM spatial prompts. Save refined pseudo-masks.
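Severity Grade Assignment: Compute severity % = 100 × (lesion pixels / total leaf pixels) from the ground-truth or pseudo-mask. Map to Grades 0–5 using the intervals 0% (Grade 0), >0–10% (Grade 1), >10–25% (Grade 2), >25–50% (Grade 3), >50–75% (Grade 4), and >75% (Grade 5), so that every percentage falls into exactly one grade.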
Flag images within 3% of grade boundaries as borderline and log them for potential expert review.
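The grade mapping and borderline flag are easy to get wrong at the interval edges, so a reference helper is worth unit-testing. The sketch below assumes half-open intervals (0, 10], (10, 25], (25, 50], (50, 75] for Grades 1 to 4, exactly 0% for Grade 0 and above 75% for Grade 5, and reads "within 3%" as 3 percentage points of the 10/25/50/75 edges; the 0% edge is deliberately not used for flagging, since Grade 0 means no lesion at all. All names are hypothetical.

```python
import bisect

GRADE_EDGES = [10.0, 25.0, 50.0, 75.0]  # inclusive upper edges of Grades 1-4, in percent


def severity_percent(lesion_px, leaf_px):
    """Severity % = 100 * lesion pixels / leaf pixels."""
    if leaf_px <= 0:
        raise ValueError("leaf mask is empty")
    return 100.0 * lesion_px / leaf_px


def severity_grade(pct):
    """Map a severity percentage to Grade 0-5."""
    if pct <= 0.0:
        return 0
    return 1 + bisect.bisect_left(GRADE_EDGES, pct)  # pct equal to an edge stays in the lower grade


def is_borderline(pct, margin=3.0):
    """True when pct lies within `margin` percentage points of a grade edge."""
    return any(abs(pct - edge) <= margin for edge in GRADE_EDGES)


for pct in (0.0, 5.0, 10.0, 10.5, 48.0, 80.0):
    print(pct, severity_grade(pct), is_borderline(pct))
```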

WP4.2 Architecture — DIPAug-SeverNet

Shared Encoder: EfficientNet-B4, initialised from the SimCLR checkpoint (WP3.2). Fine-tune upper 40% of layers; freeze lower 60%.
Segmentation Decoder: U-Net-style decoder with skip connections from 4 encoder stages. Final output: binary lesion mask at original resolution (H×W). Loss: 0.5 × BCE + 0.5 × Dice.
Severity Head: Lesion-weighted feature = GlobalAvgPool(F_enc × M_pred) + GlobalAvgPool(F_enc). Feed it to a 3-layer MLP (1024→256→64) with two outputs: (i) a sigmoid output for the continuous severity score s in [0, 1] (MSE loss), and (ii) an ordinal output with 5 thresholds trained with the CORN ordinal regression loss (a conditional binary cross-entropy).
Uncertainty-Weighted Multi-Task Loss (Kendall et al.): L_total = sum_i [ (1 / (2σ_i²)) × L_i + log σ_i ], where σ_i is the learned uncertainty of task i, parameterised through a learnable log-variance s_i = log σ_i² for numerical stability. The sum runs over 4 tasks: segmentation, classification, severity regression, and severity ordinal.
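Two pieces of the WP4.2 objective are easy to misimplement: the CORN ordinal loss and the uncertainty weighting. The plain-Python sketch below (hypothetical names; coral-pytorch provides a tensor implementation of CORN) follows the CORN formulation: with 6 grades there are 5 logits, task k predicts "grade > k" and is trained only on samples whose grade is at least k, and the predicted grade is the number of cumulative conditional probabilities above 0.5. The weighting function assumes the parameterisation s_i = log σ_i², under which (1 / (2σ_i²)) × L_i + log σ_i becomes 0.5 × exp(−s_i) × L_i + 0.5 × s_i.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def corn_loss(batch_logits, grades, num_grades=6):
    """CORN loss; batch_logits[i] holds num_grades - 1 conditional logits for sample i."""
    total, count = 0.0, 0
    for k in range(num_grades - 1):
        for logits, y in zip(batch_logits, grades):
            if y >= k:  # task k only sees samples that already passed thresholds 0..k-1
                target = 1.0 if y > k else 0.0
                z = logits[k]
                # binary cross-entropy with logits; log(1 - sigmoid(z)) = log_sigmoid(z) - z
                total -= target * log_sigmoid(z) + (1.0 - target) * (log_sigmoid(z) - z)
                count += 1
    return total / count


def corn_predict(logits):
    """Predicted grade = number of cumulative conditional probabilities above 0.5."""
    prob, grade = 1.0, 0
    for z in logits:
        prob *= math.exp(log_sigmoid(z))  # P(grade > k) as a product of conditional probabilities
        grade += prob > 0.5
    return grade


def uncertainty_weighted_total(task_losses, log_vars):
    """Kendall-style total: sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2)."""
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s for loss, s in zip(task_losses, log_vars))


good = [[8.0, 8.0, -8.0, -8.0, -8.0]]  # confident "grade 2" logits
bad = [[-8.0] * 5]
print(corn_predict(good[0]), corn_loss(good, [2]) < corn_loss(bad, [2]))  # 2 True
print(uncertainty_weighted_total([1.0, 2.0, 3.0, 4.0], [0.0] * 4))  # 5.0
```

Logging exp(s_i / 2) per epoch gives the σ_i curves that WP4.3 and WP4.5 ask for.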

WP4.3 Training Protocol

Stage 1 (Warm-up, 20 epochs): Activate segmentation loss only. Freeze severity head. Allow encoder fine-tuning from epoch 10 onward.
Stage 2 (Joint MTL, 80 epochs): Activate all losses with uncertainty weighting. Apart from the epoch count, use the same hyperparameters as WP2.3.
Log all 4 individual task losses and σ_i values at every epoch.

WP4.4 Ablation Study — 5 Configurations

Exp	Description	DGSM	SimCLR	Seg Decoder	MTL Loss
S1	EfficientNet-B4 + severity head only (no seg)	—	—	—	CE only
S2	S1 + Segmentation Decoder (no DGSM, no SimCLR)	—	—	Yes	BCE+Dice
S3	S2 + DGSM saliency input	Yes	—	Yes	BCE+Dice
S4	S3 + SimCLR pre-training (no uncertainty MTL)	Yes	Yes	Yes	Fixed weights
S5	DIPAug-SeverNet FULL — all components	Yes	Yes	Yes	Uncertainty MTL

WP4.5 Acceptance Criteria — WP4

S5 (full model) achieves: segmentation mIoU ≥ 0.60, severity MAE ≤ 12%, severity QWK ≥ 0.70, classification F1 ≥ 0.85.
All σ_i uncertainty weights converge (logged curves saved).
Pseudo-mask generation pipeline completes for all images and pseudo-masks pass visual QC (random 30-image sample).

WP5 — Evaluation, Visualisation, and Results Package

Applies to: Both Phases

WP5.1 Metrics to Compute and Report

Phase 1 Metrics:

Overall accuracy, macro Precision / Recall / F1, Cohen's Kappa, per-class AUC-ROC.
Confusion matrix (normalised, publication-quality heatmap).
CEDB robustness CSV + radar chart + per-distortion degradation line plots.
McNemar test p-values vs all baselines.
Inference time (ms/image, averaged over 100 runs) and model parameter count.

Phase 2 Metrics:

All Paper 1 metrics (for the classification head of DIPAug-SeverNet).
Segmentation: IoU, Dice, Pixel Accuracy, mIoU, Boundary F1.
Severity regression: Pearson r, MAE, RMSE between predicted and ground truth severity %.
Severity ordinal: Quadratic Weighted Kappa (QWK), per-grade accuracy.
CEDB severity degradation for each metric under all 8 distortion types.
Relative Performance Retention (RPR) = metric_distorted / metric_clean, computed per distortion type.
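A reference sketch for two of these metrics, with hypothetical function names: `quadratic_weighted_kappa` computes Cohen's kappa with quadratic disagreement weights over the 6 severity grades (scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")` is a library equivalent), and `rpr` is the plain ratio defined in the list. For error metrics such as MAE or RMSE, where lower is better, an RPR above 1 signals degradation rather than retention.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes=6):
    """Cohen's kappa with quadratic weights w_ij = (i - j)^2 / (K - 1)^2."""
    n = len(y_true)
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]
    num = den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            w = (i - j) ** 2 / (num_classes - 1) ** 2
            num += w * observed[i][j]
            den += w * hist_true[i] * hist_pred[j] / n  # expected count under independence
    return 1.0 - num / den if den else 1.0


def rpr(metric_distorted, metric_clean):
    """Relative Performance Retention for one (metric, distortion) pair."""
    return metric_distorted / metric_clean


print(quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]))  # 1.0
print(quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], num_classes=3))
print(rpr(0.72, 0.90))
```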

WP5.2 Figures to Produce (Publication-Quality, 300 dpi)

DIPAug visualisation grid: 8 rows × 5 columns (augmentation type × intensity level), 3 disease classes shown.
AAS sigmoid schedule plot (D vs epoch).
DIPAug-Net architecture diagram (clear block diagram, export as PNG/SVG).
DIPAug-SeverNet architecture diagram (encoder + dual decoder + severity head).
DGSM pipeline figure: RGB input → LAB/HSV processing → saliency map → 4-channel input.
Spatially consistent augmentation pair: [image | mask | augmented image | augmented mask].
Confusion matrices for Paper 1 (all 6 ablations + baselines).
CEDB radar chart and degradation curves (Paper 1 and Paper 2).
Grad-CAM++ overlays — 50 images for Paper 1, 30 images for Paper 2.
Severity grade visualisation grid: 5 disease classes × 6 severity grades, with predicted vs ground truth overlaid.
Severity score scatter plot: predicted vs ground truth (r² annotated).
SimCLR linear probe accuracy learning curve.
σ_i uncertainty weight convergence plots (WP4 training).

WP5.3 Results Report

Deliver a structured Results Report (PDF or DOCX) with all quantitative results. The report must be organised as:

experiment configuration tables
main results tables
all figures listed in WP5.2
ablation analysis commentary (2–3 sentences per ablation row)
CEDB analysis section
failure case analysis — minimum 5 examples with explanation per paper

This report is the primary input to manuscript drafting.

WP6 — Deliverables

WP6.1 Repository Structure

dipaugnet/ — Paper 1 model, training, evaluation scripts
dipaugsevernet/ — Paper 2 model, training, evaluation scripts
dipauglib/ — Shared DIPAug augmentation modules (WP1)
notebooks/ — Jupyter notebooks for QC visualisations and result analysis
configs/ — YAML configuration files for all 11 experiments (E1–E6, S1–S5)
results/ — CSV files, metric summaries, and saved model checkpoints
figures/ — All publication-quality figures (PNG + SVG where applicable)
README.md — Full setup and execution guide
requirements.txt — Pinned dependency versions
RESULTS_REPORT.docx — Structured results report (WP5.3)

WP6.2 Code Quality Requirements

All modules have docstrings (Google style).
Unit tests for all 8 DIPAug transforms (pytest), covering edge cases (all-black image, single-pixel image, image-mask consistency).
Config-driven training: all hyperparameters loaded from YAML, not hardcoded.
Reproducibility: fix all random seeds in PyTorch, NumPy, and Python random; document CUDA version and GPU used.
No hardcoded machine-specific paths or GPU device IDs; code must run on any CUDA-enabled machine given the same dataset path.

WP6.3 Acceptance Criteria — WP6

Repository passes a fresh-clone test: setup → data download → single training run → evaluation → figure export, all without manual intervention beyond dataset path configuration.
All 11 training experiments are reproducible from the provided YAML configs.
pytest unit tests pass with ≥ 95% test coverage on DIPAug modules.

3. Baseline Models to Reproduce

The freelancer must train and evaluate the following baselines on the same dataset splits and report results in the same metric tables as the proposed models. Source code for baselines may be adapted from official repositories; full attribution required.

3.1 Paper 1 Baselines

Model	Reference	Architecture	Augmentation
ResNet-50 + Generic Aug	He et al., CVPR 2016	ResNet-50	Flip / rotate / crop
EfficientNet-B3 (no aug)	Tan & Le, ICML 2019	EfficientNet-B3	None
EfficientNet-B3 + Generic Aug	Tan & Le, ICML 2019	EfficientNet-B3	Standard geometric
SC-ConvNeXt	Dong et al., Sci Rep 2024	SimCLR + CBAM + ConvNeXt	Generic
GLNet	Li et al., Front. Plant Sci. 2024	Global-Local CNN	Generic
ViT Multi-level Contrast	Khalid et al., Sci Rep 2025	Vision Transformer	Standard
CropNet	Bhola, AgriEngineering 2024	Shallow CNN + Transfer	Standard

3.2 Paper 2 Baselines

Model	Reference	Task	Notes
U-Net (standard)	Ronneberger et al., 2015	Segmentation	ResNet-34 encoder
Attention U-Net	Oktay et al., 2018	Segmentation	Attention gates
DeepLabV3+ (ResNet-50)	Chen et al., ECCV 2018	Segmentation	ASPP module
PDSNets (LinkNet-ResNet18)	Shi et al., Agriculture 2025	Field disease seg.	Closest field SOTA
DIPAug-Net + Ordinal Head	Ablation (this work)	Class + Severity	No segmentation
SegLearner	Upadhyay & Gupta, 2025	Disease severity	Severity scoring baseline

4. Deliverables and Milestones

MS	Week	Deliverable	Acceptance Criteria
M1	Wk 2	Dataset setup complete. DIPAug module (all 8 transforms) implemented and unit-tested. Visual QC grid submitted.	`pytest` passes; QC grid approved by project owner.
M2	Wk 4	AAS, CIAA, and CEDB modules implemented. DIPAug pipeline benchmarked (`≥120 img/s`).	Speed benchmark log submitted; AAS curve verified.
M3	Wk 6	DIPAug-Net architecture implemented. Baseline models (Paper 1) training complete.	Training logs for all baselines submitted.
M4	Wk 9	All 6 ablation experiments (`E1–E6`) trained and evaluated. Paper 1 results table and figures ready.	Main results table, confusion matrices, Grad-CAM overlays submitted.
M5	Wk 10	DGSM validated. SimCLR pre-training complete. Pseudo-mask generation done.	DGSM IoU report; SimCLR linear probe accuracy reported.
M6	Wk 12	DIPAug-SeverNet trained. All 5 ablation experiments (`S1–S5`) and Paper 2 baselines complete.	Full metric suite for Paper 2 submitted.
M7	Wk 14	All figures, Results Report, and complete GitHub repository submitted.	Fresh-clone test passes. All acceptance criteria met.

5. Technical Requirements

5.1 Mandatory Stack

Category	Required Tools / Libraries
Language	Python 3.10+
Deep Learning	PyTorch 2.x, torchvision, timm
Image Processing	OpenCV 4.x, scikit-image, Pillow
Augmentation	Albumentations (DualTransform for image+mask consistency)
SSL Pre-training	lightly-train (SimCLR implementation)
Segmentation	segmentation-models-pytorch
Weak Labelling	segment-anything (SAM vit_b checkpoint)
Experiment Tracking	Weights & Biases (WandB) — project owner will provide API key
Explainability	pytorch-grad-cam
Ordinal Regression	CORN loss (custom PyTorch implementation or coral-pytorch)
Statistical Testing	statsmodels (`statsmodels.stats.contingency_tables.mcnemar`) or scipy.stats (`binomtest` for the exact McNemar test)
Version Control	Git (private GitHub repo, shared with project owner)
Config Management	PyYAML or Hydra
Testing	pytest (`≥ 95%` coverage on DIPAug modules)

5.2 Hardware Requirement

The freelancer is expected to have access to a GPU with at least 16 GB VRAM (e.g., NVIDIA RTX 3090, A5000, or equivalent cloud instance). Paper 2 training (DIPAug-SeverNet with segmentation decoder) requires at least 24 GB VRAM or gradient checkpointing enabled. Cloud credits on Google Colab Pro+, Lambda Labs, or Vast.ai are acceptable alternatives. Hardware costs are the freelancer's responsibility unless separately agreed.

5.3 Freelancer Skill Requirements

Proficiency in PyTorch — custom loss functions, multi-task training loops, mixed-precision training.
Experience with computer vision — image segmentation, classification, CNN and Transformer architectures.
Familiarity with image processing — OpenCV, colour space transforms, PSF-based blurring.
Experience with self-supervised learning (SimCLR or equivalent) is strongly preferred.
Ability to produce publication-quality figures (matplotlib, seaborn, or equivalent).
Clean coding practices — docstrings, modular design, config-driven experiments.
Applicants must share at least one relevant GitHub repository or project portfolio link.

6. Out of Scope (Not the Freelancer's Responsibility)

The following tasks are handled by the project owner and should NOT be included in the freelancer's bid or timeline.

Academic manuscript writing, formatting, or submission.
Literature review and related work section drafting.
Journal selection and correspondence with editors.
Expert agronomist review of borderline severity labels.
Acquiring additional field image datasets beyond those specified (PDSNets + Wheat Plant Disease).
Deploying the model to any mobile or web application.
Providing GPU cloud credits (unless separately agreed).

7. Communication and Reporting Protocol

Weekly progress update (every Monday): brief written summary of completed tasks, blockers, and planned tasks for the coming week.
Milestone submissions: each milestone deliverable is submitted via GitHub commit + a short Loom screen recording (5–10 min) walking through key results and any design decisions made.
All questions or clarification requests must be raised within 24 hours of encountering a blocker; do not spend more than 4 hours stuck on a single issue before reaching out.
The project owner will review milestone submissions within 3 business days and provide written feedback.
Revision rounds: up to 2 revision rounds per milestone are included within scope; additional revisions may incur extra cost.

8. Application Instructions

To apply for this task, please submit the following:

A brief cover message (max 200 words) explaining your relevant experience with PyTorch, image segmentation, and data augmentation.
A link to at least one relevant GitHub repository demonstrating your code quality and PyTorch experience.
Your estimated timeline for each milestone (M1–M7) and your total project quote.
Confirmation of GPU availability and specifications.
Answer the screening question: "Describe in 2–3 sentences how you would implement a physics-based motion blur augmentation using OpenCV that is also compatible with Albumentations DualTransform for mask consistency."

Applications that do not include all five items above will not be reviewed.
