DIPAug Project Hub

Project Title

Realistic Digital Image Processing-Driven Data Augmentation for Robust Wheat Leaf Disease Classification and Severity Scoring in Field Conditions

Short Titles

  • DIPAug-Net
  • DIPAug-SeverNet

This Hugging Face app is a lightweight dashboard for the project scaffold. It helps inspect the experiment configs and repository structure before training on a proper GPU machine.

1. Project Context and Objective

The freelancer's role is to implement, train, and evaluate the models described below, and to document the experimental results.

1.1 The Task Briefly

Item | Classification | Severity Scoring
Short title | DIPAug-Net | DIPAug-SeverNet
Core task | Classify wheat leaf disease from augmented images under field conditions | Segment lesion regions and predict quantitative severity score (0–100%)
Key novel contribution | Physics-aware DIP augmentation framework + adaptive curriculum + dual-branch CNN-Transformer | DIP-guided saliency module + multi-task lesion segmentation + ordinal severity regression
Primary output | Disease class label (11 classes) | Disease class + lesion mask + severity grade (0–5)

2. Complete Scope of Work

Everything in this section is the freelancer's responsibility. The task is broken into 6 work packages (WPs). Each WP has a defined set of code deliverables and acceptance criteria.

WP1 — Data Preparation and Augmentation Pipeline

Applies to: Both Phase 1 and Phase 2

WP1.1 Dataset Setup

  1. Download the Wheat Plant Disease dataset from Kaggle / Mendeley (public domain). Verify class distribution, resolution, and label quality. Report any anomalies.
  2. Implement a stratified 70/15/15 train-validation-test split, locked by a fixed random seed (seed = 42) so all experiments use identical splits.
  3. Resize all images to 384×384. Apply ImageNet normalisation (mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]).
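
The split in step 2 can be sketched with the standard library alone. This is a minimal illustration, assuming the labels are available as a plain Python list; the function name `stratified_split` and the example class names are hypothetical and not part of the specification.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    rng = random.Random(seed)  # local RNG, so the split ignores global random state
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):  # sorted for a deterministic class order
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # the remainder goes to the test split
    return train, val, test

# Hypothetical class names and counts, for illustration only.
labels = ["rust"] * 100 + ["mildew"] * 40 + ["healthy"] * 60
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 140 30 30
```

Saving the resulting index lists to disk once, and loading them in every experiment, is one way to guarantee that all runs see identical splits.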

WP1.2 DIPAug Module — 8 Physics-Aware Augmentations

Implement each augmentation as a callable Python class inheriting from Albumentations DualTransform (image + mask compatible). Each class must accept a probability p and an intensity parameter scaled to [0, 1].

# | Augmentation Class | Implementation Specification | Parameters | Field Phenomenon
1 | IlluminationGradient | Bidirectional gradient on LAB L-channel. Random sun angle [0–360°], strength [0.3–0.8] | angle, strength | Sun angle variation
2 | CastShadow | Geometric polygon (3–6 vertices) with Gaussian penumbra blur sigma [5–20px]. Area 10–40% of image. | area, blur_sigma | Canopy shadow
3 | MotionBlur | Linear PSF kernel with random angle [0–180°]. Kernel size [5–25px]. Use cv2.filter2D. | kernel_size, angle | Camera / wind shake
4 | DefocusBlur | Disk-shaped pillbox PSF via circular averaging. Radius [3–15px]. May be applied zonally. | radius | Depth-of-field error
5 | ColourTempShift | Planckian locus CCT transform via sRGB→XYZ→D65 chromatic adaptation. CCT [3200–8000K]. | cct_kelvin | Time-of-day lighting
6 | ColourFade | HSV saturation reduction [-30% to -70%] + gamma correction [0.6–1.4] in LAB L-channel. | sat_factor, gamma | Dust / sensor degradation
7 | DustOverlay | Poisson-distributed semi-transparent ellipses [N: 50–300], opacity [0.2–0.6], with Gaussian blur. | n_particles, opacity | Field dust / debris
8 | SensorNoise | Mixed Gaussian (sigma [5–30]) + Poisson shot noise + JPEG compression artefact (QF [40–90]). | sigma, jpeg_qf | Camera noise
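
As an illustration of row 4 (DefocusBlur), the pillbox PSF and the mapping from a [0, 1] intensity to the radius range might look as follows. This is a NumPy sketch under stated assumptions: the function names are hypothetical, and a linear intensity-to-radius mapping is assumed because the specification does not fix one.

```python
import numpy as np

def pillbox_psf(radius):
    """Return a normalised disk-shaped (pillbox) kernel for an integer pixel radius."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disk = (xx ** 2 + yy ** 2 <= radius ** 2).astype(np.float32)
    return disk / disk.sum()  # unit sum keeps the mean brightness unchanged

def intensity_to_radius(intensity, r_min=3, r_max=15):
    """Map an intensity in [0, 1] linearly onto the radius range (assumed mapping)."""
    intensity = min(max(intensity, 0.0), 1.0)
    return int(round(r_min + intensity * (r_max - r_min)))

kernel = pillbox_psf(intensity_to_radius(0.5))
print(kernel.shape)  # (19, 19)
```

The kernel would then be convolved with the image, for example with `cv2.filter2D(image, -1, kernel)`. Inside an Albumentations DualTransform, a blur of this kind would normally be applied in `apply` only, with `apply_to_mask` returning the mask unchanged, since blurring does not move pixels.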

WP1.3 Adaptive Augmentation Scheduler (AAS)

Implement as a PyTorch training callback. Augmentation intensity D(epoch) follows a sigmoid curriculum:

D(epoch) = D_min + (D_max − D_min) × sigmoid((epoch − T_half) / tau)

where D_min = 0.1, D_max = 1.0, T_half = 50, tau = 10. Log D(epoch) to the training dashboard at every epoch.
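
A minimal sketch of the schedule, useful for checking logged values against the analytic curve. The function name `aas_intensity` is illustrative; the defaults are the values stated above.

```python
import math

def aas_intensity(epoch, d_min=0.1, d_max=1.0, t_half=50, tau=10):
    """Sigmoid curriculum: D(epoch) = D_min + (D_max - D_min) * sigmoid((epoch - T_half) / tau)."""
    z = (epoch - t_half) / tau
    return d_min + (d_max - d_min) / (1.0 + math.exp(-z))

print([round(aas_intensity(e), 3) for e in (0, 50, 100)])  # [0.106, 0.55, 0.994]
```

With these parameters the curve starts slightly above D_min (about 0.106 at epoch 0) and ends slightly below D_max (about 0.994 at epoch 100), which is worth keeping in mind when verifying the dashboard logs.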

WP1.4 Class-Imbalance–Aware Augmentation (CIAA)

Compute per-class sample weights as:

w_c = N_total / (N_classes × N_c)

Use PyTorch WeightedRandomSampler for batch construction. Additionally, minority classes (< 15% of total) receive augmentation intensity multiplied by 1.3× compared to majority classes.
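
The weight formula and the minority rule can be sketched as below. The function names and the example class counts are hypothetical; the per-sample weight list is the kind of input that `torch.utils.data.WeightedRandomSampler` expects as its `weights` argument.

```python
from collections import Counter

def class_weights(labels):
    """w_c = N_total / (N_classes * N_c) for every class c."""
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    return {c: n_total / (n_classes * n) for c, n in counts.items()}

def intensity_multipliers(labels, minority_share=0.15, boost=1.3):
    """Classes below the minority share get their augmentation intensity boosted."""
    counts = Counter(labels)
    n_total = len(labels)
    return {c: boost if n / n_total < minority_share else 1.0 for c, n in counts.items()}

# Hypothetical three-class dataset, for illustration only.
labels = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
weights = class_weights(labels)
sample_weights = [weights[label] for label in labels]  # one weight per sample

print({c: round(w, 3) for c, w in weights.items()})  # {'a': 0.556, 'b': 1.111, 'c': 3.333}
print(intensity_multipliers(labels))                  # {'a': 1.0, 'b': 1.0, 'c': 1.3}
```

Because intensities are defined on [0, 1], a boosted value can exceed 1.0 late in the curriculum; whether to clip it is a design decision the specification leaves open.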

WP1.5 Acceptance Criteria — WP1

  • All 8 augmentation classes are implemented, unit-tested, and pass a visual QC grid (original vs augmented at intensity levels 0.2 / 0.5 / 0.8 / 1.0 for 3 sample images per class).
  • Augmentation pipeline runs at ≥ 120 images/second on a single GPU (batch = 32) without becoming a training bottleneck.
  • Spatially consistent transforms apply identically to image and mask (verified on 20 image-mask pairs for WP4 use).
  • AAS logs are visible in training dashboard and match the sigmoid curve analytically.

WP2 — Phase 1 Model: DIPAug-Net (Disease Classification)

Applies to: Phase 1 only

WP2.1 Architecture

Implement DIPAug-Net with the following components:

  • CNN Branch: EfficientNet-B3 (ImageNet pre-trained via timm). Extract feature maps at stride-8 and stride-16 spatial resolutions.
  • Transformer Branch: Swin-Transformer-Tiny (ImageNet pre-trained via timm). Extract features at matching resolutions using the patch-merging hierarchy.
  • Cross-Attention Fusion Gate: nn.MultiheadAttention (8 heads, embed_dim = 512). CNN features = Keys/Values; Transformer features = Queries. Output is projected to 512-d.
  • Classification Head: GlobalAvgPool → FC(512, GELU) → Dropout(0.4) → FC(N_class) → Softmax.

WP2.2 Loss Function

L_total = 0.6 × L_focal + 0.4 × L_weighted_CE

Use gamma = 2.0 for focal loss. Class weights computed from training set distribution. Both losses use the same class weight vector.
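
The arithmetic of the combined loss can be checked with a small NumPy sketch. This is not the training implementation: the function name is illustrative, and a plain mean over the batch is assumed as the reduction (PyTorch's weighted cross-entropy normalises by the sum of weights by default, so values may differ by a constant factor).

```python
import numpy as np

def combined_loss(logits, targets, class_weights, gamma=2.0, w_focal=0.6, w_ce=0.4):
    """0.6 * focal + 0.4 * weighted CE, both using the same class weight vector."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = log_probs[np.arange(len(targets)), targets]  # log-prob of the true class
    pt = np.exp(log_pt)
    w = np.asarray(class_weights, dtype=np.float64)[targets]
    ce = -w * log_pt
    focal = -w * (1.0 - pt) ** gamma * log_pt  # down-weights easy examples
    return float(w_focal * focal.mean() + w_ce * ce.mean())

# Two-class toy example with an uninformative prediction (p = 0.5).
print(round(combined_loss([[0.0, 0.0]], [0], [1.0, 1.0]), 4))  # 0.3812
```

For p = 0.5 the focal term is 0.25 × ln 2 and the CE term is ln 2, so the total is (0.6 × 0.25 + 0.4) × ln 2 = 0.55 × ln 2.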

WP2.3 Training Protocol

Hyperparameter | Value
Optimiser | AdamW, weight decay = 1e-4
LR schedule | Cosine annealing, initial LR = 1e-4, min LR = 1e-6
Batch size | 32
Epochs | 100, with early stopping patience = 15 epochs on val F1
Gradient clipping | max_norm = 1.0
Random seed | 42 (all experiments)
Mixed precision | torch.cuda.amp (FP16)

WP2.4 Ablation Study — 6 Configurations

Train all 6 ablation configurations under identical protocols. Only the components listed differ across runs:

Exp | Description | DIPAug | AAS | Dual-Branch | CIAA Loss
E1 | EfficientNet-B3, no augmentation | - | - | - | -
E2 | EfficientNet-B3, geometric aug only (flip/rotate/crop) | Geometric | - | - | -
E3 | EfficientNet-B3 + full DIPAug, fixed intensity | Yes | - | - | -
E4 | EfficientNet-B3 + DIPAug + AAS | Yes | Yes | - | -
E5 | Dual-Branch + DIPAug, no AAS, no CIAA | Yes | - | Yes | -
E6 | DIPAug-Net FULL: all components | Yes | Yes | Yes | Yes

WP2.5 CEDB Robustness Evaluation

Implement the Controlled Extreme Distortion Benchmark (CEDB): apply each of the 8 distortion types at 5 severity levels (0.2 / 0.4 / 0.6 / 0.8 / 1.0) to the held-out test set (distortions applied post-training, images not seen during training). Report per-class F1 and overall accuracy under every (distortion, severity) combination. Generate a radar chart and degradation curve per distortion type.

WP2.6 Explainability

Generate Grad-CAM++ saliency maps (pytorch-grad-cam library) for a stratified sample of 50 test images (10 per class), including 5 correctly and 5 incorrectly classified examples per class. Save overlay images at publication quality (300 dpi). Verify qualitatively that activations concentrate on lesion regions.

WP2.7 Acceptance Criteria — WP2

  • All 6 ablation experiments complete and metrics logged. E6 (full model) must achieve overall accuracy ≥ 95% on the clean test set.
  • CEDB benchmark runs without errors; outputs a structured CSV (distortion, severity, class, accuracy, F1).
  • Grad-CAM++ overlays generated for all the sample images; saved as publication-quality PNGs.
  • Training logs (loss, accuracy, val-F1) exported in CSV and visualised as plots.
  • McNemar statistical test comparing E6 vs each baseline; p-values reported.
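
For the McNemar comparison, a dependency-free sketch of the exact (binomial) form is given below. It assumes per-image correctness flags for both models on the same test set; the function name is illustrative. SciPy does not, to our knowledge, ship a dedicated McNemar function, so the test is usually computed from the discordant counts as here, or taken from `statsmodels.stats.contingency_tables.mcnemar`.

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-image correctness flags."""
    # b: model A right, model B wrong; c: model B right, model A wrong.
    b = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if a_ok and not b_ok)
    c = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if b_ok and not a_ok)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Bin(n, 0.5)
    return b, c, min(1.0, 2 * tail)

# Toy example: 17 images, 10 won only by model A, 2 won only by model B.
correct_a = [True] * 10 + [False] * 2 + [True] * 5
correct_b = [False] * 10 + [True] * 2 + [True] * 5
b, c, p = mcnemar_exact(correct_a, correct_b)
print(b, c, round(p, 4))  # 10 2 0.0386
```

Only the discordant pairs enter the test, so the p-value depends on how often the two models disagree and in which direction, not on their overall accuracies.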

WP3 — Phase 2 Extensions: Contrastive Pre-Training and DGSM

Applies to: Phase 2 only (built on top of WP1 and WP2 codebase)

WP3.1 DIP-Guided Saliency Module (DGSM)

Implement as a non-trainable OpenCV-based preprocessing module:

  1. Convert input image to LAB and HSV colour spaces.
  2. Apply adaptive thresholding to the A-channel (LAB) to isolate yellow/brown lesion regions, and to the S-channel (HSV) to capture desaturated necrotic zones.
  3. Apply morphological closing (7×7 kernel) + connected component filtering (remove components < 50px).
  4. Gaussian blur the binary mask (sigma = 15px) to produce a soft saliency map M_saliency in [0, 1].
  5. Concatenate M_saliency as a 4th channel to the RGB input before the encoder.

Validate DGSM on 50 image-mask pairs from the segmentation dataset: compute IoU between DGSM output and ground truth mask. Target: mean IoU ≥ 0.45 as a prior (not the final prediction).
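
The IoU check can be sketched in NumPy as follows. Two points are assumptions rather than part of the specification: the soft saliency map is binarised at 0.5 before comparison, and a pair of empty masks counts as perfect agreement. Function names are illustrative.

```python
import numpy as np

def binary_iou(pred, truth):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty (assumed convention)
    return float(np.logical_and(pred, truth).sum() / union)

def mean_iou(saliency_maps, truth_masks, threshold=0.5):
    """Binarise each soft saliency map, then average IoU over all pairs."""
    scores = [binary_iou(np.asarray(s) >= threshold, t)
              for s, t in zip(saliency_maps, truth_masks)]
    return sum(scores) / len(scores)

# Tiny 2x2 example: one overlapping pixel, two pixels in the union.
saliency = [[0.9, 0.7], [0.1, 0.0]]
truth = [[1, 0], [0, 0]]
print(mean_iou([saliency], [truth]))  # 0.5
```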

WP3.2 Augmentation-Consistent Contrastive Pre-Training

Implement SimCLR pre-training using the lightly-train library:

  • Backbone: EfficientNet-B4 (timm, ImageNet init). Projection head: MLP (2048 → 512 → 256). Temperature tau = 0.07.
  • Positive pairs: two independently sampled DIPAug transforms applied to the same image (no class labels used).
  • Training: 50 epochs, batch = 256 (use all available training images), AdamW LR = 3e-4, cosine decay.
  • After pre-training, evaluate representation quality via a linear probe: freeze backbone, train a single linear classifier, report accuracy.
  • Save pre-trained backbone weights as a checkpoint for use in WP4.

WP3.3 Acceptance Criteria — WP3

  • DGSM module produces saliency maps for all training images without errors; mean IoU ≥ 0.45 on validation sample.
  • SimCLR pre-training completes; linear probe accuracy ≥ 60% (confirming informative representation).
  • Pre-trained checkpoint saved and loadable for WP4.

WP4 — Phase 2 Model: DIPAug-SeverNet (Severity Scoring)

Applies to: Phase 2 only

WP4.1 Dataset Preparation for Segmentation

  • Primary: Obtain PDSNets wheat stripe rust field dataset (Shi et al., Agriculture 2025) which includes pixel-level masks.
  • Secondary (weak labels): Apply DGSM + SAM (Meta Segment Anything Model, vit_b checkpoint) to generate pseudo-masks for the classification-only Wheat Plant Disease dataset. Use DGSM bounding boxes as SAM spatial prompts. Save refined pseudo-masks.
  • Severity Grade Assignment: Compute severity % = 100 × (lesion pixels / total leaf pixels) from the ground truth/pseudo mask. Map to Grade 0–5 using thresholds: 0%, 1–10%, 11–25%, 26–50%, 51–75%, >75%.
  • Flag images within 3% of grade boundaries as borderline and log them for potential expert review.
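
The grade mapping and borderline flag might be implemented along these lines. Several choices here are assumptions: the listed ranges leave gaps for fractional percentages (for example 10.5%), so upper bounds are treated as inclusive; "within 3%" is read as 3 percentage points; and only the boundaries at 10, 25, 50 and 75% are checked. Function names are illustrative.

```python
from bisect import bisect_left

# Inclusive upper bounds of grades 1-4 in percent (assumed reading of the thresholds).
BOUNDARIES = (10.0, 25.0, 50.0, 75.0)

def severity_percent(lesion_pixels, leaf_pixels):
    """Lesion area as a percentage of the total leaf area."""
    if leaf_pixels <= 0:
        raise ValueError("leaf_pixels must be positive")
    return 100.0 * lesion_pixels / leaf_pixels

def severity_grade(percent):
    """Map a severity percentage to a grade in 0-5."""
    if percent <= 0:
        return 0
    return 1 + bisect_left(BOUNDARIES, percent)

def is_borderline(percent, margin=3.0):
    """True if the percentage lies within `margin` points of a grade boundary."""
    return any(abs(percent - b) <= margin for b in BOUNDARIES)

pct = severity_percent(lesion_pixels=120, leaf_pixels=1000)
print(pct, severity_grade(pct), is_borderline(pct))  # 12.0 2 True
```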

WP4.2 Architecture — DIPAug-SeverNet

  • Shared Encoder: EfficientNet-B4, initialised from the SimCLR checkpoint (WP3.2). Fine-tune upper 40% of layers; freeze lower 60%.
  • Segmentation Decoder: U-Net-style decoder with skip connections from 4 encoder stages. Final output: binary lesion mask at original resolution (H×W). Loss: 0.5 × BCE + 0.5 × Dice.
  • Severity Head: Lesion-weighted feature = GlobalAvgPool(F_enc × M_pred) + GlobalAvgPool(F_enc). Feed this to a 3-layer MLP (1024→256→64) with two outputs: (i) a sigmoid output for the continuous severity score s in [0,1] (MSE loss), and (ii) an ordinal output with 5 thresholds using CORN ordinal regression (ordinal CE loss).
  • Uncertainty-Weighted Multi-Task Loss (Kendall et al.): L_total = sum_i [ L_i / (2σ_i²) + log σ_i ], where σ_i is the learnable uncertainty of task i (typically optimised through its log-variance for numerical stability), for the 4 tasks: segmentation, classification, severity regression, severity ordinal.
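
A plain-Python sketch of the uncertainty-weighted sum is shown below. It assumes the common parameterisation s_i = log σ_i², under which 1 / (2σ_i²) = 0.5 × exp(−s_i) and log σ_i = 0.5 × s_i; in the PyTorch model the s_i would be learnable parameters rather than plain floats. The function name and the example loss values are illustrative.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Sum over tasks of L_i / (2 * sigma_i^2) + log(sigma_i), with s_i = log(sigma_i^2)."""
    if len(task_losses) != len(log_vars):
        raise ValueError("one log-variance is needed per task loss")
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(task_losses, log_vars))

# Hypothetical per-task losses: segmentation, classification, regression, ordinal.
losses = [0.4, 0.2, 0.1, 0.3]
print(round(uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0, 0.0]), 6))  # 0.5
```

With all s_i = 0 (σ_i = 1) the expression reduces to half the plain sum of the task losses. Raising s_i for a task lowers its weight but pays the 0.5 × s_i penalty, which is what keeps the uncertainties from growing without bound.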

WP4.3 Training Protocol

  • Stage 1 (Warm-up, 20 epochs): Activate segmentation loss only. Freeze severity head. Allow encoder fine-tuning from epoch 10 onward.
  • Stage 2 (Joint MTL, 80 epochs): Activate all losses with uncertainty weighting. Same hyperparameters as WP2.3.
  • Log all 4 individual task losses and σ_i values at every epoch.

WP4.4 Ablation Study — 5 Configurations

Exp | Description | DGSM | SimCLR | Seg Decoder | MTL Loss
S1 | EfficientNet-B4 + severity head only (no seg) | - | - | - | CE only
S2 | S1 + Segmentation Decoder (no DGSM, no SimCLR) | - | - | Yes | BCE+Dice
S3 | S2 + DGSM saliency input | Yes | - | Yes | BCE+Dice
S4 | S3 + SimCLR pre-training (no uncertainty MTL) | Yes | Yes | Yes | Fixed weights
S5 | DIPAug-SeverNet FULL: all components | Yes | Yes | Yes | Uncertainty MTL

WP4.5 Acceptance Criteria — WP4

  • S5 (full model) achieves: segmentation mIoU ≥ 0.60, severity MAE ≤ 12%, severity QWK ≥ 0.70, classification F1 ≥ 0.85.
  • All σ_i uncertainty weights converge (logged curves saved).
  • Pseudo-mask generation pipeline completes for all images and pseudo-masks pass visual QC (random 30-image sample).

WP5 — Evaluation, Visualisation, and Results Package

Applies to: Both Phases

WP5.1 Metrics to Compute and Report

Phase 1 Metrics:

  • Overall accuracy, macro Precision / Recall / F1, Cohen's Kappa, per-class AUC-ROC.
  • Confusion matrix (normalised, publication-quality heatmap).
  • CEDB robustness CSV + radar chart + per-distortion degradation line plots.
  • McNemar test p-values vs all baselines.
  • Inference time (ms/image, averaged over 100 runs) and model parameter count.

Phase 2 Metrics:

  • All Phase 1 metrics (for the classification head of DIPAug-SeverNet).
  • Segmentation: IoU, Dice, Pixel Accuracy, mIoU, Boundary F1.
  • Severity regression: Pearson r, MAE, RMSE between predicted and ground truth severity %.
  • Severity ordinal: Quadratic Weighted Kappa (QWK), per-grade accuracy.
  • CEDB severity degradation for each metric under all 8 distortion types.
  • Relative Performance Retention (RPR) = metric_augmented / metric_clean per distortion.
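
For the severity ordinal metric, Quadratic Weighted Kappa can be computed directly from the confusion matrix. The sketch below is dependency-free and assumes integer grades 0–5; the function name is illustrative, and returning 1.0 when every label and prediction falls in the same single grade is an assumed convention for that degenerate case.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_grades=6):
    """QWK = 1 - sum(w * observed) / sum(w * expected), w_ij = (i - j)^2 / (K - 1)^2."""
    n = len(y_true)
    observed = [[0] * n_grades for _ in range(n_grades)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_grades)) for j in range(n_grades)]
    num = den = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            w = (i - j) ** 2 / (n_grades - 1) ** 2
            num += w * observed[i][j]
            den += w * hist_true[i] * hist_pred[j] / n  # chance-level agreement
    return 1.0 - num / den if den > 0 else 1.0

print(quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]))  # 1.0
print(quadratic_weighted_kappa([0, 5], [5, 0]))                          # -1.0
```

The quadratic weights penalise a prediction two grades off four times as heavily as one grade off, which is why QWK suits ordinal severity grades better than plain accuracy.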

WP5.2 Figures to Produce (Publication-Quality, 300 dpi)

  1. DIPAug visualisation grid: 8 rows × 5 columns (augmentation type × intensity level), 3 disease classes shown.
  2. AAS sigmoid schedule plot (D vs epoch).
  3. DIPAug-Net architecture diagram (clear block diagram, export as PNG/SVG).
  4. DIPAug-SeverNet architecture diagram (encoder + dual decoder + severity head).
  5. DGSM pipeline figure: RGB input → LAB/HSV processing → saliency map → 4-channel input.
  6. Spatially consistent augmentation pair: [image | mask | augmented image | augmented mask].
  7. Confusion matrices for Paper 1 (all 6 ablations + baselines).
  8. CEDB radar chart and degradation curves (Paper 1 and Paper 2).
  9. Grad-CAM++ overlays — 50 images for Paper 1, 30 images for Paper 2.
  10. Severity grade visualisation grid: 5 disease classes × 6 severity grades, with predicted vs ground truth overlaid.
  11. Severity score scatter plot: predicted vs ground truth (annotated).
  12. SimCLR linear probe accuracy learning curve.
  13. σ_i uncertainty weight convergence plots (WP4 training).

WP5.3 Results Report

Deliver a structured Results Report (PDF or DOCX) with all quantitative results. The report must be organised as:

  1. experiment configuration tables
  2. main results tables
  3. all figures listed in WP5.2
  4. ablation analysis commentary (2–3 sentences per ablation row)
  5. CEDB analysis section
  6. failure case analysis — minimum 5 examples with explanation per paper

This report is the primary input to manuscript drafting.

WP6 — Deliverables

WP6.1 Repository Structure

  • dipaugnet/ — Paper 1 model, training, evaluation scripts
  • dipaugsevernet/ — Paper 2 model, training, evaluation scripts
  • dipauglib/ — Shared DIPAug augmentation modules (WP1)
  • notebooks/ — Jupyter notebooks for QC visualisations and result analysis
  • configs/ — YAML configuration files for all 11 experiments (E1–E6, S1–S5)
  • results/ — CSV files, metric summaries, and saved model checkpoints
  • figures/ — All publication-quality figures (PNG + SVG where applicable)
  • README.md — Full setup and execution guide
  • requirements.txt — Pinned dependency versions
  • RESULTS_REPORT.docx — Structured results report (WP5.3)

WP6.2 Code Quality Requirements

  • All modules have docstrings (Google style).
  • Unit tests for all 8 DIPAug transforms (pytest), covering edge cases (all-black image, single-pixel, mask consistency).
  • Config-driven training: all hyperparameters loaded from YAML, not hardcoded.
  • Reproducibility: fix all random seeds in PyTorch, NumPy, and Python random; document CUDA version and GPU used.
  • No GPU-specific hardcoded paths; code must run on any CUDA-enabled machine given the same dataset path.
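
The seed-fixing requirement could be met with a helper like the one below. It is a sketch: the function name `seed_everything` is illustrative, NumPy and PyTorch are seeded only if they are installed, and the cuDNN flags trade some speed for determinism.

```python
import os
import random

def seed_everything(seed=42):
    """Seed Python, NumPy, and PyTorch RNGs (the latter two only if importable)."""
    random.seed(seed)
    # PYTHONHASHSEED only affects interpreter processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)  # True
```

Calling the helper once at the start of every entry point, with the seed read from the YAML config, keeps the behaviour consistent with the config-driven requirement above.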

WP6.3 Acceptance Criteria — WP6

  • Repository passes a fresh-clone test: setup → data download → single training run → evaluation → figure export, all without manual intervention beyond dataset path configuration.
  • All 11 training experiments are reproducible from the provided YAML configs.
  • pytest unit tests pass with ≥ 95% test coverage on DIPAug modules.

3. Baseline Models to Reproduce

The freelancer must train and evaluate the following baselines on the same dataset splits and report results in the same metric tables as the proposed models. Source code for baselines may be adapted from official repositories; full attribution required.

3.1 Paper 1 Baselines

Model | Reference | Architecture | Augmentation
ResNet-50 + Generic Aug | He et al., CVPR 2016 | ResNet-50 | Flip / rotate / crop
EfficientNet-B3 (no aug) | Tan & Le, ICML 2019 | EfficientNet-B3 | None
EfficientNet-B3 + Generic Aug | Tan & Le, ICML 2019 | EfficientNet-B3 | Standard geometric
SC-ConvNeXt | Dong et al., Sci Rep 2024 | SimCLR + CBAM + ConvNeXt | Generic
GLNet | Li et al., Front. Plant Sci. 2024 | Global-Local CNN | Generic
ViT Multi-level Contrast | Khalid et al., Sci Rep 2025 | Vision Transformer | Standard
CropNet | Bhola, AgriEngineering 2024 | Shallow CNN + Transfer | Standard

3.2 Paper 2 Baselines

Model | Reference | Task | Notes
U-Net (standard) | Ronneberger et al., 2015 | Segmentation | ResNet-34 encoder
Attention U-Net | Oktay et al., 2018 | Segmentation | Attention gates
DeepLabV3+ (ResNet-50) | Chen et al., ECCV 2018 | Segmentation | ASPP module
PDSNets (LinkNet-ResNet18) | Shi et al., Agriculture 2025 | Field disease seg. | Closest field SOTA
DIPAug-Net + Ordinal Head | Ablation (this work) | Class + Severity | No segmentation
SegLearner | Upadhyay & Gupta, 2025 | Disease severity | Severity scoring baseline

4. Deliverables and Milestones

MS | Week | Deliverable | Acceptance Criteria
M1 | Wk 2 | Dataset setup complete. DIPAug module (all 8 transforms) implemented and unit-tested. Visual QC grid submitted. | pytest passes; QC grid approved by project owner.
M2 | Wk 4 | AAS, CIAA, and CEDB modules implemented. DIPAug pipeline benchmarked (≥120 img/s). | Speed benchmark log submitted; AAS curve verified.
M3 | Wk 6 | DIPAug-Net architecture implemented. Baseline models (Paper 1) training complete. | Training logs for all baselines submitted.
M4 | Wk 9 | All 6 ablation experiments (E1–E6) trained and evaluated. Paper 1 results table and figures ready. | Main results table, confusion matrices, Grad-CAM overlays submitted.
M5 | Wk 10 | DGSM validated. SimCLR pre-training complete. Pseudo-mask generation done. | DGSM IoU report; SimCLR linear probe accuracy reported.
M6 | Wk 12 | DIPAug-SeverNet trained. All 5 ablation experiments (S1–S5) and Paper 2 baselines complete. | Full metric suite for Paper 2 submitted.
M7 | Wk 14 | All figures, Results Report, and complete GitHub repository submitted. | Fresh-clone test passes. All acceptance criteria met.

5. Technical Requirements

5.1 Mandatory Stack

Category | Required Tools / Libraries
Language | Python 3.10+
Deep Learning | PyTorch 2.x, torchvision, timm
Image Processing | OpenCV 4.x, scikit-image, Pillow
Augmentation | Albumentations (DualTransform for image+mask consistency)
SSL Pre-training | lightly-train (SimCLR implementation)
Segmentation | segmentation-models-pytorch
Weak Labelling | segment-anything (SAM vit_b checkpoint)
Experiment Tracking | Weights & Biases (WandB); project owner will provide API key
Explainability | pytorch-grad-cam
Ordinal Regression | CORN loss (custom PyTorch implementation or coral-pytorch)
Statistical Testing | scipy.stats (McNemar test)
Version Control | Git (private GitHub repo, shared with project owner)
Config Management | PyYAML or Hydra
Testing | pytest (≥ 95% coverage on DIPAug modules)

5.2 Hardware Requirement

The freelancer is expected to have access to a GPU with at least 16 GB VRAM (e.g., NVIDIA RTX 3090, A5000, or equivalent cloud instance). Paper 2 training (DIPAug-SeverNet with segmentation decoder) requires at least 24 GB VRAM or gradient checkpointing enabled. Cloud credits on Google Colab Pro+, Lambda Labs, or Vast.ai are acceptable alternatives. Hardware costs are the freelancer's responsibility unless separately agreed.

5.3 Freelancer Skill Requirements

  • Proficiency in PyTorch — custom loss functions, multi-task training loops, mixed-precision training.
  • Experience with computer vision — image segmentation, classification, CNN and Transformer architectures.
  • Familiarity with image processing — OpenCV, colour space transforms, PSF-based blurring.
  • Experience with self-supervised learning (SimCLR or equivalent) is strongly preferred.
  • Ability to produce publication-quality figures (matplotlib, seaborn, or equivalent).
  • Clean coding practices — docstrings, modular design, config-driven experiments.
  • Applicants must share at least one relevant GitHub repository or project portfolio link.

6. Out of Scope (Not the Freelancer's Responsibility)

The following tasks are handled by the project owner and should NOT be included in the freelancer's bid or timeline.

  • Academic manuscript writing, formatting, or submission.
  • Literature review and related work section drafting.
  • Journal selection and correspondence with editors.
  • Expert agronomist review of borderline severity labels.
  • Acquiring additional field image datasets beyond those specified (PDSNets + Wheat Plant Disease).
  • Deploying the model to any mobile or web application.
  • Providing GPU cloud credits (unless separately agreed).

7. Communication and Reporting Protocol

  • Weekly progress update (every Monday): brief written summary of completed tasks, blockers, and planned tasks for the coming week.
  • Milestone submissions: each milestone deliverable is submitted via GitHub commit + a short Loom screen recording (5–10 min) walking through key results and any design decisions made.
  • All questions or clarification requests must be raised within 24 hours of encountering a blocker; do not spend more than 4 hours stuck on a single issue before reaching out.
  • The project owner will review milestone submissions within 3 business days and provide written feedback.
  • Revision rounds: up to 2 revision rounds per milestone are included within scope; additional revisions may incur extra cost.

8. Application Instructions

To apply for this task, please submit the following:

  1. A brief cover message (max 200 words) explaining your relevant experience with PyTorch, image segmentation, and data augmentation.
  2. A link to at least one relevant GitHub repository demonstrating your code quality and PyTorch experience.
  3. Your estimated timeline for each milestone (M1–M7) and your total project quote.
  4. Confirmation of GPU availability and specifications.
  5. Answer the screening question: "Describe in 2–3 sentences how you would implement a physics-based motion blur augmentation using OpenCV that is also compatible with Albumentations DualTransform for mask consistency."

Applications that do not include all five items above will not be reviewed.