DIPAug Project Hub
Project Title
Realistic Digital Image Processing-Driven Data Augmentation for Robust Wheat Leaf Disease Classification and Severity Scoring in Field Conditions
Short Titles
DIPAug-Net / DIPAug-SeverNet
This Hugging Face app is a lightweight dashboard for the project scaffold. It helps inspect the experiment configs and repository structure before training on a proper GPU machine.
1. Project Context and Objective
The freelancer's role is to implement and train the models, evaluate them, and document the experimental results.
1.1 Task Summary
| Item | Classification | Severity Scoring |
|---|---|---|
| Short title | DIPAug-Net | DIPAug-SeverNet |
| Core task | Classify wheat leaf disease from augmented images under field conditions | Segment lesion regions and predict quantitative severity score (0–100%) |
| Key novel contribution | Physics-aware DIP augmentation framework + adaptive curriculum + dual-branch CNN-Transformer | DIP-guided saliency module + multi-task lesion segmentation + ordinal severity regression |
| Primary output | Disease class label (11 classes) | Disease class + lesion mask + severity grade (0–5) |
2. Complete Scope of Work
Everything in this section is the freelancer's responsibility. The task is broken into 6 work packages (WPs). Each WP has a defined set of code deliverables and acceptance criteria.
WP1 — Data Preparation and Augmentation Pipeline
Applies to: Both Phase 1 and Phase 2
WP1.1 Dataset Setup
- Download the Wheat Plant Disease dataset from Kaggle / Mendeley (public domain). Verify class distribution, resolution, and label quality. Report any anomalies.
- Implement a stratified 70/15/15 train-validation-test split, locked by a fixed random seed (seed = 42) so all experiments use identical splits.
- Resize all images to 384×384. Apply ImageNet normalisation (mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]).
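A minimal sketch of the seeded, stratified 70/15/15 split using only the Python standard library. The function name stratified_split and the per-class rounding rule are illustrative assumptions; calling scikit-learn's train_test_split with stratify= twice would be an equally valid route.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists, stratified by label.

    Each class is shuffled with one seeded RNG and cut at the 70% and 85%
    marks, so every split keeps approximately the same class mix.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):  # fixed class order keeps the RNG sequence reproducible
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return sorted(train), sorted(val), sorted(test)
```

Writing the three index lists to disk once (for example as JSON) is one simple way to guarantee that every experiment reads the identical split.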
WP1.2 DIPAug Module — 8 Physics-Aware Augmentations
Implement each augmentation as a callable Python class inheriting from Albumentations DualTransform (image + mask compatible). Each class must accept a probability p and an intensity parameter scaled to [0, 1].
| # | Augmentation Class | Implementation Specification | Parameters | Field Phenomenon |
|---|---|---|---|---|
| 1 | IlluminationGradient | Bidirectional gradient on LAB L-channel. Random sun angle [0–360°], strength [0.3–0.8]. | angle, strength | Sun angle variation |
| 2 | CastShadow | Geometric polygon (3–6 vertices) with Gaussian penumbra blur, sigma [5–20px]. Area 10–40% of image. | area, blur_sigma | Canopy shadow |
| 3 | MotionBlur | Linear PSF kernel with random angle [0–180°]. Kernel size [5–25px]. Use cv2.filter2D. | kernel_size, angle | Camera / wind shake |
| 4 | DefocusBlur | Disk-shaped (pillbox) PSF via circular averaging. Radius [3–15px]. May be applied zonally. | radius | Depth-of-field error |
| 5 | ColourTempShift | Planckian-locus CCT transform via sRGB→XYZ→D65 chromatic adaptation. CCT [3200–8000K]. | cct_kelvin | Time-of-day lighting |
| 6 | ColourFade | HSV saturation reduction [-30% to -70%] + gamma correction [0.6–1.4] on the LAB L-channel. | sat_factor, gamma | Dust / sensor degradation |
| 7 | DustOverlay | Poisson-distributed semi-transparent ellipses (N: 50–300), opacity [0.2–0.6], with Gaussian blur. | n_particles, opacity | Field dust / debris |
| 8 | SensorNoise | Mixed Gaussian (sigma [5–30]) + Poisson shot noise + JPEG compression artefacts (QF [40–90]). | sigma, jpeg_qf | Camera noise |
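To illustrate how the shared intensity parameter maps onto a physical range, the sketch below builds the MotionBlur PSF (row 3) with NumPy only, mapping intensity in [0, 1] linearly onto the 5–25 px kernel range. The linear mapping and the function name are assumptions. Blur is photometric, so the mask should stay unchanged: a DualTransform subclass would implement apply for the image and override apply_to_mask to return the mask as-is, because by default a DualTransform passes the mask through the image operation as well.

```python
import numpy as np


def motion_blur_kernel(intensity, angle_deg, k_min=5, k_max=25):
    """Linear motion-blur PSF whose length scales with intensity in [0, 1].

    The kernel is a normalised line through the centre of a k x k grid,
    rotated by angle_deg; it can be applied with cv2.filter2D(img, -1, kernel).
    """
    intensity = float(np.clip(intensity, 0.0, 1.0))
    k = int(round(k_min + intensity * (k_max - k_min)))
    if k % 2 == 0:
        k += 1  # odd size so the PSF has a true centre pixel
    kernel = np.zeros((k, k), dtype=np.float32)
    c = k // 2
    theta = np.deg2rad(angle_deg)
    # Sample densely along the segment and rasterise onto the grid.
    for t in np.linspace(-c, c, 4 * k):
        x = int(round(c + t * np.cos(theta)))
        y = int(round(c - t * np.sin(theta)))  # image rows grow downwards
        kernel[y, x] = 1.0
    return kernel / kernel.sum()  # unit sum preserves mean brightness
```

At intensity 0 and angle 0 this yields a 5×5 kernel whose middle row is 0.2 everywhere, i.e. a horizontal 5-pixel box blur.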
WP1.3 Adaptive Augmentation Scheduler (AAS)
Implement as a PyTorch training callback. Augmentation intensity D(epoch) follows a sigmoid curriculum:
D(epoch) = D_min + (D_max − D_min) × sigmoid((epoch − T_half) / tau)
where D_min = 0.1, D_max = 1.0, T_half = 50, tau = 10. Log D(epoch) to the training dashboard at every epoch.
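The schedule is a one-line function; the sketch below evaluates it with the default constants (the function name is an assumption). D_min and D_max are asymptotes, so D(0) ≈ 0.106 and D(100) ≈ 0.994 rather than exactly 0.1 and 1.0, while D(50) = 0.55 exactly.

```python
import math


def aas_intensity(epoch, d_min=0.1, d_max=1.0, t_half=50, tau=10):
    """D(epoch) = D_min + (D_max - D_min) * sigmoid((epoch - T_half) / tau)."""
    z = (epoch - t_half) / tau
    return d_min + (d_max - d_min) / (1.0 + math.exp(-z))
```

Calling it once per epoch and logging the result (for example with wandb.log) is left to the training loop.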
WP1.4 Class-Imbalance–Aware Augmentation (CIAA)
Compute per-class sample weights as:
w_c = N_total / (N_classes × N_c)
Use PyTorch WeightedRandomSampler for batch construction. Additionally, minority classes (< 15% of total) receive augmentation intensity multiplied by 1.3× compared to majority classes.
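A standard-library sketch of the CIAA bookkeeping, assuming that the 1.3× boost is clipped to the [0, 1] intensity range and that "of total" means share of training samples. The per-sample list can be passed to torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True). With w_c as defined, every class contributes the same total weight N_total / N_classes, so classes are drawn about equally often. One caution: with 11 classes the average class share is roughly 9%, so a fixed 15% cutoff may mark most classes as minority; the sketch keeps the cutoff as a parameter.

```python
from collections import Counter


def ciaa_weights(labels, minority_share=0.15, boost=1.3):
    """Class weights w_c = N_total / (N_classes * N_c), per-sample weights
    for WeightedRandomSampler, and per-class intensity multipliers."""
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    class_w = {c: n_total / (n_classes * n) for c, n in counts.items()}
    sample_w = [class_w[y] for y in labels]
    intensity_mult = {
        c: (boost if n / n_total < minority_share else 1.0) for c, n in counts.items()
    }
    return class_w, sample_w, intensity_mult


def scaled_intensity(base, label, intensity_mult):
    """Boost the curriculum intensity for minority classes, clipped to 1.0 (assumption)."""
    return min(1.0, base * intensity_mult[label])
```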
WP1.5 Acceptance Criteria — WP1
- All 8 augmentation classes are implemented, unit-tested, and pass a visual QC grid (original vs augmented at intensity levels 0.2 / 0.5 / 0.8 / 1.0 for 3 sample images per class).
- Augmentation pipeline runs at ≥ 120 images/second on a single GPU (batch = 32) without becoming a training bottleneck.
- Spatially consistent transforms apply identically to image and mask (verified on 20 image-mask pairs for WP4 use).
- AAS logs are visible in training dashboard and match the sigmoid curve analytically.
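A rough timing harness for the ≥ 120 images/second criterion, assuming each augmentation is a per-image callable on NumPy arrays; the function name and warm-up count are assumptions. Albumentations transforms run on the CPU, so the number that ultimately matters for the bottleneck check is end-to-end DataLoader throughput with the chosen number of workers; this sketch only times the transforms in a single process.

```python
import time

import numpy as np


def images_per_second(augment, images, batch_size=32, warmup=2, n_batches=20):
    """Measure per-image augmentation throughput over fixed-size batches."""
    batches = [images[i:i + batch_size] for i in range(0, len(images), batch_size)]
    for batch in batches[:warmup]:  # warm caches and any lazy initialisation
        for img in batch:
            augment(img)
    n_done = 0
    start = time.perf_counter()
    for batch in batches[:n_batches]:
        for img in batch:
            augment(img)
        n_done += len(batch)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return n_done / elapsed
```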
WP2 — Phase 1 Model: DIPAug-Net (Disease Classification)
Applies to: Phase 1 only
WP2.1 Architecture
Implement DIPAug-Net with the following components:
- CNN Branch: EfficientNet-B3 (ImageNet pre-trained via timm). Extract feature maps at stride-8 and stride-16 spatial resolutions.
- Transformer Branch: Swin-Transformer-Tiny (ImageNet pre-trained via timm). Extract features at matching resolutions using the patch-merging hierarchy.
- Cross-Attention Fusion Gate: nn.MultiheadAttention (8 heads, embed_dim = 512). CNN features = Keys/Values; Transformer features = Queries. Output is projected to 512-d.
- Classification Head: GlobalAvgPool → FC(512, GELU) → Dropout(0.4) → FC(N_class) → Softmax. Apply the Softmax only at inference; the WP2.2 losses should receive the raw logits.
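A NumPy sketch of the cross-attention fusion step, included only to make the tensor shapes explicit. It assumes both branches have already been flattened to tokens and projected to 512 channels (for a 384×384 input, a stride-16 map has 24 × 24 = 576 tokens), and it uses random projection weights where the model would use learned ones. In PyTorch the same operation is nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True), called with query = Swin tokens and key = value = CNN tokens.

```python
import numpy as np


def cross_attention(queries, keys_values, n_heads=8, seed=0):
    """Multi-head cross-attention: Transformer tokens attend to CNN tokens.

    queries:     (N_q, d) Swin tokens -> Q
    keys_values: (N_kv, d) CNN tokens -> K and V
    Projections are random here; in the model they are learned parameters.
    """
    n_q, d = queries.shape
    assert keys_values.shape[1] == d and d % n_heads == 0
    d_h = d // n_heads
    rng = np.random.default_rng(seed)
    w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split(x):  # (N, d) -> (heads, N, d_h)
        return x.reshape(x.shape[0], n_heads, d_h).transpose(1, 0, 2)

    q = split(queries @ w_q)
    k = split(keys_values @ w_k)
    v = split(keys_values @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)   # (heads, N_q, N_kv)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)           # softmax over CNN tokens
    out = (attn @ v).transpose(1, 0, 2).reshape(n_q, d)  # concatenate heads
    return out @ w_o, attn                             # output projection to d
```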
WP2.2 Loss Function
L_total = 0.6 × L_focal + 0.4 × L_weighted_CE
Use gamma = 2.0 for focal loss. Class weights computed from training set distribution. Both losses use the same class weight vector.
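A NumPy sketch of the combined loss, assuming the class-weighted focal form w_t (1 − p_t)^γ (−log p_t) and PyTorch-style weighted-CE averaging (division by the summed target weights). It operates on raw logits, which is also why a Softmax layer should not precede these losses during training. Setting gamma = 0 with uniform weights reduces L_total to ordinary cross-entropy, which makes a convenient unit test.

```python
import numpy as np


def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def dipaug_net_loss(logits, targets, class_w, gamma=2.0, a=0.6, b=0.4):
    """L_total = a * L_focal + b * L_weighted_CE, sharing one class-weight vector.

    logits: (N, C) raw scores; targets: (N,) int labels; class_w: (C,) weights.
    """
    logp_t = log_softmax(logits)[np.arange(len(targets)), targets]  # log p_t
    p_t = np.exp(logp_t)
    w_t = class_w[targets]
    # Weighted CE normalised by the summed target weights (PyTorch "mean" reduction).
    l_wce = -(w_t * logp_t).sum() / w_t.sum()
    # Class-weighted focal loss, averaged over the batch.
    l_focal = (-w_t * (1.0 - p_t) ** gamma * logp_t).mean()
    return a * l_focal + b * l_wce
```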
WP2.3 Training Protocol
| Hyperparameter | Value |
|---|---|
| Optimiser | AdamW, weight decay = 1e-4 |
| LR Schedule | Cosine annealing, initial LR = 1e-4, min LR = 1e-6 |
| Batch size | 32 |
| Epochs | 100, with early stopping patience = 15 epochs on val F1 |
| Gradient clipping | max_norm = 1.0 |
| Random seed | 42 (all experiments) |
| Mixed precision | torch.cuda.amp (FP16) |
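The learning-rate curve and stopping rule from the table can be sketched as follows. PyTorch's built-in equivalent of the LR curve is torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-6); the EarlyStopping class and its min_delta argument are hypothetical helpers, not part of PyTorch.

```python
import math


def cosine_lr(epoch, total_epochs=100, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing from lr_max at epoch 0 to lr_min at epoch == total_epochs."""
    t = min(epoch, total_epochs) / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))


class EarlyStopping:
    """Signal a stop when val F1 has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=15, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = -math.inf, 0

    def step(self, val_f1):
        """Record one epoch's val F1; return True when training should stop."""
        if val_f1 > self.best + self.min_delta:
            self.best, self.bad_epochs = val_f1, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```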
WP2.4 Ablation Study — 6 Configurations
Train all 6 ablation configurations under identical protocols. Only the components listed differ across runs:
| Exp | Description | DIPAug | AAS | Dual-Branch | CIAA Loss |
|---|---|---|---|---|---|
| E1 | EfficientNet-B3, no augmentation | — | — | — | — |
| E2 | EfficientNet-B3, geometric aug only (flip/rotate/crop) | Geometric | — | — | — |
| E3 | EfficientNet-B3 + full DIPAug, fixed intensity | Yes | — | — | — |
| E4 | EfficientNet-B3 + DIPAug + AAS | Yes | Yes | — | — |
| E5 | Dual-Branch + DIPAug, no AAS, no CIAA | Yes | — | Yes | — |
| E6 | DIPAug-Net FULL — all components | Yes | Yes | Yes | Yes |
WP2.5 CEDB Robustness Evaluation
Implement the Controlled Extreme Distortion Benchmark (CEDB): apply each of the 8 distortion types at 5 severity levels (0.2 / 0.4 / 0.6 / 0.8 / 1.0) to the held-out test set (distortions applied post-training, images not seen during training). Report per-class F1 and overall accuracy under every (distortion, severity) combination. Generate a radar chart and degradation curve per distortion type.
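A sketch of the CEDB loop that produces the CSV layout required in WP2.7. The distort and predict callables are hypothetical hooks for the DIPAug transforms and the trained model, and repeating the overall accuracy of each (distortion, severity) cell on every per-class row is an assumption about how the accuracy column is meant to be filled.

```python
import csv

DISTORTIONS = ["IlluminationGradient", "CastShadow", "MotionBlur", "DefocusBlur",
               "ColourTempShift", "ColourFade", "DustOverlay", "SensorNoise"]
SEVERITIES = [0.2, 0.4, 0.6, 0.8, 1.0]
FIELDS = ["distortion", "severity", "class", "accuracy", "F1"]


def per_class_f1(y_true, y_pred, cls):
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0


def run_cedb(images, labels, classes, distort, predict):
    """Return one row per (distortion, severity, class)."""
    rows = []
    for name in DISTORTIONS:
        for sev in SEVERITIES:
            preds = predict(distort(images, name, sev))
            acc = sum(p == t for p, t in zip(preds, labels)) / len(labels)
            for cls in classes:
                rows.append({"distortion": name, "severity": sev, "class": cls,
                             "accuracy": acc, "F1": per_class_f1(labels, preds, cls)})
    return rows


def write_cedb_csv(rows, path):
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```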
WP2.6 Explainability
Generate Grad-CAM++ saliency maps (pytorch-grad-cam library) for a stratified sample of 50 test images (10 per class), including 5 correctly and 5 incorrectly classified examples per class. Save overlay images at publication quality (300 dpi). Verify qualitatively that activations concentrate on lesion regions.
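Selecting the Grad-CAM++ samples is plain bookkeeping; a seeded sketch follows (the helper name is an assumption). At the ≥ 95% accuracy target some classes may have fewer than five misclassified test images, so the function returns whatever is available. The chosen indices would then be passed to pytorch-grad-cam's GradCAMPlusPlus.

```python
import random
from collections import defaultdict


def select_gradcam_samples(y_true, y_pred, n_correct=5, n_wrong=5, seed=42):
    """Pick up to n_correct correct and n_wrong misclassified indices per class."""
    rng = random.Random(seed)
    pools = defaultdict(lambda: ([], []))  # class -> (correct indices, wrong indices)
    for i, (t, p) in enumerate(zip(y_true, y_pred)):
        pools[t][0 if t == p else 1].append(i)
    chosen = {}
    for cls in sorted(pools):
        correct, wrong = pools[cls]
        chosen[cls] = {
            "correct": sorted(rng.sample(correct, min(n_correct, len(correct)))),
            "wrong": sorted(rng.sample(wrong, min(n_wrong, len(wrong)))),
        }
    return chosen
```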
WP2.7 Acceptance Criteria — WP2
- All 6 ablation experiments complete and metrics logged. E6 (full model) must achieve overall accuracy ≥ 95% on the clean test set.
- CEDB benchmark runs without errors; outputs a structured CSV (distortion, severity, class, accuracy, F1).
- Grad-CAM++ overlays generated for all sampled images; saved as publication-quality PNGs.
- Training logs (loss, accuracy, val-F1) exported in CSV and visualised as plots.
- McNemar statistical test comparing E6 vs each baseline; p-values reported.
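scipy.stats has no dedicated McNemar function, so the sketch below computes the exact (binomial) McNemar p-value from two models' per-image correctness on the same test set; the function name is an assumption. scipy.stats.binomtest(b, n=b + c, p=0.5) should give the same p-value, and statsmodels provides statsmodels.stats.contingency_tables.mcnemar.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired per-image correctness.

    Only discordant pairs matter: b = A right & B wrong, c = A wrong & B right.
    Under H0, b ~ Binomial(b + c, 0.5). Returns (b, c, p_value).
    """
    b = sum(ok_a and not ok_b for ok_a, ok_b in zip(correct_a, correct_b))
    c = sum(ok_b and not ok_a for ok_a, ok_b in zip(correct_a, correct_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # models never disagree
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)
```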
WP3 — Phase 2 Extensions: Contrastive Pre-Training and DGSM
Applies to: Phase 2 only (built on top of WP1 and WP2 codebase)
WP3.1 DIP-Guided Saliency Module (DGSM)
Implement as a non-trainable OpenCV-based preprocessing module:
- Convert input image to LAB and HSV colour spaces.
- Apply adaptive thresholding on the A-channel (LAB) to isolate yellow/brown lesion regions; on the S-channel (HSV) to capture desaturated necrotic zones.
- Apply morphological closing (7×7 kernel) + connected component filtering (remove components < 50px).
- Gaussian blur the binary mask (sigma = 15px) to produce a soft saliency map M_saliency in [0, 1].
- Concatenate M_saliency as a 4th channel to the RGB input before the encoder.
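A NumPy sketch of the last two DGSM steps (soft saliency map and 4-channel stacking), assuming the binary mask from the thresholding and morphology steps is already available; all names are illustrative. In OpenCV the blur would be cv2.GaussianBlur(mask, (0, 0), 15); the sketch zero-pads at the border whereas OpenCV reflects by default, so edge values differ slightly. A 4-channel input also needs an encoder stem that accepts 4 channels, for example timm.create_model("efficientnet_b4", pretrained=True, in_chans=4).

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    radius = int(3 * sigma)  # +/- 3 sigma holds >99% of the Gaussian mass
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def soft_saliency(binary_mask, sigma=15.0):
    """Separable Gaussian blur of a {0, 1} mask -> M_saliency in [0, 1]."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    m = np.pad(binary_mask.astype(np.float64), r, mode="constant")
    m = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, m)  # rows
    m = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, m)  # columns
    return np.clip(m, 0.0, 1.0).astype(np.float32)


def add_saliency_channel(rgb, saliency):
    """Stack M_saliency as a 4th channel: (H, W, 3) + (H, W) -> (H, W, 4)."""
    assert rgb.shape[:2] == saliency.shape
    return np.concatenate([rgb.astype(np.float32), saliency[..., None]], axis=-1)
```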
Validate DGSM on 50 image-mask pairs from the segmentation dataset: compute IoU between DGSM output and ground truth mask. Target: mean IoU ≥ 0.45 as a prior (not the final prediction).
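The IoU check can be sketched as below. Thresholding the soft map at 0.5 and scoring two empty masks as IoU = 1 are assumptions; scoring the pre-blur binary mask instead is equally consistent with the text.

```python
import numpy as np


def binary_iou(pred, gt, threshold=0.5):
    """IoU between a (soft or binary) predicted mask and a binary GT mask."""
    p = np.asarray(pred) >= threshold
    g = np.asarray(gt).astype(bool)
    union = np.logical_or(p, g).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return float(np.logical_and(p, g).sum() / union)


def mean_iou(pairs, threshold=0.5):
    """Mean IoU over (dgsm_output, gt_mask) pairs, e.g. the 50 validation pairs."""
    scores = [binary_iou(p, g, threshold) for p, g in pairs]
    return sum(scores) / len(scores)
```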
WP3.2 Augmentation-Consistent Contrastive Pre-Training
Implement SimCLR pre-training using the lightly-train library:
- Backbone: EfficientNet-B4 (timm, ImageNet init). Projection head: MLP (1792 → 512 → 256), where 1792 is the pooled feature width of EfficientNet-B4. Temperature tau = 0.07.
- Positive pairs: two independently sampled DIPAug transforms applied to the same image (no class labels used).
- Training: 50 epochs, batch = 256, AdamW LR = 3e-4, cosine decay. Use all available training images.
- After pre-training, evaluate representation quality via a linear probe: freeze the backbone, train a single linear classifier, and report accuracy.
- Save pre-trained backbone weights as a checkpoint for use in WP4.
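For reference, a NumPy sketch of the NT-Xent objective that SimCLR optimises, with tau = 0.07 as specified. lightly-train ships its own implementation, so this is only meant for checking expectations in unit tests, for example that the loss falls as the two DIPAug views of an image align.

```python
import numpy as np


def nt_xent(z1, z2, tau=0.07):
    """SimCLR NT-Xent loss for N positive pairs (z1[i], z2[i]).

    Each of the 2N embeddings is contrasted against the other 2N - 1;
    its positive is the other DIPAug view of the same image.
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z1.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])  # index of each positive
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))  # log-sum-exp
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))
```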
WP3.3 Acceptance Criteria — WP3
- DGSM module produces saliency maps for all training images without errors; mean IoU ≥ 0.45 on the validation sample.
- SimCLR pre-training completes; linear probe accuracy ≥ 60% (confirming an informative representation).
- Pre-trained checkpoint saved and loadable for WP4.
WP4 — Phase 2 Model: DIPAug-SeverNet (Severity Scoring)
Applies to: Phase 2 only
WP4.1 Dataset Preparation for Segmentation
- Primary: Obtain PDSNets wheat stripe rust field dataset (Shi et al., Agriculture 2025) which includes pixel-level masks.
- Secondary (weak labels): Apply DGSM + SAM (Meta Segment Anything Model, vit_b checkpoint) to generate pseudo-masks for the classification-only Wheat Plant Disease dataset. Use DGSM bounding boxes as SAM spatial prompts. Save the refined pseudo-masks.
- Severity Grade Assignment: Compute severity % = (lesion pixels / total leaf pixels) × 100 from the ground truth or pseudo-mask. Map to Grade 0–5 using the thresholds 0%, >0–10%, >10–25%, >25–50%, >50–75%, and >75%.
- Flag images within 3 percentage points of a grade boundary as borderline and log them for potential expert review.
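A sketch of the severity computation, grading, and borderline flag, assuming the grade intervals are read as 0% | (0, 10] | (10, 25] | (25, 50] | (50, 75] | > 75%, that the 3% margin is in absolute percentage points, and that the 0% boundary is not checked for borderline cases. All names are illustrative.

```python
import numpy as np

# Upper bounds (in %) of grades 1-4; exactly 0% is grade 0 and > 75% is grade 5.
GRADE_UPPER = (10.0, 25.0, 50.0, 75.0)


def severity_percent(lesion_mask, leaf_mask):
    """Severity % = lesion pixels / total leaf pixels * 100."""
    leaf = np.asarray(leaf_mask).astype(bool)
    lesion = np.asarray(lesion_mask).astype(bool) & leaf  # count lesions on the leaf only
    return 100.0 * lesion.sum() / max(int(leaf.sum()), 1)


def severity_grade(pct):
    if pct <= 0.0:
        return 0
    for grade, upper in enumerate(GRADE_UPPER, start=1):
        if pct <= upper:
            return grade
    return 5


def is_borderline(pct, margin=3.0, boundaries=GRADE_UPPER):
    """True if pct lies within `margin` percentage points of a grade boundary."""
    return any(abs(pct - b) <= margin for b in boundaries)
```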
WP4.2 Architecture — DIPAug-SeverNet
- Shared Encoder: EfficientNet-B4, initialised from the SimCLR checkpoint (WP3.2). Fine-tune upper 40% of layers; freeze lower 60%.
- Segmentation Decoder: U-Net-style decoder with skip connections from 4 encoder stages. Final output: binary lesion mask at original resolution (H×W). Loss: 0.5 × BCE + 0.5 × Dice.
- Severity Head: Lesion-weighted feature = GlobalAvgPool(F_enc × M_pred) + GlobalAvgPool(F_enc). Feed it to a 3-layer MLP (1024→256→64) with two outputs: (i) a sigmoid output for continuous severity scores in [0, 1] (MSE loss), and (ii) an ordinal output with 5 thresholds (6 grades) using CORN ordinal regression (CORN loss).
- Uncertainty-Weighted Multi-Task Loss (Kendall et al.): L_total = Σ_i [ L_i / (2σ_i²) + log σ_i ], where σ_i is the learnable uncertainty of task i for the 4 tasks: segmentation, classification, severity regression, and severity ordinal. In practice, learn s_i = log σ_i² rather than σ_i directly, for numerical stability.
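NumPy sketches of the segmentation loss and the uncertainty-weighted total. The log-variance form follows from log σ_i = ½ s_i with s_i = log σ_i², and is a common way to keep the effective weights positive. In PyTorch the s_i would be an nn.Parameter of shape (4,) added to the optimiser; the Dice smoothing constant and function names are assumptions.

```python
import numpy as np


def bce_dice_loss(prob, target, eps=1e-7, smooth=1.0):
    """0.5 * BCE + 0.5 * soft Dice loss on a predicted lesion probability map."""
    p = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    bce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    dice = 1.0 - (2.0 * (p * target).sum() + smooth) / (p.sum() + target.sum() + smooth)
    return 0.5 * bce + 0.5 * dice


def uncertainty_weighted_total(task_losses, log_vars):
    """Kendall-style weighting with s_i = log(sigma_i^2):

    L_i / (2 sigma_i^2) + log(sigma_i) == 0.5 * exp(-s_i) * L_i + 0.5 * s_i
    """
    return sum(0.5 * np.exp(-s) * l + 0.5 * s for l, s in zip(task_losses, log_vars))
```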
WP4.3 Training Protocol
- Stage 1 (Warm-up, 20 epochs): Activate segmentation loss only. Freeze severity head. Allow encoder fine-tuning from epoch 10 onward.
- Stage 2 (Joint MTL, 80 epochs): Activate all losses with uncertainty weighting. Same hyperparameters as WP2.3.
- Log all 4 individual task losses and σ_i values at every epoch.
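The two-stage protocol can be expressed as a small per-epoch lookup that the training loop consults. Epochs are assumed to be 0-based, "encoder fine-tuning" is taken to mean unfreezing the upper 40% of encoder layers, and the dictionary keys are hypothetical.

```python
WARMUP_EPOCHS, ENCODER_UNFREEZE_EPOCH = 20, 10
TASKS = ("segmentation", "classification", "severity_regression", "severity_ordinal")


def stage_config(epoch):
    """Active losses and trainable parts at a 0-based epoch index."""
    if epoch < WARMUP_EPOCHS:  # Stage 1: segmentation warm-up
        return {
            "active_losses": ("segmentation",),
            "train_severity_head": False,
            "train_encoder_upper": epoch >= ENCODER_UNFREEZE_EPOCH,
            "uncertainty_weighting": False,
        }
    return {  # Stage 2: joint multi-task learning
        "active_losses": TASKS,
        "train_severity_head": True,
        "train_encoder_upper": True,  # lower 60% of the encoder stays frozen throughout
        "uncertainty_weighting": True,
    }
```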
WP4.4 Ablation Study — 5 Configurations
| Exp | Description | DGSM | SimCLR | Seg Decoder | MTL Loss |
|---|---|---|---|---|---|
| S1 | EfficientNet-B4 + severity head only (no seg) | — | — | — | CE only |
| S2 | S1 + Segmentation Decoder (no DGSM, no SimCLR) | — | — | Yes | BCE+Dice |
| S3 | S2 + DGSM saliency input | Yes | — | Yes | BCE+Dice |
| S4 | S3 + SimCLR pre-training (no uncertainty MTL) | Yes | Yes | Yes | Fixed weights |
| S5 | DIPAug-SeverNet FULL — all components | Yes | Yes | Yes | Uncertainty MTL |
WP4.5 Acceptance Criteria — WP4
- S5 (full model) achieves: segmentation mIoU ≥ 0.60, severity MAE ≤ 12%, severity QWK ≥ 0.70, classification F1 ≥ 0.85.
- All σ_i uncertainty weights converge (logged curves saved).
- Pseudo-mask generation pipeline completes for all images, and pseudo-masks pass visual QC (random 30-image sample).
WP5 — Evaluation, Visualisation, and Results Package
Applies to: Both Phases
WP5.1 Metrics to Compute and Report
Phase 1 Metrics:
- Overall accuracy, macro Precision / Recall / F1, Cohen's Kappa, per-class AUC-ROC.
- Confusion matrix (normalised, publication-quality heatmap).
- CEDB robustness CSV + radar chart + per-distortion degradation line plots.
- McNemar test p-values vs all baselines.
- Inference time (ms/image, averaged over 100 runs) and model parameter count.
Phase 2 Metrics:
- All Paper 1 metrics (for the classification head of DIPAug-SeverNet).
- Segmentation: IoU, Dice, Pixel Accuracy, mIoU, Boundary F1.
- Severity regression: Pearson r, MAE, RMSE between predicted and ground truth severity %.
- Severity ordinal: Quadratic Weighted Kappa (QWK), per-grade accuracy.
- CEDB severity degradation for each metric under all 8 distortion types.
- Relative Performance Retention (RPR) = metric_distorted / metric_clean, computed per distortion type.
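Cohen's kappa, QWK, and RPR reduce to a few lines of NumPy, sketched here with illustrative names. sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic") should give the same QWK. The RPR helper assumes a higher-is-better metric; for MAE and RMSE a ratio above 1 indicates degradation, so that convention should be stated alongside those values.

```python
import numpy as np


def weighted_kappa(y_true, y_pred, n_classes, weights="quadratic"):
    """Cohen's kappa; weights='quadratic' gives QWK, weights=None plain kappa."""
    o = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        o[t, p] += 1  # observed confusion matrix
    i, j = np.indices((n_classes, n_classes))
    if weights == "quadratic":
        w = (i - j) ** 2 / (n_classes - 1) ** 2
    else:
        w = (i != j).astype(float)
    e = np.outer(o.sum(axis=1), o.sum(axis=0)) / o.sum()  # expected under chance
    denom = (w * e).sum()
    return 1.0 - (w * o).sum() / denom if denom > 0 else float("nan")


def rpr(metric_distorted, metric_clean):
    """Relative Performance Retention for a higher-is-better metric."""
    return metric_distorted / metric_clean
```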
WP5.2 Figures to Produce (Publication-Quality, 300 dpi)
- DIPAug visualisation grid: 8 rows × 5 columns (augmentation type × intensity level), 3 disease classes shown.
- AAS sigmoid schedule plot (D vs epoch).
- DIPAug-Net architecture diagram (clear block diagram, export as PNG/SVG).
- DIPAug-SeverNet architecture diagram (encoder + dual decoder + severity head).
- DGSM pipeline figure: RGB input → LAB/HSV processing → saliency map → 4-channel input.
- Spatially consistent augmentation pair: [image | mask | augmented image | augmented mask].
- Confusion matrices for Paper 1 (all 6 ablations + baselines).
- CEDB radar chart and degradation curves (Paper 1 and Paper 2).
- Grad-CAM++ overlays — 50 images for Paper 1, 30 images for Paper 2.
- Severity grade visualisation grid: 5 disease classes × 6 severity grades, with predicted vs ground truth overlaid.
- Severity score scatter plot: predicted vs ground truth (r² annotated).
- SimCLR linear probe accuracy learning curve.
- σ_i uncertainty weight convergence plots (WP4 training).
WP5.3 Results Report
Deliver a structured Results Report (PDF or DOCX) with all quantitative results. The report must be organised as:
- experiment configuration tables
- main results tables
- all figures listed in WP5.2
- ablation analysis commentary (2–3 sentences per ablation row)
- CEDB analysis section
- failure case analysis — minimum 5 examples with explanation per paper
This report is the primary input to manuscript drafting.
WP6 — Deliverables
WP6.1 Repository Structure
- dipaugnet/ — Paper 1 model, training, and evaluation scripts
- dipaugsevernet/ — Paper 2 model, training, and evaluation scripts
- dipauglib/ — Shared DIPAug augmentation modules (WP1)
- notebooks/ — Jupyter notebooks for QC visualisations and result analysis
- configs/ — YAML configuration files for all 11 experiments (E1–E6, S1–S5)
- results/ — CSV files, metric summaries, and saved model checkpoints
- figures/ — All publication-quality figures (PNG + SVG where applicable)
- README.md — Full setup and execution guide
- requirements.txt — Pinned dependency versions
- RESULTS_REPORT.docx — Structured results report (WP5.3)
WP6.2 Code Quality Requirements
- All modules have docstrings (Google style).
- Unit tests for all 8 DIPAug transforms (pytest), covering edge cases (all-black image, single-pixel image, mask consistency).
- Config-driven training: all hyperparameters loaded from YAML, not hardcoded.
- Reproducibility: fix all random seeds in PyTorch, NumPy, and Python random; document CUDA version and GPU used.
- No GPU-specific hardcoded paths; code must run on any CUDA-enabled machine given the same dataset path.
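A minimal seeding helper covering the three RNGs named above; the function name is an assumption. The cuDNN flags trade some speed for determinism, and some CUDA kernels can remain non-deterministic even with them set. torch.version.cuda and torch.cuda.get_device_name(0) can supply the CUDA version and GPU name for the README.

```python
import os
import random

import numpy as np


def seed_everything(seed=42):
    """Fix Python, NumPy and (if installed) PyTorch RNGs."""
    # Only affects Python interpreters started after this call (e.g. spawned
    # worker processes), not hash randomisation in the current process.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed: Python and NumPy are still seeded
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```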
WP6.3 Acceptance Criteria — WP6
- Repository passes a fresh-clone test: setup → data download → single training run → evaluation → figure export, all without manual intervention beyond dataset path configuration.
- All 11 training experiments are reproducible from the provided YAML configs.
- pytest unit tests pass with ≥ 95% test coverage on DIPAug modules.
3. Baseline Models to Reproduce
The freelancer must train and evaluate the following baselines on the same dataset splits and report results in the same metric tables as the proposed models. Source code for baselines may be adapted from official repositories; full attribution required.
3.1 Paper 1 Baselines
| Model | Reference | Architecture | Augmentation |
|---|---|---|---|
| ResNet-50 + Generic Aug | He et al., CVPR 2016 | ResNet-50 | Flip / rotate / crop |
| EfficientNet-B3 (no aug) | Tan & Le, ICML 2019 | EfficientNet-B3 | None |
| EfficientNet-B3 + Generic Aug | Tan & Le, ICML 2019 | EfficientNet-B3 | Standard geometric |
| SC-ConvNeXt | Dong et al., Sci Rep 2024 | SimCLR + CBAM + ConvNeXt | Generic |
| GLNet | Li et al., Front. Plant Sci. 2024 | Global-Local CNN | Generic |
| ViT Multi-level Contrast | Khalid et al., Sci Rep 2025 | Vision Transformer | Standard |
| CropNet | Bhola, AgriEngineering 2024 | Shallow CNN + Transfer | Standard |
3.2 Paper 2 Baselines
| Model | Reference | Task | Notes |
|---|---|---|---|
| U-Net (standard) | Ronneberger et al., 2015 | Segmentation | ResNet-34 encoder |
| Attention U-Net | Oktay et al., 2018 | Segmentation | Attention gates |
| DeepLabV3+ (ResNet-50) | Chen et al., ECCV 2018 | Segmentation | ASPP module |
| PDSNets (LinkNet-ResNet18) | Shi et al., Agriculture 2025 | Field disease seg. | Closest field SOTA |
| DIPAug-Net + Ordinal Head | Ablation (this work) | Class + Severity | No segmentation |
| SegLearner | Upadhyay & Gupta, 2025 | Disease severity | Severity scoring baseline |
4. Deliverables and Milestones
| MS | Week | Deliverable | Acceptance Criteria |
|---|---|---|---|
| M1 | Wk 2 | Dataset setup complete. DIPAug module (all 8 transforms) implemented and unit-tested. Visual QC grid submitted. | pytest passes; QC grid approved by project owner. |
| M2 | Wk 4 | AAS, CIAA, and CEDB modules implemented. DIPAug pipeline benchmarked (≥120 img/s). | Speed benchmark log submitted; AAS curve verified. |
| M3 | Wk 6 | DIPAug-Net architecture implemented. Baseline models (Paper 1) training complete. | Training logs for all baselines submitted. |
| M4 | Wk 9 | All 6 ablation experiments (E1–E6) trained and evaluated. Paper 1 results table and figures ready. | Main results table, confusion matrices, Grad-CAM overlays submitted. |
| M5 | Wk 10 | DGSM validated. SimCLR pre-training complete. Pseudo-mask generation done. | DGSM IoU report; SimCLR linear probe accuracy reported. |
| M6 | Wk 12 | DIPAug-SeverNet trained. All 5 ablation experiments (S1–S5) and Paper 2 baselines complete. | Full metric suite for Paper 2 submitted. |
| M7 | Wk 14 | All figures, Results Report, and complete GitHub repository submitted. | Fresh-clone test passes. All acceptance criteria met. |
5. Technical Requirements
5.1 Mandatory Stack
| Category | Required Tools / Libraries |
|---|---|
| Language | Python 3.10+ |
| Deep Learning | PyTorch 2.x, torchvision, timm |
| Image Processing | OpenCV 4.x, scikit-image, Pillow |
| Augmentation | Albumentations (DualTransform for image+mask consistency) |
| SSL Pre-training | lightly-train (SimCLR implementation) |
| Segmentation | segmentation-models-pytorch |
| Weak Labelling | segment-anything (SAM vit_b checkpoint) |
| Experiment Tracking | Weights & Biases (WandB) — project owner will provide API key |
| Explainability | pytorch-grad-cam |
| Ordinal Regression | CORN loss (custom PyTorch implementation or coral-pytorch) |
| Statistical Testing | scipy.stats (exact McNemar test via binomtest on discordant pairs) or statsmodels (mcnemar) |
| Version Control | Git (private GitHub repo, shared with project owner) |
| Config Management | PyYAML or Hydra |
| Testing | pytest (≥ 95% coverage on DIPAug modules) |
5.2 Hardware Requirement
The freelancer is expected to have access to a GPU with at least 16 GB VRAM (e.g., NVIDIA RTX 3090, A5000, or equivalent cloud instance). Paper 2 training (DIPAug-SeverNet with segmentation decoder) requires at least 24 GB VRAM or gradient checkpointing enabled. Cloud credits on Google Colab Pro+, Lambda Labs, or Vast.ai are acceptable alternatives. Hardware costs are the freelancer's responsibility unless separately agreed.
5.3 Freelancer Skill Requirements
- Proficiency in PyTorch — custom loss functions, multi-task training loops, mixed-precision training.
- Experience with computer vision — image segmentation, classification, CNN and Transformer architectures.
- Familiarity with image processing — OpenCV, colour space transforms, PSF-based blurring.
- Experience with self-supervised learning (SimCLR or equivalent) is strongly preferred.
- Ability to produce publication-quality figures (matplotlib, seaborn, or equivalent).
- Clean coding practices — docstrings, modular design, config-driven experiments.
- Applicants must share at least one relevant GitHub repository or project portfolio link.
6. Out of Scope (Not the Freelancer's Responsibility)
The following tasks are handled by the project owner and should NOT be included in the freelancer's bid or timeline.
- Academic manuscript writing, formatting, or submission.
- Literature review and related work section drafting.
- Journal selection and correspondence with editors.
- Expert agronomist review of borderline severity labels.
- Acquiring additional field image datasets beyond those specified (PDSNets + Wheat Plant Disease).
- Deploying the model to any mobile or web application.
- Providing GPU cloud credits (unless separately agreed).
7. Communication and Reporting Protocol
- Weekly progress update (every Monday): brief written summary of completed tasks, blockers, and planned tasks for the coming week.
- Milestone submissions: each milestone deliverable is submitted via GitHub commit + a short Loom screen recording (5–10 min) walking through key results and any design decisions made.
- All questions or clarification requests must be raised within 24 hours of encountering a blocker; do not spend more than 4 hours stuck on a single issue before reaching out.
- The project owner will review milestone submissions within 3 business days and provide written feedback.
- Revision rounds: up to 2 revision rounds per milestone are included within scope; additional revisions may incur extra cost.
8. Application Instructions
To apply for this task, please submit the following:
- A brief cover message (max 200 words) explaining your relevant experience with PyTorch, image segmentation, and data augmentation.
- A link to at least one relevant GitHub repository demonstrating your code quality and PyTorch experience.
- Your estimated timeline for each milestone (M1–M7) and your total project quote.
- Confirmation of GPU availability and specifications.
- Answer the screening question: "Describe in 2–3 sentences how you would implement a physics-based motion blur augmentation using OpenCV that is also compatible with Albumentations DualTransform for mask consistency."
Applications that do not include all five items above will not be reviewed.