The recently released Xiaomi 17 Ultra Leica Special Edition features an exclusive “Leica Instant” imaging mode.

  1. It’s a style model, not a filter. “Leica Instant” is not a simple filter overlay as on ordinary phones; it reconstructs the style model at the level of the image-processing pipeline, the result of end-to-end model training on samples from real film and real cameras.

  2. It simulates the image quality of real classic Leica cameras. “Leica Instant” offers two styles:

  • Leica M9 CCD simulation mode: reproduces the color and tone of images from the Leica M9’s CCD sensor, bringing rich color layering and soft, detailed texture rather than a mere color adjustment.
  • Leica M3 + MONOPAN 50 black-and-white film mode: simulates the black-and-white tonality of the classic Leica M3 paired with MONOPAN 50 film, with realistic brightness transitions and grain, rather than the simple desaturation of an ordinary black-and-white filter.
  3. A deep computational-photography model implements the style transfer. Color and tone are not preset filters but on-device models trained on real Leica M9 and M3 sample data, which can be understood as “style transfer models”. The model maps the input RAW image to a Leica-style output while minimizing algorithmic flavor and keeping textures natural.

The author has a Fujifilm X-S20 and has shot a large number of photos with the Fuji CC or NC film simulations, keeping both the JPG and RAF files. These RAF files and their corresponding target-style JPGs are ideal training data for a “Fuji Instant” model. This closely mirrors how Xiaomi trained its “Leica Instant” model on Leica M9/M3 samples: learn a photographic style mapping, then feed RAF or other raw-format photos through the model to output JPGs in the Fuji filter style. We first run a feasibility check based on the Pix2Pix project.

Here is the training route recommended by GPT:

  • Style transfer / image-to-image networks: e.g. GANs (CycleGAN, Pix2Pix) or variants (StyleGAN, VAE/GAN hybrids).
    These models can learn the mapping from RAW space to the target style space.
  • Conditional generative model: condition on the RAW input to generate images in the Fujifilm JPG style.
  • Better modeling of light, color, and film grain.
    Fujifilm styles often involve characteristic mappings of highlight/shadow colors, which may require custom losses (perceptual loss, color-histogram loss, etc.) to preserve style details.
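As a concrete sketch of the color-histogram idea mentioned above, here is a minimal NumPy version of such a loss. It is non-differentiable, so it only illustrates the concept; a real training loss would use soft (differentiable) binning in PyTorch, and the 32-bin count here is an arbitrary choice:

```python
import numpy as np

def color_hist_distance(img_a, img_b, bins=32):
    """Mean L1 distance between per-channel color histograms of two
    float images in [0, 1]. A non-differentiable sketch of the
    'color histogram loss' idea."""
    dist = 0.0
    for c in range(3):
        ha, _ = np.histogram(img_a[..., c], bins=bins, range=(0.0, 1.0), density=True)
        hb, _ = np.histogram(img_b[..., c], bins=bins, range=(0.0, 1.0), density=True)
        dist += np.abs(ha - hb).mean()
    return dist / 3.0
```

Identical images give distance 0, while a brightened copy of an image does not, which is exactly the behavior a style loss needs.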

Because Pix2Pix is a strictly paired (one-to-one) image-to-image model, data like ours, where each RAF has an in-camera JPG with exactly the same name, is its ideal use case. So we started with Pix2Pix.
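Since paired training lives or dies by correct name matching, a small helper like the following (a hypothetical utility, assuming `.RAF`/`.jpg` extensions) can list unmatched files before preprocessing:

```python
from pathlib import Path

def check_pairs(raf_dir, jpg_dir):
    """Return (paired, raf_only, jpg_only) stem sets, so any
    mismatched file names can be fixed before preprocessing.
    Assumes .RAF and .jpg extensions; adjust for .DNG etc."""
    raf = {p.stem for p in Path(raf_dir).glob("*.RAF")}
    jpg = {p.stem for p in Path(jpg_dir).glob("*.jpg")}
    return raf & jpg, raf - jpg, jpg - raf
```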

The approximate amount of data required, according to GPT:

Level 0: validating feasibility

200-300 pairs of RAW+JPG

Can achieve:

  • The model clearly learns the Fuji style
  • Color direction, contrast, and overall look are similar
  • Local artifacts may appear
  • Highlights/shadows may be unstable

Suitable for a first run through the pipeline.

Fewer than 150 pairs is not recommended: Pix2Pix may overfit to a “template color”.

Level 1: Comparable and usable

800-1500 pairs (recommended)

Can achieve:

  • Very close to the in-camera JPG
  • Stable across different scenes (indoor/outdoor/overcast)
  • Grain and color transitions begin to resemble Fuji’s
  • On the training set, the difference between the converted RAW and the in-camera JPG is acceptable to the naked eye

Let’s start with the simplest level. Note that raw or raf files cannot be fed to the model directly; they must be preprocessed to convert the RAW into a unified “photographic input space”.

Process:

  1. Read the RAW data with rawpy
  2. Apply no tone curve and no gamma
  3. Apply white balance (camera or fixed)
  4. Convert to linear RGB
  5. Normalize to [0, 1]

Otherwise the model learns a chaotic mapping across different exposures and cameras.
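Steps 3-5 can be sketched in NumPy on already-demosaiced 16-bit data. The `wb_gains` values here are illustrative, not real camera coefficients; in the actual script, rawpy’s postprocess handles the earlier steps:

```python
import numpy as np

def to_input_space(raw_rgb_16bit, wb_gains=(2.0, 1.0, 1.5)):
    """Steps 3-5: apply per-channel white-balance gains in linear
    space, then normalize to [0, 1]. The wb_gains values are
    illustrative, not real camera coefficients."""
    img = raw_rgb_16bit.astype(np.float32)
    img = img * np.array(wb_gains, dtype=np.float32)  # white balance, still linear
    img = img / img.max()                             # normalize to [0, 1]
    return np.clip(img, 0.0, 1.0)
```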

Overall process:

Fuji RAW (.RAF / .DNG)
        ↓ rawpy
Linear RGB (float32)
        ↓ resize / crop
256x256 tensor
        ↓ Pix2Pix Generator
Fake Fuji JPG
  1. First, create a fuji_instant folder, with jpg and raf subfolders to hold the original dataset.

Note:

  • The raf and jpg file names must match exactly
  • Level 0 performs no data augmentation
  2. Then prepare the script preprocess_level0.py for preprocessing the data.
import rawpy
import numpy as np
from PIL import Image
from pathlib import Path
from tqdm import tqdm

RAW_DIR = Path("/mnt/smb_share/fuji_instant/raf")
JPG_DIR = Path("/mnt/smb_share/fuji_instant/jpg")

OUT_ROOT = Path("pytorch-CycleGAN-and-pix2pix/datasets/fuji_level0")
TRAIN_A = OUT_ROOT / "train/A"
TRAIN_B = OUT_ROOT / "train/B"

TRAIN_A.mkdir(parents=True, exist_ok=True)
TRAIN_B.mkdir(parents=True, exist_ok=True)

SIZE = 256

def process_raw(path):
    # Context manager guarantees the RAW file handle is closed.
    with rawpy.imread(str(path)) as raw:
        # Note: postprocess applies its default tone curve here;
        # pass gamma=(1, 1) if truly linear output is wanted.
        rgb = raw.postprocess(
            use_camera_wb=True,
            no_auto_bright=True,
            output_bps=16
        )
    rgb = rgb.astype(np.float32) / 65535.0
    img = Image.fromarray((rgb * 255).astype(np.uint8))
    return img.resize((SIZE, SIZE), Image.BICUBIC)

def process_jpg(path):
    img = Image.open(path).convert("RGB")
    return img.resize((SIZE, SIZE), Image.BICUBIC)

for raw_path in tqdm(sorted(RAW_DIR.glob("*"))):
    name = raw_path.stem
    jpg_path = JPG_DIR / f"{name}.jpg"
    if not jpg_path.exists():
        continue

    process_raw(raw_path).save(TRAIN_A / f"{name}.png")
    process_jpg(jpg_path).save(TRAIN_B / f"{name}.png")

print("✅ Level 0 preprocess done")

python3 preprocess_level0.py

The preprocessed data is written to datasets/fuji_level0/train/A and train/B.
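Before training, it is worth confirming that train/A and train/B really line up. A quick check (a hypothetical helper, not part of the Pix2Pix repo) could look like:

```python
from pathlib import Path

def verify_level0(root):
    """Check that train/A and train/B contain exactly the same
    file names, i.e. every RAW input has its Fuji JPG target."""
    a = {p.name for p in (Path(root) / "train" / "A").glob("*.png")}
    b = {p.name for p in (Path(root) / "train" / "B").glob("*.png")}
    if a != b:
        raise ValueError(f"A/B mismatch: {sorted(a ^ b)}")
    return len(a)
```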

  3. Prepare Pix2Pix and install the dependencies
git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git

cd pytorch-CycleGAN-and-pix2pix

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install numpy pillow opencv-python visdom dominate tqdm rawpy imageio
  4. Then start Level 0 training
python train.py \
  --dataroot datasets/fuji_level0 \
  --name fuji_pix2pix_level0 \
  --model pix2pix \
  --direction AtoB \
  --netG unet_256 \
  --batch_size 1 \
  --gpu_ids 0 \
  --display_freq 200 \
  --print_freq 200 \
  --save_epoch_freq 10 \
  --n_epochs 100 \
  --n_epochs_decay 50

You will see three kinds of images in the output path:

File              Meaning
xxx_real_A.png    RAW input
xxx_fake_B.png    Model output
xxx_real_B.png    Fuji in-camera JPG

Result Sample
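To go beyond eyeballing fake_B against real_B, a plain PSNR gives a rough numeric score (a generic metric computed here from scratch, not something the Pix2Pix repo reports):

```python
import numpy as np

def psnr(img_a, img_b):
    """Peak signal-to-noise ratio (dB) between two uint8 images.
    Higher means closer; identical images give infinity."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

Averaging it over all fake_B / real_B pairs of an epoch gives a single number for comparing checkpoints.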

The following is GPT’s assessment of the training logs:

This is a normal Pix2Pix Level 0 training log; the core Level 0 process completed successfully.

1. Conclusion

Current training status: Normal, worth continuing to run

  • No mode collapse
  • No D/G crashes
  • L1 fluctuates within a reasonable range
  • GAN loss is not one-sided

2. Interpreting the first round of numbers:

(epoch: 1, iters: 200)
G_GAN: 2.306
G_L1: 23.450
D_real: 0.069
D_fake: 0.239
  • G_L1 (important)
  • Early stage: 20-50 is normal
  • Now: fluctuating between 7 and 26

Very healthy: the model is rapidly learning the main color mapping of “RAW → Fuji JPG”.

  • G_GAN
  • Normal range: 0.5~3.0
  • Now: 0.6~2.6

The GAN loss is oscillating, which is expected for an adversarial term; this is not a regression problem.

  • D_real / D_fake (discriminator state)

Now:

  • Sometimes D_real ≈ 0.06 (D is very confident)
  • Sometimes D_real > 1 (D has been fooled)
  • D_fake swings back and forth between 0 and 1

This is the ideal Pix2Pix state. Explanation:

  • The discriminator and generator are in a tug-of-war
  • Neither side completely dominates
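To track these numbers over a whole run rather than a single snapshot, the printed loss lines can be parsed with a few lines of Python (the format is taken from the log excerpt above; adjust the regex if your loss_log.txt differs):

```python
import re

def parse_loss_line(line):
    """Parse one line of pix2pix's printed losses into a dict,
    e.g. '(epoch: 1, iters: 200) G_GAN: 2.306 G_L1: 23.450 ...'."""
    return {k: float(v) for k, v in re.findall(r"(\w+): ([\d.]+)", line)}
```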

By eye, the author found that at epoch 20 the colors are already close, but the output has a mosaic-like, unsharp quality, and epoch 150 still shows the same issue. This is not a training failure, nor a problem with the data, but a structural flaw of Pix2Pix on the RAW-to-JPG task.

1. The mosaic effect

It has these characteristics:

  • Blocky texture across the frame
  • Parts of the image are cut into “8 × 8 / 16 × 16” blocks
  • Edges are not sharp
  • A “smeared + blocky” texture

This is not a low-resolution issue but a known problem with Pix2Pix (U-Net + PatchGAN) on photographic tasks. PatchGAN is a patch-level discriminator: it only looks at 70 × 70 patches, so it only cares about local similarity, not global continuity. The model therefore learns that “every patch looks like Fuji” without caring about continuity between patches (the overall look). At the same time, RAW to JPG is a continuous tone-mapping problem, while Pix2Pix excels at:

  • Semantic translation (horse ↔ zebra)
  • Tasks with obvious edges

But what is needed in photography is:

  • Smooth brightness transition
  • Continuous color curve
  • Natural preservation of high-frequency textures

Pix2Pix is not good at this.
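The 70 × 70 patch size mentioned above is not arbitrary: it is the receptive field of pix2pix’s default PatchGAN, a stack of five 4 × 4 convolutions with strides 2, 2, 2, 1, 1. It can be verified by walking the layers backward from a single output pixel:

```python
def receptive_field(layers):
    """Receptive field of a conv stack, given (kernel, stride)
    pairs, computed by walking backward from one output pixel."""
    rf = 1
    for kernel, stride in reversed(layers):
        rf = rf * stride + (kernel - stride)
    return rf

# Default pix2pix PatchGAN: five 4x4 convs with strides 2, 2, 2, 1, 1.
print(receptive_field([(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]))  # → 70
```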

  • 256 × 256 plus resizing itself amplifies the issue
  • The resize uses bicubic interpolation
  • The GAN then learns the “fake texture” introduced by resizing
  • The end result looks like smearing combined with mosaic

The same issue persisted through epoch 150, which points not to insufficient data but to the wrong model expressiveness. Still, we confirmed that (1) the colors are close to the Fuji filter and (2) the style direction is correct. This indicates the data and pipeline are sound; the problem lies in the model architecture. Level 0 ends here: it verified the feasibility of converting RAF to Fujifilm-style JPG and located the boundary of the problem. We also noticed mismatched crop regions in the output samples, which we will need to fix in subsequent training.

Note: along the way, mounting the NAS path over SMB kept failing. It eventually turned out that an SMB user’s password must not exceed 64 characters!

Next article: Leica Instant? Fuji Instant! Level 1 Training

Published on 2026-01-18.
