Background: Why ShiVa3D Projects Fail in Non-Obvious Ways at Scale

ShiVa3D couples a visual Authoring Tool with a portable runtime, a Lua-based scripting layer, a scenegraph/resource system, and a multi-platform export pipeline. For small scenes, the defaults are sensible. At enterprise scale—tens of thousands of assets, large AI graphs, multiple scenes/levels, and live-ops—edge cases surface:

  • Data scale: Asset bloat increases IO pressure and raises the cost of serialization/deserialization during scene loads.
  • Interplay of Lua and engine code: Tight per-frame Lua loops, accidental allocations, and reflection-heavy calls produce GC thrash and frame spikes.
  • Platform variance: Different GL/Metal/Vulkan backends, shader permutations, and file systems (case sensitivity, path length) act as failure multipliers.
  • Export pipeline drift: Authoring Tool profiles, plugin binaries, and project metadata can become misaligned across branches, leading to subtle runtime differences.

The immediate consequence is instability that looks random: a specific handset stutters only in one level; a Linux dedicated server leaks memory; a Windows export crashes on alt-tab; iOS builds hitch when resuming from background. These are usually systemic, not random.

ShiVa Architecture: The Moving Parts That Matter for Troubleshooting

Runtime and Script Layer

ShiVa exposes engine features via a Lua API. Scripts attach to objects, AI models, and HUDs, running as event-driven callbacks or per-frame updates. The Lua VM is fast but sensitive to allocation patterns and upvalues captured in closures. The engine provides services—scenegraph, animation, physics, audio, networking—invoked from Lua and implemented natively.
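
A minimal sketch of this attach-and-callback model, reusing handler and API names that appear in later examples (onEnterFrame, this.getObject, object.getTranslation, log.message); the onInit handler and the fall-out-of-bounds check are illustrative assumptions, not requirements:

-- Sketch: cache the object handle once at init and keep the per-frame
-- callback allocation-free (no fresh tables, strings, or closures).
local cachedObject = nil              -- upvalue reused across frames

function onInit()
  cachedObject = this.getObject()     -- resolve once, not every frame
end

function onEnterFrame()
  if cachedObject == nil then return end
  local x, y, z = object.getTranslation(cachedObject)
  if y < -100 then                    -- illustrative gameplay check
    log.message("Object fell out of bounds at x=", x, " z=", z)
  end
end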

Scenegraph and Resources

Scenes reference models, textures, materials, animations, and sounds. Loading resolves references, creates GPU resources, and registers objects in update lists. Poorly scoped scene hierarchies, heavy materials, and oversubscribed update lists drive time spent outside your game code.

Authoring and Export

The Authoring Tool compiles assets, packages binary data, and emits platform targets. Each target can carry platform-specific code (plugins), graphics settings, and manifest/entitlement data. Mismatch here causes platform-only bugs.

Networking

ShiVa's networking layer supports client/server synchronization and RPC-like messaging through Lua. Round-trip costs, packet coalescing, and serialization overhead often surface in multiplayer scenes with many replicated entities.

Diagnostics: A Methodical Playbook

1) Establish a Known-Good Baseline

Create a minimal branch that contains: fixed Authoring Tool version, frozen plugin builds, deterministic graphics settings, and a tiny scene that reproduces the symptom. This isolates pipeline drift from code issues.

#
# Pseudo-steps for a reproducible baseline
#
git checkout -b shiva-baseline
# lock tool/SDK versions in docs/versions.txt
printf "Authoring: 2.x.y\nLua runtime: project-config vA.B\nPlugins: hash 1a2b3c\n" > docs/versions.txt
# disable dynamic content loading
echo "STREAMING=OFF" > config/build.profile
# narrow scene to only the problematic assets
shiva-tool scene copy Level_17 Level_17_min
# export with deterministic options
shiva-tool export --profile=prod-locked --deterministic

2) Instrument Frame Time and Allocations

Use per-frame timers and allocation counters. Instrument hot Lua callbacks and expensive engine calls. Track time in Lua vs time in engine to identify which side drives spikes.

--
-- Lightweight Lua profiler for ShiVa scripts
--
local t0, t1, acc = 0, 0, 0
local frames = 0
function onEnterFrame()
  t0 = system.getTimer()
  -- update AI, input, gameplay
  gameplay.update()
  t1 = system.getTimer()
  acc = acc + (t1 - t0)
  frames = frames + 1
  if frames % 60 == 0 then
    log.message(\"LuaFrameAvg:\", acc / 60)
    acc = 0
  end
end

3) Profile Scenegraph Pressure

Count active objects, components, and animations. A burst of dynamic entity creation/destruction can fragment engine lists and create cache-unfriendly update orders.

-- Count nodes and running animations
local total, anim = scene.getObjectCount(this.getCurrentScene()), 0
for i=0, total-1 do
  local h = scene.getObjectAt(this.getCurrentScene(), i)
  if object.getAnimationState(h) == 1 then anim = anim + 1 end
end
log.message(\"SceneObjects:\", total, \" AnimRunning:\", anim)

4) Confirm Export Parity

Export the same scene to all target platforms with identical content hashes. If only one platform diverges, examine graphics backend and filesystem behavior. Case mismatches ("Texture.png" vs "texture.png") are notorious on Linux and Android.
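
A minimal sketch of an automated parity check, assuming each export job can emit a plain-text manifest with one path and content hash per line (an illustrative format, not a built-in ShiVa artifact):

-- compare_exports.lua: flag hash divergence and case-only path differences
-- between two export manifests ("relative/path<TAB>hash" per line, assumed).
local function loadManifest(path)
  local entries = {}
  for line in io.lines(path) do
    local file, hash = line:match("^(.-)\t(%x+)$")
    if file then entries[file] = hash end
  end
  return entries
end

local function compare(refPath, otherPath)
  local ref, other = loadManifest(refPath), loadManifest(otherPath)
  local otherLower = {}
  for file in pairs(other) do otherLower[file:lower()] = file end
  for file, hash in pairs(ref) do
    if other[file] == nil then
      local twin = otherLower[file:lower()]
      print(twin and ("CASE MISMATCH: " .. file .. " vs " .. twin)
                  or ("MISSING: " .. file))
    elseif other[file] ~= hash then
      print("HASH DIVERGED: " .. file)
    end
  end
end

assert(arg[1] and arg[2], "usage: lua compare_exports.lua ref.manifest other.manifest")
compare(arg[1], arg[2])

Run the comparison once per target after export; any output at all means the targets did not ship identical content.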

5) Audit Lua GC and Table Growth

Periodic spikes often correlate with garbage collection. Track object lifetimes and table sizes. Convert per-frame temporary tables into preallocated arrays.

-- Inspect Lua memory (approx); collectgarbage("count") reports the heap in KB.
-- The modulo check is a crude throttle: log only when the heap sits just past a 1 MB multiple.
if collectgarbage then
  local kb = collectgarbage("count")
  if kb % 1024 < 1 then log.message("LuaKB:", kb) end
end

6) Network Round-Trip and Serialization

Log payload sizes and message rates. Coalesce small messages and throttle broadcast frequency. Validate that serialization code does not allocate fresh tables for every packet.

-- Simple network send wrapper with rate tracking
local bytesOut, msgs = 0, 0
net = net or {}   -- wrapper table so gameplay code calls net.send instead of network.send
function net.send(channel, payload)
  bytesOut = bytesOut + string.len(payload)
  msgs = msgs + 1
  if msgs % 60 == 0 then
    log.message("NetOut bytes:", bytesOut, " msgs:", msgs)
    bytesOut, msgs = 0, 0
  end
  network.send(channel, payload)
end

Symptoms → Root Causes: Mapping the Usual Suspects

Symptom A: Random 80–150 ms Frame Spikes on Mobile Only

  • Likely: Lua GC major cycles triggered by short-lived allocations in tight loops.
  • Also: Texture paging due to oversized atlases; background-to-foreground event reinitializing resources.

Fix: Preallocate tables; reuse buffers; call collectgarbage("step") in controlled micro-steps during loading screens; split mega-atlases per scene; verify texture compression per GPU family.

Symptom B: Scene Loads Hang at 90% on Certain Platforms

  • Likely: A missing or case-mismatched asset causes a blocking retry loop in the resource resolver.
  • Also: Plugin dependency failing to load, shader compile stall on first use.

Fix: Enable verbose asset resolves in the export; run a pre-export validator that ensures path case normalization and dependency presence; pre-warm shaders on splash scenes.

Symptom C: Multiplayer Actors 'Rubber-Band' Under Load

  • Likely: Server tick too low relative to client interpolation window; per-actor delta serialization allocates per packet.
  • Also: Nagle's algorithm or OS-level coalescing fights custom coalescing logic.

Fix: Raise server tick to match client interpolation budget; batch deltas; pool serialization buffers; apply timewarp smoothing on client; disable conflicting socket options where appropriate.

Symptom D: Intermittent Crashes on Windows Export Only

  • Likely: Plugin ABI mismatch across projects; memory overwritten by out-of-bounds vertex streams with D3D-only layout.
  • Also: Threading primitive misuse visible only under that scheduler.

Fix: Rebuild all plugins with exactly the target runtime and compiler; add bounds checks in vertex/index streaming; run with debug graphics validation layers if available in your build of ShiVa runtime.

Step-by-Step Fixes: From Quick Wins to Architectural Refactors

1) Tame Lua GC and Allocation Hotspots

Switch from ephemeral tables to reusable arrays or structs held in upvalues. Avoid string concatenation in per-frame loops; use table buffers. Move high-allocation logic to loading screens and run incremental GC during low-load windows.

-- BEFORE: allocates table each frame
function updateProjectiles(list)
  local out = {}
  for i=1,#list do
    out[i] = compute(list[i])
  end
  return out
end

-- AFTER: reuse buffer
local outBuf = {}
function updateProjectiles(list)
  local n = #list
  for i=1,n do outBuf[i] = compute(list[i]) end
  for i=n+1,#outBuf do outBuf[i] = nil end
  return outBuf
end

-- GC pacing during loads
for i=1,200 do collectgarbage("step", 2000) end
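
The same guidance applies to strings: rather than concatenating in per-frame code, reuse a table buffer and join once. A sketch, with illustrative field names:

-- Reused buffer for building a debug line without intermediate ".." garbage.
local lineBuf = {}
local function buildStatusLine(fps, objects, netKb)
  lineBuf[1] = "fps=";      lineBuf[2] = fps
  lineBuf[3] = " objects="; lineBuf[4] = objects
  lineBuf[5] = " net_kb=";  lineBuf[6] = netKb
  return table.concat(lineBuf)   -- one final string allocation
end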

2) Normalize Assets and Prevent Streaming Stalls

Introduce a pre-export validation step that enforces naming rules, checks for duplicates and case-collisions, validates texture dimensions (power-of-two where required), and ensures compression formats per platform.

#
# Pre-export validator (pseudo)
#
python tools/validate_assets.py \
  --root Assets/ \
  --rules rulesets/mobile.yml \
  --fail-on-case-collision \
  --check-texture-formats

# rulesets/mobile.yml excerpt
textures:
  maxSize: 2048
  formats: [PVRTC, ETC2, ASTC]
names:
  forbidSpaces: true
  lowercase: true

3) Stabilize the Export Pipeline

Pin the Authoring Tool version and embed a manifest that records plugin hashes, graphics settings, and resource compiler options. The export job should fail fast if any drift is detected.

# CI snippet
shiva-tool export --profile=android-prod --manifest out/export.manifest
python tools/verify_manifest.py out/export.manifest expected/export.manifest

4) Scenegraph Hygiene: Reduce Update Fan-Out

Group frequently updated objects under dedicated parents to localize transform updates. Avoid deep hierarchies for animated props. Prefer state machines that suspend updates when off-screen.

-- Suspend expensive updates when culled
function onEnterFrame()
  if not camera.isObjectVisible(this.getViewCamera(), this.getObject()) then
    if this.isActive() then this.deactivate() end
    return
  end
  if not this.isActive() then this.activate() end
  this.tick()
end

5) Network Budgeting and Deterministic Transforms

Send input deltas, not absolute states. Quantize transforms to fixed-point to reduce bandwidth and improve determinism. Interpolate on the client with a bounded buffer.

-- Fixed-point position packing (string.pack/string.unpack require Lua 5.3+;
-- substitute a manual byte packer on older runtimes)
local SCALE = 100
local function packVec3(x,y,z)
  return string.pack("iii", math.floor(x*SCALE), math.floor(y*SCALE), math.floor(z*SCALE))
end
local function unpackVec3(s)
  local ix,iy,iz = string.unpack("iii", s)
  return ix/SCALE, iy/SCALE, iz/SCALE
end
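
The client-side half of this step, interpolation over a bounded buffer, might look like the following sketch; the snapshot feed, clock source, and delay value are assumptions to tune per project:

-- Sliding window of recent server snapshots; the actor is rendered a fixed
-- delay in the past and interpolated between the two surrounding samples.
local MAX_SNAPSHOTS = 32
local INTERP_DELAY  = 0.1      -- seconds behind the newest server state
local snaps = {}               -- { t = serverTime, x = ..., y = ..., z = ... }

local function pushSnapshot(t, x, y, z)
  snaps[#snaps + 1] = { t = t, x = x, y = y, z = z }
  if #snaps > MAX_SNAPSHOTS then table.remove(snaps, 1) end
end

local function sampleAt(renderTime)
  for i = #snaps - 1, 1, -1 do
    local a, b = snaps[i], snaps[i + 1]
    if a.t <= renderTime and renderTime <= b.t and b.t > a.t then
      local k = (renderTime - a.t) / (b.t - a.t)
      return a.x + (b.x - a.x) * k,
             a.y + (b.y - a.y) * k,
             a.z + (b.z - a.z) * k
    end
  end
  local last = snaps[#snaps]   -- outside the buffer: clamp to newest state
  if last then return last.x, last.y, last.z end
end

-- On receive: pushSnapshot(serverTime, unpackVec3(payload))
-- At render:  local x, y, z = sampleAt(now - INTERP_DELAY)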

6) Crash Forensics: Symbolization and Triage

Make crash reports actionable: include build ID, platform, scene name, last 64 log lines, and a ring buffer of recent engine events. Tie every export to a symbol package that your crash server can use.

# Produce symbols (platform-specific) and upload
shiva-tool symbols export --target=win64 --out=symbols.zip
curl -F file=@symbols.zip https://crash.example.com/upload
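
The event ring buffer can live in script so any crash or bug report can dump the last few breadcrumbs; the function names and capacity below are project conventions, not ShiVa APIs:

-- Breadcrumb ring buffer: keep the last 64 notable events for crash reports.
local CAPACITY = 64
local crumbs, head = {}, 0

function breadcrumb(tag, detail)
  head = (head % CAPACITY) + 1
  crumbs[head] = os.time() .. " " .. tag .. " " .. tostring(detail)
end

function dumpBreadcrumbs()
  for i = 1, CAPACITY do
    local slot = ((head + i - 1) % CAPACITY) + 1   -- oldest entry first
    if crumbs[slot] then log.message("CRUMB ", crumbs[slot]) end
  end
end

-- Usage: breadcrumb("scene", "Level_17 load begin"); call dumpBreadcrumbs()
-- from fatal-error paths so the report carries recent context.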

7) Plugin ABI Discipline

Version your plugin interface explicitly. Avoid relying on compiler defaults; enforce calling conventions and alignment. Provide a tiny runtime smoke test that loads each plugin and exercises a known function before packaging.

// C++ (plugin header)
extern \"C\" __declspec(dllexport) int SHIVA_Plugin_Version();
extern \"C\" __declspec(dllexport) void SHIVA_Plugin_Smoke();

// CI smoke
shiva-tool plugin smoke --all --fail-on-mismatch

Pitfalls That Sabotage Large ShiVa Projects

  • Unbounded Update Loops: Scripts that run per frame with no back-pressure. Remedy: event-driven or budgeted updates.
  • Hidden Asset Duplication: Multiple copies of the same texture under different paths; blows memory and IO. Remedy: deduplicate and enforce GUID/hash-based references.
  • Monolithic Scenes: One mega-scene instead of streaming subscenes; causes long stalls and large working sets. Remedy: scene streaming and content staging.
  • Overuse of Strings: String keys in hot paths cause allocations and hashing costs. Remedy: intern or replace with integer IDs (see the sketch after this list).
  • Ad-hoc Networking: Per-entity broadcasts with no aggregation. Remedy: interest management and channel batching.
  • Export Profile Drift: Team members using slightly different export presets. Remedy: repository-owned, versioned profiles only.
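
For the string-key pitfall above, a minimal interning sketch (table and constant names are illustrative):

-- Intern string names once into integer IDs; hot paths then index arrays
-- by number instead of hashing strings every frame.
local nameToId, idToName, nextId = {}, {}, 0

local function intern(name)
  local id = nameToId[name]
  if id == nil then
    nextId = nextId + 1
    id = nextId
    nameToId[name] = id
    idToName[id] = name
  end
  return id
end

-- Cold path: resolve once at load time.
local HEALTH = intern("stat.health")

-- Hot path: numeric index, no string hashing or allocation.
local stats = {}
stats[HEALTH] = 100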

Long-Term Best Practices: Turning Fixes into Policy

  • Budgeting: Define per-system budgets (ms/frame, MB/scene, kB/s network). Enforce in CI with headless validation exports that run micro-benchmarks.
  • Performance Gates: Treat perf regressions like test failures. A PR that adds 1 ms to the main thread must justify the cost.
  • Content Contracts: Texture max sizes, mesh poly budgets, animation track limits—all encoded as machine-checked rules.
  • GC-Safe Scripting: Style guide for Lua: table reuse, pooled objects, no closures in tight loops, no string.format in per-frame code.
  • Deterministic Networking: Fixed tick rates, quantized data, client interpolation windows, and authoritative reconciliation.
  • Single Source of Truth for Exports: Profiles/manifests live in the repo; Authoring Tool picks them up read-only in CI.
  • Crash-First Diagnostics: Symbol packages, scene breadcrumbs, and build IDs baked into every binary.

Case Studies: From Symptom to Resolution

Case 1: Mobile Hitch on Scene Transition

Symptom: 120–200 ms spikes when opening pause menu on Android mid-game. Root: UI script allocated dozens of new tables and built strings per button per frame; background music crossfade created two new decoders. Fix: Prebuild UI data, reuse tables, pool sound decoders, stagger GC with collectgarbage("step"), pre-warm UI textures at boot. Result: Spikes reduced to < 10 ms.

Case 2: Windows-Only Crash After 45 Minutes

Symptom: Access violation in release builds only. Root: Plugin compiled with different structure packing from main runtime; over time, a rare codepath wrote past buffer end. Fix: Explicit #pragma pack, exported version handshake, CI smoke test, bounds checks. Result: No recurrence, and plugin builds now self-verify.

Case 3: Multiplayer Rubber-Banding at 64 Players

Symptom: Clients jitter when view contains many replicated actors. Root: Server broadcasted per-actor full states each tick; payload fragmentation and GC in serializer. Fix: Interest management (per client visibility set), delta compression, buffer pooling; raised server tick and tuned client interpolation window. Result: 50% bandwidth reduction, smooth movement.

Operationalizing Troubleshooting: Tooling and Automation

Performance CI

Add a headless benchmark scene that measures frame time under scripted load and outputs JSON. Fail the build if budgets are exceeded. Trend the metrics over time.

# Headless performance export (pseudo)
shiva-tool run-benchmark --scene=PerfLab --frames=300 --out=perf.json
python tools/assert_perf.py perf.json --max-frame-ms=16.6

Asset Compliance Bot

Run an asset linter on every commit. Enforce texture caps, mesh LOD presence, and naming rules. Autogenerate a report that links to offending assets inside the Authoring Tool.
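
A linter sketch in Lua, assuming the build step first emits a plain-text list of asset paths (stock Lua has no directory walker, so the file list is an external input):

-- lint_assets.lua: reads asset paths (one per line) from stdin and applies
-- the naming rules from the mobile ruleset shown earlier.
local seenLower, failures = {}, 0

local function fail(path, reason)
  failures = failures + 1
  print("LINT: " .. path .. " -> " .. reason)
end

for path in io.lines() do
  local name = path:match("([^/\\]+)$") or path
  if name:find(" ") then fail(path, "spaces in file name") end
  if name ~= name:lower() then fail(path, "not lowercase") end
  local lower = path:lower()
  if seenLower[lower] then
    fail(path, "case collision with " .. seenLower[lower])
  else
    seenLower[lower] = path
  end
end

os.exit(failures == 0 and 0 or 1)   -- non-zero exit blocks the commit in CI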

Network Replay Harness

Record authoritative server streams and client input; replay deterministically to reproduce sync bugs. Compare checksum/hash of world state per tick.

-- Checksum example (FNV-style mix; the ~ and & bitwise operators require Lua 5.3+)
function checksumWorld()
  local sum = 0
  local total = scene.getObjectCount(this.getCurrentScene())
  for i=0,total-1 do
    local h = scene.getObjectAt(this.getCurrentScene(), i)
    local x,y,z = object.getTranslation(h)
    sum = ((sum * 16777619) ~ (math.floor(x*1000) + math.floor(y*1000) + math.floor(z*1000))) & 0xFFFFFFFF
  end
  return sum
end

Security, Stability, and Live-Ops Considerations

  • Sandbox Lua Extensions: Audit any native bindings; validate inputs and lengths to avoid memory corruption.
  • Patch Discipline: Export profiles should pin runtime and plugin versions; hotfixes update the pin, not ad-hoc local installs.
  • Telemetry: Instrument gameplay and engine stats (frame time, GC, net RTT, packet loss). Feed dashboards to catch regressions within hours, not weeks.
  • Feature Flags: Wrap risky systems (new physics or renderer paths) with server-controlled flags to roll back instantly if metrics spike; a minimal flag gate is sketched below.
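
The flag delivery handler and the physicsV1/physicsV2 modules in the sketch are hypothetical placeholders for your own backend and systems:

-- Minimal server-controlled flag gate. The flag table is assumed to be
-- delivered by your own backend (for example inside the login response).
local flags = { newPhysicsPath = false }   -- safe defaults baked into the build

function onServerConfigReceived(serverFlags)
  for name, value in pairs(serverFlags) do
    if flags[name] ~= nil then flags[name] = value end  -- ignore unknown keys
  end
end

function stepPhysics(dt)
  if flags.newPhysicsPath then
    physicsV2.step(dt)       -- risky new path, can be switched off remotely
  else
    physicsV1.step(dt)       -- proven fallback
  end
end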

Conclusion

Most ShiVa3D troubles in large productions trace to scale amplifiers: allocation-heavy Lua patterns, ungoverned asset growth, export profile drift, and naive networking. Treat performance and stability as first-class deliverables: lock versions, validate assets, pace GC, and engineer networking for determinism. Make every export explainable with manifests and symbols, and make every regression visible with automated benchmarks. When you codify these practices into CI and team policy, intermittent hitches and platform-only crashes give way to predictable, maintainable builds that ship reliably across devices and live-ops cycles.

FAQs

1. How do I tell if a spike is Lua GC or GPU stall?

Instrument Lua wall time and allocation counters alongside GPU timing markers. If Lua time jumps while GPU is idle, it's GC or CPU-bound logic; if Lua time is flat but total frame spikes with high draw submission, suspect GPU. Use staged GC steps and pre-warming to separate effects.

2. What's the safest pattern for per-frame data in ShiVa Lua?

Preallocate buffers and reuse tables. Avoid creating closures in hot loops and prefer numeric indices over string keys. Batch engine calls (e.g., set transforms in arrays) to minimize VM→native crossings.

3. We see asset-related crashes only on Android—why?

Android's case-sensitive filesystem and texture compression constraints (ETC2/ASTC) expose naming and format mistakes hidden on desktop. Add a pre-export validator that enforces lowercase, checks format compatibility, and rejects oversized textures for the device class.

4. How do we keep exports consistent across teams and branches?

Version export profiles and manifests inside the repo, pin tool/runtime versions, and add a CI guard that diff-checks the produced manifest against an expected one. Fail fast if any plugin hash or setting drifts.

5. Multiplayer jitter persists even after bandwidth reductions—what next?

Increase server tick rate within CPU budget, widen client interpolation window slightly, and ensure timestamps are monotonic and synchronized. Add interest management to cut irrelevant updates and quantize transforms to stabilize extrapolation.