Background: Why ShiVa3D Projects Fail in Non-Obvious Ways at Scale
ShiVa3D couples a visual Authoring Tool with a portable runtime, a Lua-based scripting layer, a scenegraph/resource system, and a multi-platform export pipeline. For small scenes, the defaults are sensible. At enterprise scale—tens of thousands of assets, large AI graphs, multiple scenes/levels, and live-ops—edge cases surface:
- Data scale: Asset bloat increases IO pressure and raises the cost of serialization/deserialization during scene loads.
- Interplay of Lua and engine code: Tight per-frame Lua loops, accidental allocations, and reflection-heavy calls produce GC thrash and frame spikes.
- Platform variance: Different GL/Metal/Vulkan backends, shader permutations, and file systems (case sensitivity, path length) act as failure multipliers.
- Export pipeline drift: Authoring Tool profiles, plugin binaries, and project metadata can become misaligned across branches, leading to subtle runtime differences.
The immediate consequence is instability that looks random: a specific handset stutters only in one level; a Linux dedicated server leaks memory; a Windows export crashes on alt-tab; iOS builds hitch when resuming from background. These are usually systemic, not random.
ShiVa Architecture: The Moving Parts That Matter for Troubleshooting
Runtime and Script Layer
ShiVa exposes engine features via a Lua API. Scripts attach to objects, AI models, and HUDs, running as event-driven callbacks or per-frame updates. The Lua VM is fast but sensitive to allocation patterns and upvalues captured in closures. The engine provides services—scenegraph, animation, physics, audio, networking—invoked from Lua and implemented natively.
Scenegraph and Resources
Scenes reference models, textures, materials, animations, and sounds. Loading resolves references, creates GPU resources, and registers objects in update lists. Poorly scoped scene hierarchies, heavy materials, and oversubscribed update lists drive time spent outside your game code.
Authoring and Export
The Authoring Tool compiles assets, packages binary data, and emits platform targets. Each target can carry platform-specific code (plugins), graphics settings, and manifest/entitlement data. Mismatch here causes platform-only bugs.
Networking
ShiVa's networking layer supports client/server synchronization and RPC-like messaging through Lua. Round-trip costs, packet coalescing, and serialization overhead often surface in multiplayer scenes with many replicated entities.
Diagnostics: A Methodical Playbook
1) Establish a Known-Good Baseline
Create a minimal branch that contains: fixed Authoring Tool version, frozen plugin builds, deterministic graphics settings, and a tiny scene that reproduces the symptom. This isolates pipeline drift from code issues.
# Pseudo-steps for a reproducible baseline
git checkout -b shiva-baseline
# lock tool/SDK versions in docs/versions.txt (printf expands \n; plain echo may not)
printf 'Authoring: 2.x.y\nLua runtime: project-config vA.B\nPlugins: hash 1a2b3c\n' > docs/versions.txt
# disable dynamic content loading
echo "STREAMING=OFF" > config/build.profile
# narrow scene to only the problematic assets
shiva-tool scene copy Level_17 Level_17_min
# export with deterministic options
shiva-tool export --profile=prod-locked --deterministic
2) Instrument Frame Time and Allocations
Use per-frame timers and allocation counters. Instrument hot Lua callbacks and expensive engine calls. Track time in Lua vs time in engine to identify which side drives spikes.
-- Lightweight Lua profiler for ShiVa scripts
local t0, t1, acc = 0, 0, 0
local frames = 0

function onEnterFrame()
    t0 = system.getTimer()
    -- update AI, input, gameplay
    gameplay.update()
    t1 = system.getTimer()
    acc = acc + (t1 - t0)
    frames = frames + 1
    if frames % 60 == 0 then
        log.message("LuaFrameAvg:", acc / 60)
        acc = 0
    end
end
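Once both total frame time and Lua time are logged per frame, attributing spikes to one side can be done offline. The following is a minimal Python sketch, assuming each frame produces a `(total_ms, lua_ms)` sample; the function name, the sample layout, and the "2x the median" spike threshold are illustrative assumptions, not part of ShiVa.

```python
from statistics import median

def classify_spikes(samples, spike_factor=2.0):
    """samples: list of (total_ms, lua_ms) tuples, one per frame.

    Returns (frame_index, "lua" | "engine") for every frame whose total
    time exceeds spike_factor times the median total frame time.
    """
    if not samples:
        return []
    med_total = median(total for total, _ in samples)
    med_lua = median(lua for _, lua in samples)
    # Approximate the typical engine share as median total minus median Lua.
    baseline_engine = med_total - med_lua
    spikes = []
    for index, (total_ms, lua_ms) in enumerate(samples):
        if total_ms <= spike_factor * med_total:
            continue
        # Blame whichever side grew more relative to its typical cost.
        lua_excess = lua_ms - med_lua
        engine_excess = (total_ms - lua_ms) - baseline_engine
        spikes.append((index, "lua" if lua_excess >= engine_excess else "engine"))
    return spikes
```

A run of "lua" labels points toward GC or script logic; a run of "engine" labels points toward loading, rendering, or other native work.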
3) Profile Scenegraph Pressure
Count active objects, components, and animations. A burst of dynamic entity creation/destruction can fragment engine lists and create cache-unfriendly update orders.
-- Count nodes and running animations
local current = this.getCurrentScene()
local total, anim = scene.getObjectCount(current), 0
for i = 0, total - 1 do
    local h = scene.getObjectAt(current, i)
    if object.getAnimationState(h) == 1 then
        anim = anim + 1
    end
end
log.message("SceneObjects:", total, " AnimRunning:", anim)
4) Confirm Export Parity
Export the same scene to all target platforms with identical content hashes. If only one platform diverges, examine graphics backend and filesystem behavior. Case mismatches ("Texture.png" vs "texture.png") are notorious on Linux and Android.
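Case mismatches can be caught before export by comparing the paths a project references against the paths that actually exist. The Python sketch below assumes you can already produce both lists (how to extract references from a ShiVa project is left open); `find_case_mismatches` is a hypothetical helper name.

```python
def find_case_mismatches(referenced_paths, on_disk_paths):
    """Return (reference, candidates) pairs where the reference has no exact
    match on disk but matches one or more files when case is ignored."""
    exact = set(on_disk_paths)
    by_lower = {}
    for path in on_disk_paths:
        by_lower.setdefault(path.lower(), []).append(path)

    mismatches = []
    for ref in referenced_paths:
        if ref in exact:
            continue  # resolves on case-sensitive filesystems too
        candidates = by_lower.get(ref.lower())
        if candidates:
            # Would load on a case-insensitive desktop filesystem, but not
            # on a case-sensitive one.
            mismatches.append((ref, sorted(candidates)))
    return mismatches
```

References that match nothing even when case is ignored are plain missing assets and would need a separate check.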
5) Audit Lua GC and Table Growth
Periodic spikes often correlate with garbage collection. Track object lifetimes and table sizes. Convert per-frame temporary tables into preallocated arrays.
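-- Inspect Lua memory (approx); logs whenever usage crosses a 1 MB boundary
-- Keep lastMB outside the per-frame callback so it persists between calls
local lastMB = -1
if collectgarbage then
    local kb = collectgarbage("count")
    local mb = math.floor(kb / 1024)
    if mb ~= lastMB then
        log.message("LuaKB:", kb)
        lastMB = mb
    end
end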
6) Network Round-Trip and Serialization
Log payload sizes and message rates. Coalesce small messages and throttle broadcast frequency. Validate that serialization code does not allocate fresh tables for every packet.
-- Simple network send wrapper with rate tracking
net = net or {}  -- make sure the wrapper table exists before adding to it
local bytesOut, msgs = 0, 0
function net.send(channel, payload)
    bytesOut = bytesOut + string.len(payload)
    msgs = msgs + 1
    if msgs % 60 == 0 then
        log.message("NetOut bytes:", bytesOut, " msgs:", msgs)
        bytesOut, msgs = 0, 0
    end
    network.send(channel, payload)
end
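Coalescing can be prototyped outside the engine before it is ported to Lua. The Python sketch below batches small payloads into one length-prefixed packet. The `Coalescer` class, the 2-byte little-endian length prefix, and the 1200-byte default are assumptions chosen for illustration, not ShiVa's wire format.

```python
import struct

class Coalescer:
    """Batches small payloads into one length-prefixed packet per flush."""

    def __init__(self, max_packet_bytes=1200):
        self.max_packet_bytes = max_packet_bytes
        self._pending = []
        self._size = 0

    def queue(self, payload):
        """Queue a payload (at most 65535 bytes). Returns a full packet if the
        buffer had to be flushed to make room, otherwise None."""
        framed = struct.pack("<H", len(payload)) + payload
        flushed = None
        if self._pending and self._size + len(framed) > self.max_packet_bytes:
            flushed = self.flush()
        self._pending.append(framed)
        self._size += len(framed)
        return flushed

    def flush(self):
        """Return everything queued so far as one packet and reset."""
        packet = b"".join(self._pending)
        self._pending.clear()
        self._size = 0
        return packet

def split_packet(packet):
    """Inverse of Coalescer: recover the individual payloads."""
    payloads, offset = [], 0
    while offset < len(packet):
        (length,) = struct.unpack_from("<H", packet, offset)
        offset += 2
        payloads.append(packet[offset:offset + length])
        offset += length
    return payloads
```

Calling `flush()` once per network tick, rather than once per message, is what throttles the send rate.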
Symptoms → Root Causes: Mapping the Usual Suspects
Symptom A: Random 80–150 ms Frame Spikes on Mobile Only
- Likely: Lua GC major cycles triggered by short-lived allocations in tight loops.
- Also: Texture paging due to oversized atlases; background-to-foreground event reinitializing resources.
Fix: Preallocate tables; reuse buffers; call collectgarbage("step") in controlled micro-steps during loading screens; split mega-atlases per scene; verify texture compression per GPU family.
Symptom B: Scene Loads Hang at 90% on Certain Platforms
- Likely: A missing or case-mismatched asset causes a blocking retry loop in the resource resolver.
- Also: Plugin dependency failing to load, shader compile stall on first use.
Fix: Enable verbose asset resolves in the export; run a pre-export validator that ensures path case normalization and dependency presence; pre-warm shaders on splash scenes.
Symptom C: Multiplayer Actors 'Rubber-Band' Under Load
- Likely: Server tick too low relative to client interpolation window; per-actor delta serialization allocates per packet.
- Also: Nagle's algorithm or OS-level coalescing fights custom coalescing logic.
Fix: Raise server tick to match client interpolation budget; batch deltas; pool serialization buffers; apply timewarp smoothing on client; disable conflicting socket options where appropriate.
Symptom D: Intermittent Crashes on Windows Export Only
- Likely: Plugin ABI mismatch across projects; memory overwritten by out-of-bounds vertex streams with D3D-only layout.
- Also: Threading primitive misuse visible only under that scheduler.
Fix: Rebuild all plugins with exactly the target runtime and compiler; add bounds checks in vertex/index streaming; run with debug graphics validation layers if available in your build of ShiVa runtime.
Step-by-Step Fixes: From Quick Wins to Architectural Refactors
1) Tame Lua GC and Allocation Hotspots
Switch from ephemeral tables to reusable arrays or structs held in upvalues. Avoid string concatenation in per-frame loops; use table buffers. Move high-allocation logic to loading screens and run incremental GC during low-load windows.
-- BEFORE: allocates table each frame
function updateProjectiles(list)
    local out = {}
    for i = 1, #list do out[i] = compute(list[i]) end
    return out
end

-- AFTER: reuse buffer
local outBuf = {}
function updateProjectiles(list)
    local n = #list
    for i = 1, n do outBuf[i] = compute(list[i]) end
    -- clear stale entries left over from a longer previous frame
    for i = n + 1, #outBuf do outBuf[i] = nil end
    return outBuf
end

-- GC pacing during loads
for i = 1, 200 do collectgarbage("step", 2000) end
2) Normalize Assets and Prevent Streaming Stalls
Introduce a pre-export validation step that enforces naming rules, checks for duplicates and case-collisions, validates texture dimensions (power-of-two where required), and ensures compression formats per platform.
# Pre-export validator (pseudo)
python tools/validate_assets.py \
    --root Assets/ \
    --rules rulesets/mobile.yml \
    --fail-on-case-collision \
    --check-texture-formats

# rulesets/mobile.yml excerpt
textures:
  maxSize: 2048
  formats: [PVRTC, ETC2, ASTC]
names:
  forbidSpaces: true
  lowercase: true
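The naming rules are the easiest part of such a validator to implement. The following Python sketch covers only the spaces, lowercase, and case-collision checks; the function names are hypothetical, and texture dimension and format checks are omitted because they depend on how your pipeline reads image headers.

```python
import os

def collect_paths(root):
    """Return asset paths under root, relative to it, using forward slashes."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)

def validate_names(paths, forbid_spaces=True, lowercase=True):
    """Return a list of human-readable rule violations (empty means clean)."""
    errors = []
    seen = {}  # lowercased path -> first path seen with that spelling
    for path in paths:
        if forbid_spaces and " " in path:
            errors.append(f"space in name: {path}")
        if lowercase and path != path.lower():
            errors.append(f"not lowercase: {path}")
        key = path.lower()
        if key in seen:
            errors.append(f"case collision: {seen[key]} vs {path}")
        else:
            seen[key] = path
    return errors
```

A CI job would call `validate_names(collect_paths("Assets/"))` and exit non-zero when the returned list is not empty.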
3) Stabilize the Export Pipeline
Pin the Authoring Tool version and embed a manifest that records plugin hashes, graphics settings, and resource compiler options. The export job should fail fast if any drift is detected.
# CI snippet
shiva-tool export --profile=android-prod --manifest out/export.manifest
python tools/verify_manifest.py out/export.manifest expected/export.manifest
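What the drift check does depends on the manifest format, which is not specified here. As a sketch, assuming the manifest is a JSON object of nested keys (an assumption, as are the function names), a comparison that reports every difference could look like this:

```python
import json

def load_manifest(path):
    """Read a manifest file; assumes it is stored as JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def diff_manifests(expected, actual, prefix=""):
    """Return a sorted-by-key list of drift descriptions between two dicts."""
    drift = []
    for key in sorted(set(expected) | set(actual)):
        path = f"{prefix}{key}"
        if key not in actual:
            drift.append(f"missing: {path}")
        elif key not in expected:
            drift.append(f"unexpected: {path}")
        elif isinstance(expected[key], dict) and isinstance(actual[key], dict):
            # Recurse so nested settings report their full dotted path.
            drift.extend(diff_manifests(expected[key], actual[key], path + "."))
        elif expected[key] != actual[key]:
            drift.append(f"changed: {path}: {expected[key]!r} -> {actual[key]!r}")
    return drift
```

Reporting all differences at once, instead of stopping at the first, saves a round trip through CI when several settings drifted together.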
4) Scenegraph Hygiene: Reduce Update Fan-Out
Group frequently updated objects under dedicated parents to localize transform updates. Avoid deep hierarchies for animated props. Prefer state machines that suspend updates when off-screen.
-- Suspend expensive updates when culled
function onEnterFrame()
    if not camera.isObjectVisible(this.getViewCamera(), this.getObject()) then
        if this.isActive() then this.deactivate() end
        return
    end
    if not this.isActive() then this.activate() end
    this.tick()
end
5) Network Budgeting and Deterministic Transforms
Send input deltas, not absolute states. Quantize transforms to fixed-point to reduce bandwidth and improve determinism. Interpolate on the client with a bounded buffer.
-- Fixed-point position packing (string.pack/unpack require Lua 5.3+)
local SCALE = 100
local function packVec3(x, y, z)
    -- "<i4" pins width and byte order so every peer decodes identically
    return string.pack("<i4i4i4", math.floor(x * SCALE), math.floor(y * SCALE), math.floor(z * SCALE))
end
local function unpackVec3(s)
    local ix, iy, iz = string.unpack("<i4i4i4", s)
    return ix / SCALE, iy / SCALE, iz / SCALE
end
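The bounded client-side interpolation buffer can be prototyped in a few lines. This Python sketch is one possible shape for it: the `SnapshotBuffer` name, the capacity, and the fixed render delay are illustrative assumptions, and it deliberately holds the last known position instead of extrapolating.

```python
from collections import deque

class SnapshotBuffer:
    """Bounded buffer of (timestamp, position) snapshots, oldest first."""

    def __init__(self, capacity=32, delay=0.1):
        self.snapshots = deque(maxlen=capacity)  # old entries drop off the front
        self.delay = delay                       # render this far in the past

    def push(self, timestamp, position):
        # Drop late or duplicate packets so timestamps stay strictly increasing.
        if self.snapshots and timestamp <= self.snapshots[-1][0]:
            return
        self.snapshots.append((timestamp, position))

    def sample(self, now):
        """Return the interpolated position at (now - delay), or None if empty."""
        if not self.snapshots:
            return None
        target = now - self.delay
        snaps = list(self.snapshots)
        if target <= snaps[0][0]:
            return snaps[0][1]
        for (t0, p0), (t1, p1) in zip(snaps, snaps[1:]):
            if t0 <= target <= t1:
                alpha = (target - t0) / (t1 - t0)
                return tuple(a + (b - a) * alpha for a, b in zip(p0, p1))
        return snaps[-1][1]  # newer than all data: hold, do not extrapolate
```

The delay should cover at least one server tick interval plus expected jitter, otherwise `sample` spends most of its time in the hold branch and movement looks stepped.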
6) Crash Forensics: Symbolization and Triage
Make crash reports actionable: include build ID, platform, scene name, last 64 log lines, and a ring buffer of recent engine events. Tie every export to a symbol package that your crash server can use.
# Produce symbols (platform-specific) and upload
shiva-tool symbols export --target=win64 --out=symbols.zip
curl -F file=@symbols.zip https://crash.example.com/upload
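The "last 64 log lines" and "ring buffer of recent engine events" are both bounded queues. The Python sketch below shows the data shape of such a report; the `CrashContext` class and its field names are hypothetical, and in a real project the equivalent structure would live in the runtime or a plugin.

```python
from collections import deque
import json

class CrashContext:
    """Keeps bounded breadcrumbs so a crash report stays small but useful."""

    def __init__(self, build_id, platform, max_log_lines=64, max_events=128):
        self.build_id = build_id
        self.platform = platform
        self.scene = None
        # deque(maxlen=...) discards the oldest entry once full.
        self._log = deque(maxlen=max_log_lines)
        self._events = deque(maxlen=max_events)

    def set_scene(self, name):
        self.scene = name
        self.event(f"scene:{name}")

    def log(self, line):
        self._log.append(line)

    def event(self, name):
        self._events.append(name)

    def report(self):
        """Return a JSON-serializable dict for upload to a crash server."""
        return {
            "build_id": self.build_id,
            "platform": self.platform,
            "scene": self.scene,
            "log_tail": list(self._log),
            "events": list(self._events),
        }
```

Because `build_id` travels with every report, the crash server can pick the matching symbol package without guessing.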
7) Plugin ABI Discipline
Version your plugin interface explicitly. Avoid relying on compiler defaults; enforce calling conventions and alignment. Provide a tiny runtime smoke test that loads each plugin and exercises a known function before packaging.
// C++ (plugin header)
extern "C" __declspec(dllexport) int SHIVA_Plugin_Version();
extern "C" __declspec(dllexport) void SHIVA_Plugin_Smoke();

# CI smoke
shiva-tool plugin smoke --all --fail-on-mismatch
Pitfalls That Sabotage Large ShiVa Projects
- Unbounded Update Loops: Scripts that run per frame with no back-pressure. Remedy: event-driven or budgeted updates.
- Hidden Asset Duplication: Multiple copies of the same texture under different paths; blows memory and IO. Remedy: deduplicate and enforce GUID/hash-based references.
- Monolithic Scenes: One mega-scene instead of streaming subscenes; causes long stalls and large working sets. Remedy: scene streaming and content staging.
- Overuse of Strings: String keys in hot paths cause allocations and hashing costs. Remedy: intern or replace with integer IDs.
- Ad-hoc Networking: Per-entity broadcasts with no aggregation. Remedy: interest management and channel batching.
- Export Profile Drift: Team members using slightly different export presets. Remedy: repository-owned, versioned profiles only.
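Of these pitfalls, hidden asset duplication is the most mechanical to detect, because identical files hash identically regardless of path. A minimal Python sketch follows, assuming file contents are already loaded into a dict; `find_duplicates` is a hypothetical helper, and for large projects you would stream files from disk rather than hold them in memory.

```python
import hashlib
from collections import defaultdict

def find_duplicates(contents):
    """contents: dict mapping asset path -> file bytes.

    Returns a sorted list of path groups (each group sorted) whose bytes
    are identical according to SHA-256.
    """
    by_hash = defaultdict(list)
    for path, data in contents.items():
        by_hash[hashlib.sha256(data).hexdigest()].append(path)
    groups = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return sorted(groups)
```

This only catches byte-identical copies; a texture that was re-exported or re-compressed will hash differently and needs a perceptual or metadata-based comparison.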
Long-Term Best Practices: Turning Fixes into Policy
- Budgeting: Define per-system budgets (ms/frame, MB/scene, kB/s network). Enforce in CI with headless validation exports that run micro-benchmarks.
- Performance Gates: Treat perf regressions like test failures. A PR that adds 1 ms to the main thread must justify the cost.
- Content Contracts: Texture max sizes, mesh poly budgets, animation track limits—all encoded as machine-checked rules.
- GC-Safe Scripting: Style guide for Lua: table reuse, pooled objects, no closures in tight loops, no string.format in per-frame code.
- Deterministic Networking: Fixed tick rates, quantized data, client interpolation windows, and authoritative reconciliation.
- Single Source of Truth for Exports: Profiles/manifests live in the repo; Authoring Tool picks them up read-only in CI.
- Crash-First Diagnostics: Symbol packages, scene breadcrumbs, and build IDs baked into every binary.
Case Studies: From Symptom to Resolution
Case 1: Mobile Hitch on Scene Transition
Symptom: 120–200 ms spikes when opening pause menu on Android mid-game. Root: UI script allocated dozens of new tables and built strings per button per frame; background music crossfade created two new decoders. Fix: Prebuild UI data, reuse tables, pool sound decoders, stagger GC with collectgarbage("step"), pre-warm UI textures at boot. Result: Spikes reduced to < 10 ms.
Case 2: Windows-Only Crash After 45 Minutes
Symptom: Access violation in release builds only. Root: Plugin compiled with different structure packing from main runtime; over time, a rare codepath wrote past buffer end. Fix: Explicit #pragma pack, exported version handshake, CI smoke test, bounds checks. Result: No recurrence, and plugin builds now self-verify.
Case 3: Multiplayer Rubber-Banding at 64 Players
Symptom: Clients jitter when view contains many replicated actors. Root: Server broadcasted per-actor full states each tick; payload fragmentation and GC in serializer. Fix: Interest management (per client visibility set), delta compression, buffer pooling; raised server tick and tuned client interpolation window. Result: 50% bandwidth reduction, smooth movement.
Operationalizing Troubleshooting: Tooling and Automation
Performance CI
Add a headless benchmark scene that measures frame time under scripted load and outputs JSON. Fail the build if budgets are exceeded. Trend the metrics over time.
# Headless performance export (pseudo)
shiva-tool run-benchmark --scene=PerfLab --frames=300 --out=perf.json
python tools/assert_perf.py perf.json --max-frame-ms=16.6
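The budget assertion itself is small. The Python sketch below assumes the benchmark output has been reduced to a list of per-frame times in milliseconds (the layout of perf.json is not specified here) and checks a percentile rather than the maximum, so one outlier frame does not fail the build. Both function names are hypothetical.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rank errors.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def check_budget(frame_ms, max_frame_ms=16.6, pct=95):
    """Return a list of failure messages; an empty list means within budget."""
    if not frame_ms:
        return ["no frames recorded"]
    observed = percentile(frame_ms, pct)
    if observed > max_frame_ms:
        return [f"p{pct} frame time {observed:.1f} ms exceeds {max_frame_ms} ms"]
    return []
```

Whether to gate on p95, p99, or the maximum is a policy choice; stricter gates catch rare hitches but make the benchmark more sensitive to CI machine noise.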
Asset Compliance Bot
Run an asset linter on every commit. Enforce texture caps, mesh LOD presence, and naming rules. Autogenerate a report that links to offending assets inside the Authoring Tool.
Network Replay Harness
Record authoritative server streams and client input; replay deterministically to reproduce sync bugs. Compare checksum/hash of world state per tick.
-- Checksum example (bitwise operators require Lua 5.3+)
function checksumWorld()
    local sum = 0
    local current = this.getCurrentScene()
    local total = scene.getObjectCount(current)
    for i = 0, total - 1 do
        local h = scene.getObjectAt(current, i)
        local x, y, z = object.getTranslation(h)
        local q = math.floor(x * 1000) + math.floor(y * 1000) + math.floor(z * 1000)
        -- mask with & rather than "% 2^32" so sum stays an exact integer
        sum = ((sum * 16777619) ~ q) & 0xFFFFFFFF
    end
    return sum
end
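With one checksum recorded per tick on both the original run and the replay, finding where they part ways is a linear scan. A small Python sketch, with `first_divergence` as a hypothetical helper name:

```python
def first_divergence(recorded, replayed):
    """Return the index of the first tick whose checksums differ, or None.

    If one sequence is a strict prefix of the other, the first tick that
    exists in only one of them counts as the divergence point.
    """
    for tick, (expected, actual) in enumerate(zip(recorded, replayed)):
        if expected != actual:
            return tick
    if len(recorded) != len(replayed):
        return min(len(recorded), len(replayed))
    return None
```

The tick returned is where state first differs, so the actual cause sits in the inputs or logic processed during that tick or the one before it.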
Security, Stability, and Live-Ops Considerations
- Sandbox Lua Extensions: Audit any native bindings; validate inputs and lengths to avoid memory corruption.
- Patch Discipline: Export profiles should pin runtime and plugin versions; hotfixes update the pin, not ad-hoc local installs.
- Telemetry: Instrument gameplay and engine stats (frame time, GC, net RTT, packet loss). Feed dashboards to catch regressions within hours, not weeks.
- Feature Flags: Wrap risky systems (new physics or renderer paths) with server-controlled flags to rollback instantly if metrics spike.
Conclusion
Most ShiVa3D troubles in large productions trace to scale amplifiers: allocation-heavy Lua patterns, ungoverned asset growth, export profile drift, and naive networking. Treat performance and stability as first-class deliverables: lock versions, validate assets, pace GC, and engineer networking for determinism. Make every export explainable with manifests and symbols, and make every regression visible with automated benchmarks. When you codify these practices into CI and team policy, intermittent hitches and platform-only crashes give way to predictable, maintainable builds that ship reliably across devices and live-ops cycles.
FAQs
1. How do I tell if a spike is Lua GC or GPU stall?
Instrument Lua wall time and allocation counters alongside GPU timing markers. If Lua time jumps while GPU is idle, it's GC or CPU-bound logic; if Lua time is flat but total frame spikes with high draw submission, suspect GPU. Use staged GC steps and pre-warming to separate effects.
2. What's the safest pattern for per-frame data in ShiVa Lua?
Preallocate buffers and reuse tables. Avoid creating closures in hot loops and prefer numeric indices over string keys. Batch engine calls (e.g., set transforms in arrays) to minimize VM→native crossings.
3. We see asset-related crashes only on Android—why?
Android's case-sensitive filesystem and texture compression constraints (ETC2/ASTC) expose naming and format mistakes hidden on desktop. Add a pre-export validator that enforces lowercase, checks format compatibility, and rejects oversized textures for the device class.
4. How do we keep exports consistent across teams and branches?
Version export profiles and manifests inside the repo, pin tool/runtime versions, and add a CI guard that diff-checks the produced manifest against an expected one. Fail fast if any plugin hash or setting drifts.
5. Multiplayer jitter persists even after bandwidth reductions—what next?
Increase server tick rate within CPU budget, widen client interpolation window slightly, and ensure timestamps are monotonic and synchronized. Add interest management to cut irrelevant updates and quantize transforms to stabilize extrapolation.