Layer 1: Intelligent Data Collection - Complete Strategic Framework

Executive Overview

Layer 1 is the sensory nervous system of the Factory Intelligence System. It transforms three chaotic data streams (PLC, ERP, and Human inputs) into a unified, intelligent data platform ready for gap discovery and value creation.

The Fundamental Challenge

Factories generate three types of data, each with unique problems:

  1. PLC Data: Drowning in data (millions of points/day) but starving for insights
  2. ERP Data: Rich in context but trapped behind slow queries and system limitations
  3. Human Data: Critical knowledge locked in Excel files and operators' heads

Layer 1 solves this by creating an intelligent collection system that processes data at the source, unifies storage, and maintains business context - all using free tools.

Part 1: Understanding the Three Data Universes

The PLC Data Universe

What It Is:

  • Real-time operational data from machines
  • Sensors, counters, timers, machine states
  • Typically 100-10,000 data points per production line
  • Updates every 100ms to 10 seconds

The Core Problem: Imagine a temperature sensor sending readings every second:

  • 1 sensor = 86,400 readings/day
  • 100 sensors = 8.6 million readings/day
  • 1 year = over 3 billion readings

This creates three problems:

  1. Storage explosion: Database grows beyond manageable size
  2. Query death: Simple queries take minutes
  3. Noise over signal: Important patterns buried in repetitive data

Our Strategic Solution - Edge Intelligence:

Instead of collecting everything and processing later (traditional approach), we implement intelligence at the edge:

Traditional: Sensor → Database → Process → Insights (fails at scale)
Our Way: Sensor → Edge Processing → Intelligent Data → Database → Immediate Insights

This means:

  • For Sensors: Don't store every reading. Calculate min/max/avg per minute at the edge. This gives you all the insight with 60x less data.
  • For Counters: Don't store absolute values. Calculate production rates and deltas. This gives immediate business value.
  • For States: Don't store unchanged states. Capture transitions with duration. This gives complete history efficiently.

Why This Philosophy Works:

  • Distributes intelligence across the system
  • Provides value immediately, not after batch processing
  • Continues working even if central systems fail
  • Reduces data volume (and with it infrastructure cost) by 60x

The ERP Data Universe

What It Is:

  • Business context for production (orders, products, standards)
  • Typically updates every few minutes to hours
  • Contains the "why" behind production targets
  • Usually 10-100 queries per minute capacity

The Core Problem: Real-time KPIs need ERP context for every calculation:

  • OEE needs: current order, standard rate, product specs
  • Quality metrics need: specification limits, sampling rules
  • Efficiency needs: target quantities, changeover standards

If we query ERP directly:

  • Each KPI calculation = 200-500ms ERP query
  • 100 machines × 60 calculations/min = 6,000 queries/min
  • Result: ERP crashes, KPIs fail

Our Strategic Solution - Intelligent Caching:

We create a synchronized cache of only the active data needed for real-time decisions:

What We Cache:
- Active production orders (next 24 hours)
- Product specifications for active products
- Quality standards for current production
- Current bill of materials

What We Don't Cache:
- Historical orders (query when needed)
- Inactive products
- Financial data
- Archived records

Refresh Strategy:
- Orders: Every 5 minutes (they rarely change during a shift)
- Products: Daily (specifications are stable)
- Quality: Hourly (may be updated during production)
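
A minimal sketch of this refresh loop; the sync_* job functions are placeholders for the incremental queries described in Part 2:

python
import logging
import time

def sync_active_orders(): ...       # placeholder: incremental order query + upsert
def sync_quality_standards(): ...   # placeholder: quality-spec refresh
def sync_product_specs(): ...       # placeholder: product-master refresh

REFRESH_JOBS = [
    (sync_active_orders,     300),    # orders: every 5 minutes
    (sync_quality_standards, 3600),   # quality: hourly
    (sync_product_specs,     86400),  # products: daily
]

def run_refresh_loop():
    last_run = {job: 0.0 for job, _ in REFRESH_JOBS}
    while True:
        now = time.time()
        for job, interval in REFRESH_JOBS:
            if now - last_run[job] >= interval:
                try:
                    job()  # pull only rows changed since the last sync
                    last_run[job] = now
                except Exception:
                    # Keep serving the slightly stale cache rather than failing KPIs
                    logging.exception("ERP sync failed: %s", job.__name__)
        time.sleep(5)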

Why This Philosophy Works:

  • Enables sub-second KPI calculations
  • Protects ERP from overload
  • Maintains data consistency throughout shift
  • Provides resilience during ERP downtime

The Human Data Universe

What It Is:

  • Quality test results from labs
  • Downtime reasons from operators
  • Manual counts and checks
  • Shift reports and observations

The Core Problem:

  • Data arrives hours after events
  • Inconsistent formats (everyone has their own Excel)
  • No validation (typos, wrong units)
  • Lost context (which machine? when exactly?)

Our Strategic Solution - Structured Collection:

Transform chaos into structure through:

  1. Standardized Templates: Pre-built Excel with validation
  2. Automated Monitoring: Watch folders, process immediately
  3. Web Forms: Mobile-friendly for real-time entry
  4. Smart Reconciliation: Match with automated data

Why This Philosophy Works:

  • Captures knowledge that sensors miss
  • Provides context for anomalies
  • Enables continuous improvement
  • Respects existing workflows

Part 2: Detailed Data Flow Architecture

PLC Data: From Raw Signals to Business Intelligence

The PLC Data Journey

Starting Point: Raw electrical signals in PLC registers
Ending Point: Business-ready metrics in TimescaleDB
Transformation: From meaningless numbers to actionable intelligence

Step 1: Initial Contact (PLC → Node-RED)

What Happens:

PLC Register DB100.DBD0 = 3247 (meaningless number)
                ↓
Node-RED S7 Connection reads every 1 second
                ↓
Node-RED knows: DB100.DBD0 = Temperature Sensor Zone 1
                ↓
Applies scaling: (3247 * 0.1) - 50 = 274.7°C

Critical Details:

  • Connection: Each PLC has an IP address (e.g., 192.168.1.100)
  • Protocol: S7 for Siemens, Modbus for others
  • Polling Rate: Based on data type:
    • Temperatures: 1 second (values change slowly)
    • Counters: 100ms (must catch every piece)
    • States: On change (event-driven)
    • Timers: 10 seconds (accumulated values)
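
Node-RED's S7 nodes handle this read-and-scale step visually. For reference, here is a minimal Python sketch of the same operation using the python-snap7 library (rack/slot and scaling constants are assumptions drawn from the example above):

python
import snap7
from snap7.util import get_dint

# Connect to the PLC from the example (rack 0, slot 1 assumed)
client = snap7.client.Client()
client.connect('192.168.1.100', 0, 1)

# Read 4 bytes at DB100.DBD0 and decode as a double integer
data = client.db_read(100, 0, 4)
raw_value = get_dint(data, 0)        # e.g., 3247

# Apply the same scaling rule Node-RED uses
scaled_value = raw_value * 0.1 - 50  # 3247 -> 274.7 degrees C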

Data Enrichment at Source:

javascript
// In a Node-RED Function node
msg.payload = {
    timestamp: new Date().toISOString(),
    equipment_id: "LINE_01_OVEN_01",
    sensor_type: "temperature",
    tag_name: "zone1_temp",
    raw_value: 3247,
    scaled_value: 274.7,               // (raw * 0.1) - 50
    unit: "celsius",
    quality: 192,                      // OPC quality code (192 = GOOD)
    alarm_limits: {high: 300, low: 250}
};
return msg;                            // Function nodes must return the message

Step 2: Edge Intelligence (Processing in Node-RED)

For Sensor Data:

Raw: 274.7, 274.8, 274.6, 274.9... (60 readings/minute)
                ↓
Edge Processing: Buffer 60 values
                ↓
Calculate: avg=274.75, min=274.6, max=274.9, std_dev=0.13
                ↓
Output: One enriched message per minute

For Counter Data:

Raw: Counter shows 10547
     (100ms later) Counter shows 10548
                ↓
Edge Processing: Delta = 1 piece in 100ms
                ↓
Calculate: Rate = 10 pieces/second = 600 pieces/minute
                ↓
Output: Instantaneous production rate

For State Data:

Previous State: "RUNNING" (for last 45 minutes)
New State: "STOPPED"
                ↓
Edge Processing: State changed!
                ↓
Calculate: Duration in RUNNING = 2700 seconds
                ↓
Output: State transition event with duration
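
In Node-RED these three patterns live in Function nodes; here is the same logic, sketched in Python for clarity (window size follows the one-minute example above):

python
import time
from statistics import mean, pstdev

class SensorAggregator:
    """Buffer raw readings; emit one statistical summary per 60 samples."""
    def __init__(self, window=60):
        self.window, self.buffer = window, []

    def add(self, value):
        self.buffer.append(value)
        if len(self.buffer) < self.window:
            return None                      # keep buffering
        summary = {
            'avg_1min': round(mean(self.buffer), 2),
            'min_1min': min(self.buffer),
            'max_1min': max(self.buffer),
            'std_dev': round(pstdev(self.buffer), 2),
            'sample_count': len(self.buffer),
        }
        self.buffer = []                     # start the next window
        return summary

class CounterDelta:
    """Turn absolute counter values into per-poll production deltas."""
    def __init__(self):
        self.last = None

    def add(self, value):
        delta = None if self.last is None else value - self.last
        self.last = value
        return delta                         # pieces since the last poll

class StateTracker:
    """Emit a transition event (with duration) only when the state changes."""
    def __init__(self):
        self.state, self.since = None, time.time()

    def add(self, new_state):
        if new_state == self.state:
            return None                      # unchanged: store nothing
        event = {'from': self.state, 'to': new_state,
                 'duration_seconds': round(time.time() - self.since, 1)}
        self.state, self.since = new_state, time.time()
        return event

Sixty one-second readings collapse into a single summary row per minute, which is exactly where the 60x reduction comes from.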

Step 3: Reliable Transport (Node-RED → MQTT)

Message Structure:

json
Topic: factory/plc/line01/sensor/zone1_temp
Payload: {
    "timestamp": "2024-01-15T10:30:00Z",
    "equipment_id": "LINE_01_OVEN_01",
    "avg_1min": 274.75,
    "min_1min": 274.6,
    "max_1min": 274.9,
    "std_dev": 0.13,
    "sample_count": 60,
    "quality": "GOOD"
}

Why This Structure:

  • Topic hierarchy enables selective subscription
  • Payload carries the pre-calculated statistics plus sample metadata
  • Timestamp ensures proper ordering
  • Quality flag prevents bad data propagation

Step 4: Central Processing (MQTT → Python → TimescaleDB)

Python Subscriber Logic:

python
import json
import time
import paho.mqtt.client as mqtt

batch, last_flush = [], time.time()

def on_message(client, userdata, mqtt_msg):
    global last_flush
    message = json.loads(mqtt_msg.payload)
    # Validates against business rules
    if message['avg_1min'] > 350 or message['avg_1min'] < 200:
        flag_as_suspicious(message)
    # Enriches with context
    message['shift'] = get_current_shift()
    message['product'] = get_current_product(message['equipment_id'])
    # Batches for efficiency
    batch.append(message)
    if len(batch) >= 1000 or time.time() - last_flush > 10:
        bulk_insert_to_db(batch)
        batch.clear()
        last_flush = time.time()

client = mqtt.Client()
client.on_message = on_message
client.connect('localhost', 1883)
client.subscribe('factory/plc/#')  # wildcard: all PLC topics
client.loop_forever()

Final Storage Structure:

sql
sensor_data table:
time                    | 2024-01-15 10:30:00
equipment_id           | LINE_01_OVEN_01
sensor_name            | zone1_temp
avg_1min               | 274.75
min_1min               | 274.6
max_1min               | 274.9
std_dev                | 0.13
shift                  | SHIFT_A
product_id             | PROD_12345

ERP Data: From Business Logic to Operational Context

The ERP Data Journey

Starting Point: Business transactions in ERP
Ending Point: Cached context ready for real-time joins
Transformation: From slow queries to instant context

Step 1: Identify Critical Data

What We Need from ERP:

Production Orders:
- Order ID, Product ID, Target Quantity
- Standard Rate (pieces/hour)
- Planned Start/End times
- Current Status

Product Specifications:
- Product ID, Name, Category
- Cycle Time (seconds/piece)
- Temperature Requirements
- Quality Specifications

Bill of Materials:
- Product ID → Component IDs
- Component Quantities
- Scrap Allowances

What We DON'T Need:

  • Financial data (costs, prices)
  • HR data (operator names)
  • Historical orders (> 30 days old)
  • Archived products

Step 2: Smart Query Strategy

Instead of: "SELECT * FROM orders" We Do: Incremental queries with business logic

sql
-- Production Orders Query (runs every 5 minutes)
SELECT 
    order_no,
    product_code,
    target_quantity,
    standard_rate_per_hour,
    planned_start_time,
    planned_end_time,
    order_status,
    last_modified_timestamp
FROM production_orders
WHERE 
    -- Only active or upcoming orders
    order_status IN ('RELEASED', 'IN_PROCESS', 'SCHEDULED')
    -- Only if changed since last sync
    AND last_modified_timestamp > :last_sync_time
    -- Only for next 24 hours
    AND planned_start_time < NOW() + INTERVAL '24 hours'
ORDER BY planned_start_time;

Step 3: Transform and Enrich

ERP Returns: Raw business data
We Transform: Into operational context

python
from datetime import datetime

def calculate_duration(target_quantity, rate_per_hour):
    # Minutes needed to produce the target quantity at the standard rate
    return 60 * target_quantity / rate_per_hour

def process_erp_order(erp_record):
    return {
        'order_id': erp_record['order_no'],
        'product_id': erp_record['product_code'],
        'target_quantity': erp_record['target_quantity'],
        'standard_rate_per_hour': erp_record['standard_rate_per_hour'],
        'standard_rate_per_minute': erp_record['standard_rate_per_hour'] / 60,
        'cycle_time_seconds': 3600 / erp_record['standard_rate_per_hour'],
        'planned_duration_minutes': calculate_duration(
            erp_record['target_quantity'],
            erp_record['standard_rate_per_hour']
        ),
        'efficiency_target': 0.85,  # Business rule
        'status': erp_record['order_status'],
        'sync_timestamp': datetime.now()
    }

Step 4: Strategic Caching

Cache Storage Strategy:

sql
-- In same PostgreSQL instance as TimescaleDB
CREATE SCHEMA erp_cache;

-- Current orders (refreshed every 5 min)
CREATE TABLE erp_cache.active_orders (
    order_id TEXT PRIMARY KEY,
    equipment_id TEXT NOT NULL,  -- line the order runs on (used in joins)
    product_id TEXT NOT NULL,
    target_quantity INTEGER,
    standard_rate_per_hour NUMERIC,
    cycle_time_seconds NUMERIC,
    planned_start TIMESTAMPTZ,
    planned_end TIMESTAMPTZ,
    last_sync TIMESTAMPTZ DEFAULT NOW()
);

-- Product specs (refreshed daily)
CREATE TABLE erp_cache.product_specs (
    product_id TEXT PRIMARY KEY,
    product_name TEXT,
    temp_min NUMERIC,
    temp_max NUMERIC,
    pressure_min NUMERIC,
    pressure_max NUMERIC,
    quality_specs JSONB,
    last_sync TIMESTAMPTZ DEFAULT NOW()
);

-- Create indexes for join performance
CREATE INDEX idx_orders_product ON erp_cache.active_orders(product_id);
CREATE INDEX idx_orders_time ON erp_cache.active_orders(planned_start);
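
A minimal sketch of the upsert step that lands transformed orders in this cache (assumes psycopg2, the schema above, and that each order dict carries the equipment assignment and planned times from the ERP query):

python
import psycopg2
from psycopg2.extras import execute_values

UPSERT_ORDERS = """
    INSERT INTO erp_cache.active_orders
        (order_id, equipment_id, product_id, target_quantity,
         standard_rate_per_hour, cycle_time_seconds, planned_start, planned_end)
    VALUES %s
    ON CONFLICT (order_id) DO UPDATE SET
        target_quantity        = EXCLUDED.target_quantity,
        standard_rate_per_hour = EXCLUDED.standard_rate_per_hour,
        cycle_time_seconds     = EXCLUDED.cycle_time_seconds,
        planned_start          = EXCLUDED.planned_start,
        planned_end            = EXCLUDED.planned_end,
        last_sync              = NOW();
"""

def upsert_orders(conn, orders):
    """Write a batch of transformed order dicts into the cache in one round trip."""
    rows = [(o['order_id'], o['equipment_id'], o['product_id'],
             o['target_quantity'], o['standard_rate_per_hour'],
             o['cycle_time_seconds'], o['planned_start'], o['planned_end'])
            for o in orders]
    with conn.cursor() as cur:
        execute_values(cur, UPSERT_ORDERS, rows)
    conn.commit()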

User Data: From Human Knowledge to Digital Intelligence

The User Data Journey

Starting Point: Knowledge in people's heads or Excel files
Ending Point: Validated data linked to automated systems
Transformation: From unstructured to structured intelligence

Step 1: Capture Methods

Method 1: Standardized Excel Templates

Template Structure:
- Protected headers (no renaming)
- Dropdown lists for equipment IDs
- Data validation for ranges
- Required timestamp column
- Auto-generated upload ID

Example: Quality Test Results
| Timestamp | Equipment_ID | Product_ID | Test_Type | Result | Pass_Fail |
|-----------|--------------|------------|-----------|--------|-----------|
| 10:30:00  | LINE_01      | PROD_123   | Thickness | 2.54   | PASS      |
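
A minimal sketch of the folder monitoring that pairs with these templates; the watch path is an assumption, and validate_and_store is the same entry point the web form below uses:

python
import time
from pathlib import Path

import pandas as pd

WATCH_DIR = Path('/data/quality_uploads')        # assumed upload folder
REQUIRED = {'Timestamp', 'Equipment_ID', 'Product_ID',
            'Test_Type', 'Result', 'Pass_Fail'}

def poll_upload_folder(interval_seconds=10):
    seen = set()
    while True:
        for xlsx in WATCH_DIR.glob('*.xlsx'):
            if xlsx in seen:
                continue
            df = pd.read_excel(xlsx)
            if not REQUIRED.issubset(df.columns):
                print(f"Rejected {xlsx.name}: missing required columns")
                seen.add(xlsx)
                continue
            for row in df.to_dict('records'):
                validate_and_store(row)          # shared validation pipeline
            seen.add(xlsx)
        time.sleep(interval_seconds)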

Method 2: Web Forms (Mobile-Friendly)

python
# Flask web form endpoint for downtime entry
from datetime import datetime

from flask import Flask, request, session

app = Flask(__name__)

@app.route('/downtime', methods=['POST'])
def record_downtime():
    data = {
        'timestamp': request.form['timestamp'],
        'equipment_id': request.form['equipment_id'],
        'duration_minutes': request.form['duration'],
        'reason_category': request.form['category'],
        'reason_detail': request.form['detail'],
        'operator_id': session['operator_id'],
        'entry_timestamp': datetime.now()
    }
    validate_and_store(data)
    return {'status': 'recorded'}, 201

Method 3: Direct Database Entry

  • For repetitive data (e.g., hourly checks)
  • Pre-populated forms with previous values
  • Automatic timestamp generation

Step 2: Validation and Enrichment

Multi-Level Validation:

python
from datetime import datetime, timedelta

def validate_user_input(data):
    # Level 1: Format validation
    if not is_valid_timestamp(data['timestamp']):
        raise ValidationError("Invalid timestamp format")
    
    # Level 2: Business rule validation
    if data['equipment_id'] not in get_valid_equipment_ids():
        raise ValidationError(f"Unknown equipment: {data['equipment_id']}")
    
    # Level 3: Sanity checks
    time_diff = datetime.now() - data['timestamp']
    if time_diff > timedelta(hours=24):
        raise ValidationError("Data too old (>24 hours)")
    
    # Level 4: Cross-reference with automated data
    if not equipment_was_running(data['equipment_id'], data['timestamp']):
        flag_for_review("Test performed while equipment stopped")
    
    return data  # validated entry, ready for storage

Step 3: Reconciliation with Automated Data

Matching Process:

python
from datetime import timedelta

def reconcile_manual_with_automated(manual_entry):
    # Find corresponding automated data
    window_start = manual_entry['timestamp'] - timedelta(minutes=5)
    window_end = manual_entry['timestamp'] + timedelta(minutes=5)
    
    automated_data = query_plc_data(
        equipment_id=manual_entry['equipment_id'],
        time_range=(window_start, window_end)
    )
    
    # Link the data
    manual_entry['linked_production_count'] = automated_data['production_count']
    manual_entry['linked_machine_state'] = automated_data['machine_state']
    manual_entry['linked_sensor_values'] = automated_data['sensor_values']
    
    # Calculate confidence score
    if automated_data['machine_state'] == 'RUNNING':
        manual_entry['confidence_score'] = 0.95
    else:
        manual_entry['confidence_score'] = 0.70  # Lower confidence
    
    return manual_entry

Step 4: Storage with Full Context

Final Storage Structure:

sql
CREATE TABLE user_input.quality_tests (
    entry_id UUID PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    equipment_id TEXT NOT NULL,
    product_id TEXT NOT NULL,
    test_type TEXT NOT NULL,
    test_result NUMERIC,
    pass_fail BOOLEAN,
    -- Linked automated data
    linked_production_count INTEGER,
    linked_machine_state TEXT,
    linked_temp_avg NUMERIC,
    -- Metadata
    entered_by TEXT,
    entry_timestamp TIMESTAMPTZ,
    confidence_score NUMERIC,
    validation_flags TEXT[]
);

How These Three Streams Combine in Layer 2

The Convergence Point

In Layer 2, these three data streams merge to create complete operational intelligence:

sql
-- Example: Real-time OEE Calculation combining all three sources
WITH production_data AS (
    -- From PLC data
    SELECT 
        time_bucket('1 hour', time) as hour,
        equipment_id,
        SUM(pieces_delta) as actual_production,
        SUM(CASE WHEN state = 'RUNNING' THEN duration ELSE 0 END) as run_time
    FROM plc_data.production_data
    WHERE time > NOW() - INTERVAL '1 hour'
    GROUP BY hour, equipment_id
),
order_context AS (
    -- From ERP cache
    SELECT 
        equipment_id,
        standard_rate_per_hour,
        target_quantity,
        product_id
    FROM erp_cache.active_orders
    WHERE NOW() BETWEEN planned_start AND planned_end
),
quality_data AS (
    -- From user input
    SELECT 
        equipment_id,
        COUNT(CASE WHEN pass_fail = true THEN 1 END) as passed,
        COUNT(*) as total_tested
    FROM user_input.quality_tests
    WHERE timestamp > NOW() - INTERVAL '1 hour'
    GROUP BY equipment_id
)
-- Combine all three for complete OEE
SELECT 
    p.equipment_id,
    p.hour,
    -- Availability (PLC data)
    (p.run_time / 3600.0) as availability,
    -- Performance (PLC + ERP data)
    (p.actual_production / (o.standard_rate_per_hour * p.run_time / 3600.0)) as performance,
    -- Quality (PLC + User data); defaults to 1.0 when nothing was tested
    COALESCE(q.passed::numeric / NULLIF(q.total_tested, 0), 1.0) as quality,
    -- Combined OEE
    (p.run_time / 3600.0) * 
    (p.actual_production / (o.standard_rate_per_hour * p.run_time / 3600.0)) * 
    COALESCE(q.passed::numeric / NULLIF(q.total_tested, 0), 1.0) as oee,
    -- Context for gap analysis
    o.product_id,
    o.standard_rate_per_hour,
    p.actual_production
FROM production_data p
JOIN order_context o ON p.equipment_id = o.equipment_id
LEFT JOIN quality_data q ON p.equipment_id = q.equipment_id;

The Intelligence Creation

Layer 2 uses these combined streams to:

  1. Calculate Compound KPIs:
    • OEE = Availability × Performance × Quality
    • Where each component comes from different sources
  2. Identify Gaps:
    • Expected production (ERP) vs Actual (PLC)
    • Standard quality (ERP) vs Tested quality (User)
    • Planned runtime (ERP) vs Actual runtime (PLC)
  3. Correlate Events:
    • Temperature spike (PLC) → Quality failure (User)
    • Order change (ERP) → Production rate drop (PLC)
    • Maintenance entry (User) → Improved performance (PLC)
  4. Enable Predictions:
    • Historical patterns from all three sources
    • Feed into AI models in Layer 4

This convergence transforms three separate data streams into unified intelligence, enabling the Factory Intelligence System to identify and quantify gaps for systematic improvement.

The Technology Stack Architecture

We chose each technology for specific strategic reasons:

Node-RED (PLC Connectivity)

  • Why: Visual programming perfect for maintenance teams
  • Alternative considered: Direct Python scripts
  • Decision: Node-RED wins on maintainability and protocol support

MQTT/Mosquitto (Message Transport)

  • Why: Decouples producers from consumers
  • Alternative considered: Direct database writes
  • Decision: MQTT wins on reliability and flexibility

TimescaleDB (Time-Series Storage)

  • Why: Optimized for time-series queries
  • Alternative considered: Regular PostgreSQL
  • Decision: TimescaleDB wins on automatic data management

PostgreSQL (ERP Cache)

  • Why: TimescaleDB runs as a PostgreSQL extension, so the ERP cache shares the same instance and joins directly with time-series data
  • Alternative considered: Separate cache database
  • Decision: Same instance wins on simplicity

Python (Orchestration)

  • Why: Flexibility and library ecosystem
  • Alternative considered: Java, Go
  • Decision: Python wins on rapid development

Part 3: Critical Design Decisions

Decision 1: Edge Processing vs Central Processing

Options Considered:

  1. Collect everything, process centrally
  2. Process everything at edge
  3. Hybrid approach (our choice)

Why Hybrid:

  • Simple calculations at edge (rates, averages)
  • Complex analytics centrally (patterns, predictions)
  • Best of both worlds

Decision 2: All TimescaleDB vs Mixed Storage

Options Considered:

  1. Current in Redis, History in TimescaleDB
  2. Current in PostgreSQL, History in TimescaleDB
  3. Everything in TimescaleDB (our choice)

Why All TimescaleDB:

  • Materialized views give Redis-like performance (see the sketch below)
  • Single query interface
  • Time-travel queries possible
  • Simpler architecture
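
A sketch of the kind of continuous aggregate behind that claim (view and column names are illustrative; TimescaleDB requires this DDL to run outside a transaction, hence autocommit):

python
import psycopg2

# Illustrative: hourly roll-up of the sensor_data hypertable
CREATE_CAGG = """
CREATE MATERIALIZED VIEW sensor_data_1h
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       equipment_id,
       avg(avg_1min) AS avg_value,
       min(min_1min) AS min_value,
       max(max_1min) AS max_value
FROM sensor_data
GROUP BY bucket, equipment_id;
"""

conn = psycopg2.connect("dbname=factory")
conn.autocommit = True  # continuous aggregates cannot be created inside a transaction
with conn.cursor() as cur:
    cur.execute(CREATE_CAGG)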

Decision 3: MQTT vs Direct Database

Options Considered:

  1. Node-RED writes directly to database
  2. Message queue (Kafka/RabbitMQ)
  3. MQTT broker (our choice)

Why MQTT:

  • Perfect for sensor data patterns
  • Extremely lightweight
  • Built-in QoS levels
  • Industry standard for IoT

Decision 4: Intelligent Caching Strategy

Options Considered:

  1. No caching (direct ERP queries)
  2. Cache everything
  3. Smart caching of active data (our choice)

Why Smart Caching:

  • Balances performance and freshness
  • Minimizes storage requirements
  • Reduces sync complexity
  • Maintains data quality

Part 4: Implementation Methodology

Phase 1: Foundation (Week 1-2)

  1. Set up TimescaleDB with proper schemas
  2. Install Mosquitto broker
  3. Deploy first Node-RED instance
  4. Create basic Python collector
  5. Verify data flow end-to-end

Phase 2: Single Line Pilot (Week 3-4)

  1. Connect one production line completely
  2. Implement all four PLC data types
  3. Set up ERP synchronization
  4. Create first materialized views
  5. Validate data quality

Phase 3: Intelligence Layer (Week 5-6)

  1. Add edge calculations
  2. Implement batching strategies
  3. Create continuous aggregates
  4. Add data quality gates
  5. Set up monitoring

Phase 4: Scale Out (Week 7-8)

  1. Replicate to other lines
  2. Optimize performance
  3. Implement retention policies
  4. Add backup procedures
  5. Document everything

Part 5: Operational Excellence

Data Quality Assurance

At Collection:

  • Validate sensor ranges
  • Check timestamp sanity
  • Verify equipment IDs exist
  • Flag suspicious values

At Processing:

  • Business rule validation
  • Referential integrity
  • Duplicate detection
  • Completeness checks

At Storage:

  • Constraint enforcement
  • Index optimization
  • Partition management
  • Compression verification

Performance Optimization

Edge Level:

  • Buffer size tuning
  • Calculation optimization
  • Network retry logic
  • Local caching

Transport Level:

  • MQTT QoS settings
  • Topic hierarchy design
  • Message size limits
  • Broker clustering

Database Level:

  • Proper indexes
  • Partition sizing
  • Continuous aggregates
  • Query optimization

Monitoring and Alerting

Key Metrics:

  1. Data latency (target: <5 seconds)
  2. Message queue depth (target: <1000)
  3. Failed message rate (target: <0.1%)
  4. Query performance (target: <100ms)
  5. Storage growth rate

Alert Triggers:

  • No data from equipment > 5 minutes
  • ERP sync failure > 2 cycles
  • Database storage > 80%
  • Query time > 1 second
  • Error rate > 1%
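
A minimal sketch of the first trigger in this list, assuming the sensor_data table from Part 2 and a generic send_alert helper:

python
import psycopg2

STALE_QUERY = """
    SELECT equipment_id, MAX(time) AS last_seen
    FROM sensor_data
    GROUP BY equipment_id
    HAVING MAX(time) < NOW() - INTERVAL '5 minutes';
"""

def check_stale_equipment(conn, send_alert):
    """Raise one alert per piece of equipment silent for more than 5 minutes."""
    with conn.cursor() as cur:
        cur.execute(STALE_QUERY)
        for equipment_id, last_seen in cur.fetchall():
            send_alert(f"No data from {equipment_id} since {last_seen}")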

Part 6: Value Delivery

Immediate Benefits (Day 1)

  • Real-time visibility into production
  • Automatic KPI calculation
  • Reduced manual data entry
  • Alert on anomalies

Short-term Benefits (Month 1)

  • Historical trending
  • Pattern identification
  • Predictive maintenance indicators
  • Quality correlation

Long-term Benefits (Month 3+)

  • Full gap discovery engine
  • Automated optimization
  • Predictive analytics
  • Strategic insights

Part 7: Common Pitfalls and Solutions

Pitfall 1: "We'll Process Later"

Symptom: Collecting raw data thinking you'll aggregate later
Result: Database explosion, slow queries
Solution: Process at edge from day one

Pitfall 2: "Our ERP is Fast"

Symptom: Direct ERP queries for real-time data
Result: ERP overload, system failures
Solution: Implement intelligent caching

Pitfall 3: "Storage is Cheap"

Symptom: No retention policy
Result: Queries become impossibly slow
Solution: Define lifecycle from start

Pitfall 4: "Manual Data Isn't Important"

Symptom: Ignoring human inputs
Result: Missing critical context
Solution: Structured collection strategy

Pitfall 5: "We Don't Need Buffering"

Symptom: Assuming network is reliable
Result: Data loss during outages
Solution: Buffer at every level

The Strategic Outcome

When Layer 1 is complete, you have:

  1. Unified Data Platform: All three data types in one queryable system
  2. Real-time Intelligence: Business metrics calculated at edge
  3. Historical Context: Every event stored efficiently
  4. Business Alignment: ERP context attached to operational data
  5. Human Insights: Manual inputs integrated seamlessly

This isn't just data collection - it's the foundation for the entire Factory Intelligence System. Every subsequent layer depends on the quality, completeness, and timeliness of Layer 1 data.

Success Criteria

Layer 1 is successful when:

  • Operators trust the data (>95% accuracy)
  • Queries return instantly (<100ms)
  • No data loss during outages
  • ERP never overloaded
  • Storage grows predictably

Part 8: Preparing for Layer 2 - The Convergence

What Layer 2 Receives from Layer 1

Layer 2 inherits a rich, unified data platform with three synchronized streams:

  1. Real-Time Operational Stream (from PLCs)
    • Production rates calculated every minute
    • Equipment states with duration tracking
    • Sensor readings with statistical aggregates
    • Quality indicators from inspection systems
  2. Business Context Stream (from ERP)
    • Active production orders with targets
    • Product specifications and requirements
    • Standard rates and efficiency targets
    • Quality specifications and limits
  3. Human Intelligence Stream (from Users)
    • Quality test results with timestamps
    • Downtime reasons and durations
    • Maintenance actions and outcomes
    • Operator observations and adjustments

How Layer 2 Will Process Combined Data

Real-Time KPI Engine

Layer 2 will run continuous queries that join all three streams:

sql
-- Example: Real-time efficiency with full context
CREATE MATERIALIZED VIEW realtime_efficiency AS
WITH current_production AS (
    SELECT 
        p.equipment_id,
        p.rate_per_hour as actual_rate,
        s.state_value as current_state,
        s.duration_seconds as state_duration
    FROM plc_data.current_production p
    JOIN plc_data.current_state s ON p.equipment_id = s.equipment_id
),
active_orders AS (
    SELECT 
        equipment_id,
        product_id,
        standard_rate_per_hour as target_rate,
        cycle_time_seconds
    FROM erp_cache.active_orders
    WHERE NOW() BETWEEN planned_start AND planned_end
),
recent_quality AS (
    SELECT 
        equipment_id,
        AVG(CASE WHEN pass_fail THEN 1 ELSE 0 END) as quality_rate
    FROM user_input.quality_tests
    WHERE timestamp > NOW() - INTERVAL '1 hour'
    GROUP BY equipment_id
)
SELECT 
    cp.equipment_id,
    cp.actual_rate,
    ao.target_rate,
    (cp.actual_rate / ao.target_rate) as performance_ratio,
    cp.current_state,
    cp.state_duration,
    rq.quality_rate,
    ao.product_id,
    NOW() as calculation_time
FROM current_production cp
JOIN active_orders ao ON cp.equipment_id = ao.equipment_id
LEFT JOIN recent_quality rq ON cp.equipment_id = rq.equipment_id;

Gap Discovery Preparation

Layer 2 will identify gaps by comparing:

  1. Performance Gaps:
    • Theoretical maximum (from equipment specs)
    • Standard rate (from ERP)
    • Actual rate (from PLC)
    • Best demonstrated rate (from historical data)
  2. Quality Gaps:
    • Specification limits (from ERP)
    • Actual measurements (from sensors)
    • Test results (from users)
    • First-pass yield trends
  3. Availability Gaps:
    • Scheduled time (from ERP)
    • Actual runtime (from PLC)
    • Downtime reasons (from users)
    • Maintenance impact (from both PLC and users)

Pattern Detection Setup

The combined data enables pattern discovery:

python
# Layer 2 will correlate across streams. Illustrative pseudocode only:
# correlate() and the stream accessors are placeholders for Layer 2 logic.
def identify_patterns():
    # Pattern 1: Temperature impact on quality
    correlate(
        plc_data.sensor_data.temperature,
        user_input.quality_tests.pass_fail
    )
    
    # Pattern 2: Order changes impact on efficiency
    correlate(
        erp_cache.order_changes,
        plc_data.production_rates
    )
    
    # Pattern 3: Operator actions impact on performance
    correlate(
        user_input.operator_adjustments,
        plc_data.efficiency_metrics
    )

The Data Quality Foundation

Layer 1 ensures Layer 2 receives:

  1. Synchronized Timestamps: All three sources aligned to same time base
  2. Consistent Identifiers: Equipment IDs match across all systems
  3. Validated Data: Quality flags and confidence scores included
  4. Complete Context: Every data point has business meaning attached
  5. Historical Depth: Patterns visible across time ranges

The Intelligence Multiplication

When Layer 2 processes this unified data:

Single Source Insight: "Machine stopped for 10 minutes"

Combined Intelligence: "Machine stopped for 10 minutes during order changeover from Product A to Product B, operator noted material shortage, resulting in 150 pieces of lost production worth $450 based on current order margins"

This is the power of Layer 1's unified data platform - it transforms isolated data points into comprehensive business intelligence.

Conclusion

Layer 1 creates the foundation for the entire Factory Intelligence System by:

  1. Collecting data from three distinct sources with appropriate strategies
  2. Processing data at the edge for immediate value
  3. Unifying all streams in a single, time-series optimized platform
  4. Enriching technical data with business context
  5. Preparing everything for advanced analytics in subsequent layers

The success of the entire Factory Intelligence System depends on Layer 1's ability to deliver complete, accurate, timely, and contextualized data. With this foundation in place, Layer 2 can begin the real work of gap discovery and value creation.
