Memory Optimization
Handle large datasets efficiently with lazy generation mode.
When to Use Memory Optimization
Enable memory optimization when:
- Generating a large number of items (e.g. > 100,000)
- Working with limited memory
- Streaming data to external systems
- Processing data in batches
Enabling Memory Optimization
Generation generation = DslDataGenerator.create()
    .withMemoryOptimization() // Enable lazy mode
    .withSeed(123L)
    .fromJsonString(dsl)
    .generate();
Streaming Collections
Process collections one item at a time:
generation.streamJsonNodes("users").forEach(user -> {
    // Process each user as JsonNode
    System.out.println(user.get("name").asText());
    // Or save to database, write to file, etc.
});
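As the eager-mode example below shows, streamJsonNodes returns a standard java.util.stream.Stream, so the usual intermediate operations (filter, map, limit) compose with generated data. A minimal sketch, using a plain stream of names as a stand-in for the generated JsonNodes:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Stand-in for generation.streamJsonNodes("users") -- any Stream behaves the same.
Stream<String> users = Stream.of("Ada", "Grace", "Linus", "Alan");

List<String> aNames = users
    .filter(name -> name.startsWith("A")) // keep a subset
    .limit(10)                            // cap how many items are consumed
    .collect(Collectors.toList());
// aNames is ["Ada", "Alan"]
```

Because limit short-circuits, a lazy source only ever produces as many items as the pipeline consumes.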
Eager vs Lazy Mode
Eager Mode (Default)
Generation generation = DslDataGenerator.create()
    .fromJsonString(dsl)
    .generate();
// Stream data (each returned Stream is single-use, but the method can be called repeatedly)
generation.streamJsonNodes("users").forEach(user -> process(user));
// Or get all streams at once
Map<String, Stream<JsonNode>> allData = generation.asJsonNodes();
Pros:
- Fast random access
- Can call streaming methods multiple times
- Simpler to use
Cons:
- High memory usage
- Slow for large datasets
- All data in memory
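Because eager mode holds the full dataset in memory, repeated streaming is reproducible. A rough analogy with a plain in-memory list:

```java
import java.util.List;
import java.util.stream.Collectors;

// Eager mode keeps everything in memory, like a List;
// each new stream over it sees the same items.
List<String> data = List.of("a", "b", "c");
List<String> pass1 = data.stream().collect(Collectors.toList());
List<String> pass2 = data.stream().collect(Collectors.toList());
// pass1 and pass2 are identical
```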
Lazy Mode
Generation generation = DslDataGenerator.create()
    .withMemoryOptimization()
    .fromJsonString(dsl)
    .generate();
// Data generated on demand
generation.streamJsonNodes("users").forEach(user -> process(user));
Pros:
- Low memory usage
- Handles huge datasets
- Efficient streaming
Cons:
- Streaming the same collection multiple times yields different results
- No random access
- Slightly slower per item
Example: Large Dataset
String dsl = """
    {
      "users": {
        "count": 1000000,
        "item": {
          "id": {"gen": "uuid"},
          "name": {"gen": "name.fullName"},
          "email": {"gen": "internet.emailAddress"}
        }
      }
    }
    """;
Generation generation = DslDataGenerator.create()
    .withMemoryOptimization()
    .fromJsonString(dsl)
    .generate();

// Process in batches
List<JsonNode> batch = new ArrayList<>();
generation.streamJsonNodes("users").forEach(user -> {
    batch.add(user);
    if (batch.size() >= 1000) {
        saveBatch(batch);
        batch.clear();
    }
});
// Save remaining
if (!batch.isEmpty()) {
    saveBatch(batch);
}
Streaming to Database
Generation generation = DslDataGenerator.create()
    .withMemoryOptimization()
    .fromJsonString(dsl)
    .generate();

try (Connection conn = dataSource.getConnection();
     PreparedStatement stmt = conn.prepareStatement(
         "INSERT INTO users (id, name, email) VALUES (?, ?, ?)")) {
    AtomicInteger pending = new AtomicInteger();
    generation.streamJsonNodes("users").forEach(user -> {
        try {
            stmt.setString(1, user.get("id").asText());
            stmt.setString(2, user.get("name").asText());
            stmt.setString(3, user.get("email").asText());
            stmt.addBatch();
            // Flush periodically so the JDBC batch does not grow unbounded,
            // which would defeat the point of streaming
            if (pending.incrementAndGet() % 1000 == 0) {
                stmt.executeBatch();
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    });
    stmt.executeBatch(); // flush the remainder
}
Streaming to File
Generation generation = DslDataGenerator.create()
    .withMemoryOptimization()
    .fromJsonString(dsl)
    .generate();

try (BufferedWriter writer = Files.newBufferedWriter(Path.of("output.json"))) {
    writer.write("[\n");
    AtomicBoolean first = new AtomicBoolean(true);
    generation.streamJsonNodes("users").forEach(user -> {
        try {
            if (!first.getAndSet(false)) {
                writer.write(",\n");
            }
            writer.write("  " + user.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
    writer.write("\n]");
}
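A variant worth considering (an assumption, not a feature of the API above): writing JSON Lines, one object per line, avoids the array brackets and first-element bookkeeping entirely. Sketched with a StringWriter and literal JSON strings standing in for the generated nodes:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.stream.Stream;

StringWriter out = new StringWriter(); // stands in for a file writer
try (BufferedWriter writer = new BufferedWriter(out)) {
    // Stand-in for generation.streamJsonNodes("users")
    Stream.of("{\"id\":1}", "{\"id\":2}").forEach(user -> {
        try {
            writer.write(user);
            writer.newLine(); // one JSON object per line, no separators needed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
}
// out now holds one JSON object per line
```

Most log and data pipelines (and tools like jq) consume this format directly.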
Best Practices
- Use for Large Datasets: Enable for > 100,000 items
- Stream Once: In lazy mode, treat each collection as single-pass; streaming it again yields different items
- Batch Processing: Process items in batches for efficiency
- Resource Management: Use try-with-resources for connections, writers, and other external resources
- Error Handling: Handle errors gracefully during streaming; wrap checked exceptions inside lambdas
Limitations
- Cannot iterate collections multiple times
- No random access to items
- References may behave differently
- Some operations require eager mode
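If a workflow needs multiple passes or random access over part of a lazy collection, one workaround (a sketch, not a library feature) is to materialize a bounded sample into a list and reuse that:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Stand-in for a huge lazy stream such as streamJsonNodes("users")
Stream<Integer> users = IntStream.range(0, 1_000_000).boxed();

// Materialize only what fits comfortably in memory...
List<Integer> sample = users.limit(10_000).collect(Collectors.toList());

// ...then iterate it as many times as needed, with random access.
int firstId = sample.get(0);
long count = sample.stream().count();
```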
Next Steps
- Java API - Full API documentation
- Custom Generators - Create your own generators
- DSL Reference - Complete DSL syntax