dev-resources.site

for different kinds of informations.

Something could double the development efficiency of Java programmers

Published at

9/6/2024

Categories

4 categories in total

Author

10 person written this

esproc_spl

open

Something could double the development efficiency of Java programmers

Computing dilemma in the application
Development and Framework, which should be given the higher priority?
Java is the most commonly used programming language in application development. But writing code to process data in Java isn’t simple. For example, below is the Java code for performing grouping & aggregation on two fields:

Map<Integer, Map<String, Double>> summary = new HashMap<>();
    for (Order order : orders) {
        int year = order.orderDate.getYear();
        String sellerId = order.sellerId;
        double amount = order.amount;
        Map<String, Double> salesMap = summary.get(year);
        if (salesMap == null) {
            salesMap = new HashMap<>();
            summary.put(year, salesMap);
        }
        Double totalAmount = salesMap.get(sellerId);
        if (totalAmount == null) {
            totalAmount = 0.0;
        }
        salesMap.put(sellerId, totalAmount + amount);
    }
    for (Map.Entry<Integer, Map<String, Double>> entry : summary.entrySet()) {
        int year = entry.getKey();
        Map<String, Double> salesMap = entry.getValue();
        System.out.println("Year: " + year);
        for (Map.Entry<String, Double> salesEntry : salesMap.entrySet()) {
            String sellerId = salesEntry.getKey();
            double totalAmount = salesEntry.getValue();
            System.out.println("  Seller ID: " + sellerId + ", Total Amount: " + totalAmount);
        }
    }

By contrast, the SQL counterpart is much simpler. One GROUP BY clause is enough to close the computation.

SELECT year(orderdate),sellerid,sum(amount) FROM orders GROUP BY year(orderDate),sellerid

Indeed, early applications worked with the collaboration of Java and SQL. The business process was implemented in Java at the application side, and data was processed in SQL in the backend database. The framework was difficult to expand and migrate due to database limitations. This was very unfriendly to the contemporary applications. Moreover, on many occasions SQL was unavailable when there were no databases or cross-database computations were involved.

In view of this, later many applications began to adopt a fully Java-based framework, where databases only do simple read and write operations and business process and data processing are implemented in Java at the application side, particularly when microservices emerged. This way the application is decoupled from databases and gets good scalability and migratability, which helps gain framework advantages while having to face the Java development complexity mentioned previously.

It seems that we can only focus on one aspect – development or framework. To enjoy advantages of the Java framework, one must endure difficulty development; and to use SQL, one need to tolerate shortcomings of the framework. This creates a dilemma.

Then what can we do?
What about enhancing Java's data processing capabilities? This not only avoids SQL problems, but also overcomes Java shortcomings.

Actually, Java Stream/Kotlin/Scala are all trying to do so.

Stream

The Stream introduced in Java 8 added many data processing methods. Here is the Stream code for implementing the above computation:

Map<Integer, Map<String, Double>> summary = orders.stream()
            .collect(Collectors.groupingBy(
                    order -> order.orderDate.getYear(),
                    Collectors.groupingBy(
                            order -> order.sellerId,
                            Collectors.summingDouble(order -> order.amount)
                    )
            ));

    summary.forEach((year, salesMap) -> {
        System.out.println("Year: " + year);
        salesMap.forEach((sellerId, totalAmount) -> {
            System.out.println("  Seller ID: " + sellerId + ", Total Amount: " + totalAmount);
        });
    });

Stream indeed simplifies the code in some degree. But overall, it is still cumbersome and far less concise than SQL.

Kotlin

Kotlin, which claimed to be more powerful, improved furtherly:

val summary = orders
        .groupBy { it.orderDate.year }
        .mapValues { yearGroup ->
            yearGroup.value
                .groupBy { it.sellerId }
                .mapValues { sellerGroup ->
                    sellerGroup.value.sumOf { it.amount }
                }
        }
    summary.forEach { (year, salesMap) ->
        println("Year: $year")
        salesMap.forEach { (sellerId, totalAmount) ->
            println("  Seller ID: $sellerId, Total Amount: $totalAmount")
        }
    }

The Kotlin code is simpler, but the improvement is limited. There is still a big gap compared with SQL.

Scala

Then there was Scala:

val summary = orders
        .groupBy(order => order.orderDate.getYear)
        .mapValues(yearGroup =>
            yearGroup
                .groupBy(_.sellerId)
                .mapValues(sellerGroup => sellerGroup.map(_.amount).sum)
    )
    summary.foreach { case (year, salesMap) =>
        println(s"Year: $year")
        salesMap.foreach { case (sellerId, totalAmount) =>
            println(s"  Seller ID: $sellerId, Total Amount: $totalAmount")
        }
    }

Scala is a bit simpler than Kotlin, but still cannot be compared to SQL. In addition, Scala is too heavy and inconvenient to use.

In fact, these technologies are on the right path, though they are not perfect.

Compiled languages are non-hot-swappable
Additionally, Java, being a compiled language, lacks support for hot swapping. Modifying code necessitates recompilation and redeployment, often requiring service restarts. This results in a suboptimal experience when facing frequent changes in requirements. In contrast, SQL has no problem in this regard.

Java development is complicated, and there are also shortcomings in the framework. SQL has difficulty in meeting the requirements for framework. The dilemma is difficult to solve. Is there any other way?

The ultimate solution – esProc SPL
esProc SPL is a data processing language developed purely in Java. It has simple development and a flexible framework.

Concise syntax
Let’s review the Java implementations for the above grouping and aggregation operation:

Compare with the Java code, the SPL code is much more concise:

Orders.groups(year(orderdate),sellerid;sum(amount))

It is as simple as the SQL implementation:

SELECT year(orderdate),sellerid,sum(amount) FROM orders GROUP BY year(orderDate),sellerid

In fact, SPL code is often simpler than its SQL counterpart. With support for order-based and procedural computations, SPL is better at performing complex computations. Consider this example: compute the maximum number of consecutive rising days of a stock. SQL needs the following three-layer nested statement, which is difficult to understand, not to mention write.

select max(continuousDays)-1
  from (select count(*) continuousDays
    from (select sum(changeSign) over(order by tradeDate) unRiseDays
       from (select tradeDate,
          case when closePrice>lag(closePrice) over(order by tradeDate)
          then 0 else 1 end changeSign
          from stock) )
group by unRiseDays)

SPL implements the computation with just one line of code. This is even much simpler than SQL code, not to mention the Java code.

stock.sort(tradeDate).group@i(price&lt;price[-1]).max(~.len())

Comprehensive, independent computing capability
SPL has table sequence – the specialized structured data object, and offers a rich computing class library based on table sequences to handle a variety of computations, including the commonly seen filtering, grouping, sorting, distinct and join, as shown below:

Orders.sort(Amount) // Sorting
Orders.select(Amount*Quantity>3000 && like(Client,"*S*")) // Filtering
Orders.groups(Client; sum(Amount)) // Grouping
Orders.id(Client) // Distinct
join(Orders:o,SellerId ; Employees:e,EId) // Join
……

More importantly, the SPL computing capability is independent of databases; it can function even without a database, which is unlike the ORM technology that requires translation into SQL for execution.

Efficient and easy to use IDE
Besides concise syntax, SPL also has a comprehensive development environment offering debugging functionalities, such as “Step over” and “Set breakpoint”, and very debugging-friendly WYSIWYG result viewing panel that lets users check result for each step in real time.

Support for large-scale data computing
SPL supports processing large-scale data that can or cannot fit into the memory.

In-memory computation:

External memory computation:

We can see that the SPL code of implementing an external memory computation and that of implementing an in-memory computation is basically the same, without extra computational load.

It is easy to implement parallelism in SPL. We just need to add @m option to the serial computing code. This is far simpler than the corresponding Java method.

Seamless integration into Java applications
SPL is developed in Java, so it can work by embedding its JARs in the Java application. And the application executes or invokes the SPL script via the standard JDBC. This makes SPL very lightweight, and it can even run on Android.

Call SPL code through JDBC:

Class.forName("com.esproc.jdbc.InternalDriver");
    con= DriverManager.getConnection("jdbc:esproc:local://");
    st =con.prepareCall("call SplScript(?)");
    st.setObject(1, "A");
    st.execute();
    ResultSet rs = st.getResultSet();
    ResultSetMetaData rsmd = rs.getMetaData();

As it is lightweight and integration-friendly, SPL can be seamlessly integrated into mainstream Java frameworks, especially suitable for serving as a computing engine within microservice architectures.

Highly open framework
SPL’s great openness enables it to directly connect to various types of data sources and perform real-time mixed computations, making it easy to handle computing scenarios where databases are unavailable or multiple/diverse databases are involved.

Regardless of the data source, SPL can read data from it and perform the mixed computation as long as it is accessible. Database and database, RESTful and file, JSON and database, anything is fine.

Databases:

RESTful and file:

JSON and database:

Interpreted execution and hot-swapping
SPL is an interpreted language that inherently supports hot swapping while power remains switched on. Modified code takes effect in real-time without requiring service restarts. This makes SPL well adapt to dynamic data processing requirements.

This hot—swapping capability enables independent computing modules with separate management, maintenance and operation, creating more flexible and convenient uses.

SPL can significantly increase Java programmers’ development efficiency while achieving framework advantages. It combines merits of both Java and SQL, and further simplifies code and elevates performance.

SPL open source address

esproc Article's

30 articles in total

Add records that meet the criteria before each group after grouping ：From SQL to SPL

read article

Multi combination condition grouping and aggregation #eg93

read article

Split a Huge CSV File into Multiple Smaller CSV Files #eg69

read article

Group & Summarize a CSV File #eg68

read article

Getting positions of members according to primary key values #eg58