Exploring the Data Analysis: From Python Certification to the Elixir Challenge - Mean-Variance-Standard Deviation Calculator

Published at: 12/27/2023
Categories: freecodecamp, elixir, datascience, machinelearning
Author: Herminio Torres
Recently, I embarked on a learning journey into Data Analysis and Data Science. To solidify my knowledge of tools such as Python, NumPy, Pandas, Matplotlib, and Seaborn, I took the Data Analysis course at freeCodeCamp, aiming to earn my first certification.

At the same time, I introduced myself to the Elixir Nx ecosystem. There I explored tools like Nx, Explorer, and VegaLite, which promise to add a unique dimension to my data analyses.

I will share a detailed tutorial in an upcoming post, providing practical insight into their application.

After completing the certification with Python, I wanted a new challenge, so I decided to tackle the same data analysis problems using the Elixir toolset exclusively.

My first project in this new phase is the Mean-Variance-Standard Deviation Calculator. It is an exciting opportunity to apply newly acquired concepts and to see how well Elixir handles statistical analysis tasks.

By sharing this experience, I hope to document my progress and encourage other enthusiasts to explore multiple languages and tools on their data analysis journey. I am eager to share the results and lessons learned throughout this exciting transition from Python to Elixir. Stay tuned for more updates and practical tutorials!

Setup

Mix.install([
  {:nx, "~> 0.6.4"}
])

Challenge

Mean-Variance-Standard Deviation Calculator.

Create a function named calculate() in the MeanVarStd module that uses Elixir Nx to output the mean, variance, standard deviation, max, min, and sum of the rows, columns, and elements in a 3 x 3 matrix.

The input of the function should be a list containing 9 digits. The function should convert the list into a 3 x 3 numerical array, and then return a map containing the mean, variance, standard deviation, max, min, and sum along both axes and for the flattened matrix.

The returned map should follow this format:

%{
  mean: [axis1, axis2, flattened],
  variance: [axis1, axis2, flattened],
  standard_deviation: [axis1, axis2, flattened],
  max: [axis1, axis2, flattened],
  min: [axis1, axis2, flattened],
  sum: [axis1, axis2, flattened],
}

If a list containing fewer than nine elements is passed into the function, it should raise a ValueError exception with the message: "List must contain nine numbers." The values in the returned map should be lists and not numerical arrays.

For example, calculate([0,1,2,3,4,5,6,7,8]) should return:

%{
  mean: [[3.0, 4.0, 5.0], [1.0, 4.0, 7.0], 4.0],
  variance: [
    [6.0, 6.0, 6.0],
    [0.6666666666666666, 0.6666666666666666, 0.6666666666666666],
    6.666666666666667
  ],
  standard_deviation: [
    [2.449489742783178, 2.449489742783178, 2.449489742783178],
    [0.816496580927726, 0.816496580927726, 0.816496580927726],
    2.581988897471611
  ],
  max: [[6, 7, 8], [2, 5, 8], 8],
  min: [[0, 1, 2], [0, 3, 6], 0],
  sum: [[9, 12, 15], [3, 12, 21], 36]
}
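To make the axis convention concrete, here is a minimal sketch in plain Elixir (no Nx) that reproduces the mean and variance entries of the example above. The `PlainStats` module and its functions are hypothetical names introduced only for this illustration. The sketch assumes population variance (dividing by n), which is what the example values imply.

```elixir
defmodule PlainStats do
  # Arithmetic mean of a list of numbers.
  def mean(values), do: Enum.sum(values) / length(values)

  # Population variance: mean of the squared deviations (divides by n, not n - 1).
  def variance(values) do
    m = mean(values)
    squared = Enum.map(values, fn v -> (v - m) * (v - m) end)
    Enum.sum(squared) / length(values)
  end
end

flat = [0, 1, 2, 3, 4, 5, 6, 7, 8]

# Three rows of three elements each, i.e. the 3 x 3 matrix.
rows = Enum.chunk_every(flat, 3)

# Transpose the rows to obtain the columns.
columns = Enum.zip_with(rows, fn column -> column end)

# "axis1" in the expected map reduces down the columns,
# "axis2" reduces across the rows, and the last entry uses every element.
mean_result = [
  Enum.map(columns, &PlainStats.mean/1),
  Enum.map(rows, &PlainStats.mean/1),
  PlainStats.mean(flat)
]

variance_result = [
  Enum.map(columns, &PlainStats.variance/1),
  Enum.map(rows, &PlainStats.variance/1),
  PlainStats.variance(flat)
]

IO.inspect(mean_result, label: "mean")
IO.inspect(variance_result, label: "variance")
```

The first entry of each list is computed per column and the second per row, which matches the order of the values in the expected output.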

Solution

Create a custom exception:

defmodule ValueError do
  defexception message: "bad argument in arithmetic expression"
end
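Note that the default message above differs from the one requested in the challenge text ("List must contain nine numbers."). The following is a minimal sketch of how an exception could carry the requested message instead. `ValueErrorSketch` and `NineNumbers` are hypothetical names used only here, so they do not clash with the modules in this post.

```elixir
defmodule ValueErrorSketch do
  # The default message is the one requested by the challenge statement.
  defexception message: "List must contain nine numbers."
end

defmodule NineNumbers do
  # Returns the list unchanged when it has exactly nine elements.
  def validate!(list) when is_list(list) and length(list) == 9, do: list

  # Any other input raises the exception with its default message.
  def validate!(_other), do: raise(ValueErrorSketch)
end

default_message =
  try do
    NineNumbers.validate!([2, 6, 2, 8, 4, 0, 1])
  rescue
    error in ValueErrorSketch -> Exception.message(error)
  end

# A message can also be supplied at the raise site, overriding the default.
custom_message =
  try do
    raise ValueErrorSketch, "custom text"
  rescue
    error in ValueErrorSketch -> Exception.message(error)
  end

IO.inspect(default_message, label: "default")
IO.inspect(custom_message, label: "custom")
```

With this approach, a test could pass the challenge's message to `assert_raise` rather than the arithmetic one.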

The MeanVarStd module:

defmodule MeanVarStd do
  def calculate(list) do
    if length(list) == 9 do
      tensor =
        list
        |> Nx.tensor()
        |> Nx.reshape({3, 3})

      %{
        mean: mean(tensor),
        variance: variance(tensor),
        standard_deviation: standard_deviation(tensor),
        max: max(tensor),
        min: min(tensor),
        sum: sum(tensor)
      }
    else
      raise ValueError
    end
  end

  defp mean(tensor) do
    [
      tensor
      |> Nx.mean(axes: [0])
      |> Nx.to_list(),
      tensor
      |> Nx.mean(axes: [1])
      |> Nx.to_list(),
      tensor
      |> Nx.mean()
      |> Nx.to_number()
    ]
  end

  defp variance(tensor) do
    [
      tensor
      |> Nx.variance(axes: [0])
      |> Nx.to_list(),
      tensor
      |> Nx.variance(axes: [1])
      |> Nx.to_list(),
      tensor
      |> Nx.variance()
      |> Nx.to_number()
    ]
  end

  defp standard_deviation(tensor) do
    [
      tensor
      |> Nx.standard_deviation(axes: [0])
      |> Nx.to_list(),
      tensor
      |> Nx.standard_deviation(axes: [1])
      |> Nx.to_list(),
      tensor
      |> Nx.standard_deviation()
      |> Nx.to_number()
    ]
  end

  defp max(tensor) do
    [
      tensor
      |> Nx.reduce_max(axes: [0])
      |> Nx.to_list(),
      tensor
      |> Nx.reduce_max(axes: [1])
      |> Nx.to_list(),
      tensor
      |> Nx.reduce_max()
      |> Nx.to_number()
    ]
  end

  defp min(tensor) do
    [
      tensor
      |> Nx.reduce_min(axes: [0])
      |> Nx.to_list(),
      tensor
      |> Nx.reduce_min(axes: [1])
      |> Nx.to_list(),
      tensor
      |> Nx.reduce_min()
      |> Nx.to_number()
    ]
  end

  defp sum(tensor) do
    [
      tensor
      |> Nx.sum(axes: [0])
      |> Nx.to_list(),
      tensor
      |> Nx.sum(axes: [1])
      |> Nx.to_list(),
      tensor
      |> Nx.sum()
      |> Nx.to_number()
    ]
  end
end
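The six private helpers in the module above share one pattern: reduce along axis 0, reduce along axis 1, then reduce the whole tensor. As a possible refactoring, and not the approach used in this post, that pattern could be extracted into a single function that receives the Nx reduction as an argument. `AxisStats.along_axes/2` is a hypothetical name; the sketch assumes each reduction accepts a tensor and a keyword list with an `:axes` option, as `Nx.mean/2`, `Nx.variance/2`, `Nx.standard_deviation/2`, `Nx.reduce_max/2`, `Nx.reduce_min/2`, and `Nx.sum/2` do.

```elixir
# Same dependency as in the Setup section.
Mix.install([{:nx, "~> 0.6.4"}])

defmodule AxisStats do
  # Applies `fun` (an Nx reduction of arity 2) along axis 0, along axis 1,
  # and over all elements, converting each result to plain Elixir terms.
  def along_axes(tensor, fun) do
    [
      tensor |> fun.(axes: [0]) |> Nx.to_list(),
      tensor |> fun.(axes: [1]) |> Nx.to_list(),
      tensor |> fun.([]) |> Nx.to_number()
    ]
  end
end

tensor =
  [0, 1, 2, 3, 4, 5, 6, 7, 8]
  |> Nx.tensor()
  |> Nx.reshape({3, 3})

sums = AxisStats.along_axes(tensor, &Nx.sum/2)
maxes = AxisStats.along_axes(tensor, &Nx.reduce_max/2)

IO.inspect(sums, label: "sum")
IO.inspect(maxes, label: "max")
```

The trade-off is readability: the explicit helpers are longer but easier to follow at a glance, while the generic version removes the repetition.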

Test

ExUnit.start(auto_run: false)

defmodule ExampleTest do
  use ExUnit.Case, async: false

  # @tag :skip
  test "Expected different output when calling 'calculate()' with '[2,6,2,8,4,0,1,5,7]'" do
    actual = MeanVarStd.calculate([2, 6, 2, 8, 4, 0, 1, 5, 7])

    expected = %{
      max: [[8, 6, 7], [6, 8, 7], 8],
      mean: [
        [3.6666667461395264, 5.0, 3.0],
        [3.3333332538604736, 4.0, 4.333333492279053],
        3.8888888359069824
      ],
      min: [[1, 4, 0], [2, 0, 1], 0],
      standard_deviation: [
        [
          3.0912060737609863,
          0.8164966106414795,
          2.943920373916626
        ],
        [
          1.8856180906295776,
          3.265986442565918,
          2.494438409805298
        ],
        2.6434171199798584
      ],
      sum: [[11, 15, 9], [10, 12, 13], 35],
      variance: [
        [
          9.555554389953613,
          0.6666666865348816,
          8.666666984558105
        ],
        [
          3.555555582046509,
          10.666666984558105,
          6.222222805023193
        ],
        6.987654209136963
      ]
    }

    assert actual == expected
  end

  # @tag :skip
  test "Expected different output when calling 'calculate()' with '[9,1,5,3,3,3,2,9,0]'" do
    actual = MeanVarStd.calculate([9, 1, 5, 3, 3, 3, 2, 9, 0])

    expected = %{
      max: [[9, 9, 5], [9, 3, 9], 9],
      mean: [
        [
          4.666666507720947,
          4.333333492279053,
          2.6666667461395264
        ],
        [5.0, 3.0, 3.6666667461395264],
        3.8888888359069824
      ],
      min: [[2, 1, 0], [1, 3, 0], 0],
      standard_deviation: [
        [
          3.0912060737609863,
          3.399346351623535,
          2.054804801940918
        ],
        [3.265986442565918, 0.0, 3.858612298965454],
        3.034777879714966
      ],
      sum: [[14, 13, 8], [15, 9, 11], 35],
      variance: [
        [
          9.55555534362793,
          11.555556297302246,
          4.222222328186035
        ],
        [10.666666984558105, 0.0, 14.888888359069824],
        9.20987606048584
      ]
    }

    assert actual == expected
  end

  # @tag :skip
  test "List must contain nine numbers." do
    assert_raise ValueError, "bad argument in arithmetic expression", fn ->
      MeanVarStd.calculate([2, 6, 2, 8, 4, 0, 1])
    end
  end
end
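The long decimals in the expected values (for example 3.3333332538604736 as the mean of the row [2, 6, 2]) are consistent with Nx computing floating-point results in 32-bit precision by default. The sketch below compares that default with an explicitly 64-bit tensor. It assumes the `:type` option of `Nx.tensor/2` and is only an illustration, not part of the solution.

```elixir
# Same dependency as in the Setup section.
Mix.install([{:nx, "~> 0.6.4"}])

row = [2, 6, 2]

# Integer input: Nx.mean/1 returns a 32-bit float tensor by default.
f32_mean = row |> Nx.tensor() |> Nx.mean() |> Nx.to_number()

# Requesting 64-bit floats up front keeps the whole computation in f64.
f64_mean = row |> Nx.tensor(type: {:f, 64}) |> Nx.mean() |> Nx.to_number()

IO.inspect(f32_mean, label: "f32 mean")
IO.inspect(f64_mean, label: "f64 mean")

# Comparing against a tolerance is less brittle than exact float equality.
close_enough? = abs(f32_mean - 10 / 3) < 1.0e-6
IO.inspect(close_enough?, label: "within tolerance")
```

Exact equality, as used in the tests above, works as long as the precision stays the same. A tolerance-based comparison (ExUnit provides `assert_in_delta/3`) would survive a change of tensor type.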

Execute Test

PS: Remember to remove the @tag :skip.

ExUnit.run()
...
Finished in 0.00 seconds (0.00s async, 0.00s sync)
3 tests, 0 failures

Randomized with seed 433319

You can check out my livebook solution.
