A Unique Approach for Block Partitioning and Merging for Large-Scale
Structure-from-Motion using Apache Spark
Abstract: - In response to the growing demand for large-scale reconstructions, this paper addresses the scalabil-
ity challenges encountered by traditional Structure from Motion (SfM) methods. Our research aims to leverage
Apache Spark’s distributed computing capabilities to enhance the efficiency of SfM methodologies. The motiva-
tion behind this work lies in the increasing need for robust solutions capable of handling extensive reconstruction
tasks. To tackle this challenge, we propose a method that harnesses the advantages of Apache Spark, including
scalability, speed, fault-tolerance, flexibility, and ease of use. The abstracted problem centers around the limita-
tions inherent in Apache Spark’s traditional operations like maps, reduces, and joins. Our methodology focuses
on a block partitioning and merging strategy, strategically distributing the workload using Spark. Our paper also
presents experimental results showing the feasibility of our approach through the 3D reconstructions of multiple
datasets. The experiments were executed on a standalone Spark instance, demonstrating the potential of Apache
Spark in effectively distributing SfM workloads. In summary, this paper elucidates the necessity for addressing
scalability challenges in large-scale reconstructions, outlines the research goals, and details a method leveraging
Apache Spark to overcome limitations and enhance the efficiency of SfM.
Key-Words: - Structure-from-Motion, Apache Spark, 3D Reconstruction, Block partitioning and Merging,
Workload distribution, Large Scale 3D Reconstruction, Photogrammetry, Resilient Distributed Dataset.
Received: March 25, 2023. Revised: December 7, 2023. Accepted: December 24, 2023. Published: December 31, 2023.
1 Introduction
With the increasing number of applications needing
large-scale reconstructions, Structure-from-Motion
faces the problem of keeping up with these de-
manding workloads, [1]. Traditional Structure-from-
Motion approaches follow a sequential approach of
execution managing workloads by executing them se-
quentially, this is however inefficient at scale, [2]. In
more advanced Structure-from-Motion solutions, dis-
tribution algorithms are used to distribute workloads
across multiple computing resources. This often in-
volves dividing the main workload into smaller sub-
sets, referred to as blocks, that can be processed con-
currently on different computers or nodes, [3]. The
use of Apache Spark as a means of running workloads
in a distributed manner could yield several advantages
associated with Apache Spark, [4], such as:
Scalability: Apache Spark allows for the dis-
tributed processing of data across a cluster of
computers, enabling SfM solutions to scale up to
process larger datasets more efficiently.
Speed: The distributed nature of Apache Spark
allows SfM solutions to process data in parallel,
resulting in faster processing times.
Fault-tolerance: Apache Spark is fault-tolerant,
which means that it can recover from failures and
continue processing data without interruption.
Flexibility: Apache Spark supports multiple pro-
gramming languages, allowing SfM solutions to
be written in the language of choice if needed.
Ease of use: Apache Spark provides a simple and
intuitive programming model, making it easy to
write and debug SfM solutions.
Although Spark may seem like a optimal solution
for distributing SfM workloads it comes with a few
limitations, [4], such as only being able to perform
1L.A.H.NAURUNNA, 2S.C.PREMARATNE, 1T.N.D.S.GINIGE
1School of computing
Universal College Lanka
Colombo
SRI LANKA
2Faculty of Information Technology
University of Moratuwa
Colombo
SRI LANKA
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
581
Volume 18, 2023
operations in terms of maps, reduces, joins and a
few other operations. This may seem to limit the
reference to neighboring data or external data at first
glance but there are trivial work arounds to these
issues that can be utilized to use Apache Spark in the
distribution of SfM or 3D reconstruction workloads.
In a traditional workflow, the SfM dataset is
broken up into several smaller, more manageable
datasets and is sent to different machines for pro-
cessing, producing their own point cloud and camera
pose data, this data is then stitched together using
a 3D space similarity transformation, [5], Iterative
Closest Point or other methods, [6]. In our method
a similar approach will be taken but with the use of
spark. There are several differences in our method
that defer from other block partitioning and merging
algorithms using custom distribution techniques, [5].
2 Background and Related Work
2.1 Apache Spark
Apache Spark, developed at the University of Cali-
fornia, Berkeley in 2009, has emerged as a prominent
distributed computing framework for processing
large datasets, [4]. Facilitated by a resilient dis-
tributed dataset (RDD) abstraction, Spark ensures
efficient parallel processing and fault tolerance in
distributed environments.
Several studies highlight the efficacy of Apache
Spark in accelerating data processing and executing
computationally intensive algorithms. Notable ex-
amples include the Wale Optimization Algorithm
(WOA) study, [2], and the multiscale feature extrac-
tion and semantic classification of LiDAR point data,
[7].
In [2] Apache Spark was employed to implement
the WOA, utilizing maps and reduces for parallel
processing. The study compared Spark-based WOA
with traditional and Hadoop-based approaches,
demonstrating superior runtime and scalability in
Spark.
[7] illustrates the utilization of Apache Spark
and Cassandra for multiscale feature extraction
and semantic classification on point cloud data.
The study showcased promising results, affirming
Spark’s capability to handle complex scenarios and
algorithms efficiently.
2.2 Workload Distribution in
Structure-from-Motion (SfM)
Structure-from-Motion (SfM) stands as a widely-
used technique for constructing 3D models from 2D
images. The computational intensity of SfM, particu-
larly with larger datasets, has prompted research into
optimizing workload distribution.
In [5] a block partitioning and merging strategy
was proposed for efficient workload distribution in
SfM algorithms. This approach involves splitting im-
ages into overlapping subsets, processing them in par-
allel, and merging the results based on shared images
and tie points. The adoption of block partitioning and
merging significantly improved the time efficiency of
large-scale incremental structure-from-motion prob-
lems.
3 Proposed Method
In this proposed method, we will be taking several im-
ages and finally processing them into a sparse point
cloud, this approach currently will not use bundle
adjustment to reduce complexity but can always be
added in later studies. We implement our approach in
python using the pyspark API to utilize Apache Spark
in python. Taking advantage of Resilient Distributed
Datasets (RDDs) to handle the distribution process-
ing we process the image data to a point cloud in five
stages:
Image parallelization and preparation: Paral-
lelizing the image data into a ParallelCollec-
tionRDD and doing any basic image process-
ing as needed. Processing may include resiz-
ing, converting to grayscale or any other pro-
cesses preparing the images for the rest of the
SfM pipeline.
Feature detection and camera initialization: In
this step Features are detected using the SIFT fea-
ture detector and cameras are initialized with a
camera index, the image associated with the cam-
era, the key points, descriptors and the intrinsics
of the camera.
Camera partitioning: In this step the initialized
cameras are partitioned into overlapping blocks
of a pre-defined size for processing and overlap
to produce subsets.
3D point cloud generation: a point cloud in
the generated using traditional Structure-from-
motion techniques based on two initial seed cam-
eras and by registering the additional cameras in
the block via a Perspective-n-Point based camera
registration.
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
582
Volume 18, 2023
Collection and Merging: the blocks are then col-
lected after processing and merged together using
the overlapping camera’s adjusting the block’s
points and subsequent cameras accordingly.
Our approach stops here as the distribution strat-
egy of SfM workloads is the main focus but addition-
ally the point cloud can be further processed in Spark
after the collection and merging step to yield better
results. Local Bundle Adjustment (BA) can also be
done in the 3D point cloud generation stage to fur-
ther improve accuracy and furthermore global BA af-
ter the collection and merging stage.
3.1 3D Reconstruction
The creation of Three-dimensional (3D) reconstruc-
tions is the procedure of creating Three-dimensional
recreations of objects seen in the scene via the data
collected through equipment which may be a camera
or lidar sensor/scanner. In the context of this project,
3D Reconstruction via image data will be the main
focus. The 3D output for 3D Reconstruction can
vary depending on the context in which it is needed,
the base output that can be expected at a minimum
from a 3d reconstruction is a sparse point cloud of
the object, with further processing a dense point
cloud, geometric/mesh models can be produced.
Due to the cost effective and efficient nature of 3d
Reconstruction as a means to get a 3d representation
of an object it has been applied in fields such as
civil engineering, [8], capturing snapshots of historic
buildings, survey engineering, asset creation for
media, [9], and other applications.
In this section, our primary focus is to get an idea
on 3D Reconstruction Techniques from 2D Images,
providing an overview of general approaches, al-
gorithms, and techniques, giving us a good starting
point to choose the best approach for this study.
[10] serves as an excellent starting point, providing
valuable insights to guide our exploration.
In [10] it reviews 3d Reconstruction as a whole,
the algorithms as well as the various techniques used
in 3d Reconstruction. One of the things discussed is
the requirements/problems in 3d Reconstruction from
2d images, which are as follows, the calibration pa-
rameters for the cameras used, data associations be-
tween two or more images’ features, determining spa-
tial information of the features and producing a dense
reconstruction from the sparse data obtained. The
problem of obtaining a 3d reconstruction from a 2d
image is described as restoring the depth information
lost from the transformation from the 3d world to the
2d plane when taking a picture.
Figure 1: A simple overview of a SFM pipeline
Zc"u
v
1#=K[R T ]
Xw
Yw
Zw
1
=M
Xw
Yw
Zw
1
(1)
The equation above describes this transformation
as the camera projection from a point 3d point in
the world to 2d (and thus the loss of information)
with reference to the pinhole camera model. Zc
is is an arbitrary scaling perimeter, u,vdescribe
the image coordinates, Kis the camera intrinsics,
[R&T]describes the extrinsics and Mis the camera
matrix. The paper also describes Structure from
motion as one of the techniques that are used in
3d Reconstruction which will be touched on later.
Figure 1 shows a very basic outline of a SfM pipeline
The paper also mentions Multi-View Stereo, [11],
which focused on refining the 3D reconstruction
obtained from SfM by estimating detailed depth
information for each pixel in the images, leading to a
dense 3D representation.
In the review of 3d Reconstruction in [8] it goes
more into depth about MVS referencing, [12], and
their algorithm CMVS and PMVS and how they
can be used in conjunction with the output of SFM
algorithms to produce a dense reconstruction. Fur-
thermore, it elaborates on how mesh reconstruction
can be done with the PSR algorithm. The paper
also Illustrates the typical 3d reconstruction pipeline.
MVS, although a great addition to a 3d reconstruction
pipeline, is not considered with this paper as it devi-
ates from the main focus of exploring the feasibility
of using Apache Spark as a tool for distributing 3d
reconstruction workloads.
Although many 3d reconstruction methods exist,
Structure from Motion was chosen as it can be
fully parallelized at certain steps in the pipeline and
partially parallelized using the block partitioning
and merging approach to provide a final 3d point
reconstruction. Figure 2 outlines the pipeline for
classical SFM where it shows two flows in which
monocular images and stereo images can be used
to finally generate point clouds and how the point
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
583
Volume 18, 2023
Figure 2: An overview of a 3d reconstruction pipeline
that uses SfM
clouds can be processed to generate mesh models,
point cloud models and geometric models.
Within Structure from Motion (SfM) itself, there
are many approaches and pipelines which can ei-
ther improve or hinder the ability to parallelize the
pipeline. While numerous reviews cover diverse as-
pects of 3D reconstruction, our approach for SfM is
a very generic sequential SfM pipeline, which starts
with keypoint detection in each image, using the SIFT
descriptor, [13], for cross-image keypoint compari-
son, and employing random sampling and consensus
(RANSAC), [14], to discard outlier matches the es-
sential matrix is then calculated using the 8 point al-
gorithm. Our adaptation diverges from most classical
SfM pipelines like in [15] by excluding bundle adjust-
ment from the SfM pipeline. This choice to diverge
from this method was to achieve more parallelizabil-
ity as a global bundle adjustment is not possible with
block partitioning and merging. However a local bun-
dle adjustment with each block can be done but was
not considered for this study.
3.2 Image parallelization and preparation
The main goal of this stage is to parallelize and pre-
pare the image data for further processing. We first
parallelize the data with an appropriate number of par-
titions for the images, generally following the equa-
tion Pn= 4Cn, where Pnis the number of partition
and Cnis the number of total cores available in the
cluster. The data preparation step can be done using
a map with any processing needed to be done inside
the map function depending on the data-set.
3.3 Feature detection and camera
initialization
The main goal of this stage to initialize the cam-
eras with the appropriate data for the rest of the SfM
pipeline before partitioning into blocks. Feature de-
tection is an important step in any SfM pipeline, the
feature detector used is SIFT which can be computa-
tionally expensive and thus makes sense to continue
running parallelly. The SIFT features are computed
Figure 3: A generic SfM pipeline
and will become a part of the camera initialization in
the next step. The camera model used in our approach
is the pinhole camera model and as such the camera
calibration matrix, K, are defined as follows
"fxS cx
0fycy
0 0 1 #(2)
Where fx=F/Pxand fy=F/Py, describing
the focal length in pixels, where Fis the focal length
and Px,Pydescribing the size of a pixel in world
units. Cx,Cydescribes the optical center in pixels
and Sis the pixel skew. The cameras are initialized
with the image associated with the camera and the key
points, descriptors from the feature detector, the cali-
bration matrix and the index of the camera. The index
of the camera is taken as the frame index.
3.4 Structure from Motion (SfM)
Structure from Motion (Visual SfM) is a compu-
tational technique designed to unravel the three-
dimensional (3D) structure of a scene from a
collection of two-dimensional (2D) images.
Figure 3 describes a classical SfM pipeline which
gives the traditional way of doing sequential SfM with
no block partitioning and merging. The pipeline starts
off with the correspondences between the two images
extracted, then mentions the ways in which the funda-
mental matrix can be calculated, in our method uses
the same flow, we use the 8 point algorithm but with
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
584
Volume 18, 2023
RANSAC to get rid of outliers in the data before cal-
culation. With the fundamental matrix, the essential
matrix can be easily calculated by adding the cali-
bration parameters using E= (K2)TF K1where we
know K_1 = K_2because it is from the same cam-
era. Then we use Hartley’s method to extract R & T
to calculate the projection matrix and with linear tri-
angulation we achieve a sparse point cloud. With the
incorporation of block partitioning and merging the
steps are altered slightly with the addition of shuffles.
In more detail our pipeline follows the following steps
with block partitioning and merging:
3.4.1 Feature Detection and Matching
In the initial phase, distinctive features within each
image are identified using the SIFT feature detector,
[13]. These features serve as recognizable points in
the scene, such as corners or key landmarks. Sub-
sequently, corresponding features across different
images are matched, establishing associations that
allow for data association throughout the sequence.
RANSAC is also employed to detect and remove out-
liers and also lowe’s ratio test, [13], is also applied to
filter out non-discriminative matches. The outcome
of this step is a list of points that match between
each sequential image. The Feature detection part in
our block partitioning and merging strategy can be
all done in parallel, for the matching part an initial
shuffle is done breaking all the images along with
their detected features into chunks. Each chunk has
a predefined size and overlap and can be expressed
by the following equation, where N is the number of
chunks, C is the number of cameras, S is the chunk
size and O is the Overlap.
N= (CO)/(SO)(3)
Any remaining items are added as a remainder chunk
that might not be the predefined size.
3.4.2 Essential Matrix Calculation
The feature matching data can be used to then calcu-
late the essential matrix using the 8 point algorithm,
[16]. The essential matrix describes the relationship
between two calibrated cameras viewing the same
scene from different viewpoints. It encapsulates the
intrinsic parameters of the cameras as well as the rel-
ative pose between the two cameras. The following
equation describes the essential matrix.
x0TF x = 0 (4)
E= (K0
2)TF K1(5)
where K1=K2if it’s the same camera
The essential matrix as seen in the equation is an
encoding of the fundamental matrix with the intrinsic
calibration data. In the equation, E= (K0
2)TF K1,
Fis the fundamental matrix and K1is the calibration
of the first camera and K2is the calibration of the
second camera. x0TF x = 0 represents the epipolar
constraint in computer vision, where Fis the fun-
damental matrix, and xand x0are corresponding
points in two images taken by different cameras.
This process takes place within each chunk between
the first and the second camera.
3.4.3 Decomposition for Translation and
Rotation
The essential matrix can then be solved to give T and
R using the method proposed in [17], A. A brief out-
line of the method. Using the above-mentioned equa-
tion, E= (K0
2)TF K1, were
K1="fKu0u0
0fKvV0
0 0 1 #(6)
Then we perform an SVD of E as follows:
K1=Udiag(1,1,0)VT(7)
R and T have two solutions:
R1=UW V Tand R2=UW TVT(8)
T1=u3and T2=u3(9)
u3is the third column of U
W="01 0
100
001#(10)
The correct R and T are the parameters when used
can lead to 3d point that are In-front of the camera
when triangulated. In the Figure 4, the correct solu-
tion is the top left solution.
With the essential matrix for all the frames the
baseline and rotation can be obtained by decomposing
and solving as described above. With the baselines
and rotations, the path can be estimated by multiply-
ing each transformation R3×3T3×1
01×31one after the
other to obtain a relative path for the initial 2 cameras
are still kept within each chunk.
3.4.4 Relative Path Estimation
The obtained baseline and rotation matrices are used
to estimate a relative path, outlining the trajectory of
the camera through the scene.Note that this is for just
the first two cameras, we consider the first camera to
be at the world origin with a transformation matrix of
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
585
Volume 18, 2023
Figure 4: The four solutions from Hartley & Zisser-
man’s method
"1000
0100
0010#while the second camera has the trans-
formation matrix of R3×3T3×1
01×31once finally re-
solved. This sets the scale for each and every chunk
as the translation vector is a unit vector for the sec-
ond camera and subsequent camera’s added would be
relatively scaled to the initial baseline.
3.4.5 Triangulation for 3D Point Extraction
Key points identified in the two cameras can then
be triangulated to determine their 3D positions in
the scene. This step is done by using a linear trian-
gulation on all the corresponding points. This too
is to different scales relative to the initial baseline
produced by the initial two cameras.
3.4.6 Perspective-n-Point (PnP) for Camera
Registration
The Perspective-n-Point algorithm is employed to
register new cameras in the scene based on the 2D-
3D correspondences with the images and the point
cloud. This step adds camera’s to the graph whilst
also adding new points to the point cloud, the base-
line of the camera’s added by perspective-n-point will
have a base line relative to the initial baseline calcu-
lated.
3.4.7 Merging
In the proposed approach, the seed camera positions
and rotations of reference blocks are taken from
overlapping cameras in target blocks. The reference
block is transformed using the 3D space similarity
transformation model. For effective transformation,
an overlap of more than two cameras is required. If
the overlap is less than two cameras, an alternative
Perspective-n-Point (PnP) technique is employed to
determine the seed camera’s position and rotation
before transformation.
When the overlap is greater than two cameras,
it can be used to scale subsequent point clouds to
the first point cloud’s scale using an affine transfor-
mation. The transformation involves solving for an
affine transformation matrix to bring the points and
cameras of the subsequent point cloud to the same
scale and rotation as the reference point cloud. This
process is applied to all subsequent point clouds,
resulting in a fully connected graph of cameras and a
complete point cloud. All calculations are performed
in Homogeneous coordinates and will be described
in detail in Section 3.7.
3.4.8 Final Output
The final output is a sparse point cloud representing
the 3D structure and detailed camera extrinsics and
intrinsics for each frame. It is important to note that
while this framework provides a foundation for other
SfM pipelines with block partitioning and merg-
ing, additional enhancements, such as loop closing
algorithms, can be integrated for more advanced
applications.
3.5 Camera partitioning
In this stage the main goal is to partition the cameras
up into overlapping blocks of a pre-defined sized and
overlap to produce subsets of the original dataset for
further execution. The overlap needs to exist to facili-
tate the process of merging as the overlapping images
act as a tie point, [5]. Camera partitioning can be done
on spark through the following steps:
Generating a comprehensive associative array
where the id of the blocks is the key. And the val-
ues are the corresponding images in each block,
generated according to a pre-defined block size
and overlap
Each camera is given an index using Spark’s zip-
WithIndex() function
A Spark map() function is done where each im-
age is duplicated and stored with the index of the
blocks it appears in according to the index as-
signed in the last step
A Spark flatMap() function is executed to obtain
a list of cameras and their corresponding blocks.
The output from the previous step is flattened to
create this list. If a camera appears in multiple
blocks, it is listed more than once in this list.
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
586
Volume 18, 2023
A Spark groupByKey() function is finally run to
produce each block. Each block will contain the
block id and the list of cameras in the block.
3.6 Point cloud generation
In this stage the main goal is to produce sparse
point clouds from each associated block using SfM
techniques. While there are numerous approaches
to SfM, a detailed discussion of the specifics of the
methods is beyond the scope of this paper. However,
we will briefly outline the SfM approach used in our
approach although many other approaches can be
taken.
At this point we will find that each block has
a list of cameras and their respective camera data.
This camera data can be used like in a traditional
SfM pipeline to produce a sparse point cloud. In our
approach, a conventional approach is taken to, where
the detected features are matched between two corre-
sponding cameras and processed to get the essential
matrix of the initial pose of the 2 initial cameras.
The essential matrix can then be decomposed to get
four separate solution which can be disambiguated to
one solution. This solution can be used to initially
triangulate a sparse point cloud. Other cameras are
then registered using a Perspective-n-Point (PnP)
based camera registration. And thus, each block can
be processed out into a point cloud.
Once the block partitioning is completed, each
block will have a list of cameras and their corre-
sponding camera data. This camera data can be
used in a conventional SfM pipeline to produce a
sparse point cloud. In our approach, two cameras
are taken as seed cameras and an initial pose is
estimated. To produce this initial pose estimation,
the detected features are matched between the two
corresponding cameras, and the essential matrix of
the initial pose is computed. The essential matrix
is then decomposed into four separate solutions,
which are disambiguated to obtain one solution.
This solution is used to initially triangulate a sparse
point cloud. Subsequently, the other cameras are
registered using Perspective-n-Point (PnP) based
camera registration. This process is run parallelly for
each block, resulting in a point cloud for each block.
These point clouds will be the output for this stage
3.7 Collection and Merging
Once the block partitioning is complete the sparse
point clouds produced by each subset can be merged
using several methods such as with a least squares
solution where two point clouds with overlapping 3d
points are minimized such to minimize the squared
distance produced by trying to fit the reference point
cloud to the target point cloud. Another approach,
as proposed in [5] involves transforming a reference
block to match the targeted block using the 3D space
similarity transformation model. The transformation
parameters are calculated based on the shared points
between the two sub-blocks.
In our approach, the reference blocks seed cam-
era’s positions and rotations are taken from the target
blocks overlapping cameras and the reference block
is transformed accordingly using the 3D space sim-
ilarity transformation model. This method requires
the overlap to exceed two cameras in order to be
effective. Should the overlap be less than two, an
alternative approach utilizing the Perspective-n-Point
(PnP) technique must be utilized to determine the
position and rotation of the seed camera, prior to
transformation.
In the case of the overlap greater than two this
overlap can be used to scale each of the subsequent
point clouds to the first point clouds scale using an
affine transformation. The transformation in detail
means, if we have the base point cloud Ain it’s
own chunk with it’s camera’s Aa, Ba, Ca, Daand we
have point cloud B in the neighbouring chunk with
it’s camera’s Cb, Db, Eb, Fb, where Caand Dais the
same camera as Cband Dbrespectively as the chunks
overlap cameras. We can figure out the transforma-
tion needed to bring point cloud B’s points and cam-
era’s to be at the same scale as rotation as A by figur-
ing out an affine transformation Tthat can be solved
for by solving AT=TBTwhere ATis the base-
line from Cato Daand BTis the baseline from Cb
to Da. This Tis then used to transform the entire
point cloud Binto the same space as Aand then can
be appended onto Aby replacing Cawith Cband Da
with Dband their newly transformed points. Note that
all the calculations are done in Homogeneous coordi-
nates. Doing this process to all the subsequent point
clouds should yield a fully connected graph of cam-
era’s and full point cloud.
4 Experiments are Results
The experiments conducted in this section will be
done to provide evidence that our approach works
and is a feasible solution. However, it is important
to note that we do not compare the time efficiency
of our approach to other methods, as we were not
able to access a spark cluster for performance testing
at the time of writing. Any performance test done
on a standalone Spark instance would not be a fair
comparison and thus will not be carried out. The
experiments will include 3D reconstructing point
clouds for the observatory and statue data-sets using
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
587
Volume 18, 2023
Figure 5: Observatory data-set
Figure 6: Observatory point cloud
our proposed distribution approach. The experiments
will be run on a standalone Spark instance.
4.1 Observatory data-set
The observatory data-set is a set of undistorted
images with known calibration parameters of an
observatory. The data-set consists of twenty-seven
non-aerial images of the observatory taken from a
DSLR camera. For this experiment, each image is
scaled down to forty percent of its original size and
processed using our approach. The results can be
seen in Figure 5,)LJXUH 6 and)LJXUH 7
4.2 Statue data-set
The statue dataset, similar to the observatory dataset,
is a dataset of undistorted images with known camera
parameters. It contains ten images of a statue. These
Figure 7: Observatory point cloud
images are scaled to forty percent its original for
easier and faster processing. The results can be seen
in Figure 8,)LJXUH 9 and)LJXUH 10.
5 Discussion
The results obtained from the experiments conducted
on the observatory and statue datasets demonstrate
the feasibility of our proposed approach for large-
scale 3D reconstruction using Apache Spark. The
successful generation of 3D point clouds from these
datasets provides evidence that our methodology can
effectively distribute Structure from Motion (SfM)
workloads across a Spark platform.
The observatory dataset, consisting of twenty-
seven non-aerial images, was processed using our
approach, resulting in a detailed 3D point cloud of
the observatory. Similarly, the statue dataset, com-
prising ten images, was also successfully processed
to generate a 3D point cloud of the statue. These
results indicate that our approach can handle varying
dataset sizes and complexities, making it a versatile
solution for large-scale 3D reconstruction tasks.
However, it is important to note that the per-
formance of our approach was not compared with
other methods due to the unavailability of a Spark
cluster at the time of writing. While the experiments
conducted provide evidence of the feasibility of our
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
588
Volume 18, 2023
Figure 8: Statue data-set
Figure 9: Statue point cloud
Figure 10: Statue point cloud
approach, a comprehensive performance evaluation
would require a comparison with other methods on a
Spark cluster. This would allow for a fair assessment
of the efficiency and scalability of our approach in
a distributed computing environment. Despite this
limitation, the results obtained from our experiments
provide a promising foundation for future research.
The successful application of Apache Spark for
distributing SfM workloads opens up new possibil-
ities for enhancing the efficiency and scalability of
large-scale 3D reconstruction tasks. Future work
could focus on integrating additional enhancements,
such as loop closing algorithms, to further improve
the accuracy and detail of the 3D reconstructions.
In conclusion, our proposed approach for dis-
tributed 3D point cloud reconstruction using Apache
Spark demonstrates potential for large-scale recon-
structions. The approach leverages the advantages
of Apache Spark, including scalability, speed, fault-
tolerance, flexibility, and ease of use, to overcome
the limitations inherent in traditional SfM methods.
While further research is needed to fully assess the
performance of our approach, the results obtained
from our experiments provide a promising starting
point for future developments in this field.
6 Conclusion
Based on the results presented in this report, our
proposed approach for distributed 3D point cloud
reconstruction using Apache Spark demonstrates
feasibility and promise for large-scale reconstruc-
tions. The approach enables efficient distribution of
data processing and utilization of parallel computing
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
589
Volume 18, 2023
resources, which is crucial for reducing the compu-
tational time. Although the approach’s performance
wasn’t able to compared with other methods due
to the unavailability of a spark cluster at the time
of writing, the experiments provide evidence that
Apache Spark is a feasible solution for distributing
SfM workloads.
The project has successfully achieved its
objectives of designing and implementing a
photogrammetry-based system for creating ac-
curate and detailed 3D models from 2D images in
a distributed manner. The testing of the system
has shown that it can distribute SfM workloads in
a robust and reliable way, with functional testing
demonstrating this. Additionally, the non-functional
testing has shown that the system is capable of
running medium to small data sets in a reasonable
time-frame making the method suitable for commer-
cial use with more improvement to the underlying
SfM used. In summary, this project has demonstrated
the effectiveness of Apache spark and its capability in
the distribution of Structure from Motion workloads.
7 Further work
While this research has achieved its objectives and
made good progress in the field of distributed 3d
Reconstruction, there are still limitations and areas
for future work to be considered. One limitation
of this research is that the project does not utilize
bundle adjustment globally or locally as well as
lacking a robust merging solution. Additionally,
while the block partitioning used in this project was
effective, it may not be the most efficient or accurate
method for 3D Reconstruction. Other methods that
could be explored in future work include improving
the underlying SfM algorithms or incorporating
something like COLMAP on spark for better results.
This research gives a good starting point for fur-
ther research into using Apache Spark in the field of
3d Reconstruction. Overall, this project has made
good progress in the distribution of Structure from
motion, but there is still much to be explored and im-
proved upon. With further research and development,
the findings of this project could have important ap-
plications in various industries and fields.
Acknowledgment:
We would also like to acknowledge Universal
College Lanka and the University of Central
Lancashire for providing the opportunity to pursue
our academic goals and engage in research.
References:
[1] Snavely, N., Seitz, S.M., & Szeliski, R. (2008).
Modeling the world from internet photo collec-
tions. International journal of computer vision,
80, 189-210.
[2] AlJame, M., Ahmad, I. and Alfailakawi, M.,
2020. Apache spark implementation of whale
optimization algorithm. Cluster Computing, 23,
pp.2021-2034
[3] Jiang, S., Jiang, C. and Jiang, W., 2020. Efficient
structure from motion for large-scale UAV im-
ages: A review and a comparison of SfM tools.
ISPRS Journal of Photogrammetry and Remote
Sensing, 167, pp.230-251
[4] Zaharia, M., Chowdhury, M., Franklin, M.J.,
Shenker, S. and Stoica, I., 2010. Spark: Cluster
computing with working sets. In: Proceedings of
the 2Nd USENIX Conference on Hot Topics in
Cloud Computing. HotCloud’10.
[5] Lu, L., Zhang, Y. and Liu, K., 2019. Block par-
titioning and merging for processing large-scale
structure from motion problems in distributed
manner. IEEE Access, 7, pp.114400-114413
[6] Pomerleau, F., Colas, F., Siegwart, R. and Mag-
nenat, S., 2013. Comparing ICP variants on real-
world data sets: Open-source library and ex-
perimental protocol. Autonomous Robots, 34,
pp.133-148
[7] Singh, S. and Sreevalsan-Nair, J., 2020, Decem-
ber. A distributed system for multiscale feature
extraction and semantic classification of large-
scale LiDAR point clouds. In 2020 IEEE India
Geoscience and Remote Sensing Symposium (In-
GARSS) (pp. 74-77). IEEE
[8] Ma, Z., & Liu, S. (2018). A review of 3D recon-
struction techniques in civil engineering and their
applications. Advanced Engineering Informatics,
37, 163-174.
[9] Quixel, (2023). 3D world-building made easy.
Quixel. 2023. Retrieved February 6, 2023, from
https://quixel.com/
[10] Aharchi, M., & Ait Kbir, M. (2020). A re-
view on 3D reconstruction techniques from 2D
images. InInnovations in Smart Cities Applica-
tions Edition 3: The Proceedings of the 4th Inter-
national Conference on Smart City Applications
4(pp. 510-522). Springer International Publish-
ing.
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
590
Volume 18, 2023
[11] Furukawa, Y., & Hernández, C. (2015).
Multi-view stereo: A tutorial.Foundations and
Trends® in Computer Graphics and Vision,9(1-
2), 1-148.
[12] Furukawa, Y., Curless, B., Seitz, S. M., &
Szeliski, R. (2010, June). Towards internet-scale
multi-view stereo. In2010 IEEE computer soci-
ety conference on computer vision and pattern
recognition(pp. 1434-1441). IEEE.
[13] Lowe, D. G. Distinctive Image Features from
Scale-Invariant Keypoints. International Journal
of Computer Vision, 20(2):91– 110, November
2004
[14] Bolles, R. C., & Fischler, M. A. (1981, August).
A RANSAC-based approach to model fitting and
its application to finding cylinders in range data.
In IJCAI (Vol. 1981, pp. 637-643).
[15] Snavely, Seitz and Szeliski (2006): Snavely, N.,
Seitz, S. M., & Szeliski, R. (2006). Photo tourism:
exploring photo collections in 3D. In ACM sig-
graph 2006 papers (pp. 835-846).
[16] Longuet-Higgins, H. C. (1981). A computer al-
gorithm for reconstructing a scene from two pro-
jections. Nature,293(5828), 133-135.
[17] Hartley, R., & Zisserman, A. (2003).Multiple
view geometry in computer vision. Cambridge
university press.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present re-
search, at all stages from the formulation of the prob-
lem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflicts of Interest
The authors have no conflicts of interest to
declare that are relevant to the content of this
article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International , CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US
WSEAS TRANSACTIONS on SYSTEMS and CONTROL
DOI: 10.37394/23203.2023.18.60
L. A. H. Naurunna, S. C. Premaratne,
T. N. D. S. Ginige
E-ISSN: 2224-2856
591
Volume 18, 2023