What’s new

30.09.16: The official opening.



The VideoCompletion project introduces the first benchmark for video-completion methods. We present results for different methods on a range of diverse test sequences, which can be viewed in a player equipped with a movable zoom region. Additionally, we provide the results of an objective analysis using quality metrics that were carefully selected in our study of video-completion perceptual quality. We believe that our work can help rank existing methods and assist developers of new general-purpose video-completion methods.

Data set

Our current data set consists of 7 video sequences with ground-truth completion results. We consider object removal, so the test sequences are constructed by compositing various foreground objects over a set of background videos. The background videos include left-view sequences from the stereoscopic-video data set RMIT3DV [1]. As foreground objects we use those from the video-matting benchmark [2] as well as several 3D models. To insert a 3D model seamlessly into a background video, we use the Blender [3] motion-tracking tools. Each video-completion method takes the composited sequence and the corresponding object mask as input.
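The construction of a test frame can be sketched as follows. This is a minimal illustration that assumes the standard alpha "over" operator and a mask derived by thresholding the alpha channel; the function names `composite_over` and `object_mask` are hypothetical and are not taken from the benchmark's tooling.

```python
def composite_over(fg, alpha, bg):
    """Blend one foreground frame over a background frame, pixel by pixel.

    fg, alpha and bg are equally sized 2D lists; alpha values lie in [0, 1].
    Assumption: plain alpha compositing (the "over" operator).
    """
    return [[a * f + (1.0 - a) * b for f, a, b in zip(f_row, a_row, b_row)]
            for f_row, a_row, b_row in zip(fg, alpha, bg)]


def object_mask(alpha, threshold=0.0):
    """Binary mask of the pixels covered by the foreground object.

    Assumption: any pixel with alpha above the threshold belongs to the
    region that a completion method has to fill.
    """
    return [[1 if a > threshold else 0 for a in row] for row in alpha]


# Toy 1x3 frame: opaque, half-transparent and fully transparent foreground.
background = [[50.0, 50.0, 60.0]]   # also serves as the ground truth
foreground = [[200.0, 200.0, 200.0]]
alpha = [[1.0, 0.5, 0.0]]

composited = composite_over(foreground, alpha, background)
mask = object_mask(alpha)
print(composited, mask)
```

Because the background video is known before compositing, it can serve directly as the ground truth for the masked region.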

A 3D model inserted in the background video using motion tracking to construct a test sequence with ground truth.

Evaluation Methodology

Video-completion results are seldom explicitly expected to adhere to ground truth and are usually judged only by their plausibility, which is assessed by a human observer. This makes objective quality assessment of video completion inherently difficult. However, by relaxing the requirement of complete adherence to ground truth we can increase correlation with perceptual completion quality. This benchmark employs four quality metrics: MS-DSSIM, MS-DSSIMdt, MS-CDSSIM, and MS-CDSSIMdt. A thorough description and comparative analysis of these and other metrics can be found in our paper (to be published soon).

The MS-DSSIM metric measures adherence of the completion result $V$ to the ground-truth video $V_{ref}$ in a multi-scale fashion, with scale weights determined using perceptual quality data. It is based on the structural similarity index (SSIM) [4] values computed for all 9×9 luminance patches $P(x)$ within the spatio-temporal hole $\Omega$.

[Equation: MS-DSSIM definition]

Here superscript $i$ denotes the level of the Gaussian pyramid; that is, $V_{ref}^0$ is the original ground-truth video, and $V_{ref}^1$ is the video blurred and subsampled by a factor of two in both spatial dimensions.
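As a rough illustration of a multi-scale, patch-based SSIM dissimilarity, the sketch below uses the standard SSIM formula from [4]. Everything else is an assumption rather than the benchmark's definition: the per-patch dissimilarity is taken as (1 − SSIM)/2, patch scores are averaged over the hole, the pyramid levels are supplied precomputed, and the weights are placeholders. All function names are hypothetical.

```python
C1 = (0.01 * 255) ** 2  # SSIM stabilising constants from [4], 8-bit luminance
C2 = (0.03 * 255) ** 2


def mean(values):
    return sum(values) / len(values)


def ssim(p, q):
    """SSIM index [4] of two equally sized patches given as flat lists."""
    mp, mq = mean(p), mean(q)
    var_p = mean([(a - mp) * (a - mp) for a in p])
    var_q = mean([(b - mq) * (b - mq) for b in q])
    cov = mean([(a - mp) * (b - mq) for a, b in zip(p, q)])
    return ((2 * mp * mq + C1) * (2 * cov + C2)) / (
        (mp * mp + mq * mq + C1) * (var_p + var_q + C2))


def patch(frame, y, x, r=4):
    """Flatten the (2r+1) x (2r+1) patch centred at (y, x); r=4 gives 9x9."""
    return [frame[j][i] for j in range(y - r, y + r + 1)
            for i in range(x - r, x + r + 1)]


def dssim_level(video, ref, hole):
    """Assumed per-level score: mean of (1 - SSIM) / 2 over hole positions.

    video and ref are lists of frames; hole is a list of (t, y, x) positions.
    """
    scores = [(1 - ssim(patch(video[t], y, x), patch(ref[t], y, x))) / 2
              for t, y, x in hole]
    return mean(scores)


def ms_dssim(levels, weights):
    """Weighted sum over pyramid levels; levels[i] = (video, ref, hole)."""
    return sum(w * dssim_level(v, r, h)
               for w, (v, r, h) in zip(weights, levels))


# Toy data: one 12x12 frame and a 2x2 hole, evaluated at a single level.
ref = [[[(7 * x + 13 * y) % 256 for x in range(12)] for y in range(12)]]
inverted = [[[255 - v for v in row] for row in ref[0]]]
hole = [(0, y, x) for y in (5, 6) for x in (5, 6)]
weights = [1.0]  # hypothetical weight, not the benchmark's fitted value

score_same = ms_dssim([(ref, ref, hole)], weights)
score_inverted = ms_dssim([(inverted, ref, hole)], weights)
print(score_same, score_inverted)
```

A result identical to the ground truth scores (numerically) zero, while the inverted frame scores much higher, which matches the "lower is better" reading of a dissimilarity.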

The MS-DSSIMdt metric captures temporal coherency along ground-truth optical-flow vectors $s_x = (v_x, v_y, -1)$.

[Equation: MS-DSSIMdt definition]

MS-CDSSIM relies on the assumption that the completion result should be locally similar to the ground truth; that is, each patch $P(x)$ within the spatio-temporal hole $\Omega$ should have a similar ground-truth patch $P_{ref}(y)$.

[Equation: MS-CDSSIM definition]
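The local-similarity idea can be illustrated with a brute-force nearest-patch search. This is only a sketch under stated assumptions: patch scores are averaged, the candidate set of ground-truth patches is passed in explicitly, and the demo uses a mean squared difference as a stand-in for the SSIM-based dissimilarity. The names `coherence_dissimilarity` and `mean_sq_diff` are hypothetical.

```python
def coherence_dissimilarity(result_patches, reference_patches, dissim):
    """Average distance from each result patch to its closest reference patch.

    Brute force: every result patch is compared with every reference patch,
    so the cost grows with the product of the two patch counts.
    """
    total = 0.0
    for p in result_patches:
        total += min(dissim(p, q) for q in reference_patches)
    return total / len(result_patches)


def mean_sq_diff(p, q):
    """Stand-in dissimilarity for this demo (not the SSIM-based one)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


# Toy 2x2 patches flattened to lists.
reference = [[0, 0, 0, 0], [10, 10, 10, 10], [255, 255, 255, 255]]
rearranged = [[255, 255, 255, 255], [0, 0, 0, 0]]  # ground-truth patches, moved
off_by_five = [[5, 5, 5, 5]]                       # matches no reference patch

score_rearranged = coherence_dissimilarity(rearranged, reference, mean_sq_diff)
score_off = coherence_dissimilarity(off_by_five, reference, mean_sq_diff)
print(score_rearranged, score_off)
```

A result that reuses ground-truth patches in different positions is not penalised here, whereas a position-by-position comparison would penalise it. The quadratic cost of the exhaustive search is also why an approximate nearest-neighbour method becomes attractive for large holes.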

MS-CDSSIMdt is a temporal stability metric that uses the same assumptions as MS-CDSSIM. Essentially, it captures the changes in patch appearance from frame to frame, rather than evaluating consistency with ground-truth optical flow as MS-DSSIMdt does. To do so, for a given patch we find the most similar patch from the previous frame within a certain window, compute the distances from both patches to their most similar ground-truth patches, and then compare the respective distances.

[Equation: MS-CDSSIMdt definition]

Here $\Omega_{prev}^{w \times w}(x)$ denotes a square window of $w \times w$ pixels (we use $w$ equal to 1/10th of the frame width) spatially centered at $x$ and located in the previous frame.

Exact computation of MS-CDSSIM and MS-CDSSIMdt quickly becomes impractical for larger spatio-temporal holes, so we resort to approximate solutions based on the PatchMatch [5] algorithm.


We invite developers of video-completion methods to use our benchmark. We can evaluate the submitted data and report quality scores to the developer. In cases where the developer specifically grants permission, we will publish the results on our site. The test sequences with the respective completion masks are available for download: Deck, Library, Fountain, Wires, Tower, Skyscrapers, Sign.

For evaluation requests, or if you have any questions or suggestions, please feel free to contact us by email: abokov@graphics.cs.msu.ru.


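Objective metric values

Lower values are better. The number in parentheses is the method's rank on that sequence; the "Avg. rank" column is the mean of these ranks.

MS-DSSIM

| Method | Avg. rank | Deck | Library | Fountain | Wires | Tower | Skyscrapers | Sign |
|---|---|---|---|---|---|---|---|---|
| Background Reconstruction (+) [6] | 1.9 | 0.221 (1) | 0.192 (2) | 0.217 (4) | 0.070 (1) | 0.090 (2) | 0.108 (2) | 0.083 (1) |
| PFClean Remove Rig (+) [7] | 2.7 | 0.307 (4) | 0.187 (1) | 0.077 (1) | 0.094 (2) | 0.143 (4) | 0.163 (4) | 0.106 (3) |
| Planar Structure Guidance (i) [8] | 5.4 | 0.318 (5) | 0.603 (5) | 0.682 (6) | 0.240 (6) | 0.177 (5) | 0.302 (5) | 0.438 (6) |
| Nuke F_RigRemoval (+) [9] | 2.0 | 0.291 (2) | 0.211 (3) | 0.078 (2) | 0.120 (3) | 0.068 (1) | 0.091 (1) | 0.104 (2) |
| Telea Inpainting (i) [10] | 4.7 | 0.333 (6) | 0.623 (6) | 0.614 (5) | 0.206 (5) | 0.141 (3) | 0.133 (3) | 0.367 (5) |
| Complex Scenes (m) [11] | 4.3 | 0.307 (3) | 0.252 (4) | 0.116 (3) | 0.162 (4) | 0.195 (6) | 0.355 (6) | 0.237 (4) |

MS-DSSIMdt

| Method | Avg. rank | Deck | Library | Fountain | Wires | Tower | Skyscrapers | Sign |
|---|---|---|---|---|---|---|---|---|
| Background Reconstruction (+) [6] | 1.7 | 0.013 (1) | 0.005 (1) | 0.036 (4) | 0.007 (1) | 0.007 (2) | 0.009 (2) | 0.004 (1) |
| PFClean Remove Rig (+) [7] | 2.9 | 0.018 (3) | 0.007 (2) | 0.003 (1) | 0.011 (3) | 0.013 (5) | 0.016 (4) | 0.009 (2) |
| Planar Structure Guidance (i) [8] | 6.0 | 0.156 (6) | 0.301 (6) | 0.458 (6) | 0.092 (6) | 0.067 (6) | 0.145 (6) | 0.186 (6) |
| Nuke F_RigRemoval (+) [9] | 2.9 | 0.020 (5) | 0.012 (4) | 0.004 (2) | 0.012 (4) | 0.003 (1) | 0.006 (1) | 0.009 (3) |
| Telea Inpainting (i) [10] | 4.4 | 0.013 (2) | 0.092 (5) | 0.197 (5) | 0.016 (5) | 0.010 (4) | 0.019 (5) | 0.046 (5) |
| Complex Scenes (m) [11] | 3.1 | 0.018 (4) | 0.009 (3) | 0.008 (3) | 0.011 (2) | 0.009 (3) | 0.016 (3) | 0.021 (4) |

MS-CDSSIM

| Method | Avg. rank | Deck | Library | Fountain | Wires | Tower | Skyscrapers | Sign |
|---|---|---|---|---|---|---|---|---|
| Background Reconstruction (+) [6] | 2.0 | 0.118 (1) | 0.056 (3) | 0.119 (4) | 0.040 (1) | 0.055 (2) | 0.081 (2) | 0.043 (1) |
| PFClean Remove Rig (+) [7] | 2.4 | 0.141 (4) | 0.045 (1) | 0.020 (1) | 0.056 (2) | 0.088 (3) | 0.122 (4) | 0.046 (2) |
| Planar Structure Guidance (i) [8] | 5.9 | 0.200 (6) | 0.293 (6) | 0.409 (6) | 0.158 (6) | 0.118 (6) | 0.227 (5) | 0.288 (6) |
| Nuke F_RigRemoval (+) [9] | 2.4 | 0.140 (3) | 0.075 (4) | 0.027 (2) | 0.072 (3) | 0.041 (1) | 0.069 (1) | 0.050 (3) |
| Telea Inpainting (i) [10] | 4.6 | 0.183 (5) | 0.286 (5) | 0.313 (5) | 0.130 (5) | 0.092 (4) | 0.102 (3) | 0.212 (5) |
| Complex Scenes (m) [11] | 3.7 | 0.119 (2) | 0.053 (2) | 0.028 (3) | 0.090 (4) | 0.117 (5) | 0.278 (6) | 0.099 (4) |

MS-CDSSIMdt

| Method | Avg. rank | Deck | Library | Fountain | Wires | Tower | Skyscrapers | Sign |
|---|---|---|---|---|---|---|---|---|
| Background Reconstruction (+) [6] | 1.9 | 0.015 (1) | 0.007 (2) | 0.024 (4) | 0.009 (1) | 0.009 (2) | 0.010 (2) | 0.007 (1) |
| PFClean Remove Rig (+) [7] | 2.0 | 0.017 (2) | 0.007 (1) | 0.004 (1) | 0.011 (2) | 0.015 (3) | 0.013 (3) | 0.009 (2) |
| Planar Structure Guidance (i) [8] | 6.0 | 0.071 (6) | 0.105 (6) | 0.128 (6) | 0.060 (6) | 0.039 (6) | 0.081 (6) | 0.091 (6) |
| Nuke F_RigRemoval (+) [9] | 2.6 | 0.020 (4) | 0.015 (4) | 0.006 (2) | 0.014 (3) | 0.008 (1) | 0.009 (1) | 0.011 (3) |
| Telea Inpainting (i) [10] | 4.9 | 0.023 (5) | 0.065 (5) | 0.096 (5) | 0.027 (5) | 0.019 (5) | 0.016 (4) | 0.049 (5) |
| Complex Scenes (m) [11] | 3.7 | 0.018 (3) | 0.011 (3) | 0.008 (3) | 0.018 (4) | 0.017 (4) | 0.023 (5) | 0.021 (4) |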
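The leading score listed for each method appears to be the mean of its seven per-sequence ranks, rounded to one decimal place. The short check below reproduces this for the first six rows of the results; the per-sequence ranks are read off the listed values, and the helper name `average_rank` is hypothetical.

```python
def average_rank(ranks):
    """Mean of the per-sequence ranks, rounded to one decimal place."""
    return round(sum(ranks) / len(ranks), 1)


# Per-sequence ranks from the first six rows of the results, in listed order.
ranks_first_block = {
    "Background Reconstruction": [1, 2, 4, 1, 2, 2, 1],
    "PFClean Remove Rig": [4, 1, 1, 2, 4, 4, 3],
    "Planar Structure Guidance": [5, 5, 6, 6, 5, 5, 6],
    "Nuke F_RigRemoval": [2, 3, 2, 3, 1, 1, 2],
    "Telea Inpainting": [6, 6, 5, 5, 3, 3, 5],
    "Complex Scenes": [3, 4, 3, 4, 6, 6, 4],
}

averages = {name: average_rank(r) for name, r in ranks_first_block.items()}
for name, value in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{name}: {value}")
```

Sorting by this average gives a simple overall ordering of the methods for one metric, with every sequence weighted equally.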

+ regions that weren't reconstructed by the algorithm were filled afterwards using Telea image inpainting [10]

m owing to prohibitively high memory consumption the test sequences were downscaled to 1280×720 resolution

i image inpainting algorithms


Overall Plots



[1] E. Cheng, P. Burton, J. Burton, A. Joseski, and I. Burnett. RMIT3DV: Pre-announcement of a creative commons uncompressed HD 3D video database. Fourth International Workshop on Quality of Multimedia Experience (QoMEX), pages 212–217, 2012.
[2] M. Erofeev, Y. Gitman, D. Vatolin, A. Fedorov, and J. Wang. Perceptually Motivated Benchmark for Video Matting. British Machine Vision Conference (BMVC), pages 99.1–99.12, 2015.
[3] Blender. https://www.blender.org/
[4] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing (TIP), pages 600–612, 2004.
[5] C. Barnes, E. Shechtman, A. Finkelstein, and D. Goldman. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics (TOG), 2009.
[6] YUVSoft Background Reconstruction. http://www.yuvsoft.com/stereo-3d-technologies/background-reconstruction/
[7] Pixel Farm PFClean. http://www.thepixelfarm.co.uk/pfclean/
[8] J.-B. Huang, S. B. Kang, N. Ahuja, and J. Kopf. Image completion using planar structure guidance. ACM Transactions on Graphics (TOG), 2014.
[9] The Foundry Nuke. https://www.thefoundry.co.uk/products/nuke/
[10] A. Telea. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, pages 23–34, 2004.
[11] A. Newson, A. Almansa, M. Fradet, Y. Gousseau, and P. Perez. Video inpainting of complex scenes. SIAM Journal on Imaging Sciences, pages 1993–2019, 2014.