
Console View


Tony Hutter
Add enclosure_symlinks option to vdev_id

Add an 'enclosure_symlinks' option to vdev_id.conf.  This creates
consistently named symlinks to the enclosure devices (/dev/sg*) based
on the configuration in vdev_id.conf.  The enclosure symlinks show
up in /dev/by-enclosure/<prefix>-<channel><num>.  The links make it
easy to run sg_ses on a particular enclosure device.  The
enclosure links are created in addition to the normal
/dev/disk/by-vdev links.

'enclosure_symlinks' is only valid in sas_direct configurations.
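A minimal Python sketch of the link naming described above, assuming the (channel, number) to /dev/sg* mapping has already been resolved. The helper names and the "enc" prefix used in the example are hypothetical; the real vdev_id is a shell script that derives this information from vdev_id.conf and sysfs.

```python
BY_ENCLOSURE = "/dev/by-enclosure"


def enclosure_link(prefix: str, channel: str, num: int) -> str:
    """Return the link path for one enclosure:
    /dev/by-enclosure/<prefix>-<channel><num>."""
    return f"{BY_ENCLOSURE}/{prefix}-{channel}{num}"


def plan_enclosure_links(prefix: str, enclosures: dict) -> dict:
    """Map each link path to its /dev/sg* target.

    `enclosures` is a hypothetical, pre-resolved mapping of
    (channel, num) -> "/dev/sgN"; sorting keeps the output stable."""
    return {
        enclosure_link(prefix, channel, num): sg
        for (channel, num), sg in sorted(enclosures.items())
    }
```

With such a link in place, `sg_ses /dev/by-enclosure/enc-A0` would address the same enclosure regardless of which /dev/sgN number the kernel happened to assign.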

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Simon Guest <simon.guest@tesujimath.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8194
Tom Caputi
Fix zap_update() ASSERT from ztest

This patch simply removes an invalid assert from the zap_update()
function. The ASSERT is invalid because it does not hold the zap
lock from the time it fetches the old value to the time it confirms
that it is what it should be.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8209
Brian Behlendorf
ztest: ENOSPC in ztest_objset_destroy_cb()

While unlikely, it is possible for dsl_destroy_head() to return
ENOSPC in the ztest_objset_destroy_cb().  This can occur even
when ZFS_SPACE_CHECK_DESTROY is used with the dsl_sync_task().
Both the existence of a checkpoint and pending deferred frees
can cause this.
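The practical consequence for a test callback is that ENOSPC from the destroy must be treated as an expected outcome rather than a test failure. A minimal Python sketch of that policy follows; `destroy_head` is a callable standing in for dsl_destroy_head(), and the function name and return convention are hypothetical, not ztest's actual C code.

```python
import errno


def objset_destroy_cb(name, destroy_head):
    """Destroy `name`, tolerating ENOSPC.

    `destroy_head` is a hypothetical callable standing in for
    dsl_destroy_head(); it returns 0 on success or an errno value.
    Returns True if the dataset was destroyed and False if ENOSPC
    (e.g. from a checkpoint or pending deferred frees) prevented it.
    Any other error is treated as a real failure."""
    error = destroy_head(name)
    if error == 0:
        return True
    if error == errno.ENOSPC:
        return False
    raise OSError(error, f"destroying {name} failed")
```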

Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8206
Paul Dagnelie
OpenZFS 9559 - zfs diff handles files on delete queue in fromsnap poorly

Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Joshua M. Clulow <josh@sysmgr.org>
Reviewed by: Tom Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/9559
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d7e45412
Closes #8211
Andriy Gapon
OpenZFS 9630 - add lzc_rename and lzc_destroy to libzfs_core

Porting Notes:
* Additional changes to recv_rename_impl() were required due to
  encryption code not being merged in OpenZFS yet.
* libzfs_core python bindings (pyzfs) were updated to fully support
  both lzc_rename() and lzc_destroy()

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: loli10K <ezomori.nozomu@gmail.com>

OpenZFS-issue: https://www.illumos.org/issues/9630
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/049ba63
Closes #8207
Brian Behlendorf
OpenZFS 9559 - zfs diff handles files on delete queue in fromsnap poorly

Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Joshua M. Clulow <josh@sysmgr.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/9559
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d7e45412

Pull-request: #8211 part 1/1
Ben Cordero
Add `cut` binary to the initramfs

Since the `cut -b` command is used by `parse-zfs.sh`,
ensure that it is copied to the initramfs.

Fix spl_hostid when set by cmdline. This follows logic similar
to that of the `zgenhostid` script, using `echo` instead of
`printf`.
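For reference, /etc/hostid holds a raw 4-byte integer, which glibc's gethostid() reads in native byte order. A minimal Python sketch of that encoding, assuming the hostid arrives as a hex string (for example from spl_hostid= on the kernel command line) and a little-endian host. The function name is hypothetical, and this is not the dracut script itself, which is shell.

```python
def hostid_file_bytes(hostid_hex: str, byteorder: str = "little") -> bytes:
    """Encode a hex hostid (e.g. "00bab10c") as the 4 raw bytes
    written to /etc/hostid. Little-endian matches x86 hosts; pass
    "big" for a big-endian host."""
    value = int(hostid_hex, 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("hostid must fit in 32 bits")
    return value.to_bytes(4, byteorder)
```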

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ben Cordero <bencord0@condi.me>
Closes #8197
Tom Caputi
Fix resilver writes in vdev_indirect_io_start

This patch addresses an issue found in ztest where resilver
write zios that were passed to an indirect vdev would end up
being handled as though they were resilver read zios. This
caused the zio->io_abd to be both read into and written from at
the same time, causing asserts to fail.
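The essence of the fix is that the direction of the I/O has to be decided before the resilver flag is consulted. A toy Python model of that ordering follows; the enum, flag name, and return strings are hypothetical and only illustrate the control flow, not vdev_indirect_io_start() itself.

```python
from enum import Enum


class IOType(Enum):
    READ = "read"
    WRITE = "write"


def indirect_io_route(io_type: IOType, flags: set) -> str:
    """Pick a handling path for an I/O reaching an indirect vdev.

    Writes are routed as writes even when they carry the resilver
    flag; only reads get resilver-specific read handling. Checking the
    flag first would send a resilver write down the read path, which
    is the kind of mix-up described above."""
    if io_type is IOType.WRITE:
        return "write"
    if "resilver" in flags:
        return "resilver-read"
    return "read"
```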

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8193
Tom Caputi
ZTS: fix wait_scrubbed()

Currently, wait_scrubbed() is the only function of its kind that
accepts a timeout, which is 10s by default. This timeout is pretty
short for a scrub and causes test failures if we run too long. This
patch removes the timeout, instead leaning on the global test suite
timeout to ensure the tests keep moving.
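The resulting wait is a plain poll loop with no timeout of its own. The real wait_scrubbed() is a shell helper in the test suite; this is a Python sketch that assumes `is_scrubbing` is a callable (hypothetical here) reporting whether the scrub is still running, for instance by parsing `zpool status`.

```python
import time


def wait_scrubbed(is_scrubbing, poll_interval=1.0, sleep=time.sleep):
    """Block until `is_scrubbing()` returns False.

    There is deliberately no timeout here: a scrub that never finishes
    is left for the test suite's global timeout to catch. `sleep` is
    injectable so the loop can be exercised without waiting.
    Returns the number of times the loop slept."""
    polls = 0
    while is_scrubbing():
        polls += 1
        sleep(poll_interval)
    return polls
```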

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Pull-request: #8210 part 1/1
Tom Caputi
Fix zap_update() ASSERT from ztest

This patch simply removes an invalid assert from the zap_update()
function. The ASSERT is invalid because it does not hold the zap
lock from the time it fetches the old value to the time it confirms
that it is what it should be.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

Pull-request: #8209 part 1/1
loli10K
OpenZFS 9630 - add lzc_rename and lzc_destroy to libzfs_core

Porting Notes:
* Additional changes to recv_rename_impl() were required due to
  encryption code not being merged in OpenZFS yet.
* libzfs_core python bindings (pyzfs) were updated to fully support
  both lzc_rename() and lzc_destroy()

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: loli10K <ezomori.nozomu@gmail.com>

OpenZFS-issue: https://www.illumos.org/issues/9630
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/049ba63

Pull-request: #8207 part 1/1
Brian Behlendorf
ztest: scrub verification and ddt repair fix

By design, ztest will never inject non-repairable damage into the
pool.  Update the ztest_scrub() test case such that it waits for
the scrub to complete and verifies the pool is always repairable.

After enabling scrub verification several failures were observed.
These were determined to be:

1) The pool must be scrubbed prior to detaching a mirror device.
  Failure to do so can potentially lock in data corruption which
  was injected into the remaining half of the mirror and wasn't detected
  prior to the detach.

2) The child/offset selection logic in ztest_fault_inject() depends
  on the calculated number of leaves always remaining constant
  between injection passes.  This is true within a single execution
  of ztest, but when using zloop.sh random values are selected for
  each restart.  Therefore, when ztest imports an existing pool
  it must be scrubbed before failure injection can be enabled.

3) The ztest_ddt_repair() test was determined to be able to damage the
  pool in a fashion non-repairable by the scrub.  The root cause
  was identified to be the ddt_bp_create() function called by
  dsl_scan_ddt_entry() which did not set the dedup bit of the
  generated block pointer.

  The consequence of this was that the ZIO_DDT_READ_PIPELINE was
  never enabled for the block pointer during the scrub, and the
  dedup ditto repair logic was never run.  Note that for demand
  reads which don't rely on ddt_bp_create() the required pipeline
  stages would be enabled and the repair performed.

  This was resolved by unconditionally setting the dedup bit in
  ddt_bp_create().  This way all code paths which may need to
  perform a repair from a block pointer generated from the ddt
  entry will be able to.  The only exception is that the dedup
  bit is cleared in ddt_phys_free() which is required to avoid
  leaking space.

4) Increase the default allowed number of reconstruction attempts.
  There's not an exact right number for this setting.  It needs
  to be set large enough to cover any realistic failure scenarios
  and small enough to avoid stalling the IO pipeline and invoking
  the dead man detection.

  The current value of 256 was empirically determined to be too
  low based on multi-day runs of ztest.  The fault injection code
  would inject more damage than could be reconstructed given the
  low number of attempts.  However, in all observed cases the
  block could be reconstructed using a slightly higher limit.

  Based on local testing increasing the default value to 8192 was
  determined to strike the best balance.  Checking all combinations
  takes less than 10s in the worst case, and has so far eliminated
  the false positives detected by ztest.  This delay is roughly on
  par with how long retries may be performed to a misbehaving HDD
  and was deemed to be reasonable.  Better to err on the side of
  a brief delay rather than fail to reconstruct the data.
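Item 3 boils down to one invariant: a block pointer built from a DDT entry carries the dedup bit, and that bit is what selects the DDT read pipeline with its ditto repair logic. A toy Python model of the invariant follows. Only ddt_bp_create(), ddt_phys_free() and ZIO_DDT_READ_PIPELINE come from the commit message; the other names and the single-field block pointer are hypothetical simplifications.

```python
from dataclasses import dataclass, replace


@dataclass
class BlockPointer:
    # Toy block pointer: only the dedup bit matters for this sketch.
    dedup: bool = False


def ddt_bp_create() -> BlockPointer:
    """After the fix, a bp generated from a DDT entry always has the
    dedup bit set, so scrub and repair paths treat it as deduplicated."""
    return BlockPointer(dedup=True)


def bp_for_free(bp: BlockPointer) -> BlockPointer:
    """Hypothetical helper mirroring the one exception: the bit is
    cleared on the ddt_phys_free() path, which the commit says is
    required to avoid leaking space."""
    return replace(bp, dedup=False)


def read_pipeline(bp: BlockPointer) -> str:
    """Only the DDT pipeline runs the dedup ditto repair logic."""
    return "ZIO_DDT_READ_PIPELINE" if bp.dedup else "default read pipeline"
```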

Lastly, the -Y flag has been added to zdb to make it easy to try all
possible combinations when performing split block reconstruction.
Badly damaged blocks with 18 splits can be fully enumerated within
a few minutes.  This has been done to ensure permanent errors
are never incorrectly reported when ztest verifies the pool with zdb.
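The trade-off between a bounded default and zdb's exhaustive -Y mode can be pictured as a capped search over combinations. In this Python sketch each split segment has a list of candidate copies, and zlib.crc32 stands in for the block checksum ZFS would actually verify; the function and its parameters are hypothetical. Assuming each of 18 splits had only two candidates, there would already be 2**18 = 262,144 combinations, far more than a cap of 256 or 8192 can cover.

```python
import itertools
import zlib


def reconstruct(candidates, expected_crc, max_attempts=8192):
    """Search combinations of per-split candidates for one whose
    checksum matches.

    candidates:   one list of byte strings per split segment.
    max_attempts: cap on combinations tried; None means try them all,
                  as zdb -Y is described as doing.
    Returns (data or None, number of combinations tried)."""
    attempts = 0
    for combo in itertools.product(*candidates):
        if max_attempts is not None and attempts >= max_attempts:
            break
        attempts += 1
        data = b"".join(combo)
        if zlib.crc32(data) == expected_crc:
            return data, attempts
    return None, attempts
```

With candidates `[[b"xx", b"ab"], [b"yy", b"cd"], [b"zz", b"ef"]]` the good combination is the eighth and last one itertools.product yields, so a cap of 4 gives up while `max_attempts=None` finds it.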

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Pull-request: #8203 part 1/1
Ben Cordero
dracut: fix initramfs hostid

This follows logic similar to that of the `zgenhostid`
script, using `echo` instead of `printf`.

Signed-off-by: Ben Cordero <bencord0@condi.me>

Pull-request: #8197 part 3/3
Ben Cordero
dracut: fix spl_hostid when set by cmdline

This follows logic similar to that of the `zgenhostid`
script, using `echo` instead of `printf`.

Signed-off-by: Ben Cordero <bencord0@condi.me>

Pull-request: #8197 part 2/3
Tony Hutter
Add enclosure_symlinks option to vdev_id

Add an 'enclosure_symlinks' option to vdev_id.conf.  This creates
consistently named symlinks to the enclosure devices (/dev/sg*) based
on the configuration in vdev_id.conf.  The enclosure symlinks show
up in /dev/by-enclosure/<prefix>-<channel><num>.  The links make it
easy to run sg_ses on a particular enclosure device.  The
enclosure links are created in addition to the normal
/dev/disk/by-vdev links.

'enclosure_symlinks' is only valid in sas_direct configurations.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>

Pull-request: #8194 part 1/1
Tony Hutter
Address behlendorf's comments

Pull-request: #8194 part 2/2
Tony Hutter
Add enclosure_symlinks option to vdev_id

Add an 'enclosure_symlinks' option to vdev_id.conf.  This creates
consistently named symlinks to the enclosure devices (/dev/sg*) based
on the configuration in vdev_id.conf.  The enclosure symlinks show
up in /dev/by-enclosure/<prefix>-<channel><num>.  The links make it
easy to run sg_ses on a particular enclosure device.  The
enclosure links are created in addition to the normal
/dev/disk/by-vdev links.

'enclosure_symlinks' is only valid in sas_direct configurations.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>

Pull-request: #8194 part 1/2
loli10K
ZVOLs should not be allowed to have children

zfs create, receive and rename can bypass this hierarchy rule. Update
both userland and kernel module to prevent this issue and use pyzfs
unit tests to exercise the ioctls directly.

Note: this commit slightly changes the zfs_ioc_create() ABI. This allows
us to differentiate a generic error (EINVAL) from the specific case where
we tried to create a dataset below a ZVOL (ENOTDIR).
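The hierarchy rule itself is simple to state in code: before creating, renaming, or receiving a dataset, look at its parent and refuse with ENOTDIR if the parent is a volume. A minimal Python sketch, assuming a hypothetical in-memory map from dataset name to type; the EINVAL and ENOENT returns for a missing parent are assumptions of the sketch, not something the commit describes.

```python
import errno


def check_parent(name: str, dataset_types: dict) -> int:
    """Return 0 if a dataset called `name` may exist, else an errno.

    dataset_types maps existing dataset names to "filesystem" or
    "volume" (a hypothetical stand-in for the pool's namespace)."""
    parent, _, _ = name.rpartition("/")
    if not parent:
        return errno.EINVAL   # assumption: no parent to check
    kind = dataset_types.get(parent)
    if kind is None:
        return errno.ENOENT   # assumption: parent does not exist
    if kind == "volume":
        return errno.ENOTDIR  # ZVOLs may not have children
    return 0
```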

Signed-off-by: loli10K <ezomori.nozomu@gmail.com>

Pull-request: #8181 part 1/1
Brad Lewis
OpenZFS 9284 - arc_reclaim_thread has 2 jobs

Authored by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Tim Kordas <tim.kordas@joyent.com>
Ported-by: Brad Lewis <brad.lewis@delphix.com>
Signed-off-by: Brad Lewis <brad.lewis@delphix.com>
OpenZFS-issue: https://www.illumos.org/issues/9284
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/de753e34f9

Pull-request: #8165 part 1/1
Brian Behlendorf
Consolidate arc_summary test case

Since we're only installing one version of arc_summary we only
need one test case.  Update the test to determine which version
is available and then test its supported flags.

Remove files for misc tests which should have been cleaned up.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Pull-request: #8096 part 10/10
Brian Behlendorf
Allow test runner to use python 2 or 3

Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>

Pull-request: #8096 part 9/10
Brian Behlendorf
Update Python packaging

Almost all of the Python code in the repository has been updated
to be compatible with Python 2.6, Python 3.4, or newer.  The only
exceptions are arc_summary3.py which requires Python 3, and pyzfs
which requires at least Python 2.7.  This allows us to maintain a
single version of the code and support most default versions of
Python.  This change does the following:

* Sets the default shebang for all Python scripts to python3.  If
  only Python 2 is available, then at install time scripts which
  are compatible with Python 2 will have their shebangs replaced
  with /usr/bin/python.  This is done for compatibility until
  Python 2 goes end of life.  Since only the installed versions
  are changed this means Python 3 must be installed on the system
  for test-runner when testing in-tree.

* Added --with-python=<2|3|3.4,etc> configure option which sets
  the PYTHON environment variable to target a specific python
  version.  By default the newest installed version of Python
  will be used or the preferred distribution version when
  creating packages.

* Fixed --enable-pyzfs configure checks so they are run when
  --enable-pyzfs=check and --enable-pyzfs=yes.

* Enabled pyzfs for Python 3.4 and newer, which is now supported.

* Renamed pyzfs package to python<VERSION>-pyzfs and updated to
  install in the appropriate site location.  For example, when
  building with --with-python=3.4 a python34-pyzfs will be
  created which installs in /usr/lib/python3.4/site-packages/.

* Renamed the following python scripts according to the Fedora
  guidance for packaging utilities in /bin

  - dbufstat.py     -> dbufstat
  - arcstat.py      -> arcstat
  - arc_summary.py  -> arc_summary
  - arc_summary3.py -> arc_summary3

* Updated python-cffi package name.  On CentOS 6, CentOS 7, and
  Amazon Linux it's called python-cffi, not python2-cffi.  For
  Python3 it's called python3-cffi or python3x-cffi.

* Install one version of arc_summary.  Depending on the version
  of Python available install either arc_summary2 or arc_summary3
  as arc_summary.  The user output is only slightly different.
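Two of the packaging rules above are mechanical enough to sketch: the install-time shebang rewrite for Python-2-only systems, and the python<VERSION>-pyzfs package name. The Python helpers below are hypothetical (the real work happens in the build system), and they assume the source shebang is either #!/usr/bin/python3 or #!/usr/bin/env python3.

```python
import re

_PY3_SHEBANG = re.compile(r"#!\s*/usr/bin/(env\s+)?python3")


def rewrite_shebang(script: str) -> str:
    """Point a python3 shebang at /usr/bin/python, as described for
    installs where only Python 2 is available. Other first lines are
    left untouched."""
    first, sep, rest = script.partition("\n")
    if _PY3_SHEBANG.fullmatch(first):
        first = "#!/usr/bin/python"
    return first + sep + rest


def pyzfs_package_name(python_version: str) -> str:
    """python<VERSION>-pyzfs with the dot dropped, e.g. "3.4" gives
    "python34-pyzfs"."""
    return "python%s-pyzfs" % python_version.replace(".", "")


def installed_script_name(filename: str) -> str:
    """Drop the .py suffix, e.g. "arcstat.py" becomes "arcstat"."""
    return filename[:-3] if filename.endswith(".py") else filename
```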

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Pull-request: #8096 part 8/10
Brian Behlendorf
pyzfs python3 support (7): relative import fixes

Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
TEST_ZFSTESTS_TAGS="pyzfs"

Pull-request: #8096 part 7/10
Brian Behlendorf
pyzfs python3 support (6): revert unicode removals

Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Requires-builders: none

Pull-request: #8096 part 6/10
Brian Behlendorf
pyzfs python3 support (5): integer division

Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Requires-builders: none

Pull-request: #8096 part 5/10
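The commit carries no description, but the underlying Python 2/3 difference is well known: `/` on two integers floors in Python 2 and returns a float in Python 3, while `//` floors in both. A small illustration with a hypothetical helper:

```python
def blocks_needed(nbytes: int, block_size: int) -> int:
    """Number of block_size blocks needed to hold nbytes.

    Uses floor division (//), which returns an int on Python 2 and 3
    alike; plain / would return a float on Python 3."""
    return (nbytes + block_size - 1) // block_size
```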
Brian Behlendorf
pyzfs python3 support (4): compatible changes

These changes are slightly less idiomatic Python, but
are valid and efficient in both python 2 and 3.

Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Requires-builders: none

Pull-request: #8096 part 4/10
Brian Behlendorf
pyzfs python3 support (3): compatible changes

Revert unnecessary 2to3 changes, and work around iterator changes.

These changes are efficient and valid in python 2 and 3. For the
most part, they are also pythonic.
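One common iterator change of this kind: in Python 3, `dict.items()` returns a live view rather than a list, so deleting keys while iterating over it raises RuntimeError. Taking an explicit `list()` snapshot works on both versions. The helper below is hypothetical and only illustrates the pattern, not this commit's diff.

```python
def drop_none(props: dict) -> dict:
    """Remove keys whose value is None, in place.

    list(props.items()) takes a snapshot: on Python 2 items() already
    returned a list, and on Python 3 it returns a view that must not
    change size while being iterated."""
    for key, value in list(props.items()):
        if value is None:
            del props[key]
    return props
```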

Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Requires-builders: none

Pull-request: #8096 part 3/10
Brian Behlendorf
pyzfs python3 support (2): add __future__ imports

Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Requires-builders: none

Pull-request: #8096 part 2/10
Brian Behlendorf
pyzfs python3 support (1): 2to3 conversion

Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Requires-builders: none

Pull-request: #8096 part 1/10
George Wilson
initialize performance improvements

Pull-request: #7955 part 2/2
George Wilson
OpenZFS 9102 - zfs should be able to initialize storage devices

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb877c0bfb01c9ce117d0e7c1ac272e4
OpenZFS-issue: https://www.illumos.org/issues/9102

PROBLEM
========

The first access to a block incurs a performance penalty on some platforms
(e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are
"thick provisioned", where supported by the platform (VMware). This can
create a large delay in getting new virtual machines up and running (or
adding storage to an existing Engine). If the thick provision step is
omitted, write performance will be suboptimal until all blocks on the LUN
have been written.

SOLUTION
=========

This feature introduces a way to 'initialize' the disks at install or in the
background to make sure we don't incur this first read penalty.

When an entire LUN is added to ZFS, we make all space available immediately,
and allow ZFS to find unallocated space and zero it out. This works with
concurrent writes to arbitrary offsets, ensuring that we don't zero out
something that has been (or is in the middle of being) written. This scheme
can also be applied to existing pools (affecting only free regions on the
vdev). Detailed design:
        - new subcommand: zpool initialize [-cs] <pool> [<vdev> ...]
                - start, suspend, or cancel initialization
        - Creates new open-context thread for each vdev
        - Thread iterates through all metaslabs in this vdev
        - Each metaslab:
                - select a metaslab
                - load the metaslab
                - mark the metaslab as being zeroed
                - walk all free ranges within that metaslab and translate
                  them to ranges on the leaf vdev
                - issue a "zeroing" I/O on the leaf vdev that corresponds to
                  a free range on the metaslab we're working on
                - continue until all free ranges for this metaslab have been
                  "zeroed"
                - reset/unmark the metaslab being zeroed
                - if more metaslabs exist, then repeat above tasks.
                - if no more metaslabs, then we're done.

        - progress for the initialization is stored on-disk in the vdev’s
          leaf zap object. The following information is stored:
                - the last offset that has been initialized
                - the state of the initialization process (i.e. active,
                  suspended, or canceled)
                - the start time for the initialization

        - progress is reported via the zpool status command and shows
          information for each of the vdevs that are initializing

Porting notes:

Added zfs_initialize_value module parameter to set the pattern
written by "zpool initialize".
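The per-metaslab loop described above can be modelled in a few lines of Python. This is a single-threaded sketch under stated assumptions: the device is a bytearray, free ranges are already translated to leaf offsets, progress lives in a dict instead of the leaf ZAP, and the "complete" state name is assumed. It shows only the core property that initializing writes the pattern into free ranges and records how far it got; suspend, cancel, and concurrency with ongoing writes are left out.

```python
import time


def initialize_vdev(device: bytearray, metaslabs, pattern: bytes,
                    state: dict) -> dict:
    """Toy model of the per-vdev initialize pass (names hypothetical).

    device:    bytearray standing in for the leaf vdev.
    metaslabs: one list per metaslab of free (offset, length) ranges,
               already translated to leaf-vdev offsets.
    pattern:   bytes repeated over each free range, in the role of
               zfs_initialize_value.
    state:     dict standing in for the progress kept in the leaf ZAP.
    Only free ranges are written, so allocated data is left alone."""
    state["state"] = "active"
    state.setdefault("start_time", time.time())
    for free_ranges in metaslabs:
        # The design above loads the metaslab and marks it as being
        # zeroed before this walk; a single-threaded toy needs neither.
        for offset, length in free_ranges:
            repeats = -(-length // len(pattern))  # ceiling division
            device[offset:offset + length] = (pattern * repeats)[:length]
            state["last_offset"] = offset + length
    state["state"] = "complete"  # assumption: name of the final state
    return state
```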

Pull-request: #7955 part 1/2
George Wilson
OpenZFS 9102 - zfs should be able to initialize storage devices

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb877c0bfb01c9ce117d0e7c1ac272e4
OpenZFS-issue: https://www.illumos.org/issues/9102

PROBLEM
========

The first access to a block incurs a performance penalty on some platforms
(e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are
"thick provisioned", where supported by the platform (VMware). This can
create a large delay in getting new virtual machines up and running (or
adding storage to an existing Engine). If the thick provision step is
omitted, write performance will be suboptimal until all blocks on the LUN
have been written.

SOLUTION
=========

This feature introduces a way to 'initialize' the disks at install or in the
background to make sure we don't incur this first read penalty.

When an entire LUN is added to ZFS, we make all space available immediately,
and allow ZFS to find unallocated space and zero it out. This works with
concurrent writes to arbitrary offsets, ensuring that we don't zero out
something that has been (or is in the middle of being) written. This scheme
can also be applied to existing pools (affecting only free regions on the
vdev). Detailed design:
        - new subcommand: zpool initialize [-cs] <pool> [<vdev> ...]
                - start, suspend, or cancel initialization
        - Creates new open-context thread for each vdev
        - Thread iterates through all metaslabs in this vdev
        - Each metaslab:
                - select a metaslab
                - load the metaslab
                - mark the metaslab as being zeroed
                - walk all free ranges within that metaslab and translate
                  them to ranges on the leaf vdev
                - issue a "zeroing" I/O on the leaf vdev that corresponds to
                  a free range on the metaslab we're working on
                - continue until all free ranges for this metaslab have been
                  "zeroed"
                - reset/unmark the metaslab being zeroed
                - if more metaslabs exist, then repeat above tasks.
                - if no more metaslabs, then we're done.

        - progress for the initialization is stored on-disk in the vdev’s
          leaf zap object. The following information is stored:
                - the last offset that has been initialized
                - the state of the initialization process (i.e. active,
                  suspended, or canceled)
                - the start time for the initialization

        - progress is reported via the zpool status command and shows
          information for each of the vdevs that are initializing

Porting notes:

Added zfs_initialize_value module parameter to set the pattern
written by "zpool initialize".

Pull-request: #7954 part 1/1
Nathaniel Wesley Filardo
Initial zhack scrub documentation

Pull-request: #6209 part 5/5
Nathaniel Wesley Filardo
new zhack scrub subcommand for offline scrubs

Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>

Pull-request: #6209 part 4/5
Nathaniel Wesley Filardo
zhack scrub: initial test harness wiring

Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>

Pull-request: #6209 part 3/5
Nathaniel Wesley Filardo
zhack: optionally kernel_init with FCREAT

Not used by any existing caller; no functional changes.

Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>

Pull-request: #6209 part 2/5
Nathaniel Wesley Filardo
zhack: make zfeature_checks_disable conditional

Always disable on existing code paths; no functional changes.

Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>

Pull-request: #6209 part 1/5