summaryrefslogtreecommitdiffstats
path: root/crypto
Commit message (Collapse)AuthorAgeFilesLines
* dmaengine, async_tx: support alignment checksDan Williams2009-09-084-6/+9
| | | | | | | | Some engines have transfer size and address alignment restrictions. Add a per-operation alignment property to struct dma_device that the async routines and dmatest can use to check alignment capabilities. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* dmaengine, async_tx: add a "no channel switch" allocatorDan Williams2009-09-081-0/+4
| | | | | | | | | | | | | | | | | | | | Channel switching is problematic for some dmaengine drivers as the architecture precludes separating the ->prep from ->submit. In these cases the driver can select ASYNC_TX_DISABLE_CHANNEL_SWITCH to modify the async_tx allocator to only return channels that support all of the required asynchronous operations. For example MD_RAID456=y selects support for asynchronous xor, xor validate, pq, pq validate, and memcpy. When ASYNC_TX_DISABLE_CHANNEL_SWITCH=y any channel with all these capabilities is marked DMA_ASYNC_TX allowing async_tx_find_channel() to quickly locate compatible channels with the guarantee that dependency chains will remain on one channel. When ASYNC_TX_DISABLE_CHANNEL_SWITCH=n async_tx_find_channel() may select channels that lead to operation chains that need to cross channel boundaries using the async_tx channel switch capability. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* dmaengine: add fence supportDan Williams2009-09-085-27/+50
| | | | | | | | | | | | | Some engines optimize operation by reading ahead in the descriptor chain such that descriptor2 may start execution before descriptor1 completes. If descriptor2 depends on the result from descriptor1 then a fence is required (on descriptor2) to disable this optimization. The async_tx api could implicitly identify dependencies via the 'depend_tx' parameter, but that would constrain cases where the dependency chain only specifies a completion order rather than a data dependency. So, provide an ASYNC_TX_FENCE to explicitly identify data dependencies. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Merge branch 'md-raid6-accel' into ioat3.2Dan Williams2009-09-089-201/+1247
|\ | | | | | | | | Conflicts: include/linux/dmaengine.h
| * async_tx: raid6 recovery self testDan Williams2009-08-292-0/+242
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Port drivers/md/raid6test/test.c to use the async raid6 recovery routines. This is meant as a unit test for raid6 acceleration drivers. In addition to the 16-drive test case this implements tests for the 4-disk and 5-disk special cases (dma devices can not generically handle less than 2 sources), and adds a test for the D+Q case. Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_tx: add support for asynchronous RAID6 recovery operationsDan Williams2009-08-293-0/+454
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | async_raid6_2data_recov() recovers two data disk failures async_raid6_datap_recov() recovers a data disk and the P disk These routines are a port of the synchronous versions found in drivers/md/raid6recov.c. The primary difference is breaking out the xor operations into separate calls to async_xor. Two helper routines are introduced to perform scalar multiplication where needed. async_sum_product() multiplies two sources by scalar coefficients and then sums (xor) the result. async_mult() simply multiplies a single source by a scalar. This implemention also includes, in contrast to the original synchronous-only code, special case handling for the 4-disk and 5-disk array cases. In these situations the default N-disk algorithm will present 0-source or 1-source operations to dma devices. To cover for dma devices where the minimum source count is 2 we implement 4-disk and 5-disk handling in the recovery code. [ Impact: asynchronous raid6 recovery routines for 2data and datap cases ] Cc: Yuri Tikhonov <yur@emcraft.com> Cc: Ilya Yanok <yanok@emcraft.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_tx: add support for asynchronous GF multiplicationDan Williams2009-08-294-1/+394
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Based on an original patch by Yuri Tikhonov ] This adds support for doing asynchronous GF multiplication by adding two additional functions to the async_tx API: async_gen_syndrome() does simultaneous XOR and Galois field multiplication of sources. async_syndrome_val() validates the given source buffers against known P and Q values. When a request is made to run async_pq against more than the hardware maximum number of supported sources we need to reuse the previous generated P and Q values as sources into the next operation. Care must be taken to remove Q from P' and P from Q'. For example to perform a 5 source pq op with hardware that only supports 4 sources at a time the following approach is taken: p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08})) p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10})) p' = p + q + q + src4 = p + src4 q' = {00}*p + {01}*q + {00}*q + {10}*src4 = q + {10}*src4 Note: 4 is the minimum acceptable maxpq otherwise we punt to synchronous-software path. The DMA_PREP_CONTINUE flag indicates to the driver to reuse p and q as sources (in the above manner) and fill the remaining slots up to maxpq with the new sources/coefficients. Note1: Some devices have native support for P+Q continuation and can skip this extra work. Devices with this capability can advertise it with dma_set_maxpq. It is up to each driver how to handle the DMA_PREP_CONTINUE flag. Note2: The api supports disabling the generation of P when generating Q, this is ignored by the synchronous path but is implemented by some dma devices to save unnecessary writes. In this case the continuation algorithm is simplified to only reuse Q as a source. Cc: H. Peter Anvin <hpa@zytor.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Yuri Tikhonov <yur@emcraft.com> Signed-off-by: Ilya Yanok <yanok@emcraft.com> Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_tx: remove walk of tx->parent chain in dma_wait_for_async_txDan Williams2009-08-291-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently walk the parent chain when waiting for a given tx to complete however this walk may race with the driver cleanup routine. The routines in async_raid6_recov.c may fall back to the synchronous path at any point so we need to be prepared to call async_tx_quiesce() (which calls dma_wait_for_async_tx). To remove the ->parent walk we guarantee that every time a dependency is attached ->issue_pending() is invoked, then we can simply poll the initial descriptor until completion. This also allows for a lighter weight 'issue pending' implementation as there is no longer a requirement to iterate through all the channels' ->issue_pending() routines as long as operations have been submitted in an ordered chain. async_tx_issue_pending() is added for this case. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_tx: kill needless module_{init|exit}Dan Williams2009-08-293-40/+3
| | | | | | | | | | | | | | | | | | | | | | If module_init and module_exit are nops then neither need to be defined. [ Impact: pure cleanup ] Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_tx: add sum check flagsDan Williams2009-08-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Replace the flat zero_sum_result with a collection of flags to contain the P (xor) zero-sum result, and the soon to be utilized Q (raid6 reed solomon syndrome) zero-sum result. Use the SUM_CHECK_ namespace instead of DMA_ since these flags will be used on non-dma-zero-sum enabled platforms. Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_xor: permit callers to pass in a 'dma/page scribble' regionDan Williams2009-06-031-31/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | async_xor() needs space to perform dma and page address conversions. In most cases the code can simply reuse the struct page * array because the size of the native pointer matches the size of a dma/page address. In order to support archs where sizeof(dma_addr_t) is larger than sizeof(struct page *), or to preserve the input parameters, we utilize a memory region passed in by the caller. Since the code is now prepared to handle the case where it cannot perform address conversions on the stack, we no longer need the !HIGHMEM64G dependency in drivers/dma/Kconfig. [ Impact: don't clobber input buffers for address conversions ] Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_tx: structify submission arguments, add scribbleDan Williams2009-06-034-114/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prepare the api for the arrival of a new parameter, 'scribble'. This will allow callers to identify scratchpad memory for dma address or page address conversions. As this adds yet another parameter, take this opportunity to convert the common submission parameters (flags, dependency, callback, and callback argument) into an object that is passed by reference. Also, take this opportunity to fix up the kerneldoc and add notes about the relevant ASYNC_TX_* flags for each routine. [ Impact: moves api pass-by-value parameters to a pass-by-reference struct ] Signed-off-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_tx: kill ASYNC_TX_DEP_ACK flagDan Williams2009-06-034-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In support of inter-channel chaining async_tx utilizes an ack flag to gate whether a dependent operation can be chained to another. While the flag is not set the chain can be considered open for appending. Setting the ack flag closes the chain and flags the descriptor for garbage collection. The ASYNC_TX_DEP_ACK flag essentially means "close the chain after adding this dependency". Since each operation can only have one child the api now implicitly sets the ack flag at dependency submission time. This removes an unnecessary management burden from clients of the api. [ Impact: clean up and enforce one dependency per operation ] Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_tx: rename zero_sum to valDan Williams2009-04-081-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | 'zero_sum' does not properly describe the operation of generating parity and checking that it validates against an existing buffer. Change the name of the operation to 'val' (for 'validate'). This is in anticipation of the p+q case where it is a requirement to identify the target parity buffers separately from the source buffers, because the target parity buffers will not have corresponding pq coefficients. Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * Merge branch 'dmaengine' into async-tx-raid6Dan Williams2009-04-082-8/+5
| |\
* | | crypto: hash - Fix handling of sg entry that crosses page boundaryHerbert Xu2009-05-311-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A quirk that we've always supported is having an sg entry that's bigger than a page, or more generally an sg entry that crosses page boundaries. Even though it would be better to explicitly have to sg entries for this, we need to support it for the existing users, in particular, IPsec. The new ahash sg walking code did try to handle this, but there was a bug where we didn't increment the page so kept on walking on the first page over an dover again. This patch fixes it. Tested-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2009-05-172-2/+4
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: padlock - Revert aes-all alias to aes crypto: api - Fix algorithm module auto-loading crypto: eseqiv - Fix IV generation for sync algorithms crypto: ixp4xx - check firmware for crypto support
| * | | crypto: api - Fix algorithm module auto-loadingHerbert Xu2009-04-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit a760a6656e6f00bb0144a42a048cf0266646e22c (crypto: api - Fix module load deadlock with fallback algorithms) broke the auto-loading of algorithms that require fallbacks. The problem is that the fallback mask check is missing an and which cauess bits that should be considered to interfere with the result. Reported-by: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | crypto: eseqiv - Fix IV generation for sync algorithmsSteffen Klassert2009-04-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If crypto_ablkcipher_encrypt() returns synchronous, eseqiv_complete2() is called even if req->giv is already the pointer to the generated IV. The generated IV is overwritten with some random data in this case. This patch fixes this by calling eseqiv_complete2() just if the generated IV has to be copied to req->giv. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | | Merge branch 'next' of ↵Linus Torvalds2009-04-032-8/+5
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: dma: Add SoF and EoF debugging to ipu_idmac.c, minor cleanup dw_dmac: add cyclic API to DW DMA driver dmaengine: Add privatecnt to revert DMA_PRIVATE property dmatest: add dma interrupts and callbacks dmatest: add xor test dmaengine: allow dma support for async_tx to be toggled async_tx: provide __async_inline for HAS_DMA=n archs dmaengine: kill some unused headers dmaengine: initialize tx_list in dma_async_tx_descriptor_init dma: i.MX31 IPU DMA robustness improvements dma: improve section assignment in i.MX31 IPU DMA driver dma: ipu_idmac driver cosmetic clean-up dmaengine: fail device registration if channel registration fails
| * | | dmaengine: allow dma support for async_tx to be toggledDan Williams2009-03-251-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a config option for blocking the allocation of dma channels to the async_tx api. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | | async_tx: provide __async_inline for HAS_DMA=n archsDan Williams2009-03-251-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To allow an async_tx routine to be compiled away on HAS_DMA=n arch it needs to be declared __always_inline otherwise the compiler may emit code and cause a link error. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2009-04-031-0/+3
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ixp4xx - Fix handling of chained sg buffers crypto: shash - Fix unaligned calculation with short length hwrng: timeriomem - Use phys address rather than virt
| * | | crypto: shash - Fix unaligned calculation with short lengthYehuda Sadeh2009-03-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When the total length is shorter than the calculated number of unaligned bytes, the call to shash->update breaks. For example, calling crc32c on unaligned buffer with length of 1 can result in a system crash. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | | Merge branch 'for-linus' of git://neil.brown.name/mdLinus Torvalds2009-04-031-1/+1
|\ \ \ \ | |/ / / |/| | / | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://neil.brown.name/md: (53 commits) md/raid5 revise rules for when to update metadata during reshape md/raid5: minor code cleanups in make_request. md: remove CONFIG_MD_RAID_RESHAPE config option. md/raid5: be more careful about write ordering when reshaping. md: don't display meaningless values in sysfs files resync_start and sync_speed md/raid5: allow layout and chunksize to be changed on active array. md/raid5: reshape using largest of old and new chunk size md/raid5: prepare for allowing reshape to change layout md/raid5: prepare for allowing reshape to change chunksize. md/raid5: clearly differentiate 'before' and 'after' stripes during reshape. Documentation/md.txt update md: allow number of drives in raid5 to be reduced md/raid5: change reshape-progress measurement to cope with reshaping backwards. md: add explicit method to signal the end of a reshape. md/raid5: enhance raid5_size to work correctly with negative delta_disks md/raid5: drop qd_idx from r6_state md/raid6: move raid6 data processing to raid6_pq.ko md: raid5 run(): Fix max_degraded for raid level 4. md: 'array_size' sysfs attribute md: centralize ->array_sectors modifications ...
| * | md: move lots of #include lines out of .h files and into .cNeilBrown2009-03-311-1/+1
| |/ | | | | | | | | | | | | | | | | | | This makes the includes more explicit, and is preparation for moving md_k.h to drivers/md/md.h Remove include/raid/md.h as its only remaining use was to #include other files. Signed-off-by: NeilBrown <neilb@suse.de>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2009-03-2620-144/+1130
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (29 commits) crypto: sha512-s390 - Add missing block size hwrng: timeriomem - Breaks an allyesconfig build on s390: nlattr: Fix build error with NET off crypto: testmgr - add zlib test crypto: zlib - New zlib crypto module, using pcomp crypto: testmgr - Add support for the pcomp interface crypto: compress - Add pcomp interface netlink: Move netlink attribute parsing support to lib crypto: Fix dead links hwrng: timeriomem - New driver crypto: chainiv - Use kcrypto_wq instead of keventd_wq crypto: cryptd - Per-CPU thread implementation based on kcrypto_wq crypto: api - Use dedicated workqueue for crypto subsystem crypto: testmgr - Test skciphers with no IVs crypto: aead - Avoid infinite loop when nivaead fails selftest crypto: skcipher - Avoid infinite loop when cipher fails selftest crypto: api - Fix crypto_alloc_tfm/create_create_tfm return convention crypto: api - crypto_alg_mod_lookup either tested or untested crypto: amcc - Add crypt4xx driver crypto: ansi_cprng - Add maintainer ...
| * crypto: testmgr - add zlib testGeert Uytterhoeven2009-03-044-1/+158
| | | | | | | | | | Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: zlib - New zlib crypto module, using pcompGeert Uytterhoeven2009-03-043-0/+388
| | | | | | | | | | | | Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: testmgr - Add support for the pcomp interfaceGeert Uytterhoeven2009-03-042-0/+193
| | | | | | | | | | Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: compress - Add pcomp interfaceGeert Uytterhoeven2009-03-043-0/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current "comp" crypto interface supports one-shot (de)compression only, i.e. the whole data buffer to be (de)compressed must be passed at once, and the whole (de)compressed data buffer will be received at once. In several use-cases (e.g. compressed file systems that store files in big compressed blocks), this workflow is not suitable. Furthermore, the "comp" type doesn't provide for the configuration of (de)compression parameters, and always allocates workspace memory for both compression and decompression, which may waste memory. To solve this, add a "pcomp" partial (de)compression interface that provides the following operations: - crypto_compress_{init,update,final}() for compression, - crypto_decompress_{init,update,final}() for decompression, - crypto_{,de}compress_setup(), to configure (de)compression parameters (incl. allocating workspace memory). The (de)compression methods take a struct comp_request, which was mimicked after the z_stream object in zlib, and contains buffer pointer and length pairs for input and output. The setup methods take an opaque parameter pointer and length pair. Parameters are supposed to be encoded using netlink attributes, whose meanings depend on the actual (name of the) (de)compression algorithm. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: Fix dead linksAdrian-Ken Rueegsegger2009-03-042-2/+2
| | | | | | | | | | Signed-off-by: Adrian-Ken Rueegsegger <ken@codelabs.ch> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: chainiv - Use kcrypto_wq instead of keventd_wqHuang Ying2009-02-192-1/+3
| | | | | | | | | | | | | | | | keventd_wq has potential starvation problem, so use dedicated kcrypto_wq instead. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cryptd - Per-CPU thread implementation based on kcrypto_wqHuang Ying2009-02-192-117/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Original cryptd thread implementation has scalability issue, this patch solve the issue with a per-CPU thread implementation. struct cryptd_queue is defined to be a per-CPU queue, which holds one struct cryptd_cpu_queue for each CPU. In struct cryptd_cpu_queue, a struct crypto_queue holds all requests for the CPU, a struct work_struct is used to run all requests for the CPU. Testing based on dm-crypt on an Intel Core 2 E6400 (two cores) machine shows 19.2% performance gain. The testing script is as follow: -------------------- script begin --------------------------- #!/bin/sh dmc_create() { # Create a crypt device using dmsetup dmsetup create $2 --table "0 `blockdev --getsize $1` crypt cbc(aes-asm)?cryptd?plain:plain babebabebabebabebabebabebabebabe 0 $1 0" } dmsetup remove crypt0 dmsetup remove crypt1 dd if=/dev/zero of=/dev/ram0 bs=1M count=4 >& /dev/null dd if=/dev/zero of=/dev/ram1 bs=1M count=4 >& /dev/null dmc_create /dev/ram0 crypt0 dmc_create /dev/ram1 crypt1 cat >tr.sh <<EOF #!/bin/sh for n in \$(seq 10); do dd if=/dev/dm-0 of=/dev/null >& /dev/null & dd if=/dev/dm-1 of=/dev/null >& /dev/null & done wait EOF for n in $(seq 10); do /usr/bin/time sh tr.sh done rm tr.sh -------------------- script end --------------------------- The separator of dm-crypt parameter is changed from "-" to "?", because "-" is used in some cipher driver name too, and cryptds need to specify cipher driver name instead of cipher name. The test result on an Intel Core2 E6400 (two cores) is as follow: without patch: -----------------wo begin -------------------------- 0.04user 0.38system 0:00.39elapsed 107%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6566minor)pagefaults 0swaps 0.07user 0.35system 0:00.35elapsed 121%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6567minor)pagefaults 0swaps 0.06user 0.34system 0:00.30elapsed 135%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6562minor)pagefaults 0swaps 0.05user 0.37system 0:00.36elapsed 119%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6607minor)pagefaults 0swaps 0.06user 0.36system 0:00.35elapsed 120%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6562minor)pagefaults 0swaps 0.05user 0.37system 0:00.31elapsed 136%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6594minor)pagefaults 0swaps 0.04user 0.34system 0:00.30elapsed 126%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6597minor)pagefaults 0swaps 0.06user 0.32system 0:00.31elapsed 125%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6571minor)pagefaults 0swaps 0.06user 0.34system 0:00.31elapsed 134%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6581minor)pagefaults 0swaps 0.05user 0.38system 0:00.31elapsed 138%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6600minor)pagefaults 0swaps -----------------wo end -------------------------- with patch: ------------------w begin -------------------------- 0.02user 0.31system 0:00.24elapsed 141%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6554minor)pagefaults 0swaps 0.05user 0.34system 0:00.31elapsed 127%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6606minor)pagefaults 0swaps 0.07user 0.33system 0:00.26elapsed 155%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6559minor)pagefaults 0swaps 0.07user 0.32system 0:00.26elapsed 151%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6562minor)pagefaults 0swaps 0.05user 0.34system 0:00.26elapsed 150%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6603minor)pagefaults 0swaps 0.03user 0.36system 0:00.31elapsed 124%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6562minor)pagefaults 0swaps 0.04user 0.35system 0:00.26elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6586minor)pagefaults 0swaps 0.03user 0.37system 0:00.27elapsed 146%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6562minor)pagefaults 0swaps 0.04user 0.36system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6594minor)pagefaults 0swaps 0.04user 0.35system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+6557minor)pagefaults 0swaps ------------------w end -------------------------- The middle value of elapsed time is: wo cryptwq: 0.31 w cryptwq: 0.26 The performance gain is about (0.31-0.26)/0.26 = 0.192. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: api - Use dedicated workqueue for crypto subsystemHuang Ying2009-02-193-0/+43
| | | | | | | | | | | | | | | | | | | | | | Use dedicated workqueue for crypto subsystem A dedicated workqueue named kcrypto_wq is created to be used by crypto subsystem. The system shared keventd_wq is not suitable for encryption/decryption, because of potential starvation problem. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: testmgr - Test skciphers with no IVsHerbert Xu2009-02-181-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As it is an skcipher with no IV escapes testing altogether because we only test givcipher objects. This patch fixes the bypass logic to test these algorithms. Conversely, we're currently testing nivaead algorithms with IVs, which would have deadlocked had it not been for the fact that no nivaead algorithms have any test vectors. This patch also fixes that case. Both fixes are ugly as hell, but this ugliness should hopefully disappear once we move them into the per-type code (i.e., the AEAD test would live in aead.c and the skcipher stuff in ablkcipher.c). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: aead - Avoid infinite loop when nivaead fails selftestHerbert Xu2009-02-181-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an aead constructed through crypto_nivaead_default fails its selftest, we'll loop forever trying to construct new aead objects but failing because it already exists. The crux of the issue is that once an aead fails the selftest, we'll ignore it on the next run through crypto_aead_lookup and attempt to construct a new aead. We should instead return an error to the caller if we find an an that has failed the test. This bug hasn't manifested itself yet because we don't have any test vectors for the existing nivaead algorithms. They're tested through the underlying algorithms only. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: skcipher - Avoid infinite loop when cipher fails selftestHerbert Xu2009-02-182-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an skcipher constructed through crypto_givcipher_default fails its selftest, we'll loop forever trying to construct new skcipher objects but failing because it already exists. The crux of the issue is that once a givcipher fails the selftest, we'll ignore it on the next run through crypto_skcipher_lookup and attempt to construct a new givcipher. We should instead return an error to the caller if we find a givcipher that has failed the test. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: api - Fix crypto_alloc_tfm/create_create_tfm return conventionHerbert Xu2009-02-183-23/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is based on a report and patch by Geert Uytterhoeven. The functions crypto_alloc_tfm and create_create_tfm return a pointer that needs to be adjusted by the caller when successful and otherwise an error value. This means that the caller has to check for the error and only perform the adjustment if the pointer returned is valid. Since all callers want to make the adjustment and we know how to adjust it ourselves, it's much easier to just return adjusted pointer directly. The only caveat is that we have to return a void * instead of struct crypto_tfm *. However, this isn't that bad because both of these functions are for internal use only (by types code like shash.c, not even algorithms code). This patch also moves crypto_alloc_tfm into crypto/internal.h (crypto_create_tfm is already there) to reflect this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: api - crypto_alg_mod_lookup either tested or untestedHerbert Xu2009-02-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | As it stands crypto_alg_mod_lookup will search either tested or untested algorithms, but never both at the same time. However, we need exactly that when constructing givcipher and aead so this patch adds support for that by setting the tested bit in type but clearing it in mask. This combination is currently unused. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ansi_cprng - Panic on CPRNG test failure when in FIPS mode Neil Horman2009-02-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | FIPS 140-2 specifies that all access to various cryptographic modules be prevented in the event that any of the provided self tests fail on the various implemented algorithms. We already panic when any of the test in testmgr.c fail when we are operating in fips mode. The continuous test in the cprng here was missed when that was implmented. This code simply checks for the fips_enabled flag if the test fails, and warns us via syslog or panics the box accordingly. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ansi_cprng - Force reset on allocationNeil Horman2009-02-181-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Pseudo RNGs provide predictable outputs based on input parateters {key, V, DT}, the idea behind them is that only the user should know what the inputs are. While its nice to have default known values for testing purposes, it seems dangerous to allow the use of those default values without some sort of safety measure in place, lest an attacker easily guess the output of the cprng. This patch forces the NEED_RESET flag on when allocating a cprng context, so that any user is forced to reseed it before use. The defaults can still be used for testing, but this will prevent their inadvertent use, and be more secure. Signed-off-by: Neil Horman <nhorman@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platformHuang Ying2009-02-181-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD) instructions that are going to be introduced in the next generation of Intel processor, as of 2009. These instructions enable fast and secure data encryption and decryption, using the Advanced Encryption Standard (AES), defined by FIPS Publication number 197. The architecture introduces six instructions that offer full hardware support for AES. Four of them support high performance data encryption and decryption, and the other two instructions support the AES key expansion procedure. The white paper can be downloaded from: http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf AES may be used in soft_irq context, but MMX/SSE context can not be touched safely in soft_irq context. So in_interrupt() is checked, if in IRQ or soft_irq context, the general x86_64 implementation are used instead. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cryptd - Add support to access underlying blkcipherHuang Ying2009-02-181-0/+35
| | | | | | | | | | | | | | | | | | | | cryptd_alloc_ablkcipher() will allocate a cryptd-ed ablkcipher for specified algorithm name. The new allocated one is guaranteed to be cryptd-ed ablkcipher, so the blkcipher underlying can be gotten via cryptd_ablkcipher_child(). Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: shash - Remove superfluous check in init_tfmHerbert Xu2009-02-181-2/+0
| | | | | | | | | | | | | | | | We're currently checking the frontend type in init_tfm. This is completely pointless because the fact that we're called at all means that the frontend is ours so the type must match as well. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: api - Fix module load deadlock with fallback algorithmsHerbert Xu2009-02-261-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the mandatory algorithm testing at registration, we have now created a deadlock with algorithms requiring fallbacks. This can happen if the module containing the algorithm requiring fallback is loaded first, without the fallback module being loaded first. The system will then try to test the new algorithm, find that it needs to load a fallback, and then try to load that. As both algorithms share the same module alias, it can attempt to load the original algorithm again and block indefinitely. As algorithms requiring fallbacks are a special case, we can fix this by giving them a different module alias than the rest. Then it's just a matter of using the right aliases according to what algorithms we're trying to find. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: ahash - Fix digest size in /proc/cryptoLee Nipper2009-02-191-1/+1
|/ | | | | | | crypto_ahash_show changed to use cra_ahash for digestsize reference. Signed-off-by: Lee Nipper <lee.nipper@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: lrw - Fix big endian supportHerbert Xu2009-02-171-1/+7
| | | | | | | | | | | | | | | It turns out that LRW has never worked properly on big endian. This was never discussed because nobody actually used it that way. In fact, it was only discovered when Geert Uytterhoeven loaded it through tcrypt which failed the test on it. The fix is straightforward, on big endian the to find the nth bit we should be grouping them by words instead of bytes. So setbit128_bbe should xor with 128 - BITS_PER_LONG instead of 128 - BITS_PER_BYTE == 0x78. Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: scatterwalk - Avoid flush_dcache_page on slab pagesHerbert Xu2009-02-091-1/+2
| | | | | | | | | | | | It's illegal to call flush_dcache_page on slab pages on a number of architectures. So this patch avoids doing so if PageSlab is true. In future we can move the flush_dcache_page call to those page cache users that actually need it. Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: api - Fix zeroing on freeHerbert Xu2009-02-051-10/+10
| | | | | | | | | Geert Uytterhoeven pointed out that we're not zeroing all the memory when freeing a transform. This patch fixes it by calling ksize to ensure that we zero everything in sight. Reported-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>